MINLP formulations for continuous piecewise linear function fitting

We consider a nonconvex mixed-integer nonlinear programming (MINLP) model proposed by Goldberg et al. (Comput Optim Appl 58:523–541, 2014. https://doi.org/10.1007/s10589-014-9647-y) for piecewise linear function fitting. We show that this MINLP model is incomplete and can result in a piecewise linear curve that is not the graph of a function, because it misses a set of necessary constraints. We provide two counterexamples to illustrate this effect, and propose three alternative models that correct this behavior. We investigate the theoretical relationship between these models and evaluate their computational performance.


Introduction
Piecewise linear function fitting, also known as linear-spline regression, is a classical problem of determining a piecewise-linear function f ∶ ℝ → ℝ , with a given number of pieces m, that best fits the given predictor data x ∈ ℝ n and response data y i = f (x i ) + i for unbiased and independent error terms i where i = 1, … , n . The problem has been mostly addressed in the case that the candidate breakpoints are given as a part of the input. A dynamic program (DP) given such a fixed set of candidate breakpoints has been proposed in [1]. The regression problem becomes more challenging if the breakpoints are themselves variables and are not fixed to a choice from a finite set. Historically, in this more general setting, solutions have been computed by heuristics.
Goldberg et al. [5] propose exact computation schemes using mixed-integer nonlinear programming (MINLP) and a sophisticated adaptive-refinement DP-based heuristic as a more tractable approach for handling larger problems. We show below that their MINLP model can result in a piecewise linear curve that is not a graph of a function (i.e., there is an x that is associated with more than one expected response value). We start by presenting the model from [5]. In the next section we present two counterexamples that illustrate that this model is incomplete, and we develop three alternative MINLP models that overcome this defect and study their theoretical relationship. In Sect. 3 we present numerical results suggesting that one formulation is superior.
Given data x, y ∈ ℝ n , with x 1 < x 2 < … < x n , the incomplete model in [5] aims to determine the piecewise linear function, given by breakpoint horizontal coordinates b 0 , b 1 , … , b m ∈ ℝ , by solving the following nonconvex MINLP, where b 0 = x 1 is fixed and = (b 1 , … , b m ) is a tuple of decision variables, for every c, that may be a part of an optimal solution (see the analysis in the following sections as well as [9] for details and bounds in terms of the input data for M i ). Special Cases of (1). In [5], the authors apply (1) only in the special case of concave (or convex) data fitting. In that case, the slopes of the segments are constrained to be decreasing (or increasing), which ensures that the resulting curve is the graph of a concave (or convex) function. In particular, it can be shown that the slope inequalities imply some of the inequalities that are proposed in order to correct formulation (1) in the following section. Similar arguments are used to prove the correctness of the convex MINLP for convex function fitting that appears in [10], which is also studied in [9]. In that paper, the authors also propose another (convex) The y-intercept of the jth line segment.
The slope of the jth line segment.
Indicating whether the error of the ith data point should contribute to the error of the jth line segment, i.e., , the difference between the y value of the ith data point and the estimated value of the corresponding (assigned) jth line segment. mixed integer linear program (MILP) formulation for the case where p = 1 for fitting general piecewise linear functions. In the general case, however, the MINLP (1) is incomplete, and so we present counterexamples and remedies in the next section.

Counterexample and correction of (1)
The MINLP (1) is incomplete, and we now present two illustrating examples where it fits a polygonal curve that is not a graph of a function. We begin with a small synthetic example.
Example 1 Consider the data x = (1, 1.01, 1.02, 1.03, 1.04) and y = (0, 0, 1, 0, 1) and p = 2 . The optimal solution of (1), with m = 3 is illustrated in the Figure 1(a), where it is evident that it has a zero optimal objective value by fitting a polygonal curve, which is not a graph of a function, and whose breakpoints are given by b 0 = 1.00 and = (1.04, 1.00, 1.03) . This curve's second segment, from (1.04, 1.0) to (1.0, 0.0), is not assigned to any of the data points (i.e., 12 = ⋯ = 52 = 1 ). An optimal piecewise linear function that best fits the data, obtained by requiring that the breakpoint horizontal coordinate values are in increasing order. It is depicted in Figure 1(b), with a nonzero optimal objective value of 0.167.
In many real datasets including all of those experimented with in [5] and [9], the relaxed formulation (1) coincidentally computes a correct solution that does correspond to a piecewise linear function. An exception is the next real data example that shows that the defect in MINLP (1) can affect a real problem in fitting a curve that may not be graph of a continuous piecewise linear function.

Example 2
This example is based on the classic (scaled) Titanium data set from [3] and [6] with n = 49 data points. The number of segments m = 3 and applying formulation (1) with p = 2 . The optimal objective value is 1.003 and an optimal solution (the dashed line segment) is not assigned any data points. An optimal piecewise linear function found, when the breakpoints are required to maintain an increasing order as shown in Figure 2(b), has b = (850.2, 885.0, 1075.0) and the corresponding optimal objective value is 2.129.
We next describe three alternative formulations that correct MINLP (1) and ensure that the solution curve is a function, by either implicitly or explicitly maintaining an increasing order of the breakpoints: 1. We can include the missing assignment-type constraints requiring that each segment is assigned to at least one data point, These are referred as assignment-type constraints because they require that ij = 0 for some i = 1, … , n , and, if satisfied, then segment j is assigned to at least one data point. 2. We can explicitly enforce the order of the breakpoints by adding constraints, These constraints are similar to ones used in the piecewise linear function approximation model proposed in [8].
3. We can also use a set of constraints proposed for the convex MILP reformulation in [9], which in the current setting amount to requiring that The Titanium dataset example where formulation (1) with m = 3 results in fitting a curve with three segments ( m = 3 ) that is not a piecewise linear function. Note that the dashed line segment is not assigned any of the data points These inequalities, in particular (4a), require that for each data point i, if it is not assigned to one of two consecutive line segments j and j + 1 , then the next data point i + 1 cannot be assigned to segment j + 1 . It follows that segment j is not assigned any data points only if segment j + 1 is not assigned any data points. The following theorem establishes the correctness of the formulation (1) together with inequality (3) in estimating a continuous piecewise linear function with m line segments and minimum sum of (pth power) errors.
Proof First we establish that every feasible solution of (1) with (3) corresponds to a continuous piecewise linear function with m segments and that the sum of (pth power) errors is given by (1a). Let b * , c * , * , * , * be an optimal solution to (1) with (3). Constraints (1c) ensure that each data point i is assigned to exactly one line segment j. Together with constraints (1d) and (1e) it is ensured that for each , are assigned if and only if * ij = 0 . The constraints (3) ensure that the breakpoints are nondecreasing and that accordingly the estimated f is a function (and none of the intervals [b j−1 , b j ) overlap for j = 0, … , m . Finally, the constraints (1f) ensure that for each j = 1, … , m − 1 , pair of line segments j and j + 1 must intersect at (b * j , c * j + * j b * j ) so the estimated function is continuous. To see that the sum of (pth power) errors is given by (1a) first note that constraints (1b) must hold as equalities by optimality of b * , c * , * , * , * . Then, for each pair (i, j) ∈ {1, … , n} × {1, … , m} , * i equals i's deviation absolute value if and only if segment j is assigned to point i, that is if * ij = 0. Now we establish that every continuous piecewise linear function on D ⊆ ℝ , where L = inf D and U = sup D , , with m ′ line segments, corresponds to a solution that is feasible for (1) with (3) for some m ≤ m ′ . Suppose that f ∶ D → ℝ is a continuous piecewise linear function with m ′ line segments and m � − 1 corresponding breakpoints b j , for j = 1, … , m − 1 , each of which joining two segments, satisfying and that minimizes the sum of squared (pth power) errors for input data x, y ∈ ℝ n .

MINLP formulations for continuous piecewise linear function…
The construction (7) together with the fact that f is a function on D ⊇ [x 1 , x n ] and setting b 0 = x 1 and b m = x n , assures that each data point i ∈ {1, … , n} is assigned to exactly one segment, implying that constraints (1c) are satisfied and that letting is well defined. For i = 1, … , n and j = 1, … , m , if ij = 0 then (1b) follows from (8) and (1d) and (1e) follow from (7). Otherwise if ij = 1 , then by construction the constraints (1d) and (1e) hold for Then, for each i = 1, … , n , constraints (1b) are satisfied also for j such that ij = 1 , as long as The last inequality followed from the optimality of f; the converse would imply that there is no line connecting any pair k, l that crosses line segment j implying the suboptimality of f. (8) implies that the solution b, c, , , has an objective value that equals the sum of (pth power) errors. Further, by the definitions (6a) together with (5), constraints (1h) and (3) are satisfied. Constraints (1f) follow from the continuity of f; for each j = 1, … , m − 1 , the pair of consecutive line segments j and j + 1 must intersect at (b j , c j + j b j ) . So b, c, , , corresponds to f, in particular it is a function whose graph coincides with that of f restricted to [x 1 , x n ] ⊆ D , and it is feasible for (1) with (3). ◻ We now derive relationships between the resulting optimization problems when the different constraints (2), (3) and (4) are added to (1). The obtained results enables us to prove the correctness of formulation (1), extended by either of these constraints. To this end, let • z R denote an optimal objective function value of (1), • z S denote an optimal objective function value of (1) with (2), • z 3 denote an optimal objective function value of (1) with (3), • z 4 denote an optimal objective function value of (1) with (4).
Finally, let X R , X 3 , X 4 , X S ⊆ ℝ 3m × ℝ n × {0, 1} nm denote the feasible regions corresponding to these formulations with optimal objective values z R , z 3 , z 4 and z S , (6b) c j =c j+q , j =̃j +q j = 1, … , m respectively. The following proposition establishes the relations between the feasible regions and objective function values of these formulations.

Theorem 2
Suppose that M i ,M i , for i = 1, … , n, are sufficiently large constants satisfying the hypothesis of Theorem 1. Then, feasible regions X R , X 3 , X 4 , X S satisfy X R ⊃ X 3 , X 4 ⊃ X S . Further, suppose that m ≤ n. Then, these formulations' optimal objective values satisfy z R ≤ z 3 = z 4 = z S . Proof X R ⊃ X 3 , X 4 is straightforward given that (1) is a relaxation of the formulations with either constraints (3) or (4), which require the breakpoints to be ordered. Further, X 3 , which admits only solutions whose breakpoints are ordered as feasible, is a relaxation of the formulation (1) with (2), which requires each segment to be assigned at least one data point, and whose constraints (1d) and (1e) together with (2), imply (3). To see that X S ⊂ X 4 , observe that the constraints (4) enforce that for each i = 1, … , n − 1 and j = 2, … , m a data point i + 1 can be assigned to segment j only if i is assigned to either j or to j − 1 . Evidently, constraints (4) are satisfied if there exists some integer 1 ≤ j ′ ≤ m for which all segments in {1, … , j � } are each assigned at least one data point and segments {j � + 1, … , m} are not assigned any data points. In particular, constraints (2) being satisfied imply that j � = m . Otherwise, a solution in X 4 with m > j ′ is also a solution in X S where m = j � . It follows that z 4 ≥ z S since the optimal solutions using m ≥ j ′ line segments cannot have an objective value that is worse than optimal solutions using only j ′ line segments. Equality follows from the fact that X 4 is a relaxation, that is To see that z 3 = z 4 = z S consider an optimal solution of (1) with (3) having optimal objective z 3 and a minimal number of line segments that is not assigned any data points. Then, consider a line segment along [b j−1 , b j ] that is not assigned any data points for some j = 1, … , m , where b 0 = min i=1,…,n x i , and a closest data point (for convenience suppose from the right) with horizontal coordinate x * (note that if no such line segment exists then clearly z 3 = z S ). Then, consider a solution with b j replaced by x * (so that segments j and j + 1 now intersect at horizontal coordinate x * and only the slope of line segment j that is previously not assigned any data points is adjusted accordingly). Evidently, this solution is feasible for (1) with (2) and has the same objective value. This shows that z 3 = z S and the proof of the equality with z 4 is similar. ◻ Now, Theorems 1 and 2 immediately imply the following Note that all of the proposed formulations require the breakpoints to be ordered, so they satisfy the requirement that the fitted curve is a graph of a function (including the formulations that require each line segment to be assigned at least one data point). We next compare the performance of a MINLP solver in solving the different alternative formulations.

Computational experiments
In our computational experiments we solve the formulation (1) with p = 2 and appending one of the constraints (2), (3) and (4), all or none of them-a relaxation of the intended problem. All formulations have been implemented using the Julia optimization package JuMP [4] and the state-of-the-art MINLP solver Couenne [2] version 0.5.6 run on a machine with a 2.5 GHz 4 MB cache CPU. The results of the experiments are shown in Table 2. In these experiments the time limit is set to 2 h. We experimented with the Titanium ( n = 49 ) dataset [3], a dataset based on the Concrete data [11] and the NHTemp dataset [7]. For the Concrete data we selected fly ash as the predictor variable and averaged the response variable value for entries with the same predictor variable value (so following that it had n = 58 ). NHTemp [7] is a dataset containing New Haven's average annual temperatures between the years 1912-1971 ( n = 60).
The results of Table 2 indicate that the formulation (1) with constraints (3) is usually solved faster and with the fewest number of branch-and-bound nodes compared with the other formulations. This is despite the fact that this formulation admits Table 2 Computational experiments with the alternative/different formulations using one or more of the constraints (2)-(4) and the given number of line segments m.
For each experiment we show the number of branch-and-bound nodes and CPU seconds. LIMIT indicates runs that exceeded the 2-h time limit. For each dataset and value of m, the best results in terms of running time and branch-and-bound nodes appear in bold many feasible solutions that would otherwise be infeasible (but with a similar objective value) for the formulations that append the constraints (2) or (4). A close second in the computational performance is the formulation (1) with (4). On the other hand, the formulation (1) with (2) failed to determine an optimal solution in many cases within the given 2-h time limit. Interestingly, formulation (1) by itself, which amounts to a relaxation of the other formulations, usually takes even longer to solve than (1) with (2), other than in the case of the Concrete instance with m = 2.

Conclusions
We have pointed out an error in the nonconvex MINLP for fitting piecewise linear functions proposed in [5]. This error may result in some cases in fitting a polygonal curve that is not the graph of a function. We present three different formulations that add inequalities in order to correct the MINLP. Formulations with constraints that explicitly enforce the ordering of breakpoints appear to perform better than assignment-type constraints that require that each line segment is assigned a data point. These assignment constraints implicitly imply the ordering of breakpoints when combined with other inequalities of the model, and they are shown to preserve at least one solution that is optimal for the function-fitting problem (which requires the breakpoints to be ordered but does not require each line segment to be assigned a data point).
Finally, the study of these inequalities and alternative formulations may also suggest and support the use of particular ordering-based constraints rather than assignment-type constraints for other formulations have been recently modeled to effectively solve larger problems such as the one proposed in [9].
Funding Open Access funding enabled and organized by Projekt DEAL. Matthew Turner and funding by the MISTI MIT-Israel program are acknowledged for a Julia/JuMP implementation of models [5], which have been extended for the computational work in the current paper. Funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation)-445857709.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creat iveco mmons .org/licen ses/by/4.0/.