Piecewise parametric structure in the pooling problem: from sparse strongly-polynomial solutions to NP-hardness

The standard pooling problem is an NP-hard subclass of non-convex quadratically-constrained optimization problems that commonly arises in process systems engineering applications. We take a parametric approach to uncovering topological structure and sparsity, focusing on the single quality standard pooling problem in its p-formulation. The structure uncovered in this approach validates Professor Christodoulos A. Floudas’ intuition that pooling problems are rooted in piecewise-defined functions. We introduce dominant active topologies under relaxed flow availability to explicitly identify pooling problem sparsity and show that the sparse patterns of active topological structure are associated with a piecewise objective function. Finally, the paper explains the conditions under which sparsity vanishes and where the combinatorial complexity emerges to cross over the P/NP boundary. We formally present the results obtained and their derivations for various specialized single quality pooling problem subclasses.


Introduction
The standard pooling problem represents an NP-hard subclass [3] of non-convex quadratically-constrained optimization problems with bilinear terms and may have a multiplicity of local minima [33]. Pooling problems model the computational difficulties associated with intermediate blending of heterogeneous feedstocks and therefore have direct application in [...]

This manuscript is dedicated, with deepest respect, to the memory of Professor Christodoulos A. Floudas.

Fig. 1 Standard pooling network with inputs [1,…,I], directs [1,…,H], pools [1,…,L] and outputs [1,…,J]. Note the feeds layer is separated into two groups of nodes: input nodes [1,…,I] that send flows to pools only and direct nodes [1,…,H] that send flows to outputs only. In general, one feed can send flows both to pools and to outputs. In this case, each of the feed's flows is assigned, based on the layer it is sent towards, to either an input or a direct node corresponding to the feed. This explicit input/direct separation helps build a clear understanding of the problem sparsity/structure as related to flows to pools versus flows to outputs. The separation is generally adopted in the paper's figures, discussions or proofs, as needed.

[...] with a piecewise objective function and we take advantage of these structures. Lastly, we explain the conditions under which such sparsity vanishes by reintroducing constraints on flow availability and, together with them, the combinatorial complexity needed to cross over the P/NP boundary.

The paper proceeds as follows: Sect. 2 introduces the single quality formulation of the standard pooling problem and the assumptions (flow constraint relaxations) used throughout this paper; Sect. 3 analyzes the one pool, one output subclass and uncovers both a piecewise-monotone structure and a strongly-polynomial time algorithm for solving it; Sect. 4 extends the results in Sect. 3 to the subclass with multiple outputs via additive decomposition over outputs; Sect. 5 extends the Sect. 3 results to the subclass with multiple pools using problem sparsity; Sect. 6 discusses the implications and possible extensions of the results. The source code implementation of the results discussed in this paper is available on GitHub [8].

Standard pooling p-formulation and assumptions used
This manuscript unpacks single quality standard pooling problem solutions by parameterizing with respect to pool concentrations. To do so effectively, the paper employs a concentration-based formulation, i.e. the p-formulation shown in Problem P-2.1 [5]. Table 1 introduces the notation used for indices, sets, variables and parameters, as well as for the problem subclass types analyzed in the following sections. Fig. 1 shows the topological structure of a standard pooling network, represented by a feed-forward flow network with three node layers.
Different flows pass between the three layers, having different concentrations of various qualities, e.g. crude oil chemical compositions. Input feeds $in_1, \ldots, in_I$ send flows denoted by x variables to be linearly blended in L pools ($p_1, \ldots, p_L$), which further distribute y flows to J outputs ($o_1, \ldots, o_J$) to create blended products. Additionally, H direct feeds $di_1, \ldots, di_H$ send z flows directly to the outputs layer. The standard pooling Problem P-2.1 consists of maximizing a profit function with profits and costs associated with each network flow, subject to flow constraints (e.g. feed availability, pool capacity, output demands, flow balance at pools) and quality concentration constraints (e.g. quality balance at pools, product quality bounds at outputs). This manuscript addresses a somewhat more complex pooling problem than previous work analyzing the P/NP boundary [3,15,31,32] by considering the direct feeds $di_1, \ldots, di_H$.

Table 1 Standard pooling problem notation [43]
Type      Notation                                          Description
Indices   $i \in \{i \mid (i, \cdot) \in T_X \cup T_Z\}$    Input streams (raw materials or feed stocks)
          $l \in \overline{1, L}$                           Pools (blending facilities)
          $j \in \overline{1, J}$                           Output streams (end products)
          $k \in \overline{1, K}$                           Attributes (qualities monitored)
[...]
          [...]                                             Acceptable composition range of quality k in product j
The standard pooling problem is a flexible formulation used for many real-world applications with different units of measure for variables/parameters, e.g. in [35]: flows (ton/hr), quality level/concentration (ppm), objective/unit cost/unit revenue ($), capacity (tons).

(Objective) max [...] (P-2.2)

The following sections analyze, bottom-up, several subclasses of Problem P-2.2 based on topological restrictions, proving in each case strongly-polynomial time complexity coupled with sparse piecewise structure. The analysis delineates the P/NP-hard boundary for Problem P-2.2 subclasses.

Subclass I+H−1−1: one pool, one output
This section analyzes the topological restriction of Problem P-2.2 with I+H feeds (I inputs, H directs), one pool, and one output. For simplicity of notation, the single indices l and j are dropped from variables and parameters via the notation transformations $T_Z \leftarrow \{i : (i, j) \in T_Z\}$, $T_X \leftarrow \{i : (i, l) \in T_X\}$, leading to the restricted Problem P-3.3.

max [...] (P-3.3)

Variables p, y can be eliminated from Problem P-3.3 by directly substituting all y and p · y terms from their constraints. Thus, f can be rewritten: [...]

In Problem P-3.4, we eliminate y but retain p as a parameter controlling the flows $x_i, \forall i \in T_X$ relative to each other. In addition, eliminating p from Problem P-3.3 produces a linear program (LP) in the x, z variables. Since at optimality the LP has at most three tight constraint bounds (product quality can only have one tight bound), the cardinality of the optimal basis is at most three, but the basic variables among x, z cannot be identified directly in this manner. Consequently, retaining parameter p allows us to analytically understand the optimal solutions for Problem P-3.4 and identify basic variables among x, z across p-intervals, together with any problem structure. In particular, the p-parametric form of the objective function may be used to break the p-interval $[\min_i C_i, \max_i C_i]$ into sub-intervals where special properties of f arise. Section 4 uses this parametric approach for solving a non-convex/nonlinear problem subclass in strongly-polynomial time.
Active sets, dominance relations and breakpoints are essential building blocks to find the structure of $f^*(p)$ in Problem P-3.4 and are all introduced in Definitions 3.1-3.4.

Definition 3.1 (Active sets, objective function and cost function)
Set A of nodes from the feed layer is: [...] For an active set A in Problem P-3.4, the objective function f is given by [...], where f is not p-parametric in the second case of no input flow to the pool, since the pool concentration p is then undefined. Let $h = d \cdot D - f$ denote the cost function associated with objective function f.

Definition 3.2 (Feasibility with respect to product quality constraints)
A Problem P-3.4 active set is feasible if the product quality bounds $[P^L, P^U]$ are met, i.e. the second constraint holds. An infeasible active set is not a valid Problem P-3.4 solution and is therefore strictly dominated by any feasible active set (see Definition 3.3).

Definition 3.3 (Dominance and breakpoints between active sets)
Let $A_1, A_2$ be feasible input/mixed active sets. Let $f^*_A(p)$ be the optimal objective function value of Problem P-3.4 and $h^*_A(p)$ its corresponding optimal cost function value, assuming active set A and fixed p.
• Set $A_1$ dominates $A_2$ at p (in the sense of maximized objective function profitability) when [...]
• Pool concentration p is a breakpoint between $A_1$ and $A_2$ if: [...]
• The dominance relation also extends to direct active sets, but in this case f is not parametric on p. Consequently, when comparing two direct active sets, dominance is established similarly via Eq. (2) but independently of p, and as such no breakpoints exist. Thus, for fixed p, a total order can be established over the set of all possible active sets.

Definition 3.4 (Dominant active sets and dominance breakpoints)
Let [...] and the optimal objective solution of Problem P-3.4 is $f^* = \max_p f^*_{A^*(p)}(p)$, where: [...] A dominance breakpoint represents a p value where the dominant active set changes, i.e. $\forall\, 0 < \epsilon < \epsilon_0$, where $\epsilon_0$ is a sufficiently small positive number, [...] Input and mixed dominance breakpoints are defined similarly to Eq. (5), but with $A^*$ replaced by $A_I$ and $A_M$, respectively. Let the sets of input and mixed dominance breakpoints be denoted by $B_I$ and $B_M$, respectively.
The input, direct and mixed active sets have different dominance properties and thus the analysis proceeds in Sects. 3.1-3.3 by active set type. Section 3.1 ignores directs and product quality constraints and focuses only on inputs. Since directs are ignored, the pool concentration p represents the output concentration, and hence p is assumed free of product quality bounds. The analysis of the p-parametric optimal objective $f^*(p)$ reveals a piecewise-linear structure associated with pairs of inputs acting as the dominant input active set. Section 3.2 treats the complementary case, ignoring inputs and focusing only on directs while assuming product quality constraints. Since inputs and therefore the pool are assumed to send no flow in this case, the optimal objective $f^*$ and the dominant direct active set are found independently of p. Finally, Sect. 3.3 integrates the Sects. 3.1-3.2 results, combining both inputs and directs under assumed product quality constraints to reveal a sparse, piecewise-monotone structure of the p-parametric optimal objective $f^*(p)$.
Sections 3.1-3.3 analytically and parametrically identify all (dominance) breakpoints, sparse dominant active sets and associated p-parametric solutions for Problem P-3.4. This analysis leads to a strongly-polynomial algorithm in Sect. 3.3 for solving the I+H−1−1 subclass formalized in Problem P-3.4. Furthermore, the full structure of the p-parametric optimal objective function $f^*(p)$ developed in Sect. 3.3 is vital for Sects. 4-5.

Remark 3.5
For any [...] precedence is given to the node i with the cheaper flow, or, at cost equality, a random choice is made. After pre-filtering all feeds of equal concentrations on cost criteria, we are assured $\forall i, j \in T_X \cup T_Z$, $i \neq j$, that [...] and vice versa. We apply the enumerated precedence rules throughout Sect. 3. This pre-filtering avoids undefined expressions in the subsequent sections, e.g. denominators with value zero.

Inputs-only analysis (I−1−1 sub-case)
This subsection considers the Problem P-3.4 restriction with no direct flows to the output ($z_i = 0, \forall i \in T_Z$) and no product quality constraints on the pool concentration p. The resulting Problem P-3.1.5 is p-parametric, and thus we seek to find both the optimal solution and the full p-parametric structure. While Remark 3.1.1 observes the cardinality of the dominant input active set at any p, Proposition 3.1.5 explicitly identifies $A_I(p)$. Theorem 3.1.6 then finds all input dominance breakpoints $B_I$, the optimal solution and, more importantly, the full piecewise-linear structure of the p-parametric optimal objective $f^*(p)$ motivated by Lemma 3.1.2. The piecewise structure is expanded in Sect. 3.3 in the presence of directs and product quality constraints, but remains fundamental to all analytical solutions found in the paper, including Sects. 4-5.
[...] with flows [...]
Proof The flows in Eq. (7) result from [...], where the second equality for $\gamma_{ij}(p)$ follows from Eq. (7). Note that $\gamma_{ij}(p)$ is a weighted average of $\gamma_i, \gamma_j$, uniquely determined at a fixed p, which in turn is a weighted average of $C_i, C_j$.
Proof Follows from Remark 3.1.1 coupled with Proposition 3.1.4.
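The weighted-average relation noted in the proof above can be illustrated with a short sketch. It assumes fixed demand D, per-unit input costs $\gamma_i$ and the inputs-only setting of this subsection; the function names and the brute-force search over pairs are hypothetical illustrations, not the paper's implementation [8].

```python
from itertools import combinations

def pair_flows(p, c_i, c_j, demand):
    """Flows (x_i, x_j) from two inputs that blend to concentration p and sum to demand."""
    if not min(c_i, c_j) <= p <= max(c_i, c_j) or c_i == c_j:
        raise ValueError("p must lie between two distinct concentrations")
    x_i = demand * (c_j - p) / (c_j - c_i)
    x_j = demand * (p - c_i) / (c_j - c_i)
    return x_i, x_j

def pair_unit_cost(p, c_i, g_i, c_j, g_j):
    """Blended unit cost: the same weights that average C_i, C_j to p average gamma_i, gamma_j."""
    w_i, w_j = pair_flows(p, c_i, c_j, 1.0)
    return g_i * w_i + g_j * w_j

def dominant_input_pair(p, inputs):
    """Cheapest bracketing pair at fixed p (assumed rule); inputs maps name -> (C_i, gamma_i)."""
    best = None
    for (a, (ca, ga)), (b, (cb, gb)) in combinations(inputs.items(), 2):
        if ca == cb or not min(ca, cb) <= p <= max(ca, cb):
            continue
        cost = pair_unit_cost(p, ca, ga, cb, gb)
        if best is None or cost < best[0]:
            best = (cost, (a, b))
    return best

# Hypothetical data, for illustration only.
example_inputs = {"in1": (1.0, 6.0), "in2": (2.0, 3.0), "in3": (4.0, 4.0)}
best_cost, best_pair = dominant_input_pair(3.0, example_inputs)
```

With maximized profit and fixed demand, the pair with the lowest blended unit cost at p is the one taken here as dominant; this enumeration checks all pairs and makes no attempt at the complexity bounds discussed in the text.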

[...] which requires I (number of inputs) evaluations. (ii) A full description of $f^*(p)$ can be obtained in strongly-polynomial time $O(I^3)$, with the set $B_I$ of input dominance breakpoints, [...]
Between any two consecutive elements of $B_I$, the dominant input active set remains constant, i.e. [...]
Proof (i) Since Proposition 3.1.5 implies $|A_I(p)| = 2$, consider two such dominant active input pairs, $\{i, j\}$ and $\{k, l\}$, and w.l.o.g. assume $C_i < C_j$, $C_k < C_l$. Assume, to achieve a contradiction, that an input dominance breakpoint occurs [...] As a result, in the geometric construction of Fig. 3, values obtained on the side i − l (dashed green) are higher than the optimal objective values obtained by going through the breakpoint b (lines in bold blue), a contradiction. Therefore no input dominance breakpoint can occur at a pool concentration $b \notin \{C_i \mid i \in T_X\}$. Since Lemma 3.1.2 implies $f^*(p)$ is linear between any two input dominance breakpoints when an input pair is active, the assertion made follows. (ii) To fully describe $f^*(p)$, if $C_i$ for fixed $i \in T_X$ is an input dominance breakpoint, then, according to Eq. (10), node i must strictly dominate at $C_i$ any input pair not containing it. Eq. (11b) follows via the definitions of [...]

Fig. 4 Optimal objective function $f^*(p)$ versus pool concentration p for a one pool, one output network with five inputs (parametrized with concentrations/costs). The objective is a piecewise-linear function of the pool concentration.

Remark 3.1.7 If product quality constraints are re-added to Problem P-3.1.5, then Theorem 3.1.6 still applies, with valid input dominance breakpoints $(B_I \cap (P^L, P^U)) \cup \{P^L, P^U\}$.
We conclude the I−1−1 sub-case analysis with a numerical example showcasing the implications of Theorem 3.1.6. For the Fig. 4 example with five inputs and no quality constraints, the function $f^*(p)$ reveals breakpoints at concentrations $C_2$, $C_3$ and $C_4$ in a piecewise-linear structure, as expected via Lemma 3.1.2. Furthermore, each p-interval between two breakpoints identifies the sparse dominant input active sets and their corresponding pair of active flow variables. The coupling of sparsity with piecewise-linear structure matches the Beale et al. [9] intuition. These special structures provide motivation to further explore p-parametric optimal objective structure on progressively more general problem subclasses in the remaining sections.
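The piecewise-linear structure of the inputs-only case can be sketched numerically. The block below swaps in a standard lower convex hull (monotone chain) over the points $(C_i, \gamma_i)$ as an assumed shortcut; it is not the procedure of Theorem 3.1.6, and the five inputs, the price d and the demand D are hypothetical. Interior hull vertices play the role of input dominance breakpoints, and the profit uses $f = d \cdot D - h$ from Definition 3.1.

```python
def lower_hull(points):
    """Lower convex hull of (concentration, unit cost) points, left to right."""
    cheapest = {}
    for c, g in points:
        # Keep only the cheapest feed per concentration (cf. the pre-filtering of Remark 3.5).
        cheapest[c] = min(g, cheapest.get(c, g))
    hull = []
    for c, g in sorted(cheapest.items()):
        while len(hull) >= 2:
            (c1, g1), (c2, g2) = hull[-2], hull[-1]
            # Drop the last vertex if it lies on or above the chord to the new point.
            if (c2 - c1) * (g - g1) - (g2 - g1) * (c - c1) <= 0:
                hull.pop()
            else:
                break
        hull.append((c, g))
    return hull

def unit_cost(p, hull):
    """Piecewise-linear minimal unit cost at pool concentration p."""
    for (c1, g1), (c2, g2) in zip(hull, hull[1:]):
        if c1 <= p <= c2:
            t = (p - c1) / (c2 - c1)
            return (1 - t) * g1 + t * g2
    raise ValueError("p outside the range of input concentrations")

def optimal_profit(p, hull, price, demand):
    """f*(p) = d*D - h*(p), with h*(p) = D * minimal unit cost (assumed fixed demand)."""
    return demand * (price - unit_cost(p, hull))

# Hypothetical five-input instance: (C_i, gamma_i).
five_inputs = [(1.0, 8.0), (2.0, 4.0), (3.0, 3.0), (4.0, 3.5), (5.0, 6.0)]
hull = lower_hull(five_inputs)
breakpoints = [c for c, _ in hull[1:-1]]
```

On this data every input lies on the hull, so the interior breakpoints are the three middle concentrations; adding an input that sits above the hull leaves both the hull and the breakpoints unchanged, which is the sparsity the text describes.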

Directs-only analysis (H−1−1 sub-case)
This subsection restricts Problem P-3.4 to disallow flows from the input nodes to the pool node ($x_i = 0, \forall i \in T_X$). The resulting Problem P-3.2.6, which is complementary to Sect. 3.1, also incorporates the product quality constraints. Optimal solutions implying direct-only flows are not parametric on p, since the problem is independent of pool concentration. This subsection therefore seeks the unique, dominant direct active set and its solution.

max [...] (P-3.2.6)

Remark 3.2.1 (Dominant direct active set of maximum cardinality 2) Problem P-3.2.6 can be rewritten as a standard LP in the z variables with at most two active constraints at the optimal basis, implying the dominant direct active set has at most cardinality 2, i.e. $|A_D| \le 2$ (Fig. 5).
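The cardinality bound in Remark 3.2.1 suggests a simple enumeration over single directs and direct pairs. The sketch below assumes fixed demand (so only unit costs matter), per-unit direct costs $\gamma_i$ and bounds $[P^L, P^U]$; the function name and data are hypothetical. For a pair, the blended cost is linear in the output concentration, so only the endpoints of the feasible concentration range need checking.

```python
from itertools import combinations

def best_direct_set(directs, p_low, p_up):
    """Return (unit cost, active names, output concentration); directs maps name -> (C_i, gamma_i)."""
    best = None

    def consider(cost, names, conc):
        nonlocal best
        if best is None or cost < best[0]:
            best = (cost, names, conc)

    # Single directs are feasible only if their own concentration meets the bounds.
    for name, (c, g) in directs.items():
        if p_low <= c <= p_up:
            consider(g, (name,), c)

    # Pairs: blend concentration ranges over the segment [min C, max C] clipped to the bounds.
    for (a, (ca, ga)), (b, (cb, gb)) in combinations(directs.items(), 2):
        if ca == cb:
            continue
        lo, hi = max(min(ca, cb), p_low), min(max(ca, cb), p_up)
        if lo > hi:
            continue
        for target in (lo, hi):
            share_a = (cb - target) / (cb - ca)
            consider(ga * share_a + gb * (1 - share_a), (a, b), target)
    return best

# Hypothetical three-direct instance with quality bounds [2, 4].
example_directs = {"di1": (1.0, 2.0), "di2": (3.0, 5.0), "di3": (6.0, 1.0)}
cost, names, conc = best_direct_set(example_directs, 2.0, 4.0)
```

Here the cheapest feasible choice blends di1 and di3 with the output concentration sitting at the upper quality bound, which mirrors the role a quality limit plays for a feasible direct pair in the text.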

Lemma 3.2.2 (Simple feasibility conditions)
[...] can be obtained. However, this case implicitly assumed $i \in T_Z$ s.t. $P^L \le C_i \le P^U$, and consequently $C_j > P^U$, concluding the proof.
Then, $P(i, j)$ represents the output concentration at optimality, reducing Problem P-3.2.6 to [...]
Proof See "Appendix A". Note that due to Remark 3.5 and Lemmas 3. [...]
The result implies that for a feasible direct active pair $\{i, j\}$ that dominates both i and j, Problem P-3.2.7 is analogous to the input-only Problem P-3.1.5, with input flows $x_i$ replaced by direct flows $z_i$ and fixed pool concentration p replaced by a product quality limit $P(i, j)$ (either lower or upper). Thus, the flow and dominance results for pairs in Sect. 3.1 are mirrored via Corollaries 3.2.5-3.2.6. Moreover, any pair viable as the dominant direct active set needs to first dominate both its individual nodes, so only such pairs and their solutions are of interest.
Proof Analogous to Lemma 3.1.2, but for Problem P-3.2.7 rather than Problem P-3.1.5.

Corollary 3.2.6 (Domination condition between active direct pairs)
[...] which can be found in strongly-polynomial time $O(H^2)$, then either: [...]
Proof Eq. (15) follows directly from Corollary 3.2.6.

[...] imply that [...] with a linearly weighted cost would dominate i, a contradiction, and therefore $\gamma_j > \gamma_i$. If more restrictively, [...]

Inputs and directs analysis (I+H−1−1 subclass)
This subsection considers the original p-parametric Problem P-3.4, allowing mixed active sets of both input and direct nodes. Theorem 3.3.1 uses the interplay of earlier results for both input (Sect. 3.1) and direct (Sect. 3.2) active sets to pinpoint mixed active sets that can be dominant (overall) active sets as triples of two inputs and one direct. This section focuses on mixed triples not dominated by the dominant input active set $A_I$. Definition 3.3.2 first extends the feasibility conditions from Sect. 3.2 for mixed triples viewed as direct pairs to p-intervals by partitioning any p-interval Φ around $\{P^L, P^U\}$ if necessary and building Q(Φ), the set of directs making the mixed active set feasible. Definition 3.3.2e also extends the directs domination result in Lemma 3.2.3 to p-intervals. Lastly, Definition 3.3.2f splits p-intervals Φ into sub-intervals $\Phi_I$ and $\Phi_M$ based on whether or not the dominant mixed active triple is dominated by $A_I$ over the sub-intervals, respectively. Using the breakpoints between mixed and input active sets identified in Lemma 3.3.3, Lemma 3.3.4 then implements the $\Phi_I$/$\Phi_M$ split of any interval Φ.
Based on the latter results, Proposition 3.3.5 finds the dominant mixed active set for fixed p and all mixed dominance breakpoints, while Proposition 3.3.6 finds all dominance breakpoints. Moreover, Proposition 3.3.7 finds the p-parametric optimal objective function to be monotone convex/concave for mixed active sets. Consequently, Theorem 3.3.8 summarizes all cases of optimal objective monotonicity. Finally, Theorem 3.3.9 uses all objective monotonicity results to find the optimal solution at a dominance breakpoint in strongly-polynomial time.
Proof Fixing p, i.e. the concentration that the active input set delivers via the pool towards the output, makes the product quality constraints on output concentration irrelevant when considering only the active inputs part of a mixed active set. This observation allows us to first pre-solve the inputs-only sub-Problem P-3.3.8, where the objective function $f_A(p, x_A(p))$ is now also parametric on the variable total input flow $x_A(p)$ for an active input set A. According to Proposition 3.1.5, $A = ij = \{i, j\}$ at optimality for Problem P-3.3.8, with the solution in Eq. (18).
Now the optimal input-only parametric solution in Eq. [...]

Definition 3.3.2 (Feasibility/domination extensions to p-intervals around quality bounds)
(a) Let sub-intervals of the partition $\{P^L, P^U\}$ of inputs/directs concentrations be denoted by: [...]
(c) Let Q(Φ) be the set of directs with concentration outside Φ's partition around $\{P^L, P^U\}$, i.e.: [...]
Since $\gamma_{ij}(p)$, as defined in Eq. (8), is a linear function of p with extremes at Φ endpoints, [...] Thus, Θ(Φ) extends Lemmas 3.2.2 and 3.2.3 from fixed p to interval Φ (see proof of Lemma 3.3.4.i).

Lemma 3.3.3 (Domination/breakpoints between dominant input and mixed active sets)
Proof See "Appendix A".

Lemma 3.3.4
For a given p-interval [...]
(ii) (∀Φ) Sub-intervals $\Phi_I$, $\Phi_M$ can be found explicitly as: [...] then $P(ij, q)$ reduces to: [...] independent of the specific p ∈ Φ.
Proof (i) The restriction Q(Φ) ∈ $T_Z$, and its subset R(Φ) ∈ $T_Z$, enforces Lemma 3.2.2 feasibility for $\{ij, q\}$, ∀p ∈ Φ, viewed as a direct active set. Set Θ(Φ) also enforces the [...] When R(Φ) = ∅ and $\Phi_I^{LU}$, then ij is infeasible, and since a direct needs to be active for feasibility, in this case [...]
(iii) Assume first Φ = $\Phi_I$ = $\Phi_M$. Then, Eq. (21) implies (∀q ∈ Θ(Φ))(∀p ∈ Φ) $\gamma_{ij}(p) \le \gamma_q$. Consequently, the expression for $P(ij, q)$ according to Eq. (12) reduces to Eq. (25). Now assume Φ = $\Phi_M$ = $\Phi_I$. Then, Eq. (21) implies (∀p ∈ Φ) $A_M(p) = \{ij, q\} \succ_p A_I(p)$ and hence (∀p ∈ Φ) $\gamma_{ij}(p) \ge \gamma_q$. Thus, again, the expression for $P(ij, q)$ reduces to Eq. (25). For both possible Φ ∈ $\{\Phi_I, \Phi_M\}$, due to the construction of Φ and Θ(Φ) with q ∈ Θ(Φ), (∀p ∈ Φ) $(C_q < p) = (C_q < b_u)$, and thus the comparison becomes independent of a specific p ∈ Φ. Furthermore, to ensure feasibility of $\{ij, q\}$ according to Lemma 3.2.2.ii, $(P(ij, q) - C_q)(P(ij, q) - b_u) < 0$ is enforced explicitly in Eq. (25).
(ii) "Appendix A" proves Eq. (27) and introduces the function $\Gamma(p, P(ij, q), P(ij, r))$, which is quadratic in p and linear in $P(ij, q)$, $P(ij, r)$. To solve $\Gamma(p, P(ij, q), P(ij, r))$ as a quadratic in p, (∀q ∈ Θ(Φ)) $P(ij, q)$ has to be independent of p via the form in Eq. (25).

[...] $P(ij, q)$, q ∈ Θ(Φ), then feasible $\{ij, q\} = A_M(p)$. Consequently, the conjunctive condition $q, r \in A_M(p)$ eliminates not only those mixed breakpoints that are not dominant, but also those calculated on potentially incorrect $P(ij, q)$ from Eq. (25). Note that, unlike the inputs-only case of Sect. 3.1, two breakpoints between any $\{ij, q\}$ and $\{ij, r\}$ can occur, because $f_{\{ij,q\}}(p)$ and $f_{\{ij,r\}}(p)$ are convex or concave functions (see Proposition 3.3.7) which can intersect at two points. The endpoints of Φ also represent mixed dominance breakpoints, since for $p \in \{b_l, b_u\}$, given $A_M(p) = \{ij, q\}$, either ij, q or $P(ij, q)$ change at p, creating a breakpoint.
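The convex/concave shape attributed to mixed active sets can be checked on a small model. The sketch assumes that, on a given p-interval, the pool's unit cost is linear in p (written here as a + b·p, standing in for $\gamma_{ij}(p)$), that the output concentration is held at a fixed quality limit (named `target`), and that one direct q with concentration `c_q` and unit cost `g_q` closes the blend. All names and numbers are hypothetical, and the closed form is derived from this model only.

```python
def mixed_unit_cost(p, a, b, c_q, g_q, target):
    """Unit cost when a pool at concentration p and a direct q blend exactly to `target`."""
    pool_share = (target - c_q) / (p - c_q)
    if not 0.0 <= pool_share <= 1.0:
        raise ValueError("target must lie between p and c_q")
    return (a + b * p) * pool_share + g_q * (1.0 - pool_share)

def hyperbola_form(p, a, b, c_q, g_q, target):
    """Same cost rewritten as alpha + kappa / (p - c_q), a shifted hyperbola in p."""
    alpha = b * (target - c_q) + g_q
    kappa = (target - c_q) * (a + b * c_q - g_q)
    return alpha + kappa / (p - c_q)

# Hypothetical parameters: pool cost 1 + 0.5*p, direct at concentration 1 with cost 4,
# output held at concentration 3, pool concentration ranging over [3, 5].
params = dict(a=1.0, b=0.5, c_q=1.0, g_q=4.0, target=3.0)
grid = [3.0 + 0.1 * k for k in range(21)]
costs = [mixed_unit_cost(p, **params) for p in grid]
```

Because the sign of kappa is fixed, the cost is monotone and either convex or concave on each side of `c_q`, so two such curves can cross twice; this is consistent with the possibility of two breakpoints between mixed triples noted above.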

Proposition 3.3.6
For a p-interval [...] $P(ij, q)$ as in Eq. (25).

[...] which implies $f_A(p)$ is monotone convex/concave: [...] and $f_A(p)$ is concave decreasing or convex increasing.
Proof See "Appendix A".

Theorem 3.3.8 (p-Parametric structure of the optimal objective function $f^*(p)$) Consider a given p-interval Φ between two consecutive dominance breakpoints for Problem P-3.4. Functions $f^*$, $A^*$, $A_I$, $A_M$ are p-parametric with the following cases, with the optimal objective value $f^*_{A_D}$ for the dominant direct active set acting as a threshold. [...]

Both $f^*$ and a full description of $f^*(p)$ via all dominance breakpoints B ($B_{IM}$ plus all dominance breakpoints with $A_D$) can be obtained in strongly-polynomial time $O(I^3 + H^3)$.
Proof Denoting by T(·) the time-complexity of calculating result (·), Eq. (27) [...]

Optimal objective function $f^*(p)$ versus pool concentration p for a one pool, one output network with five inputs and three directs (parametrized with concentrations/costs). The objective is a piecewise-monotone convex/concave/linear function of the pool concentration.

[...] in Fig. 4) and three directs with quality constraints, the p-parametric function $f^*(p)$ reveals additional breakpoints compared to Fig. 4, between mixed and input active sets and between mixed and direct active sets. The structure is still piecewise-monotone, but is extended to p-intervals exhibiting convexity or concavity, e.g. when 3 ≤ p ≤ 4, where the mixed active set $\{in_3, in_4, di_1\}$ is dominant. Similar to the results in Sect. 3.1, the coupling between the piecewise structure and sparse active sets (up to a mixed node triple) is still present, allowing a full description of the p-parametric optimal solution space. Section 4 explicitly uses this full description to find optimal solutions in strongly-polynomial time for a multiple outputs instance.

[...]
• Introducing multiple qualities (K > 1) keeps the problem as an LP, but its polynomial complexity increases in line with K as the dominant active set cardinality becomes K+1 (this extension is possible for one pool, one output topologies).
• Relaxing the fixed product demand assumption implies the same solution with product demand reaching its upper limit if the problem is (assumed) feasible.

Subclass I + H−1−J: one pool, multiple outputs
This section extends the analysis in Sect. 3 with Assumption 2.2 to I+H feeds (I inputs, H directs), one pool and J outputs. Again, for simplicity of notation, the single index l is dropped via the notation transformations $T_X \leftarrow \{i :$ [...] This leads to Problem P-4.10, where for each output only connected to directs, surplus variables $y_j$, $\forall j \in \overline{1, J} \setminus T_Y$, are introduced and set to 0 as a surplus condition. Note that eliminating p and $y_j$, $\forall j \in T_Y$, from Problem P-4.10 does not produce a linear program as in Sect. 3, but instead a bilinear problem that can be non-convex.
To analyze Problem P-4.10, Theorem 4.1 first proves its equivalence to Problem P-4.11. The result allows additively decomposing Problem P-4.11 over outputs into sub-problems P-4.11j, which are all p-parametric and equivalent to the subclass I+H-1-1 studied in Sect. 3. Definition 4.2 then extends dominance breakpoints and dominant active sets for a multiple outputs problem setting. Proposition 4.3 shows that the composed master Problem P-4.11 can present p-parametric non-monotonicity or non-convexity on specific p-intervals. This hurdle is cleared by Theorem 4.4, which finds in polynomial time all stationary points on non-monotone breakpoint intervals by solving a univariate rational polynomial. Finally, Corollary 4.5 offers a strongly-polynomial worst-case time complexity for solving Problem P-4.11 to optimality.
The section concludes with Remark 4.6, showing that the I+H-1-J subclass lies on the P/NP boundary because any relaxation of the assumptions made leads to an NP-hard problem.

[...] (P-4.10)

[...] Fig. 7, Problem P-4.10 can be reformulated as Problem P-4.11, which maximizes over p the sum of J p-parametric optimal objectives, each associated with a one output sub-problem P-4.11j (same type as Problem P-3.4).
Proof Each $x_i$, $\forall i \in T_X$, can be split into a sum of flows $x_{i,j}$, $\forall j \in \overline{1, J}$, with each $x_{i,j}$ representing the flow that output j gets via the pool from input i. This allows similar output-based splits for the two constraints in Problem P-4.10 that apply jointly over all outputs, i.e. flow and quality balance: [...] Furthermore, the objective f as a function of p in Problem P-4.10 can be rewritten as [...], where each $f_j$ contains a different set of variables and parameters for a fixed j, except for the common variable p on which it is parametrized. In the context of maximizing $f(p)$, we have: [...] Finally, separating out both the objective and constraints parts for all $j \in \overline{1, J}$ from Problem P-4.10, we obtain a collection of J sub-problems P-4.11j, all parametric on, and therefore linked via, a common p but otherwise independent. However, as shown in Sect. 3.3, each p-parametric sub-problem P-4.11j can be solved analytically using the piecewise structure of $f^*_j(p)$, and each solution can be used directly towards solving the master Problem P-4.10 linking all J sub-problems. Note that, for each sub-problem P-4.11j, $\forall j \in \overline{1, J}$, $y_j$ is eliminated as in Problem P-3.4, with the surplus condition enforced through the x variables set to 0; when $j \notin T_Y$, the sub-problem is thus a directs-only one.

Definition 4.2 (Dominance breakpoints and dominant active sets)
• Let $B_J = \bigcup_{j \in \overline{1, J}} B_j$ be the joint dominance breakpoint set for Problem P-4.11 over all J outputs/sub-problems, with $B_j$ the set of all dominance breakpoints for the j-th sub-problem P-4.11j, found as in Sect. 3.
• Let $\Phi_J$ denote any closed interval with two consecutive elements in $B_J$ as endpoints.
• Let $A^*_j(p)$ denote the dominant active set at p for the j-th sub-problem P-4.11j, as found in Theorem 3.3.8; by construction, [...]
Proof From Theorem 3.3.8, for a given $\Phi_J$, [...] can be either constant, linear or monotone convex/concave depending on the dominant active set $A^*_j(\Phi_J)$. Consider a two-output Problem P-4.11 with [...] and $f^*_{j_1}(p)$ concave increasing and $f^*_{j_2}(p)$ concave decreasing. Then [...], meaning $f^*(p)$ has a local maximum at $p_1$ and is non-monotone. Now consider a four-output Problem P-4.11 with $f^*(p) = f^*_1(p) + f^*_2(p)$, with $f^*_2(p)$ constructed analogously to $f^*_1(p)$ but having a local maximum at $p_2 \in \Phi_J$, $p_2 \neq p_1$. In this case, $f^*(p)$ is multi-modal with at least two local maxima and therefore non-convex.
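The two-output construction in the proof can be reproduced with any concave increasing and concave decreasing pair. The functions below are hypothetical stand-ins chosen for easy arithmetic, not objectives derived from a pooling instance, and the grid search is only a numerical check.

```python
def f1(p):
    """Concave increasing on p > 0 (hypothetical per-output objective)."""
    return -1.0 / p

def f2(p):
    """Concave decreasing on p < 3 (hypothetical per-output objective)."""
    return -1.0 / (3.0 - p)

def f_sum(p):
    """Additive master objective over the two outputs."""
    return f1(p) + f2(p)

def grid_argmax(f, lo, hi, n=2001):
    """Maximizer of f over an evenly spaced grid on [lo, hi]."""
    points = [lo + k * (hi - lo) / (n - 1) for k in range(n)]
    return max(points, key=f)

p_star = grid_argmax(f_sum, 0.5, 2.5)
```

Each summand is monotone on the interval, yet the sum peaks in the interior (at p = 1.5 by symmetry), so checking interval endpoints alone would miss the maximizer.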

Theorem 4.4
Assuming Problem P-4.11 has all parameters rational, given interval $\Phi_J$, then finding $f^*|_{\Phi_J} = \max_{p \in \Phi_J} f^*(p)$ requires finding all stationary points of $f^*(p)$ over $\Phi_J$ by solving a univariate rational polynomial of maximum degree $2 \cdot |T_Y|$, i.e.: [...] where C is a constant and $P_j(\cdot, \cdot)$ is defined as in Eq. (25) but for bounds $P^L_j$, $P^U_j$. The polynomial in Eq. (34) can be solved in strongly-polynomial time with respect to $|T_Y|$, $|T_Y| \le J$.
Proof Section 3 implies that for Problem P-4.11, [...] The value $f^*|_{\Phi_J}$ corresponds to the maximum objective function value evaluated at all stationary points. To find all roots/stationary points of $f^*(p)$, $\forall p \in \Phi_J$, the common denominator can be multiplied out in Eq. (34) to form a polynomial. The resulting polynomial from Eq. [...] $(q_2, j) \in T_Z$. Furthermore, by assumption, the polynomial in Eq. (34) has rational coefficients. Consequently, [...] which, as a rational univariate polynomial, can be solved deterministically in strongly-polynomial time, using for example the algorithm in [38] with worst-case time bound T(UnivPoly), [...]
Proof In the worst case, on all intervals between two joint dominance breakpoints in $B_J$, the additive decomposition over outputs in Problem P-4.11 requires solving a univariate rational polynomial as in Theorem 4.4. Denoting by T(·) the time-complexity of calculating result (·), this implies [...] since on each interval between two input dominance breakpoints there can be mixed dominance breakpoints between a pair of mixed sets, and therefore between a pair of directs (see Proposition 3.3.5). Finally, combining Eqs. (39) and (40) results in Eq. (38).

[...]
• Reinstating constraints on feed availability/pool capacity for any particular $j \in T_Y$ implies the mixed active set term in Eq. (34) gets split into a hierarchy of potential mixed active sets (see Remark 3.3.10), each with total flow an unknown proportion of $D_j$. This hierarchy of sets leads to a bivariate rational polynomial which is NP-hard to solve [25].
• Introducing multiple qualities (K > 1) implies variables $p_k$, $k \in \overline{1, K}$, are not independent: a specific pool concentration in one quality restricts the concentration range in another. Consequently, Eq. (34) becomes an NP-hard multivariate polynomial system.
• Relaxing the fixed product demand assumption: if any of $D_j$, $\forall j \in T_Y$, are not fixed but unknown, then Eq. (34) becomes an NP-hard bivariate polynomial system.
• Extending to the full topology I+H−L−J, again Eq. (34) translates to a coupled multivariate polynomial system when two pools send non-zero flow to the same output (as in Sect. 5, Theorem 5.2) and one of the two pools also sends non-zero flow to a different output.
In summary, when feed to output connections (directs) are considered, the I+H−1−J class following Assumption 2.2 lies on the P/NP boundary, and can be solved analytically as shown in this section despite $f^*(p)$ being piecewise non-monotone or non-convex.

[...] [33] is strongly-polynomially solvable) The Haverly [33] instances, i.e. the first set of three pooling problems in the literature, are part of the single-quality I+H-1-J class following Assumption 2.2. We can obtain their exact solutions analytically in strongly-polynomial time!

[...] [60] report a similar, piecewise-monotone structure in a central processor queueing problem where workstation files are assigned to file servers. The Woodside and Tripathi [60] proofs cannot be directly applied to standard pooling, but the similarity recalls the deep connection between pooling and queueing.
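The step of multiplying out the common denominator can be sketched on an assumed model of the interval objective: a linear part with slope C plus one hyperbolic term $k_j / (p - a_j)$ per output, so that the stationarity condition is $C - \sum_j k_j / (p - a_j)^2 = 0$. The coefficients $k_j$, $a_j$, the function names and the bisection root finder are hypothetical; the exact rational-root algorithm of [38] is not reproduced, and sign-change bisection can miss roots of even multiplicity.

```python
def poly_mul(a, b):
    """Product of two polynomials given as coefficient lists, lowest degree first."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def poly_add(a, b):
    n = max(len(a), len(b))
    return [(a[i] if i < len(a) else 0.0) + (b[i] if i < len(b) else 0.0) for i in range(n)]

def poly_eval(coeffs, p):
    acc = 0.0
    for c in reversed(coeffs):  # Horner's rule
        acc = acc * p + c
    return acc

def stationarity_polynomial(slope, terms):
    """slope * prod_k (p - a_k)^2 - sum_j k_j * prod_{k != j} (p - a_k)^2; terms = [(k_j, a_j), ...]."""
    squares = [poly_mul([-a, 1.0], [-a, 1.0]) for _, a in terms]

    def product(skip=None):
        out = [1.0]
        for idx, sq in enumerate(squares):
            if idx != skip:
                out = poly_mul(out, sq)
        return out

    poly = [slope * c for c in product()]
    for j, (k_j, _) in enumerate(terms):
        poly = poly_add(poly, [-k_j * c for c in product(skip=j)])
    return poly

def roots_in_interval(coeffs, lo, hi, samples=2000, tol=1e-12):
    """Roots located by sign changes on a grid, refined by bisection (numerical stand-in)."""
    roots = []
    step = (hi - lo) / samples
    left, f_left = lo, poly_eval(coeffs, lo)
    for s in range(1, samples + 1):
        right = lo + s * step
        f_right = poly_eval(coeffs, right)
        if f_left == 0.0:
            roots.append(left)
        elif f_left * f_right < 0:
            a, b, fa = left, right, f_left
            while b - a > tol:
                mid = 0.5 * (a + b)
                fm = poly_eval(coeffs, mid)
                if fa * fm <= 0:
                    b = mid
                else:
                    a, fa = mid, fm
            roots.append(0.5 * (a + b))
        left, f_left = right, f_right
    if f_left == 0.0:
        roots.append(left)
    return roots

# Hypothetical two-output instance: f(p) = -1/p + 1/(p - 3), no linear part.
poly = stationarity_polynomial(0.0, [(-1.0, 0.0), (1.0, 3.0)])
stationary = roots_in_interval(poly, 0.5, 2.5)
```

With n hyperbolic terms and a non-zero slope the polynomial has degree 2n, in line with the degree bound in Theorem 4.4; candidate maximizers on the interval are then the stationary points together with the interval endpoints.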

Subclass I+H−L−1: multiple pools, one output
This section extends the analysis in Sect. 3 with Assumption 2.2 to I+H feeds (I inputs, H directs), L pools and one output. Again, for simplicity of notation, the single index j is dropped via the notation transformations $T_Z \leftarrow \{i : (i, j) \in T_Z\}$, $T_Y \leftarrow \{l : (l, j) \in T_Y\}$. This leads to Problem P-5.12, where eliminating variables $p_l$, $y_l$ $\forall l \in T_Y$ results in an LP as for Problem P-3.4 in Sect. 3, limited to a maximum cardinality of four in terms of the x, z variables. We further identify this solution analytically, understanding pool sparsity and the parametric structure of the optimal objective in the process. To analyze Problem P-5.12, Definition 5.1 first introduces active pools. Theorem 5.2 then finds that at most two active pools contribute to the optimal solution and further shows that all induced cases are parametric on pool concentrations. Furthermore, Theorem 5.3 proves that all cases involved in Theorem 5.2 in fact reduce to the I+H−1−1 subclass studied in Sect. 3. Finally, Corollary 5.4 offers a strongly-polynomial bound on solving Problem P-5.12 and the section concludes with an illustrative numerical example.

Definition 5.1 (Active pools)
• Let the $L = |T_Y|$ pools in Problem P-5.12 be denoted by $l$ with concentration $p_l$, $\forall l \in \{1,\dots,L\}$.
• An active pool has strictly non-zero incoming and outgoing flows. A non-active pool $l \in T_Y$ has well-defined concentration by assuming only $y_l = 0$ but $x_{i,l} \neq 0$, $\forall i : (i, l) \in T_X$. However, any non-active pool is disconnected via $y_l = 0$ from the output, does not influence objective function $f$, and can be removed along with any of its flow connections from Problem P-5.12.
(P-5.12)

Theorem 5.2 (Pool sparsity and pool-parametric objective function)
For fixed $p_l$ $\forall l \in T_Y$, at optimality, Problem P-5.12 has a maximum of two active pools, with the optimal objective function parametric on their concentrations as given by the two cases of Eq. (41).
Proof Variables $p_l$ and $y_l$, $\forall l \in T_Y$, can be substituted out from Problem P-5.12 using the penultimate (quality balances) and second (flow balances) constraint types, respectively. Hence, Problem P-5.12 becomes parametric on $p_l$ $\forall l \in T_Y$ with the optimal objective of Eq. (42). For fixed pool concentrations $p_l$ $\forall l \in T_Y$, all pools have fixed optimal cost $\gamma_l$ (obtained as $\gamma_l = \gamma_{ij}(p_l)$ with $A_I(p_l) = ij$ for the I−1−1 subclass according to Sect. 3.1) and thus behave like additional directs sending flow directly to the output. Consequently, Theorem 3.2.7 for the directs-only subclass from Sect. 3.2 applies, implying that at most two nodes among pools and directs are active. The case with two pools active (see Fig. 9) is treated separately in Eq. (41) as a two-pool parametric restriction of Eq. (42) where no directs are active (therefore the associated flow variables $\{z_i\}$ can be eliminated). The second case in Eq. (41) aggregates the cases with at most one pool active, and corresponds directly to the class of Problems P-3.4 solved in Sect. 3.3.
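The sparsity argument in the proof can be sketched as a small enumeration: once pool concentrations are fixed, each pool is handled like a direct with a known concentration and unit cost, and only single nodes or pairs of nodes need to be inspected. The sketch assumes the objective amounts to minimising total cost at fixed demand (a sign convention chosen here for illustration), and the names `best_sparse_blend`, `p_low` and `p_up` are hypothetical.

```python
from itertools import combinations


def best_sparse_blend(nodes, demand, p_low, p_up):
    """Cheapest blend meeting p_low <= output concentration <= p_up.

    nodes: list of (concentration, unit_cost); pools at fixed concentration
    and directs are treated alike (assumption made for this sketch).
    Returns (total_cost, {node_index: flow}) using at most two active nodes.
    """
    best = (float("inf"), {})
    # One active node: feasible only if its own concentration is within bounds.
    for k, (c, g) in enumerate(nodes):
        if p_low <= c <= p_up:
            best = min(best, (g * demand, {k: demand}), key=lambda s: s[0])
    # Two active nodes: the blend sits on one of the two quality bounds.
    for (i, (ci, gi)), (j, (cj, gj)) in combinations(enumerate(nodes), 2):
        if ci == cj:
            continue  # equal concentrations cannot reach a new target by blending
        for target in (p_low, p_up):
            zi = demand * (target - cj) / (ci - cj)
            zj = demand - zi
            if zi > 0 and zj > 0:  # both flows must be strictly positive
                cost = gi * zi + gj * zj
                best = min(best, (cost, {i: zi, j: zj}), key=lambda s: s[0])
    return best


# Hypothetical data: two pools at fixed concentrations 3.0 and 1.0, one direct at 2.0.
example = best_sparse_blend([(3.0, 6.0), (1.0, 16.0), (2.0, 10.0)], 100.0, 1.0, 1.5)
```

For the hypothetical data above, the cheapest blend uses 50 units each of the second and third node at a total cost of 1300, so only two of the three nodes are active. The enumeration is quadratic in the number of nodes, which is consistent with the sparse structure but makes no claim about the bound in Corollary 5.4.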

Theorem 5.3 (Solution for two active pools at an input dominance breakpoint)
Proof Suppose concentration $p_n$ is fixed, which implies pool n acts as an additional direct with fixed concentration and (optimal) cost. Since $p_m$, $p_n$ are independent, parametric optimal objective f*( …
Remark 5.5 (Sparsity results extend to a multi-layered network) Theorem 5.2 extends the sparsity results from the input layer to the pool layer. These sparsity results would also hold for networks with more layers.
Remark 5.6 (From analytical solutions/sparse piecewise structure to non-sparse LP/NP-hardness) Section 5 finds analytically the optimal solution for an I+H−L−1 pooling topology with Assumption 2.2. Since the I+H−L−1 subclass is an LP extension of the I+H−1−1 instance, relaxing any constraint assumption, as described in Remark 3.3.10, leads to intractable analytical solutions and vanishing sparse structure. As explained in Remark 4.6, expanding to the full topology I+H−L−J results in NP-hardness.
… also outputs (direct bypass flows) and certain flow assumptions. Patterns and hierarchies of dominating topologies are used to find active network structure. The sparsity identified in the active network structure at optimality is then linked to a pool-parametric piecewise structure of the objective function. The result reveals analytically Professor Floudas' intuition of piecewise structure in pooling problem instances.
The parametric objective function is then shown to be piecewise-monotone for instances with one output, allowing exact global solutions in strongly-polynomial time as alternatives to black-box linear programming. The insights are further used for non-linear instances with multiple outputs and one pool to overcome piecewise non-monotonicity via stationary points found in strongly-polynomial time. This result introduces a new reference point on the P/NP boundary for standard pooling subclasses, as any relaxation of assumptions or extension to the full topology (multiple pools and outputs) is shown to reach NP-hardness. The multiple-outputs subclass with its assumptions includes the Haverly [33] pooling problems, showing for the first time that they have exact, analytical solutions.
The position on the P/NP boundary of the multiple outputs and one pool subclass is thus an ideal starting point for approximation algorithms that cross into NP-hardness. Moreover, this paper developed intuition around sparse solutions and the conditions under which sparsity vanishes. This encourages future research in building disjunctive cuts based on the structures identified to partition feasible space in the non-sparse NP-hard subclasses, an approach taken by the state-of-the-art heuristic developed in [21].
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

A Omitted proofs
Proof of Proposition 3.2.4 Lemma 3.2.3 implies w.l.o.g. that $C_i > P^U$ (the other case $C_i < P^L$ is analogous). Then $C_j < P^U$, otherwise $(C_i z_i + C_j z_j)/(z_i + z_j) > P^U$, implying $\{i, j\}$ is infeasible. Two cases now arise:
1. $C_j > P^L$: Then $\gamma_i < \gamma_j$, otherwise $j \succ \{i, j\}$. Therefore increasing the cheaper flow $z_i$ maximizes $f$, but since $C_i > C_j$, doing so also increases the concentration $(C_i z_i + C_j z_j)/(z_i + z_j)$, which should not exceed $P^U$ for feasibility. Consequently, $(C_i z_i + C_j z_j)/(z_i + z_j) = P^U$.
2. $C_j < P^L$: If $\gamma_i < \gamma_j$ then, as in the previous case, $(C_i z_i + C_j z_j)/(z_i + z_j) = P^U$. Otherwise, if $\gamma_i > \gamma_j$, decreasing the output concentration down to the bound $P^L$, and therefore increasing the weight of the lower cost $\gamma_j$, maximizes $f$, so $(C_i z_i + C_j z_j)/(z_i + z_j) = P^L$.
Analogously when $C_i < P^L$, if $P^L < C_j < P^U$ then $(C_i z_i + C_j z_j)/(z_i + z_j) = P^L$, or else if $C_j > P^U$ then $(C_i z_i + C_j z_j)/(z_i + z_j) = P^L$ or $P^U$ (depending on whether $\gamma_i < \gamma_j$ or $\gamma_i > \gamma_j$, respectively). Aggregating all the cases analyzed, $(C_i z_i + C_j z_j)/(z_i + z_j) = P^L$ if $(C_i - C_j)(\gamma_i - \gamma_j) > 0$ and $(C_i z_i + C_j z_j)/(z_i + z_j) = P^U$ if $(C_i - C_j)(\gamma_i - \gamma_j) < 0$, corresponding to $P(i, j)$ defined in Eq. (12).

… $x_i \gamma_i - x_j \gamma_j - z_q \gamma_q$. Therefore, $ij \succ_p \{ij, q\} \Leftrightarrow x_i \gamma_i + x_j \gamma_j \ge x_i \gamma_i + x_j \gamma_j + z_q \gamma_q$, where the flows on the left are those of active set $ij$ from Eq. (7) and the flows on the right are those of active set $\{ij, q\}$ from Eq. (16). Replacing all flows with these solutions rewrites the condition in terms of $p$. Since constant demand $D > 0$ and output concentration $P(ij, q)$ is a linear combination of $p$, $C_q$, then
\[
D\,\frac{p - P(ij, q)}{p - C_q} \ge 0,
\]
and the dominance condition becomes an inequality in $p$, with the breakpoint achieved by solving for $p$ at equality.

… $\Leftrightarrow \gamma_i x_i + \gamma_j x_j + \gamma_q z_q = \gamma_i x_i + \gamma_j x_j + \gamma_r z_r$, where the flows on each side are those of the respective mixed active set. After replacing the flow variables on both sides from Eq. (16), multiplying out the common denominator of all terms, grouping the first two terms from each side and factoring out $p$ results in:
\[
\begin{aligned}
& p^2 (\gamma_i - \gamma_j)\big[(P(ij, q) - C_q) - (P(ij, r) - C_r)\big] \\
&+ p\Big[\big(P(ij, r) C_q - P(ij, q) C_r\big)(\gamma_i - \gamma_j) - (\gamma_i C_j - \gamma_j C_i)\big[(P(ij, q) - C_q) - (P(ij, r) - C_r)\big]\Big] \\
&- (\gamma_i C_j - \gamma_j C_i)\big(P(ij, r) C_q - P(ij, q) C_r\big) \\
&+ p^2 (C_i - C_j)(\gamma_q - \gamma_r) + p\,(C_i - C_j)\big[\gamma_r C_q + \gamma_r P(ij, r) - \gamma_q C_r - \gamma_q P(ij, q)\big] \\
&+ (C_i - C_j)\big[\gamma_q C_r P(ij, q) - \gamma_r C_q P(ij, r)\big] = 0.
\end{aligned}
\]
Assuming $P(ij, q)$ and $P(ij, r)$ are independent of $p$ results in a quadratic equation in terms of $p$,
\[
\Gamma\big(p, P(ij, q), P(ij, r)\big) = p^2 (a_1 b_1 + a_3 c_1) + p\,(a_1 b_2 - a_2 b_1 + a_3 c_2) + (a_3 c_3 - a_2 b_2) = 0,
\]
with the following notation, where $a_1$, $a_2$, $b_1$ and $c_1$ are read off by matching terms with the expanded polynomial above:
\[
\begin{aligned}
a_1 &= \gamma_i - \gamma_j, & b_1 &= (P(ij, q) - C_q) - (P(ij, r) - C_r), & c_1 &= \gamma_q - \gamma_r, \\
a_2 &= \gamma_i C_j - \gamma_j C_i, & b_2 &= P(ij, r) C_q - P(ij, q) C_r, & c_2 &= \gamma_r (C_q + P(ij, r)) - \gamma_q (C_r + P(ij, q)), \\
a_3 &= C_i - C_j, & & & c_3 &= \gamma_q C_r P(ij, q) - \gamma_r C_q P(ij, r).
\end{aligned}
\]
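As a numerical sanity check of the quadratic above, the following sketch builds its three coefficients and solves for the candidate breakpoints in $p$. The coefficients $a_1 = \gamma_i - \gamma_j$, $a_2 = \gamma_i C_j - \gamma_j C_i$, $b_1 = (P(ij,q) - C_q) - (P(ij,r) - C_r)$ and $c_1 = \gamma_q - \gamma_r$ are inferred here by matching terms with the expanded polynomial; with them, $\Gamma$ factors as $(a_1 p - a_2)(b_1 p + b_2) + a_3 (c_1 p^2 + c_2 p + c_3)$. The helper `mixed_cost` uses assumed blending formulas for the pool and direct flows (standing in for Eqs. (7) and (16), which are not reproduced here), and all function names and data are hypothetical.

```python
import math


def gamma_coefficients(Ci, Cj, gi, gj, Cq, gq, Cr, gr, Pq, Pr):
    """Coefficients (A, B, C) of the quadratic A*p^2 + B*p + C = 0.

    Pq, Pr stand for P(ij, q) and P(ij, r), assumed independent of p.
    """
    a1, a2, a3 = gi - gj, gi * Cj - gj * Ci, Ci - Cj
    b1, b2 = (Pq - Cq) - (Pr - Cr), Pr * Cq - Pq * Cr
    c1 = gq - gr
    c2 = gr * (Cq + Pr) - gq * (Cr + Pq)
    c3 = gq * Cr * Pq - gr * Cq * Pr
    return a1 * b1 + a3 * c1, a1 * b2 - a2 * b1 + a3 * c2, a3 * c3 - a2 * b2


def real_roots(A, B, C, tol=1e-12):
    """Real roots of A*p^2 + B*p + C = 0, including the degenerate linear case."""
    if abs(A) < tol:
        return [] if abs(B) < tol else [-C / B]
    disc = B * B - 4.0 * A * C
    if disc < 0:
        return []
    s = math.sqrt(disc)
    return sorted({(-B - s) / (2.0 * A), (-B + s) / (2.0 * A)})


def mixed_cost(p, Ci, Cj, gi, gj, Cd, gd, P):
    """Unit cost of blending a pool at concentration p with direct d to reach P.

    Assumed formulas: the pool cost interpolates linearly between inputs i and j,
    and the direct share of the blend is (p - P) / (p - Cd).
    """
    g_pool = (gi * (p - Cj) + gj * (Ci - p)) / (Ci - Cj)
    return (g_pool * (P - Cd) + gd * (p - P)) / (p - Cd)


# Hypothetical data: inputs i, j feed the pool; q and r are competing directs.
data = dict(Ci=3.0, Cj=1.0, gi=6.0, gj=16.0, Cq=2.0, gq=10.0, Cr=0.5, gr=20.0)
roots = real_roots(*gamma_coefficients(Pq=1.5, Pr=1.5, **data))
```

For the hypothetical data the two roots are $p = 1.4$ and $p = 1.5$, and at both the cost of blending with direct $q$ equals the cost of blending with direct $r$. A root is only a candidate breakpoint: the corresponding flows still have to be checked for non-negativity before the root is accepted.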

Proof of Proposition 3.3.7
The first constraint of Problem P-3.4 implies $z_q = D - x_i - x_j$, which in turn implies … By differentiating to higher orders, and due to
\[
\frac{\partial^2 f_A}{\partial p^2} = \frac{-2}{p - C_q}\,\frac{\partial f_A}{\partial p},
\]
the desired assertions are obtained.
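One way to read the second-derivative identity is through a simple functional form. The worked lines below assume, purely for illustration, that $f_A$ can be written as a constant plus a term inversely proportional to $p - C_q$, with hypothetical constants $\alpha$ and $\beta$ that do not depend on $p$; this form satisfies the stated identity but is not given explicitly in the surviving text of the proof.

```latex
\begin{aligned}
f_A(p) &= \alpha + \frac{\beta}{p - C_q}, \\
\frac{\partial f_A}{\partial p} &= -\frac{\beta}{(p - C_q)^2}, \\
\frac{\partial^2 f_A}{\partial p^2} &= \frac{2\beta}{(p - C_q)^3}
  = \frac{-2}{p - C_q}\,\frac{\partial f_A}{\partial p}.
\end{aligned}
```

Under this assumption the first derivative keeps the sign of $-\beta$ on either side of $p = C_q$, so $f_A$ is monotone on each side, and the sign of the second derivative follows from the sign of the first together with the sign of $p - C_q$.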