1 Introduction

Discrete–continuous optimization is one of the main modeling approaches to address design, planning, and scheduling problems in process systems engineering (PSE) (Grossmann 2012). Raman and Grossmann (1994) present a powerful modeling paradigm that extends the work by Balas (1985) on disjunctive programming. This new paradigm, called generalized disjunctive programming (GDP), has been further developed by others in the PSE community over the years to account for additional features, such as nonlinearities and nonconvexities encountered in the problems (Grossmann and Trespalacios 2013). GDP relies on the intersection of disjunctions of algebraic constraints (equality and inequality constraints with continuous variables) to model the feasible space. Boolean variables are used as indicator variables for each disjunct (set of algebraic constraints), enforcing the constraints in the disjunct when True. Logic constraints are also included to describe the relationships between the Boolean indicator variables via propositional logic.

GDP is a valuable modeling abstraction for optimization problems for two main reasons. Firstly, modeling systems from the basis of their underlying logical relationships aids the development and formulation of optimization models by making them easier to interpret, reducing the likelihood of modeling errors due to logical fallacies. Secondly, GDP makes available a broad array of solution methods, ranging from mixed-integer reformulations to logic-based search methods (Chen et al. 2022).

The present work extends the GDP theory to allow modeling hierarchical systems, which are commonly encountered in PSE, and more particularly in enterprise-wide optimization (EWO) (Grossmann 2012; van den Heever and Grossmann 1999), and flowsheet superstructure optimization (Türkay and Grossmann 1996a). Hierarchical systems involve multiple levels of decision making, which can be concisely modelled via nested disjunctions. However, traditional GDP does not consider such formulations. Existing GDP literature suggests reformulating nested disjunctions into equivalent single-level disjunctions (Vecchietti and Grossmann 2000). Such an approach requires introducing additional Boolean variables and logical propositions. Industrial examples of this approach in scheduling include that of Castro et al. (2014) and Castro (2017). An alternate approach is used in the work by van den Heever and Grossmann (1999), in which a direct or inside-out reformulation to MI(N)LP is performed. We formalize these two approaches and provide theoretical proofs on the tightness of their continuous relaxations. The model tightness and computational performance of the different approaches are compared. A series of examples are used to show the modeling and computational advantages obtained by explicitly modeling nested disjunctions.

The paper is organized as follows. Section 2 provides a background on the GDP modeling paradigm. Section 3 extends this formulation to account for hierarchical systems, and discusses the alternatives for modeling such systems. The equivalent mixed-integer programming reformulations for these alternatives are presented, along with two theorems on the tightness of the resulting models. Section 4 provides several numerical use cases for hierarchical GDPs. Section 5 presents concluding remarks.

2 Background: generalized disjunctive programming (GDP)

The classical GDP formulation is given below (GDP), where \(x\) is the set of continuous variables (bounded between \({x}^{LB}\) and \({x}^{UB}\)), \(f(x)\) is the objective function, \(r\left(x\right)\le 0\) is the set of global constraints, and \({g}_{ij}\left(x\right)\le 0\) is the set of constraints applied when the indicator Boolean \({Y}_{ij}\) is True for disjunct \(j\) in disjunction \(i\). \(f(x)\), \(r(x)\), and \({g}_{ij}(x)\) are assumed to be continuous and differentiable over \(x\). \(\Omega (Y)\) defines the set of logic constraints, which are described via propositional logic on a subset of Boolean variables. These constraints describe the relations between the Boolean variables via clauses that contain one or more of the following logic operators: AND (\(\wedge \)), OR (\(\vee \)), implication (\(\Rightarrow \)), equivalence (\(\iff \)), and negation (\(\neg \)). The set of logic constraints may also include cardinality clauses of the form choose exactly (or at least or at most) \(m\) Boolean variables from a subset of Booleans to be True (Yan and Hooker 1999). We leverage predicate logic to extend the notation used by Yan and Hooker for cardinality clauses by defining the following predicates: \({\varvec{\Xi}}\left(m,{Y}_{s} \quad \,\forall\, s\in S\right)\) enforces that exactly \(m\) of the Boolean variables \({Y}_{s}\) are True, \({\varvec{\Lambda}}\left(m, {Y}_{s} \quad \,\forall\, s\in S\right)\) enforces that at least \(m\) of the variables are True, and \({\varvec{\Gamma}}\left(m, {Y}_{s} \quad \,\forall\, s\in S\right)\) enforces that at most \(m\) are True.

$$ \begin{gathered} \min z = f\left( x \right) \hfill \\ s.t.\;r\left( x \right) \le 0 \hfill \\ \begin{array}{*{20}c} { \vee _{{j \in J_{i} }} \left[ {\begin{array}{*{20}c} {Y_{{ij}} } \\ {g_{{ij}} \left( x \right) \le 0~} \\ \end{array} } \right]} & {\forall i \in I} \\ {\Xi \left( {1,Y_{{ij}} ~\forall j \in J_{i} } \right)} & {\forall i \in I} \\ {\Omega \left( Y \right)} & {} \\ {x^{{LB}} \le x \le x^{{UB}} } & {} \\ {x \in \mathbb{R}^{n} } & {} \\ {Y_{{ij}} \in \left\{ {True,False} \right\}} & {\forall i \in I,j \in J_{i} } \\ \end{array} \hfill \\ \end{gathered} $$
(GDP)

GDP models typically include a cardinality clause to enforce that exactly 1 disjunct in each disjunction is selected, i.e., \({\varvec{\Xi}}\left(1,{Y}_{ij} \quad \,\forall\, j\in {J}_{i}\right) \quad \,\forall\, i\in I\). The GDP literature often uses the exclusive OR (XOR) operator, \(\underset{\_}{\vee }\), to define this constraint. However, such an operator is only correct for proper disjunctions (those with non-overlapping disjuncts) and poses issues in GDP when there are overlapping disjuncts (improper disjuncts). This is because XOR is an n-ary operator that returns True when an odd number of propositions in the operator are True. This can create problems when transforming the GDP into a MIP via the Hull reformulation because an odd number of disaggregated variables will be active (non-zero) for any feasible point at the intersection of an odd number of disjuncts. As a result, the projection of the disaggregated variables onto the original space will result in a value that is an odd integer multiple of the disaggregated variable values, which is incorrect and may exclude valid solutions by making them infeasible (see “Appendix A”). Thus, to avoid these issues, we use the predicate logic notation, \({\varvec{\Xi}}(1,Y)\), here instead.
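The distinction between the n-ary XOR operator and the cardinality predicate can be checked by enumeration. The following is a minimal sketch, assuming three overlapping disjuncts; the function names (`exactly`, `at_least`, `at_most`, `nary_xor`) are illustrative stand-ins for \({\varvec{\Xi}}\), \({\varvec{\Lambda}}\), \({\varvec{\Gamma}}\), and \(\underset{\_}{\vee }\), and are not part of any GDP library.

```python
from itertools import product


def exactly(m, literals):
    """Xi(m, Y): exactly m of the Boolean literals are True."""
    return sum(literals) == m


def at_least(m, literals):
    """Lambda(m, Y): at least m of the Boolean literals are True."""
    return sum(literals) >= m


def at_most(m, literals):
    """Gamma(m, Y): at most m of the Boolean literals are True."""
    return sum(literals) <= m


def nary_xor(literals):
    """n-ary XOR: True when an odd number of literals are True."""
    return sum(literals) % 2 == 1


# Enumerate all assignments of three indicator variables and keep those
# on which XOR and Xi(1, .) disagree.
disagreements = [
    Y for Y in product([False, True], repeat=3)
    if nary_xor(Y) != exactly(1, Y)
]
print(disagreements)
```

The only disagreement is the assignment in which all three indicators are True: XOR accepts it (odd count), whereas \({\varvec{\Xi}}(1,Y)\) rejects it.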

To illustrate the elements of a GDP model, consider the model below (GDP-example). The projection of this model on the \({x}_{1},{x}_{2}\)-plane is given in Fig. 1, where the quadratic objective function is shown in the colored contours, the global constraints are given by the region under the black curves (one linear and the other nonlinear), and the disjunction constraint is given by the three colored rectangles. The feasible space of such a system is given by the disjoint regions in the orange, blue, and green rectangles that satisfy the global constraints.

Fig. 1 Sample GDP graphical representation for GDP-example model.

One of the main advantages of modeling discrete–continuous problems using GDP is the collection of methods that are available for optimizing such systems. These include (1) reformulating to mixed-integer (non)linear models (MI(N)LP) via either Big-M (Trespalacios and Grossmann 2015) or Hull reformulations (Agarwal 2015; Bernal and Grossmann 2021; Furman et al. 2020; Grossmann and Lee 2003), (2) logic-based decomposition methods such as Logic-based Outer Approximation (LOA) (Türkay and Grossmann 1996b), (3) disjunctive branch-and-bound (Lee and Grossmann 2000), (4) basic steps (Ruiz and Grossmann 2012), and (5) hybrid cutting planes (Sawaya and Grossmann 2005; Trespalacios and Grossmann 2016). The reader is referred to the above references for a detailed understanding of each of these solution methods.

3 Extended formulation for multi-level hierarchies

Decision hierarchies are present in most decision-making applications. These include for instance supply chain and enterprise-wide optimization, where different levels of decision-making exist depending on the time scales considered: planning (months/years), scheduling (hours/days), and control (seconds/minutes). According to Brunaud and Grossmann (2017), integrating different decision levels enables better coordination and communication between functional areas, which increases agility in response to disturbances and makes it possible to attain benefits for the company that are not possible with a siloed approach. Figure 2 illustrates the notion of the synergistic benefits that can be obtained by an integrated approach, rather than siloed or aggregated approaches. Accounting for the relationships between different levels of decision-making can aid in finding the true optimum, which differs from that of the aggregated model (i.e., the model obtained by summing the siloed costs). Integrated approaches to hierarchical decision-making systems have been addressed in the literature. Some examples of these integrations are the integration between design and planning (operational and expansion) (van den Heever and Grossmann 1999), planning and scheduling (Maravelias and Sung 2009), and scheduling and control (Muñoz et al. 2011; Sokoler et al. 2017). The following subsections formalize how GDP can be used to model hierarchical systems, along with theoretical proofs on the differences between the approaches.

Fig. 2 Illustration of the different optima for siloed, aggregated, and integrated approaches.

3.1 Hierarchical GDP

We propose extending the GDP paradigm to include multi-level decisions by means of nested disjunctions. Although the notion of nesting disjunctions to represent hierarchical decisions is not new, the limitations in the traditional GDP notation have made it difficult to exploit the benefits of using such structures. One of the first references to nested disjunctions is found in the work by Vecchietti and Grossmann (2000), which describes the transformations required to conform to the current GDP notation. It is interesting to note that several works have relied on the nested GDP representation due to its compact representation. In one of these (Rodriguez and Vecchietti 2009), the following statement is made,

Although the expressiveness of the hierarchical decisions by means of nested disjunctions, they cannot be implemented directly. These disjunctions must be transformed into GDP form. For that purpose, the disjunctions…must be rewritten as single disjunctions, and some additional constraints must also be included in the model.


Therefore, from a model development point of view, the use of disjunction nesting is shown to add value. However, its implementation has often required breaking the explicit hierarchical structure. An exception is the work by van den Heever and Grossmann (1999), which does not transform the nested GDP into a logically equivalent single-level GDP, but rather suggests performing the Hull reformulation on the inner disjunction and then reformulating the outer disjunction. We now build upon this concept to formally extend the GDP notation for hierarchical systems that generalizes to multi-disjunct disjunctions, rather than the on/off disjunctions used by van den Heever and Grossmann (1999). We also provide theoretical proofs on the advantages of modeling system hierarchies via nested disjunctions, and highlight the computational performance gains obtained using this explicit notation.

The proposed extension to the classical GDP notation for hierarchical systems is given below for a 2-Level nested GDP (2L-GDP), where the upper-level decisions, \(Y\), enforce the constraints \(g\left(x\right)\le 0\) and the nested decisions, \(W\), enforce the constraints \(h\left(x\right)\le 0\). Here the cardinality clause of selecting exactly one disjunct from the upper-level decisions, \(Y\), is expressed explicitly, along with a new set of cardinality rules that enforce selecting exactly one of the lower-level decisions, \(W\), if and only if the upper-level decision has been selected, and selecting no lower-level decisions when the upper-level decision is not selected. This constraint is expressed as the conjunction of two cardinality rules: \(\left[{Y}_{ij}\Rightarrow{\varvec{\Xi}}\left(1,{W}_{ijkl} \quad \,\forall\, l\in {L}_{ijk}\right)\right]\wedge \left[{\neg Y}_{ij}\Rightarrow{\varvec{\Xi}}\left(0,{W}_{ijkl} \quad \,\forall\, l\in {L}_{ijk}\right)\right] \quad \,\forall\, i\in I,j\in {J}_{i},k\in {K}_{ij}\). In the GDP literature, this constraint has been traditionally written as \({Y}_{ij}\iff \underline{\vee}_{l\in {L}_{ijk}}{W}_{ijkl} \quad \,\forall\, i\in I,j\in {J}_{i},k\in {K}_{ij}\). However, such a logic proposition is incomplete because it would allow the following to occur: \({Y}_{ij}=False\) and \({W}_{ijkl}=True\) for more than 1 index \(l\in {L}_{ijk}\) (i.e., False \(\iff \) (True \(\underline{\vee }\) True) is valid because the exclusive OR makes the right-hand side False). If all disjunctions are proper, then this will not occur. However, since there can be a disjunction with overlapping disjuncts, the cardinality rule \({\varvec{\Gamma}}\left(1,{W}_{ijkl} \quad \,\forall\, l\in {L}_{ijk}\right) \quad \,\forall\, i\in I,j\in {J}_{i}, k\in {K}_{ij}\) would need to be added to such a system to ensure that no more than 1 literal, \({W}_{ijkl}\), is set to True.
A more compact form would be to use the predicate constraint, \({\varvec{\Xi}}\left({1}_{\left\{True\right\}}\left({Y}_{ij}\right),{W}_{ijkl} \quad \,\forall\, l\in {L}_{ijk}\right)\), where \({1}_{\left\{True\right\}}\left(\cdot \right)\) is the indicator function that returns 1 when the input is True and 0 otherwise. In other words, the indicator function maps a Boolean variable to its binary counterpart. For simplicity, we make a slight abuse of notation by dropping the indicator function and using the expression \({\varvec{\Xi}}\left({Y}_{ij},{W}_{ijkl} \quad \,\forall\, l\in {L}_{ijk}\right)\) instead.
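The gap between the traditional linking proposition and the cardinality rule \({\varvec{\Xi}}\left({Y}_{ij},{W}_{ijkl}\right)\) can be made concrete by enumerating truth assignments. The sketch below assumes a single upper-level Boolean with three nested Booleans (`n_inner = 3` is an illustrative choice), and the helper names are hypothetical.

```python
from itertools import product


def exactly(m, literals):
    """Xi(m, W): exactly m of the Boolean literals are True."""
    return sum(literals) == m


def nary_xor(literals):
    """n-ary XOR: True when an odd number of literals are True."""
    return sum(literals) % 2 == 1


def traditional_link(Y, W):
    """Y <=> XOR(W), the linking proposition used in earlier GDP literature."""
    return Y == nary_xor(W)


def cardinality_link(Y, W):
    """Xi(1_{True}(Y), W): exactly one W is True if Y, none otherwise."""
    return exactly(int(Y), W)


n_inner = 3  # illustrative number of nested disjuncts
assignments = [
    (Y, W)
    for Y in (False, True)
    for W in product((False, True), repeat=n_inner)
]

# Assignments accepted by the traditional proposition but not by Xi(Y, W).
extra = [(Y, W) for Y, W in assignments
         if traditional_link(Y, W) and not cardinality_link(Y, W)]
# Assignments accepted by Xi(Y, W) but rejected by the traditional proposition.
missing = [(Y, W) for Y, W in assignments
           if cardinality_link(Y, W) and not traditional_link(Y, W)]

for Y, W in extra:
    print(Y, W)
print(len(extra), len(missing))
```

Under these assumptions, every assignment that satisfies the cardinality rule also satisfies the traditional proposition, while the traditional proposition additionally admits assignments with more than one nested Boolean set to True.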

$$\begin {aligned}\min z = f\left( x \right) \\ s.t.\;r\left( x \right) \le 0 \end {aligned}$$
(2L-GDP)
$$ \begin{gathered} \begin{array}{*{20}c} { \mathop \bigvee \limits_{{j \in J_{i} }} \left[ {\begin{array}{*{20}c} {\begin{array}{*{20}c} {Y_{{ij}} } \\ {g_{{ij}} \left( x \right) \le 0~} \\ \end{array} } \\ { \mathop \bigvee \limits_{{l \in L_{{ijk}} }} \left[ {\begin{array}{*{20}c} {W_{{ijkl}} } \\ {h_{{ijkl}} \left( x \right) \le 0} \\ \end{array} } \right]~~\forall k \in K_{{ij}} } \\ \end{array} } \right]} & {\forall i \in I} \\ {\Xi \left( {1,Y_{{ij}} ~\forall j \in J_{i} } \right)} & {\forall i \in I} \\ {\Xi \left( {Y_{{ij}} ,W_{{ijkl}} ~\forall l \in L_{{ijk}} } \right)} & {\forall i \in I,j \in J_{i} ,k \in K_{{ij}} } \\ {\Omega \left( {Y,W} \right)} & {} \\ {x^{{LB}} \le x \le x^{{UB}} } & {} \\ {x \in \mathbb{R}^{n} } & {} \\ {Y_{{ij}} \in \left\{ {True,False} \right\}} & {\forall i \in I,j \in J_{i} } \\ {W_{{ijkl}} \in \left\{ {True,False} \right\}} & {\forall i \in I,j \in J_{i} ,k \in K_{{ij}} ,l \in L_{{ijk}} } \\ \end{array} \hfill \\ \end{gathered} $$

This model can be generalized to a multi-level nested GDP (ML-GDP) with \(n\) levels, where the superscript on the Boolean variables, constraints, and sets indicates the level \(k\in \{1,\dots ,n\}\) of the hierarchy that these belong to.

It should be noted that nested disjunctions should generally not include negations of Boolean variables (see “Appendix B”).

3.2 Equivalent single-level GDP

Previous references to GDP with nested disjunctions in literature have proposed transforming the 2L-GDP model into the equivalent single-level GDP (2E-GDP) given below (Grossmann and Trespalacios 2013; Vecchietti and Grossmann 2000). Here, the nested disjunction is extracted and a dummy or “slack” disjunct is added to preserve feasibility. Thus, if none of the nested disjuncts is selected, the slack disjunct is selected, which contains the entire feasible set for \(x\). The exclusive cardinality rule on the inner Boolean variables, \(W\), is also augmented to include the slack Boolean variable, \({W}_{ijk0}\). This slack variable is, however, not included in the linking logic constraint for the upper and lower-level decisions. This ensures that the nested decisions are only selected if their master Boolean is True. This method for transforming a nested disjunction can also be applied to the multi-level system ML-GDP.

$$\begin {aligned}\min z = f\left( x \right) \\ s.t.\;r\left( x \right) \le 0 \end {aligned}$$
(2E-GDP)
$$\begin{gathered} \begin{array}{*{20}c} { \mathop \bigvee \limits_{{j \in J_{i} }} \left[ {\begin{array}{*{20}c} {\begin{array}{*{20}c} {Y_{{ij}} } \\ {g_{{ij}} \left( x \right) \le 0~} \\ \end{array} } \\ \end{array} } \right]} & {\forall i \in I} \\ {\left( { \mathop \bigvee \limits_{{l \in L_{{ijk}} }} \left[ {\begin{array}{*{20}c} {W_{{ijkl}} } \\ {h_{{ijkl}} \left( x \right) \le 0} \\ \end{array} } \right]} \right) \bigvee \left[ {\begin{array}{*{20}c} {W_{{ijk0}} } \\ {x^{{LB}} \le x \le x^{{UB}} } \\ \end{array} } \right]} & {\forall i \in I,j \in J_{i} ,k \in K_{{ij}} } \\ {\Xi \left( {1,Y_{{ij}} ~\forall j \in J_{i} } \right)} & {\forall i \in I} \\ {\Xi \left( {1,W_{{ijkl}} ~\forall l \in L_{{ijk}} \cup \left\{ 0 \right\}} \right)} & {\forall i \in I,j \in J_{i} ,k \in K_{{ij}} } \\ {\Xi \left( {Y_{{ij}} ,W_{{ijkl}} ~\forall l \in L_{{ijk}} } \right)} & {\forall i \in I,j \in J_{i} ,k \in K_{{ij}} } \\ {\Omega \left( {Y,W} \right)} & {} \\ {x^{{LB}} \le x \le x^{{UB}} } & {} \\ {x \in \mathbb{R}^{n} } & {} \\ {Y_{{ij}} \in \left\{ {True,False} \right\}} & {\forall i \in I,j \in J_{i} } \\ {W_{{ijkl}} \in \left\{ {True,False} \right\}} & {\forall i \in I,j \in J_{i} ,k \in K_{{ij}} ,l \in L_{{ijk}} } \\ \end{array} \hfill \\ \end{gathered} $$

Although the above formulation allows modeling hierarchical systems in the standard GDP notation, it has two major drawbacks: (1) the explicit hierarchical structure is lost, and (2) although the Equivalent Single-Level GDP model is logically equivalent to the nested GDP model, it requires introducing additional disjuncts and Boolean variables. Introducing “slack” disjuncts and “slack” Boolean variables results in models whose continuous relaxations are less tight, as described in the next section.
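The role of the slack Boolean in the logic of 2E-GDP can be illustrated by enumerating the assignments that satisfy its two cardinality rules on \(W\). This is a minimal sketch, assuming one upper-level Boolean \(Y\) with two nested Booleans and one slack Boolean `W0`; the names are illustrative only.

```python
from itertools import product


def exactly(m, literals):
    """Xi(m, W): exactly m of the Boolean literals are True."""
    return sum(literals) == m


n_inner = 2  # illustrative number of nested disjuncts (slack not counted)

solutions = []
for Y in (False, True):
    for W in product((False, True), repeat=n_inner):
        for W0 in (False, True):
            augmented_rule = exactly(1, W + (W0,))  # Xi(1, W over L union {0})
            linking_rule = exactly(int(Y), W)       # Xi(Y, W over L)
            if augmented_rule and linking_rule:
                solutions.append((Y, W, W0))

for sol in solutions:
    print(sol)

# In every logically feasible assignment the slack Boolean is the negation of Y.
slack_is_not_Y = all(W0 == (not Y) for Y, _, W0 in solutions)
print(slack_is_not_Y)
```

In this small instance the slack Boolean carries no independent information: it is True exactly when the upper-level Boolean is False.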

3.3 Tightness of continuous relaxations

The following two theorems and their associated proofs establish the advantages of modeling multi-level decision problems via nested GDP, rather than the Equivalent Single-Level GDP approach. The advantages are shown by discussing the tightness of the continuous relaxations of both the Hull reformulation (HR) and Big-M reformulation (BM) of these two GDP models.

Theorem 1

Let rML-GDP-HR denote the continuous relaxation of the mixed-integer program (MIP) obtained from a Multi-Level nested GDP via the Hull reformulation, and let rME-GDP-HR denote the continuous relaxation of the MIP obtained from its respective Equivalent Single-Level GDP representation via the Hull reformulation. The feasible space of the former is contained within the feasible space of the latter, namely, rML-GDP-HR \(\subseteq \) rME-GDP-HR.

Proof

Without loss of generality, the above theorem is proved by establishing that the Hull reformulation of the 2-Level nested GDP model (r2L-GDP-HR) is contained in the Hull reformulation of its equivalent single-Level GDP representation (r2E-GDP-HR):

$$ r2L\text{-}GDP\text{-}HR \subseteq r2E\text{-}GDP\text{-}HR $$

The Hull reformulation for 2L-GDP is given below, where the continuous variable \(x\) is disaggregated in each disjunct (\(x\) is disaggregated into \({u}_{ij}\) for each upper-level disjunct, and \({u}_{ij}\) is disaggregated into \({v}_{ijkl}\) for each lower-level disjunct) and the Boolean variables are replaced by their corresponding binary variable (\(Y\) becomes \(y\), and \(W\) becomes \(w\)). \(A\) and \(B\) are matrices of scalars, and \(c\) is a vector of scalars. These are used to map the logic constraints into their algebraic counterparts obtained after converting the logic propositions into conjunctive normal form (CNF) and transforming each clause into its equivalent algebraic constraint (Williams 1985). Note that the disaggregated variables are bounded between \(\mathrm{min}\left(0,{x}^{LB}\right)\) and \(\mathrm{max}\left(0,{x}^{UB}\right)\) instead of the traditional bounds of \(0\) and \({x}^{UB}\) because we do not assume that \(x\) is nonnegative. As a result, the min and max operators in these bounds are required to guarantee that the domain of the disaggregated variables contains the origin (0). This is necessary to ensure that the disaggregation constraints remain feasible when the disaggregated variables are forced to 0 for the disjuncts that are not selected.

The Hull reformulation for 2E-GDP is given below, where \(x\) is disaggregated into \({u}_{ij}\) for the upper-level disjunctions, and is also disaggregated into \({v}_{ijkl}\) for the lower-level disjuncts, which are extracted when transforming the model into an Equivalent Single-Level GDP.

The difference between 2L-GDP-HR and 2E-GDP-HR is in the highlighted constraints in the variable disaggregation and cardinality rules sections. The proof for the Hull reformulation case is given by applying Fourier–Motzkin elimination (Dantzig 1972) to eliminate the slack binary variable (\({w}_{ijk0}\)) and its corresponding disaggregated variable (\({v}_{ijk0}\)) from 2E-GDP-HR. We first combine the last two cardinality rules in 2E-GDP-HR to obtain (1).

$${w}_{ijk0}=1-{y}_{ij}$$
(1)

Equating the two variable aggregation constraints in 2E-GDP-HR and solving for \({v}_{ijk0}\) gives (2).

$${v}_{ijk0}=\sum\limits_{j\in {J}_{i}}{u}_{ij}-\sum\limits_{l\in {L}_{ijk}}{v}_{ijkl}$$
(2)

Substituting (1) and (2) into the bounding constraint for \({v}_{ijk0}\) gives (3), which can be rearranged into (4).

$${x}^{LB}\cdot \left(1-{y}_{ij}\right)\le \mathop \sum \limits_{j\in {J}_{i}}{u}_{ij}- \mathop \sum \limits _{l\in {L}_{ijk}}{v}_{ijkl}\le {x}^{UB}\cdot \left(1-{y}_{ij}\right)$$
(3)
$$\begin{aligned}{u}_{ij}+\mathop \sum \limits _{{j}^{\prime}\in {J}_{i}:{j}^{\prime}\ne j}{u}_{i{j}^{\prime}}-{x}^{UB}\cdot \left(1-{y}_{ij}\right)&\le \mathop \sum \limits _{l\in {L}_{ijk}}{v}_{ijkl}\le {u}_{ij}\\ &+\mathop \sum \limits _{{j}^{\prime}\in {J}_{i}:{j}^{\prime}\ne j}{u}_{i{j}^{\prime}}-{x}^{LB}\cdot \left(1-{y}_{ij}\right)\end{aligned}$$
(4)

Summing the bounding constraint for \({u}_{i{j}^{\prime}}\) over \({j}^{\prime}\in {J}_{i}\) for \({j}^{\prime}\ne j\) results in (5). Using the cardinality rule \({\sum }_{j\in {J}_{i}}{y}_{ij}=1\), (5) can be written as given in (6), which has two parts, (6a) and (6b). Substituting these into (4) proves that (4) is a relaxation of the disaggregation constraint in 2L-GDP-HR (\({\sum }_{l\in {L}_{ijk}}{v}_{ijkl}={u}_{ij}\), which can be written as \({u}_{ij}\le {\sum }_{l\in {L}_{ijk}}{v}_{ijkl}\le {u}_{ij}\)).

$$\mathop \sum \limits_{{j}^{\prime}\in {J}_{i}:{j}^{\prime}\ne j}{x}^{LB}\cdot {y}_{i{j}^{\prime}}\le \mathop \sum \limits_{{j}^{\prime}\in {J}_{i}:{j}^{\prime}\ne j}{u}_{i{j}^{\prime}}\le \mathop \sum \limits_{{j}^{\prime}\in {J}_{i}:{j}^{\prime}\ne j}{x}^{UB}\cdot {y}_{i{j}^{\prime}}$$
(5)
$${x}^{LB}\cdot \left(1-{y}_{ij}\right)\le \mathop \sum \limits_{{j}^{\prime}\in {J}_{i}:{j}^{\prime}\ne j}{u}_{i{j}^{\prime}}\le {x}^{UB}\cdot \left(1-{y}_{ij}\right)$$
(6)
$$\mathop \sum \limits_{{j}^{\prime}\in {J}_{i}:{j}^{\prime}\ne j}{u}_{i{j}^{\prime}}-{x}^{UB}\cdot \left(1-{y}_{ij}\right)\le 0$$
(6a)
$$\mathop \sum \limits_{{j}^{\prime}\in {J}_{i}:{j}^{\prime}\ne j}{u}_{i{j}^{\prime}}-{x}^{LB}\cdot \left(1-{y}_{ij}\right)\ge 0$$
(6b)

It should also be noted that the cardinality rule on the extracted lower-level decisions in 2E-GDP-HR (\({w}_{ijk0}+{\sum }_{l\in {L}_{ijk}}{w}_{ijkl}=1\)) is redundant with respect to the other two cardinality rules. This can be shown by noting that \({w}_{ijk0}\) acts like a slack variable, which allows writing the mentioned cardinality rule as \({\sum }_{l\in {L}_{ijk}}{w}_{ijkl}\le 1\). This expression is contained in the first two cardinality rules since \({y}_{ij}\le 1\) and \({\sum }_{l\in {L}_{ijk}}{w}_{ijkl}={y}_{ij}\). Therefore, the Hull reformulation of the Equivalent Single-Level GDP produces constraints with continuous relaxations that are weaker than those resulting from the Hull reformulation of the nested GDP, proving that 2L-GDP-HR \(\subseteq \) 2E-GDP-HR. QED
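The argument that (4) relaxes the disaggregation constraint of the nested model can be spot-checked numerically. The sketch below assumes a scalar \(x\) with hypothetical bounds \({x}^{LB}=-2\) and \({x}^{UB}=5\), one disjunction with three disjuncts, and relaxed \(y\) values sampled at random; it samples \({u}_{ij}\) within the bounding constraint used in (5) and verifies that the interval defined by (4) always contains \({u}_{ij}\). It is a sanity check under these assumptions, not a substitute for the proof.

```python
import random


def interval_from_eq4(j, u, y, x_lb, x_ub):
    """Lower and upper bounds that (4) places on the sum of v_ijkl over l."""
    others = sum(u[jp] for jp in range(len(u)) if jp != j)
    lower = u[j] + others - x_ub * (1.0 - y[j])
    upper = u[j] + others - x_lb * (1.0 - y[j])
    return lower, upper


def eq4_contains_u(n_disjuncts, x_lb, x_ub, rng, tol=1e-9):
    """Sample a relaxed point and test whether u_ij lies inside (4) for all j."""
    raw = [rng.random() + 1e-3 for _ in range(n_disjuncts)]
    total = sum(raw)
    y = [r / total for r in raw]  # relaxed binaries with sum(y) = 1
    # Bounding constraint from (5): x_lb * y_ij <= u_ij <= x_ub * y_ij
    u = [rng.uniform(x_lb * yj, x_ub * yj) for yj in y]
    for j in range(n_disjuncts):
        lower, upper = interval_from_eq4(j, u, y, x_lb, x_ub)
        if not (lower <= u[j] + tol and u[j] <= upper + tol):
            return False
    return True


rng = random.Random(0)  # fixed seed so the check is reproducible
all_ok = all(eq4_contains_u(3, -2.0, 5.0, rng) for _ in range(1000))
print(all_ok)
```

Because \({u}_{ij}\) always lies inside the interval from (4), any point satisfying \({\sum }_{l\in {L}_{ijk}}{v}_{ijkl}={u}_{ij}\) also satisfies (4), which is the containment used in the proof.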

Theorem 2

Let rML-GDP-BM denote the continuous relaxation of the mixed-integer program (MIP) obtained from a Multi-Level nested GDP via the Big-M reformulation, and let rME-GDP-BM denote the continuous relaxation of the MIP obtained from its respective Equivalent Single-Level GDP representation via the Big-M reformulation. The feasible space of the former is contained within the feasible space of the latter, namely, rML-GDP-BM \(\subseteq \) rME-GDP-BM, if tight values for the M parameters are used.

Proof

Without loss of generality, the above theorem is proved by establishing that the Big-M reformulation of the 2-Level nested GDP model (r2L-GDP-BM) is contained in the Big-M reformulation of its Equivalent Single-Level GDP representation (r2E-GDP-BM), when tight M values are used:

$$r2L\text{-}GDP\text{-}BM \subseteq r2E\text{-}GDP\text{-}BM $$

The Big-M reformulation for the nested GDP model is given in 2L-GDP-BM, where \({M}_{ij}\) is the Big-M value for the constraints in the \({j}^{th}\) disjunct in disjunction \(i\), \({M}_{ijkl}^{\prime}\) is the Big-M value associated with the upper-level decision on the nested constraints, and \({m}_{ijkl}^{\prime}\) is the Big-M value associated with the lower-level decision on the nested constraints. The Big-M reformulation for the equivalent single-level GDP is given in 2E-GDP-BM, where \({M}_{ij}\) is the same as in 2L-GDP-BM, and \({m}_{ijkl}\) is the Big-M value associated with the extracted lower-level decisions.

Finding the tightest Big-M values requires solving multiple optimization problems to maximize the value of each constraint function over the complete model’s feasible region, or over the corresponding feasible region of the disjunction (Grossmann and Trespalacios 2013). For the proof we calculate tight Big-M values using only the global constraints or upper-level constraints in the case of the nested constraints. The following mathematical optimization problems are solved to obtain tight \(M\) values: (7) for \({M}_{ij}\), (8a) for \({m}_{ijkl}^{\prime}\), (8b) for \({M}_{ijkl}^{\prime}\), and (9) for \({m}_{ijkl}\). It should be noted that \({m}_{ijkl}^{\prime}\) accounts for the upper-level constraints \({g}_{ij}\left(x\right)\le 0\), meaning it is localized to the parent disjunct that it belongs to. \({M}_{ijkl}^{\prime}\) subtracts \({m}_{ijkl}^{\prime}\) from the traditional Big-M value to ensure that when both upper and lower-level decisions are not selected (\({y}_{ij}=0\) and \({w}_{ijkl}=0\)), the resulting Big-M value is equivalent to the global Big-M value for that constraint.

$${M}_{ij}=\mathrm{max}\left\{{g}_{ij}\left(x\right) | r\left(x\right)\le 0, {x}^{LB}\le x\le {x}^{UB}, x\in {\mathbb{R}}^{n}\right\}$$
(7)
$${m}_{ijkl}^{\prime}=\mathrm{max}\left\{{h}_{ijkl}\left(x\right) | r\left(x\right)\le 0,{g}_{ij}\left(x\right)\le 0,{x}^{LB}\le x\le {x}^{UB},x\in {\mathbb{R}}^{n}\right\}$$
(8a)
$${M}_{ijkl}^{\prime}=\mathrm{max}\left\{{h}_{ijkl}\left(x\right) | r\left(x\right)\le 0, {x}^{LB}\le x\le {x}^{UB}, x\in {\mathbb{R}}^{n}\right\}-{m}_{ijkl}^{\prime}$$
(8b)
$${m}_{ijkl}=\mathrm{max}\left\{{h}_{ijkl}\left(x\right) | r\left(x\right)\le 0, {x}^{LB}\le x\le {x}^{UB}, x\in {\mathbb{R}}^{n}\right\}$$
(9)
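For linear constraints over box-shaped regions, the maximizations in (8a), (8b), and (9) have a closed form, which makes the relationship between the three parameters easy to inspect. The following sketch assumes a hypothetical constraint \(h(x)={x}_{1}+{x}_{2}-5\le 0\), hypothetical global bounds, no additional global constraints \(r(x)\), and an upper-level disjunct whose constraints \(g(x)\le 0\) reduce to a box contained in the global bounds. All numerical data and function names are illustrative.

```python
def max_linear_over_box(a, b, lb, ub):
    """Maximize a.x - b over lb <= x <= ub (closed form for a linear function)."""
    return sum(ai * (ui if ai >= 0 else li)
               for ai, li, ui in zip(a, lb, ub)) - b


# Hypothetical data: global variable bounds x^LB <= x <= x^UB.
global_lb, global_ub = [0.0, 0.0], [10.0, 10.0]
# Hypothetical box implied by the parent disjunct constraints g_ij(x) <= 0.
parent_lb, parent_ub = [2.0, 1.0], [6.0, 4.0]
# Hypothetical nested constraint h(x) = x1 + x2 - 5 <= 0.
a, b = [1.0, 1.0], 5.0

m = max_linear_over_box(a, b, global_lb, global_ub)        # Eq. (9)
m_prime = max_linear_over_box(a, b, parent_lb, parent_ub)  # Eq. (8a)
M_prime = m - m_prime                                      # Eq. (8b)

print(m, m_prime, M_prime)
# When y = 0 and w = 0, the nested right-hand side recovers the global value.
print(m_prime * (1 - 0) + M_prime * (1 - 0) == m)
```

Since the parent box is contained in the global box, \({m}_{ijkl}^{\prime}\le {m}_{ijkl}\) holds by construction, which is the property the proof relies on.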

The proof lies in establishing that the feasible space of 2L-GDP-BM is contained in 2E-GDP-BM. The difference between these two models is shown in the highlighted constraints above. It was previously shown that the cardinality rule \({w}_{ijk0}+{\sum }_{l\in {L}_{ijk}}{w}_{ijkl}=1\) is redundant (see Theorem 1). Thus, the proof is given by establishing that the right-hand sides of the highlighted Big-M constraints satisfy (10), meaning that the Big-M constraint from 2L-GDP-BM is contained in the Big-M constraint from 2E-GDP-BM. Substituting (9) in (8b) results in (11). Substituting (11) in (10) and simplifying the resulting expression produces (12). From the cardinality constraint \({\sum }_{l\in {L}_{ijk}}{w}_{ijkl}={y}_{ij}\), it is clear that \({w}_{ijkl}\le {y}_{ij}\), meaning that the factor \(\left({y}_{ij}-{w}_{ijkl}\right)\) in (12) is nonnegative and can be dropped from both sides without changing the direction of the inequality. Thus, (12) reduces to \({m}_{ijkl}^{\prime}\le {m}_{ijkl}\), which is true considering that (9) is a relaxation of (8a). Therefore, 2L-GDP-BM \(\subseteq \) 2E-GDP-BM.

$${m}_{ijkl}^{\prime}\cdot \left(1-{w}_{ijkl}\right)+{M}_{ijkl}^{\prime}\cdot \left(1-{y}_{ij}\right)\le {m}_{ijkl}\cdot \left(1-{w}_{ijkl}\right)$$
(10)
$${M}_{ijkl}^{\prime}={m}_{ijkl}-{m}_{ijkl}^{\prime}$$
(11)
$${m}_{ijkl}^{\prime}\cdot \left({y}_{ij}-{w}_{ijkl}\right)\le {m}_{ijkl}\cdot \left({y}_{ij}-{w}_{ijkl}\right)$$
(12)

QED

4 Examples

Each of the examples in this section is implemented in the Julia programming language (version 1.9.0) (Bezanson et al. 2017) using various packages within the ecosystem. These include JuMP (version 1.11.0) (Dunning et al. 2017) for modeling mathematical programs, DisjunctiveProgramming (version 0.3.6) (Perez et al. 2023) for reformulating GDPs (both nested and single-level) into MIPs, and Polyhedra (version 0.7.6) (Legat et al. 2021) for projecting mathematical programming models onto 2D space (see Sect. 4.1). For the numerical examples (Sects. 4.2 and 4.3), the reformulated MI(N)LP models are solved on an Ubuntu Server with 82 GB of RAM and an Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz processor. CPLEX (version 22.1.1) is used as the MILP solver and BARON (version 23.1.5) as the MINLP solver.

4.1 Illustrative example

Consider the nested GDP constraint system given in (13), which can be expressed as the Equivalent Single-Level GDP in (14), where \({W}_{3}\) is the slack Boolean variable associated with the dummy disjunct. Each of these models is reformulated into a MIP using the Big-M reformulation, with both a loose (large) M value and a tight M value, and the Hull reformulation. Their continuous relaxations are then projected onto the \({x}_{1},{x}_{2}\) plane in Fig. 3.

Fig. 3 Projections of the continuous relaxations of (13) and (14) onto the \({x}_{1},{x}_{2}\) plane. Three reformulations are shown (Big-M = Big-M Reformulation, Tight-M = Big-M Reformulation with tight M values, Hull = Hull Reformulation). The 2-Level nested GDP given in (13) is indicated with nested. The Equivalent Single-Level GDP given in (14) is indicated with equivalent. Projection areas, relative to the Big-M case, are indicated in %.

$$\left[\begin{array}{c}{Y}_{1}\\ \begin{array}{c}1\le {x}_{1}\le 3\\ 4\le {x}_{2}\le 6\end{array}\\ \left[\begin{array}{c}{W}_{1}\\ \begin{array}{c}1\le {x}_{1}\le 2\\ 5\le {x}_{2}\le 6\end{array}\end{array}\right]\vee \left[\begin{array}{c}{W}_{2}\\ \begin{array}{c}2\le {x}_{1}\le 3\\ 4\le {x}_{2}\le 5\end{array}\end{array}\right]\end{array}\right]\vee \left[\begin{array}{c}{Y}_{2}\\ \begin{array}{c}8\le {x}_{1}\le 9\\ 1\le {x}_{2}\le 2\end{array}\end{array}\right]$$
(13.a)
$${\varvec{\Xi}}\left(1,\left\{{Y}_{1},{Y}_{2}\right\}\right)$$
(13.b)
$${\varvec{\Xi}}\left({Y}_{1},\left\{{W}_{1},{W}_{2}\right\}\right)$$
(13.c)
$$\left[\begin{array}{c}{Y}_{1}\\ \begin{array}{c}1\le {x}_{1}\le 3\\ 4\le {x}_{2}\le 6\end{array}\end{array}\right]\vee \left[\begin{array}{c}{Y}_{2}\\ \begin{array}{c}8\le {x}_{1}\le 9\\ 1\le {x}_{2}\le 2\end{array}\end{array}\right]$$
(14.a)
$$\left[\begin{array}{c}{W}_{1}\\ \begin{array}{c}1\le {x}_{1}\le 2\\ 5\le {x}_{2}\le 6\end{array}\end{array}\right]\vee \left[\begin{array}{c}{W}_{2}\\ \begin{array}{c}2\le {x}_{1}\le 3\\ 4\le {x}_{2}\le 5\end{array}\end{array}\right]\vee \left[\begin{array}{c}{W}_{3}\\ \begin{array}{c}1\le {x}_{1}\le 9\\ 1\le {x}_{2}\le 6\end{array}\end{array}\right]$$
(14.b)
$${\varvec{\Xi}}\left(1,\left\{{Y}_{1},{Y}_{2}\right\}\right)$$
(14.c)
$${\varvec{\Xi}}\left(1,\left\{{W}_{1},{W}_{2},{W}_{3}\right\}\right)$$
(14.d)
$$ {\varvec{\Xi}}\left({Y}_{1},\left\{{W}_{1},{W}_{2}\right\}\right)$$
(14.e)

Explicitly preserving the hierarchical relationship in the nested GDP representation reduces the feasible region of the continuous relaxation more than the equivalent single-level GDP representation does. This is observed in both the tight-M (Big-M reformulation with a tight M) and Hull reformulation cases. Furthermore, in this example the tight-M reformulation of the nested GDP model produces the same relaxation as the Hull reformulation of the equivalent single-level GDP model with only a fraction of the model size (see Table 1). It should also be noted that the convex hull of the system is obtained when the Hull reformulation is applied either to the nested GDP or to the flattened GDP. As a result, the continuous relaxation of either formulation will yield the optimum.
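
As a quick consistency check on (13) and (14), the sketch below enumerates the Boolean assignments allowed by the logic constraints of each model and collects the box in the \({x}_{1},{x}_{2}\) plane that each assignment enforces. It reads \({\varvec{\Xi}}\left(m,S\right)\) as "exactly \(m\) of the Booleans in \(S\) are True", with a Boolean first argument counted as 0 or 1. The helper names (`exactly`, `intersect`, `feasible_boxes`) are illustrative assumptions and are not part of DisjunctiveProgramming.

```python
from itertools import product

# Boxes ((x1_lo, x1_hi), (x2_lo, x2_hi)) enforced by each disjunct in (13)/(14).
BOX = {
    "Y1": ((1, 3), (4, 6)),
    "Y2": ((8, 9), (1, 2)),
    "W1": ((1, 2), (5, 6)),
    "W2": ((2, 3), (4, 5)),
    "W3": ((1, 9), (1, 6)),  # slack disjunct, only present in (14)
}


def exactly(m, literals):
    """Xi(m, S): exactly m of the Booleans in S are True (True counts as 1)."""
    return sum(literals) == int(m)


def intersect(boxes):
    """Intersect axis-aligned boxes; return None if the intersection is empty."""
    x1_lo = max(b[0][0] for b in boxes)
    x1_hi = min(b[0][1] for b in boxes)
    x2_lo = max(b[1][0] for b in boxes)
    x2_hi = min(b[1][1] for b in boxes)
    if x1_lo > x1_hi or x2_lo > x2_hi:
        return None
    return ((x1_lo, x1_hi), (x2_lo, x2_hi))


def feasible_boxes(names, logic):
    """Collect the box enforced by every Boolean assignment that satisfies `logic`."""
    regions = set()
    for values in product([False, True], repeat=len(names)):
        v = dict(zip(names, values))
        if logic(v):
            active = [BOX[n] for n in names if v[n]]
            region = intersect(active)
            if region is not None:
                regions.add(region)
    return regions


# Nested model: (13.b) and (13.c).
nested = feasible_boxes(
    ["Y1", "Y2", "W1", "W2"],
    lambda v: exactly(1, [v["Y1"], v["Y2"]])
    and exactly(v["Y1"], [v["W1"], v["W2"]]),
)

# Equivalent single-level model: (14.c), (14.d), and (14.e).
single_level = feasible_boxes(
    ["Y1", "Y2", "W1", "W2", "W3"],
    lambda v: exactly(1, [v["Y1"], v["Y2"]])
    and exactly(1, [v["W1"], v["W2"], v["W3"]])
    and exactly(v["Y1"], [v["W1"], v["W2"]]),
)

print(sorted(nested))
print(nested == single_level)
```

Both models enforce the same three boxes once the Booleans are fixed, so any difference between them arises only in the continuous relaxations of their reformulations, which is what Fig. 3 compares.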

Table 1 Model sizes and projection areas for Illustrative Example

4.2 Example 1: linear model

Consider the superstructure optimization problem with technology selection and scheduling for a plant that is to produce and sell material D (see Fig. 4). Material D can be produced from material C (reaction: C → D), which can be purchased from a third party or produced from material B (reaction: B → C), which can in turn be purchased or produced from material A (reaction: A → B). The plant has two types of multipurpose reactors, each with a backup unit, that can be used for the material transformation steps (see Fig. 5). Each of these has a maximum installed capacity of 100 kg. Up to one tank for each material in the system can be installed for storage with a maximum installed capacity of 300 kg. There are two candidate chemical processes to perform each material transformation step, giving a total of six processes in the process superstructure. There are two potential technologies (catalysts) that can be used in each process, each with a unique cost and yield, giving a total of 12 candidate process-catalyst combinations in the system. The plant process and equipment superstructures are given in Figs. 4 and 5, respectively. The former illustrates the candidate processes in the superstructure in the state-task network representation (Kondili et al. 1993). The latter depicts the equipment options (reactor type and units, and tanks) in the superstructure.

Fig. 4

Process superstructure for Example 1 with 4 materials, 6 processes, 4 tanks, and 16 streams.

Fig. 5

Equipment superstructure (process flow diagram) for Example 1 with 4 tanks and 2 reactor types, each with 2 identical units.

The objective of the optimization problem is to maximize system profit over a 30-day schedule by making the following decisions:

  • Which material storage tanks to install.

  • How many shared reactors to install.

  • Which processes to install for each material transformation step.

      • Which technologies (catalysts) to use in each of the selected processes.

      • Which reactor type to use in each of the selected processes.

          • How many reactors to operate in each time period.

          • How much to produce in each batch of material.

  • How much material to purchase for A, B, and C in each time period.

The hierarchy of these decisions is indicated by the bullet indentation above. Thus, the technology and reactor type selections are second-level decisions, and the operating schedule and batch sizes are third-level decisions. For simplicity, any changeover or setup times are not considered.

Model: The model for this system consists of the following linear constraints. Resource balances are enforced around each resource \(k\) at timepoint \(t\) with the global constraints in (15) and (16). The level of material in each tank, \({L}_{k,t}\), is updated based on the material flowing in and out of the tank (material balance). The availability of each reactor, \({R}_{k,t}\), is updated based on the reactor usage, \(\Delta {R}_{i,k,t}\). A reactor unit is locked (unavailable) when it begins a processing task \(i\) at time \(t\). At time \(t+{\tau }_{i}\), the processing task ends (\({\tau }_{i}\) is the duration), and the reactor unit is released (becomes available). The values used for the task durations, \({\tau }_{i}\), are \({\tau }_{i}=5 \,\forall\, i\in \left\{1,4,5,6\right\}\), \({\tau }_{2}=3\), and \({\tau }_{3}=4\) (days). For greater detail on resource balances, the reader is referred to the review paper on the resource-task network by Perez et al. (2022).

$$ \overbrace {{L_{{k,t}} }}^{{tank\,~level}} = L_{{k,t - 1}} + \overbrace {{\sum\limits_{{s \in S_{k}^{{in}} }} {F_{{s,t}} } }}^{{inflow}} - \overbrace {{\sum\limits_{{s \in S_{k}^{{out}} }} {F_{{s,t}} } }}^{{outflow}}\quad \,\forall\, k \in K^{{tank}} ,t \in T $$
(15)
$$ \overbrace {{R_{{k,t}} }}^{\begin{subarray}{l} reactor \\ availability \end{subarray} } = R_{{k,t - 1}} + \sum\limits_{{i \in I}} {\left[ {\overbrace {{{{\Delta }}R_{{i,k,t - \tau _{i} }} }}^{\begin{subarray}{l} reactors \\ released \end{subarray} } - \overbrace {{{{\Delta }}R_{{i,k,t}} }}^{\begin{subarray}{l} reactors \\ locked \end{subarray} }} \right]} $$
(16)
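
The two balances can be traced by hand for a small case. The following is a minimal sketch, assuming a single tank and a single reactor type; the function names and the input data are hypothetical, and only the durations \({\tau }_{2}=3\) and \({\tau }_{3}=4\) are taken from the text. Batches that would have started before period 1 are treated as absent, in line with the assumption that all reactors are idle at the start of the horizon.

```python
def simulate_tank(level0, inflow, outflow):
    """Apply (15): L[t] = L[t-1] + inflow[t] - outflow[t] for t = 1..T."""
    levels = [level0]
    for f_in, f_out in zip(inflow, outflow):
        levels.append(levels[-1] + f_in - f_out)
    return levels


def simulate_reactors(avail0, starts, tau):
    """Apply (16) for one reactor type.

    starts[i][t-1] is the number of units locked by task i at time t
    (Delta R_{i,k,t}); tau[i] is the duration of task i.
    """
    horizon = len(next(iter(starts.values())))
    avail = [avail0]
    for t in range(1, horizon + 1):
        # Units released at t are those locked at t - tau[i] (none before t = 1).
        released = sum(
            s[t - tau[i] - 1] for i, s in starts.items() if t - tau[i] >= 1
        )
        locked = sum(s[t - 1] for s in starts.values())
        avail.append(avail[-1] + released - locked)
    return avail


# Hypothetical data: a full 300 kg tank and two installed reactor units.
tank_levels = simulate_tank(300, inflow=[0, 50, 0], outflow=[100, 0, 20])
reactor_avail = simulate_reactors(
    2,
    starts={2: [1, 0, 0, 0, 0, 0], 3: [0, 1, 0, 0, 0, 0]},
    tau={2: 3, 3: 4},
)
print(tank_levels)    # [300, 200, 250, 230]
print(reactor_avail)  # [2, 1, 0, 0, 1, 1, 2]
```

In this trace, task 2 locks one unit at \(t=1\) and releases it at \(t=4\), while task 3 locks the second unit at \(t=2\) and releases it at \(t=6\).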

The decision to install a resource (tank or reactor) is governed by the disjunctions in (17) and (18), where the decision is to determine how many units \(u\) to install. In this example, \({U}_{k}=\{0,1,2\}\) for each reactor type (at most 2 identical units can be installed for each reactor type \(k\)), and \({U}_{k}=\{0,1\}\) for each tank (at most 1 tank can be installed for each material). The installation cost, \({CI}_{k}\), is calculated as the sum of a fixed charge, \({\alpha }_{k}\), and a variable cost coefficient, \({\beta }_{k}\), times the total resource capacity. If no units are installed (\(u=0\)), the installation cost and resource capacity, \({Q}_{k}\), drop to zero. Disjunctions (17) and (18) also set the initial condition for the resource availability, \({L}_{k,0}\) and \({R}_{k,0}\): if installed, tanks are full, and all reactor units are available, respectively. Disjunction (17) also tracks the slack on the tank level at the final timepoint \(\left|T\right|\), \({\widehat{L}}_{k}\), which refers to the amount below the full tank capacity, and is penalized in the objective function to reduce the likelihood of depleting the inventory at the end of the scheduling horizon (see (41)). These constraints ensure that the schedule obtained is a feasible schedule for normal operation with monthly cycles. For startup operations the optimal schedule can be obtained by fixing the design decisions and rerunning the model with the initial tank levels set to zero. The cardinality constraint (19) ensures that exactly one of the disjuncts is selected. The values for the cost coefficients are given in Table 2. Since the plant lifetime is greater than the scheduling horizon, the resource installation cost coefficients have been scaled to the appropriate order of magnitude. Installation costs for pipelines between tanks and reactors are assumed to be negligible.

$$ \mathop \bigvee \limits_{u \in U_{k} \backslash \left\{ 0 \right\}} \left[ \begin{array}{c} X_{k,u} \\ CI_{k} = \alpha _{k} + \beta _{k} \cdot \overbrace {u \cdot Q_{k} }^{\begin{subarray}{l} installed \\ capacity \end{subarray} } \\ L_{k,0} = u \cdot Q_{k} \\ L_{k,\left| T \right|} + \hat{L}_{k} = u \cdot Q_{k} \end{array} \right] \bigvee \left[ \begin{array}{c} X_{k,0} \\ CI_{k} = 0 \\ Q_{k} = 0 \\ L_{k,t} = 0 \,\forall\, t \in \left\{ 0 \right\} \cup T \\ \hat{L}_{k} = 0 \end{array} \right]\quad \,\forall\, k \in K^{tank} $$
(17)
$$ \mathop \bigvee \limits_{u \in U_{k} \backslash \left\{ 0 \right\}} \left[ \begin{array}{c} X_{k,u} \\ CI_{k} = \alpha _{k} + \beta _{k} \cdot \overbrace {u \cdot Q_{k} }^{\begin{subarray}{l} installed \\ capacity \end{subarray} } \\ R_{k,0} = u \end{array} \right] \bigvee \left[ \begin{array}{c} X_{k,0} \\ CI_{k} = 0 \\ Q_{k} = 0 \\ R_{k,t} = 0 \,\forall\, t \in \left\{ 0 \right\} \cup T \end{array} \right] $$
(18)
$${\varvec{\Xi}}\left(1,{X}_{k,u} \,\forall\, u\in {U}_{k}\right) \,\forall\, k\in K$$
(19)
Table 2 Fixed and variable cost coefficients for the installation cost of each resource
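
The cost logic shared by (17) and (18) can be sketched as a small function. This is only an illustration: the function name is hypothetical, and the coefficient values in the usage lines are placeholders chosen for the arithmetic, not the values of Table 2.

```python
def installation_cost(u, alpha, beta, unit_capacity):
    """CI_k as selected by the disjuncts in (17)/(18).

    u = 0 corresponds to X_{k,0} (nothing installed, so CI_k = 0);
    u > 0 corresponds to X_{k,u}, with CI_k = alpha + beta * u * Q_k.
    """
    if u == 0:
        return 0.0
    return alpha + beta * u * unit_capacity


# Placeholder coefficients (hypothetical, NOT taken from Table 2).
alpha_k, beta_k = 10.0, 0.5
for units in (0, 1, 2):
    print(units, installation_cost(units, alpha_k, beta_k, unit_capacity=100.0))
```

The fixed charge is incurred once whenever at least one unit is installed, which is why the choice of \(u\) is modeled with a disjunction rather than a single linear expression in \(u\).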

The multi-level disjunction in (20) represents the decision to install process \(i\) or not. When installed, the total batch size, \({B}_{i,t}\), is equal to the flow entering the process at time \(t\). There are two nested disjunctions if a process is installed. The first of these relates to which reactor type \(k\) is assigned to the process, \({W}_{i,k}\). The second one pertains to which technology (catalyst) is used for that particular process, \({\widehat{W}}_{i,j}\). Once a reactor type is assigned, the per unit batch size, \({\widehat{B}}_{i,t}\), is bounded by the installed capacity of each unit, \({Q}_{k}\), and the operating cost, \(C{O}_{i,t}\), is proportional to the total batch size with a cost coefficient \({\gamma }_{i,k}\) (given in Table 3). The nested technology selection disjunction specifies the amount of material leaving the process when the batch is completed. This is governed by the yield, \(\nu \), which is specific to the technology \(j\) (given in Table 4). There is then a third-level set of disjunctions inside the reactor type assignment disjunction, which determines the number of units, \(u\), that are used for a batch at time, \(t\), \({N}_{i,k,t,u}\). The number of units selected indicates the number of units that are locked at time \(t\) and is also used to determine the total batch size from the per unit batch size. Note that for this system, it is assumed that if multiple units are used, their loads are equally distributed. Finally, when a process is not installed (\(\neg {Y}_{i}\)), all pertinent variables are set to zero, and the reactor capacity is only bounded by the maximum allowed capacity. The cardinality rules in (21–23) are the linking constraints between the different levels of this multi-level disjunction.

(20)
$${\varvec{\Xi}}\left({Y}_{i},{W}_{i,k} \,\forall\, k\in {K}^{react}\right) \,\forall\, i\in I$$
(21)
$${\varvec{\Xi}}\left({W}_{i,k},{N}_{i,k,t,u} \,\forall\, u\in {U}_{k}\right) \,\forall\, i\in I,k\in {K}^{react},t\in T$$
(22)
$${\varvec{\Xi}}\left({Y}_{i},{\widehat{W}}_{i,j} \,\forall\, j\in J\right) \,\forall\, i\in I$$
(23)
Table 3 Operating cost parameter, \({\gamma }_{i,k} (\$/\text{kg})\), for each process \(i\) and reactor type \(k\) combination
Table 4 Production yield parameter, \({\nu }_{i,j}\), for each process \(i\) and technology \(j\) combination

An additional logic proposition must be included to ensure that if a process \(i\) is triggered on reactor type \(k\) at time \(t\) with \(u\) units (\({N}_{i,k,t,u}=True\)), the reactor type \(k\) must have been installed with at least \(u\) units (\(\exists {u}^{\prime}\in {U}_{k}:{u}^{\prime}\ge u,{X}_{k,{u}^{\prime}}=True\)). For example, if \({N}_{i,k,t,1}=True\), then either \({X}_{k,1}=True\) or \({X}_{k,2}=True\) (one or two units must have been installed when the plant was built). This condition is enforced with the at least predicate in (24), which is equivalent to the propositional logic constraint \({N}_{i,k,t,u}\Rightarrow {\bigvee }_{{u}^{\prime}\in {U}_{k}:{u}^{\prime}\ge u}{X}_{k,{u}^{\prime}}\).

$${\varvec{\Lambda}}\left({N}_{i,k,t,u},{X}_{k,{u}^{\prime}}\,\,\forall\,\,{u}^{\prime}\in {U}_{k}:{u}^{\prime}\ge u\right) \,\forall\, i\in I,k\in {K}^{react},t\in T,u\in {U}_{k}\setminus \left\{0\right\}$$
(24)
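
The stated equivalence between the at least predicate and the implication can be checked by enumerating truth values. The sketch below assumes that \({\varvec{\Lambda}}\left(m,S\right)\) means "at least \(m\) of the Booleans in \(S\) are True", with a Boolean first argument counted as 0 or 1; the function names are illustrative.

```python
from itertools import product


def at_least(m, literals):
    """Lambda(m, S): at least m of the Booleans in S are True (True counts as 1)."""
    return sum(literals) >= int(m)


def implication_form(n, literals):
    """N => (X_1 or X_2 or ...)."""
    return (not n) or any(literals)


def forms_agree(num_x):
    """Compare both forms over every assignment of N and num_x installation Booleans."""
    return all(
        at_least(n, xs) == implication_form(n, xs)
        for n, *xs in product([False, True], repeat=num_x + 1)
    )


# With U_k = {0, 1, 2}: u = 1 involves X_{k,1} and X_{k,2}; u = 2 involves X_{k,2} only.
print(forms_agree(2), forms_agree(1))
```

When \({N}_{i,k,t,u}\) is False the predicate asks for at least zero True values and is trivially satisfied, which mirrors the vacuous truth of the implication.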

The variable bounds and domains are given in (25)–(27) and (29)–(40). The upper bounds on the resource capacities are \({Q}_{k}^{UB}=300\,\text{kg} \,\forall\, k\in {K}^{tank}\) and \({Q}_{k}^{UB}=100\,\text{kg} \,\forall\, k\in {K}^{react}\). The term \(\left|{U}_{k}\right|-1\) represents the maximum number of units available to install since we consider the option of not installing a tank or reactor \(k\). The initialization constraint in (28) is used to ensure that there is no flow leaving a reactor in the first \({\tau }_{i}\) periods since it is assumed that all reactors are idle at the beginning of the scheduling horizon. Thus, if production starts at \(t=1\), the first batch of product is produced at \(t={\tau }_{i}+1\).

$$0\le {B}_{i,t}\le {\sum }_{k\in {K}^{react}}\left(\left|{U}_{k}\right|-1\right)\cdot {Q}_{k}^{UB} \,\forall\, i\in I,t\in T$$
(25)
$$0\le {\widehat{B}}_{i,t}\le \mathrm{max}\left({Q}_{k}^{UB} \,\forall\, k\in {K}^{react}\right) \,\forall\, i\in I,t\in T$$
(26)
$$0\le {F}_{s,t}\le {F}_{s}^{UB} \,\forall\, s\in S,t\in T$$
(27)
$${F}_{s,t}=0 \,\forall\, i\in I,s\in {S}_{i}^{out},t\in \left\{1,\dots ,{\tau }_{i}\right\}$$
(28)
$$0\le {CI}_{k}\le {\alpha }_{k}+{\beta }_{k}\cdot \left(\left|{U}_{k}\right|-1\right)\cdot {Q}_{k}^{UB} \,\forall\, k\in K$$
(29)
$$0\le {L}_{k,t}\le \left(\left|{U}_{k}\right|-1\right)\cdot {Q}_{k}^{UB} \,\forall\, k\in {K}^{tank},t\in T$$
(30)
$$0\le {\widehat{L}}_{k}\le \left(\left|{U}_{k}\right|-1\right)\cdot {Q}_{k}^{UB} \,\forall\, k\in {K}^{tank}$$
(31)
$$0\le {CO}_{i,t}\le {\sum }_{k\in {K}^{react}}{\gamma }_{i,k}\cdot \left(\left|{U}_{k}\right|-1\right)\cdot {Q}_{k}^{UB} \,\forall\, i\in I,t\in T$$
(32)
$$0\le {Q}_{k}\le {Q}_{k}^{UB} \,\forall\, k\in {K}^{react}$$
(33)
$$0\le {R}_{k,t}\le \left|{U}_{k}\right|-1 \,\forall\, k\in {K}^{react},t\in T$$
(34)
$$0\le\Delta {R}_{i,k,t}\le \left|{U}_{k}\right|-1 \,\forall\, i\in I,k\in {K}^{react},t\in T$$
(35)
$${N}_{i,k,t,u}\in \left\{True,False\right\} \,\forall\, i\in I,k\in {K}^{react},t\in T,u\in {U}_{k}$$
(36)
$${W}_{i,k}\in \left\{True,False\right\} \,\forall\, i\in I,k\in {K}^{react}$$
(37)
$${\widehat{W}}_{i,j}\in \left\{True,False\right\} \,\forall\, i\in I,j\in J$$
(38)
$${X}_{k,u}\in \left\{True,False\right\}\,\forall\, k\in K,u\in {U}_{k}$$
(39)
$${Y}_{i}\in \left\{True,False\right\}\,\forall\, i\in I$$
(40)
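
As a worked instance of these bounds, the data stated for this example (two reactor types with \(\left|{U}_{k}\right|=3\) and \({Q}_{k}^{UB}=100\) kg, and tanks with \(\left|{U}_{k}\right|=2\) and \({Q}_{k}^{UB}=300\) kg) give the following values for (25), (26), (30), and (34):

$$\begin{aligned} {B}_{i,t}&\le {\sum }_{k\in {K}^{react}}\left(\left|{U}_{k}\right|-1\right)\cdot {Q}_{k}^{UB}=2\cdot \left(3-1\right)\cdot 100=400\ \text{kg}\\ {\widehat{B}}_{i,t}&\le \mathrm{max}\left({Q}_{k}^{UB} \,\forall\, k\in {K}^{react}\right)=100\ \text{kg}\\ {L}_{k,t}&\le \left(2-1\right)\cdot 300=300\ \text{kg}\\ {R}_{k,t}&\le 3-1=2\end{aligned}$$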

The objective of this optimization problem is to maximize profit, as given by (41), where \({p}_{s}\) is the price/cost of each external flow \(s\in {S}^{ext}\) (\({p}_{13}=-\$1/kg A\), \({p}_{14}=-\$7/kg B\), \({p}_{15}=-\$8/kg C\), and \({p}_{16}=\$10/kg D\)). The tank level slacks are penalized with a penalty coefficient equal to the absolute value of the material price.

$$ \max \sum\limits_{{t \in T}} {\left[ {\overbrace {{\sum\limits_{{s \in S^{{ext}} }} {p_{s} \cdot F_{{s,t}} } }}^{\begin{subarray}{l} material \\ sales/purchases \end{subarray} } - \overbrace {{\sum\limits_{{i \in I}} {CO_{{i,t}} } }}^{\begin{subarray}{l} operating \\ costs \end{subarray} }} \right]} - \overbrace {{\sum\limits_{{k \in K}} {CI_{k} } }}^{\begin{subarray}{l} installation \\ costs \end{subarray} } - \overbrace {{\sum\limits_{{k \in K^{{tank}} }} {|p_{k} | \cdot \hat{L}_{k} } }}^{\begin{subarray}{l} tank\,~slack \\ penalties \end{subarray} } $$
(41)

The resulting model is the linear nested GDP given in (15–41). This hierarchical model is reformulated into a mixed-integer linear program (MILP) using both Big-M (with both loose and tight M values) and Hull reformulations. The hierarchical GDP model is also transformed into its Equivalent Single-Level GDP and reformulated with both Big-M and Hull methods.

The optimal solution yields a cumulative profit of $2,085. The process network and equipment network designs are given in Figs. 6 and 7, respectively. The Gantt charts for procurement/sales and production are shown in Figs. 8 and 9, respectively. The tank levels are displayed in Fig. 10. The optimal design requires the installation of Processes 1, 3–5; Tanks B and C; and both reactor types, each with two units available. Reactors of type 1 focus almost exclusively on Process 1 with Technology 2, with one batch of Process 3 (Technology 1). Reactors of type 2 are used for Processes 4 and 5, each using Technology 1. Procurement of A occurs every 5 days, with sales of D typically spaced out every 10 days. By the end of the scheduling horizon, both tank levels have been restored to their initial levels (full).

Fig. 6

Optimal process network design (edge thickness is proportional to the maximum flow on that line).

Fig. 7

Optimal equipment network design

Fig. 8

Material procurement and sales schedule

Fig. 9

Plant operations schedule (text in each bar, i-j, indicates process number i and technology number j for that event)

Fig. 10

Amount of material in each tank (and maximum tank level) throughout the scheduling horizon

The model sizes and computational statistics for each of the reformulated MILP models are given in Table 5, where the continuous (LP) relaxation gap is calculated with respect to the optimal MIP solution. Two additional scenarios are evaluated, where the sales price of material D is increased or decreased by 10%. The computational results for these cases are given in Tables 6 and 7. As can be observed, all formulations, except the Hull reformulation of the nested GDP, have poor continuous relaxations with very large relaxation gaps. The Hull reformulated nested GDP, on the other hand, has a tight relaxation with an 8–9% relative gap. In this example, both the Big-M and Tight-M reformulations have similar performance, with the equivalent single-level models solving faster than the nested models (except for the case with a 10% decrease in the sales price). For these models, the weak relaxations annul any potential advantage from using nested disjunctions. The MILP model obtained by applying the Hull reformulation to the nested GDP model outperforms the other models, finding the optimum in approximately half of the time required by its equivalent single-level counterpart. Compared to the Big-M models, this model solves faster by one order of magnitude, with significantly fewer cuts and nodes explored. This superior performance is due to the tighter LP relaxation and reduced model size. The Hull reformulated nested GDP has fewer binary variables (25% and 4% fewer, before and after presolve, respectively), continuous variables (10% and 26% fewer, before and after presolve, respectively), and constraints (5% and 25% fewer, before and after presolve, respectively) than its equivalent single-level counterpart. Although it seems surprising that a model with fewer variables and constraints is tighter than an equivalent model of greater size, this occurs because the nested formulation has no slack disjuncts; it is these disjuncts that make the equivalent formulation less tight.

Table 5 Model sizes and computational statistics of the MILP models resulting from the Big-M reformulations (using both loose and tight M values) and Hull reformulations of the GDP models in Example 1
Table 6 Model sizes and computational statistics of the MILP models resulting from the Big-M reformulations (using both loose and tight M values) and Hull reformulations of the GDP models in Example 1 with a 10% increase in the sales price for material D
Table 7 Model sizes and computational statistics of the MILP models resulting from the Big-M reformulations (using both loose and tight M values) and Hull reformulations of the GDP models in Example 1 with a 10% decrease in the sales price for material D

4.3 Example 2: nonlinear model

Example 2 is based on Example 4.1 in the work by van den Heever and Grossmann (1999), which consists of an integrated superstructure optimization problem with long term operational and expansion planning. The problem has three potential processes (1, 2, and 3), each with its dedicated processing unit, and three materials (A, B, and C) as shown in Fig. 11. Material C is the final product (price: $10,800/ton) and is produced from Material B in Process 1. Material B can be purchased externally (cost: $7,000/ton) or produced from Material A (cost: $1,800/ton) in either Process 2 or Process 3. It is assumed that each process includes any required separation steps, such that the respective exit streams are single-component streams containing the pure product of each process. The objective here is to minimize cost (maximize profit) by making the following decisions:

  • Which processes should be used.

  • Which processes to operate in each period.

  • Which processes to undergo a capacity expansion in each period.

  • How much new processing capacity to install in each period.

Fig. 11

Process superstructure for Example 2

The hierarchical GDP model is given as follows. The material balance constraints at the two stream junction points are given in (42) and (43), where \({F}_{s,t}\) is the flow (tons) in stream \(s\) in period \(t\) (where \(t\) is in years). The amounts of imported B and exported C are constrained by (44) and (45), respectively.

$${F}_{1,t}={F}_{2,t}+{F}_{3,t} \,\forall\, t\in T$$
(42)
$${F}_{7,t}={F}_{4,t}+{F}_{5,t}+{F}_{6,t} \,\forall\, t\in T$$
(43)
$${F}_{6,t}\le 5 \,\forall\, t\in T$$
(44)
$${F}_{8,t}\le 1 \,\forall\, t\in T$$
(45)

The installation and planning decisions are made in the nested disjunction given in (46), where the top-level decision is to install Process \(i\) or not (\({Y}_{i}\) or \(\neg {Y}_{i}\)). If a process is installed, the respective nonlinear production yield constraint is enforced, where \({g}_{1}\left({F}_{7,t}\right)=0.9\cdot {F}_{7,t}\), \({g}_{2}\left({F}_{2,t}\right)=\mathrm{ln}\left(1+{F}_{2,t}\right)\), and \({g}_{3}\left({F}_{3,t}\right)=1.2\cdot \mathrm{ln}\left(1+{F}_{3,t}\right)\). A process capacity balance is also applied to update the current capacity, \({Q}_{i,t}\), with the capacity in the previous period and the current capacity expansion, \(Q{E}_{i,t}\). The secondary level decision is to operate the installed process, \({N}_{i,t}^{\left(1\right)}\), or not, \({N}_{i,t}^{\left(2\right)}\). If the process is operated in period \(t\), the exit flow is bounded by the process capacity, and the operating cost, \(C{O}_{i,t}\), is determined with the parameter \({\gamma }_{i}\) (\({\gamma }_{1}=\$900\), \({\gamma }_{2}=\$\mathrm{1,000}\), and \({\gamma }_{3}=\$\mathrm{1,200}\)). The tertiary level decision is to expand the process capacity, \({Z}_{i,t}^{\left(1\right)}\), or not, \({Z}_{i,t}^{\left(2\right)}\). The expansion cost, \(C{E}_{i,t}\), is calculated with the fixed cost parameter, \({\alpha }_{i}\) (\({\alpha }_{1}=\$\mathrm{3,500}\), \({\alpha }_{2}=\$\mathrm{1,000}\), and \({\alpha }_{3}=\$\mathrm{1,500}\)), and the variable cost parameter, \({\beta }_{i}\) (\({\beta }_{1}=\$\mathrm{1,200}/ton\), \({\beta }_{2}=\$700/ton\), and \({\beta }_{3}=\$\mathrm{1,100}/ton\)). It should be noted that each of the parameters used can also be indexed by time period if desired.

$$\begin{aligned} &\left[\begin{array}{c}{Y}_{i}\\ \begin{array}{c}{F}_{s,t}={g}_{i}\left({F}_{{s}^{\prime},t}\right) \,\forall\, t\in T\\ {Q}_{i,t}={Q}_{i,t-1}{\left.\right|}_{t>1}+Q{E}_{i,t} \,\forall\, t\in T\end{array}\\ \left[\begin{array}{c}{N}_{i,t}^{\left(1\right)}\\ {F}_{s,t}\le {Q}_{i,t}\\ \begin{array}{c}C{O}_{i,t}={\gamma }_{i}\\ \left[\begin{array}{c}{Z}_{i,t}^{\left(1\right)}\\ C{E}_{i,t}={\alpha }_{i}+{\beta }_{i}\cdot Q{E}_{i,t}\end{array}\right]\bigvee \left[\begin{array}{c}{Z}_{i,t}^{\left(2\right)}\\ \begin{array}{c}Q{E}_{i,t}=0\\ C{E}_{i,t}=0\end{array}\end{array}\right]\end{array}\end{array}\right]\bigvee \left[\begin{array}{c}{N}_{i,t}^{\left(2\right)}\\ {F}_{s,t}=0\\ \begin{array}{c}C{O}_{i,t}=0\\ Q{E}_{i,t}=0\\ C{E}_{i,t}=0\end{array}\end{array}\right] \,\forall\, t\in T\end{array}\right]\\ & \bigvee \left[\begin{array}{c}\neg {Y}_{i}\\ \begin{array}{c}{F}_{s,t}=0 \,\forall\, t\in T\\ {F}_{{s}^{\prime},t}=0 \,\forall\, t\in T\end{array}\\ \begin{array}{c}{Q}_{i,t},Q{E}_{i,t}=0 \,\forall\, t\in T\\ {CO}_{i,t},C{E}_{i,t}=0 \,\forall\, t\in T\end{array}\end{array}\right] \,\forall\, i\in I,s\in {S}_{i}^{out},{s}^{\prime}\in {S}_{i}^{in}\end{aligned}$$
(46)
$${\varvec{\Xi}}\left(1,\left\{{Y}_{i},\neg {Y}_{i}\right\}\right) \,\forall\, i\in I$$
(47)
$${\varvec{\Xi}}\left({Y}_{i},\left\{{N}_{i,t}^{\left(1\right)},{N}_{i,t}^{\left(2\right)}\right\}\right) \,\forall\, i\in I,t\in T$$
(48)
$${\varvec{\Xi}}\left({N}_{i,t}^{\left(1\right)},\left\{{Z}_{i,t}^{\left(1\right)},{Z}_{i,t}^{\left(2\right)}\right\}\right) \,\forall\, i\in I,t\in T$$
(49)
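
For a fixed set of Boolean decisions, the constraints inside the disjuncts of (46) reduce to simple evaluations. The sketch below uses the yield functions and cost parameters quoted in the text; the function names and the expansion amounts in the usage lines are hypothetical and are not results from the paper.

```python
import math
from itertools import accumulate

# Parameters quoted in the text for Processes 1-3.
GAMMA = {1: 900.0, 2: 1000.0, 3: 1200.0}   # operating cost ($)
ALPHA = {1: 3500.0, 2: 1000.0, 3: 1500.0}  # fixed expansion cost ($)
BETA = {1: 1200.0, 2: 700.0, 3: 1100.0}    # variable expansion cost ($/ton)


def process_yield(i, inlet_flow):
    """Exit flow g_i(F) enforced in the Y_i disjunct."""
    if i == 1:
        return 0.9 * inlet_flow
    if i == 2:
        return math.log(1.0 + inlet_flow)
    if i == 3:
        return 1.2 * math.log(1.0 + inlet_flow)
    raise ValueError(f"unknown process {i}")


def period_costs(i, operate, expand, expansion):
    """Return (CO, CE) for one period of an installed process.

    operate=False is the N^(2) disjunct (all costs zero); expand=False is
    the Z^(2) disjunct (no expansion, CE = 0).
    """
    if not operate:
        return 0.0, 0.0
    if not expand:
        return GAMMA[i], 0.0
    return GAMMA[i], ALPHA[i] + BETA[i] * expansion


def capacity_profile(expansions):
    """Capacity balance Q_t = Q_{t-1} + QE_t, starting from zero capacity."""
    return list(accumulate(expansions))


print(process_yield(3, 1.0))
print(period_costs(3, operate=True, expand=True, expansion=0.3))
print(capacity_profile([0.3, 0.3, 0.0, 0.3]))  # hypothetical expansion plan
```

Note that the operating cost in (46) is a fixed charge \({\gamma }_{i}\) per operated period, whereas the expansion cost has both a fixed and a capacity-dependent term.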

Additional logic constraints are given in (50–53). The cardinality clause in (50) allows installing at most 1 of Process 2 or Process 3. This is equivalent to the proposition \(\neg {Y}_{2}\vee \neg {Y}_{3}\) used in the original paper, but generalizes to cases in which there are more than two potential processes in parallel. The implication in (51) ensures that Process 1 is installed if either Process 2 or Process 3 is installed. Constraints (52) and (53) enforce, respectively, that process \(i\) operates at least once if installed, and that at least one expansion event is scheduled between the beginning of the planning horizon (period 1) and each period \(t\) in which the process is operated.

$${\varvec{\Gamma}}\left(1,\{{Y}_{2},{Y}_{3}\}\right)$$
(50)
$${Y}_{i}\Rightarrow {Y}_{1} \,\forall\, i\in \left\{\mathrm{2,3}\right\}$$
(51)
$${\varvec{\Lambda}}\left({Y}_{i}, {N}_{i,t}^{\left(1\right)} \,\forall\, t\in T\right) \,\forall\, i\in I$$
(52)
$${\varvec{\Lambda}}\left({N}_{i,t}^{\left(1\right)}, {Z}_{i,{t}^{\prime}}^{\left(1\right)} \,\forall\, {t}^{\prime}\in \left(1,\dots ,t\right)\right) \,\forall\, i\in I,t\in T$$
(53)
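
For reference, one common way to write (50)–(53) as linear inequalities is sketched below. The lowercase binaries \({y}_{i}\), \({n}_{i,t}\), and \({z}_{i,t}\) are introduced here only for illustration, standing for \({Y}_{i}\), \({N}_{i,t}^{\left(1\right)}\), and \({Z}_{i,t}^{\left(1\right)}\); this is the standard algebraic translation of such clauses and is not necessarily the exact form generated by the software.

$$\begin{aligned} {y}_{2}+{y}_{3}&\le 1\\ {y}_{i}&\le {y}_{1} \quad \,\forall\, i\in \left\{2,3\right\}\\ {\sum }_{t\in T}{n}_{i,t}&\ge {y}_{i} \quad \,\forall\, i\in I\\ {\sum }_{{t}^{\prime}=1}^{t}{z}_{i,{t}^{\prime}}&\ge {n}_{i,t} \quad \,\forall\, i\in I,t\in T\end{aligned}$$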

The variable domains are given in (54–60), where \({F}_{s}^{UB}=5 ton \,\forall\, s\in S\), \(Q{E}_{1}^{UB}=0.4 ton\), \(Q{E}_{2}^{UB}=0.3 ton\), and \(Q{E}_{3}^{UB}=0.3 ton\).

$$0\le {F}_{s,t}\le {F}_{s}^{UB} \quad \,\forall\, s\in S,t\in T$$
(54)
$$0\le {Q}_{i,t}\le {QE}_{i}^{UB}\cdot t \quad \,\forall\, i\in I,t\in T$$
(55)
$$0\le {QE}_{i,t}\le {QE}_{i}^{UB} \quad \,\forall\, i\in I,t\in T$$
(56)
$$0\le {CE}_{i,t}\le {\alpha }_{i}+{\beta }_{i}\cdot {QE}_{i}^{UB} \quad \,\forall\, i\in I,t\in T$$
(57)
$$0\le {CO}_{i,t}\le {\gamma }_{i} \quad \,\forall\, i\in I,t\in T$$
(58)
$${N}_{i,t}^{\left(n\right)},{Z}_{i,t}^{\left(n\right)}\in \left\{True,False\right\} \quad \,\forall\, i\in I,t\in T,n\in \left\{\mathrm{1,2}\right\}$$
(59)
$${Y}_{i}\in \left\{True,False\right\}\quad \,\forall\, i\in I$$
(60)

The objective function is to minimize the system cost, as given in (61), where the stream costs, \({p}_{s}\), are given in Table 8. The model for Example 2 is thus given by (42–61).

$$\mathrm{min} \mathop \sum \limits _{t\in T}\left(\mathop \sum \limits_{s\in S}{p}_{s}\cdot {F}_{s,t}+\mathop \sum \limits_{i\in I}\left(C{O}_{i,t}+C{E}_{i,t}\right)\right)$$
(61)
Table 8 Stream costs, \({p}_{s}\), in $/ton

There are some differences between this formulation and the one in the original paper by van den Heever and Grossmann (1999). The original formulation has the process capacity evolution constraint in the disjunct governed by \({Z}_{i,t}^{\left(1\right)}\). This requires specifying a new constraint, \({Q}_{i,t}={Q}_{i,t-1}\), for the disjunct governed by \({Z}_{i,t}^{\left(2\right)}\), which would also be required for the disjunct governed by \({N}_{i,t}^{\left(2\right)}\). This is avoided by moving the process capacity balance to the upper-level constraints in \({Y}_{i}\). The same is true for the yield constraint, which we move from the \({N}_{i,t}^{\left(1\right)}\) disjunct to the \({Y}_{i}\) disjunct constraints. This requires that we only constrain the flow exiting the process in the secondary level disjunction, rather than both the entrance and exit flows. It is also more intuitive to specify the yield constraints when the processes are selected. Another major difference is that the original model does not use the cardinality constraints in (48) and (49). Instead, it uses the logic propositions (62) and (63). These propositions are contained in (48) and (49), but do not establish a proper hierarchical relationship since there is no link between \({N}_{i,t}^{\left(2\right)}\) and \({Y}_{i}\), and \({Z}_{i,t}^{\left(2\right)}\) and \({N}_{i,t}^{\left(1\right)}\).

$${N}_{i,t}^{\left(1\right)}\Rightarrow {Y}_{i} \quad \,\forall\, i\in I,t\in T$$
(62)
$${Z}_{i,t}^{\left(1\right)}\Rightarrow {N}_{i,t}^{\left(1\right)} \quad \,\forall\, i\in I,t\in T$$
(63)
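
The difference between the cardinality constraint (48) and the implication (62) can be made concrete by enumeration. The sketch below, with illustrative function names, lists the assignments of \(\left({Y}_{i},{N}_{i,t}^{\left(1\right)},{N}_{i,t}^{\left(2\right)}\right)\) that satisfy (62) but violate (48), reading \({\varvec{\Xi}}\left(m,S\right)\) as "exactly \(m\) of \(S\) are True" with a Boolean \(m\) counted as 0 or 1.

```python
from itertools import product


def exactly(m, literals):
    """Xi(m, S): exactly m of the Booleans in S are True (True counts as 1)."""
    return sum(literals) == int(m)


def implies(a, b):
    return (not a) or b


def compare_48_and_62():
    """Return (is_62_contained_in_48, assignments allowed by (62) but not (48))."""
    contained = True
    extra = []
    for y, n1, n2 in product([False, True], repeat=3):
        cardinality = exactly(y, [n1, n2])  # (48)
        implication = implies(n1, y)        # (62)
        if cardinality and not implication:
            contained = False
        if implication and not cardinality:
            extra.append((y, n1, n2))
    return contained, extra


contained, extra = compare_48_and_62()
print(contained)
for assignment in extra:
    print(assignment)
```

Every assignment allowed by (48) also satisfies (62), but not the reverse. In particular, (62) alone admits \({Y}_{i}=False\) together with \({N}_{i,t}^{\left(2\right)}=True\), which illustrates the missing link between \({N}_{i,t}^{\left(2\right)}\) and \({Y}_{i}\). The same comparison applies to (49) and (63).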

An important thing to note is that the model in Example 2 is an example of a type of hierarchical GDP that need not be hierarchical at all. This occurs when every disjunction has only two disjuncts, representing an on and an off state, where the off state has all relevant variables set to zero. When this occurs, (46) can actually be split into three sets of disjunctions without adding the “slack” disjunct observed in the Equivalent Single-Level GDP model. These three sets of disjunctions are given in (64–66). The cardinality constraints in (48–49) can be replaced by (62–63) and (67–68). The model composed of (42–45), (47), and (50–68) is referred to here as the Non-hierarchical formulation.

$$\left[\begin{array}{c}{Y}_{i}\\ {F}_{s,t}={g}_{i}\left({F}_{{s}^{\prime},t}\right) \quad \forall t\in T\\ {Q}_{i,t}={Q}_{i,t-1}{\left.\right|}_{t>1}+Q{E}_{i,t} \quad \forall t\in T\end{array}\right]\bigvee \left[\begin{array}{c}\neg {Y}_{i}\\ \begin{array}{c}{F}_{s,t}=0 \quad \forall t\in T\\ {F}_{{s}^{\prime},t}=0 \quad \forall t\in T\end{array}\\ \begin{array}{c}{Q}_{i,t}=0 \quad \forall t\in T\\ Q{E}_{i,t}=0 \quad \forall t\in T\end{array}\end{array}\right] \quad \forall i\in I,s\in {S}_{i}^{out},{s}^{\prime}\in {S}_{i}^{in}$$
(64)
$$\left[\begin{array}{c}{N}_{i,t}^{\left(1\right)}\\ {F}_{s,t}\le {Q}_{i,t}\\ C{O}_{i,t}={\gamma }_{i}\end{array}\right]\bigvee \left[\begin{array}{c}{N}_{i,t}^{\left(2\right)}\\ {F}_{s,t}=0\\ C{O}_{i,t}=0\end{array}\right] \quad \,\forall\, i\in I,s\in {S}_{i}^{out},t\in T$$
(65)
$$\left[\begin{array}{c}{Z}_{i,t}^{\left(1\right)}\\ C{E}_{i,t}={\alpha }_{i}+{\beta }_{i}\cdot Q{E}_{i,t}\end{array}\right]\bigvee \left[\begin{array}{c}{Z}_{i,t}^{\left(2\right)}\\ \begin{array}{c}Q{E}_{i,t}=0\\ C{E}_{i,t}=0\end{array}\end{array}\right] \quad \,\forall\, i\in I,t\in T$$
(66)
$${\varvec{\Xi}}\left(1,\left\{{N}_{i,t}^{\left(1\right)},{N}_{i,t}^{\left(2\right)}\right\}\right) \quad \,\forall\, i\in I,t\in T$$
(67)
$${\varvec{\Xi}}\left(1,\left\{{Z}_{i,t}^{\left(1\right)},{Z}_{i,t}^{\left(2\right)}\right\}\right) \quad \,\forall\, i\in I,t\in T$$
(68)

The nested GDP model is compared against its equivalent single-level formulation, and the Non-hierarchical formulation, by reformulating each of these into mixed-integer nonlinear programs (MINLPs) using the Hull reformulation. Since the models are nonlinear, the perspective functions were reformulated using the \(\epsilon \)-approximation from Furman et al. (2020), with \(\epsilon ={10}^{-9}\), which is the default nonlinear Hull reformulation method in the disjunctive programming library. As in Example 1, two additional scenarios are run where the product (stream 8) sales price is increased and decreased by 10%. The model statistics are given in Tables 9 (nominal case), 10 (10% increase), and 11 (10% decrease). The Nested formulation is faster than the Equivalent Single-Level formulation by a factor of 1.8–4.2. When local search and range reduction are disabled in BARON, the difference in CPU time becomes more significant (one order of magnitude difference). The continuous relaxations for the Nested and Non-hierarchical formulations are equal (23–37% gap) and tighter than that of the Equivalent Single-Level formulation (57–89% gap). The performance of the Nested formulation is comparable to that of the Non-hierarchical one, with the latter having fewer continuous variables and constraints. This example highlights the fact that models with on/off disjunctions do not require a hierarchical representation to attain the same performance gains as the nested models.

Table 9 Model sizes and computational results of the MINLP models resulting from the Hull reformulations of the Equivalent Single-Level, Nested, and Non-hierarchical GDP models
Table 10 Computational statistics of the MINLP models resulting from the Hull reformulations of the GDP models in Example 2 with a 10% increase in the sales price for stream 8
Table 11 Computational statistics of the MINLP models resulting from the Hull reformulations of the GDP models in Example 2 with a 10% decrease in the sales price for stream 8

The optimal expansion profile for the nominal case is given in Fig. 12. Process 2 is not installed, whereas Processes 1 and 3 are: the capacity of Process 1 increases to 1 ton/year by the third year, and that of Process 3 increases to 1.11 ton/year by the fourth year. The optimal system cost is − $95 thousand, meaning that the plant generates a profit.

Fig. 12 Capacity expansion profiles for each of the processes in Example 2

5 Conclusions

Two main contributions are made in this paper to the generalized disjunctive programming (GDP) modeling framework. The first is the addition of cardinality rules to the logic constraints, which allows constraints of the form "choose exactly m Boolean variables to be True" (or at least m, or at most m). For more than two Boolean variables, modeling these types of constraints via propositional logic (zeroth-order logic) is cumbersome. Thus, introducing predicate logic (first-order logic) to express this new constraint form in GDP adds expressiveness to logic-based models. The second contribution is the extension of GDP to hierarchical systems via nested disjunctions. Such an approach results in more intuitive models, but had not been formalized in the past, as classical GDP does not consider disjunction nesting. The notation and logic constraints for such structures are provided, along with theoretical proofs of the tightness of such models relative to equivalent single-level GDP models. It is shown that mixed-integer programming reformulations of nested GDP models have continuous relaxations that are as tight as or tighter than those of their single-level counterparts under both the Hull reformulation and the Big-M reformulation when tight M values are used. In some cases, the nested models result in strictly tighter continuous relaxations, as shown in the illustrative and numerical examples presented. It was also observed that when large M values are used, the reformulated nested models show worse performance due to the presence of multiple large M parameters in the nested constraints. Finding tight M values requires additional work, and can be done by applying interval arithmetic when the models are linear. However, for nonlinear models, a separate optimization model must be solved for each constraint to find the tightest M values.
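
The interval-arithmetic computation of tight M values for linear constraints can be sketched as follows. This is a minimal illustration, assuming a disjunct constraint of the form \({a}^{\top }x\le b\) relaxed as \({a}^{\top }x-b\le M(1-y)\) and variables restricted to a box \([{x}^{L},{x}^{U}]\). The resulting M is the tightest with respect to the variable bounds alone, since it ignores every other constraint in the model. The function name and the numerical data are hypothetical.

```python
def tight_big_m(coeffs, rhs, lower, upper):
    """Smallest M such that a.x - rhs <= M holds over the box [lower, upper].

    Each term a_i * x_i is maximized at the upper bound when a_i > 0
    and at the lower bound otherwise (interval arithmetic).
    """
    worst_lhs = sum(a * (u if a > 0 else l)
                    for a, l, u in zip(coeffs, lower, upper))
    # M = 0 suffices when the constraint already holds over the whole box.
    return max(worst_lhs - rhs, 0.0)


# Hypothetical constraint 2*x1 - 3*x2 <= 4 with x1 in [0, 10], x2 in [-1, 5].
m_value = tight_big_m(coeffs=[2, -3], rhs=4, lower=[0, -1], upper=[10, 5])
```

Here the left-hand side is largest at \({x}_{1}=10\) and \({x}_{2}=-1\), where it equals 23, so M = 19 is sufficient. Any larger value would only weaken the continuous relaxation.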

Three examples are presented to show the advantages of using nested structures. In the illustrative example, the tightness of the continuous relaxations of nested linear models is compared geometrically with that of equivalent single-level models. In this example, the models that preserve nested structures have smaller continuous relaxations than their single-level counterparts. This is promising, as it may result in computational savings when optimizing nested models. Example 1, a linear GDP, illustrates the computational advantages of nested GDP models for a problem that integrates superstructure design, technology selection, and operations scheduling. Example 2, a nonlinear GDP, does so for a problem that integrates superstructure design, long-term operations planning, and capacity expansion planning. It is also shown that for systems with bi-disjunct constraints (disjunctions with only two disjuncts), where one disjunct represents an off state with all pertinent variables set to zero (e.g., zero flow), there is no advantage to modeling such systems as hierarchical, even when there may be several levels of decisions. Such systems can be modelled more simply with single-level disjunctions and the necessary linking constraints.

Future work includes investigating how explicit hierarchical structures can be exploited for informed model decomposition methods and branching strategies. Exploring applications of hierarchical GDP to other fields, such as decision trees and stochastic optimization with event constraints, is another potential area for development.