1 Introduction

In the literature, linear multiplicative programs have been widely studied due to their importance from both a theoretical and an applicative point of view. These problems, strictly related to quadratic programming and bilinear programming, are used in plant layout design, portfolio and financial optimization, VLSI chip design, robust optimization, and network flows (see, e.g., Cambini and Martein 2009; Cambini and Sodini 2008; Cambini and Salvi 2010; Gupta 1995; Horst and Pardalos 1995; Horst and Tuy 1996; Horst et al. 2001; Konno and Kuno 1992; McCarl et al. 1977; Mjelde 1983; Ryoo and Sahinidis 2003; Tuy 2016 and references therein).

From the computational point of view, various approaches have been proposed. For instance, Wang et al. (2012) proposed a branch-and-bound algorithm for the global minimization of generalized linear multiplicative programs; Jiao et al. (2012) proposed a branch-and-bound algorithm based on solving a sequence of linear relaxation problems; in a very recent paper, Jiao et al. (2023) proposed a branch-reduction-bound algorithm, based on an outer space search within a branch-and-bound framework.

A second field of the literature develops methods based on an eigenvectors approach. This approach has been widely used even though it has some drawbacks. It is well known that the eigen-decomposition of a quadratic function is a heavy task from both a computational and a numerical point of view, and it does not exploit the particular structure of linear multiplicative functions. For all these reasons, in recent years various papers have introduced new procedures developed without the use of eigen-decompositions (see, e.g., Shen et al. 2020, 2022; Wang et al. 2012; Zhou et al. 2015).

In this paper, various underestimation functions to be used in the branch and bound procedure are introduced and discussed both with the eigenvectors approach and without it. Then, a full computational test is provided to highlight the best performing functions.

In addition, a motivating example is presented in the last part of the paper. Special structures of multiplicative problems, in fact, arise in several bilevel programming problems of leader-follower type (see Dempe 2020, for example). Some underestimation functions introduced in the paper can be used to improve the algorithmic procedure adopted to tackle this class of problems.

The main contributions of this paper are:

  • To describe a unified framework for dealing with linear multiplicative programs from both a theoretical and computational point of view;

  • To propose various underestimation functions to be used in the branch and bound procedure: quadratic, linear, difference of two convex functions (D.C. functions), and eigenvectors based underestimation functions;

  • To perform a detailed computational test that compares the different underestimation functions under various partition methods to identify the most promising ones;

  • To characterize a special case of multiplicative programs strictly related to bilevel optimization and to identify the most appropriate underestimation function to solve them.

The paper is organized as follows. In Sect. 2, the main definitions and preliminary results are given. In addition, the criteria for the splitting process of the branch-and-bound approach are analyzed. On this basis, Sect. 3 is devoted to the study of quadratic, eigenvectors based and linear underestimation functions. Then, in Sect. 4, the detailed results of a wide computational experience are provided and fully discussed, giving a detailed view of the computational aspects of the solution method and improving some of the results of the current literature. Furthermore, Sect. 5 points out the behavior of the proposed underestimation functions in a particular class of linear multiplicative programs very useful in applicative bilevel programming. Finally, a section with the conclusions is given.

2 Definitions and preliminary results

The aim of this section is to define the problem and provide the main preliminary results which will allow the development of the paper. In this light, firstly the problem is defined and then the concept and properties of underestimation functions are given. On this basis, a detailed description of a branch-and-bound scheme to solve the problem is provided, in order to approach it in a unifying framework with respect to different underestimation functions and different splitting criteria. Finally, the quadratic form associated to a linear multiplicative function is recalled as well as the eigenvector-based decomposition of such a quadratic form.

2.1 Definition of the problem

From now on, \({\mathbb {R}}\) will denote the set of real numbers while \(\overline{{\mathbb {R}}}={\mathbb {R}}\cup \{-\infty ,+\infty \}\) will be the affinely extended real number system.

Definition 1

Let P be the following minimization problem:

$$\begin{aligned} P \ : \ \min _{x\in S} \ f(x)\,, \end{aligned}$$

where \(S \subseteq {\mathbb {R}}^n\) is a nonempty polyhedron defined as

$$\begin{aligned} S=\left\{ x\in {\mathbb {R}}^n: \ A_{in} x\underline{\le } b_{in}, \ A_{eq} x = b_{eq},\ l_b\underline{\le } x\underline{\le } u_b \right\} , \end{aligned}$$

with \(A_{in}\in {\mathbb {R}}^{m\times n}\), \(b_{in}\in {\mathbb {R}}^{m}\), \(A_{eq}\in {\mathbb {R}}^{r\times n}\), \(b_{eq}\in {\mathbb {R}}^{r}\), and \(l_b,u_b\in \overline{{\mathbb {R}}}^{n}\), while \(f:{\mathbb {R}}^n \rightarrow {\mathbb {R}}\) is a linear multiplicative function defined as

$$\begin{aligned} f(x)=\sum _{i=1}^{p} (c_i^Tx+{c_0}_i)(d_i^Tx+{d_0}_i) + a^Tx+a_0\,, \end{aligned}$$

with \(c_i,d_i\in {\mathbb {R}}^{n}\), \({c_0}_i,{d_0}_i\in {\mathbb {R}}\) for all \(i=1,\dots ,p\) and \(a\in {\mathbb {R}}^{n}\), \(a_0\in {\mathbb {R}}\).
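For concreteness, the following minimal sketch (Python/NumPy, purely illustrative: the paper's own experiments use MATLAB and Gurobi, see Sect. 4) shows one way an instance of Definition 1 can be stored and the objective f evaluated; the names C, D, c0, d0, a, a0 are our own.

```python
# Sketch: store and evaluate a linear multiplicative function as in Definition 1.
import numpy as np

rng = np.random.default_rng(0)
n, p = 5, 3
C  = rng.integers(-4, 5, (p, n)).astype(float)   # rows are c_i^T
D  = rng.integers(-4, 5, (p, n)).astype(float)   # rows are d_i^T
c0 = rng.integers(-4, 5, p).astype(float)
d0 = rng.integers(-4, 5, p).astype(float)
a  = rng.integers(-4, 5, n).astype(float)
a0 = 1.0

def f(x):
    # f(x) = sum_i (c_i^T x + c0_i)(d_i^T x + d0_i) + a^T x + a0
    return float(np.sum((C @ x + c0) * (D @ x + d0)) + a @ x + a0)

x = rng.standard_normal(n)
print(f(x))
```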

2.2 Underestimation functions

The use of suitable underestimation functions of f is needed to solve Problem P by means of a branch-and-bound approach.

Definition 2

Let \(S \subseteq {\mathbb {R}}^n\) be nonempty. Let \(f:S \rightarrow {\mathbb {R}}\) and \(\Phi :S \rightarrow {\mathbb {R}}\). Then, \(\Phi\) is an underestimation function of f if:

$$\begin{aligned} f(x)\ge \Phi (x)\ \ \ \ \ \ \ \forall x\in S\,; \end{aligned}$$

moreover, \(Er: S \rightarrow {\mathbb {R}}\) is the corresponding error function defined as \(Er(x)=f(x)- \Phi (x)\).

The following useful properties hold.

Lemma 1

Let \(S \subseteq {\mathbb {R}}^n\) be nonempty. Let \(\Phi _{1}:S \rightarrow {\mathbb {R}}\) and \(\Phi _{2}:S \rightarrow {\mathbb {R}}\) be underestimation functions of \(f:S\rightarrow {\mathbb {R}}\), with \(Er_1:S \rightarrow {\mathbb {R}}\) and \(Er_2:S \rightarrow {\mathbb {R}}\) the corresponding error functions. Then, the following properties hold:

  1. (i)

    For all \(\lambda \in [0,1]\), \(\Phi _\lambda\) defined as \(\Phi _\lambda (x)=\lambda \Phi _1(x)+(1-\lambda )\Phi _2(x)\) is an underestimation function of f, with error function defined as \(Er_\lambda (x)=\lambda Er_1(x)+(1-\lambda )Er_2(x)\);

  2. (ii)

    \(\Phi\) defined as \(\Phi (x)=\max \left\{ \Phi _1(x),\Phi _2(x)\right\}\) is an underestimation function of f, with error function defined as \(Er(x)=\min \left\{ Er_1(x),Er_2(x)\right\}\).

Proof

For any \(x \in S\), it results:

(i) for any \(\lambda \in [0,1]\),

$$\begin{aligned} \Phi _\lambda (x)=\lambda \Phi _1(x)+(1-\lambda )\Phi _2(x)\le \lambda f(x)+(1-\lambda )f(x)=f(x) \end{aligned}$$

and

$$\begin{aligned} Er_\lambda (x)&= f(x)-(\lambda \Phi _1(x)+(1-\lambda )\Phi _2(x))\\&= (\lambda f(x)+(1-\lambda )f(x))-(\lambda \Phi _1(x)+(1-\lambda )\Phi _2(x))\\&= \lambda Er_1(x)+(1-\lambda )Er_2(x)\,; \end{aligned}$$

(ii) according to Definition 2, one has \(f(x)\ge \max \left\{ \Phi _1(x),\Phi _2(x)\right\}\); hence, it follows that

$$\begin{aligned} Er(x)=f(x)-\max \left\{ \Phi _1(x),\Phi _2(x)\right\} =\min \left\{ f(x)-\Phi _1(x),f(x)-\Phi _2(x)\right\} \,. \end{aligned}$$

\(\square\)

The following further result will be used in the next subsection as an algorithmic stopping criterion.

Lemma 2

Let \(S \subseteq {\mathbb {R}}^n\) be nonempty. Let \(\Phi :S \rightarrow {\mathbb {R}}\) be an underestimation function of \(f:S\rightarrow {\mathbb {R}}\), with \(Er:S\rightarrow {\mathbb {R}}\) its error function. If \(\bar{x} \in \arg \min _{S} \Phi\) and \(Er(\bar{x})=0\), then \(\bar{x} \in \arg \min _{S} f\).

Proof

For all \(x \in S\), it results:

$$\begin{aligned} f(\bar{x})= \Phi (\bar{x})+Er(\bar{x})= \Phi (\bar{x}) \le \Phi (x)\le f(x)\,; \end{aligned}$$

hence, the thesis follows. \(\square\)

Finally, it is worth noticing that some underestimation functions of f will be obtained by first rewriting f in D.C. form. Let us recall that a function \(f:{\mathbb {R}}^n \rightarrow {\mathbb {R}}\) is said to be in D.C. form if it is expressed as \(f(x)=q_1(x)-q_2(x)\), with \(q_{1}: {\mathbb {R}}^n \rightarrow {\mathbb {R}}\) and \(q_{2}: {\mathbb {R}}^n \rightarrow {\mathbb {R}}\) convex functions (see, e.g., Cambini and Salvi 2009, 2010 and references therein).

In this light, the following result will be useful to deduce underestimation functions of f in the case f is rewritten in D.C. quadratic form.

Lemma 3

Let \([y^L,y^U]\subset {\mathbb {R}}\) with \(-\infty<y^L<y^U<+\infty\): for all \(y\in [y^L,y^U]\), it results

$$\begin{aligned} -y^2\ge -(y^L+y^U)y+y^Ly^U. \end{aligned}$$

Then, for all \(y\in [y^L,y^U]\), the error given by the use of \(-(y^L+y^U)y+y^Ly^U\) instead of \(-y^2\) is

$$\begin{aligned} Er(y)=(y-y^L)(y^U-y)=\frac{1}{4}(y^U-y^L)^2 -\left( y-\frac{y^L+y^U}{2}\right) ^2, \end{aligned}$$

with \(\frac{1}{4}(y^U-y^L)^2\) the maximum error obtained at \(y=\frac{y^L+y^U}{2}\).

Proof

For all \(y\in [y^L,y^U]\), the error given by the use of \(-(y^L+y^U)y+y^Ly^U\) instead of \(-y^2\) is:

$$\begin{aligned} Er(y)=(-y^2)-(-(y^L+y^U)y+y^Ly^U). \end{aligned}$$

Being

$$\begin{aligned}{} & {} 0\le (y-y^L)(y^U-y)=-y^2+(y^L+y^U)y-y^Ly^U\ \ \text {and}\\{} & {} \qquad -y^Ly^U=\frac{1}{4}(y^U-y^L)^2-\frac{1}{4}(y^L+y^U)^2 \end{aligned}$$

it yields \((y-y^L)(y^U-y)=\frac{1}{4}(y^U-y^L)^2 -\left( y-\frac{y^L+y^U}{2}\right) ^2\) and the thesis follows. \(\square\)
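A quick numerical illustration of Lemma 3 (a sketch in Python/NumPy, assuming nothing beyond the lemma itself): the secant \(-(y^L+y^U)y+y^Ly^U\) stays below \(-y^2\) on the interval, and the gap peaks at the midpoint with value \(\frac{1}{4}(y^U-y^L)^2\).

```python
# Sketch: numerical check of Lemma 3 on a sample interval [yL, yU].
import numpy as np

yL, yU = -1.0, 3.0
y = np.linspace(yL, yU, 1001)
secant = -(yL + yU) * y + yL * yU
err = (-y**2) - secant                      # = (y - yL)(yU - y) >= 0
print(np.all(err >= -1e-12))                # secant underestimates -y^2 on [yL, yU]
print(err.max(), (yU - yL)**2 / 4)          # maximum gap vs. (yU - yL)^2 / 4
print(y[err.argmax()], (yL + yU) / 2)       # gap attained at (near) the midpoint
```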

2.3 A branch-and-bound scheme

In order to solve Problem P by using a branch-and-bound approach (see, e.g., Bajaj and Faruque Hasan 2020; Cambini and Sodini 2005, 2008; Cambini and Salvi 2009, 2010; Fampa et al. 2017; Gerard et al. 2017; Shen et al. 2020 and references therein), the following operative scheme will be considered:

  • the feasible region will be iteratively partitioned in multidimensional rectangles,

  • the function f will be “relaxed” over the single partitions by means of a convex underestimation function \(\Phi\),

  • the convex “relaxed” subproblems will be solved,

  • the feasible solution having the smallest value of f will be maintained.

Specifically speaking, such a branch-and-bound approach will be implemented by means of:

  • a Priority Queue (PQ) used to store the single partitions sorted with respect to the Lower Bound (LB) found minimizing the convex “relaxed” subproblems over the partitions;

  • a feasible point Sol and a value UB corresponding, respectively, to the incumbent best feasible solution found and its image \(UB=f(Sol)\);

  • a convex underestimation function \(\Phi\) needed to solve the subproblems over the various partitions.

The priority queue PQ is used to speed up the solution method (at the cost of memory usage, of course) since the partition with the smallest LB is always known and since it does not need any periodic “pruning” process (in the “pruning” process of a branch-and-bound scheme, the stored partitions having a LB not smaller than UB are discarded since they cannot improve the incumbent best feasible solution).

The following commands describe the way the priority queue PQ is managed:

  • PQisempty();

  • \(LB:=PQsmallest()\);

  • PQadd(LB, partition, opt);

  • \([partition, opt]:=PQextract()\).

The command PQisempty tells whether the PQ is empty or not; the command PQsmallest provides the smallest LB of the partitions stored in the PQ; the command PQadd adds to the PQ a partition as well as the corresponding Optimal Solution (opt) and its optimal value \(LB=\Phi (opt)\) obtained by minimizing \(\Phi\) over the partition itself. Clearly, PQ is such that the smaller the value of LB, the higher the priority of the partition in the PQ. The last command PQextract removes from the PQ the partition having the smallest LB and returns all the stored data as output.
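One possible realization of these four commands, given here only as an illustrative sketch (the paper's implementation is in MATLAB): Python's heapq module yields exactly the required behavior when each entry is stored as (LB, tie-breaker, partition, opt), so that the smallest LB always has the highest priority.

```python
# Sketch: the four priority-queue commands realized with heapq.
import heapq
import itertools

PQ = []
_tie = itertools.count()                  # tie-breaker so partitions/opt never get compared

def PQisempty():
    return len(PQ) == 0

def PQsmallest():
    return PQ[0][0]                       # smallest stored lower bound

def PQadd(LB, partition, opt):
    heapq.heappush(PQ, (LB, next(_tie), partition, opt))

def PQextract():
    LB, _, partition, opt = heapq.heappop(PQ)
    return partition, opt

# tiny usage example
PQadd(3.0, ("box A",), None)
PQadd(1.5, ("box B",), None)
print(PQsmallest())                       # 1.5
print(PQextract())                        # the partition with the smallest LB comes out first
```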

Remark 1

If the smallest LB is such that \(LB\ge UB\), then the partitions in the PQ cannot improve the incumbent best feasible solution Sol and hence PQ can be emptied (since PQ is a priority queue, the “pruning” process becomes just a final stopping condition and is no longer periodic).

The following subprocedure “CheckPartition()” minimizes the underestimation function \(\Phi\) over a given partition \(\Pi\), improves the value of UB and stores the data in the PQ when the potential relative improvement is such that

$$\begin{aligned} Relimp(LB,UB) > relTol\,, \end{aligned}$$
(1)

with

$$\begin{aligned} Relimp(LB,UB)=\frac{abs(UB-LB)}{\max \{abs(UB),abs(LB)\}} \end{aligned}$$

and \(relTol>0\) a chosen tolerance. According to Remark 1, if \(LB\ge UB\) then the partition is not added to the queue and is discarded (“pruning”).

$$\begin{aligned}{} & {} {\varvec{Subprocedure}}\ {\textbf {CheckPartition}}(\textit{inputs}:\ \Pi )\\{} & {} \qquad \textit{if}\ not(isempty(S\cap \Pi ))\ \textit{then}\\{} & {} \qquad \quad opt:=\arg \min \limits _{x\in S\cap \Pi } \Phi (x); \ LB:=\Phi (opt);\\{} & {} \qquad \quad \textit{if}\ LB<UB\ \textit{then}\\{} & {} \qquad \qquad val:=f(opt);\\{} & {} \qquad \qquad \textit{if}\ val<UB\ \textit{then} \ UB:=val, Sol:=opt\ \textit{end if};\\{} & {} \qquad \qquad \textit{if}\ Relimp(LB,UB)>relTol\ \textit{then} \\{} & {} \qquad \qquad \qquad PQadd(LB, partition, opt);\\{} & {} \qquad \qquad \textit{end if};\\{} & {} \qquad \quad \textit{end if};\\{} & {} \qquad \textit{end if};\\{} & {} {\varvec{end}}\ {\varvec{subproc}}. \end{aligned}$$

The overall solution branch-and-bound scheme is described by the procedure “Solve()”.

$$\begin{aligned}{} & {} {\varvec{Procedure}}\ {\textbf {Solve}}(\textit{inputs}:\ P; \ \textit{outputs}:\ Opt, Val)\\{} & {} \qquad \text {Let}\ PQ:=\emptyset , UB:=+\infty , Sol:=[];\\{} & {} \qquad \text {Determine the smallest partition} \ \Pi _0\ \text {containing}\ S;\\{} & {} \qquad CheckPartition(\Pi _0);\\{} & {} \qquad \textit{while}\ not(PQisempty())\ \text {and} \ PQsmallest()<UB \textit{do}\\{} & {} \qquad \quad [\Pi , opt]:=PQextract();\\{} & {} \qquad \quad \text {Choose the branching variable to be used in the splitting process};\\{} & {} \qquad \quad \text {Split partition}\ \Pi \ \text {accordingly to the chosen branching variable}\ : \ \Pi =\Pi _1\cup \Pi _2;\\{} & {} \qquad \quad CheckPartition(\Pi _1), CheckPartition(\Pi _2);\\{} & {} \qquad \textit{end while};\\{} & {} \qquad \text {Let}\ Opt :=Sol, Val := UB;\\{} & {} {\varvec{end}}\ {\varvec{proc}}. \end{aligned}$$

Firstly, variables PQ, UB and Sol are initialized and the starting smallest partition \(\Pi _0\) containing S is found and checked. Then, the iterative phase starts and continues until either the PQ is empty or the stored partitions can no longer improve UB (“pruning”). At each iteration, the partition with the smallest LB is extracted from the PQ, a branching variable is selected, and the partition is split accordingly (multiway branching has been shown to provide poor results, see for example Gerard et al. 2017). The two new partitions are then checked. Finally, at the end of the iterative process, the outputs are set.

Remark 2

The value of UB is fundamental to improving the performance of the algorithm since it is used to discard useless partitions. For this reason, the feasible points found while looking for \(\Pi _0\) should be used to improve the values of UB and Sol.

Notice that the convergence of the proposed method has been widely discussed in the literature (see, e.g., Bajaj and Faruque Hasan 2020; Cambini and Salvi 2009, 2010; Fampa et al. 2017; Gerard et al. 2017; Shen et al. 2020 and references therein). Specifically speaking, since the partitions will be split with respect to values not “close” to their boundaries (see Sect. 4), the tolerance parameter \(relTol>0\) guarantees that condition (1) in subprocedure “CheckPartition()” will become false after a sufficiently large number of iterations. The correctness of the method follows since only feasible solutions are evaluated to improve the incumbent best solution and since the whole feasible region is analyzed. Recall that this is an NP-hard problem and, in the worst case, many local but not global optimal solutions may be encountered.

Some further choices are finally needed to complete the description of the solution process:

  • Which underestimation function \(\Phi (x)\) should be used? Tight underestimation functions improve the algorithm performance; moreover, the underestimation function determines the set of branching variables;

  • Which branching variable should be chosen to split the current partition?

  • At which value of the branching variable should the current partition be split?

Actually, another fundamental choice has been already made:

  • At each iteration, the partition with the smallest LB is selected and analyzed.

This criterion aims at finding feasible solutions having small values, thus improving UB as much as possible and increasing as much as possible the number of partitions discarded by means of the stopping “pruning” condition.

2.4 A raw approach

Problem P is a particular quadratic (usually indefinite) program since f(x) can be rewritten as:

$$\begin{aligned} f(x) = \sum _{i=1}^{p} (c_i^Tx)(d_i^Tx) +\hat{a}^Tx+\hat{a}_0=x^T\hat{Q}x+\hat{a}^Tx+\hat{a}_0\,, \end{aligned}$$
(2)

with

$$\begin{aligned} \hat{a}&:=a+\sum _{i=1}^{p} (c_i {d_0}_i+d_i {c_0}_i)\,,\\ \hat{a}_0&:=a_0+\sum _{i=1}^{p} {c_0}_i {d_0}_i\,,\\ \hat{Q}&:=\frac{1}{2}\sum _{i=1}^{p} \left( c_i d_i^T +d_i c_i^T\right) \,. \end{aligned}$$

Quadratic indefinite programs can be efficiently solved with a branch-and-bound approach by means of a suitable eigenvectors-based decomposition of the objective function (see, e.g., Cambini and Sodini 2005, 2008; Fampa et al. 2017). Specifically speaking, since \(\hat{Q}\) is a symmetric matrix, there exists an orthonormal matrix \(U\in {\mathbb {R}}^{n\times n}\) (\(UU^T=U^TU=I\)) and a diagonal matrix \(D\in {\mathbb {R}}^{n\times n}\) such that \(\hat{Q}=UDU^T\). The diagonal elements of D are the eigenvalues \(\lambda _1,\dots ,\lambda _n\in {\mathbb {R}}\) of \(\hat{Q}\), while the orthonormal columns \(u_1,\dots ,u_n\in {\mathbb {R}}^n\) of U are the corresponding eigenvectors of \(\hat{Q}\). As a consequence, it results:

$$\begin{aligned} x^T\hat{Q}x=\sum _{i=1}^{n} \lambda _i (u_i^Tx)^2. \end{aligned}$$

Thus, by means of the sets of indices

$$\begin{aligned} \Lambda ^+=\left\{ i=1,\dots ,n\ : \ \lambda _i>0\right\} \quad , \quad \Lambda ^-=\left\{ i=1,\dots ,n\ : \ \lambda _i<0\right\} \end{aligned}$$

and the vectors

$$\begin{aligned} v_i=\sqrt{|\ \lambda _i\ |} \cdot u_i \quad \forall i=1,\dots ,n\,, \end{aligned}$$

the quadratic component of f can be rewritten as follows:

$$\begin{aligned} x^T\hat{Q}x=\sum _{i\in \Lambda ^+} (v_i^Tx)^2 -\sum _{i\in \Lambda ^-} (v_i^Tx)^2. \end{aligned}$$

In this way, f can be expressed in the following D.C. form:

$$\begin{aligned} f(x)=\left( \sum _{i\in \Lambda ^+} (v_i^Tx)^2+\hat{a}^Tx +\hat{a}_0\right) -\left( \sum _{i\in \Lambda ^-} (v_i^Tx)^2\right) \end{aligned}$$
(3)

The following branching variables are suggested by (3) for all \(i\in \Lambda ^-\):

$$\begin{aligned} \mu _i=v_i^Tx \ \end{aligned}$$

so that

$$\begin{aligned} \underline{\mu }_i=\min _{x\in S} v_i^Tx \ \ \ \ \text {and} \ \ \ \ \overline{\mu }_i=\max _{x\in S} v_i^Tx. \end{aligned}$$

Moreover, for all \(\mu ^L:=(\mu ^L_{i})_{i\in \Lambda ^-}\) , \(\mu ^U:=(\mu ^U_{i})_{i\in \Lambda ^-}\in {\mathbb {R}}^{\left| \Lambda ^{-}\right| }\) such that \(\mu ^L \underline{\le } \mu ^U\), the following rectangle is introduced:

$$\begin{aligned} \left[ \mu ^L , \mu ^U\right]= & {} \left\{ x\in {\mathbb {R}}^n : \, \mu _i^L \le v_i^Tx\le \mu _i^U \ \ \ \forall i \in \Lambda ^{-}\right\} . \end{aligned}$$

Let \(\underline{\mu }:=(\underline{\mu }_i)_{i\in \Lambda ^-}\) and \(\overline{\mu }:=(\overline{\mu }_i)_{i\in \Lambda ^-}\). With respect to branching variables \(\mu\), rectangle \(\left[ \underline{\mu } , \overline{\mu }\right]\) is the smallest partition \(\Pi _0\) containing S. The following underestimation function can then be stated by means of (3).

Theorem 1

Let \(f:{\mathbb {R}}^n \rightarrow {\mathbb {R}}\) be expressed as in (3). Then, the following convex quadratic function is an underestimation function for f over \(S\cap \left[ \mu ^L , \mu ^U\right]\):

$$\begin{aligned} (u_0)\ \ \left\{ \begin{array}{rcl} \Phi _0(x)&{}=&{} \displaystyle \left( \sum _{i\in \Lambda ^+} (v_i^Tx)^2+\hat{a}^Tx+\hat{a}_0\right) +\sum _{i\in \Lambda ^-}\left( \mu _i^L\mu _i^U-(\mu _i^L+\mu _i^U)v_i^Tx\right) ,\\ Er_0(x)&{}=&{} \displaystyle \sum _{i\in \Lambda ^-} \left( v_i^Tx-\mu _i^L\right) \left( \mu _i^U-v_i^Tx\right) , \end{array}\right. \end{aligned}$$

with \(\frac{1}{4}\sum _{i\in \Lambda ^-} (\mu _i^U-\mu _i^L)^2\) the maximum error for \(Er_0(x)\) obtained at \(v_i^Tx=\frac{\mu _i^L+\mu _i^U}{2}\), \(i\in \Lambda ^-\).

Proof

Being \(\mu _i^L\le v_i^Tx\le \mu _i^U\) for all \(x \in S\cap \left[ \mu ^L , \mu ^U\right]\) and for all \(i\in \Lambda ^-\), Lemma 3 yields:

$$\begin{aligned} -(v_i^Tx)^2\ge -(\mu _i^L+\mu _i^U)v_i^Tx+\mu _i^L\mu _i^U \end{aligned}$$

Hence, \(\Phi _0(x)\) follows trivially from (3). Moreover, it results:

$$\begin{aligned} Er_0(x)= & {} f(x)-\Phi _0(x)\\&= \sum _{i\in \Lambda ^-} \left( -(v_i^Tx)^2+(\mu _i^L+\mu _i^U)v_i^Tx-\mu _i^L\mu _i^U\right) \\&= \sum _{i\in \Lambda ^-} \left( v_i^Tx-\mu _i^L\right) \left( \mu _i^U-v_i^Tx\right) \end{aligned}$$

\(\square\)
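The construction of \((u_0)\) can be summarized in a few lines. The following sketch (Python/NumPy; a box plays the role of S, so that the bounds \(\mu _i\) can be computed componentwise instead of by LPs, an assumption made only to keep the example self-contained) builds \(\hat{Q}\), splits its spectrum, forms the vectors \(v_i\), computes \(\mu ^L,\mu ^U\), and checks numerically that \(\Phi _0\) underestimates f on the box, as stated in Theorem 1.

```python
# Sketch: eigenvector-based construction of (u_0) and numerical check of Theorem 1.
import numpy as np

rng = np.random.default_rng(1)
n, p = 6, 2
C = rng.integers(-4, 5, (p, n)).astype(float)          # rows c_i^T
D = rng.integers(-4, 5, (p, n)).astype(float)          # rows d_i^T
a_hat = rng.integers(-4, 5, n).astype(float)           # linear term (constant set to 0)
lb, ub = -np.ones(n), np.ones(n)                        # box playing the role of S

Q_hat = 0.5 * sum(np.outer(C[i], D[i]) + np.outer(D[i], C[i]) for i in range(p))
lam, U = np.linalg.eigh(Q_hat)                          # Q_hat = U diag(lam) U^T
neg = [i for i in range(n) if lam[i] < -1e-10]          # index set Lambda^-
V = np.array([np.sqrt(-lam[i]) * U[:, i] for i in neg]).reshape(len(neg), n)
Q_plus = U @ np.diag(np.clip(lam, 0.0, None)) @ U.T     # convex part of the D.C. form (3)

# bounds mu_i^L, mu_i^U of v_i^T x over the box (separable, so no LP is needed here)
muL = np.sum(np.minimum(V * lb, V * ub), axis=1)
muU = np.sum(np.maximum(V * lb, V * ub), axis=1)

def f(x):
    return float(np.sum((C @ x) * (D @ x)) + a_hat @ x)

def phi0(x):
    # convex quadratic underestimation (u_0) of Theorem 1
    return float(x @ Q_plus @ x + a_hat @ x + np.sum(muL * muU - (muL + muU) * (V @ x)))

X = rng.uniform(lb, ub, size=(2000, n))
print(all(f(x) >= phi0(x) - 1e-8 for x in X))           # True: Phi_0 <= f on the box
```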

2.5 Splitting process

Assume that the iterative process is running and that the partition having the smallest lower bound LB has been extracted from PQ by means of the command “\([\Pi , opt]:=PQextract()\)”. It is now necessary to choose a branching variable in order to split the partition \(\Pi\). Assume, for example, that underestimation \((u_0)\) is used (results are analogous for all underestimation functions which will be proposed in the next section), so that \(\Pi =\left[ \mu ^L , \mu ^U\right]\) and:

$$\begin{aligned} Er_0(x)= \displaystyle \sum _{i\in \Lambda ^-} \left( v_i^Tx-\mu _i^L\right) \left( \mu _i^U-v_i^Tx\right) , \end{aligned}$$

with maximum error value \(\frac{1}{4}\sum _{i\in \Lambda ^-} (\mu _i^U-\mu _i^L)^2\) obtained at \(v_i^Tx=\frac{\mu _i^L+\mu _i^U}{2}\), \(i\in \Lambda ^-\). Two criteria are generally used to determine the branching variable:

  • The largest interval \(\left[ \mu _i^L , \mu _i^U\right]\): this criterion aims at reducing the maximum error value as much as possible; on the other hand, it does not use the feasible point \(opt=\arg \min \nolimits _{x\in S\cap \Pi } \Phi _0(x)\), which may suggest where to look for the optimal solution of the problem.

  • The largest error addend \(\left( v_i^Tx-\mu _i^L\right) \left( \mu _i^U-v_i^Tx\right)\): this criterion aims at reducing the error value at the point opt as much as possible, hence tightening the underestimation as much as possible close to opt.

Once the branching variable is chosen, the question is how to split the corresponding interval \(\left[ \mu _i^L , \mu _i^U\right]\), that is, how to determine the value \(\mu _i^*\) so that \(\left[ \mu _i^L , \mu _i^U\right] =\left[ \mu _i^L , \mu _i^*\right] \cup \left[ \mu _i^* , \mu _i^U\right]\). In this light, possible choices are:

  • The middle point of the interval \(\left[ \mu _i^L , \mu _i^U\right]\): the use of \(\mu _i^M=\frac{\mu _i^L+\mu _i^U}{2}\) aims at reducing the maximum error value as much as possible; again, the feasible point opt is not used;

  • The value \(\mu _i^{opt}=v_i^Topt\): in this case the optimal solution of the underestimation is used, but this value may be close to the boundaries \(\mu _i^L\) and \(\mu _i^U\), thus greatly increasing the number of iterations needed to solve the problem and hence the convergence time;

  • A linear combination of \(\mu _i^M\) and \(\mu _i^{opt}\): given a value \(\alpha \in [0,1]\), the point \(\mu _i^*=\alpha \mu _i^{opt}+(1-\alpha )\mu _i^M\) could be used to take into account both the maximum error value and the solution opt.

These criteria can be summarized and linked as follows:

  • Blindly reduce the maximum error value as much as possible: choose the largest interval \(\left[ \mu _i^L , \mu _i^U\right]\) and split it in the middle with \(\mu _i^M\) (see, e.g., Cambini and Sodini 2005; Shen et al. 2020);

  • Use the information given by opt: choose the largest error addend \(\left( v_i^Tx-\mu _i^L\right) \left( \mu _i^U-v_i^Tx\right)\) and split it with respect to \(\mu _i^*=\alpha \mu _i^{opt}+(1-\alpha )\mu _i^M\) (see, e.g., Cambini and Salvi 2009, 2010; Fampa et al. 2017).

The recent literature shows that the latter option is the most performing one, and it is the one that will be used in the computational tests described in Sect. 4. Finally, notice that, by means of Lemma 2, the splitting process should be performed only if the largest error addend is greater than a suitable tolerance \(errTol>0\).
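A small sketch of this selection and splitting rule is the following (illustrative names and inputs only; V stores the rows \(v_i^T\), muL and muU the current bounds, and opt the minimizer of the underestimation on the current partition).

```python
# Sketch: pick the branching index with the largest error addend at opt and split
# its interval at mu* = alpha*mu_opt + (1-alpha)*mu_mid (Sect. 2.5).
import numpy as np

def choose_and_split(V, muL, muU, opt, alpha=0.5, errTol=1e-12):
    vals = V @ opt                                     # current values v_i^T opt
    addends = (vals - muL) * (muU - vals)              # error addends of Er_0 at opt
    i = int(np.argmax(addends))
    if addends[i] <= errTol:                           # Lemma 2: no further split needed
        return None
    mu_star = alpha * vals[i] + (1 - alpha) * 0.5 * (muL[i] + muU[i])
    left = (muL.copy(), muU.copy())
    left[1][i] = mu_star                               # child with mu_i^U replaced by mu*
    right = (muL.copy(), muU.copy())
    right[0][i] = mu_star                              # child with mu_i^L replaced by mu*
    return i, mu_star, left, right

# usage with two branching variables
V = np.array([[1.0, -1.0], [2.0, 1.0]])
muL, muU = np.array([-2.0, -1.0]), np.array([2.0, 3.0])
opt = np.array([0.3, 0.4])
print(choose_and_split(V, muL, muU, opt))
```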

3 Specific underestimation functions

The eigenvectors-based approach described in Sect. 2.4 has some drawbacks. First of all, the eigen-decomposition of a quadratic function is a heavy task from both a computational and a numerical point of view. Moreover, such an eigen-decomposition does not take into account the particular structure of linear multiplicative functions. In this light, various recent papers have approached linear multiplicative problems without the use of eigen-decompositions (see, e.g., Shen et al. 2020, 2022; Wang et al. 2012; Zhou et al. 2015).

The aim of this section is to state some underestimation functions for f not using the eigenvectors of matrix \(\hat{Q}\), in order to efficiently solve Problem P without the computational and numerical issues of computing eigenvectors.

3.1 Linear underestimation functions

In the recent literature (see Shen et al. 2020, for example) some linear underestimation functions have been used to solve linear multiplicative problems by means of a branch-and-bound approach. Usually these underestimations are not tight; for this reason, some further linear underestimations will be studied in this subsection. Recalling that:

$$\begin{aligned} f(x) = \sum _{i=1}^{p} (c_i^Tx)(d_i^Tx) + \hat{a}^Tx+\hat{a}_0{,} \end{aligned}$$

the following 2p branching variables can be considered:

$$\begin{aligned} \xi _i=c_i^Tx \ \text{ and } \ \delta _i=d_i^Tx\,,\ \ \ {i=1,\ldots ,p} \end{aligned}$$

so that

$$\begin{aligned} \underline{\xi }_i=\min _{x\in S} c_i^Tx \ , \ \overline{\xi }_i=\max _{x\in S} c_i^Tx\ \ \ \ \text {and}\ \ \ \ \underline{\delta }_i=\min _{x\in S} d_i^Tx \ , \ \overline{\delta }_i=\max _{x\in S} d_i^Tx\,. \end{aligned}$$

Moreover, for all \(\xi ^L:=(\xi ^L_{i})_{i=1,\ldots ,p}\), \(\xi ^U:=(\xi ^U_{i})_{i=1,\ldots ,p} \in {\mathbb {R}}^p\) such that \(\xi ^L \underline{\le } \xi ^U\) and for all \(\delta ^L:=(\delta ^L_{i})_{i=1,\ldots ,p}\), \(\delta ^U:=(\delta ^U_{i})_{i=1,\ldots ,p} \in {\mathbb {R}}^p\) such that \(\delta ^L \underline{\le } \delta ^U\), the following rectangles are introduced:

$$\begin{aligned} \left[ \xi ^L , \xi ^U\right]= & {} \left\{ x\in {\mathbb {R}}^n :\, \xi _i^L \le c_i^Tx\le \xi _i^U \ \ \ \forall i=1,\dots ,p\right\} ;\\ \left[ \delta ^L , \delta ^U\right]= & {} \left\{ x\in {\mathbb {R}}^n :\, \delta _i^L \le d_i^Tx\le \delta _i^U \ \ \forall i=1,\dots ,p\right\} . \end{aligned}$$

Let \(\underline{\xi }:=(\underline{\xi }_i)_{i=1,\ldots ,p}\), \(\overline{\xi }:=(\overline{\xi }_i)_{i=1,\ldots ,p}\), \(\underline{\delta }:=(\underline{\delta }_i)_{i=1,\ldots ,p}\), and \(\overline{\delta }:=(\overline{\delta }_i)_{i=1,\ldots ,p}\). With respect to branching variables \(\xi\) and \(\delta\), rectangle \(\left[ \underline{\xi }, \overline{\xi }\right] \cap \left[ \underline{\delta } , \overline{\delta }\right]\) is the smallest partition \(\Pi _0\) containing S.

The following linear underestimation functions can then be stated.

Theorem 2

Let \(f:{\mathbb {R}}^n \rightarrow {\mathbb {R}}\) be expressed as in (2). Then, the following linear functions are underestimation functions for f over \(S\cap \left[ \xi ^L , \xi ^U\right] \cap \left[ \delta ^L , \delta ^U\right]\):

$$\begin{aligned} (u_{1})\ \ \left\{ \begin{array}{rcl} \Phi _{1}(x) &{}=&{} \displaystyle \left( \hat{a}^Tx+\hat{a}_0\right) +\sum _{i =1}^p\left( \delta _i^L c_i^Tx+\xi _i^L d_i^Tx-\xi _i^L\delta _i^L \right) , \\ Er_{1}(x)&{}=&{} \displaystyle \sum _{i =1}^p\left( c_i^Tx-\xi _i^L\right) \left( d_i^Tx-\delta _i^L\right) , \end{array}\right. \end{aligned}$$

and

$$\begin{aligned} (u_{2})\ \ \left\{ \begin{array}{rcl} \Phi _{2}(x) &=& \displaystyle \left( \hat{a}^Tx+\hat{a}_0\right) +\sum _{i =1}^p\left( \delta _i^U c_i^Tx+\xi _i^U d_i^Tx-\xi _i^U \delta _i^U \right) , \\ Er_{2}(x)&{}=&{} \displaystyle \sum _{i =1}^p\left( \xi _i^U-c_i^Tx\right) \left( \delta _i^U-d_i^Tx\right) , \end{array}\right. \end{aligned}$$

with \(\sum _{i =1}^p (\xi _i^U-\xi _i^L)(\delta _i^U-\delta _i^L)\) the maximum error for \(Er_1(x)\) and \(Er_2(x)\) obtained at \(c_i^Tx=\xi _i^U\), \(d_i^Tx=\delta _i^U\), \(i=1,\dots ,p\), and \(c_i^Tx=\xi _i^L\), \(d_i^Tx=\delta _i^L\), \(i=1,\dots ,p\), respectively.

Proof

For all \(i=1,\dots ,p\), the inequalities \(\left( c_i^Tx-\xi _i^L\right) \ge 0\), \(\left( d_i^Tx-\delta _i^L\right) \ge 0\), \(\left( \xi _i^U-c_i^Tx\right) \ge 0\) and \(\left( \delta _i^U-d_i^Tx\right) \ge 0\) yield:

$$\begin{aligned}{} & {} 0\le \left( c_i^Tx-\xi _i^L\right) \left( d_i^Tx-\delta _i^L\right) = (c_i^Tx)(d_i^Tx)- \delta _i^L c_i^Tx-\xi _i^L d_i^Tx+\xi _i^L\delta _i^L \\{} & {} \quad \Rightarrow \ (c_i^Tx)(d_i^Tx)\ge \delta _i^Lc_i^Tx+\xi _i^L d_i^Tx-\xi _i^L\delta _i^L ,\\{} & {} 0\le \left( \xi _i^U-c_i^Tx\right) \left( \delta _i^U-d_i^Tx\right) = (c_i^Tx)(d_i^Tx)- \delta _i^U c_i^Tx-\xi _i^U d_i^Tx+\xi _i^U\delta _i^U\\{} & {} \quad \Rightarrow \ (c_i^Tx)(d_i^Tx)\ge \delta _i^U c_i^Tx+\xi _i^U d_i^Tx-\xi _i^U\delta _i^U \end{aligned}$$

Hence, the thesis follows by means of simple calculations. \(\square\)

Remark 3

Notice that \(\delta _i^Lc_i^Tx+\xi _i^L d_i^Tx-\xi _i^L\delta _i^L\) and \(\delta _i^U c_i^Tx+\xi _i^U d_i^Tx-\xi _i^U\delta _i^U\) are known as the McCormick lower envelopes for the bilinear function \((c_i^Tx)(d_i^Tx)\) (McCormick 1976). Notice also that, in the case \(\xi _i^L\ge 0\) and \(\delta _i^L\ge 0\), \(i=1,\ldots ,p\), linear underestimations of the kind \(\sum _{i=1}^{p} (c_i^Tx)\delta _i^L + \hat{a}^Tx+\hat{a}_0\) or \(\sum _{i=1}^{p} (d_i^Tx)\xi _i^L + \hat{a}^Tx+\hat{a}_0\) have been used in the literature, and these are far less tight than the ones proposed in this subsection.
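As a small sanity check (an illustrative sketch, treating \(c^Tx\) and \(d^Tx\) directly as two box-constrained scalars), the following snippet verifies that the two McCormick lower envelopes of Theorem 2 indeed stay below the bilinear term, and compares the worst observed gap of \((u_1)\) with the stated maximum error.

```python
# Sketch: numerical check of the McCormick lower envelopes used in (u_1) and (u_2).
import numpy as np

rng = np.random.default_rng(3)
xiL, xiU, deL, deU = -2.0, 3.0, -1.0, 4.0           # bounds for c^T x and d^T x
u = rng.uniform(xiL, xiU, 10000)                    # samples of c^T x
v = rng.uniform(deL, deU, 10000)                    # samples of d^T x
env1 = deL * u + xiL * v - xiL * deL                # envelope used in (u_1)
env2 = deU * u + xiU * v - xiU * deU                # envelope used in (u_2)
print(np.all(u * v >= env1 - 1e-12), np.all(u * v >= env2 - 1e-12))
print((u * v - env1).max(), (xiU - xiL) * (deU - deL))   # worst observed gap vs. stated maximum error
```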

3.2 Quadratic underestimation functions

The aim of this subsection is to state underestimation functions of f, tighter than the linear ones, by properly rewriting f in D.C. form.

Theorem 3

Let \(f:{\mathbb {R}}^n \rightarrow {\mathbb {R}}\) be defined as in (2). Then, f can be rewritten in the following D.C. forms:

  1. (i)

    \(\displaystyle f(x)=\left( \frac{1}{2}\sum _{i=1}^{p} (c_i^Tx+d_i^Tx)^2+\hat{a}^Tx+\hat{a}_0\right) -\left( \frac{1}{2}\sum _{i=1}^{p} \left( (c_i^Tx)^2+(d_i^Tx)^2\right) \right) ;\)

  2. (ii)

    \(\displaystyle f(x)=\left( \frac{1}{4}\sum _{i=1}^{p} (c_i^Tx+d_i^Tx)^2+\hat{a}^Tx+\hat{a}_0\right) -\left( \frac{1}{4}\sum _{i=1}^{p} (c_i^Tx-d_i^Tx)^2\right) ;\)

  3. (iii)

    \(\displaystyle f(x)=\left( \frac{1}{2}\sum _{i=1}^{p} \left( (c_i^Tx)^2+(d_i^Tx)^2\right) +\hat{a}^Tx+\hat{a}_0\right) -\left( \frac{1}{2}\sum _{i=1}^{p} (c_i^Tx-d_i^Tx)^2\right) .\)

Proof

Firstly, one gets:

$$\begin{aligned}{} & {} \triangleright \ \ (c_i^Tx)(d_i^Tx) =\frac{1}{2} \left( 2(c_i^Tx)(d_i^Tx) + (c_i^Tx)^2+(d_i^Tx)^2 -(c_i^Tx)^2-(d_i^Tx)^2\right) \\{} & {} \quad\quad\quad\quad\quad\quad\;\,=\frac{1}{2}\left( (c_i^Tx+d_i^Tx)^2 -(c_i^Tx)^2-(d_i^Tx)^2 \right) ;\\{} & {} \triangleright \ \ (c_i^Tx)(d_i^Tx) =\frac{1}{4} \left( 2(c_i^Tx) (d_i^Tx) + (c_i^Tx)^2+(d_i^Tx)^2 + 2(c_i^Tx)(d_i^Tx) - (c_i^Tx)^2-(d_i^Tx)^2 \right) \\{} & {} \quad\quad\quad\quad\quad\quad\;\,=\frac{1}{4}\left( (c_i^Tx+d_i^Tx)^2-(c_i^Tx-d_i^Tx)^2 \right) ;\\{} & {} \triangleright \ \ (c_i^Tx)(d_i^Tx)=\frac{1}{2} \left( (c_i^Tx)^2+(d_i^Tx)^2 + 2(c_i^Tx)(d_i^Tx) - (c_i^Tx)^2-(d_i^Tx)^2 \right) \\{} & {} \quad\quad\quad\quad\quad\quad\;\,=\frac{1}{2}\left( (c_i^Tx)^2+(d_i^Tx)^2-(c_i^Tx-d_i^Tx)^2 \right) . \end{aligned}$$

Then, by suitably replacing each of them in (2), the thesis follows. \(\square\)
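The three rewritings can be verified numerically in one line each; the following sketch treats \(c_i^Tx\) and \(d_i^Tx\) as two scalars s and t.

```python
# Sketch: numerical check of the three D.C. rewritings of Theorem 3 for s*t.
import numpy as np

rng = np.random.default_rng(4)
s, t = rng.standard_normal(100), rng.standard_normal(100)
lhs = s * t
dc1 = 0.5 * ((s + t) ** 2 - s ** 2 - t ** 2)        # form (i)
dc2 = 0.25 * ((s + t) ** 2 - (s - t) ** 2)          # form (ii)
dc3 = 0.5 * (s ** 2 + t ** 2 - (s - t) ** 2)        # form (iii)
print(np.allclose(lhs, dc1), np.allclose(lhs, dc2), np.allclose(lhs, dc3))
```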

The following underestimation function can be stated by means of (i) of Theorem 3 and the 2p branching variables described in the previous subsection.

Theorem 4

Let \(f:{\mathbb {R}}^n \rightarrow {\mathbb {R}}\) be expressed as in (i) of Theorem 3. Then, the following convex quadratic function is an underestimation function for f over \(S\cap \left[ \xi ^L , \xi ^U\right] \cap \left[ \delta ^L , \delta ^U\right]\):

$$\begin{aligned} (u_3)\ \ \left\{ \begin{array}{rcl} \Phi _3(x)&{}=&{} \displaystyle \left( \frac{1}{2}\sum _{i=1}^{p} (c_i^Tx+d_i^Tx)^2+\hat{a}^Tx+\hat{a}_0\right) \\ &&+\frac{1}{2}\sum\limits_{i=1}^{p} \left( \xi _i^L\xi _i^U+\delta _i^L\delta _i^U- (\xi _i^L+\xi _i^U) c_i^Tx-(\delta _i^L+\delta _i^U)d_i^Tx\right) ,\\ Er_3(x)&{}=&{} \displaystyle \frac{1}{2}\sum _{i =1}^p\left( (c_i^Tx-\xi _i^L)(\xi _i^U-c_i^Tx)+(d_i^Tx-\delta _i^L)(\delta _i^U-d_i^Tx) \right) , \end{array}\right. \end{aligned}$$

with \(\frac{1}{8}\sum _{i =1}^p (\xi _i^U-\xi _i^L)^2+\frac{1}{8}\sum _{i =1}^p (\delta _i^U-\delta _i^L)^2\) the maximum error for \(Er_3(x)\) obtained at \(c_i^Tx=\frac{\xi _i^L+\xi _i^U}{2}\) and \(d_i^Tx=\frac{\delta _i^L+\delta _i^U}{2}\), \(i=1,\dots ,p\).

Proof

Being \(\xi _i^L\le c_i^Tx\le \xi _i^U\) and \(\delta _i^L\le d_i^Tx\le \delta _i^U\) for all \(x \in S\cap \left[ \xi ^L , \xi ^U\right] \cap \left[ \delta ^L , \delta ^U\right]\) and for all \(i=1,\dots ,p\), Lemma 3 yields:

$$\begin{aligned} -(c_i^Tx)^2-(d_i^Tx)^2\ge -(\xi _i^L+\xi _i^U)c_i^Tx +\xi _i^L\xi _i^U -(\delta _i^L+\delta _i^U)d_i^Tx+\delta _i^L\delta _i^U \end{aligned}$$

Hence, \(\Phi _3(x)\) follows trivially from (i) of Theorem 3. Moreover, it results:

$$\begin{aligned} Er_3(x)= & {} f(x)-\Phi _3(x)\\&= \frac{1}{2}\sum _{i =1}^p\left( 2(c_i^Tx)(d_i^Tx) -(c_i^Tx+d_i^Tx)^2+(\xi _i^L+\xi _i^U) c_i^Tx\right. \\ & \quad \left. +(\delta _i^L+\delta _i^U)d_i^Tx -(\xi _i^L\xi _i^U+\delta _i^L\delta _i^U)\right) \\&= \frac{1}{2}\sum _{i =1}^p\left( -(c_i^Tx)^2-(d_i^Tx)^2 +(\xi _i^L+\xi _i^U) c_i^Tx\right. \\ & \quad \left. +(\delta _i^L+\delta _i^U)d_i^Tx -(\xi _i^L\xi _i^U+\delta _i^L\delta _i^U)\right) \\&= \frac{1}{2}\sum _{i =1}^p\left( (c_i^Tx-\xi _i^L) (\xi _i^U-c_i^Tx)+(d_i^Tx-\delta _i^L)(\delta _i^U-d_i^Tx) \right) \end{aligned}$$

\(\square\)

In a similar way, for all \(i=1,\dots ,p\), the following branching variables are suggested by (ii) and (iii) of Theorem 3:

$$\begin{aligned} \sigma _i=(c_i-d_i)^Tx\ \end{aligned}$$

so that

$$\begin{aligned} \underline{\sigma }_i=\min _{x\in S} (c_i-d_i)^Tx \ \ \ \ \text {and} \ \ \ \ \overline{\sigma }_i=\max _{x\in S} (c_i-d_i)^Tx \end{aligned}$$

Moreover, for all \(\sigma ^L:=(\sigma ^L_{i})_{i=1,\dots ,p}\), \(\sigma ^U:=(\sigma ^U_{i})_{i=1,\dots ,p}\in {\mathbb {R}}^p\) such that \(\sigma ^L \underline{\le } \sigma ^U\), the following rectangle is introduced:

$$\begin{aligned} \left[ \sigma ^L , \sigma ^U\right]= & {} \left\{ x\in {\mathbb {R}}^n :\, \sigma _i^L \le (c_i-d_i)^Tx\le \sigma _i^U \ \ \ \forall i=1,\dots ,p\right\} . \end{aligned}$$

Let \(\underline{\sigma }:=(\underline{\sigma }_i)_{i=1,\dots ,p}\) and \(\overline{\sigma }:=(\overline{\sigma }_i)_{i=1,\dots ,p}\). With respect to branching variables \(\sigma\), rectangle \(\left[ \underline{\sigma } , \overline{\sigma }\right]\) is the smallest partition \(\Pi _0\) containing S. In this light, the following underestimation functions can be stated by means of (ii) and (iii) of Theorem 3.

Theorem 5

Let \(f:{\mathbb {R}}^n \rightarrow {\mathbb {R}}\) be expressed as in (ii) and (iii) of Theorem 3. Then, the following convex quadratic functions are underestimation functions for f over \(S\cap \left[ \sigma ^L , \sigma ^U\right]\), respectively:

$$\begin{aligned} (u_4)\ \ \left\{ \begin{array}{rcl} \Phi _4(x)&{}=&{} \displaystyle \left( \frac{1}{4}\sum _{i=1}^{p} (c_i^Tx+d_i^Tx)^2+\hat{a}^Tx+\hat{a}_0\right) \\ &{}&{}+\frac{1}{4} \sum\limits _{i=1}^p\left( \sigma _i^L\sigma _i^U-(\sigma _i^L +\sigma _i^U)(c_i^Tx-d_i^Tx)\right) ,\\ Er_4(x)&{}=&{} \displaystyle \frac{1}{4}\sum _{i =1}^p \left( (c_i^Tx-d_i^Tx) - \sigma _i^L \right) \left( \sigma _i^U-(c_i^Tx-d_i^Tx) \right) , \end{array}\right. \end{aligned}$$

and

$$\begin{aligned} (u_5)\ \ \left\{ \begin{array}{rcl} \Phi _5(x)&{}=&{} \displaystyle \left( \frac{1}{2}\sum _{i=1}^{p} \left( (c_i^Tx)^2+(d_i^Tx)^2\right) +\hat{a}^Tx+\hat{a}_0\right) \\ &{}&{}+\frac{1}{2}\sum\limits_{i =1}^p\left( \sigma _i^L\sigma _i^U -(\sigma _i^L+\sigma _i^U) (c_i^Tx-d_i^Tx)\right) ,\\ Er_5(x)&{}=&{} \displaystyle \frac{1}{2}\sum _{i =1}^p \left( (c_i^Tx-d_i^Tx) - \sigma _i^L \right) \left( \sigma _i^U-(c_i^Tx-d_i^Tx) \right) , \end{array}\right. \end{aligned}$$

with \(\frac{1}{16}\sum _{i =1}^p (\sigma _i^U-\sigma _i^L)^2\) and \(\frac{1}{8}\sum _{i =1}^p (\sigma _i^U-\sigma _i^L)^2\) the maximum error for \(Er_4(x)\) and \(Er_5(x)\), respectively, obtained at \((c_i^Tx-d_i^Tx)=\frac{\sigma _i^L+\sigma _i^U}{2}\), \(i=1,\dots ,p\).

Proof

Being \(\sigma _i^L\le (c_i^Tx-d_i^Tx)\le \sigma _i^U\) for all \(x \in S\cap \left[ \sigma ^L , \sigma ^U\right]\) and for all \(i=1,\dots ,p\), Lemma 3 yields:

$$\begin{aligned} -(c_i^Tx-d_i^Tx)^2\ge -(\sigma _i^L+\sigma _i^U)(c_i^Tx-d_i^Tx)+\sigma _i^L\sigma _i^U \end{aligned}$$

Hence, \(\Phi _4(x)\) and \(\Phi _5(x)\) follow, respectively, from (ii) and (iii) of Theorem 3. Moreover, it results:

$$\begin{aligned} Er_4(x)= & {} f(x)-\Phi _4(x)\\&= \frac{1}{4}\sum _{i =1}^p\left( 4(c_i^Tx)(d_i^Tx) -(c_i^Tx+d_i^Tx)^2+(\sigma _i^L+\sigma _i^U)(c_i^Tx-d_i^Tx) -\sigma _i^L\sigma _i^U\right) \\&= \frac{1}{4}\sum _{i =1}^p\left( -(c_i^Tx-d_i^Tx)^2 +(\sigma _i^L+\sigma _i^U)(c_i^Tx-d_i^Tx)-\sigma _i^L\sigma _i^U\right) \\&= \frac{1}{4}\sum _{i =1}^p\left( (c_i^Tx-d_i^Tx) - \sigma _i^L \right) \left( \sigma _i^U-(c_i^Tx-d_i^Tx) \right) \end{aligned}$$

and

$$\begin{aligned} Er_5(x)&= f(x)-\Phi _5(x)\\&= \frac{1}{2}\sum _{i =1}^p\left( 2(c_i^Tx)(d_i^Tx) -(c_i^Tx)^2-(d_i^Tx)^2+(\sigma _i^L+\sigma _i^U)(c_i^Tx-d_i^Tx) -\sigma _i^L\sigma _i^U\right) \\&= \frac{1}{2}\sum _{i =1}^p\left( -(c_i^Tx-d_i^Tx)^2 +(\sigma _i^L+\sigma _i^U)(c_i^Tx-d_i^Tx)-\sigma _i^L\sigma _i^U\right) \\&= \frac{1}{2}\sum _{i =1}^p\left( (c_i^Tx-d_i^Tx) - \sigma _i^L\right) \left( \sigma _i^U-(c_i^Tx-d_i^Tx) \right) \end{aligned}$$

\(\square\)

Remark 4

Notice that \((u_4)\) dominates \((u_5)\) since \(Er_5(x)=2\cdot Er_4(x)\). For this reason, \((u_5)\) has been given just for the sake of completeness and will not be used in the rest of the paper.
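Putting the pieces together, the following minimal sketch (our own Python code, not the authors' MATLAB/Gurobi implementation) runs the branch-and-bound scheme of Sect. 2.3 on a toy single-product instance \(\min (c^Tx)(d^Tx)\) over a box, using underestimation \((u_4)\), the single branching variable \(\sigma =(c-d)^Tx\), and the \(\alpha\)-combination splitting rule of Sect. 2.5. SciPy's SLSQP stands in for the QP solver and all helper names are illustrative.

```python
# Sketch: branch-and-bound with underestimation (u_4) on a toy single-product problem.
import heapq
import numpy as np
from scipy.optimize import minimize

c, d = np.array([1.0, -2.0]), np.array([3.0, 1.0])
lb, ub = np.array([-1.0, -1.0]), np.array([2.0, 2.0])        # box feasible region
alpha, relTol = 0.5, 1e-6

def f(x):
    return float((c @ x) * (d @ x))                           # p = 1, a = 0, a_0 = 0

def minimize_phi4(sL, sU):
    # convex underestimation (u_4) restricted to the box and sL <= (c-d)^T x <= sU
    def phi(x):
        s = (c - d) @ x
        return 0.25 * ((c @ x + d @ x) ** 2 + sL * sU - (sL + sU) * s)
    cons = [{"type": "ineq", "fun": lambda x: (c - d) @ x - sL},
            {"type": "ineq", "fun": lambda x: sU - (c - d) @ x}]
    res = minimize(phi, 0.5 * (lb + ub), method="SLSQP",
                   bounds=list(zip(lb, ub)), constraints=cons)
    return res.x, float(res.fun)

UB, Sol, PQ, tie = np.inf, None, [], 0

def check_partition(sL, sU):
    # mirrors subprocedure CheckPartition() with partitions [sL, sU]
    global UB, Sol, tie
    opt, LB = minimize_phi4(sL, sU)
    if LB >= UB:
        return                                                # pruning
    val = f(opt)
    if val < UB:
        UB, Sol = val, opt
    if abs(UB - LB) / max(abs(UB), abs(LB), 1e-12) > relTol:
        heapq.heappush(PQ, (LB, tie, (sL, sU), opt))
        tie += 1

w = c - d
sigma0 = (float(np.sum(np.minimum(w * lb, w * ub))),          # smallest partition Pi_0
          float(np.sum(np.maximum(w * lb, w * ub))))
check_partition(*sigma0)

it = 0
while PQ and PQ[0][0] < UB and it < 200:                      # iteration cap only for this sketch
    it += 1
    LB, _, (sL, sU), opt = heapq.heappop(PQ)
    s_star = alpha * float(w @ opt) + (1 - alpha) * 0.5 * (sL + sU)
    check_partition(sL, s_star)
    check_partition(s_star, sU)

print(it, "iterations; best value", UB, "at", Sol)
```

On this toy instance the loop should stop after a modest number of iterations with UB close to the global minimum; in the actual experiments of Sect. 4 the convex subproblems are solved with Gurobi and, for general p, each product term contributes its own branching variable.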

3.3 Further hybrid underestimation functions

For the sake of completeness, some more underestimation functions of f will be studied by applying the eigendecomposition approach to the D.C. forms provided by (i) and (ii) of Theorem 3. Specifically speaking, each of them can be rewritten as follows:

  1. (i)

    \(\displaystyle f(x)=\left( \frac{1}{2}\sum _{i=1}^{p} (c_i^Tx+d_i^Tx)^2+\hat{a}^Tx+\hat{a}_0\right) -\left( x^TQ_1x\right)\),

  2. (ii)

    \(\displaystyle f(x)=\left( \frac{1}{4}\sum _{i=1}^{p} (c_i^Tx+d_i^Tx)^2+\hat{a}^Tx+\hat{a}_0\right) -\left( x^TQ_2x\right)\),

with \(Q_1=\frac{1}{2}\sum _{i=1}^{p} (c_ic_i^T+d_id_i^T)\) and \(Q_2=\frac{1}{4}\sum _{i=1}^{p} (c_i-d_i)(c_i-d_i)^T\) symmetric positive semidefinite matrices. Hence, there exist two orthonormal matrices \(\tilde{U},\hat{U}\in {\mathbb {R}}^{n\times n}\) and two diagonal matrices \(\tilde{D},\hat{D}\in {\mathbb {R}}^{n\times n}\) such that \(Q_1=\tilde{U}\tilde{D}\tilde{U}^T\) and \(Q_2=\hat{U}\hat{D}\hat{U}^T\). The diagonal elements of \(\tilde{D}\) are the nonnegative eigenvalues \(\tilde{\lambda }_1,\dots ,\tilde{\lambda }_n\in {\mathbb {R}}\) of \(Q_1\), while the orthonormal columns \(\tilde{u}_1,\dots ,\tilde{u}_n\in {\mathbb {R}}^n\) of \(\tilde{U}\) are the corresponding eigenvectors of \(Q_1\); in similar way, the diagonal elements of \(\hat{D}\) are the nonnegative eigenvalues \(\hat{\lambda }_1,\dots ,\hat{\lambda }_n\in {\mathbb {R}}\) of \(Q_2\), while the orthonormal columns \(\hat{u}_1,\dots ,\hat{u}_n\in {\mathbb {R}}^n\) of \(\hat{U}\) are the corresponding eigenvectors of \(Q_2\). As a consequence, since \(Q_1\) and \(Q_2\) have no negative eigenvalues, it results:

$$\begin{aligned} x^TQ_1x=\sum _{i=1}^{n} \tilde{\lambda }_i (\tilde{u}_i^Tx)^2=\sum _{i\in \Theta ^+} (\tilde{v}_i^Tx)^2 \ \ \ \text {and}\ \ \ \ x^TQ_2x=\sum _{i=1}^{n} \hat{\lambda }_i (\hat{u}_i^Tx)^2=\sum _{i\in \Gamma ^+} (\hat{v}_i^Tx)^2 \end{aligned}$$

with \(\Theta ^+=\left\{ i=1,\dots ,n\ : \ \tilde{\lambda }_i>0\right\}\), \(\Gamma ^+=\left\{ i=1,\dots ,n\ : \ \hat{\lambda }_i>0\right\}\), \(\tilde{v}_i=\sqrt{\ \tilde{\lambda }_i}\ \cdot \tilde{u}_i\) for all \(i\in \Theta ^+\), and \(\hat{v}_i=\sqrt{\ \hat{\lambda }_i}\ \cdot \hat{u}_i\) for all \(i\in \Gamma ^+\). Hence, the following further D.C. forms hold:

$$\begin{aligned} f(x)= & {} \left( \frac{1}{2}\sum _{i=1}^{p} (c_i^Tx+d_i^Tx)^2+\hat{a}^Tx+\hat{a}_0\right) -\left( \sum _{i\in \Theta ^+} (\tilde{v}_i^Tx)^2 \right) ; \end{aligned}$$
(4)
$$\begin{aligned} f(x)= & {} \left( \frac{1}{4}\sum _{i=1}^{p} (c_i^Tx+d_i^Tx)^2+\hat{a}^Tx+\hat{a}_0\right) -\left( \sum _{i\in \Gamma ^+} (\hat{v}_i^Tx)^2 \right) . \end{aligned}$$
(5)

In this light, the following branching variables are suggested by (4) and (5), respectively:

$$\begin{aligned} \theta _i=\tilde{v}_i^Tx\ , \ i\in \Theta ^+ \ \ \text{ and } \ \ \gamma _i=\hat{v}_i^Tx\ , \ i\in \Gamma ^+ \end{aligned}$$

so that

$$\begin{aligned} \underline{\theta }_i=\min _{x\in S} \tilde{v}_i^Tx \ , \ \overline{\theta }_i=\max _{x\in S} \tilde{v}_i^Tx \ \ \ \text {and}\ \ \ \ \underline{\gamma }_i=\min _{x\in S} \hat{v}_i^Tx \ , \ \overline{\gamma }_i=\max _{x\in S} \hat{v}_i^Tx \end{aligned}$$

For the sake of convenience, let \(\underline{\theta }:=(\underline{\theta }_i)_{i \in \Theta ^+}\), \(\overline{\theta }:=(\overline{\theta }_i)_{i \in \Theta ^+}\), \(\underline{\gamma }:=(\underline{\gamma }_i)_{i \in \Gamma ^+}\), and \(\overline{\gamma }:=(\overline{\gamma }_i)_{i \in \Gamma ^+}\). Moreover, for all \(\theta ^L:=(\theta ^L_{i})_{i \in \Theta ^+}\) and \(\theta ^U:=(\theta ^U_{i})_{i \in \Theta ^+}\) such that \(\theta ^L \underline{\le } \theta ^U\) and for all \(\gamma ^L:=(\gamma ^L_{i})_{i \in \Gamma ^+}\) and \(\gamma ^U:=(\gamma ^U_{i})_{i \in \Gamma ^+}\) such that \(\gamma ^L \underline{\le } \gamma ^U\), the following rectangles are introduced:

$$\begin{aligned} \left[ \theta ^L , \theta ^U\right]= & {} \left\{ x\in {\mathbb {R}}^n :\, \theta _i^L \le \tilde{v}_i^Tx\le \theta _i^U \ \ \ \forall i\in \Theta ^+\right\} ;\\ \left[ \gamma ^L , \gamma ^U\right]= & {} \left\{ x\in {\mathbb {R}}^n :\, \gamma _i^L \le \hat{v}_i^Tx\le \gamma _i^U \ \ \ \forall i\in \Gamma ^+\right\} . \end{aligned}$$

Rectangles \(\left[ \underline{\theta } , \overline{\theta }\right]\) and \(\left[ \underline{\gamma } , \overline{\gamma }\right]\) are the smallest partitions \(\Pi _0\) containing S with respect to the branching variables \(\theta\) and \(\gamma\), respectively.

Theorem 6

Let \(f:{\mathbb {R}}^n \rightarrow {\mathbb {R}}\) be expressed as in (i) and (ii) of Theorem 3. Then, the following convex quadratic functions are underestimation functions for f over \(S\cap \left[ \theta ^L , \theta ^U\right]\) and \(S\cap \left[ \gamma ^L , \gamma ^U\right]\), respectively:

$$\begin{aligned} (u_6)\ \ \left\{ \begin{array}{rcl} \Phi _6(x)&{}=&{} \displaystyle \left( \frac{1}{2}\sum _{i=1}^{p} (c_i^Tx+d_i^Tx)^2+\hat{a}^Tx+\hat{a}_0\right) \\ &{}&{}+ \sum\limits_{i\in \Theta ^+}\left( \theta _i^L\theta _i^U-(\theta _i^L+\theta _i^U) \tilde{v}_i^Tx\right) ,\\ Er_6(x)&{}=&{} \displaystyle \sum\limits _{i\in \Theta ^+} \left( \tilde{v}_i^Tx-\theta _i^L\right) \left( \theta _i^U-\tilde{v}_i^Tx\right) , \end{array}\right. \end{aligned}$$

and

$$\begin{aligned} (u_7)\ \ \left\{ \begin{array}{rcl} \Phi _7(x)&{}=&{} \displaystyle \left( \frac{1}{4}\sum _{i=1}^{p} (c_i^Tx+d_i^Tx)^2+\hat{a}^Tx+\hat{a}_0\right) \\ &{}&{}+ \sum\limits_{i\in \Gamma ^+}\left( \gamma _i^L\gamma _i^U -(\gamma _i^L+\gamma _i^U) \hat{v}_i^Tx\right) ,\\ Er_7(x)&{}=&{} \displaystyle \sum\limits _{i\in \Gamma ^+} \left( \hat{v}_i^Tx-\gamma _i^L\right) \left( \gamma _i^U-\hat{v}_i^Tx\right) , \end{array}\right. \end{aligned}$$

with \(\frac{1}{4}\sum _{i\in \Theta ^+} (\theta _i^U-\theta _i^L)^2\) and \(\frac{1}{4}\sum _{i\in \Gamma ^+} (\gamma _i^U-\gamma _i^L)^2\) the maximum errors for \(Er_6(x)\) and \(Er_7(x)\), respectively, obtained at \(\tilde{v}_i^Tx=\frac{\theta _i^L+\theta _i^U}{2}\), \(i\in \Theta ^+\), and \(\hat{v}_i^Tx=\frac{\gamma _i^L+\gamma _i^U}{2}\), \(i\in \Gamma ^+\).

Proof

The thesis follows along the same lines as Theorems 4 and 5. \(\square\)
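A small numerical check (illustrative sketch) of the construction behind \((u_7)\): the matrix \(Q_2\) is positive semidefinite with rank at most \(\min \{n,p\}\), so the concave part of the D.C. form (5) involves at most \(\min \{n,p\}\) branching variables \(\gamma _i\).

```python
# Sketch: factorization of Q_2 used by (u_7) and size of the branching-variable set.
import numpy as np

rng = np.random.default_rng(2)
n, p = 6, 3
C = rng.integers(-4, 5, (p, n)).astype(float)              # rows c_i^T
D = rng.integers(-4, 5, (p, n)).astype(float)              # rows d_i^T

Q2 = 0.25 * sum(np.outer(C[i] - D[i], C[i] - D[i]) for i in range(p))
lam, U = np.linalg.eigh(Q2)                                 # nonnegative spectrum (up to rounding)
Gamma_plus = [i for i in range(n) if lam[i] > 1e-10]
V_hat = np.array([np.sqrt(lam[i]) * U[:, i] for i in Gamma_plus]).reshape(len(Gamma_plus), n)

x = rng.standard_normal(n)
print(len(Gamma_plus), "branching variables, at most min(n, p) =", min(n, p))
print(np.isclose(x @ Q2 @ x, np.sum((V_hat @ x) ** 2)))     # x^T Q_2 x = sum (v_hat_i^T x)^2
```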

3.4 A particular case

In many applicative problems (see the forthcoming Sect. 5), the linear multiplicative objective function \(f:{\mathbb {R}}^n \rightarrow {\mathbb {R}}\) has the following particular structure:

$$\begin{aligned} f(z,\beta ,g)=\sum _{i=1}^{p}\beta _ig_i + q(z,\beta ,g), \end{aligned}$$
(6)

where \({z\in {\mathbb {R}}^{m}}\), \(\beta :=(\beta _{i})_{i=1,\ldots ,p} \in {\mathbb {R}}^{p}\), \(g:=(g_{i})_{i=1,\ldots ,p} \in {\mathbb {R}}^{p}\), \(q(z,\beta ,g)\) a linear or a convex quadratic term, and \(n:=m+2p\). In this light, it is worth studying the behavior of the underestimation functions proposed so far in the particular case of objective functions of type (6).

Theorem 7

Let \(f:{\mathbb {R}}^n \rightarrow {\mathbb {R}}\) be expressed as in (6). Then, the following underestimation functions for f are stated:

  1. (i)

    for \(\beta \in [\beta ^L,\beta ^U]\) and \(g\in [g^L,g^U]\), it results

    $$\begin{aligned} \Phi _{1}(z,\beta ,g)&= \displaystyle q(z,\beta ,g) + \sum _{i =1}^p \left( g_i^L \beta _i+\beta _i^L g_i-\beta _i^Lg_i^L\right) ,\\ \Phi _{2}(z,\beta ,g)&= \displaystyle q(z,\beta ,g) + \sum _{i =1}^p \left( g_i^U \beta _i+\beta _i^U g_i-\beta _i^U g_i^U\right) ,\\ \Phi _3(z,\beta ,g)&= \left( \frac{1}{2}\sum _{i=1}^{p} (\beta _i+g_i)^2+q(z,\beta ,g)\right) \\{} & {} +\frac{1}{2} \sum _{i=1}^{p} \left( \beta _i^L\beta _i^U+g_i^Lg_i^U - (\beta _i^L+\beta _i^U) \beta _i-(g_i^L+g_i^U)g_i\right) , \end{aligned}$$

    and \(\Phi _6(z,\beta ,g)=\Phi _3(z,\beta ,g)\);

  2. (ii)

    for \((\beta -g)\in [\sigma ^L,\sigma ^U]\), it results

    $$\begin{aligned} \Phi _4(z,\beta ,g)&= \displaystyle \left( \frac{1}{4}\sum _{i=1}^{p} (\beta _i+g_i)^2+q(z,\beta ,g)\right) \\{} & {} +\frac{1}{4}\sum _{i =1}^p \left( \sigma _i^L\sigma _i^U-(\sigma _i^L+\sigma _i^U)(\beta _i-g_i)\right) \end{aligned}$$

    and \(\Phi _0(z,\beta ,g)=\Phi _7(z,\beta ,g)=\Phi _4(z,\beta ,g)\).

Proof

Firstly, from Theorem 3, it results:

$$\begin{aligned}& \triangleright \ \ \displaystyle f(z,\beta ,g)=\left( \frac{1}{2}\sum _{i=1}^{p} (\beta _i+g_i)^2+q(z,\beta ,g)\right) -\left( \frac{1}{2}\sum _{i=1}^{p} \left( \beta _i^2+g_i^2\right) \right) ,\end{aligned}$$
$$\begin{aligned}& \triangleright \ \ \displaystyle f(z,\beta ,g)=\left( \frac{1}{4}\sum _{i=1}^{p} (\beta _i+g_i)^2+q(z,\beta ,g)\right) -\left( \frac{1}{4}\sum _{i=1}^{p} (\beta _i-g_i)^2\right)\end{aligned}$$

Hence, the introduced \(\Phi _1\), \(\Phi _2\), \(\Phi _3\) and \(\Phi _4\) are underestimation functions for f due to Theorems 3, 4 and 5. As regards to \(\Phi _0\) notice that:

$$\begin{aligned} \beta _i g_i = \frac{1}{2} (\beta _i, g_i)\left[ \begin{array}{cc} 0 &{} 1\\ 1 &{} 0\end{array}\right] \left( \begin{array}{c} \beta _i\\ g_i \end{array}\right) . \end{aligned}$$

In particular, \(\left[ \begin{array}{cc} 0 &{} 1\\ 1 &{} 0\end{array}\right]\) has eigenvalues \(\lambda =1\) and \(\lambda =-1\) with corresponding eigenvectors \((1,1)^T\) and \((1,-1)^T\), respectively; hence, \(\left( (1,-1)(\beta _i, g_i)^T\right) ^2=(\beta _i-g_i)^2\). Instead, with respect to \(\Phi _6\), notice that:

$$\begin{aligned} \beta _i^2 + g_i^2 = (\beta _i, g_i)\left[ \begin{array}{cc} 1 &{} 0\\ 0 &{} 1 \end{array}\right] \left( \begin{array}{c} \beta _i\\ g_i \end{array}\right) . \end{aligned}$$

Being that \(\left[ \begin{array}{cc} 1 &{} 0\\ 0 &{} 1\end{array}\right]\) has eigenvalue \(\lambda =1\) with algebraic multiplicity equal to 2 and orthogonal eigenvectors \((1,0)^T\) and \((0,1)^T\), then \(\left( (1,0)(\beta _i, g_i)^T\right) ^2+\left( (0,1)(\beta _i, g_i)^T\right) ^2=\beta _i^2 + g_i^2\). Furthermore, relatively to \(\Phi _7\), notice that:

$$\begin{aligned} (\beta _i - g_i)^2 = (\beta _i, g_i)\left[ \begin{array}{cc} 1 &{} -1\\ -1 &{} 1 \end{array}\right] \left( \begin{array}{c} \beta _i\\ g_i \end{array}\right) . \end{aligned}$$

In particular, \(\left[ \begin{array}{cc} 1 &{} -1\\ -1 &{} 1\end{array}\right]\) has eigenvalues \(\lambda =2\) and \(\lambda =0\) with corresponding eigenvectors \((1,-1)^T\) and \((1,1)^T\), respectively; hence, \(\left( (1,-1)(\beta _i, g_i)^T\right) ^2=(\beta _i-g_i)^2\). \(\square\)

Remark 5

It is worth underlining that (ii) of Theorem 7 is of great interest in the light of the computational results that will be presented in the next section.

4 A computational experience

The solution method described and discussed in the previous sections has been implemented in a macOS 12.5.1 environment with an M1 Pro 10-core processor, MATLAB 2022a for coding and Gurobi 9.5.2 as the solver for LP and QP problems. In this section, the results of some computational tests are presented, where the performances are compared with respect to the various underestimation functions previously studied. In this light, various instances have been randomly generated by using the “randi()” MATLAB function (integer numbers generated with uniform distribution). The average times spent to solve the instances (obtained with the “tic” and “toc” MATLAB commands), as well as the average number of iterations in procedure “Solve()” needed to solve them, are given as results of the computational tests. The used tolerance parameters are \(relTol=2^{-35}\) and \(errTol=2^{-20}\). For the sake of simplicity, in the instance generation we fixed the values \(a_0=0\) and \({c_0}_i,{d_0}_i=0\) for all \(i=1,\dots ,p\), we considered no equality constraints \(A_{eq}x=b_{eq}\), and we used a number of inequality constraints \(m=2n\). Moreover, the following two cases are taken into account in the instance generation (a sketch of a possible generator for the first case is given after the list):

  • A “general” case, where vectors a, \(c_i\) and \(d_i\) have been randomly generated with components in the interval \([-4,4]\), vectors \(l_b\) and \(u_b\) have been generated with components in the interval \([-10,10]\), matrix \(A_{in}\) has been generated with components in the interval \([-10,10]\), and \(b_{in}\) has been generated in order to guarantee a nonempty feasible region different from the box \([l_b,u_b]\);

  • A “nonnegative” case, where vectors \(c_i\) and \(d_i\) have been randomly generated with components in the interval [0, 4], vector a has been generated with components in the interval \([-4,4]\), vectors \(l_b\) and \(u_b\) have been generated with components in the interval [0, 15], matrix \(A_{in}\) has been generated with components in the interval \([-10,10]\), and \(b_{in}\) has been generated in order to guarantee a nonempty feasible region different from the box \([l_b,u_b]\).
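For reproducibility, the following sketch shows one way the “general” instances above could be generated. The details (in particular the split of \(l_b\) and \(u_b\) into negative and positive components, and how \(b_{in}\) is chosen to keep the feasible region nonempty) are our own assumptions; the paper's MATLAB generator based on randi() may differ.

```python
# Sketch: one possible generator for the "general" instances of Sect. 4.
import numpy as np

def generate_general_instance(n, p, rng):
    m = 2 * n                                               # number of inequality constraints
    C = rng.integers(-4, 5, (p, n)).astype(float)           # rows c_i^T, entries in [-4, 4]
    D = rng.integers(-4, 5, (p, n)).astype(float)           # rows d_i^T, entries in [-4, 4]
    a = rng.integers(-4, 5, n).astype(float)
    lo = rng.integers(-10, 0, n).astype(float)              # lower bounds in [-10, -1]
    hi = rng.integers(1, 11, n).astype(float)               # upper bounds in [1, 10]
    A_in = rng.integers(-10, 11, (m, n)).astype(float)
    x_ref = rng.uniform(lo, hi)                             # reference point kept feasible
    b_in = A_in @ x_ref + rng.uniform(0.5, 5.0, m)          # cuts the box but keeps x_ref inside
    return C, D, a, lo, hi, A_in, b_in

C, D, a, lo, hi, A_in, b_in = generate_general_instance(n=10, p=4, rng=np.random.default_rng(7))
print(C.shape, A_in.shape)                                  # (4, 10) (20, 10)
```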

Remark 6

The “nonnegative” case is aimed at studying the behavior of the underestimation functions when both variables and branching variables are nonnegative, as sometimes assumed in the literature (see, e.g., Zhou et al. 2015; Shen et al. 2020), thus covering as a particular case the applicative problems described in Sect. 5.

4.1 A first comparison of all the underestimations

First of all, it is worth comparing all the introduced underestimation functions. The standard value \(\alpha =0.5\) has been assumed for splitting all the underestimations. Starting from instances having \(n=10\) and \(p=4\), the behaviors with p increased to \(p=6\) and with n increased to \(n=20\) are considered too. Moreover, both the “general” and the “nonnegative” cases are taken into account. The average times and average iterations are given as results of this first computational test and are summarized in the following tables. Six groups of instances (depending on n, p, and “general”/“nonnegative” cases) and 100 instances for each group are considered, with a grand total of 4200 problems solved.

Table 1 Average number of iterations
Table 2 Average elapsed times (secs)

The results provided in Tables 1 and 2 point out that:

  • In the “general” case the linear underestimation functions provide very poor tightness and hence very poor performance, while in the “nonnegative” case their performance has the same order of magnitude as the other underestimations;

  • Among the quadratic underestimation functions, \(u_4\) is always much better than \(u_3;\)

  • Among the hybrid underestimation functions, \(u_7\) is always much better than \(u_6\) (this follows being \(u_6\) derived from \(u_3\) and being \(u_7\) derived from \(u_4\));

  • The most performing underestimation functions are always \(u_0\), \(u_4\), \(u_7;\)

  • In the “nonnegative” case, the performance of the underestimation functions is always better than in the “general” case: this is due to the tightness of the underestimation functions, which is more effective in the “nonnegative” case than in the “general” one;

  • Among the linear and quadratic underestimation functions, increasing the value of p affects the performances much more than increasing the number of variables n;

  • Performances follow the number of branching variables of the various underestimation functions; in this light, recall that \(u_1\), \(u_2\) and \(u_3\) have 2p branching variables, \(u_4\) has p branching variables, while \(u_0\), \(u_6\) and \(u_7\) have \(\left| \Lambda ^-\right|\), \(\left| \Theta ^+\right|\) and \(\left| \Gamma ^+\right|\) branching variables, respectively.

4.2 A deep comparison of \(u_0\), \(u_4\), \(u_7\)—part 1

The previous subsection pointed out that the most performing underestimations are \(u_0\), \(u_4\) and \(u_7\) (and recall that, in the particular case of Sect. 3.4, these underestimations coincide). The aim of this subsection is to focus on the behavior of these underestimations with respect to the parameter \(\alpha\) used to split the partitions. Assuming a number of variables \(n=25\), instances for \(p=4\), \(p=7\) and \(p=10\) are considered in both the “general” and the “nonnegative” cases. Values of \(\alpha\) from 0 to 1 are tested. The average times and average iterations are given as results of this second computational test and are summarized in the following tables. Six groups of instances (depending on p, and “general”/“nonnegative” cases) and 100 instances for each group are considered, with a grand total of 19800 problems solved. Numbers in bold emphasize the best results (lower values) in terms of iterations or computational time. Notice that, to the best of our knowledge, no detailed studies have been published regarding the impact of the splitting parameter \(\alpha\) on the behavior of the branch-and-bound method (for example, just \(\alpha =0.25\) is used in Gerard et al. (2017) and just \(\alpha =0.8\) is considered in Fampa et al. (2017)). Taking into account the results in Tables 3, 4, 5 and 6, it is worth noticing that:

Table 3 Average number of iterations—“general” case—\(n=25\)
Table 4 Average elapsed times (secs)—“general” case—\(n=25\)
Table 5 Average number of iterations—“nonnegative” case—\(n=25\)
Table 6 Average elapsed times (secs)—“nonnegative” case—\(n=25\)
  • underestimation functions \(u_4\) and \(u_7\) have similar performances when \(p<n\); in this light, notice that \(u_4\) can be obtained easily while \(u_7\) needs some eigenvectors to be computed; take into account also that in the case \(p>n\) underestimation \(u_7\) has fewer branching variables than \(u_4\) and hence performs better;

  • \(u_0\) has the best performance, but requires some eigenvectors to be computed;

  • the use of \(\alpha =1.0\) should be avoided since it yields poor performance;

  • the larger the value of p, the smaller the value of \(\alpha\) providing the best performance; in this light, the fixed values suggested in Fampa et al. (2017) and Gerard et al. (2017) do not seem appropriate in general;

  • performance in the “nonnegative” case is much better than in the “general” case (the underestimations are tighter in the “nonnegative” case than in the “general” one);

  • performance degrades exponentially with respect to the number of branching variables (and, hence, with respect to p);

  • as regards the particular class of problems described in Sect. 3.4, in which \(u_0\), \(u_4\) and \(u_7\) coincide, the best choice is \(u_4\), which requires no eigen-decomposition.
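To clarify the role of \(\alpha\), the following minimal sketch (in Python) illustrates the splitting rule assumed in this discussion, namely that the interval of the selected branching variable is split at the fraction \(\alpha\) of its length; this is only an illustrative assumption, the actual selection and splitting criteria being those analyzed in Sect. 2.

```python
# Minimal sketch of the role of the splitting parameter alpha at a
# branch-and-bound node: the interval [lo, up] of the selected branching
# variable is split at lo + alpha * (up - lo). Purely illustrative; the
# actual branching criteria are those analyzed in Sect. 2.
from typing import Dict, List, Tuple

Box = Dict[str, Tuple[float, float]]  # variable name -> (lower, upper) bounds


def split_box(box: Box, var: str, alpha: float) -> List[Box]:
    """Split the partition `box` along the branching variable `var`."""
    lo, up = box[var]
    cut = lo + alpha * (up - lo)
    left, right = dict(box), dict(box)
    left[var] = (lo, cut)
    right[var] = (cut, up)
    return [left, right]


# alpha = 0.5 bisects the interval, while alpha close to 1 produces a very
# unbalanced partition (which, according to Tables 3-6, should be avoided).
children = split_box({"y1": (0.0, 4.0), "y2": (-1.0, 1.0)}, "y1", alpha=0.5)
```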

4.3 A deep comparison of \(u_0\), \(u_4\), \(u_7\)—part 2

The aim of this subsection is to focus on the behavior of underestimations \(u_0\), \(u_4\) and \(u_7\) with respect to the number of variables n. Assuming a parameter \(p=10\), instances with \(n=5\), \(n=10\) and \(n=20\) are considered in both the “general” and the “nonnegative” cases, and values of \(\alpha\) from 0 to 1 are tested. The average times and average numbers of iterations obtained in this third computational test are summarized in the following tables. Six groups of instances (depending on n and the “general”/“nonnegative” case), with 100 instances per group, are considered, for a grand total of 19800 problems solved. Numbers in bold emphasize the best results (lower values) in terms of iterations or computational time. In other words, a detailed computational experience is provided to show the behavior of underestimations \(u_0\), \(u_4\) and \(u_7\) in the cases \(n<p\), \(n=p\) and \(n>p\). The results provided in Tables 7, 8, 9 and 10 point out that:

Table 7 Average number of iterations—“general” case—\(p=10\)
Table 8 Average elapsed times (secs)—“general” case—\(p=10\)
Table 9 Average number of iterations—“nonnegative” case—\(p=10\)
Table 10 Average elapsed times (secs)—“nonnegative” case—\(p=10\)
  • Underestimation \(u_0\) is the best performing one (but requires eigenvalues and eigenvectors to be computed);

  • Comparing \(u_4\) and \(u_7\) (recall that \(u_7\) is derived from \(u_4\) by means of an eigen-decomposition), performances are similar when \(n>p\), \(u_7\) is better than \(u_4\) when \(n=p\), and \(u_7\) clearly outperforms \(u_4\) when \(n<p\); this behavior follows from the number of splitting variables, which is smaller than or equal to \(\min \{n,p\}\);

  • Performances in the “nonnegative” case are much better than the ones in the “general” case;

  • The larger the value of n, the smaller the value of \(\alpha\) should be.

4.4 Overall comments

The main results of this computational experience are:

  • \(u_4\) is the best performing underestimation function among those based on the structure of linear multiplicative functions, thus avoiding the numerical difficulties of computing eigenvectors;

  • Linear underestimations, often used in the literature, are actually worse than quadratic underestimations;

  • No single value of the parameter \(\alpha\) is uniformly better than the others, contrary to what the literature suggests;

  • In the particular applicative case described in Sect. 3.4, \(u_4\) is the best choice and there is no need at all to use eigenvectors;

  • The “nonnegative” case yields better performance than the “general” one, a point that deserves further investigation in future research.

5 Toward applications to bilevel problems

In Sect. 3.4, a special linear multiplicative problem is considered in order to point out its behavior with respect to the underestimation functions previously introduced. At the same time, the study of this particular case is also motivated by the fact that the structure (6) commonly appears in several bilevel programming problems of leader-follower type (see, e.g., Dempe 2020 for a state of the art). More precisely, leader-follower problems are hierarchical mathematical formulations with an upper- and a lower-level structure: the upper (leader) level is an optimization problem partially constrained by a second, parametric, optimization problem acting as the lower (follower) level.

In many real-world applications formulated as bilevel programming problems, the upper/leader-level objective function is defined as in (6), that is, \(f(z,\beta ,g)=\sum _{i=1}^{p}\beta _ig_i + q(z,\beta ,g)\). By using the notation introduced in Sect. 3.4, if \(\beta :=(\beta _{i})_{i=1,\ldots ,p} \in {\mathbb {R}}^{p}\) represents the upper/leader variable (for instance, a vector of clearing prices), \(g:=(g_{i})_{i=1,\ldots ,p} \in {\mathbb {R}}^{p}\) represents the lower/follower variable (for instance, a vector of quantities sold) and \(x:=(z,\beta )\) so that \(f(z,\beta ,g) \equiv f(x,g)\), then the standard form of the resulting optimistic bilevel programming problem is the following:

$$\begin{aligned}&\min _{x,g} f(x,g)\nonumber \\&\ \text {s.t.}\ \ {\left\{ \begin{array}{ll} x \in X\\ g \in \left\{ g \in K(x): h(x,g)=\min _{\tilde{g} \in K(x)}h(x,\tilde{g})\right\} , \end{array}\right. } \end{aligned}$$
(7)

where

$$\begin{aligned} X:=\left\{ x \in {\mathbb {R}}^{p+m}: \gamma _{i_{u}}(x) \le 0\,,\ i_{u}=1,\ldots ,k_{u}\right\} \end{aligned}$$

and

$$\begin{aligned} K(x):=\left\{ g \in {\mathbb {R}}^{p}: \zeta _{i_{l}}(x,g) \le 0\,,\ i_{l}=1,\ldots ,k_{l}\right\} \ \ \ \ \forall x \in X \end{aligned}$$

are the upper/leader and the lower/follower feasible sets, respectively. Under suitable assumptions (see, e.g., Dempe and Zemkoho 2012, 2013) and under appropriate constraint qualification conditions (see, e.g., Aussel and Svensson 2019; Dempe 2020), the optimistic bilevel programming problem (7) can be transformed into a single-level optimization problem, that is, a Mathematical Program with Equilibrium Constraints (MPEC) in which the lower/follower problem is reformulated by means of the Karush-Kuhn-Tucker (KKT) optimality conditions. Notice that, in this case, the lower-level KKT conditions read:

$$\begin{aligned} {\left\{ \begin{array}{ll} \nabla _{g} h(x,g) + \nabla _{g} \zeta (x,g) ^{T} \eta = 0,\\ \eta \underline{\ge } 0,\ \ -\zeta (x,g)\underline{\ge } 0, \ \ \eta ^{T}\zeta (x,g)=0. \end{array}\right. } \end{aligned}$$
(8)
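Just to fix ideas with a purely illustrative special case (not taken from the paper), if the lower-level data are linear in \(g\), say \(h(x,g)=c(x)^{T}g\) and \(\zeta (x,g)=Ag-b(x)\) for suitable \(c(x)\), \(A\) and \(b(x)\), then conditions (8) specialize to:

$$\begin{aligned} {\left\{ \begin{array}{ll} c(x) + A^{T} \eta = 0,\\ \eta \underline{\ge } 0,\ \ b(x)-Ag\underline{\ge } 0, \ \ \eta ^{T}\left( Ag-b(x)\right) =0. \end{array}\right. } \end{aligned}$$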

Then, the resulting KKT reformulation of the optimistic bilevel programming problem (7) is:

$$\begin{aligned}&\min _{x,g,\eta } f(x,g)\nonumber \\&\ \text {s.t.}\ \ {\left\{ \begin{array}{ll} x \in X\\ (g,\eta ) \in KKT(x):=\{(g, \eta ):\ \text{ conditions } \ (8)\ \text{ are } \text{ satisfied } \} \end{array}\right. } \end{aligned}$$
(9)

Remark 7

The KKT reformulation makes it possible to transform the optimistic bilevel programming problem (7) into the single-level program (9) at the cost of introducing the new variables \(\eta\) in the formulation of the problem. In addition, the presence of complementarity constraints implies that the feasible region of the resulting problem is no longer a polyhedron.

From a computational point of view, the linear multiplicative function f can be handled by suitably applying the results obtained in Sect. 3.

Moreover, the complementarity constraints coming from the lower/follower problem in the KKT reformulation, once merged with the upper/leader level, can be managed in various ways (an illustrative sketch of the big-M variant is given after the following list):

  • If the upper and lower levels of (7) are linear multiplicative, then a classical procedure to solve bilevel programs is to consider a suitable penalization of \(\eta ^{T}\zeta (x,g)\) in the objective, which can be used to obtain a linear multiplicative single-level optimization problem (see Section 2.4.3 in Bard 1997, for example) at the cost of introducing a new penalty parameter in the formulation of the problem;

  • An outer approximation branch-and-bound method based on a relaxation of the feasible region in which some of the products \(\eta _{i_{l}}\zeta _{i_{l}}(x,g)=0\) are omitted and some of the quantities \(\eta _{i_{l}}\) or \(\zeta _{i_{l}}(x,g)\) are fixed to zero;

  • A quadratic indefinite penalty function \(M\sum _{i_{l}=1}^{k_{l}} \eta _{i_{l}}\left( -\zeta _{i_{l}}(x,g)\right)\), with \(M\gg 0\) large enough, to be added to the objective function (again, a branch-and-bound approach may be needed);

  • Binary \(0-1\) variables and the so-called big-M method (each constraint \(\eta _{i_{l}}\zeta _{i_{l}}(x,g)=0\) is replaced by \(\eta _{i_{l}}\le \delta _{i_{l}} M\) and \(-\zeta _{i_{l}}(x,g)\le (1-\delta _{i_{l}})M\), with \(\delta _{i_{l}}\in \{0,1\}\) and \(M\gg 0\) large enough); a branch-and-bound approach may be needed to manage the binary variables.
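As a minimal illustration of the last option, the following sketch (written in Python with the Pyomo modeling library) shows the big-M reformulation of a single complementarity constraint; the data, the bound M and the placeholder objective are purely hypothetical and are not taken from the paper.

```python
# Illustrative big-M reformulation of one complementarity constraint
# eta * zeta = 0 with eta >= 0 and zeta <= 0 (here zeta is treated as a
# variable standing for the value of zeta_{i_l}(x, g)).
import pyomo.environ as pyo

M = 1.0e4  # hypothetical big-M constant: must bound eta and -zeta from above

m = pyo.ConcreteModel()
m.eta = pyo.Var(within=pyo.NonNegativeReals)    # lower-level multiplier
m.zeta = pyo.Var(within=pyo.NonPositiveReals)   # value of the constraint function
m.delta = pyo.Var(within=pyo.Binary)            # selects which factor is forced to 0

# delta = 0 forces eta = 0, delta = 1 forces zeta = 0
m.c1 = pyo.Constraint(expr=m.eta <= m.delta * M)
m.c2 = pyo.Constraint(expr=-m.zeta <= (1 - m.delta) * M)

# Placeholder objective, only so that the toy model is complete
m.obj = pyo.Objective(expr=m.eta - m.zeta, sense=pyo.minimize)
```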

Remark 8

Complementarity constraints can be directly managed by means of a branch and bound approach in which various subproblems are solved. Moreover, further approaches can be used whenever the available solvers are able to handle particular kinds of constraints. In this light, since \(\eta _{i_{l}}\ge 0\) and \(-\zeta _{i_{l}}(x,g)\ge 0\), the following conditions (i)–(v) are equivalent:

  (i) \(\eta _{i_{l}}\zeta _{i_{l}}(x,g)=0\);

  (ii) \(\left( \eta _{i_{l}}+\zeta _{i_{l}}(x,g)\right) ^2 =\left( \eta _{i_{l}}-\zeta _{i_{l}}(x,g)\right) ^2\);

  (iii) \(\eta _{i_{l}}-\zeta _{i_{l}}(x,g) =\left| \eta _{i_{l}}+\zeta _{i_{l}}(x,g)\right|\);

  (iv) \(\eta _{i_{l}}-\zeta _{i_{l}}(x,g) =\max \left\{ \eta _{i_{l}}+\zeta _{i_{l}}(x,g);0\right\} +\max \left\{ -\eta _{i_{l}}-\zeta _{i_{l}}(x,g);0\right\}\);

  (v) \(\left\{ \eta _{i_{l}},\zeta _{i_{l}}(x,g)\right\}\) is a “SOS1” set (Special Ordered Set of type 1, that is, a set of variables where at most one member may be nonzero, all the others being at 0).

As a consequence, solvers able to manage quadratic indefinite constraints, absolute value constraints, “max” constraints or SOS constraints could be useful too.
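As a minimal illustration of option (v), the following Pyomo sketch introduces a nonnegative auxiliary variable standing for \(-\zeta _{i_{l}}(x,g)\) and declares it, together with \(\eta _{i_{l}}\), as an SOS1 set; again, the data and the placeholder objective are purely hypothetical and not taken from the paper.

```python
# Illustrative SOS1 modeling of one complementarity constraint:
# y[1] plays the role of eta >= 0, y[2] plays the role of -zeta >= 0,
# and at most one of the two may be nonzero.
import pyomo.environ as pyo

m = pyo.ConcreteModel()
m.I = pyo.RangeSet(1, 2)
m.y = pyo.Var(m.I, within=pyo.NonNegativeReals)

# SOS1 constraint: at most one member of {y[1], y[2]} is nonzero
m.compl = pyo.SOSConstraint(var=m.y, sos=1)

# Placeholder objective, only so that the toy model is complete
m.obj = pyo.Objective(expr=m.y[1] + m.y[2], sense=pyo.minimize)
```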

In the context of this class of problems, numerical experiments providing a performance comparison with the works available in the literature (see Kleinert and Schmidt 2021, for example) would certainly be of great interest, but they are beyond the scope of this work. Future investigations will move in this direction, in connection with the study of suitable real-world problems.

6 Conclusions

In this paper, an extended computational experience regarding linear multiplicative problems is provided and a unifying framework to approach such problems with a branch-and-bound method is fully described. In this light, several underestimation functions are studied and various partitioning criteria are compared. In particular, it has been shown that the quadratic underestimation function \(u_4\) should be chosen whenever eigenvector-based approaches are to be avoided. As regards the splitting parameter \(\alpha\), the higher the number of branching variables (or the number of variables), the smaller the value of \(\alpha\) should be. The proposed solution method proves to be very efficient in the case of nonnegative variables and nonnegative branching variables. In this light, a particular case (useful in applications and in bilevel programming) has been studied in depth from the theoretical point of view, pointing out that \(u_4\) is the underestimation to be used. The results obtained within the proposed unifying framework provide detailed comparisons and improvements with respect to the current literature.