1 Introduction

This paper focuses on the solution of root-finding problems in several variables where the system is composed of algebraic second-degree equations. This kind of problem is of interest in many application areas, including queuing problems, neutron transport theory, and linear quadratic differential games (see, e.g., [29] and references therein). Our work is motivated by the study of nonnegative steady states of biological interaction networks, which frequently arise in systems biology [10, 18, 19]. In particular, we consider the problem of determining the steady states of complex chemical reaction networks (CRNs) in healthy and cancer cells [23, 34]. In this respect, the G1/S transition of the cell cycle is a critical phase in which a cell commits to a healthy or cancerous behavior depending on the presence of certain predetermined genetic mutations. Several hundred proteins and as many reactions are involved in this transition, resulting in large ODE systems. A method for rapidly solving the system of equations is critical for modeling therapies that alter the rate constants of the system, such as current molecularly targeted therapies, and the in silico study of such therapies is a primary perspective that this method aims to support. Indeed, by applying the law of mass action, the kinetics of the concentrations of the proteins involved in the network can be modeled by a large first-order polynomial system of ordinary differential equations (ODEs); when no exogenous factors are considered, the system is quadratic and autonomous [8, 17, 44]. From an abstract point of view, the steady states of large systems of quadratic equations are far from being known, as a general theory exists only up to the two-dimensional case [31]. In the case of an ODE system representing a CRN, the unknown concentrations cannot assume negative values, and therefore the asymptotic steady states must fulfill the nonnegative constrained algebraic system of equations obtained by setting the time derivatives of the ODE system equal to zero. The number of unknown protein concentrations involved may scale up to several hundred, and an efficient and accurate algorithm for solving the nonnegative steady-state problem is at the basis of tuning the kinetic parameters of the ODE system starting from experimental data, thus enabling the study of cancer cell behavior in real applications.

From a computational point of view, the equilibrium can be found either directly, by taking the limit of the flow of the ODE system, or by imposing the vanishing of the derivatives and solving the corresponding root-finding problem. The direct approach is computationally expensive, especially when the orbits of the dynamical system bend around the equilibrium point, as the time needed to run across the orbit may become arbitrarily large [15]. On the other hand, the main drawback of the second strategy is that these systems do not usually exhibit any mathematical property ensuring convergence of the root-finding algorithm. Indeed, pertinent good mathematical properties, such as matrix positive definiteness, depend on the form of the considered biological network and are not guaranteed in the general case. The typical structure of the ODE system associated with a chemical reaction network (based on the mass action law) not only prevents us from exploiting recent methods that find the steady-state solutions by solving vector quadratic equations [29], but also makes it difficult to use classical methods, such as Newton's method or gradient descent. Indeed, since the nonnegative steady state normally belongs to the boundary of the positive cone, and therefore has many components equal to zero, classical nonnegative projected Newton-type methods are unstable, as the Jacobian matrix, computed in a neighborhood of the solution, is highly sparse and noninvertible. Moreover, classical projected gradient methods are known to be stable but slowly convergent, especially when coupled with a nonnegative projection.

In this work, we propose to overcome these limitations by introducing a root-finding strategy that combines steps of Newton's method with steps of gradient descent. While Newton's method is applied to the algebraic system of equations, gradient descent is applied to a scalarized version of the system, i.e., it minimizes the norm of the left-hand side of the equation system [24]. To make Newton's method more stable, instead of the standard orthogonal projection we use a nonlinear projection operator onto the nonnegative orthant, essentially an idempotent operator that yields small positive entries rather than zero components. This improves the condition number of the Jacobian matrix, preventing the Newton step from becoming unstable and hence making regularization unnecessary. We therefore combine the (nonlinearly projected) Newton's method with a gradient method that iteratively refines the starting point of the former until convergence to a nonnegative stationary point is reached. We prove the convergence of this combined technique provided that a proper backtracking rule is adopted for the gradient method. Moreover, we test the efficiency of the proposed technique on simulated CRN data, showing that, compared to standard ODE solvers, this method computes the steady states with greater accuracy in less time. The MATLAB® codes implementing the proposed approach are freely available at the GitHub repository https://github.com/theMIDAgroup/CRC_CRN.git.

The rest of the paper is organized as follows. In Sect. 2, we introduce the mathematical formulation of the problem and describe the proposed algorithm, whose convergence properties are studied in Sect. 3. In Sect. 4, we consider the problem of finding the asymptotically stable states of a CRN and reformulate it as a nonnegative constrained root-finding problem. In Sect. 5, we show the results obtained by applying NLPC to a CRN designed for modeling cell signaling in colorectal cells and the most common mutations occurring in colorectal cancer. Finally, our conclusions are offered in Sect. 6.

2 Mathematical Formulation

We consider the box-constrained set of nonlinear equations

$$\begin{aligned}{} & {} \textbf{f}(\textbf{x}) = \textbf{0} \nonumber \\{} & {} \textbf{x} \in \varOmega , \end{aligned}$$
(1)

where \(\varOmega = \varOmega _1 \times \dots \times \varOmega _n\) is the Cartesian product of n closed intervals \(\varOmega _i \subseteq {\mathbb {R}}\), and \(\textbf{f}: {\mathbb {R}}^n \rightarrow {\mathbb {R}}^n\) is a continuously differentiable function on \(\varOmega \). In the considered problem of finding the nonnegative steady states of quadratic autonomous ODE systems, \(\textbf{f}\) is composed of second-degree polynomials and \(\varOmega \) is the positive convex cone.

Several numerical approaches have been proposed to solve (1). Among these, a classical fast approach is the projected Newton’s method [2, 3] where the projector on the closed convex set \(\varOmega \), \(P:{\mathbb {R}}^n \rightarrow \varOmega \) such that for all \(\textbf{z} \in {\mathbb {R}}^n\)

$$\begin{aligned} P(\textbf{z}) = \underset{\textbf{y} \in \varOmega }{\textrm{argmin}} || \textbf{y} - \textbf{z}|| , \end{aligned}$$
(2)

is applied at each iteration of a Newton scheme so that the final solution satisfies the box constraints in (1). However, in the general case the convergence properties of projected Newton's methods strongly depend on the initial point, as no global convergence is guaranteed [26]. Additionally, the standard orthogonal projector P tends to produce iterative estimates on the boundary of \(\varOmega \) (e.g., when \(\varOmega \) is the positive cone, P sets all the negative components to zero), and therefore it may compromise the stability of the Newton's method, as the Jacobian of \(\textbf{f}\) computed at these boundary estimates can be singular.

An alternative approach to Newton’s method consists in using the projected gradient descent method [21, 25] for solving the optimization problem

$$\begin{aligned} \textbf{x} = \underset{\textbf{x} \in \varOmega }{\textrm{arg min}} \, \varTheta (\textbf{x}) , \end{aligned}$$
(3)

where

$$\begin{aligned} \varTheta (\textbf{x}) = \frac{1}{2} ||\textbf{f}(\textbf{x})||^2 . \end{aligned}$$
(4)

As opposite to Newton’s method, many convergence results may be proved for the projected gradient methods, see, e.g., [3, 42] and references therein. On the other hand, the projected gradient method only has a sub-linear convergence rate and thus results to be slower than the Newton’s algorithm also when properly designed strategies for selecting the step size are used [1, 11, 12, 33].

Motivated by this consideration, some recent works have proposed to combine the two approaches [9, 13, 22, 36]. Along these lines, we present the nonlinearly projected combined (NLPC) method, summarized in Algorithm 1. NLPC relies on two main ideas. First, we replace the classical projector with a novel operator \(\mathcal {P}\), introduced in the following definition, which ensures that the constraint \(\textbf{x} \in \varOmega \) is respected while lowering the probability that the points defined at each iteration reach the boundary of \(\varOmega \).

Definition 2.1

Given \(\varOmega = \varOmega _1 \times \dots \times \varOmega _n\), with \(\varOmega _i \subseteq {\mathbb {R}}\) convex for all \(i \in \{1, \dots , n\}\), and given \(\textbf{x} = \left( x_1, \dots , x_n \right) ^\top \in \varOmega \), we define the operator \(\mathcal {P}(\, \cdot \, ; \, \textbf{x}): {\mathbb {R}}^n \rightarrow \varOmega \) so that, for all \(\textbf{z} \in {\mathbb {R}}^n\), \(\mathcal {P}(\textbf{z} \, ; \, \textbf{x}) = \left( p_{1}(z_1 \, ; \, x_1), \dots , p_{n}(z_n \, ; \, x_n) \right) ^\top \) where

$$\begin{aligned} p_{i}(v \, ; \, w) = {\left\{ \begin{array}{ll} v \quad \text {if} \quad v \in \varOmega _i\\ w \quad \text {if} \quad v \not \in \varOmega _i \end{array}\right. } \end{aligned}$$

with \(v\in {\mathbb {R}}\) and \(w\in \varOmega _i \subset \mathbb R\).
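
When each \(\varOmega _i\) is an interval \([\ell _i, u_i]\), the operator \(\mathcal {P}\) acts componentwise: it keeps the entries of \(\textbf{z}\) that are already feasible and resets the others to the corresponding entries of the current point \(\textbf{x}\). A minimal MATLAB sketch (function and variable names are ours, not those of the released code) is the following; for \(\varOmega = {\mathbb {R}}^n_+\) one would take lb = zeros(n,1) and ub = inf(n,1).

```matlab
function p = nlp_project(z, x, lb, ub)
% Componentwise nonlinear projector of Definition 2.1 for box constraints
% Omega_i = [lb(i), ub(i)].
% z : candidate point (n x 1), possibly infeasible
% x : current point (n x 1), assumed to satisfy lb <= x <= ub
    feasible = (z >= lb) & (z <= ub);   % components of z already in Omega_i
    p = x;                              % infeasible components fall back to x_i
    p(feasible) = z(feasible);
end
```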

The second idea behind the NLPC method was inspired by [9] and consists in trying, at each iteration, a fixed number of step lengths \(\alpha ^j,\, j \in \{ 0, \dots , J\}\), along the Newton direction \(\textbf{d}_k\), where \(\textbf{d}_k\) is defined as the solution of the linear system \({\textbf{J}}_{\textbf{f}}(\textbf{x}_k) \textbf{d}_k = -\textbf{f}(\textbf{x}_k)\), \({\textbf{J}}_{\textbf{f}}(\textbf{x}_k)\) being the Jacobian matrix of \(\textbf{f}\) evaluated at \(\textbf{x}_k\). If none of the tested step sizes satisfies the Armijo rule

$$\begin{aligned} \Vert \textbf{f}(\mathcal {P}(\textbf{x}_k + \alpha ^{j} \textbf{d}_k \, ; \, \textbf{x}_k) )\Vert \le \sqrt{1- \alpha ^{j} \sigma _N} \ \Vert \textbf{f}(\textbf{x}_k)\Vert , \end{aligned}$$
(5)

we then move along the gradient descent direction with a step size chosen so as to satisfy two conditions that, as we shall prove in Theorem 3.5, guarantee a convergence result for the NLPC algorithm.
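
For concreteness, the Newton part of one NLPC iteration could be sketched in MATLAB as follows. This is a simplified sketch under our own naming assumptions: f_fun is a user-supplied function returning \(\textbf{f}(\textbf{x})\) and \({\textbf{J}}_{\textbf{f}}(\textbf{x})\), nlp_project is the operator of Definition 2.1 as sketched above, and alpha, sigma_N, J, lb, ub are the algorithm parameters.

```matlab
% One (simplified) NLPC iteration: try the step sizes alpha^j, j = 0,...,J,
% along the Newton direction against rule (5); if none is accepted,
% fall back to a gradient descent step.
[fk, Jk] = f_fun(xk);                  % f(x_k) and its Jacobian
dk = -(Jk \ fk);                       % Newton direction: J_f(x_k)*d_k = -f(x_k)
accepted = false;
for j = 0:J
    xt = nlp_project(xk + alpha^j * dk, xk, lb, ub);
    if norm(f_fun(xt)) <= sqrt(1 - alpha^j * sigma_N) * norm(fk)   % rule (5)
        xk = xt;  accepted = true;  break
    end
end
if ~accepted
    % move along the gradient descent direction for Theta(x) = 0.5*||f(x)||^2,
    % with a step size satisfying conditions (14)-(15) of Sect. 3
end
```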

Algorithm 1

The NLPC algorithm

3 Convergence Properties of the NLPC Method

We now present a convergence analysis of the NLPC algorithm, after describing the main tools exploited in the algorithm and the main properties of \(\mathcal {P}\).

Definition 3.1

Given \(\textbf{x} \in \varOmega \), \(\textbf{d} \in {\mathbb {R}}^n {\setminus } \{\textbf{0}\} \) and \(\alpha > 0\), we define

$$\begin{aligned}{} & {} \mathcal {B}(\textbf{x}, \textbf{d}):= \left\{ i \in \{1, \dots , n\}\ s.t.\ x_i + \alpha d_i \notin \varOmega _i \ \forall \alpha >0 \right\} \end{aligned}$$
(6)
$$\begin{aligned}{} & {} \mathcal {M}_{\alpha }(\textbf{x}, \textbf{d}):= \left\{ i \in \{1, \dots , n\}\ s.t.\ x_i + \alpha d_i \in \varOmega _i \right\} \end{aligned}$$
(7)
$$\begin{aligned}{} & {} \mathcal {N}_{\alpha }(\textbf{x}, \textbf{d}):= \{1, \dots , n\}{\setminus } \left( \mathcal {B}(\textbf{x}, \textbf{d}) \cup \mathcal {M}_{\alpha }(\textbf{x}, \textbf{d}) \right) . \end{aligned}$$
(8)

For ease of notation, when \(\textbf{d} = - \nabla \varTheta (\textbf{x})\), the sets defined in (6), (7) and (8) will simply be denoted by \(\mathcal {B}(\textbf{x})\), \(\mathcal {M}_{\alpha }(\textbf{x})\) and \(\mathcal {N}_{\alpha }(\textbf{x})\), respectively.

Fig. 1

a Example where \(i \in \mathcal {B}(\textbf{x}, \textbf{d})\). b Example where \(i \in \mathcal {M}_{\alpha }(\textbf{x}, \textbf{d})\) and \(i \in \mathcal {N}_{2 \alpha }(\textbf{x}, \textbf{d})\)

Remark 3.1

It can be easily shown that, for all \(\alpha > 0\),

$$\begin{aligned} \mathcal {B}(\textbf{x}, \textbf{d})\ \cup \ \mathcal {M}_{\alpha }(\textbf{x}, \textbf{d}) \ \cup \ \mathcal {N}_{\alpha }(\textbf{x}, \textbf{d}) \ = \ \left\{ 1, \dots , n \right\} \end{aligned}$$
(9)

and the three sets are pairwise disjoint. In more detail, as illustratively depicted in Fig. 1a, the set \(\mathcal {B}(\textbf{x}, \textbf{d})\) contains all the coordinates i that prevent \(\textbf{d}\) from being a feasible direction, as moving along the corresponding component \(d_i\) violates the constraint in (1). As an example, when \(\varOmega _i = [\ell _i, u_i]\) with \(\ell _i < u_i\),

$$\begin{aligned} \mathcal {B}(\textbf{x}, \textbf{d}) = \left\{ i \in \{1, \dots , n\}\ s.t. \ (x_i = \ell _i \wedge d_i < 0) \vee (x_i = u_i \wedge d_i > 0)\right\} . \end{aligned}$$

Instead, for a fixed step size \(\alpha >0\), \(\mathcal {M}_{\alpha }(\textbf{x}, \textbf{d})\) contains all the components i for which \(x_i + \alpha d_i\) still satisfies the constraint of the problem, while \(\mathcal {N}_{\alpha }(\textbf{x}, \textbf{d})\) collects the components for which the step size \(\alpha \) is too large, but a feasible vector may be found by lowering it, see Fig. 1b.
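
For box constraints \(\varOmega _i = [\ell _i, u_i]\) with \(\ell _i < u_i\), the three index sets can be computed componentwise; a minimal MATLAB sketch (function and variable names are ours, purely illustrative) is:

```matlab
function [B, M, N] = index_sets(x, d, alpha, lb, ub)
% Index sets of Definition 3.1 for Omega_i = [lb(i), ub(i)], lb(i) < ub(i).
    idx = (1:numel(x))';
    % B: moving along d_i violates the constraint for every alpha > 0
    B = idx( (x == lb & d < 0) | (x == ub & d > 0) );
    % M_alpha: x_i + alpha*d_i is still feasible
    xa = x + alpha * d;
    M = idx( xa >= lb & xa <= ub );
    % N_alpha: remaining components (feasible again for a smaller step size)
    N = setdiff(idx, union(B, M));
end
```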

Proposition 3.1

Given \(\textbf{x} \in \varOmega \) and \(\textbf{d} \in {\mathbb {R}}^n\) it holds

  1. (a)

    For all \(\alpha > 0\)

    $$\begin{aligned} \left( \textbf{x} - \mathcal {P}(\textbf{x}+\alpha \textbf{d} \, ; \, \textbf{x}) \right) ^\textrm{T} \left( \textbf{x} + \alpha \textbf{d} - \mathcal {P}(\textbf{x}+\alpha \textbf{d} \, ; \, \textbf{x}) \right) = 0 \end{aligned}$$
    (10)

    and

    $$\begin{aligned} ||\mathcal {P}(\textbf{x}+\alpha \textbf{d} \, ; \, \textbf{x}) - \textbf{x} || = \alpha \sqrt{\sum _{i \in \mathcal {M}_{\alpha }(\textbf{x}, \textbf{d})}{d_i^2}}; \; \end{aligned}$$
    (11)
  2. (b)

    \({\textbf{g}}: [0, \infty ) \rightarrow \varOmega \) s.t. \({\textbf{g}}(\alpha ) = \mathcal {P}(\textbf{x} + \alpha \textbf{d} \, ; \, \textbf{x})\) is continuous at 0;

  3. (c)

    \(\varphi : (0, \infty ) \rightarrow {\mathbb {R}}\) s.t. \(\varphi (\alpha ) = \frac{||\mathcal {P}(\textbf{x}+\alpha \textbf{d} \, ; \, \textbf{x})- \textbf{x}||}{\alpha }\) is monotonically nonincreasing.

Proof

(a) Equations (10) and (11) follow from Definition 2.1 which implies

$$\begin{aligned} \begin{aligned} ( \textbf{x} -&\mathcal {P}(\textbf{x} + \alpha \textbf{d} \, ; \, \textbf{x}) )^\textrm{T} \left( \textbf{x} + \alpha \textbf{d} - \mathcal {P}(\textbf{x} + \alpha \textbf{d} \, ; \, \textbf{x}) \right) \\&=\sum _{i=1}^n {\left( x_i - p_{i}(x_i + \alpha d_i \, ; \, x_i) \right) \left( x_i + \alpha d_i - p_{i}(x_i + \alpha d_i \, ; \, x_i)) \right) } = 0 \, \end{aligned} \end{aligned}$$

and

$$\begin{aligned} \begin{aligned} ||\mathcal {P}(\textbf{x}+\alpha \textbf{d} \, ; \, \textbf{x}) - \textbf{x} ||&= \sqrt{\sum _{i=1}^n (p_{i}(x_i + \alpha d_i \, ; \, x_i)-x_i)^2} = \alpha \sqrt{\sum _{i\in \mathcal {M}_{\alpha }(\textbf{x}, \textbf{d})} d_i^2} . \end{aligned} \end{aligned}$$

(b) The result directly follows from Eq. (11). Indeed

$$\begin{aligned} ||{\textbf{g}}(\alpha ) - {\textbf{g}}(0)|| = ||\mathcal {P}(\textbf{x}+\alpha \textbf{d} \, ; \, \textbf{x}) - \textbf{x} || = \alpha \sqrt{\sum _{i \in \mathcal {M}_{\alpha }(\textbf{x}, \textbf{d})}{d_i^2}} \le \alpha \ ||\textbf{d}|| \xrightarrow [\alpha \rightarrow 0^+]{} 0 . \end{aligned}$$

(c) We observe that Eq. (11) implies \(\varphi (\alpha ) = \sqrt{\sum _{i\in \mathcal {M}_{\alpha }(\textbf{x}, \textbf{d})} d_i^2}\). Since \(\varOmega _i\) is a convex set, given \(0 \le \alpha _1 \le \alpha _2\) it holds \(\mathcal {M}_{\alpha _2}(\textbf{x}, \textbf{d}) \subseteq \mathcal {M}_{\alpha _1}(\textbf{x}, \textbf{d})\), and thus,

$$\begin{aligned} \begin{aligned} \varphi (\alpha _1) - \varphi (\alpha _2)&= \sqrt{\sum _{i\in \mathcal {M}_{\alpha _1}(\textbf{x}, \textbf{d})} d_i^2} - \sqrt{\sum _{i\in \mathcal {M}_{\alpha _2}(\textbf{x}, \textbf{d})} d_i^2} \ge 0 . \end{aligned} \end{aligned}$$

\(\square \)

As we shall see in the next theorems, the results shown in Proposition 3.1 allow us to prove convergence properties of the proposed NLPC algorithm similar to those holding when the classical projector on the closed set \(\varOmega \) is employed instead of the operator \(\mathcal {P}\) [3, 9].

Theorem 3.1

Given \(\varTheta : {\mathbb {R}}^n \rightarrow {\mathbb {R}}\) a continuously differentiable function on \(\varOmega \) and \(\textbf{x} \in \varOmega \), then \(\textbf{x}\) is a stationary point of \(\varTheta \) in \(\varOmega \) iff

$$\begin{aligned} \mathcal {P}(\textbf{x} - \alpha \nabla \varTheta (\textbf{x}) \, ; \, \textbf{x}) = \textbf{x} \quad \forall \, \alpha > 0 . \end{aligned}$$
(12)

Proof

Let us consider the projector P onto the closed convex set \(\varOmega \) defined in Eq. (2). The following properties hold: (i) \(P(\textbf{z}) = (P_1(z_1), \dots , P_n(z_n))\), with

$$\begin{aligned} P_i(z_i) = \left\{ \begin{array}{cl} \ell _i &{} \quad \text {if} \quad z_i \le \ell _i \\ z_i &{} \quad \text {if} \quad \ell _i< z_i < u_i \\ u_i &{} \quad \text {if} \quad z_i \ge u_i \end{array}\right. , \end{aligned}$$

where we denoted \(\varOmega _i = [\ell _i, u_i]\) with \(\ell _i,\, u_i \in {\mathbb {R}} \cup \left\{ \pm \infty \right\} \); and (ii) \(\textbf{x}\) is a stationary point iff \(P(\textbf{x} - \alpha \nabla \varTheta (\textbf{x})) = \textbf{x}\) \(\forall \, \alpha > 0\) [3].

We now assume that condition (12) holds, and thus, \(\forall i \in \{1, \dots , n\}\),

$$\begin{aligned} p_{i}(x_i - \alpha \partial _i\varTheta (\textbf{x}) \, ; \, x_i) = x_i \quad \forall \alpha > 0 . \end{aligned}$$

For each \(i \in \{1, \dots , n\}\), we then have only three possibilities:

  • \(x_i \in (\ell _i, u_i)\) and \(\partial _i\varTheta (\textbf{x}) = 0\). Then \(P_i(x_i - \alpha \partial _i\varTheta (\textbf{x})) = P_i(x_i) = x_i \ \forall \alpha > 0\);

  • \(x_i = \ell _i\) and \(\partial _i\varTheta (\textbf{x}) \ge 0\). In this case, \(P_i(x_i - \alpha \partial _i\varTheta (\textbf{x})) = \ell _i = x_i \ \forall \alpha > 0\);

  • \(x_i = u_i\) and \(\partial _i\varTheta (\textbf{x}) \le 0\). In this case, \(P_i(x_i - \alpha \partial _i\varTheta (\textbf{x})) = u_i = x_i \ \forall \alpha > 0\).

In all three cases, we obtained \(P_i(x_i - \alpha \partial _i\varTheta (\textbf{x})) = x_i\) \(\forall \alpha > 0\) which implies that \(\textbf{x} \in \varOmega \) is a stationary point on \(\varOmega \).

Conversely, consider a stationary point \(\textbf{x} \in \varOmega \) and let us assume there exists \(\alpha > 0\) such that \(\mathcal {P}(\textbf{x} - \alpha \nabla \varTheta (\textbf{x}); \textbf{x}) \ne \textbf{x}\). From Proposition 3.1 (a) it follows

$$\begin{aligned} \begin{aligned} 0&= \left( \textbf{x} - \mathcal {P}(\textbf{x}-\alpha \nabla \varTheta (\textbf{x}) \, ; \, \textbf{x}) \right) ^\textrm{T} \left( \textbf{x} -\alpha \nabla \varTheta (\textbf{x}) - \mathcal {P}(\textbf{x}-\alpha \nabla \varTheta (\textbf{x}) \, ; \, \textbf{x}) \right) \\&= ||\textbf{x} - \mathcal {P}(\textbf{x}-\alpha \nabla \varTheta (\textbf{x}) \, ; \, \textbf{x})||^2 - \alpha \nabla \varTheta (\textbf{x})^\textrm{T} \left( \textbf{x} - \mathcal {P}(\textbf{x}-\alpha \nabla \varTheta (\textbf{x}) \, ; \, \textbf{x}) \right) , \end{aligned} \end{aligned}$$

and thus,

$$\begin{aligned} \nabla \varTheta (\textbf{x})^\textrm{T} \left( \mathcal {P}(\textbf{x}-\alpha \nabla \varTheta (\textbf{x}) \, ; \, \textbf{x}) - \textbf{x}\right) = - \frac{||\textbf{x} - \mathcal {P}(\textbf{x}-\alpha \nabla \varTheta (\textbf{x}) \, ; \, \textbf{x})||^2}{\alpha } < 0 . \end{aligned}$$
(13)

Equation (13) contradicts the assumption of \(\textbf{x}\) being a stationary point that would imply \(\nabla \varTheta (\textbf{x})^\textrm{T}(\textbf{z}-\textbf{x}) \ge 0\) \(\forall \, \textbf{z} \in \varOmega \) [3]. \(\square \)

Theorem 3.2

Given \(\varTheta : {\mathbb {R}}^n \rightarrow {\mathbb {R}}\) a continuously differentiable function on \(\varOmega \) and \(\textbf{x} \in \varOmega \) that is not a stationary point of \(\varTheta \), there exists \(\alpha ^*>0\) such that \(\forall \alpha \in (0,\alpha ^*]\) \(\left( \mathcal {P}(\textbf{x} - \alpha \nabla \varTheta (\textbf{x}) \, ; \, \textbf{x}) - \textbf{x} \right) \) is a descent direction for \(\varTheta \).

Proof

Since \(\textbf{x}\) is not a stationary point, according to Theorem 3.1 there exists \(\alpha ^* > 0\) such that \(\mathcal {P}(\textbf{x} - \alpha ^* \nabla \varTheta (\textbf{x}); \textbf{x}) \ne \textbf{x}\), and thus \(\mathcal {P}(\textbf{x} - \alpha \nabla \varTheta (\textbf{x}); \textbf{x}) \ne \textbf{x} \ \forall \alpha \in (0, \alpha ^*]\), because \(\varOmega _i\) is a convex set \(\forall \ i \in \{1, \dots , n\}\). Therefore, from (13) it follows that \(\nabla \varTheta (\textbf{x})^\textrm{T} \left( \mathcal {P}(\textbf{x}-\alpha \nabla \varTheta (\textbf{x}) \, ; \, \textbf{x}) - \textbf{x}\right) < 0\). \(\square \)

Theorem 3.3

Given \(\varTheta : {\mathbb {R}}^n \rightarrow {\mathbb {R}}\) a continuously differentiable function on \(\varOmega \), \(\textbf{x} \in \varOmega \) and \(\sigma _G \in (0, 1)\), there exists \({\overline{\alpha }}>0\) such that for all \(\alpha \in (0, {\overline{\alpha }}]\)

$$\begin{aligned} \varTheta (\mathcal {P}(\textbf{x}-\alpha \nabla \varTheta (\textbf{x}) \, ; \, \textbf{x})) \leqslant \varTheta (\textbf{x}) + \sigma _G \nabla \varTheta (\textbf{x})^\textrm{T} \left( \mathcal {P}(\textbf{x} - \alpha \nabla \varTheta (\textbf{x}) \, ; \, \textbf{x}) - \textbf{x} \right) . \end{aligned}$$
(14)

Proof

If \(\mathcal {P}(\textbf{x}-\alpha \nabla \varTheta (\textbf{x}) \, ; \, \textbf{x}) = \textbf{x}\) for all \(\alpha >0\), then the thesis holds for any \({\overline{\alpha }} > 0\). Therefore, we can assume there exists \({\widetilde{\alpha }} \in (0, 1)\) such that \(\mathcal {P}(\textbf{x}-\alpha \nabla \varTheta (\textbf{x}) \, ; \, \textbf{x}) \ne \textbf{x}\) for all \(\alpha \in (0, {\widetilde{\alpha }}]\). In the following, we shall denote \(\textbf{x}(\alpha ):= \mathcal {P}(\textbf{x}-\alpha \nabla \varTheta (\textbf{x}) \, ; \, \textbf{x})\).

By the mean value theorem, there exists \(\varvec{\xi }_{\alpha }\) on the segment between \(\textbf{x}\) and \(\textbf{x}(\alpha )\) such that

$$\begin{aligned} \begin{aligned} \varTheta (\textbf{x}(\alpha )) - \varTheta (\textbf{x})&= \nabla \varTheta (\varvec{\xi }_{\alpha })^\textrm{T} \left( \textbf{x}(\alpha ) - \textbf{x}\right) \\&= \sigma _G \nabla \varTheta (\textbf{x})^\textrm{T} \left( \textbf{x}(\alpha ) -\textbf{x} \right) - (\sigma _G -1) \nabla \varTheta (\textbf{x})^\textrm{T} \left( \textbf{x}(\alpha ) - \textbf{x} \right) \\&\qquad + \left( \nabla \varTheta (\varvec{\xi }_{\alpha }) - \nabla \varTheta (\textbf{x})\right) ^\textrm{T} \left( \textbf{x}(\alpha ) - \textbf{x} \right) , \end{aligned} \end{aligned}$$

and thus, the inequality (14) can be rewritten as

$$\begin{aligned} \left( \nabla \varTheta (\varvec{\xi }_{\alpha }) - \nabla \varTheta (\textbf{x})\right) ^\textrm{T} \left( \textbf{x}(\alpha ) - \textbf{x} \right) \leqslant (\sigma _G -1) \nabla \varTheta (\textbf{x})^\textrm{T} \left( \textbf{x}(\alpha ) - \textbf{x} \right) . \end{aligned}$$

Since \(\sigma _G<1\), from Proposition 3.1 (c) it follows that

$$\begin{aligned} (\sigma _G - 1) \nabla \varTheta (\textbf{x})^\textrm{T} \left( \textbf{x}(\alpha ) - \textbf{x}\right)= & {} (1-\sigma _G) \frac{||\textbf{x}(\alpha ) - \textbf{x}||^2}{\alpha } \\\geqslant & {} (1-\sigma _G) \frac{||\textbf{x}({\widetilde{\alpha }}) - \textbf{x}||}{{\widetilde{\alpha }}}\ ||\textbf{x}(\alpha ) - \textbf{x}|| > 0 . \end{aligned}$$

The theorem is proved if we show that there exists \({\overline{\alpha }} \in (0, {\widetilde{\alpha }}]\) such that

$$\begin{aligned} \begin{aligned} \left( \nabla \varTheta (\varvec{\xi }_{\alpha }) - \nabla \varTheta (\textbf{x})\right) ^\textrm{T} \left( \textbf{x}(\alpha ) - \textbf{x} \right) \leqslant (1-\sigma _G) \frac{||\textbf{x}({\widetilde{\alpha }}) - \textbf{x}||}{{\widetilde{\alpha }}}\ ||\textbf{x}(\alpha ) - \textbf{x}|| , \end{aligned} \end{aligned}$$

for all \(\alpha \in [0, {\overline{\alpha }}]\). This follows from the fact that

$$\begin{aligned} \lim _{\alpha \rightarrow 0} \left| \left( \nabla \varTheta (\varvec{\xi }_{\alpha }) - \nabla \varTheta (\textbf{x})\right) ^\textrm{T} \frac{\left( \textbf{x} - \textbf{x}(\alpha ) \right) }{||\textbf{x} - \textbf{x}(\alpha )||}\right| \leqslant \lim _{\alpha \rightarrow 0} ||\nabla \varTheta (\varvec{\xi }_{\alpha }) - \nabla \varTheta (\textbf{x})|| = 0 , \end{aligned}$$

where the last equality is a consequence of Proposition 3.1 (b) and of the regularity assumptions on \(\varTheta \). \(\square \)

Theorem 3.4

Given \(\varTheta : {\mathbb {R}}^n \rightarrow {\mathbb {R}}\), a continuously differentiable function on \(\varOmega \), \(\textbf{x} \in \varOmega \), and \(\rho \in (0, 1]\), there exists \({\overline{\alpha }}>0\) such that, for all \(\alpha \in (0, {\overline{\alpha }}]\), \(\mathcal {N}_{\alpha } (\textbf{x})= \emptyset \), and thus,

$$\begin{aligned} \sqrt{ \sum _{i \in \mathcal {M}_{\alpha }(\textbf{x})} (P_i(x_i - \partial _i \varTheta (\textbf{x})) - x_i)^2 }\geqslant \ \rho \sqrt{ \sum _{i \in \mathcal {N}_{\alpha }(\textbf{x})} (P_i(x_i - \partial _i \varTheta (\textbf{x})) - x_i)^2 } . \end{aligned}$$
(15)

Proof

For all \(i \in \left\{ 1, \dots , n \right\} \smallsetminus \mathcal {B}(\textbf{x})\), we only have three possibilities:

  • \(x_i \in \mathring{\varOmega _i}\), where \(\mathring{\varOmega _i}\) denotes the interior of \(\varOmega _i\). Then, since \(\mathring{\varOmega }_i\) is an open set, \(\exists \, {\overline{\alpha }}_i > 0\) such that \(x_i- \alpha \partial _i \varTheta (\textbf{x}) \in \mathring{\varOmega }_i \subseteq \varOmega _i\) \(\forall \alpha \le {\overline{\alpha }}_i\);

  • \(x_i = \ell _i\) and \(\partial _i \varTheta (\textbf{x}) <0 \). Then \(x_i- \alpha \partial _i \varTheta (\textbf{x}) \in \varOmega _i\) \(\forall \alpha \le {\overline{\alpha }}_i:= -\frac{u_i - \ell _i}{\partial _i \varTheta (\textbf{x})}\);

  • \(x_i = u_i\) and \(\partial _i \varTheta (\textbf{x}) >0 \). Then \(x_i- \alpha \partial _i \varTheta (\textbf{x}) \in \varOmega _i\) \(\forall \alpha \le {\overline{\alpha }}_i:= \frac{u_i - \ell _i}{\partial _i \varTheta (\textbf{x})}\).

Therefore, for all \(i \in \left\{ 1, \dots , n \right\} \smallsetminus \mathcal {B}(\textbf{x})\) there exists \({\overline{\alpha }}_i> 0\) such that \(i \in \mathcal {M}_{\alpha }(\textbf{x})\) \(\forall \alpha \in (0, {\overline{\alpha }}_i]\). By choosing \( {\overline{\alpha }} =\min \limits _{i \in \{1, \dots , n\} \smallsetminus \mathcal {B}(\textbf{x})} {\overline{\alpha }}_i\), it follows that, for all \(\alpha \le {\overline{\alpha }}\), \(\mathcal {N}_{\alpha } (\textbf{x})= \emptyset \), and thus,

$$\begin{aligned} \sqrt{ \sum _{i \in \mathcal {M}_{\alpha }(\textbf{x})} (P_i(x_i - \partial _i \varTheta (\textbf{x})) - x_i)^2 } \geqslant 0 =\ \rho \sqrt{\sum _{i \in \mathcal {N}_{\alpha }(\textbf{x})} (P_i(x_i - \partial _i \varTheta (\textbf{x})) - x_i)^2 } .\nonumber \\ \end{aligned}$$
(16)

Hence, the theorem is proved. \(\square \)

Remark 3.2

The previous theorems hold, in particular, if \(\varTheta \) is defined as in Eq. (4). Specifically, inequality (14) is the classical Armijo rule along the projection arc, in which we employed the operator introduced in Definition 2.1. Inequality (15) is an additional condition that prevents NLPC from choosing a too large step size that would result in an update of only a few components. An illustrative example can be seen in Fig. 2.

Theorems 3.3 and 3.4 together guarantee that the step size within the gradient descent step of the NLPC algorithm is well defined.
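
A rough MATLAB sketch of this gradient descent step (using the hypothetical helpers nlp_project, index_sets and f_fun introduced above, and \(\nabla \varTheta (\textbf{x}) = {\textbf{J}}_{\textbf{f}}(\textbf{x})^\textrm{T} \textbf{f}(\textbf{x})\) for \(\varTheta \) as in (4)) is:

```matlab
% Backtracking along the projected gradient direction: accept the first
% step size alpha^j satisfying both the Armijo-type rule (14) and the
% additional condition (15) with parameter rho.
g  = Jk' * fk;                         % gradient of Theta(x) = 0.5*||f(x)||^2
Pg = min(max(xk - g, lb), ub);         % classical projection P(x_k - grad), used in (15)
theta_k = 0.5 * norm(fk)^2;
for j = 0:J_grad                       % J_grad: max number of tested step sizes
    xt = nlp_project(xk - alpha^j * g, xk, lb, ub);
    armijo = 0.5 * norm(f_fun(xt))^2 <= theta_k + sigma_G * g' * (xt - xk);  % (14)
    [~, M, N] = index_sets(xk, -g, alpha^j, lb, ub);
    cond15 = norm(Pg(M) - xk(M)) >= rho * norm(Pg(N) - xk(N));               % (15)
    if armijo && cond15
        xk = xt;  break
    end
end
% if no tested step size satisfies both conditions, the experiments in
% Sect. 5.1 simply keep the last tested value
```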

Fig. 2

Illustration of the benefit of the additional condition (15). In a, only the first component of \(\textbf{x}\) is updated, as \(1 \in \mathcal {M}_{\alpha }(\textbf{x})\) and \(2 \in \mathcal {N}_{\alpha }(\textbf{x})\). In this scenario, NLPC may get stuck in a point which is not stationary, because the chosen step size is too big and the second component is never updated. As shown in b, inequality (15) prevents this issue by promoting the choice of a smaller step size so that a higher number of components is updated. Here, \(\widehat{\textbf{x}} = \mathcal {P}(\textbf{x}-\alpha \nabla \varTheta (\textbf{x}) \, ; \, \textbf{x})\), \(\textbf{x}_{\min }\) is a stationary point of \(\varTheta \), and \(\rho =1\)

Henceforth, \(\left\{ \textbf{x}_k \right\} _{k \in {\mathbb {N}}} \subseteq \varOmega \) and \(\left\{ \alpha ^{j_k} \right\} _{k \in {\mathbb {N}}}\) shall denote a sequence of points generated with the NLPC algorithm described in Algorithm 1, and the corresponding step sizes, respectively. In particular, \(\alpha \in (0, 1)\), while \(j_k\) is a suitable exponent whose value belongs to a different range depending on whether Newton's or the gradient descent approach has been used at the kth iteration.

Lemma 3.1

Let \(\left\{ \textbf{x}_k \right\} _{k \in {\mathbb {N}}}\) be a sequence generated with the NLPC algorithm. For each \(k \in {\mathbb {N}}\)

  1. (a)

    if \(\textbf{x}_{k+1}\) has been obtained with a projected gradient descent step, then \(\varTheta (\textbf{x}_{k+1}) \le \varTheta (\textbf{x}_{k})\);

  2. (b)

    if \(\textbf{x}_{k+1}\) has been obtained with a projected Newton’s step, then

    $$\begin{aligned} \varTheta (\textbf{x}_{k+1}) \le \left( 1-\alpha ^J \sigma _N \right) ^{n_{k}+1} \varTheta (\textbf{x}_0) , \end{aligned}$$

    where \(n_{k}\) is the number of projected Newton's steps performed up to iteration k;

  3. (c)

    \(0 \leqslant \varTheta (\textbf{x}_{k+1}) \leqslant \varTheta (\textbf{x}_{k})\), that is, \(\{\varTheta (\textbf{x}_{k})\}_{k \in {\mathbb {N}}}\) is bounded below by zero and not increasing.

Proof

(a) By the Armijo rule along the projected arc in Eq. (14), we have

$$\begin{aligned} \varTheta (\textbf{x}_{k+1}) \le \varTheta (\textbf{x}_{k}) + \sigma _G \nabla \varTheta (\textbf{x}_{k})^\textrm{T} (\textbf{x}_{k+1} - \textbf{x}_{k}) \le \varTheta (\textbf{x}_{k}) , \end{aligned}$$

where the last inequality follows from Theorem 3.2. Thus, we have the thesis.

(b) If \(\textbf{x}_{k+1}\) is defined with the projected Newton's method, then there exists \(j \in \{ 0, \dots , J \}\) such that

$$\begin{aligned} \varTheta (\textbf{x}_{k+1}) \le (1-\alpha ^j \sigma _N)\ \varTheta (\textbf{x}_{k}) \le (1-\alpha ^J \sigma _N)\ \varTheta (\textbf{x}_{k}) , \end{aligned}$$
(17)

where in the last inequality we exploited the fact that \(\alpha < 1\).

The thesis follows by iteratively applying (17) for each projected Newton’s step and the result in point (a) for each projected gradient descent step.

Point (c) follows straightforwardly from (a) and Eq. (17), since both \(\alpha ^J\) and \(\sigma _N\) belong to the interval (0, 1).

\(\square \)

Theorem 3.5

Let \(\left\{ \textbf{x}_k \right\} _{k \in {\mathbb {N}}}\) be a sequence generated with the NLPC algorithm, and let \(\textbf{x}^*\) be an accumulation point of \(\left\{ \textbf{x}_k \right\} _{k \in {\mathbb {N}}}\); then, \(\textbf{x}^*\) is a stationary point of \(\varTheta \) in \(\varOmega \). Additionally if the projected Newton’s method has been used for infinitely many k, then \(\textbf{x}^*\) is a solution of (1).

Proof

Since \(\textbf{x}^*\) is an accumulation point of \(\left\{ \textbf{x}_k \right\} _{k \in {\mathbb {N}}}\), there exists a subsequence \(\left\{ \textbf{x}_{k} \right\} _{k\in K} \subseteq \left\{ \textbf{x}_k \right\} _{k \in {\mathbb {N}}} \), \(K \subseteq {\mathbb {N}}\), such that \(\lim \limits _{k(\in K) \rightarrow \infty } \textbf{x}_{k} = \textbf{x}^*\), and thus, \(\lim \limits _{k(\in K) \rightarrow \infty } \varTheta (\textbf{x}_{k}) = \varTheta (\textbf{x}^*)\). For all \(k \in K\), from Lemma 3.1 it follows

$$\begin{aligned} \varTheta (\textbf{x}_{k}) \le \left( 1-\alpha ^J \sigma _N \right) ^{n_{k}} \varTheta (\textbf{x}_0) . \end{aligned}$$

If the projected Newton’s method has been used for infinitely many k, then \(\lim \limits _{k(\in K) \rightarrow \infty } n_{k} = + \infty \); therefore,

$$\begin{aligned} \varTheta (\textbf{x}^*) = \lim \limits _{k(\in K) \rightarrow \infty } \varTheta (\textbf{x}_{k}) \le \lim \limits _{k(\in K) \rightarrow \infty } \left( 1-\alpha ^J \sigma _N \right) ^{n_{k}} \varTheta (\textbf{x}_0) = 0 . \end{aligned}$$

Hence, \(\varTheta (\textbf{x}^*) = 0\), that is, \(\textbf{x}^*\) solves (1) and is a stationary point of \(\varTheta \).

Instead, if the projected gradient direction has been used for all but finitely many iterations, then there exists \( {\overline{k}} \in {\mathbb {N}}\) such that \(\textbf{x}_{k+1}\) has been obtained through a gradient descent step \(\forall \ k \geqslant {\overline{k}}\). From Lemma 3.1 (c), it follows that \(\{\varTheta (\textbf{x}_{k})\}_{k \geqslant {\overline{k}}}\) is not increasing and bounded below by zero. Hence, it converges and

$$\begin{aligned} \lim \limits _{k \rightarrow \infty } \left( \varTheta (\textbf{x}_{k+1}) - \varTheta (\textbf{x}_{k}) \right) = 0 . \end{aligned}$$

Henceforth, we shall denote with \(\{ \alpha ^{j_k} \}_{k \in {\mathbb {N}}}\) the sequence of step sizes used within NLPC. From (14), (13) and (11), it holds

$$\begin{aligned} \begin{aligned} \varTheta (\textbf{x}_{k+1}) - \varTheta (\textbf{x}_{k})&\leqslant \sigma _G \nabla \varTheta (\textbf{x}_k)^\textrm{T} \left( \mathcal {P}(\textbf{x}_k - \alpha ^{j_k} \nabla \varTheta (\textbf{x}_k) \, ; \, \textbf{x}_k) - \textbf{x}_k \right) \\&= - \sigma _G \frac{|| \mathcal {P}(\textbf{x}_k-\alpha ^{j_k}\nabla \varTheta (\textbf{x}_k) \, ; \, \textbf{x}_k)-\textbf{x}_k||^2}{\alpha ^{j_k}} \\&= - \sigma _G\ \alpha ^{j_k} \sum _{i \in \mathcal {M}_{\alpha ^{j_k}}(\textbf{x}_k)} (\partial _i \varTheta (\textbf{x}_k))^2 \le 0, \end{aligned} \end{aligned}$$

and thus,

$$\begin{aligned} \lim \limits _{k (\in K) \rightarrow \infty } \alpha ^{j_k} \sum _{i \in \mathcal {M}_{\alpha ^{j_k}}(\textbf{x}_k)} (\partial _i \varTheta (\textbf{x}_k))^2 = 0 . \end{aligned}$$
(18)

Two cases exist: \(\liminf \limits _{k(\in K) \rightarrow \infty } \alpha ^{j_k} > 0\) (case 1) and \(\liminf \limits _{k(\in K) \rightarrow \infty } \alpha ^{j_k} = 0\) (case 2).

If case 1 holds, then Eq. (18) implies

$$\begin{aligned} \begin{aligned} 0&= \lim \limits _{k(\in K) \rightarrow \infty } \sum _{i \in \mathcal {M}_{\alpha ^{j_k}}(\textbf{x}_{k})} (\partial _i \varTheta (\textbf{x}_{k}))^2 \\&\ge \lim \limits _{k(\in K) \rightarrow \infty } \sum _{i \in \mathcal {M}_{\alpha ^{j_k}}(\textbf{x}_{k})} (P_i(x_{{k},i} - \partial _i \varTheta (\textbf{x}_{k})) - x_{{k},i})^2 \\&\ge \rho ^2 \lim \limits _{k(\in K) \rightarrow \infty } \sum _{i \in \mathcal {N}_{\alpha ^{j_k}}(\textbf{x}_{k})} (P_i(x_{{k},i} - \partial _i \varTheta (\textbf{x}_{k})) - x_{{k},i})^2 , \end{aligned} \end{aligned}$$

where the last inequality comes from the constraint described by (15). Hence, in particular

$$\begin{aligned} \lim \limits _{k(\in K) \rightarrow \infty } \sum _{i \in \mathcal {M}_{\alpha ^{j_k}}(\textbf{x}_{k})} (P_i(x_{{k},i} - \partial _i \varTheta (\textbf{x}_{k})) - x_{{k},i})^2= & {} \\ \lim \limits _{k(\in K) \rightarrow \infty } \sum _{i \in \mathcal {N}_{\alpha ^{j_k}}(\textbf{x}_{k})} (P_i(x_{{k},i} - \partial _i \varTheta (\textbf{x}_{k})) - x_{{k},i})^2= & {} 0 . \end{aligned}$$

Since additionally \((P_i(x_{{k},i} - \partial _i \varTheta (\textbf{x}_{k})) - x_{{k},i}) = 0\) for all \(i \in \mathcal {B}(\textbf{x}_{k})\), from Eq. (9), from the continuity of \(\nabla \varTheta \) and of the classical projector, and since \(\textbf{x}^*\) is the limit point of \(\{\textbf{x}_{k}\}_{k \in K}\), it follows

$$\begin{aligned} ||P(\textbf{x}^* - \nabla \varTheta (\textbf{x}^*)) - \textbf{x}^*||^2 = \lim \limits _{k(\in K) \rightarrow \infty } \sum _{i=1}^n (P_i(x_{{k},i} - \partial _i \varTheta (\textbf{x}_{k})) - x_{{k},i})^2 = 0 . \end{aligned}$$

Hence, \(\textbf{x}^*\) is a stationary point.

On the other hand, case 2 implies that there exists an infinite set \(K' \subset K\) such that \(\lim \limits _{k(\in K') \rightarrow \infty } \alpha ^{j_k} = 0\), and thus, \(\lim \limits _{k(\in K') \rightarrow \infty } \alpha ^{j_k-1} = 0\). Therefore, by defining \(J = \left\{ i \in \{1, \dots , n\}\ s.t.\ i \notin \mathcal {B}(\textbf{x}^*)\ \wedge \ |\partial _i \varTheta (\textbf{x}^*)| > 0 \right\} \), the following holds.

  1. (i)

    \(\textbf{x}^*\) is a stationary point iff \(J = \emptyset \). More specifically, \(|P_i(x^*_{i} - \partial _i \varTheta (\textbf{x}^*)) - x^*_i| = 0\) iff \(i \notin J\).

  2. (ii)

    There exists \({\overline{k}}\) such that \(\forall \ k \in K',\ k \ge {\overline{k}}\), \(J \subseteq \mathcal {M}_{\alpha ^{j_k-1}}(\textbf{x}_k)\). Indeed, let us consider \(i \in J\). Since in particular \(i \notin \mathcal {B}(\textbf{x}^*)\) and \(\alpha <1\), from Theorem 3.4 it follows that there exists \({\overline{j}} \in {\mathbb {N}}\) such that \(\mathcal {N}_{\alpha ^j}(\textbf{x}^*) = \emptyset \) and \(i \in \mathcal {M}_{\alpha ^j}(\textbf{x}^*)\) \(\forall j \geqslant {\overline{j}}\). It can be easily shown that, since \(\textbf{x}^*\) is the limit point of \(\left\{ \textbf{x}_{k} \right\} _{k\in K}\), this implies that there exists \(k_i\) such that \(\forall \ k \in K'\) with \(\ k \ge k_i\), \(i \in \mathcal {M}_{\alpha ^{j}}(\textbf{x}_k)\) \(\forall j > {\overline{j}}\). Additionally, since \(\lim \limits _{k(\in K') \rightarrow \infty } \alpha ^{j_k-1} = 0\), there exists \(k'_i \ge k_i \) such that \(\forall \ k \in K',\ k \ge k'_i\), \(\alpha ^{j_k-1} < \alpha ^{{\overline{j}}}\). Hence, the thesis follows by considering \({\overline{k}} = \max \limits _{i \in J}\{ k'_i\}\).

To prove that \(\textbf{x}^*\) is a stationary point, we proceed by contradiction and assume that there exists \(i \in J\). Then, by the results in (i) and (ii), it follows

$$\begin{aligned} \begin{aligned} \lim \limits _{k(\in K') \rightarrow \infty }&\sqrt{\sum _{i \in \mathcal {M}_{\alpha ^{j_k-1}}(\textbf{x}_{k})} |P_i(x_{k,i}-\partial _i \varTheta (\textbf{x}_{k}))-x_{k,i}|^2 } \\&\ge \lim \limits _{k(\in K') \rightarrow \infty } \sqrt{ \sum _{i \in J} |P_i(x_{k,i}-\partial _i \varTheta (\textbf{x}_{k}))-x_{k,i}|^2 } \\&= \sqrt{ \sum _{i \in J} |P_i(x^*_{i}-\partial _i \varTheta (\textbf{x}^*))-x^*_{i}|^2 } > 0 , \end{aligned} \end{aligned}$$

while, denoting \(J^C = \left\{ 1, \dots , n \right\} {\setminus } J\),

$$\begin{aligned} \begin{aligned} \lim \limits _{k(\in K') \rightarrow \infty }&\rho \sqrt{ \sum _{i \in \mathcal {N}_{\alpha ^{j_k-1}}(\textbf{x}_{k})} |P_i(x_{k,i}-\partial _i \varTheta (\textbf{x}_{k}))-x_{k,i}|^2 } \\&\le \rho \lim \limits _{k(\in K') \rightarrow \infty } \sqrt{ \sum _{i \in J^C} |P_i(x_{k,i}-\partial _i \varTheta (\textbf{x}_{k}))-x_{k,i}|^2 } \\&= \rho \sqrt{ \sum _{i \in J^C} |P_i(x^*_{i}-\partial _i \varTheta (\textbf{x}^*))-x^*_{i}|^2 } = 0 . \end{aligned} \end{aligned}$$

Therefore, for sufficiently large \(k \in K'\)

$$\begin{aligned}{} & {} \sqrt{ \sum _{i \in \mathcal {M}_{\alpha ^{j_k-1}}(\textbf{x}_{k})} |P_i(x_{k,i}-\partial _i \varTheta (\textbf{x}_{k}))-x_{k,i}|^2 } \\{} & {} \quad \ge \rho \sqrt{ \sum _{i \in \mathcal {N}_{\alpha ^{j_k-1}}(\textbf{x}_{k})} |P_i(x_{k,i}-\partial _i \varTheta (\textbf{x}_{k}))-x_{k,i}|^2 } , \end{aligned}$$

i.e., condition (15) is satisfied by the step size \(\alpha ^{j_k-1}\) which is the last step size tried by NLPC before the chosen one. As a consequence, such a step size cannot satisfy condition (14), i.e.,

$$\begin{aligned} \begin{aligned} \varTheta ( \mathcal {P}(\textbf{x}_k&-\alpha ^{j_k-1} \nabla \varTheta (\textbf{x}_k) \, ; \, \textbf{x}_k)) - \varTheta (\textbf{x}_k)\\&> \sigma _G \nabla \varTheta (\textbf{x}_k)^\textrm{T} \left( \mathcal {P}(\textbf{x}_k - \alpha ^{j_k-1} \nabla \varTheta (\textbf{x}_k) \, ; \, \textbf{x}_k) - \textbf{x}_k \right) . \end{aligned} \end{aligned}$$

By the mean value theorem, there exists \(\tau \in (0, 1)\) such that, setting \(\varvec{\xi }_k =\tau \textbf{x}_{k} + (1-\tau ) \mathcal {P}(\textbf{x}_{k} - \alpha ^{j_k-1} \nabla \varTheta (\textbf{x}_{k}) \, ; \, \textbf{x}_{k})\),

$$\begin{aligned} \begin{aligned} \varTheta ( \mathcal {P}(\textbf{x}_k&-\alpha ^{j_k-1} \nabla \varTheta (\textbf{x}_k) \, ; \, \textbf{x}_k)) - \varTheta (\textbf{x}_k) \\&= \nabla \varTheta (\varvec{\xi }_k)^\textrm{T} \left( \mathcal {P}(\textbf{x}_k - \alpha ^{j_k-1} \nabla \varTheta (\textbf{x}_k) \, ; \, \textbf{x}_k) - \textbf{x}_k \right) \\&= \left( \nabla \varTheta (\varvec{\xi }_k) - \nabla \varTheta (\textbf{x}_k)\right) ^\textrm{T} \left( \mathcal {P}(\textbf{x}_k - \alpha ^{j_k-1} \nabla \varTheta (\textbf{x}_k) \, ; \, \textbf{x}_k) - \textbf{x}_k \right) \\&\quad + \nabla \varTheta (\textbf{x}_k)^\textrm{T} \left( \mathcal {P}(\textbf{x}_k - \alpha ^{j_k-1} \nabla \varTheta (\textbf{x}_k) \, ; \, \textbf{x}_k) - \textbf{x}_k \right) . \end{aligned} \end{aligned}$$

Together with the previous result, this implies

$$\begin{aligned} \begin{aligned} (1-\sigma _G)&\nabla \varTheta (\textbf{x}_k)^\textrm{T} \left( \textbf{x}_k - \mathcal {P}(\textbf{x}_k - \alpha ^{j_k-1} \nabla \varTheta (\textbf{x}_k) \, ; \, \textbf{x}_k) \right) \\&< \left( \nabla \varTheta (\varvec{\xi }_k) - \nabla \varTheta (\textbf{x}_k)\right) ^\textrm{T} \left( \mathcal {P}(\textbf{x}_k - \alpha ^{j_k-1} \nabla \varTheta (\textbf{x}_k) \, ; \, \textbf{x}_k) - \textbf{x}_k \right) \\&\le ||\nabla \varTheta (\varvec{\xi }_k) - \nabla \varTheta (\textbf{x}_k)|| \cdot || \mathcal {P}(\textbf{x}_k - \alpha ^{j_k-1} \nabla \varTheta (\textbf{x}_k) \, ; \, \textbf{x}_k) - \textbf{x}_k|| ; \end{aligned} \end{aligned}$$

hence,

$$\begin{aligned} \begin{aligned} \frac{1}{1-\sigma _G} ||\nabla \varTheta (\varvec{\xi }_k) - \nabla \varTheta (\textbf{x}_k)||&> \frac{\nabla \varTheta (\textbf{x}_k)^\textrm{T} \left( \textbf{x}_k - \mathcal {P}(\textbf{x}_k - \alpha ^{j_k-1} \nabla \varTheta (\textbf{x}_k) \, ; \, \textbf{x}_k) \right) }{|| \mathcal {P}(\textbf{x}_k - \alpha ^{j_k-1} \nabla \varTheta (\textbf{x}_k) \, ; \, \textbf{x}_k) - \textbf{x}_k||}\\&= \sqrt{\sum _{i \in \mathcal {M}_{\alpha ^{j_k-1}}(\textbf{x}_{k})} (\partial _i \varTheta (\textbf{x}_{k}))^2 } , \end{aligned} \end{aligned}$$

where the last equality comes from Eqs. (13) and (11). Therefore, from the properties of the classical projector and from the result previously shown in (ii), it follows

$$\begin{aligned} \begin{aligned} 0&= \lim \limits _{k(\in K') \rightarrow \infty } \sqrt{\sum _{i \in \mathcal {M}_{\alpha ^{j_k-1}}(\textbf{x}_{k})} (\partial _i \varTheta (\textbf{x}_{k}))^2 } \\&\ge \lim \limits _{k(\in K') \rightarrow \infty } \sqrt{\sum _{i \in \mathcal {M}_{\alpha ^{j_k-1}}(\textbf{x}_{k})} |P_i(x_{k,i}-\partial _i \varTheta (\textbf{x}_{k}))-x_{k,i}|^2 } \\&\ge \sqrt{\sum _{i \in J} |P_i(x^*_{i}-\partial _i \varTheta (\textbf{x}^*))-x^*_{i}|^2} , \end{aligned} \end{aligned}$$

which is possible only if \(J=\emptyset \) and this contradicts the hypothesis. \(\square \)

Finally, we remark that all the results proved in this section can be easily extended to the case where the gradient direction is scaled, for example by normalizing it [43] or by exploiting Barzilai–Borwein step lengths [1].

4 Application to Chemical Reaction Networks

Let us consider a chemical reaction network (CRN) composed of r chemical reactions involving n well-mixed proteins. Specifically, in this work we will focus on the CRN devised for modeling cell signaling during the G1/S transition phase in colorectal cells described in [39, 40] and henceforth denoted as CR-CRN. In this case, \(n=419\) and \(r=851\).

By assuming that the law of mass action holds [38, 44], the dynamics of the CRN gives rise to a set of n ordinary differential equations (ODEs)

$$\begin{aligned} \dot{\textbf{x}} = \textbf{S} \textbf{v}(\textbf{x}, \textbf{k}), \end{aligned}$$
(19)

where the state vector \(\textbf{x} \in {\mathbb {R}}^n_+\) contains the protein molecular concentrations (nM); the superposed dot denotes the time derivative; \(\textbf{S}\) is the constant stoichiometric matrix of size \(n \times r\); \(\textbf{k} \in {\mathbb {R}}_+^r\) is the vector of the rate constants of the reactions; and \(\textbf{v}(\textbf{x}, \textbf{k}) \in {\mathbb {R}}^r_+\) is the time-variant vector of the reaction fluxes. Specifically, from the law of mass action it follows [28]

$$\begin{aligned} \textbf{v}(\textbf{x}, \textbf{k}) = \text {diag}(\textbf{k}) \textbf{z}(\textbf{x}), \end{aligned}$$
(20)

where the elements of \(\textbf{z}(\textbf{x})\) are monomials of the form \(z_j(\textbf{x}) = \prod _{i=1}^n x_i^{p_{ij}}\), \(\forall j = 1, \dots , r\). In the CR-CRN, \(p_{ij} \in \left\{ 0, 1, 2 \right\} \), because all the reactions involve up to two reactants.
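
As an illustration, under the mass action law the flux vector (20) can be assembled directly from the exponent matrix \((p_{ij})\); a minimal MATLAB sketch, assuming the exponents are stored in an \(n \times r\) array P_exp (our notation, relying on implicit array expansion), is:

```matlab
function v = mass_action_fluxes(x, k, P_exp)
% Reaction fluxes v(x,k) = diag(k)*z(x), with z_j(x) = prod_i x_i^(p_ij).
% x : concentrations (n x 1), k : rate constants (r x 1),
% P_exp : n x r array of exponents p_ij (0, 1 or 2 for the CR-CRN)
    z = prod(x .^ P_exp, 1)';   % z_j(x), j = 1,...,r (column vector)
    v = k .* z;                 % same as diag(k)*z
end
```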

Given a solution \(\textbf{x}(t)\) of system (19), a semi-positive conservation vector is a constant vector \(\varvec{\gamma } \in {\mathbb {N}}^n {\setminus } \{{\textbf{0}}\}\) for which there exists \(c \in {\mathbb {R}}_+\) such that \(\varvec{\gamma }^\textrm{T} {\textbf{x}}(t) = c\ \forall \ t > 0\) [37, 38]. Conservation vectors can be determined by studying the kernel of \({\textbf{S}}^\textrm{T}\) [32]. In the remainder of the paper, we shall assume that the considered CRN satisfies the following properties in terms of its conservation vectors.

  1. (i)

    The CRN is weakly elemented [38], i.e., there exists a set of independent generators \(\left\{ \varvec{\gamma }_1, \dots , \varvec{\gamma }_p \right\} \subset {\mathbb {N}}^n {\setminus } \{\textbf{0}\}\) of the semi-positive conservation vectors such that \(p = n - \text {rank}({\textbf{S}})\) and, up to a change of the proteins order,

    $$\begin{aligned} {\textbf{N}}:= \begin{bmatrix} \varvec{\gamma }_1^\textrm{T} \\ \vdots \\ \varvec{\gamma }_{p}^\textrm{T} \end{bmatrix} = \left[ {\textbf{I}}_p, {\textbf{N}}_2 \right] \, \end{aligned}$$
    (21)

    where \({\textbf{I}}_p\) is the identity matrix of size \(p \times p\).

  2. (ii)

    The CRN satisfies the global stability condition [38], i.e., for each \(\textbf{c} \in {\mathbb {R}}^p_+\) there exists a unique asymptotically stable state on the stoichiometric compatibility class (SCC) \(\left\{ \textbf{x} \in {\mathbb {R}}^n_+\ \text {s.t.}\ \textbf{N}\textbf{x} = \textbf{c} \right\} \). For a fixed SCC, the corresponding asymptotically stable state \(\textbf{x}_e \in {\mathbb {R}}^n_+\) solves the system

    $$\begin{aligned}{} & {} \textbf{S} \textbf{v}(\textbf{x}, \textbf{k}) = 0 \nonumber \\{} & {} \textbf{N} \textbf{x} - \textbf{c} = 0. \end{aligned}$$
    (22)

Lemma 4.1

For a weakly elemented CRN satisfying the global stability condition, the system in (22) is equivalent to the square system

$$\begin{aligned}{} & {} \textbf{S}_2 \textbf{v}(\textbf{x}, \textbf{k}) = 0 \nonumber \\{} & {} \textbf{N} \textbf{x} - \textbf{c} = 0, \end{aligned}$$
(23)

where \(\textbf{S}_2\) is a matrix of size \((n-p) \times r\) defined by the last \(n-p\) rows of \(\textbf{S}\).

Proof

Obviously, a solution of (22) also solves (23). On the other hand, let \(\textbf{x}_e\) be a solution of (23). The lemma is proved by showing that

$$\begin{aligned} \textbf{S}_1 \textbf{v}(\textbf{x}_e, \textbf{k}) = 0 \, \end{aligned}$$

where \(\textbf{S}_1\) is the matrix of size \(p \times r\) defined by the first p rows of \(\textbf{S}\). To this end, we observe that, since any conservation vector belongs to the kernel of \({\textbf{S}}^\textrm{T}\), it holds

$$\begin{aligned} \textbf{0} = \textbf{N} \textbf{S} = \textbf{S}_1 + \mathbf {N_2} \mathbf {S_2}. \end{aligned}$$

Therefore,

$$\begin{aligned} \textbf{S}_1 \textbf{v}(\textbf{x}_e, \textbf{k}) = - \mathbf {N_2} \mathbf {S_2} \textbf{v}(\textbf{x}_e, \textbf{k}) = 0. \end{aligned}$$

\(\square \)
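
The identities \(\textbf{N} \textbf{S} = \textbf{0}\) and \(\textbf{S}_1 = - \mathbf {N_2} \mathbf {S_2}\) used in the proof lend themselves to a simple numerical sanity check; a MATLAB sketch (variable names are ours; \(\textbf{S}\) and \(\textbf{N}\) are integer matrices, so exact comparisons are meaningful) is:

```matlab
% Sanity check of the block identities exploited in the proof of Lemma 4.1.
p  = size(N, 1);                        % number of conservation-vector generators
S1 = S(1:p, :);
S2 = S(p+1:end, :);
N2 = N(:, p+1:end);
assert(isequal(N(:, 1:p), eye(p)));     % N = [I_p, N_2], cf. (21)
assert(all(all(N * S == 0)));           % rows of N belong to ker(S^T)
assert(all(all(S1 + N2 * S2 == 0)));    % hence S_1 v = -N_2 S_2 v for any flux v
```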

According to Lemma 4.1, in a weakly elemented CRN satisfying the global stability condition, the equilibrium point on a fixed SCC can be computed by solving a box-constrained system as in Eq. (1), with \(\varOmega = {\mathbb {R}}_+^n\) and

$$\begin{aligned} \textbf{f}(\textbf{x}) = \left[ \begin{array}{c} \textbf{S}_2 \textbf{v}(\textbf{x}, \textbf{k}) \\ \textbf{N} \textbf{x} - \textbf{c} \end{array}\right] . \end{aligned}$$
(24)

Lemma 4.2

Consider the function \(\textbf{f}:{\mathbb {R}}^n \rightarrow {\mathbb {R}}^n\) defined as in Eq. (24). \(\textbf{f}\) is continuously differentiable on \({\mathbb {R}}^n_+\) and

$$\begin{aligned} \textbf{J}_{\textbf{f}}(\textbf{x}) = \left[ \begin{array}{c} \textbf{S}_2 \text {diag}(\textbf{k}) \textbf{J}_\textbf{z}(\textbf{x}) \\ \textbf{N} \end{array}\right] , \end{aligned}$$
(25)

where \([J_{\textbf{z}}(\textbf{x})]_{ji} = p_{ij} x_i^{p_{ij}-1} \prod _{\ell =1, \ell \ne i}^n x_\ell ^{p_{\ell j}}\) \(\forall i \in \{ 1, \dots , n\}\) and \(j = 1, \dots , r\).

Proof

The thesis follows from the definition of the reaction fluxes in Eq. (20). \(\square \)
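
Putting Eqs. (24) and (25) together, \(\textbf{f}\) and \(\textbf{J}_{\textbf{f}}\) can be evaluated as in the following MATLAB sketch, which reuses the hypothetical mass_action_fluxes helper and the exponent array P_exp introduced above:

```matlab
function [f, Jf] = crn_system(x, k, S2, N, c, P_exp)
% f(x) as in (24) and its Jacobian (25) for a weakly elemented CRN.
% S2 : last n-p rows of S, N : conservation matrix (p x n), c : moieties (p x 1)
    n = numel(x);  r = numel(k);
    f = [S2 * mass_action_fluxes(x, k, P_exp); N * x - c];
    % Jacobian of z(x): [Jz]_(j,i) = p_ij * x_i^(p_ij-1) * prod_{l ~= i} x_l^(p_lj)
    Jz = zeros(r, n);
    for i = 1:n
        Pi = P_exp;
        Pi(i, :) = max(P_exp(i, :) - 1, 0);   % lower the i-th exponent (clamped at 0)
        Jz(:, i) = P_exp(i, :)' .* prod(x .^ Pi, 1)';
    end
    Jf = [S2 * diag(k) * Jz; N];
end
```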

5 Numerical Results on the CR-CRN

5.1 General Considerations

To show the advantages of using NLPC for computing the asymptotically stable states of a CRN, we applied it to the CR-CRN. The parameters describing the network in a physiological state have been extensively described in previous works [38,39,40] and can be downloaded from the GitHub repository https://github.com/theMIDAgroup/CRC_CRN.git as a MATLAB® structure. This includes the list of proteins and reactions involved in the network, as well as the values of the rate constants \({\textbf{k}}\) and of the total conserved moieties \({\textbf{c}}\). The corresponding stoichiometric matrix \({\textbf{S}}\) and reaction fluxes \({\textbf{v}}({\textbf{x}}, {\textbf{k}})\) can be derived as described in the previous section. The aforementioned repository also contains the MATLAB® codes implementing the NLPC algorithm and the analysis shown in this paper.

We exploited the model introduced by Sommariva and colleagues [38, 39] to test the proposed approach under different biologically plausible conditions. Specifically, we modified the values of the parameters \({\textbf{k}}\) and \({\textbf{c}}\) as described in [39] to simulate the effect of some of the mutations that most commonly arise in colorectal cancer. A total of 9 different mutations was considered (loss of function of APC, AKT, SMAD4, PTEN, and p53, and gain of function of k-Ras, Raf, PI3K, and Betacatenin), each giving rise to a different mutated network.

From a practical point of view, if not otherwise specified, the parameters required in input by Algorithm 1 were set as follows. The threshold within the stopping criterion was \(\tau =10^{-12}\), while \(\sigma _N = \sigma _G = 10^{-4}\) and \(\rho = 10^{-2}\). The initial step size was \(\alpha =0.79\), and a maximum of \(J=20\) step sizes was tested within each iteration of the Newton's method. NLPC is initialized with a point \({\textbf{x}}_0\) randomly drawn from the SCC \(\left\{ \textbf{x} \in {\mathbb {R}}^n_+\ \text {s.t.}\ \textbf{N}\textbf{x} = \textbf{c} \right\} \) by exploiting the procedure presented in [38]. Additionally, we only retained points such that the condition number of the Jacobian matrix \(\textbf{J}_{\textbf{f}}(\textbf{x}_0)\) was lower than \(10^{17}\). To avoid the algorithm getting stuck in a stationary point that is not a zero of \({\textbf{f}}\), we also set a maximum number of allowed iterations: if the stopping criterion was not reached after 250 iterations, a new initial point was drawn from the SCC and NLPC was restarted. Finally, to speed up NLPC, (i) we fixed a maximum number of tested step sizes also within the gradient descent method: if conditions (14) and (15) were not met after 40 possible values of the step length, we chose the last tested value and, at the following NLPC iteration, we performed again a gradient descent step; (ii) at each iteration, we scaled the gradient direction: two scaling procedures were tested, namely normalizing the gradient and an approach inspired by the Barzilai–Borwein (BB) rule, thoroughly described in Online Resource 1.
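
For reference, the setting just described could be collected in a single options structure; the field names below are ours and purely illustrative, not those of the released code.

```matlab
% Illustrative collection of the NLPC parameters used in the experiments.
opts.tau         = 1e-12;  % threshold within the stopping criterion on ||f(x_k)||
opts.sigma_N     = 1e-4;   % sigma_N in the Newton acceptance rule (5)
opts.sigma_G     = 1e-4;   % sigma_G in the Armijo-type rule (14)
opts.rho         = 1e-2;   % rho in condition (15)
opts.alpha       = 0.79;   % base step size; tested step lengths are alpha^j
opts.J_newton    = 20;     % max step sizes tried along the Newton direction
opts.J_grad      = 40;     % max step sizes tried along the gradient direction
opts.max_iter    = 250;    % max iterations before restarting from a new x_0
opts.max_cond_J0 = 1e17;   % max allowed condition number of J_f(x_0)
```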

5.2 Comparison with a Classical Dynamic Approach

A classical approach [38, 39] for computing the stationary state of system (19) on a given SCC consists in simulating the whole concentration dynamics \(\textbf{x}(t)\) by solving the Cauchy problem

$$\begin{aligned}{} & {} \dot{\textbf{x}} = \textbf{S} \textbf{v}(\textbf{x}, \textbf{k}) \nonumber \\{} & {} \textbf{x}(0) = \textbf{x}_0, \end{aligned}$$
(26)

where \(\textbf{x}_0\) is a point on the SCC, and then computing the asymptotic value

$$\begin{aligned} \textbf{x}_\textrm{dyn} = \lim _{t \rightarrow +\infty } \textbf{x}(t). \end{aligned}$$
(27)

In this section, we compare the results obtained through this approach with those from NLPC algorithm.

To this end, we started from the CR-CRN and we built 10 different experiments, by varying the values of the kinetic parameters \(\textbf{k}\) and of the total conserved moieties \(\textbf{c}\) that define the SCC, so as to mimic a colorectal cell either healthy or affected by one of the 9 mutations listed in Sect. 5.1. For each experiment, we sampled 50 initial points \(\textbf{x}_0^{(j)}\) on the corresponding SCC. For each initial point, i.e., for \(j \in \{ 1, \dots , 50\}\), we computed the solutions \(\textbf{x}_\textrm{nlpc}^{(j)}\) and \(\textbf{x}_\mathrm{nlpc\_bb}^{(j)}\) provided by the NLPC algorithm when the gradient direction is scaled by its norm or through the BB-inspired scaling rule, respectively. We compared these results with the asymptotically stable state \(\textbf{x}_\textrm{dyn}^{(j)}\) computed through the dynamic approach previously described.

Specifically, as in [38], we used the MATLAB® tool ode15s [35] to integrate the ODE system in (26) on the interval \([0, 2.5 \cdot 10^7]\) and we defined \(\textbf{x}_\textrm{dyn}^{(j)}\) as the value of the computed solution at the last time point of the interval.

All the other options of the ode15s routine were kept fixed to MATLAB® default values. The choice of this parameter setting was inspired by previous works [38, 39] where the obtained results were extensively validated. In particular, one of the parameters affecting the quality of the final solution is the span of the time interval. However, some tests summarized in Online Resource 1 suggested that larger intervals do not noticeably improve the accuracy of the solution while they may introduce numerical errors.
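
A minimal MATLAB sketch of this dynamic approach (with the same hypothetical helpers and variable names used in the previous sections) is:

```matlab
% Dynamic approach: integrate the Cauchy problem (26) with ode15s on
% [0, 2.5e7] and take the value at the last time point as x_dyn.
rhs = @(t, x) S * mass_action_fluxes(x, k, P_exp);   % right-hand side of (19)
[~, X] = ode15s(rhs, [0, 2.5e7], x0);                % default ode15s options
x_dyn = X(end, :)';                                  % approximation of the limit (27)
```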

As shown in Figs. 3 and 4, NLPC outperforms the dynamic approach in terms of both accuracy of the obtained results and computational cost. Indeed, Fig. 3 shows that, in all the 10 considered experiments, when the gradient direction is normalized, the elapsed time for the NLPC algorithm, averaged across the 50 runs obtained by varying the initial point, ranges from about 23 s (mutated network with gain of function of k-Ras) to 59 s (mutated network with loss of function of PTEN). These values decrease even further when the BB rule is chosen for scaling, going down to a range between 5 and 37 s. Despite this, it is important to underline that in this second case numerical issues occasionally occur (in our simulations, with a probability of about \(5.4 \%\)), compromising the stability of the method and preventing it from returning the desired equilibrium point. On the contrary, the results of the dynamic approach show a higher variability across the different CRNs, and the averaged elapsed time scales up to about 8 min in the network incorporating a gain of function mutation of PI3K. It is worth noticing that, for each of the 10 experiments, a few runs of NLPC required a higher elapsed time (higher than the third quartile of the corresponding distributions). These runs needed a large number of restarts of the NLPC algorithm because the maximum number of 250 iterations was reached without meeting the stopping criterion on the norm of \(\textbf{f}\), probably because the gradient method tended to stationary points that were not roots of \(\textbf{f}\). Future work will be devoted to refining the stopping criterion so that, when needed, NLPC is restarted before reaching 250 iterations.

As a final test, we compared the two approaches in terms of the accuracy of the returned solutions. Since we are looking for the roots of \(\textbf{f}\), the accuracy was evaluated by computing the \(\ell _2\)-norm of \(\textbf{f}\) at the solutions provided by the two algorithms, namely \(\textbf{x}_\textrm{nlpc}^{(j)}\) and \(\textbf{x}_\textrm{dyn}^{(j)}\), \(j \in \{1, \dots , 50\}\). As shown in Fig. 4, in all 10 considered experiments the norm of \(\textbf{f}\) at the NLPC solutions \(\textbf{x}_\textrm{nlpc}^{(j)}\) was always below \(10^{-12}\), as imposed by the stopping criterion of the algorithm. By contrast, the value of \(||\textbf{f}(\textbf{x}_\textrm{dyn}^{(j)})||\) ranged between \(10^{-2}\) and \(10^1\), regardless of the time employed to compute the solution \(\textbf{x}_\textrm{dyn}^{(j)}\).

Fig. 3

Elapsed time for the NLPC algorithm to converge, when the gradient direction is normalized (NLPC) or is scaled through a Barzilai–Borwein-inspired rule (NLPC with BB), compared to the time required to compute the equilibrium point by solving the dynamical system in (26). Boxplots summarize the values obtained across 50 different runs for 10 distinct networks mimicking either a physiological state (phys) or a mutation affecting the protein shown in the axis labels

Fig. 4

Accuracy as a function of the elapsed time for the NLPC algorithm (left) and the dynamic approach (right). Accuracy is quantified as the norm of \(\textbf{f}\) evaluated at the solutions provided by the two algorithms, \(\textbf{x}_\textrm{nlpc}\) and \(\textbf{x}_\textrm{dyn}\), respectively. In each panel, 50 different results are shown for each of the considered CRNs, which mimic mutations of k-Ras, Raf and PTEN (orange diamonds), the physiological state and mutations of Betacatenin, APC, AKT, SMAD4, PTEN, p53 (yellow crosses), and a mutation of PI3K (purple dots). This color code was chosen so as to cluster together results for which the times required for computing \(\textbf{x}_\textrm{dyn}\) were similar, as depicted in Fig. 3. Notice the different scale on the y-axis. The analogous plot for the NLPC with BB approach can be found in Online Resource 1

5.3 Benefits of the Operator \(\mathcal {P}\) over the Classical Projector

The goal of this section is to quantify the benefit of using the operator \(\mathcal {P}\) instead of the classical projector P onto the closed convex set \(\varOmega \) defined in Eq. (2). To this end, for each of the 10 experiments defined in the previous section, and for each of the 50 initial points \({\textbf{x}}_0^{(j)}\), \(j \in \{1, \dots , 50\}\), drawn on the corresponding SCCs, we computed the solution of NLPC (with normalized gradient) by replacing, in Algorithm 1, the proposed operator \(\mathcal {P}\) with the classical projector P. We denote the corresponding solution by \(\textbf{x}_\textrm{ort}^{(j)}\).
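For reference, the classical projector P acts componentwise; a minimal MATLAB sketch is given below, assuming \(\varOmega \) is a box described by lower and upper bounds lb and ub (in the CRN setting the constraint is nonnegativity, so lb = 0 and ub = +Inf and P simply zeroes the negative components). The trial point x_trial is a placeholder, and the operator \(\mathcal {P}\) of Definition 2.1 is not reproduced here.

% Classical (orthogonal) projection onto the box Omega = {x : lb <= x <= ub}.
P_box = @(x, lb, ub) min(max(x, lb), ub);
% Nonnegativity constraints of the CRN setting:
x_proj = P_box(x_trial, zeros(size(x_trial)), inf(size(x_trial)));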

Fig. 5

Number of restarts required by NLPC in order to satisfy the stopping criterion within the fixed maximum number of iterations. The boxplots describe the values obtained across 50 different runs for 10 distinct networks when, within NLPC, we employed the proposed nonlinear projector \(\mathcal {P}\) (red) or the classical orthogonal projector P (green)

As shown in Fig. 5, when combined with the classical projector, the NLPC algorithm requires a higher number of restarts, and thus a longer elapsed time, than when the proposed operator is used. Specifically, the ratio between the number of restarts required with the projector P and that required with the operator \(\mathcal {P}\), averaged over all the 10 considered experiments and all the sampled initial points, is around 4.87.

The poor performance of the projector P is caused by the fact that, at any given iteration k, all the negative components of the newly proposed point \(\textbf{x}_{k+1}\) are set equal to zero. As a consequence, the percentage of proteins estimated as having a null concentration increases sharply, and this results in a high condition number of the corresponding Jacobian matrix \({{\textbf {J}}}_{{{\textbf {f}}}}\) defined as in (25). In turn, the ill-conditioning of \({{\textbf {J}}}_{{{\textbf {f}}}}\) compromises the stability of Newton's method, and thus the NLPC algorithm tends to spend most of the allowed iterations performing gradient descent steps. As shown in Table 1, the use of the operator \(\mathcal {P}\) helps prevent this issue.
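The diagnostics reported in Table 1 could be tracked along the iterations with a few lines such as the following; Jf is a placeholder for a function returning the Jacobian matrix \(\textbf{J}_{\textbf{f}}\) in (25) at the current iterate x_k.

% Number of components currently estimated as zero and condition number of
% the Jacobian at the current iterate (cf. Table 1).
n_null = sum(x_k == 0);
kappa = cond(full(Jf(x_k)));  % full() in case the Jacobian is stored as sparse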

Table 1 Average and standard deviation over 50 initial points of the maximum number of null components (first row) and the maximum condition number of the Jacobian matrix \(\textbf{J}_{\textbf{f}}\) (second row) reached across the iterations performed by NLPC

6 Conclusions

In this paper, an iterative algorithm for solving box-constrained root-finding problems is presented. It combines Newton's and gradient descent methods and exploits the operator \(\mathcal {P}\) of Definition 2.1 to enforce the required constraints at each iteration (and to prevent the numerical instability issues that would occur if the projector P were applied). Together with a suitable backtracking rule, we prove that the method converges to a stationary point of the objective function in Eq. (4). Although it outperforms the dynamic approach in both accuracy and speed, in the CRN framework the NLPC algorithm provides less information than simulating the whole concentration dynamics. However, in many contexts, such as tuning kinetic parameters starting from experimental data or the applications described in [38, 39], knowledge of the whole dynamics is not required and only the equilibrium points of the system are of interest. The present work could be extended in several directions. On the one hand, it would be interesting to define and implement a stopping criterion for the case in which the algorithm converges to stationary points that do not coincide with roots of \(\textbf{f}\). On the other hand, general properties of the NLPC algorithm should be investigated in extreme scenarios, such as problems where the solution is not unique or does not exist; in our case, this means that the function \(\textbf{f}\) has multiple nonnegative roots or none at all. Some preliminary results shown in Online Resource 1 suggest that in the former case NLPC is able to find the different roots by changing the initial point, while in the latter case the stopping criterion on \(||\textbf{f}(\textbf{x}_k)||\) is never met and the algorithm may approach local minima of \(\varTheta \), possibly on the boundary of \(\varOmega \). In this respect, a further development of this research could go in the direction of coupling homotopy continuation methods with the proposed numerical tools. Finally, an interesting study would concern a thorough benchmarking against other state-of-the-art methods, also in terms of scalability [7, 14, 16, 20, 30]. Some preliminary tests comparing NLPC with the interior point optimizer (IPOPT, [27, 41]) and the scaled gradient projection (SGP) method [4,5,6] are presented in Online Resource 1.