1 Introduction

We focus on solving (upper-level) variational inequalities whose feasible set is given by the solution set of another (lower-level) variational inequality. These problems are commonly referred to as nested variational inequalities, and they represent a flexible modelling tool for solving, for instance, problems arising in finance (see, e.g., the recent Lampariello et al. (2021) for an application in the multi-portfolio selection context), as well as problems in several other fields (see, e.g., Facchinei et al. 2014; Scutari et al. 2012 for resource allocation problems in communications and networking).

As far as the literature on nested VIs is concerned, it is still in its infancy compared to the bilevel instances of hierarchical optimization or, more generally, bilevel structures as in Dempe (2002), Lampariello and Sagratella (2017, 2020) and Lampariello et al. (2019). Two main solution approaches dominate the field-related literature: hybrid-like techniques (see, e.g., Lu et al. 2009; Marino and Xu 2011; Yamada 2001) and Tikhonov-type schemes (see, e.g., Lampariello et al. (2020) for the latest developments and Kalashnikov and Kalashnikova (1996) for some earlier ones, as well as Facchinei et al. (2014) and the references therein). It should be pointed out that hybrid-like procedures very often require particularly strong assumptions (e.g. co-coercivity of the lower-level map, as in Lu et al. (2009), Marino and Xu (2011) and Yamada (2001)) in order to ensure convergence or to work properly at all. Hence, we rely on the Tikhonov paradigm, drawing from the general schemes proposed in Facchinei et al. (2014) and Lampariello et al. (2020), while asking for less stringent assumptions and combining the Tikhonov approach with a new averaging procedure.

We widen the scope and expand the applicability of nested variational inequalities by showing, for the first time in the related literature, that solutions can be provably computed in the more general framework of merely monotone upper- and lower-level variational inequalities. Specifically, in Facchinei et al. (2014) and Lampariello et al. (2020), where, as far as we are aware, the most advanced results are obtained, the upper-level map is required to be monotone plus. Relying on a combination of a Tikhonov approach with an averaging procedure, the algorithm we propose provably converges to a solution of a nested variational inequality whose upper-level map is required to be just monotone.

We also obtain complexity results for our method. Aside from Lampariello et al. (2020), ours is the only complexity analysis in the literature on nested variational inequalities; moreover, it is the first one in the field dealing with upper-level optimality.

2 Problem definition and motivation

Let us consider the nested variational inequality \(\text {VI}\big (G, \text {SOL}(F,Y)\big )\), where \(G: {\mathbb {R}}^{n} \rightarrow {\mathbb {R}}^n\) is the upper-level map, and \(\text {SOL}(F,Y)\) is the solution set of the lower-level VI(F, Y). We recall that, given a subset \(Y \subseteq {\mathbb {R}}^n\) and a mapping \(F: {\mathbb {R}}^{n} \rightarrow {\mathbb {R}}^n\), the variational inequality VI(F, Y) is the problem of computing a vector \(x\in {\mathbb {R}}^n\) such that

$$\begin{aligned} x \in Y, \quad F(x)^{\scriptscriptstyle T}(y-x) \ge 0,\quad \forall \,y\in Y. \end{aligned}$$
(1)

In other words, \(\text {VI}\big (G, \text {SOL}(F,Y)\big )\) is the problem of finding \(x \in {\mathbb {R}}^n\) that solves

$$\begin{aligned} x \in \text {SOL}(F,Y), \quad G(x)^{\scriptscriptstyle T}(y-x) \ge 0,\quad \forall \,y\in \text {SOL}(F,Y). \end{aligned}$$
(2)

As is clear from (2), the feasible set of \(\text {VI}\big (G, \text {SOL}(F,Y)\big )\) is implicitly defined as the solution set of the lower-level VI(F, Y). The nested variational inequality (2) we consider has a purely hierarchical structure, in that the lower-level problem (1) is nonparametric with respect to the upper-level variables, unlike the more general bilevel structures presented in Lampariello and Sagratella (2020). Under mild conditions, VIs equivalently reformulate Nash equilibrium problems (NEPs), so that, in turn, by means of structure (2), we are able to model well-known instances of bilevel optimization and to address multi-follower games.

We introduce the following blanket assumptions which are widely adopted in the literature of solution methods for variational inequalities:

  • (A1) the upper-level map G is Lipschitz continuous with constant \(L_{G}\) and monotone on Y;

  • (A2) the lower-level map F is Lipschitz continuous with constant \(L_F\) and monotone on Y;

  • (A3) the lower-level feasible set Y is nonempty, convex and compact.

Due to (A2) and (A3), SOL(F, Y) is a nonempty, convex and compact set, though not necessarily a singleton, see e.g. Facchinei and Pang (2003, Section 2.3). As a consequence, the feasible set of the nested variational inequality (2) is not necessarily a singleton. Moreover, under the mere monotonicity in (A1), the solution set of the nested variational inequality (2) can include multiple points.

Notice that assumption (A1) on the upper-level map G is much less demanding than the one required in Facchinei et al. (2014) and Lampariello et al. (2020). Specifically, here we assume G to be only monotone, while in Facchinei et al. (2014) and Lampariello et al. (2020) it must be monotone plus.

For the sake of completeness, we recall that a mapping \(G: Y \subseteq {\mathbb {R}}^{n} \rightarrow {\mathbb {R}}^n\) is said to be monotone plus on Y if both the following conditions hold:

  1. G is monotone on Y, i.e. \((G(x)-G(y))^{\scriptscriptstyle T}(x-y) \ge 0 \quad \forall \; x,y \in Y\);

  2. \((x-y)^{\scriptscriptstyle T}(G(x)-G(y))=0 \Rightarrow G(x)=G(y) \quad \forall \; x,y \in Y\).

Consequently, we dispose of the monotonicity plus assumption on the operator G, which is a more stringent condition than plain monotonicity, and ask for G to be simply monotone. Indeed, whenever the upper-level map G is nonsymmetric, requiring G to be monotone plus is “slightly less” than assuming G to be strongly monotone (see, e.g. Bigi et al. 2021). In fact, the main objective of this work is to define, for the first time in the field-related literature, an algorithm (Algorithm 1) that computes solutions of the monotone nested variational inequality (2) under the weaker assumptions (A1)-(A3), see the forthcoming Theorem 1.

In addition, we study complexity properties of Algorithm 1 in detail, see Theorems 2 and 3. We highlight that all steps in Algorithm 1 can be readily implemented and no nontrivial computations are required, see e.g. the numerical illustration in Sect. 5.

Summarizing,

  • we show that the algorithm we propose is globally (subsequentially) convergent to solutions of monotone nested variational inequalities under the weakest conditions in the literature,

  • we provide the first complexity analysis for nested variational inequalities considering optimality of both the upper and the lower level (in Lampariello et al. (2020), by contrast, just lower-level optimality is contemplated).

3 A projected averaging Tikhonov algorithm

For the sake of notation, let us introduce the following operator:

$$\begin{aligned} \Phi _\tau (x) \triangleq F(x) + \frac{1}{\tau } G(x), \end{aligned}$$

which is the classical operator used to define subproblems in Tikhonov-like methods. For any \(\tau \in {\mathbb {R}}_{++}\), by assumptions (A1) and (A2), \(\Phi _\tau \) is monotone and Lipschitz continuous with constant \(L_\Phi \triangleq L_{F} + L_{G}\) on Y. Moreover, the following finite quantities are useful in the forthcoming analysis:

$$\begin{aligned} H \triangleq \max _{y \in Y} \Vert G(y)\Vert _2, \quad R \triangleq \max _{y \in Y} \Vert F(y)\Vert _2, \quad D \triangleq \max _{v, y \in Y}\Vert v - y\Vert _2. \end{aligned}$$

For the sake of clarity, let us recall that, by definition, the Euclidean projection \(P_Y(x)\) of a vector \(x \in {\mathbb {R}}^n\) onto a closed convex subset \(Y \subseteq {\mathbb {R}}^n\) is the unique solution of the strongly convex (in y) problem

$$\begin{aligned} \begin{aligned} \mathop {\textrm{minimize}}\limits _{y} \quad&\frac{1}{2} \; (y-x)^{\scriptscriptstyle T}(y-x)\\ \text {s.t.} \quad&y \in Y.\\ \end{aligned} \end{aligned}$$

The solution of the latter problem is a unique vector \({\bar{y}} \in Y\) that is closest to x in the Euclidean norm [see, e.g. Facchinei and Pang (2003, Th.1.5.5) for an exhaustive overview on the Euclidean projector and its properties].
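For simple sets the projector is available in closed form; the following Python sketch (ours, purely illustrative) computes \(P_Y\) for a Euclidean ball, the set used in the numerical experiments of Sect. 5, and for a box.

```python
import numpy as np

def project_ball(x, center=None, radius=1.0):
    """Euclidean projection onto the closed ball B(center, radius)."""
    c = np.zeros_like(x) if center is None else center
    d = x - c
    nrm = np.linalg.norm(d)
    # points inside the ball are fixed by the projector
    return x if nrm <= radius else c + (radius / nrm) * d

def project_box(x, lo, hi):
    """Euclidean projection onto the box [lo, hi] (componentwise clipping)."""
    return np.clip(x, lo, hi)
```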

In our analysis we rely on approximate solutions of VIs. Specifically, we say that \(x \in K\) approximately solves VI\((\Psi ,K)\) (with \(\Psi \) continuous and K convex and compact) if

$$\begin{aligned} \Psi (x)^\top (y-x) \ge -\varepsilon , \quad \forall \, y \in K, \end{aligned}$$
(3)

where \(\varepsilon \ge 0\). When, for example, VI\((\Psi ,K)\) expresses the first-order optimality conditions of a convex problem, relation (3) guarantees that the problem is solved up to accuracy \(\varepsilon \). In view of assumption (A3), relation (3) is equivalent to

$$\begin{aligned} \min _{y \in K} \; \Psi (x)^\top (y-x) \ge -\varepsilon . \end{aligned}$$
(4)

We remark that \(\Psi (x)^\top (y-x)\) is linear in y; moreover, if K is polyhedral (as, e.g., in the multi-portfolio selection context), computing \(\min _{y \in K} \; \Psi (x)^\top (y-x)\) amounts to solving a linear optimization problem. In any event, we assume this computation to be easy to perform in practice.
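For instance, when \(K = \{y : Ay \le b\}\), the quantity in (4) can be evaluated with any LP solver. The sketch below uses scipy.optimize.linprog; the map Psi and the polyhedral data A, b are illustrative inputs, not fixed by the text.

```python
import numpy as np
from scipy.optimize import linprog

def vi_gap(Psi, x, A, b):
    """Evaluate min over y in K of Psi(x)^T (y - x), with K = {y : A y <= b}.

    By (3)-(4), a return value >= -eps certifies that x solves
    VI(Psi, K) up to accuracy eps; K is assumed nonempty and bounded.
    """
    g = Psi(x)                                   # LP cost vector
    res = linprog(c=g, A_ub=A, b_ub=b,
                  bounds=[(None, None)] * len(x))
    assert res.success, res.message
    return res.fun - g @ x                       # min_y g^T y  -  g^T x
```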

With the following result we relate approximate solutions of the VI subproblem

$$\begin{aligned} \Phi _\tau (x)^\top (y-x) \ge -\varepsilon _{\text {sub}}, \quad \forall \, y \in Y, \end{aligned}$$
(5)

where \(\varepsilon _{\text {sub}} \ge 0\), with approximate solutions of problem (2).

Proposition 1

Assume conditions (A1)-(A3) to hold, and let \(x \in Y\) be a solution of the VI subproblem (5) with \(\tau > 0\) and \(\varepsilon _{\text {sub}} \ge 0\). It holds that

$$\begin{aligned} G(x)^\top (y-x) \ge -\varepsilon _{\text {up}}, \quad \forall y \in \text {SOL}(F,Y), \end{aligned}$$
(6)

with \(\varepsilon _{\text {up}} \ge \varepsilon _{\text {sub}} \tau \), and

$$\begin{aligned} F(x)^\top (y-x) \ge -\varepsilon _{\text {low}}, \quad \forall y \in Y, \end{aligned}$$
(7)

with \(\varepsilon _{\text {low}} \ge \varepsilon _{\text {sub}} + \frac{1}{\tau } H D\).

Proof

We have for all \(y \in \text {SOL}(F,Y)\):

$$\begin{aligned} -\varepsilon _{\text {sub}} \tau&\le \left[ \tau F(x) + G(x)\right] ^\top (y-x) \\&\le \left[ \tau F(y) + G(x)\right] ^\top (y-x) \\&\le G(x)^\top (y-x), \end{aligned}$$

where the first inequality is due to (5), the second one comes from the monotonicity of F, and the last one is true because \(x \in Y\), \(y \in \text {SOL}(F,Y)\) and then \(F(y)^\top (x-y) \ge 0\). That is, (6) is true.

Moreover, we have for all \(y \in Y\):

$$\begin{aligned} F(x)^\top (y-x) = \Phi _\tau (x)^\top (y-x) - \frac{1}{\tau } G(x)^\top (y-x) \ge -\varepsilon _{\text {sub}} - \frac{1}{\tau } H D, \end{aligned}$$

where the inequality is due to (5) and to the definitions of H and D. Therefore we get (7). \(\square \)

Proposition 1 suggests a way to solve, with a good degree of accuracy, the hierarchical problem (2): namely, solving the VI subproblem (5) with a large value of \(\tau \) and an \(\varepsilon _{\text {sub}}\) sufficiently small to make \(\varepsilon _{\text {sub}} \tau \) small enough. Following this path, we propose a Projected Averaging Tikhonov Algorithm (PATA), see Algorithm 1, to compute solutions of problem (2).

Algorithm 1 Projected Averaging Tikhonov Algorithm (PATA)

Some comments about PATA are in order. Index i counts the outer iterations, which occur when the condition in step (S.4) is verified; they correspond to solutions \(w^{i+1}\) of the VI subproblems (5) with \(\varepsilon _{\text {sub}} = {\bar{\varepsilon }}^i\) and \(\tau = {\bar{\tau }}^i\). The sequence \(\{y^k\}\) is obtained by making classical projection steps with stepsizes \(\gamma ^k\), see step (S.2). The sequence \(\{z^k\}\) consists of the inner iterates needed to compute a solution of the VI subproblem (5), and it is obtained by performing a weighted average of the points \(y^j\), see step (S.3). Index l is included to let the sequence of stepsizes \(\{\gamma ^k\}\) restart at every outer iteration and to consider only the points \(y^j\) belonging to the current subproblem when computing \(z^{k+1}\).

We remark that the condition in step (S.4) only requires the solution of an optimization problem with a linear objective function over the convex set Y (see the discussion about inexact solutions of VIs below relation (4)). In Sect. 5 we give a practical implementation of PATA.
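To make the above description concrete, here is a minimal Python sketch of PATA as we read it from steps (S.1)-(S.4); the routines project_Y and gap, the parameter names, and the update \({\bar{\tau }}^{i+1} = {\bar{\tau }}^i + 1\) are our own illustrative choices, not the authors' reference implementation.

```python
import numpy as np

def pata(F, G, project_Y, gap, y0, c=1.0, beta=2.0, a=0.5, alpha=0.5,
         k_max=10**5):
    """Sketch of PATA. project_Y(x): projection onto Y;
    gap(z, Phi): min over y in Y of Phi(z)^T (y - z), cf. step (S.4)."""
    y = np.asarray(y0, dtype=float)
    z, w = y.copy(), y.copy()
    tau, eps = 1.0, c          # tau^0 = 1 and eps^0 = c / tau^beta
    l, gamma_sum = 0, 0.0      # restart index l and running sum of stepsizes
    for k in range(k_max):
        gamma = 1.0 if k == l else min(1.0, a / (k - l) ** alpha)
        Phi = lambda v, t=tau: F(v) + G(v) / t   # Tikhonov operator Phi_tau
        y = project_Y(y - gamma * Phi(y))        # (S.2): projection step
        z = (gamma_sum * z + gamma * y) / (gamma_sum + gamma)  # (S.3): average
        gamma_sum += gamma
        if gap(z, Phi) >= -eps:                  # (S.4): subproblem test
            w = z.copy()                         # outer iterate w^{i+1}
            tau += 1.0                           # one admissible tau update
            eps = c / tau ** beta                # eps^i = c / (tau^i)^beta
            l, gamma_sum = k + 1, 0.0            # restart stepsizes and average
    return w
```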

In the following section we show that Proposition 1 can be used to prove that PATA effectively computes solutions of problem (2).

4 Main convergence properties

First of all we deal with convergence properties of PATA.

Theorem 1

Assume conditions (A1)-(A3) to hold, and let conditions

$$\begin{aligned} \sum _{k=0}^\infty \gamma ^k = \infty , \quad \frac{\sum _{k=0}^\infty (\gamma ^k)^2}{\sum _{k=0}^\infty \gamma ^k} = 0, \quad \frac{1}{{\bar{\tau }}^{i}} \downarrow 0, \quad \sum _{i=0}^\infty \frac{1}{{\bar{\tau }}^{i}} = \infty , \quad {\bar{\varepsilon }}^{i} = \frac{c}{({\bar{\tau }}^{i})^\beta }, \end{aligned}$$
(8)

hold with \(\beta > 1\) and \(c > 0\). Every limit point of the sequence \(\{w^i\}\) generated by PATA is a solution of problem (2).

Proof

First of all we show that \(i \rightarrow \infty \). Assume by contradiction that this is not true; then there exists an index \({\bar{k}}\) such that the condition in step (S.4) is violated for every \(k \ge {\bar{k}}\), and either \({\bar{k}} = 0\) or the condition in step (S.4) is satisfied at iteration \({\bar{k}} - 1\). We denote \({\bar{\tau }} = \tau ^{{\bar{k}}}\), and observe that \(\tau ^k = {\bar{\tau }}\) for every \(k \ge {\bar{k}}\).

For every \(j \in [{\bar{k}}, k]\), and for any \(v \in Y\), we have

$$\begin{aligned} \begin{array}{rcl} \Vert y^{j+1} - v\Vert _2^2 &{} = &{} \Vert P_Y(y^j - \gamma ^{j-{\bar{k}}} \Phi _{{\bar{\tau }}}(y^j)) - v\Vert _2^2\\ &{} \le &{} \Vert y^j - \gamma ^{j-{\bar{k}}} \Phi _{{\bar{\tau }}}(y^j) - v\Vert _2^2\\ &{} = &{} \Vert y^j - v\Vert _2^2 + (\gamma ^{j-{\bar{k}}})^2 \Vert \Phi _{{\bar{\tau }}}(y^j)\Vert _2^2 - 2 \gamma ^{j-{\bar{k}}} \Phi _{{\bar{\tau }}}(y^j)^\top (y^j - v), \end{array} \end{aligned}$$

and, in turn,

$$\begin{aligned} \Phi _{{\bar{\tau }}}(y^j)^\top (v - y^j) \ge \frac{\Vert y^{j+1} - v\Vert _2^2 - \Vert y^j - v\Vert _2^2}{2 \gamma ^{j-{\bar{k}}}} - \frac{\gamma ^{j-{\bar{k}}}}{2} \Vert \Phi _{{\bar{\tau }}}(y^j)\Vert _2^2. \end{aligned}$$

Summing, we get

$$\begin{aligned} { \begin{array}{rcl} \frac{\sum _{j={\bar{k}}}^{k} \gamma ^{j-{\bar{k}}} \Phi _{{\bar{\tau }}}(y^j)^\top (v - y^j)}{\sum _{j={\bar{k}}}^{k} \gamma ^{j-{\bar{k}}}} &{} \ge &{} \frac{\sum _{j={\bar{k}}}^{k} \left( \Vert y^{j+1} - v\Vert _2^2 - \Vert y^j - v\Vert _2^2 - (\gamma ^{j-{\bar{k}}})^2 \Vert \Phi _{{\bar{\tau }}}(y^j)\Vert _2^2\right) }{2\sum _{j={\bar{k}}}^{k} \gamma ^{j-{\bar{k}}}} \\ &{} = &{} \frac{\left( \Vert y^{k+1} - v\Vert _2^2 - \Vert y^{{\bar{k}}} - v\Vert _2^2 - \sum _{j={\bar{k}}}^{k} (\gamma ^{j-{\bar{k}}})^2 \Vert \Phi _{{\bar{\tau }}}(y^j)\Vert ^2_2\right) }{2\sum _{j={\bar{k}}}^{k} \gamma ^{j-{\bar{k}}}} \\ &{} \ge &{} - \frac{ \left( \Vert y^{{\bar{k}}} - v\Vert _2^2 + \sum _{j={\bar{k}}}^{k} (\gamma ^{j-{\bar{k}}})^2 \Vert \Phi _{{\bar{\tau }}}(y^j)\Vert ^2_2\right) }{2\sum _{j={\bar{k}}}^{k} \gamma ^{j-{\bar{k}}}}, \end{array} } \end{aligned}$$
(9)

which implies

$$\begin{aligned} \begin{array}{rcl} \Phi _{{\bar{\tau }}}(v)^\top (v - z^k) &{} = &{} \frac{1}{\sum _{j={\bar{k}}}^{k} \gamma ^{j-{\bar{k}}}} \sum _{j={\bar{k}}}^{k} \gamma ^{j-{\bar{k}}} \Phi _{{\bar{\tau }}}(v)^\top (v - y^j)\\ &{} \ge &{} -\frac{\left( \Vert y^{{\bar{k}}} - v\Vert _2^2 + \sum _{j={\bar{k}}}^{k} (\gamma ^{j-{\bar{k}}})^2 \Vert \Phi _{{\bar{\tau }}}(y^j)\Vert ^2_2\right) }{2\sum _{j={\bar{k}}}^{k} \gamma ^{j-{\bar{k}}}} \\ &{} &{} + \frac{1}{\sum _{j={\bar{k}}}^{k} \gamma ^{j-{\bar{k}}}} \sum _{j={\bar{k}}}^{k} \gamma ^{j-{\bar{k}}} (\Phi _{{\bar{\tau }}}(v) - \Phi _{{\bar{\tau }}}(y^j))^\top (v - y^j)\\ &{} \ge &{} -\frac{\left( \Vert y^{{\bar{k}}} - v\Vert _2^2 + \sum _{j={\bar{k}}}^{k} (\gamma ^{j-{\bar{k}}})^2 \Vert \Phi _{{\bar{\tau }}}(y^j)\Vert ^2_2\right) }{2\sum _{j={\bar{k}}}^{k} \gamma ^{j-{\bar{k}}}}, \end{array} \end{aligned}$$
(10)

where the last inequality is due to the monotonicity of \(\Phi _{{\bar{\tau }}}\). Denoting by \(z \in Y\) any limit point of the sequence \(\{z^k\}\), taking the limit in the latter relation along a suitable subsequence, the following inequality holds:

$$\begin{aligned} {{ \begin{array}{rcl} \Phi _{{\bar{\tau }}}(v)^\top (v - z)\ge & {} -\frac{\left( \Vert y^{{\bar{k}}} - v\Vert _2^2 + \sum _{j={\bar{k}}}^{\infty } (\gamma ^{j-{\bar{k}}})^2 \Vert \Phi _{{\bar{\tau }}}(y^j)\Vert ^2_2\right) }{2\sum _{j={\bar{k}}}^{\infty } \gamma ^{j-{\bar{k}}}} = 0, \end{array}}} \end{aligned}$$

because \(\sum _{j={\bar{k}}}^{\infty } \gamma ^{j-{\bar{k}}} = +\infty \) and \(\left( \sum _{j={\bar{k}}}^{\infty } (\gamma ^{j-{\bar{k}}})^2\right) /\left( \sum _{j={\bar{k}}}^{\infty } \gamma ^{j-{\bar{k}}}\right) = 0\), and then z is a solution of the dual problem

$$\begin{aligned} \Phi _{{\bar{\tau }}}(v)^\top (v - z) \ge 0, \forall v \in Y. \end{aligned}$$

Hence, the sequence \(\{z^k\}\) converges to a solution of VI\((\Phi _{{\bar{\tau }}}, Y)\), see e.g. Facchinei and Pang (2003, Theorem 2.3.5), in contradiction to \(\min _{y \in Y} \Phi _{{\bar{\tau }}}(z^{k+1})^\top (y - z^{k+1}) < - \varepsilon ^k = - \varepsilon ^{{\bar{k}}}\) for every \(k \ge {\bar{k}}\).

Therefore the algorithm produces an infinite sequence \(\{w^i\}\) such that \(w^{i+1} \in Y\) and

$$\begin{aligned} \Phi _{{\bar{\tau }}^i} (w^{i+1})^\top (y-w^{i+1}) \ge -\frac{c}{({\bar{\tau }}^i)^\beta }, \quad \forall \, y \in Y, \end{aligned}$$

that is (5) holds at \(w^{i+1}\) with \(\varepsilon _{\text {sub}} = \frac{c}{({\bar{\tau }}^i)^\beta }\). By Proposition 1, specifically from (6) and (7), we obtain

$$\begin{aligned} G(w^{i+1})^\top (y-w^{i+1}) \ge -\frac{c}{({\bar{\tau }}^i)^{\beta -1}}, \quad \forall y \in \text {SOL}(F,Y), \end{aligned}$$
(11)

and

$$\begin{aligned} F(w^{i+1})^\top (y-w^{i+1}) \ge -\frac{c}{({\bar{\tau }}^i)^{\beta }} - \frac{1}{{\bar{\tau }}^i} H D, \quad \forall y \in Y. \end{aligned}$$
(12)

Taking the limit \(i \rightarrow \infty \), and recalling that G and F are continuous and \(\beta > 1\), we get the desired convergence property. \(\square \)

Conditions (8) for the sequence of stepsizes \(\{\gamma ^k\}\) are satisfied, e.g., if we choose

$$\begin{aligned} \gamma ^k = \min \left\{ 1,\frac{a}{k^\alpha }\right\} , \end{aligned}$$

with \(a > 0\) and \(\alpha \in (0,1]\), see Proposition 4 in the Appendix. Another possible choice of step-size rule satisfying conditions (8), as shown in Facchinei et al. (2015), is

$$\begin{aligned} \gamma ^k=\gamma ^{k-1}(1-\theta \gamma ^{k-1}), \quad k=1,2,\ldots \end{aligned}$$

where \(\theta \in (0,1)\) is a given constant, provided that \(\gamma ^0 \in (0,1]\).
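For concreteness, both schedules can be generated in a few lines (an illustrative sketch; a, alpha, gamma0 and theta stand for the constants above):

```python
def stepsizes_power(n, a=0.5, alpha=0.5):
    """gamma^k = min{1, a / k^alpha}, with gamma^0 = 1 by convention."""
    return [1.0] + [min(1.0, a / k ** alpha) for k in range(1, n)]

def stepsizes_recursive(n, gamma0=1.0, theta=0.5):
    """gamma^k = gamma^{k-1} (1 - theta * gamma^{k-1}), gamma^0 in (0, 1]."""
    g = [gamma0]
    for _ in range(n - 1):
        g.append(g[-1] * (1.0 - theta * g[-1]))
    return g
```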

We remark that, even if we require assumptions that are less stringent than those in the related literature, we still obtain the same type of convergence, namely subsequential convergence to a solution of problem (2). Note that, thanks to assumption (A3), at least one limit point of the sequence \(\{w^i\}\) generated by PATA exists. As is common practice when using an iterative algorithm like PATA, referring to (11) and (12), \(w^{i+1}\) can be considered an approximate solution of problem (2) as soon as \(\frac{c}{({\bar{\tau }}^i)^{\beta -1}}\) and \(\left( \frac{c}{({\bar{\tau }}^i)^\beta } + \frac{1}{{\bar{\tau }}^i} H D \right) \) are small enough. Clearly, if the upper-level map G is strongly monotone on Y, the whole sequence \(\{w^i\}\) converges to the unique solution of problem (2).

We consider the so-called natural residual map for VI\((\Psi , K)\) (with \(\Psi \) continuous and K convex and compact)

$$\begin{aligned} U(x) \triangleq \Vert P_K(x - \Psi (x)) - x\Vert . \end{aligned}$$
(13)

As recalled in Lampariello et al. (2020), the function U is continuous and nonnegative. Moreover, \(U(x) = 0\) if and only if \(x \in \text {SOL}(\Psi , K)\). Specifically, classes of problems exist for which the value U(x) also gives an actual upper bound on the distance between x and \(\text {SOL}(\Psi , K)\), see Lampariello et al. (2020) and the references therein. Therefore, the following condition

$$\begin{aligned} U(x) \le {\widehat{\varepsilon }}, \end{aligned}$$
(14)

with \({\widehat{\varepsilon }} \ge 0\), is alternative to (3). However, in view of the compactness of K, relations (3) and (14) turn out to be related to each other: we show in Appendix (Proposition 3) that if x satisfies (3), then (14) holds with \({\widehat{\varepsilon }} \ge \sqrt{\varepsilon }\). Vice versa, condition (14) implies (3) with \(\varepsilon \ge (\Omega + \Xi ) {\widehat{\varepsilon }}\), where \(\Omega \triangleq \max _{v, y \in K}\Vert v - y\Vert _2\) and \(\Xi \triangleq \max _{y \in K} \Vert \Psi (y)\Vert _2\).
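In code, evaluating (13) takes a single projection; the helper below is a sketch reusing a projector such as project_ball above.

```python
import numpy as np

def natural_residual(Psi, x, project_K):
    """U(x) = ||P_K(x - Psi(x)) - x||_2; U(x) = 0 iff x in SOL(Psi, K)."""
    return np.linalg.norm(project_K(x - Psi(x)) - x)
```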

We remark that, whenever Y is not compact, e.g. if \(Y = {\mathbb {R}}^n\), the condition in step (S.4) of Algorithm 1 involving the classical gap function for VIs (see, e.g., Facchinei and Pang 2003, page 88) cannot be relied upon and could be replaced by the natural map-related relation \(\Vert P_Y(z^{k+1} - \Phi _{\tau ^k}(z^{k+1})) - z^{k+1}\Vert \le \varepsilon ^k\), at the price of a slightly modified convergence analysis.

In order to deal with the convergence rate analysis of our method, we consider the natural residual map for the lower-level VI(FY)

$$\begin{aligned} V(x) \triangleq \Vert P_Y(x - F(x)) - x\Vert . \end{aligned}$$
(15)

Clearly, the following condition

$$\begin{aligned} V(x) \le {\widehat{\varepsilon }}_{\text {low}}, \end{aligned}$$
(16)

with \({\widehat{\varepsilon }}_{\text {low}} \ge 0\), is alternative to (7) to take care of the feasibility of problem (2).

In this context, we underline that the convergence rate we establish is intended to give an upper bound to the number of iterations needed to drive both the upper-level error \(\varepsilon _{\text {up}}\), given in (6), and the lower-level error \({\widehat{\varepsilon }}_{\text {low}}\), given in (16), under some prescribed tolerances \(\delta _{\text {up}}\) and \({\widehat{\delta }}_{\text {low}}\), respectively.

Theorem 2

Assume conditions (A1)-(A3) to hold and, without loss of generality, assume \(L_\Phi < 1\). Consider PATA. Given some tolerances \(\delta _{\text {up}}, {\widehat{\delta }}_{\text {low}} \in (0,1)\), set \(\gamma ^k = \min \left\{ 1,\frac{1}{2 k^\frac{1}{2}}\right\} \), \({\bar{\tau }}^i = \max \{1,i\}\), and \({\bar{\varepsilon }}^i = \frac{1}{({\bar{\tau }}^{i})^2}\). Let us define the quantity

$$\begin{aligned} I_{\max } \triangleq \left\lceil \max \left\{ \frac{1}{\delta _{\text {up}}}, \frac{H+1}{{\widehat{\delta }}_{\text {low}}} \right\} \right\rceil . \end{aligned}$$

Then, the upper-level approximate problem (6) is solved for \(x = z^{k+1}\) with \(\varepsilon _{\text {up}} \le \delta _{\text {up}}\), the lower-level approximate problem (16) is solved for \(x = z^{k+1}\) with \({\widehat{\varepsilon }}_{\text {low}} \le {\widehat{\delta }}_{\text {low}}\), and the condition in step (S.4) is satisfied, in at most

$$\begin{aligned} \sigma \triangleq I_{\max } \left\lceil \max \left\{ I_{\max }^8 \frac{(D+R)^4}{(1-L_\Phi )^2} C_1, I_{\max }^{\frac{8}{1-2\eta }} \frac{(D+R)^{\frac{4}{1-2\eta }}}{(1-L_\Phi )^{\frac{2}{1-2\eta }}} C_{2,\eta } \right\} \right\rceil , \end{aligned}$$

iterations k, where \(\eta > 0\) is a small number, and

$$\begin{aligned} C_1 \triangleq \left( D^2 + \frac{5}{4} (R+H)^2 \right) ^2, \quad C_{2,\eta } \triangleq \left( \frac{(R+H)^2}{(4\eta )} \right) ^{\frac{2}{1-2\eta }}. \end{aligned}$$
(17)

Proof

First of all we show that if \(i \ge I_{\max }\), we reach the desired result. Specifically, about the upper-level problem (6), we obtain

$$\begin{aligned} \varepsilon _{\text {up}} = {\bar{\varepsilon }}^i {\bar{\tau }}^i = \frac{1}{i} \le \delta _{\text {up}}, \end{aligned}$$

where the first equality is due to Proposition 1, and the last inequality follows from \(i \ge I_{\max } \ge (\delta _{\text {up}})^{-1}\).

About the lower-level problem (16), preliminarily we observe that

$$\begin{aligned} \left\| P_Y(w^{i+1} - \Phi _{{\bar{\tau }}^i}(w^{i+1})) - w^{i+1}\right\| _2 \le \sqrt{{\bar{\varepsilon }}^i}, \end{aligned}$$
(18)

because \(w^{i+1}\) satisfies the condition in step (S.4) with \({\bar{\varepsilon }}^i\), see Proposition 3. Moreover, we get

$$\begin{aligned} {\widehat{\varepsilon }}_{\text {low}} \le \frac{1}{{\bar{\tau }}^i} H + \sqrt{{\bar{\varepsilon }}^i} = \frac{H+1}{i} \le {\widehat{\delta }}_{\text {low}}, \end{aligned}$$

where the first inequality is due to (26) and (18), and the last inequality follows from \(i \ge I_{\max } \ge ({\widehat{\delta }}_{\text {low}})^{-1} (H+1)\).

Now we consider the number of inner iterations needed to satisfy the condition in step (S.4) with the smallest error \({\bar{\varepsilon }}^{I_{\max }} = I_{\max }^{-2}\) and for a \(\tau > 0\). Without loss of generality, in the following developments we will assume \({\bar{k}}=0\), meaning that we are simply computing the number of inner iterations. By (10), the dual subproblem

$$\begin{aligned} \Phi _\tau (y)^\top (y - z^k) \ge -\varepsilon _{\text {dual}}^k, \quad \forall \, y \in Y, \end{aligned}$$
(19)

is solved for \(\varepsilon _{\text {dual}}^k = \frac{D^2 + (R+H)^2 \sum _{j=0}^{k} (\gamma ^j)^2}{2 \sum _{j=0}^{k} \gamma ^j}\), since \(\Vert y^{0} - v\Vert _2 \le D\) and \(\Vert \Phi _\tau (y^j)\Vert _2 \le R + H\) for \(\tau \ge 1\). From Lemma 1, we obtain

$$\begin{aligned} \sum _{j=0}^{k} \gamma ^{j} \ge k^{\frac{1}{2}}, \quad \sum _{j=0}^{k} (\gamma ^{j})^2 \le \frac{5}{4} + \frac{1}{4} \ln (k) \le \frac{5}{4} + \frac{1}{4 \eta } k^\eta , \quad \eta > 0, \end{aligned}$$

because

$$\begin{aligned} \ln (k) = \int _1^k t^{-1} dt \le \int _1^k t^{-1+\eta } dt \le \frac{1}{\eta } k^\eta . \end{aligned}$$

Therefore

$$\begin{aligned} \varepsilon _{\text {dual}}^k \le \frac{D^2 + \frac{5}{4} (R+H)^2 + \frac{(R+H)^2}{4\eta } k^\eta }{2 k^{\frac{1}{2}}} \le \max \left\{ \frac{C_1^{\frac{1}{2}}}{k^{\frac{1}{2}}}, \frac{C_{2,\eta }^{\frac{1-2\eta }{2}}}{k^{\frac{1-2\eta }{2}}}\right\} . \end{aligned}$$
(20)

Now we show that

$$\begin{aligned} \left\| P_Y(z^k - \Phi _\tau (z^k)) - z^k \right\| _2 \le \sqrt{\frac{\varepsilon _{\text {dual}}^k}{1-L_\Phi }}. \end{aligned}$$
(21)

In fact, taking \(y = v^k = P_Y(z^k - \Phi _\tau (z^k)) \in Y\) in (19), we have

$$\begin{aligned} -\varepsilon _{\text {dual}}^k \le \Phi _\tau (v^k)^\top (v^k - z^k) = \left( \Phi _\tau (v^k) - \Phi _\tau (z^k)\right) ^\top (v^k - z^k) + \Phi _\tau (z^k)^\top (v^k - z^k) \le -(1-L_\Phi ) \Vert v^k - z^k\Vert _2^2, \end{aligned}$$

where the last inequality follows from the Lipschitz continuity of \(\Phi _\tau \) and the characteristic property of the projection.

From Proposition 3 and inequality (21), we obtain the following error for the subproblem

$$\begin{aligned} \min _{y \in Y} \Phi _\tau (z^k)^\top (y - z^k) \ge -(D + R) \sqrt{\frac{\varepsilon _{\text {dual}}^k}{1-L_\Phi }}, \end{aligned}$$
(22)

and then, by (20), the desired accuracy for the subproblem is obtained when

$$\begin{aligned} I_{\max }^{-2} = {\bar{\varepsilon }}^{I_{\max }} \ge \frac{D+R}{\sqrt{1-L_\Phi }} \max \left\{ \frac{C_1^{\frac{1}{4}}}{k^{\frac{1}{4}}}, \frac{C_{2,\eta }^{\frac{1-2\eta }{4}}}{k^{\frac{1-2\eta }{4}}}\right\} , \end{aligned}$$

that is

$$\begin{aligned} k \ge \max \left\{ I_{\max }^8 \frac{(D+R)^4}{(1-L_\Phi )^2} C_1, I_{\max }^\frac{8}{1-2\eta } \frac{(D+R)^\frac{4}{1-2\eta }}{(1-L_\Phi )^\frac{2}{1-2\eta }} C_{2,\eta } \right\} . \end{aligned}$$

The thesis follows by multiplying the number of outer iterations (\(i \ge I_{\max }\)) by the number of inner ones. \(\square \)

In order to provide other complexity results for our method, we consider the following proposition, which is the dual counterpart of Proposition 1, and provides a theoretical basis for Theorem 3.

Proposition 2

Assume conditions (A1)-(A3) to hold, and let \(x \in Y\) be an approximate solution of the dual VI subproblem:

$$\begin{aligned} \Phi _\tau (y)^\top (y-x) \ge -\tilde{\varepsilon }_{\text {sub}}, \quad \forall \, y \in Y, \end{aligned}$$
(23)

with \(\tau > 0\) and \(\tilde{\varepsilon }_{\text {sub}} \ge 0\). It holds that x turns out to be an approximate solution for the dual formulation of problem (2), that is

$$\begin{aligned} G(y)^\top (y-x) \ge -\tilde{\varepsilon }_{\text {up}}, \quad \forall \, y \in \text {SOL}(F,Y), \end{aligned}$$
(24)

with \(\tilde{\varepsilon }_{\text {up}} \ge \tilde{\varepsilon }_{\text {sub}} \tau \), and

$$\begin{aligned} F(y)^\top (y-x) \ge -\tilde{\varepsilon }_{\text {low}}, \quad \forall \, y \in Y, \end{aligned}$$
(25)

with \(\tilde{\varepsilon }_{\text {low}} \ge \tilde{\varepsilon }_{\text {sub}} + \frac{1}{\tau } H D\).

Proof

We have for all \(y \in \text {SOL}(F,Y)\):

$$\begin{aligned} -\tilde{\varepsilon }_{\text {sub}} \tau \le \left[ \tau F(y) + G(y)\right] ^\top (y-x) \le G(y)^\top (y-x), \end{aligned}$$

where the first inequality is due to (23) and the last one is true because \(x \in Y\) and then \(F(y)^\top (x-y) \ge 0\). That is, (24) is true.

Moreover, we have for all \(y \in Y\):

$$\begin{aligned} F(y)^\top (y-x) = \Phi _\tau (y)^\top (y-x) - \frac{1}{\tau } G(y)^\top (y-x) \ge -\tilde{\varepsilon }_{\text {sub}} - \frac{1}{\tau } H D, \end{aligned}$$

where the inequality is due to (23) and to the definitions of H and D. Therefore we get (25). \(\square \)

The following theorem considers a simplified version of PATA. Specifically, the parameter \(\tau \) is initialized right away to a value sufficiently large to reach the prescribed optimality accuracy. Moreover, approximate optimality for problem (2) is considered only in its dual version. That said, the complexity bound obtained is better than the one given by Theorem 2.

Theorem 3

Assume conditions (A1)-(A3) to hold. Consider PATA. Given some precision \(\delta \in (0,1)\), set \(\gamma ^k = \min \left\{ 1,\frac{1}{2 k^\frac{1}{2}}\right\} \), \({\bar{\tau }}^0 = {\bar{I}}_{\max }\), and \({\bar{\varepsilon }}^0 = 0\) where

$$\begin{aligned} {\bar{I}}_{\max } \triangleq \left\lceil \frac{H+1}{\delta } \right\rceil . \end{aligned}$$

Then, the upper-level approximate dual problem (24) is solved for \(x = z^{k+1}\) with \(\tilde{\varepsilon }_{\text {up}} \le \delta \) and the lower-level approximate dual problem (25) is solved for \(x = z^{k+1}\) with \(\tilde{\varepsilon }_{\text {low}} \le \frac{(1+HD)\,\delta }{H+1}\) in at most

$$\begin{aligned} {\bar{\sigma }} \triangleq \left\lceil \max \left\{ {\bar{I}}_{\max }^4 C_1, {\bar{I}}_{\max }^{\frac{4}{1-2\eta }} C_{2,\eta } \right\} \right\rceil , \end{aligned}$$

iterations k, where \(\eta > 0\) is a small number, and \(C_1\) and \(C_{2,\eta }\) are given in (17).

Proof

First of all we denote by \(\tilde{\varepsilon }_{\text {sub}}^k\) the error with which the current iteration solves the dual subproblem (23). Notice that as soon as \(\tilde{\varepsilon }_{\text {sub}}^k \le {\bar{I}}_{\max }^{-2}\), the desired accuracy for both the upper- and the lower-level dual problems is reached. In fact, as done in the proof of Theorem 2, and considering Proposition 2, we have:

$$\begin{aligned} \tilde{\varepsilon }_{\text {up}} = \tilde{\varepsilon }_{\text {sub}}^k \, {\bar{\tau }}^0 \le \frac{1}{{\bar{I}}_{\max }^2} \, {\bar{I}}_{\max } = \frac{1}{{\bar{I}}_{\max }} \le \delta , \end{aligned}$$

where the first equality is due to Proposition 2, and the last inequality follows from \({\bar{I}}_{\max } \ge \delta ^{-1}\), and

$$\begin{aligned} \tilde{\varepsilon }_{\text {low}} \le \tilde{\varepsilon }_{\text {sub}}^k + \frac{1}{{\bar{\tau }}^0} H D \le \frac{1 + HD}{{\bar{I}}_{\max }} \le \frac{(1+HD)\,\delta }{H+1}, \end{aligned}$$

where the first inequality is due to Proposition 2, and the last inequality follows from \({\bar{I}}_{\max } \ge \delta ^{-1} (H+1)\).

By (20) we have

$$\begin{aligned} \tilde{\varepsilon }_{\text {sub}}^k \le \max \left\{ \frac{C_1^{\frac{1}{2}}}{k^{\frac{1}{2}}}, \frac{C_{2,\eta }^{\frac{1-2\eta }{2}}}{k^{\frac{1-2\eta }{2}}}\right\} . \end{aligned}$$

Therefore, \(\tilde{\varepsilon }_{\text {sub}}^k \le {\bar{I}}_{\max }^{-2}\) is implied by

$$\begin{aligned} {\bar{I}}_{\max }^{-2} \ge \max \left\{ \frac{C_1^{\frac{1}{2}}}{k^{\frac{1}{2}}}, \frac{C_{2,\eta }^{\frac{1-2\eta }{2}}}{k^{\frac{1-2\eta }{2}}}\right\} , \end{aligned}$$

that is

$$\begin{aligned} k \ge \max \left\{ {\bar{I}}_{\max }^4 C_1, {\bar{I}}_{\max }^\frac{4}{1-2\eta } C_{2,\eta } \right\} , \end{aligned}$$

and the thesis follows. \(\square \)

5 Numerical experiments

We now tackle a practical example illustrating that, under assumptions (A1)-(A3), PATA produces a sequence of points \(\{z^k\}\) that is (subsequentially) convergent to a solution of the hierarchical problem, while the sequence \(\{y^k\}\) never approaches the solution set. Notice that \(\{y^k\}\) coincides with the sequence produced by the Tikhonov methods proposed in Lampariello et al. (2020) when no proximal term is considered.

Let us examine the selection problem (2), where:

$$\begin{aligned} G(y)=\begin{pmatrix}0 &{} -\frac{1}{2}\\ \frac{1}{2} &{} 0\end{pmatrix} \begin{pmatrix} y_1\\ y_2\end{pmatrix}, \, F(y) = \begin{pmatrix}0 &{} 1\\ -1 &{} 0\end{pmatrix} \begin{pmatrix} y_1\\ y_2\end{pmatrix}, \, Y = {\mathbb {B}}(0,1), \end{aligned}$$

where \({\mathbb {B}}(0,1)\) denotes the unit ball. The unique feasible point and, thus, the unique solution of the problem is \(z^* = (0,0)^{\scriptscriptstyle T}\). The assumptions (A1)–(A3) are satisfied, but notice that G does not satisfy convergence conditions of the Tikhonov-like methods proposed in Facchinei et al. (2014) and Lampariello et al. (2020) because it is not monotone plus.

The generic kth iteration of PATA, in this case, reads as follows:

$$\begin{aligned} y^{k+1} = P_Y(y^{k} - \gamma ^k [F(y^{k}) + \frac{1}{\tau ^k} G(y^{k})]), \end{aligned}$$

where we take, for example, but without loss of generality, \(\tau ^k = \tau \ge 1\) and \(\gamma ^k = \gamma > 0\). We remark that the unique exact solution of the VI subproblem (5) is the origin, and then every inexact solution, with a reasonably small error, cannot be far from it. For every k it holds that

$$\begin{aligned} y^{k+1}= P_Y\left( \begin{pmatrix} 1 &{} \gamma (\frac{1}{2\tau }-1)\\ \gamma (1 - \frac{1}{2\tau }) &{} 1 \end{pmatrix} \begin{pmatrix} y_1^{k}\\ y_2^{k}\end{pmatrix}\right) , \end{aligned}$$

hence \(\Vert y^{k+1}\Vert _2 = \min \left\{ 1, \sqrt{1+\gamma ^2(\frac{1}{2\tau }-1)^2}\, \Vert y^k\Vert _2 \right\} \). If we take \(\Vert y^0\Vert _2 = 1\), we get \(\Vert y^{k}\Vert _2 = 1\) for every k, because \(\sqrt{1+\gamma ^2(\frac{1}{2\tau }-1)^2} > 1\). Therefore, the sequence \(\{y^k\}\) produced by PATA neither leads to the unique solution \(z^*\) of problem (2), nor approaches the inexact solution set of the VI subproblem.
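This behavior is easy to reproduce numerically. The sketch below (with \(\tau \) and \(\gamma \) fixed, chosen only for illustration) iterates the plain projection step together with the averaged iterate and shows that \(\Vert y^k\Vert _2\) stays at 1 while \(\Vert z^k\Vert _2\) becomes small.

```python
import numpy as np

F = lambda y: np.array([y[1], -y[0]])              # lower-level map
G = lambda y: np.array([-0.5 * y[1], 0.5 * y[0]])  # upper-level map
proj = lambda w: w if np.linalg.norm(w) <= 1.0 else w / np.linalg.norm(w)

tau, gamma = 10.0, 0.1
y = np.array([1.0, 0.0])                 # ||y^0||_2 = 1
z, gsum = y.copy(), 0.0
for k in range(20000):
    y = proj(y - gamma * (F(y) + G(y) / tau))      # plain projection step
    z = (gsum * z + gamma * y) / (gsum + gamma)    # weighted running average
    gsum += gamma
print(np.linalg.norm(y), np.linalg.norm(z))        # ~1 vs. close to 0
```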

We now consider the sequence \(\{z^k\}\) produced by PATA. In order to show that this sequence leads us to the solution of the hierarchical problem, we analyze a numerical implementation of the algorithm. Some further considerations are in order before showing the actual implemented scheme.

  • A general rule for the update of the variable \(z^{k}\) is given by the following relation:

    $$\begin{aligned} z^{k+1}= \frac{z^k \gamma ^{\text {sum},l,k}+ \gamma ^{k+1-l}y^{k+1}}{\gamma ^{\text {sum},l,k} + \gamma ^{k+1-l}}, \end{aligned}$$

    where

    $$\begin{aligned} \gamma ^{\text {sum},l,k} \triangleq \sum _{j=l}^k \gamma ^{j-l}, \end{aligned}$$

    which gives us the expression of \(z^{k+1}\) reported in Step (S.3) in PATA. This avoids keeping track of all the points \(y^{j}\), \(j=l,\ldots ,k\), which would carry a heavy memory burden. Instead, we only need the current value of \(z^{k}\), the sum \(\gamma ^{\text {sum},l,k}\), the stepsize \(\gamma ^{k+1-l}\) and the current point \(y^{k+1}\): four quantities in total, which is far more convenient (see also the sketch after Algorithm 2 below).

  • Because the feasible set \(Y={\mathbb {B}}(0,1)\) is the unit ball, the computation of the projection steps (see Step (S.3)) becomes straightforward: it is sufficient to divide the argument by its norm whenever it lies outside the ball:

    $$\begin{aligned} P_{{\mathbb {B}}(0,1)}(w) = \frac{w}{\Vert w\Vert _2} \ \forall w:\Vert w\Vert _2 \ge 1. \end{aligned}$$

    Moreover, a closed-form expression for the unique solution u of the minimum problem at Step (S.4):

    $$\begin{aligned} u = \arg \min _{y \in {\mathbb {B}}(0,1) }\ [F(z^{k+1}) + \frac{1}{\tau ^{k}} G(z^{k+1})]^\top (y - z^{k+1}) \end{aligned}$$

    is available. Since the constraint defining the feasible set \(Y={\mathbb {B}}(0,1)\) is active at the optimal solution u, the KKT multiplier associated with this constraint is strictly positive. We do not know the value of the multiplier itself, but we can impose that the optimal point has Euclidean norm 1, so that it belongs to the boundary of \({\mathbb {B}}(0,1)\):

    $$\begin{aligned} u = -\frac{F(z^{k+1}) + \frac{1}{\tau ^{k}} G(z^{k+1})}{\Vert F(z^{k+1}) + \frac{1}{\tau ^{k}} G(z^{k+1})\Vert _2}. \end{aligned}$$

We now show the implemented scheme in Algorithm 2.

Algorithm 2 Implemented scheme of PATA for the numerical example

As far as the steps of Algorithm 2 are concerned, (S.2) and (S.3) perform step (S.2) of PATA, while (S.5) and (S.6) fulfil step (S.4) of PATA.
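Since Algorithm 2 is displayed as a figure in the published version, we also give our own reading of it as a Python sketch; the restart logic and the stopping test on \(\Vert z\Vert _2\) (which exploits the known solution \(z^* = 0\) of this example) are our interpretation of its steps, not a verbatim transcription.

```python
import numpy as np

def algorithm2(F, G, k_max=10**6, tol=1e-3, a=0.5, alpha=0.5, beta=2.0):
    """Sketch of the implemented PATA for Y = B(0,1)."""
    proj = lambda w: w if np.linalg.norm(w) <= 1.0 else w / np.linalg.norm(w)
    y = np.array([1.0, 0.0]); z = y.copy()
    tau, eps = 1.0, 1.0                  # tau^0 = 1, eps^0 = 1 / tau^beta
    l, gsum, i = 0, 0.0, 0               # restart index, stepsize sum, outer counter
    for k in range(k_max):
        gamma = 1.0 if k == l else min(1.0, a / (k - l) ** alpha)
        y = proj(y - gamma * (F(y) + G(y) / tau))        # projection step, (S.2)-(S.3)
        z = (gsum * z + gamma * y) / (gsum + gamma)      # incremental running average
        gsum += gamma
        d = F(z) + G(z) / tau
        nd = np.linalg.norm(d)
        if nd < 1e-15:
            break                        # z is (numerically) an exact solution
        u = -d / nd                      # closed-form minimizer over B(0,1), (S.5)
        if d @ (u - z) >= -eps:          # subproblem solved approximately, (S.6)
            i += 1                       # outer iteration: w^{i+1} = z
            if np.linalg.norm(z) <= tol:
                break
            tau = float(i + 1); eps = 1.0 / tau ** beta
            l, gsum = k + 1, 0.0
    return z, i, k
```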

We set the parameters \(k^{\max } = 10^6\), \(\text {tol} = 10^{-3}\), \(a = \alpha = \frac{1}{2}\), \(\beta = 2\). Table 1 summarizes the results obtained by running Algorithm 2. Clearly, \(\Vert z^{k+1}\Vert _2\) tends to 0 as the number of iterations k grows, which is what we expected, \(z^*=(0, 0)^{\scriptscriptstyle T}\) being the unique solution of the problem.

Table 1 Numerical experiment for PATA

To further highlight the elements of novelty of PATA, we present some numerical experiments in which PATA performs better than Algorithm 1 in Lampariello et al. (2020). We do not intend to provide a thorough numerical comparison between these solution methods; we just want to show that PATA is a valuable solution tool when the classical Tikhonov gradient method presented in Lampariello et al. (2020) struggles to converge.

Again, for the sake of simplicity we consider \(Y={\mathbb {B}}(0,1)\). This time, we extend the problem to encompass \(n=100\) variables and consider \(G(x) = M_{G} x + b_G\) and \(F(x) = M_F x\), with \(b_G = \zeta v^{b_G}\), where the vectors \(v^{M_G}\), \(u^{M_G}\), \(w^{M_G}\), \(v^{M_F}\), \(u^{M_F}\), \(w^{M_F}\), and \(v^{b_G}\) defining \(M_G\), \(M_F\) and \(b_G\) are randomly generated between 0 and 1, and \(\zeta > 0\). We remark that when \(\zeta =0\) the problem is a generalization of the simple example described at the beginning of this section. In our experiments, we consider the cases \(\zeta = 0.1\) and 0.01.

As far as PATA parameters are concerned, for the purpose of the implementation we set \(k^{\max } = 5 \cdot 10^4\), \(a = 1\), \(\alpha = \frac{1}{4}\) and \(\beta = 2\). As for the Tikhonov scheme proposed in Lampariello et al. (2020), we set \(\lambda = 0.1\).

Following Proposition 1, a merit function for the nested variational inequality (2) can be given by

$$\begin{aligned} \text {optimality measure}(k) \triangleq \max \left\{ \varepsilon _{\text {sub}}^k \tau ^k, \varepsilon _{\text {sub}}^k + \frac{1}{\tau ^k} \right\} . \end{aligned}$$

We generated 3 different instances of the problem and considered 2 values for \(\zeta \), for a total of 6 different test problems.

Figure 1 shows the evolution of the optimality measure as the number of inner iterations k grows towards \(k^{\max }\), when both PATA and the classical Tikhonov algorithm described in Lampariello et al. (2020) are applied to the 6 test problems.

The practical implementation of PATA consistently outperforms the classical Tikhonov method in Lampariello et al. (2020), as it needs a significantly smaller number of inner iterations k to reach small values of the optimality measure. However, we remark that computing averages, such as in step (S.3) of PATA, can be computationally expensive. Hence, PATA becomes an essential alternative tool when other Tikhonov-like methods, not including averaging steps, either fail to reach small values of the optimality measure, as shown in our practical implementation (see Fig. 1), or do not converge at all.

Fig. 1 Plots a, c and e correspond to the value \(\zeta =0.01\); plots b, d and f correspond to \(\zeta =0.1\). Each row is related to a different instance of the problem, namely a different seed for the random generation

A similar scenario can be observed also when dealing with nonlinear problems. In Table 2, we report the numerical results obtained by running Algorithm 2 to address a nonlinear modification of the problem defined at the beginning of this section: in particular, inspired by some of the test problems in Solodov and Svaiter (1999), we take

$$\begin{aligned} G(y)=\begin{pmatrix}0 &{} -\frac{1}{2}\\ \frac{1}{2} &{} 0\end{pmatrix} \begin{pmatrix} y_1\\ y_2\end{pmatrix}, \quad F(y) = \begin{pmatrix}0 &{} 1\\ -1 &{} 0\end{pmatrix} \begin{pmatrix} y_1\\ y_2\end{pmatrix} + \begin{pmatrix} \max \{0, y_1\}^2\\ \max \{0, y_2\}^2\end{pmatrix}, \quad Y= {\mathbb {B}}(0,1). \end{aligned}$$

The unique solution of this problem is again the origin. As in the previous case, PATA works well: it reaches a good approximation of the exact solution in a fair amount of inner iterations. Note that the nonlinear operator F is monotone on Y and strongly monotone on \(Y \setminus \{(0, 0)\}\): the strong monotonicity modulus vanishes when approaching the origin. This issue is clearly reflected in the behavior of a standard gradient-like version of the algorithm, readily obtained by neglecting the averaging step in Algorithm 2, thus taking there \(z^{k+1} = y^{k+1}\) (see Table 3). Comparing Tables 2 and 3, one can observe that, for the initial outer iterations i, PATA and its non-averaged counterpart perform similarly, while, when approaching the origin, the non-averaged gradient-like procedure struggles to converge (see the number of inner iterations performed for each outer iteration in Table 3).

Table 2 Numerical experiment (nonlinear case) for PATA
Table 3 Numerical experiment (nonlinear case) for the non averaged gradient-like algorithm

6 Conclusions

We have shown that PATA is (subsequentially) convergent to solutions of monotone nested variational inequalities under the weakest conditions in the literature so far, see Theorem 1. Specifically, besides the standard convexity and monotonicity assumptions, G is required to be just monotone, while all other papers demand monotonicity plus of G, see Facchinei et al. (2014) and Lampariello et al. (2020).

In addition, PATA enjoys interesting complexity properties, see Theorems 2 and 3. Notice that we have provided the first complexity analysis for nested variational inequalities considering optimality of both the upper and the lower level. By contrast, the authors in Lampariello et al. (2020) only handle lower-level optimality.

Although the convergence and complexity properties of the method are guaranteed by the results in Sect. 4, we plan to test PATA numerically in a more extensive fashion (also on nonlinear problems) in order to evaluate its practical performance more accurately.

Possible future research may focus on generalizing the problem to quasi-variational inequalities as well as generalized variational inequalities. The first step would be extending Proposition 1 to encompass these more complex variational problems. We leave this investigation to future work.