1 Introduction

The variational inequality (VI) captures the first-order optimality conditions of optimization problems and models equilibrium problems; it plays a key role in optimization and operations research. The definition of the VI is as follows.

Definition 1

(Deterministic VI) Given a nonempty closed convex set \(X\subset \mathbb {R}^n\) and a single-valued mapping \(F: X\rightarrow \mathbb {R}^n\), the variational inequality problem, denoted by VI(X, F), is to find an \(x^*\in X\) such that

$$\begin{aligned} 0\in F(x^*) + {\mathcal {N}}_X(x^*), \end{aligned}$$
(1.1)

where \({\mathcal {N}}_X(x^*)\) denotes the normal cone of X at \(x^*\). Furthermore, if X is a cone and \(X^*\) is its dual, defined as \(X^*:=\{h: h^\mathrm{T}v\geqslant 0, \forall v\in X\}\), then the complementarity problem, denoted by CP(XF), requires an \(x^*\in X\) such that \(X \ni x^* \perp F(x^*)\in X^*\), where \(u\perp v\) means that \(u^\mathrm{T}v=0\).

In the case when \(X=\mathbb {R}^n\), (1.1) reduces to the system of nonlinear equations

$$\begin{aligned} 0=F(x^*). \end{aligned}$$

When \(X=\mathbb {R}^n_+\), (1.1) reduces to the nonlinear complementarity problem (NCP)

$$\begin{aligned} 0\leqslant x^* \perp F(x^*) \geqslant 0. \end{aligned}$$
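As a quick numerical illustration (a minimal sketch with an arbitrarily chosen affine map, not taken from the paper), the NCP above holds at a point exactly when the componentwise natural residual \(\min \{x, F(x)\}\) vanishes:

```python
import numpy as np

# Minimal sketch: the NCP  0 <= x  ⊥  F(x) >= 0  holds at x exactly when
# the componentwise "natural residual" min(x, F(x)) vanishes.  We check
# this for F(x) = M x + q with an illustrative choice of M and q.
M = np.array([[2.0, 1.0],
              [1.0, 2.0]])
q = np.array([-4.0, -5.0])

def F(x):
    return M @ x + q

def ncp_residual(x):
    """Euclidean norm of min(x, F(x)); zero iff x solves the NCP."""
    return np.linalg.norm(np.minimum(x, F(x)))

x_star = np.array([1.0, 2.0])              # here F(x*) = (0, 0) and x* >= 0
print(ncp_residual(x_star))                # ~0: x* solves the NCP
print(ncp_residual(np.array([1.0, 1.0])))  # > 0: not a solution
```

This natural residual is one of the residual functions used later in the ERM formulation.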

The deterministic VI has wide applications in operations research, including convex optimization problems, convex Nash games over continuous strategy sets and economic equilibrium problems. For more details, see [1, 2].

However, in many real applications in finance, management, engineering and science, decision makers have to make sequential decisions in an uncertain environment. In such situations, the deterministic VI may not be suitable. Motivated by these applications, one-stage, two-stage and multistage SVIs have attracted the attention of researchers. The one-stage SVI has been investigated for many years [3, 4], but the investigation of the multistage SVI, and even the two-stage SVI, has just begun [5]. In this paper, starting from the one-stage SVI, we introduce the model, the motivation and recent progress of the two-stage SVI (and its extension to the multistage SVI), including theoretical results, algorithms and applications.

The organization of the paper is as follows. Starting from the one-stage SVI, the motivations, models and properties of the two-stage SVI (and the multistage SVI) are introduced in Sect. 2. In Sect. 3, we introduce several approximation methods and algorithms for different models of the two-stage SVI (and the multistage SVI). Two applications of the two-stage SVI are shown in Sect. 4. Section 5 gives several final remarks about the challenges in this research area.

Throughout this paper, we use \(\xi :\varOmega \rightarrow \mathbb {R}^d\) to denote a random vector in the probability space \((\varOmega , \mathcal{F}, {P})\) with support set \(\varXi \subset \mathbb {R}^d\). For a function \(f(\cdot , \cdot ):\mathbb {R}^n\times \mathbb {R}^m\rightarrow \mathbb {R}\), \(\partial _x f(x, y)\) denotes the Clarke subdifferential of f w.r.t. x.

2 Two-Stage Stochastic Variational Inequalities: Modeling and Analysis

In this section, we give the motivation and model of the two-stage stochastic variational inequality and review its analysis in the literature. We first present the one-stage SVI model and then turn to the two-stage SVI.

2.1 From One-Stage SVI to Two-Stage SVI

As a stochastic generalization of the deterministic VI under uncertainty, the one-stage SVI has been deeply investigated and widely applied. However, a good formulation of a VI in a stochastic environment is not straightforward. We first use the following one-stage stochastic game to motivate the different one-stage SVI formulations.

Example 1

(Motivation of one-stage SVI) We consider a duopoly market where two firms compete to supply a homogeneous product (or service) noncooperatively in the future. Each firm needs to decide on its production quantity, given the other firm’s decision, in an uncertain environment.

The future market demand is characterized by a random inverse demand function \(p(q,\xi (\omega ))\), where \(p(q,\xi (\omega ))\) is the market price and q is the total supply to the market. Specifically, for each realization of the random vector \(\xi \), we obtain a different inverse demand function \(p(\cdot ,\xi )\). The uncertainty in the inverse demand function is then characterized by the distribution of the random vector \(\xi \). Firm i’s cost function for producing (supplying) a quantity \(y_i\) in the future is \(H_i(y_i, \xi )\), \(i=1,2\), with capacity limit \(c_i\). We assume that \(H_i(y_i, \xi )\) is twice continuously differentiable with \(H_i'(y_i, \xi )\geqslant 0\) and \(H_i''(y_i, \xi )\geqslant 0\) for \(y_i\geqslant 0\), and that \(p(q,\xi )\) is twice continuously differentiable in q with \(p_q'(q,\xi )<0\) and \(p'_q(q,\xi )+qp''_{qq}(q,\xi )\leqslant 0\) for \(q\geqslant 0\) and \(\xi \in \varXi .\)

We further assume that each firm aims to maximize its profit and consider two situations: 1. the decision makers can make decisions after they observe the future uncertainty; 2. the decision makers must make decisions before they observe the future uncertainty.

  • For situation 1, to maximize each firm’s profit, they need to find \((y_1^*(\cdot ), y_2^*(\cdot ))\) such that it solves the following problem:

    $$\begin{aligned} \begin{array}{ll} \displaystyle \max _{y_i(\xi )} &{}\quad p(y_i(\xi )+y^*_{-i}(\xi ),\xi )y_i(\xi ) -H_i(y_i(\xi ), \xi ) \\ \text{ s.t. } &{}\quad 0\leqslant y_i(\xi ) \leqslant c_i, \end{array} \end{aligned}$$
    (2.1)

    for almost every (a.e.) \(\xi \in \varXi \), where \(y_{-i}\) denotes the decision variable of the firm other than i. Moreover, we can write down the Karush–Kuhn–Tucker (KKT) conditions for (2.1) as follows: for a.e. \(\xi \in \varXi \)

    $$\begin{aligned}&0\leqslant \begin{pmatrix} y_i(\xi )\\ \mu _i(\xi ) \end{pmatrix} \nonumber \\&\perp \begin{pmatrix} -p(y_i(\xi )+y_{-i}(\xi ),\xi )- y_i(\xi )p_q'(y_i(\xi )+y_{-i}(\xi ),\xi )+H_i'(y_i(\xi ), \xi ) + \mu _i(\xi )\\ c_i - y_i(\xi ) \end{pmatrix}\geqslant 0, \end{aligned}$$
    (2.2)

    and (2.2) is a “wait-and-see” model.

  • For situation 2, we first consider the expected residual minimization (ERM) formulation. To maximize each firm’s profit in the ERM formulation, they need to find \((y_1^*, y_2^*)\) such that it solves the following problem:

    $$\begin{aligned} \min _{0\leqslant y_i\leqslant c_i,\, \mu _i\geqslant 0,\, i=1,2} \;\; {{E}} [\phi (y, \mu , \xi )], \end{aligned}$$
    (2.3)

    where \(y=(y_1, y_2)\), \(\mu =(\mu _1,\mu _2)\), and \(\phi \) is a residual function of the one-stage SVI in the “wait-and-see” model (2.2), e.g.,

    $$\begin{aligned}&\phi (y, \mu , \xi )\\&\quad : = \left\| \min \left\{ \begin{pmatrix}y_1\\ \mu _1\\ y_2\\ \mu _2\end{pmatrix} , \begin{pmatrix} -p(y_1+y_{2},\xi )- y_1p_q'(y_1+y_{2},\xi )+H_1'(y_1, \xi ) + \mu _1\\ c_1 - y_1\\ -p(y_2+y_1,\xi )- y_2p_q'(y_2+y_1,\xi )+H_2'(y_2, \xi ) + \mu _2\\ c_2 - y_2 \end{pmatrix}\right\} \right\| _2^2 \end{aligned}$$

    and (2.3) is a one-stage SVI in the ERM formulation.

  • We then consider the expected-value (EV) formulation. To maximize each firm’s profit in the EV formulation, they need to find \((y_1^*, y_2^*)\) such that it solves the following problem:

    $$\begin{aligned} \begin{array}{ll} \displaystyle \max _{y_i} &{}\quad {{E}}[p(y_i+y^*_{-i},\xi )y_i -H_i(y_i, \xi )] \\ \text{ s.t. } &{}\quad 0\leqslant y_i \leqslant c_i, \end{array} \end{aligned}$$
    (2.4)

    Moreover, we can write down the KKT conditions for (2.4) as follows:

    $$\begin{aligned} 0\leqslant \begin{pmatrix} y_i\\ \mu _i \end{pmatrix} \perp \begin{pmatrix} {E}[-p(y_i+y_{-i},\xi )- y_ip_q'(y_i+y_{-i},\xi )+H_i'(y_i, \xi )] + \mu _i\\ c_i - y_i \end{pmatrix}\geqslant 0, \end{aligned}$$
    (2.5)

    and (2.5) is a one-stage SVI in the EV formulation.

Based on these three formulations of the one-stage stochastic game, we can give the definition of three kinds of one-stage SVI. Let \(f:\mathbb {R}^n\times \varXi \rightarrow \mathbb {R}^n\) be a continuous function.

  1.

    “wait-and-see” model: find \(x: \varXi \rightarrow \mathbb {R}^n\) such that

    $$\begin{aligned} 0\in f(x(\xi ), \xi ) + {\mathcal {N}}_X(x(\xi )), \;\; \text{ for } \text{ a.e. } \xi \in \varXi . \end{aligned}$$

    In this model, for every future scenario \(\xi \), we have the solution \(x(\xi )\), and \(x(\cdot )\) is a measurable function on \(\varXi \). Since the solution \(x(\xi )\) depends on the future scenario, we call it a “wait-and-see” solution and call the model the “wait-and-see” model. Although the model gives a perfect solution for every future scenario, it is unworkable in the many real-world applications where the decision must be made before the uncertain scenario is observed.

  2.

    The expected residual minimization (ERM) formulation [6,7,8,9,10]: Find \(x\in \mathbb {R}^n\) such that it is a solution of

    $$\begin{aligned} \min _{x\in X} \;\; {E}[\phi (x, \xi )], \end{aligned}$$

    where \(\phi (\cdot , \xi ): X\rightarrow \mathbb {R}\) is a residual function of the VI\((X, f(\cdot , \xi ))\) for a.e. \(\xi \in \varXi \), that is,

    $$\begin{aligned} \phi (\cdot , \xi ) \geqslant 0, \;\;\;\;\; \text{ and } \;\;\;\;\; \phi (x, \xi )=0 \;\; \Leftrightarrow \;\; 0\in f(x, \xi ) + {\mathcal {N}}_X(x), \; \text{ for } \text{ a.e. } \xi \in \varXi . \end{aligned}$$

    Unlike the “wait-and-see” model, the solutions of the ERM formulation do not depend on the future scenario, and the decision makers can make decisions before they observe the future uncertainty. We call such solutions “here-and-now” solutions. Moreover, the ERM formulation can quantify the quality of a “here-and-now” solution. The value of \(\phi (x, \xi )\) can be considered as the “loss” due to failure of the equilibrium at scenario \(\xi \) and hence measures the quality of the “here-and-now” solution at that scenario. The expected value of \(\phi (x, \xi )\) then measures the quality of the “here-and-now” solution in the sense of expectation. In [11], Chen et al. further investigated the ERM formulation in the case when X depends on the random vector \(\xi \).

  3.

    The expected-value (EV) formulation [12,13,14,15]: Find \(x\in \mathbb {R}^n\) such that

    $$\begin{aligned} 0\in {E}[f(x, \xi )] + {\mathcal {N}}_X(x). \end{aligned}$$
    (2.6)

    The solutions of the EV formulation are also “here-and-now” solutions. Similar to the deterministic VI, the EV formulation can be used to represent the first-order optimality conditions of one-stage stochastic optimization and to describe one-stage stochastic Nash equilibria. But since the “here-and-now” solution of the EV formulation is determined in the sense of expectation, it may not be a good solution for a particular future scenario. Moreover, if we set \(G(x):={E}[f(x, \xi )]\), then the EV formulation is the same as the deterministic VI: \(0\in G(x)+{\mathcal {N}}_X(x)\). Similar to the ERM formulation, we can also reformulate the EV formulation as a minimization problem

    $$\begin{aligned} \min _{x\in X} \;\; \theta (x), \end{aligned}$$

    where \(\theta : X\rightarrow \mathbb {R}_+\) is a residual function of the deterministic VI, that is,

    $$\begin{aligned} \theta (\cdot ) \geqslant 0, \;\;\;\;\; \text{ and } \;\;\;\;\; \theta (x)=0 \;\; \Leftrightarrow \;\; 0\in G(x) + {\mathcal {N}}_X(x). \end{aligned}$$

    One popular residual function is the regularized gap function [16] as follows:

    $$\begin{aligned} \theta (x) = \max _{v\in X} \;\;\langle x-v, G(x) \rangle - \frac{\alpha }{2}\Vert x-v\Vert ^2. \end{aligned}$$
    (2.7)
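To make (2.7) concrete, here is a minimal sketch (with an illustrative affine G, not from the paper) that exploits the fact that for \(X=\mathbb {R}^n_+\) the inner maximum in the regularized gap function is attained at \(v^*=\varPi _X(x-G(x)/\alpha )\), so \(\theta \) has a closed form:

```python
import numpy as np

# Sketch of the regularized gap function for VI(R^n_+, G), where G stands
# in for E[f(x, xi)].  The inner maximum over v is concave, so it is
# attained at v* = Proj_X(x - G(x)/alpha); theta >= 0 on X, and theta(x) = 0
# exactly at solutions.  A and b below are illustrative, not from the paper.
alpha = 1.0
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([-3.0, -2.0])

def G(x):                                  # stands in for E[f(x, xi)]
    return A @ x + b

def theta(x):
    """Regularized gap function for VI(R^n_+, G)."""
    v = np.maximum(0.0, x - G(x) / alpha)  # projection onto R^n_+
    d = x - v
    return d @ G(x) - 0.5 * alpha * d @ d

x_star = np.array([0.8, 0.6])   # solves G(x) = 0 with x_star >= 0
print(theta(x_star))            # ~0: theta vanishes at solutions
print(theta(np.array([2.0, 2.0])))  # > 0 away from the solution set
```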

Moreover, let \(g:\mathbb {R}^n\times \varXi \rightarrow \mathbb {R}\) be a Lipschitz continuous function w.r.t. x with \( f(x,\xi ) \in \partial _x g(x, \xi ) \), where \(\partial \) denotes the Clarke subdifferential. Consider the one-stage stochastic program

$$\begin{aligned} \min _{x\in X}\;\; {E}[g(x, \xi )]. \end{aligned}$$
(2.8)

Then, \({E}[f(x, \xi )] \in {E}[\partial _xg(x, \xi )]\) and (2.6) is a first-order necessary optimality condition of (2.8). Moreover, (2.8) has wide applications in many areas. For example, the risk management problem

$$\begin{aligned} \min _{x\in X} \;\; \mathrm{CVaR}_\alpha (l(x, \xi )) \end{aligned}$$

with some \(\alpha \in (0,1]\) and a loss function \(l:\mathbb {R}^n\times \varXi \rightarrow \mathbb {R}\) can be reformulated as (2.8), where CVaR denotes the “conditional value-at-risk” [17]. For another example, in machine learning and compressed sensing, the Lasso formulation [18]

$$\begin{aligned} \min _{x\in \mathbb {R}^n} {E}[(a(\xi )^\mathrm{T} x-b(\xi ))^2] + \lambda \Vert x\Vert _1 \end{aligned}$$

can also be formulated as (2.8), where \(a:\varXi \rightarrow \mathbb {R}^n\) and \(b:\varXi \rightarrow \mathbb {R}\) are given functions.
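When the expectation in the Lasso problem is replaced by a sample average over draws \((a(\xi _k), b(\xi _k))\), the resulting finite-sum problem can be solved by the proximal-gradient (ISTA) iteration. The sketch below uses synthetic data; all names and values are illustrative, not from [18]:

```python
import numpy as np

# Sample-average Lasso:  min_x (1/K) sum_k (a_k^T x - b_k)^2 + lam ||x||_1,
# solved by ISTA (gradient step on the smooth part, soft-thresholding for
# the l1 term).  The data below are synthetic and purely illustrative.
rng = np.random.default_rng(0)
K, n = 200, 5
A = rng.standard_normal((K, n))           # rows are samples a(xi_k)^T
x_true = np.array([1.0, -2.0, 0.0, 0.0, 0.5])
b = A @ x_true                            # noiseless for illustration
lam = 0.1

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

L = 2.0 * np.linalg.norm(A, 2) ** 2 / K   # Lipschitz constant of the gradient
x = np.zeros(n)
for _ in range(3000):
    grad = 2.0 / K * A.T @ (A @ x - b)
    x = soft_threshold(x - grad / L, lam / L)

print(np.round(x, 2))   # close to x_true, with small entries shrunk toward 0
```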

Since the solutions of both the EV formulation and the ERM formulation are “here-and-now” solutions, we call these two formulations “here-and-now” models.

When we consider two-stage decisions in an uncertain environment, the mathematical tools of the one-stage SVI are no longer sufficient. To motivate the two-stage SVI, we consider an extension of Example 1:

Example 2

(Motivation of two-stage SVI) As in Example 1, we consider a duopoly market where two firms compete to supply a homogeneous product (or service) noncooperatively in the future. The only difference from Example 1 is that neither firm has an existing capacity; thus, they must decide at the present time on their capacity for the future supply of quantities in order to have enough time to build the necessary facilities.

For \(i=1,2\), firm i’s cost function for building up capacity \(x_i\) is \(C_i(x_i)\), and the other settings are the same as in situation 1 of Example 1. Assuming that each firm aims to maximize its expected profit, we can develop a mathematical model for their decision making: for \(i=1,2\), find \((x^*_i, y^*_i(\cdot ))\) such that it solves the following two-stage stochastic programming problem:

$$\begin{aligned} \begin{array}{ll} \displaystyle \max _{x_i,y_i(\cdot )} &{}\quad {E}[p(y_i(\xi )+y^*_{-i}(\xi ),\xi )y_i(\xi ) -H_i(y_i(\xi ), \xi )] - C_i(x_i) \\ \text{ s.t. } &{}\quad 0\leqslant y_i(\xi ) \leqslant x_i, \; \text{ for } \text{ a.e. } \xi \in \varXi . \end{array} \end{aligned}$$
(2.9)

Note that when we know (or fix) the first-stage decision variables \(x_i\), the problem becomes: for \(i=1,2\), find \(y^*_i\) such that

$$\begin{aligned} \begin{array}{lcl} y_i^*(\xi )\in &{}\displaystyle \arg \max _{y_i} &{}\quad p(y_i+y^*_{-i},\xi )y_i -H_i(y_i, \xi ) \\ &{}\text{ s.t. } &{}\quad 0\leqslant y_i \leqslant x_i, \; \; \text{ for } \text{ a.e. } \xi \in \varXi , \end{array} \end{aligned}$$
(2.10)

and this is exactly the “wait-and-see” model in Example 1.

In this situation, we fix \(x_i\), \(i=1,2\), and first handle the second-stage game problem. Then, the second-stage SVI representation of (2.9) is (2.10), and we denote by \({\bar{y}}(x, \xi )\) the solution of (2.10). With the second-stage equilibrium \({\bar{y}}(x,\xi )\), we are ready to write down the first-stage decision-making problem for player i:

$$\begin{aligned} \begin{array}{ll} \displaystyle \max _{x_i} &{}\quad {E}[ v_i(x,\xi )] - C_i(x_i) \\ \text{ s.t. } &{}\quad x_i \geqslant 0, \end{array} \end{aligned}$$
(2.11)

where

$$\begin{aligned} v_i(x,\xi ) := p({\bar{y}}_i(x,\xi )+{\bar{y}}_{-i}(x,\xi ),\xi ){\bar{y}}_i(x,\xi ) -H_i({\bar{y}}_i(x,\xi ), \xi ). \end{aligned}$$

A 4-tuple \((x_1^*,x_2^*, y_1^*(\cdot ), y_2^*(\cdot ))\) with \((y_1^*(\cdot ), y_2^*(\cdot )) = ({\bar{y}}_1(x^*, \cdot ), {\bar{y}}_2(x^*, \cdot ))\) is called a two-stage stochastic equilibrium if \(x_i^*\) solves (2.11) for \(i=1,2\). Then, the first-stage equilibrium \((x^*_i, x^*_{-i})\) satisfies

$$\begin{aligned} x_i^* \in \arg \max _{x_i\geqslant 0} {E}[ v_i(x_i,x_{-i}^*,\xi )]- C_i(x_i), \quad i=1,2. \end{aligned}$$
(2.12)

Assuming that \(C_i(\cdot )\) is continuously differentiable, we may write down the first-order optimality condition of (2.12):

$$\begin{aligned} 0\in C_i'(x_i) -\partial _{x_i} {E}[ v_i(x,\xi )] + {\mathcal {N}}_{[0,\infty )}(x_i), \quad i=1,2. \end{aligned}$$
(2.13)

Suppose that the inverse demand function \(p(q,\xi )\) satisfies the following conditions:

  (i)

    \(p(q,\xi )\) is twice continuously differentiable in q and \(p_q'(q,\xi )<0\) for \(q\geqslant 0\) and \(\xi \in \varXi ;\)

  (ii)

    \(p'_q(q,\xi )+qp''_{qq}(q,\xi )\leqslant 0\), for \(q\geqslant 0\) and \(\xi \in \varXi \).

Then, it follows from Ralph and Xu [19, Lemma 5.2] that the KKT system of the second-stage problem (2.10) has a unique solution, that \(v_i(x,\xi )\) is continuously differentiable w.r.t. \(x_i\) for \(x_i>0\), and that

$$\begin{aligned} \nabla _{x_i}v_i(x,\xi ) = \frac{\mathrm{d} L(y_i(\xi ),\lambda _i(\xi ),\mu _i(\xi ),{ x}_i)}{\mathrm{d} x_i}= \mu _i(\xi ), \end{aligned}$$

where

$$\begin{aligned} L(y_i(\xi ),\lambda _i(\xi ),\mu _i(\xi ),{ x}_i):= & {} p(y_1(\xi )+y_2(\xi ),\xi )y_i(\xi ) \\&-H_i(y_i(\xi ), \xi ) +\lambda _i(\xi )y_i(\xi ) - \mu _i(\xi )(y_i(\xi )- x_i). \end{aligned}$$

Note that in the case when \(x_i=0\) and \(y_i(\xi )\equiv 0\), we have

$$\begin{aligned} \partial _{x_i}v_i(x,\xi ) =\{\mu _i(\xi ): \mu _i(\xi ) \geqslant (-g_i(0, 0,\xi ))_+, \forall \xi \in \varXi \;\; \text{ and } \;\; {E}[\mu _i(\xi )] \geqslant C_i'(x_i)\}. \end{aligned}$$

Summarizing the discussions above, we can rewrite (2.13) as

$$\begin{aligned} 0\leqslant x_i\perp C_i'(x_i)- {E}[\mu _i(\xi )]\geqslant 0, \quad \,\, i=1,2, \end{aligned}$$
(2.14)

and derive the following two-stage stochastic linear complementarity problem:

$$\begin{aligned} \left\{ \begin{array}{ll} 0\leqslant x_i\perp C_i'(x_i)- {E}[\mu _i(\xi )] \geqslant 0, &{} \quad i=1,2,\\ 0\leqslant \begin{pmatrix} y_i(\xi )\\ \mu _i(\xi ) \end{pmatrix} \perp \begin{pmatrix} g_i(y_i(\xi ),y_{-i}(\xi ),\xi )+ \mu _i(\xi )\\ x_i - y_i(\xi ) \end{pmatrix}\geqslant 0, \;\;&\quad \text{ for } \;\;\text{ a.e. }\;\; \xi \in \varXi , \;\; i=1,2, \end{array} \right. \nonumber \\ \end{aligned}$$
(2.15)

where

$$\begin{aligned} g_i(y_i(\xi ),y_{-i}(\xi ),\xi )= & {} -p(y_i(\xi )+y_{-i}(\xi ),\xi )- y_i(\xi )p_q'(y_i(\xi )\\&+y_{-i}(\xi ),\xi )+H_i'(y_i(\xi ), \xi ) \end{aligned}$$

and (2.15) can be considered as an example of a two-stage SVI.

In what follows, we give the definition of two-stage SVI.

Definition 2

(Two-stage SVI) Let \(\xi \) be a random vector defined as above and let \(\mathcal{Y}\) be a space of measurable functions defined on \(\varXi \). The two-stage SVI is to find an \((x^*,y^*(\cdot ))\in D\times \mathcal{Y}\) such that

$$\begin{aligned}&0\in {E}[\varPhi (x, y(\xi ), \xi )]+ {\mathcal {N}}_D(x), \end{aligned}$$
(2.16)
$$\begin{aligned}&0\in \varPsi (x, y(\xi ), \xi )+{\mathcal {N}}_{C(\xi )}(y(\xi )), \;\; \text{ for } \text{ a.e. } \; \xi \in \varXi , \end{aligned}$$
(2.17)

where \(D\subset \mathbb {R}^n\) and \(C(\xi )\subset \mathbb {R}^m\), a.e. \(\xi \in \varXi \), are nonempty closed convex sets, and \(\varPhi :\mathbb {R}^n\times \mathbb {R}^m\times \mathbb {R}^d\rightarrow \mathbb {R}^n\) and \(\varPsi :\mathbb {R}^n\times \mathbb {R}^m\times \mathbb {R}^d\rightarrow \mathbb {R}^m\) are given mappings. We assume throughout the paper that \(y(\cdot ) \in \mathcal{Y}\) with \(\mathcal{Y}\) being the space of measurable functions from \(\varXi \) to \(\mathbb {R}^m\) such that the expected value in (2.16) is well defined.

Moreover, if the sets D and \(C(\xi )\), \(\xi \in \varXi \), are closed convex cones, then

$$\begin{aligned} {\mathcal {N}}_D(x)=\{x^*\in D^*: x^\top x^* =0\},\;x\in D, \end{aligned}$$

where \(D^*=\{x^*: x^\top x^* \leqslant 0,\;\forall x\in D\}\) is the (negative) dual of cone D. In that case, SVI (2.16)–(2.17) reduces to the following two-stage stochastic cone VI:

$$\begin{aligned}&D\ni x\perp {E}[\varPhi (x, y(\xi ), \xi )]\in - D^*, \\&C(\xi )\ni y(\xi )\perp \varPsi (x, y(\xi ), \xi )\in - C^*(\xi ), \;\; \text{ for } \text{ a.e. } \xi \in \varXi . \end{aligned}$$

In particular when \(D:=\mathbb {R}^n_+\) with \(D^*=-\mathbb {R}^n_+\), and \(C(\xi ):= \mathbb {R}^m_+\) with \(C^*(\xi )=-\mathbb {R}^m_+\) for all \(\xi \in \varXi \), SVI (2.16)–(2.17) reduces to the two-stage stochastic nonlinear complementarity problem (SNCP):

$$\begin{aligned}&0\leqslant x\perp {E}[\varPhi (x, y(\xi ), \xi )] \geqslant 0, \\&0\leqslant y(\xi )\perp \varPsi (x, y(\xi ), \xi ) \geqslant 0, \,\, \text{ for } \text{ a.e. } \xi \in \varXi , \end{aligned}$$

which is a generalization of the two-stage stochastic linear complementarity problem (SLCP):

$$\begin{aligned}&0\leqslant x\perp Ax + {E}[B(\xi )y(\xi )] +q_1 \geqslant 0,\end{aligned}$$
(2.18)
$$\begin{aligned}&0\leqslant y(\xi )\perp N(\xi )x +M(\xi )y(\xi ) +q_2(\xi ) \geqslant 0, \,\, \text{ for } \text{ a.e. } \xi \in \varXi , \end{aligned}$$
(2.19)

where \(A\in \mathbb {R}^{n\times n}\), \(B: \varXi \rightarrow \mathbb {R}^{n\times m},\) \(N: \varXi \rightarrow \mathbb {R}^{m\times n},\) \(M: \varXi \rightarrow \mathbb {R}^{m\times m},\) \(q_1\in \mathbb {R}^n\) and \(q_2: \varXi \rightarrow \mathbb {R}^{m}.\) Furthermore, when \(D:=\mathbb {R}^n\) and \(C(\xi ):=\mathbb {R}^m\) for all \(\xi \in \varXi \), SVI (2.16)–(2.17) reduces to the two-stage stochastic equations:

$$\begin{aligned}&{E}[\varPhi (x, y(\xi ), \xi )] = 0, \\&\varPsi (x, y(\xi ), \xi )=0, \,\, \text{ for } \text{ a.e. } \xi \in \varXi . \end{aligned}$$
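To make (2.18)–(2.19) concrete: if \(\xi \) takes finitely many values \(\xi _1, \cdots , \xi _K\) with probabilities \(p_k\), the two-stage SLCP collapses to one large deterministic LCP in \(z=(x, y(\xi _1), \cdots , y(\xi _K))\). The sketch below (all matrices are small illustrative choices, not from the paper) solves it with a basic projection iteration, which converges when the assembled matrix is strongly monotone and the step size is small:

```python
import numpy as np

# With scenarios xi_1,...,xi_K of probability p_k, (2.18)-(2.19) becomes
# the single LCP  0 <= z ⊥ W z + q >= 0  in z = (x, y(xi_1),...,y(xi_K)).
# We apply the projection iteration z <- max(0, z - gamma (W z + q)).
# All data below are illustrative (n = m = 1, K = 2).
p = np.array([0.5, 0.5])                        # scenario probabilities
A = np.array([[2.0]]); q1 = np.array([-1.0])
B = [np.array([[0.5]]), np.array([[0.5]])]
N = [np.array([[0.5]]), np.array([[0.5]])]
M = [np.array([[2.0]]), np.array([[2.0]])]
q2 = [np.array([-1.0]), np.array([-2.0])]

# Assemble the large LCP matrix for z = (x, y(xi_1), y(xi_2)).
W = np.block([[A,    p[0] * B[0],      p[1] * B[1]],
              [N[0], M[0],             np.zeros((1, 1))],
              [N[1], np.zeros((1, 1)), M[1]]])
q = np.concatenate([q1, q2[0], q2[1]])

z = np.zeros(3)
gamma = 0.2
for _ in range(5000):
    z = np.maximum(0.0, z - gamma * (W @ z + q))

print(np.round(z, 4))                            # (x, y(xi_1), y(xi_2))
print(np.linalg.norm(np.minimum(z, W @ z + q)))  # complementarity residual ~ 0
```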

The two-stage SVI is essentially an infinite-dimensional VI and is not easy to handle. In [20], Chen et al. proposed the two-stage SVI model to deal with random variables in VIs and formulated this model as a two-stage stochastic program with recourse by using an ERM solution procedure. We will introduce the procedure in the next section.

In [20], Chen et al. considered a two-stage SVI as follows:

$$\begin{aligned}&0\in G(x)+ {\mathcal {N}}_D (x), \end{aligned}$$
(2.20)
$$\begin{aligned}&0\in F({\bar{y}}(x, \xi ), \xi )+{\mathcal {N}}_{C(\xi )}({\bar{y}}(x, \xi )), \;\; \text{ for } \text{ a.e. } \; \xi \in \varXi , \end{aligned}$$
(2.21)

where \(G: \mathbb {R}^n\rightarrow \mathbb {R}^n\), \(F: \mathbb {R}^m\times \varXi \rightarrow \mathbb {R}^m\), \({\bar{y}}: D\times \varXi \rightarrow \mathbb {R}^m\), and D and \(C(\xi )\), \(\forall \xi \in \varXi \), are nonempty closed convex subsets of \(\mathbb {R}^n\) and \(\mathbb {R}^m\), respectively. Formulation (2.20)–(2.21) is a special case of the two-stage SVI.

Moreover, Chen et al. [20] also considered a special case of the two-stage SVI (2.20)–(2.21) as follows:

$$\begin{aligned}&0\in G(x)+ {\mathcal {N}}_D(x) ,\end{aligned}$$
(2.22)
$$\begin{aligned}&0\in F(u_\xi , \xi )+ {\mathcal {N}}_{C(\xi )}(u_\xi ), \; \text{ for } \text{ a.e. } \xi \in \varXi , \end{aligned}$$
(2.23)

where the first-stage problem (2.22) is the EV form of the one-stage SVI and the second-stage problem (2.23) is the “wait-and-see” model of the one-stage SVI. In general, the variational inequalities (2.22) and (2.23) can have multiple solutions. Naturally, a “here-and-now” solution x should have minimum total distance to the solution sets of (2.23) over almost all observations \(\xi \in \varXi \). The problem can then be written as the following mathematical program with equilibrium constraints (MPEC) [21]:

$$\begin{aligned} \begin{array}{ll} \min &{} {E}[\Vert u_\xi - x \Vert ^2]\\ \hbox {s.t.} &{} 0\in G(x)+ {\mathcal {N}}_D(x), \;\; 0\in F(u_\xi , \xi )+ {\mathcal {N}}_{C(\xi )}(u_\xi ), \; \text{ for } \text{ a.e. } \xi \in \varXi . \end{array} \end{aligned}$$
(2.24)
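A hedged toy instance of (2.24), with data chosen purely for illustration: take \(D=[0,3]\) and \(G\equiv 0\), so every \(x\in D\) solves (2.22), and take \(F(u,\xi )=u-\xi \) with \(C(\xi )=\mathbb {R}\), so (2.23) has the unique solution \(u_\xi =\xi \). The “here-and-now” solution is then the scenario mean clipped to D:

```python
import numpy as np

# Toy instance of the MPEC (2.24), illustrative only.  First stage:
# D = [0, 3], G = 0, so the first-stage solution set is all of D.
# Second stage: F(u, xi) = u - xi on C(xi) = R, so u_xi = xi uniquely.
# Minimizing E[(u_xi - x)^2] over x in D gives the clipped scenario mean.
rng = np.random.default_rng(1)
xi = rng.uniform(0.0, 4.0, size=1000)     # equally likely scenarios
u = xi                                     # second-stage solutions u_xi

x_star = np.clip(u.mean(), 0.0, 3.0)       # "here-and-now" solution
obj = np.mean((u - x_star) ** 2)           # expected squared distance
print(x_star, obj)
```

The example illustrates why (2.24) is interesting precisely when the first-stage solution set is multivalued: the distance objective selects one "here-and-now" point among the many first-stage solutions.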

2.2 Multistage SVI

In [22], Rockafellar and Wets first proposed the framework of the multistage SVI. The multistage SVI is an extension of the two-stage SVI and can deal with multistage problems of optimization and equilibrium in a stochastic setting that involves actions responding to increasing levels of information. Moreover, Rockafellar and Sun [23, 24] extended the progressive hedging method to solve multistage SVIs.

We begin with the two-stage SVI setting in (2.16)–(2.17) and explain the equivalence between the formulation in [22] and that in the previous section. Let \(\mathcal{L}_{n+m}=\mathcal{L}_{n}\times \mathcal{L}_m\), where \(\mathcal{L}_{n}\) and \(\mathcal{L}_m\) are the spaces of all functions \(x(\cdot ):\varXi \rightarrow \mathbb {R}^{n}\) and \(y(\cdot ):\varXi \rightarrow \mathbb {R}^{m}\), respectively. Let

$$\begin{aligned} \mathcal{C}:=\{(x(\cdot ), y(\cdot ))\in \mathcal{L}_{n+m} |~ x(\xi )\in D, y(\xi )\in C(\xi ) \text{ for } \text{ a.e. } \xi \in \varXi \} \end{aligned}$$

and

$$\begin{aligned} {\mathscr {N}}:=\{ (x(\cdot ), y(\cdot ))\in \mathcal{L}_{n+m} |~ x(\xi ) \text{ is } \text{ the } \text{ same } \text{ for } \text{ all } \xi \in \varXi \} \end{aligned}$$

be subsets of \(\mathcal{L}_{n+m}\); here \({\mathscr {N}}\) is the nonanticipativity subspace. Consider a mixed case of nonanticipativity: the solution pair (x, y), with first-stage (here-and-now) variable x and second-stage (wait-and-see) variable y, lies in the function space \(\mathcal{L}_{n+m}\) with \(x(\cdot ): \varXi \rightarrow \mathbb {R}^n\) and \(y(\cdot ):\varXi \rightarrow \mathbb {R}^m\), and \((x,y)\in \mathcal{C}\cap {\mathscr {N}}\). Then, the basic form of the two-stage SVI introduced in [22] is

$$\begin{aligned} 0\in \mathcal{F}(x(\cdot ), y(\cdot )) + N_{\mathcal{C}\cap {\mathscr {N}}}(x(\cdot ), y(\cdot )), \end{aligned}$$
(2.25)

where \(\mathcal{F}: \mathcal{L}_{n+m}\rightarrow \mathcal{L}_{n+m}\) is a continuous mapping such that for \((x(\cdot ), y(\cdot ))\in \mathcal{L}_{n+m}\), \(\mathcal{F}(x(\cdot ), y(\cdot ))\) is the function in \(\mathcal{L}_{n+m}\) that takes \(\xi \in \varXi \) to \((\varPhi (x(\xi ), y(\xi ), \xi ), \varPsi (x(\xi ), y(\xi ), \xi ))\in \mathbb {R}^{n+m}\).

Moreover, let

$$\begin{aligned}&{\mathscr {M}} = \{w(\cdot )=(w_1(\cdot ), w_2(\cdot ))\in \mathcal{L}_{n+m} |~ w_1(\cdot )\in \mathcal{L}_{n}, \\&\quad w_2(\cdot )\in \mathcal{L}_{m}, {E}[w_1(\xi )]=0, w_2(\xi )=0, \text{ a.e. } \xi \in \varXi \}. \end{aligned}$$

It is easy to verify that \({\mathscr {M}}\) is the orthogonal complement of \({\mathscr {N}}\) and can be considered as the space of nonanticipativity multipliers. Then, the extensive form of the two-stage SVI in [22] is:

\((x(\cdot ), y(\cdot ))\in {\mathscr {N}}\) and there exists \(w(\cdot )\in {\mathscr {M}}\) such that

$$\begin{aligned} -\begin{pmatrix} \varPhi (x(\xi ), y(\xi ), \xi )\\ \varPsi (x(\xi ), y(\xi ), \xi )) \end{pmatrix} - \begin{pmatrix} w_1(\xi )\\ w_2(\xi ) \end{pmatrix} \in \begin{pmatrix} N_{D}(x(\xi ))\\ N_{C(\xi )}(y(\xi )) \end{pmatrix}, \;\; a.e. \; \xi \in \varXi . \end{aligned}$$
(2.26)

Under the constraint qualification (CQ):

$$\begin{aligned} \text{ there } \text{ exists } \text{ some } {\hat{x}}(\cdot )\in {\mathscr {N}} \text{ such } \text{ that } {\hat{x}}(\xi )\in \mathrm{ri} C(\xi ) \text{ a.e. } \xi \in \varXi , \end{aligned}$$
(2.27)

the equivalence between (2.25) and (2.26) has been proved in [22, Theorem 3.2] for the multistage SVI. We now consider the equivalence between (2.26) and (2.16)–(2.17).

Under (2.27), suppose that \((x^*(\xi ), y^*(\xi ), w_1^*(\xi ))\) (with \(w_2(\xi )\equiv 0\)) is a solution of (2.26) with \(x^*(\xi )=x^*\) for all \(\xi \in \varXi \) and \({E}[w_1^*(\xi )]=0\). Then, taking expectations in the first line of (2.26), we obtain (2.16)–(2.17). Conversely, suppose that \((x^*, y^*(\xi ))\) is a solution of (2.16)–(2.17). By (2.16), \(-{E}[\varPhi (x^*, y^*(\xi ), \xi )]\in N_{D}(x^*)\), so there exists \(w_1^*(\xi )\) with \({E}[w_1^*(\xi )]=0\) such that \(-\varPhi (x^*, y^*(\xi ), \xi )-w_1^*(\xi )\in N_{D}(x^*)\); for instance, \(w_1^*(\xi ) = {E}[\varPhi (x^*, y^*(\xi ), \xi )]-\varPhi (x^*, y^*(\xi ), \xi )\). This implies that \((x^*(\xi ), y^*(\xi ), w_1^*(\xi ))\) with \(x^*(\xi )=x^*\) is a solution of (2.26). Hence, (2.16)–(2.17) and (2.26) are equivalent.
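The explicit multiplier choice \(w_1(\xi ):={E}[\varPhi ]-\varPhi (x^*, y^*(\xi ), \xi )\) in the argument above can be checked numerically: it satisfies \({E}[w_1]=0\), and \(-\varPhi -w_1\) becomes the same vector \(-{E}[\varPhi ]\) in every scenario. A minimal sketch, with arbitrary scenario values standing in for \(\varPhi \):

```python
import numpy as np

# Numerical check of the nonanticipativity-multiplier construction:
# with w1(xi) := E[Phi] - Phi(xi), we get E[w1] = 0 and
# -Phi(xi) - w1(xi) = -E[Phi], independent of the scenario xi.
# The Phi values are arbitrary illustrative scenario data.
rng = np.random.default_rng(2)
Phi = rng.standard_normal((5, 3))    # Phi(x*, y*(xi_k), xi_k), 5 scenarios
Phi_bar = Phi.mean(axis=0)           # E[Phi] under equal probabilities

w1 = Phi_bar - Phi                   # candidate multipliers
print(np.linalg.norm(w1.mean(axis=0)))   # ~0: E[w1] = 0
print(np.allclose(-Phi - w1, -Phi_bar))  # True: scenario-independent
```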

We now consider the multistage SVI. We adopt an N-stage pattern

$$\begin{aligned} x_1, \xi _1, x_2, \xi _2, \cdots , x_N, \xi _N, \end{aligned}$$

where \(x_k\in \mathbb {R}^{n_k}\), \(n=n_1+\cdots +n_N\) and \(\xi _k\in \varXi _{k}\); here \(x_k\) is the decision taken at the kth stage before we observe the information \(\xi _i\), \(i\geqslant k\), and \(\xi _k\) stands for the information revealed after that decision but before the next. Let

$$\begin{aligned} \xi =(\xi _1, \cdots , \xi _N)\in \varXi =\varXi _1\times \cdots \times \varXi _N \end{aligned}$$

and

$$\begin{aligned} {\tilde{x}}(\xi )=(x_1, x_2(\xi _1), x_3(\xi _1, \xi _2), \cdots , x_N(\xi _1, \cdots , \xi _{N-1}))\in \mathbb {R}^n=\mathbb {R}^{n_1}\times \cdots \times \mathbb {R}^{n_N}. \end{aligned}$$

Similar to the two-stage SVI case, we use \(x(\cdot ): \xi \rightarrow (x_1(\xi ), \cdots , x_N(\xi ))\) in \(\mathcal{L}_n\) to replace \({\tilde{x}}(\xi )\) and restrict it to the nonanticipativity subspace of \(\mathcal{L}_n\):

$$\begin{aligned} {\mathscr {N}} = \{x(\cdot ) = (x_1(\cdot ), \cdots , x_N(\cdot )) | ~ x_k(\xi ) \text{ does } \text{ not } \text{ depend } \text{ on } \xi _k, \cdots , \xi _N\}. \end{aligned}$$
(2.28)

The corresponding nonanticipativity multipliers will again come from a subspace \({\mathscr {M}}\) of \(\mathcal{L}_n\), as follows:

$$\begin{aligned} {\mathscr {M}} = \{w(\cdot ) = (w_1(\cdot ), \cdots , w_N(\cdot )) | ~ {E}_{\xi _k, \cdots , \xi _N}[w_k(\xi _1, \cdots , \xi _{k-1}, \xi _{k}, \cdots , \xi _N)]=0\}, \end{aligned}$$
(2.29)

where the expectation is the conditional expectation given the initial components \(\xi _1, \cdots , \xi _{k-1}\). Note also that \({\mathscr {N}}\) and \({\mathscr {M}}\) are orthogonal complements of each other, that is, \({\mathscr {N}}^{\perp } = {\mathscr {M}}\). Moreover, let \(C(\xi )\) be a nonempty closed convex subset of \(\mathbb {R}^n\), a.e. \(\xi \in \varXi \), and let

$$\begin{aligned} \mathcal{C} = \{x(\cdot )\in \mathcal{L}_n | ~ x(\xi )\in C(\xi ), \text{ a.e. } \xi \in \varXi \} \end{aligned}$$

be a nonempty closed convex subset of \(\mathcal{L}_n\), let \(F(x, \xi ) = (F_1(x, \xi ), \cdots , F_N(x, \xi ))\) be a continuous vector-valued function w.r.t. \(x\in \mathbb {R}^n\) with \(F_k(x, \xi )\in \mathbb {R}^{n_k}\), and let \(\mathcal{F}: \mathcal{L}_n \rightarrow \mathcal{L}_n\) be such that

$$\begin{aligned} \mathcal{F}(x(\cdot )) : \xi \rightarrow F(x(\xi ), \xi ) = (F_1(x(\xi ), \xi ), \cdots , F_N(x(\xi ), \xi )). \end{aligned}$$
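The nonanticipativity structure in (2.28)–(2.29) can be illustrated on a finite scenario tree: the orthogonal projection onto \({\mathscr {N}}\) replaces \(x_k(\xi )\) by its conditional expectation given \(\xi _1, \cdots , \xi _{k-1}\), and the residual lies in \({\mathscr {M}}\). A minimal sketch with \(N=2\) and four equally likely scenarios (all data illustrative):

```python
import numpy as np

# Projection onto the nonanticipativity subspace (2.28) on a finite tree:
# x_1 may depend on nothing, x_2 may depend on xi_1 only.  The projection
# is conditional averaging, and the residual w = x - proj lies in the
# multiplier space (2.29).  Scenarios: the four equally likely pairs
# (xi_1, xi_2) with xi_i in {0, 1}; x values are arbitrary.
rng = np.random.default_rng(3)
x = rng.standard_normal((2, 2, 2))        # x[k, xi1, xi2] for stages k = 1, 2

proj = np.empty_like(x)
proj[0] = x[0].mean()                     # unconditional mean for stage 1
proj[1] = x[1].mean(axis=1, keepdims=True)  # mean over xi_2, given xi_1

w = x - proj                              # candidate multiplier in M
print(abs(w[0].mean()))                   # ~0: E[w_1] = 0
print(np.abs(w[1].mean(axis=1)))          # ~0: E_{xi_2}[w_2 | xi_1] = 0
```

The residual is also orthogonal to the projected point, reflecting \({\mathscr {N}}^{\perp } = {\mathscr {M}}\).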

Then, the basic and extensive forms of multistage SVI are as follows:

Definition 3

The basic form of multistage SVI is to find \(x(\cdot )\in \mathcal{L}_n\) such that

$$\begin{aligned} 0 \in \mathcal{F}(x(\cdot )) + {\mathcal {N}}_{\mathcal{C}\cap {\mathscr {N}}}(x(\cdot )), \end{aligned}$$
(2.30)

and the extensive form of the multistage SVI is:

\(x(\cdot )\in {\mathscr {N}}\) and there exists \(w(\cdot )\in {\mathscr {M}}\) such that

$$\begin{aligned} 0\in F(x(\xi ),\xi )+w(\xi )+ {\mathcal {N}}_{C(\xi )}(x(\xi )), \text{ a.e. } \xi \in \varXi . \end{aligned}$$
(2.31)

The equivalence between (2.30) and (2.31) under the CQ (2.27) is proved in [22, Theorem 3.2]. Moreover, if

$$\begin{aligned} C(\xi )=C_1\times C_2(\xi _1) \times \cdots \times C_N(\xi _1,\cdots ,\xi _{N-1}), \end{aligned}$$

(2.31) can be written as

$$\begin{aligned}&0\in {E}_{\xi _1, \cdots ,\xi _N}[F_1(x_1, x_2(\xi _1),\cdots ,x_N(\xi _1, \cdots , \xi _{N-1}), \xi )]+ {\mathcal {N}}_{C_1}(x_1) , \end{aligned}$$
(2.32)
$$\begin{aligned}&0\in {E}_{\xi _2,\cdots ,\xi _N}[F_2(x_1, x_2(\xi _1),\cdots ,x_N(\xi _1, \cdots , \xi _{N-1}), \xi )]\nonumber \\&+{\mathcal {N}}_{C_2(\xi _1)}(x_2(\xi _1)), \text{ for } \text{ a.e. }\; \xi _1\in \varXi _1, \end{aligned}$$
(2.33)
$$\begin{aligned}&0\in {E}_{\xi _3,\cdots ,\xi _N}[F_3(x_1, x_2(\xi _1),\cdots ,x_N(\xi _1, \cdots , \xi _{N-1}), \xi )]\nonumber \\&+{\mathcal {N}}_{C_3(\xi _1,\xi _2)}(x_3(\xi _1, \xi _2)), \end{aligned}$$
(2.34)
$$\begin{aligned}&\text{ for } \text{ a.e. }\; (\xi _1, \xi _2)\in \varXi _1\times \varXi _2,\nonumber \\&\ldots \nonumber \\&0\in {E}_{\xi _N}[F_N(x_1, x_2(\xi _1),\cdots ,x_N(\xi _1, \cdots , \xi _{N-1}), \xi )]\nonumber \\&\quad ~~+N_{C_N(\xi _1, \cdots ,\xi _{N-1})}(x_N(\xi _1, \cdots , \xi _{N-1})), \nonumber \\&\text{ for } \text{ a.e. }\; (\xi _1, \cdots ,\xi _{N-1})\in \varXi _1\times \cdots \times \varXi _{N-1}. \end{aligned}$$
(2.35)

The equivalence between (2.31) and (2.32)–(2.35) is proved in [22, Theorem 3.4].

Moreover, if, in (2.30), the sets \(C(\xi )\) are specified by

$$\begin{aligned} x\in C(\xi ) \Longleftrightarrow x\in B(\xi ) \text{ and } f_i(x, \xi ) \left\{ \begin{array}{ll} \leqslant 0, &{}\quad \text{ for } i=1, \cdots , r,\\ =0, &{}\quad \text{ for } i=r+1, \cdots , m, \end{array} \right. \end{aligned}$$

where \(B(\xi )\) is a nonempty closed convex set and \(f_i(x,\xi )\) is differentiable and convex in x for \(i=1, \cdots ,r\) and affine in x for \(i=r+1, \cdots , m\), then under the CQ

$$\begin{aligned}&\text{ there } \text{ exists } {\hat{x}}(\cdot )\in {\mathscr {N}} \text{ such } \text{ that } {\hat{x}}(\xi )\in \mathrm{ri} B(\xi ) \text{ and } \nonumber \\ {}&\quad f_i({\hat{x}} (\xi ), \xi )\left\{ \begin{array}{ll} <0, &{}\quad i\leqslant r,\\ =0, &{}\quad i>r, \end{array} \right. \text{ a.e. } \xi \in \varXi . \end{aligned}$$
(2.36)

the normal cone \({N}_\mathcal{C}(x(\cdot ))\) admits the following representation of its elements \(v(\cdot )\) ([22, Theorem 3.7]):

$$\begin{aligned}&\exists y(\cdot ) \in \mathcal{L}_m, z(\cdot )\in \mathcal{L}_n, \text{ such } \text{ that } \nonumber \\&\begin{array}{lll} v(\xi ) = \sum _{i=1}^m y_i(\xi )\nabla _x f_i(x(\xi ), \xi ) + z(\xi ) \text{ with } \\ f(x(\xi ), \xi ) \in N_{Y}(y(\xi )), z(\xi )\in N_{B(\xi )}(x(\xi )), \end{array} \end{aligned}$$
(2.37)

where \(Y=[0, +\infty )^r\times (-\infty , +\infty )^{m-r}\) and the Lagrangian basic form of the associated SVI is

$$\begin{aligned}&x(\cdot )\in {\mathscr {N}}, y(\cdot )\in \mathcal{L}_m, \text{ and } \exists w(\cdot ) \in {\mathscr {M}} \text{ such } \text{ that } \nonumber \\&0\in \left( F(x(\xi ), \xi ) + \sum _{i=1}^m y_i(\xi ) \nabla _xf_i(x(\xi ), \xi ), \; -f(x(\xi ), \xi )\right) \nonumber \\&\quad +\,(w(\xi ),0) + {\mathcal {N}}_{B(\xi )\times Y}(x(\xi ), y(\xi )). \end{aligned}$$
(2.38)

3 Algorithms and Approximation Methods

In this section, we consider the algorithms and approximation methods for two-stage and multistage SVI. We first introduce the ERM formulation of (2.20)–(2.21) as follows.

3.1 ERM Solution Procedure

As we introduced in Sect. 2.1, the ERM formulation is one of three important formulations of one-stage SVI. Chen et al. [11] first introduced second-stage recourse variables into SVI; that is, find x and \(u(x, \xi )\) such that

$$\begin{aligned} 0\in f(u(x, \xi ), \xi ) + {\mathcal {N}}_{C(\xi )}(u(x,\xi )). \end{aligned}$$

In [20], Chen et al. extended the ERM formulation from one-stage SVI to the two-stage SVI (2.20)–(2.21). For this purpose, they need the following extension of the residual function.

Definition 4

(SVI-residual function) Given a closed convex set \(D\subseteq \mathbb {R}^n\) and the random vector \(\xi \), let us consider the following collection of VIs (SVI):

$$\begin{aligned} \begin{array}{c} \mathrm{find } \; {\bar{x}}\in \mathbb {R}^n \text{ and } {\bar{u}}: D\times \varXi \rightarrow \mathbb {R}^m, \hbox { P-measurable in } \xi, \text{ such that } \\ 0\in F({\bar{u}}(x, \xi ), \xi )+{\mathcal {N}}_{C(\xi )}({\bar{u}}(x, \xi )). \end{array} \end{aligned}$$

A function \(r: \mathbb {R}^m\times \varXi \rightarrow \mathbb {R}\) is a residual function for these inclusions if the following conditions are satisfied:

  1. 1.

    \(r(u,\xi )\geqslant 0\) for all \(u\in C(\xi )\), \( \text{ a.e. } \xi \in \varXi \);

  2. 2.

For any \({\bar{u}}: D\times \varXi \rightarrow \mathbb {R}^m\), it holds that

    $$\begin{aligned}&0\in F( {\bar{u}}(x, \xi ), \xi )+{\mathcal {N}}_{C(\xi )}({\bar{u}}(x, \xi )) \\&\quad \Leftrightarrow r({\bar{u}}(x,\xi ),\xi ) = 0 \text{ and } {\bar{u}}(x,\xi )\in C(\xi ), \text{ for } \text{ a.e. } \xi \in \varXi . \end{aligned}$$

One popular SVI-residual function, which is used in [20], is the regularized gap function [16]

$$\begin{aligned} r(u,\xi ) : = \max _{z\in C(\xi )} \left\{ \langle u-z, F(u, \xi ) \rangle - \frac{\alpha }{2}\Vert u-z\Vert ^2 \right\} . \end{aligned}$$
(3.1)
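As a small numerical illustration of (3.1), the following sketch evaluates the regularized gap function for an assumed toy setting: \(C(\xi )=\mathbb {R}^2_+\) and an affine mapping (both made up here, not data from [20]). For this choice, the inner maximum has the closed-form maximizer \(z^* = \mathrm{prj}_{C(\xi )}(u - F(u,\xi )/\alpha )\).

```python
import numpy as np

# Regularized gap function (3.1) for the assumed set C(xi) = R^2_+, where the
# inner maximizer has the closed form z* = proj_C(u - F(u, xi)/alpha).
def regularized_gap(u, F_u, alpha, proj):
    z = proj(u - F_u / alpha)                 # unique maximizer of (3.1)
    return np.dot(u - z, F_u) - 0.5 * alpha * np.dot(u - z, u - z)

proj_pos = lambda v: np.maximum(v, 0.0)       # projection onto R^2_+

# Toy NCP 0 <= u perp u + q >= 0 with q = (-1, 2); its solution is u* = (1, 0).
q = np.array([-1.0, 2.0])
F = lambda u: u + q
alpha = 1.0

u_star = np.array([1.0, 0.0])
r_at_solution = regularized_gap(u_star, F(u_star), alpha, proj_pos)  # zero
r_elsewhere = regularized_gap(np.zeros(2), F(np.zeros(2)), alpha, proj_pos)
```

The two computed values illustrate the defining properties of a residual function: the gap vanishes at the solution and is strictly positive at the non-solution point.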

The use of the residual functions \(\theta \) defined in (2.7) and r above for the two-stage SVI (2.20)–(2.21) leads to seeking a solution of the stochastic program

$$\begin{aligned}&\displaystyle {\min _{x\in X}} \,\theta (x) + \lambda {E}[r({\bar{u}}(x, \xi ), \xi ) + Q(x, \xi )],\nonumber \\&\mathrm{where } \,\, {\bar{u}}(x, \xi ) = x + Wu_\xi ^*, \;\; Q(x, \xi ) = \frac{1}{2}\langle u_\xi ^*, Hu_\xi ^* \rangle , \;\; \text{ for } \text{ a.e. } \xi \in \varXi ,\nonumber \\&\quad u_\xi ^* = \arg \min \{\frac{1}{2}\langle u, Hu \rangle | x+Wu\in C(\xi )\}. \end{aligned}$$
(3.2)

Assumption 1

Assume (i) W has full row rank and (ii) \(C(\xi )\subseteq K\), a compact convex set for all \(\xi \).

Theorem 1

Suppose Assumption 1 holds and r is a residual function defined in Definition 4. Then, for any \(x\in D\) and a.e. \(\xi \in \varXi \), the function \(r({\bar{u}}(x,\xi ), \xi )+Q(x, \xi )\) in (3.2) is finite and nonnegative, with

$$\begin{aligned} v({\bar{u}}(x, \xi ), \xi ) = \mathrm{prj}_{C(\xi )} \left({\bar{u}}(x, \xi ) - \frac{1}{\alpha }F({\bar{u}}(x, \xi ), \xi )\right) \end{aligned}$$

as the unique maximizer of the maximization problem in (3.1).

Theorem 1 means that problem (3.2) is a two-stage stochastic program with complete recourse. However, the objective function of problem (3.2) involves minimizers of constrained quadratic programs for \(\xi \in \varXi \) and is not necessarily differentiable even when the sample is finite.

Assumption 2

The functions \(F(\cdot , \xi )\) and \(G(\cdot )\) are continuously differentiable for all \(\xi \in \varXi \). Moreover, for any compact set \(Y\subset \mathbb {R}^m\), there are functions \(d, \rho : \varXi \rightarrow \mathbb {R}_+\) such that \(\Vert F(y, \xi )\Vert \leqslant d(\xi )\) and \(\Vert \nabla F(y, \xi )\Vert \leqslant \rho (\xi )\) for all \(y\in Y\), where \(d\in L_1^\infty \) and \(\rho \in L_1^1\).

Lemma 1

Suppose Assumptions 1–2 hold. Then, for a.e. \(\xi \), \(r(\cdot , \xi )\) is continuously differentiable and its gradient is given by

$$\begin{aligned} \nabla _y r(y, \xi ) = F(y, \xi ) - (\nabla _y F(y, \xi ) - \alpha I)(v(y, \xi ) - y). \end{aligned}$$

Moreover, for any measurable \(y(\xi )\in _{a.e.} C(\xi )\), both \(\xi \rightarrow r(y(\xi ), \xi )\) and \(\xi \rightarrow \nabla _y r(y(\xi ), \xi )\) are not only measurable but actually summable uniformly in \(y(\xi )\). In particular, this means that the objective function in (3.2) is well defined at any \(x\in D\), and the optimal value of (3.2) is finite.
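The gradient formula of Lemma 1 can be checked numerically with finite differences. The sketch below uses made-up affine data (an assumption, not data from [20]); for \(F(y,\xi )=My+q\), the term \(\nabla _y F\) in the formula is taken as the transposed Jacobian \(M^\mathrm{T}\).

```python
import numpy as np

# Finite-difference check of grad r(y) = F(y) - (M^T - alpha I)(v(y) - y)
# for an assumed affine mapping F(y) = M y + q and C(xi) = R^2_+.
M = np.array([[2.0, 0.5], [-0.3, 1.5]])
q = np.array([-1.0, 0.5])
alpha = 1.0
F = lambda y: M @ y + q
proj = lambda v: np.maximum(v, 0.0)

def r(y):
    z = proj(y - F(y) / alpha)                # the maximizer v(y, xi)
    return np.dot(y - z, F(y)) - 0.5 * alpha * np.dot(y - z, y - z)

def grad_r(y):
    v = proj(y - F(y) / alpha)
    return F(y) - (M.T - alpha * np.eye(2)) @ (v - y)

y0 = np.array([1.0, 1.0])                     # projection is smooth near y0
h = 1e-6
num = np.array([(r(y0 + h * e) - r(y0 - h * e)) / (2 * h) for e in np.eye(2)])
```

For this data the analytic gradient at \(y_0\) is \((2.2, 2.7)\), and the central-difference approximation agrees with it to high accuracy.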

Using the idea of the L-shaped algorithm [25], they arrive at the following problem, whose objective is smooth when the sample is finite:

$$\begin{aligned}&\displaystyle {\min _{x\in D}}\, \theta (x) + \lambda {E}\left[ r(y(\xi ), \xi ) + \frac{1}{2}\langle y(\xi ) -x, B(y(\xi )-x) \rangle \right] \nonumber \\&\mathrm{s.t.} y(\xi )\in _{a.e.} C(\xi ), \end{aligned}$$
(3.3)

which is equivalent to

$$\begin{aligned}&\displaystyle {\min _{x\in D}}&\theta (x) + \lambda {E}\left[ r(y(\xi ), \xi ) + \frac{1}{2}\langle y(\xi ) -x, B(y(\xi )-x) \rangle + \delta _{C(\xi )}(y(\xi ))\right] , \end{aligned}$$
(3.4)

where \(B=(WH^{-1}W^T)^{-1}\) and \(\delta _{C(\xi )}(\cdot )\) is the indicator function of the set \(C(\xi )\), which is zero inside the set and \(+\infty \) otherwise. It is not hard to see that the optimal value of (3.4) is no larger than that of (3.2), since fewer restrictions are imposed on \(y(\xi )\). Hence, it follows from Lemma 1 that the optimal value of (3.4) is also finite. Moreover, by the interchange of minimization and integration [20, Lemma 3.6] and [26, Theorem 14.60], problem (3.4) is equivalent to

$$\begin{aligned}&\displaystyle {\min _{x\in D}} \,\phi (x):= \theta (x) + \lambda {E}[\psi (x,\xi )]\nonumber \\&\mathrm{s.t.} \psi (x,\xi ):=\min _{y(\xi )\in C(\xi )} r(y(\xi ), \xi ) + \frac{1}{2}\langle y(\xi ) -x, B(y(\xi )-x) \rangle . \end{aligned}$$
(3.5)

Theorem 2

Suppose Assumptions 1–2 hold. Then, problems (3.2) and (3.4) are solvable. Let \(v_1\) and \(v_2\) be the optimal values of (3.2) and (3.4), respectively. Then, \(v_1\geqslant v_2\). Moreover, if for any \(x\in D\) and \(x+Wy, x+Wz\in C(\xi )\), we have

$$\begin{aligned} |r(x+Wy, \xi ) - r(x+Wz, \xi )|\leqslant \frac{1}{2}|\langle y,Hy \rangle - \langle z,Hz\rangle |, \text{ for } \text{ a.e. } \xi \in \varXi , \end{aligned}$$

then the two problems have the same optimal value.

Note that since problem (3.4) is equivalent to (3.5), we can replace (3.4) by (3.5) in Theorem 2.

Under Assumptions 1–2, and similarly to Lemma 1, Chen et al. [20] proved the smoothness of the objective function in the second-stage problem of (3.5) ([20, Proposition 3.8]). They also considered the convexity of (3.5) ([20, Proposition 3.9 and Corollary 3.11]).

Let \(\{\xi _1, \cdots , \xi _N\}\) be independent and identically distributed (i.i.d.) samples of \(\xi \) and

$$\begin{aligned} \begin{array}{l} G_N(x) = \frac{1}{N}\sum _{i=1}^N f(x, \xi _i),\\ \theta _N(x)=\max _{v\in D}\langle x-v, G_N(x) \rangle - \frac{\alpha }{2}\Vert x-v\Vert ^2,\\ \psi (x, \xi _i) = \min _{y\in C(\xi _i)} r(y, \xi _i) + \frac{1}{2}\langle y-x, B(y-x) \rangle . \end{array} \end{aligned}$$

Then, the SAA problem of (3.5) is

$$\begin{aligned}&\displaystyle {\min _{x\in D}}&\phi _N(x):= \theta _N(x) + \frac{\lambda }{N}\sum _{i=1}^N\psi (x,\xi _i). \end{aligned}$$
(3.6)
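To make the sample-average gap term in (3.6) concrete, here is a minimal sketch with made-up data (an assumption for illustration): \(f(x,\xi )=x+q+\xi \) with \(E[\xi ]=0\) and first-stage set \(\mathbb {R}^2_+\). At the solution of the expected problem, \(\theta _N\) stays nonnegative and shrinks as N grows.

```python
import numpy as np

# SAA gap value theta_N at the solution of the expected problem, for the
# assumed toy mapping f(x, xi) = x + q + xi with E[xi] = 0 and set R^2_+.
rng = np.random.default_rng(1)
q = np.array([-1.0, 2.0])
x_star = np.array([1.0, 0.0])                 # solves 0 <= x perp x + q >= 0

def theta_N(x, N, alpha=1.0):
    xi = rng.normal(0.0, 1.0, (N, 2))         # i.i.d. samples of xi
    G = x + q + xi.mean(axis=0)               # sample-average mapping G_N(x)
    v = np.maximum(0.0, x - G / alpha)        # maximizer of the gap problem
    return np.dot(x - v, G) - 0.5 * alpha * np.dot(x - v, x - v)

vals = [theta_N(x_star, N) for N in (100, 10000)]
```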

In [20], Chen et al. gave the following convergence result.

Theorem 3

(Convergence theorem) Suppose Assumptions 1–2 hold. Then, \(\phi _N\) converges to \(\phi \) a.s.-uniformly on any compact set \({\bar{D}}\) such that \(S, S^*\subseteq {\bar{D}}\). Let \(\{x_N\}\) be a sequence of minimizers of problem (3.6) generated by i.i.d. samples. Then, \(\{x_N\}\) is \({\mathcal {P}}\)-a.e. bounded, and any accumulation point \(x^*\) of \(\{x_N\}\) as \(N\rightarrow \infty \) is \({\mathcal {P}}\)-a.e. a solution of (3.5).

Chen et al. [20] also considered an ERM formulation for the MPEC reformulation (2.24) of the two-stage SVI (2.22)–(2.23):

$$\begin{aligned} \begin{array}{ll} \min &{} \frac{1}{\lambda \rho }\theta (x) + \frac{1}{\rho }{E}[r( u_\xi , \xi )] + {E}[\Vert u_\xi - x \Vert ^2]\\ \hbox {s.t.} &{} x\in D, u_\xi \in C(\xi ), \text{ for } \text{ a.e. } \xi \in \varXi , \end{array} \end{aligned}$$
(3.7)

where \(\theta (\cdot )\) and \(r( \cdot , \xi )\) are residual functions of \(0\in G(x)+ {\mathcal {N}}_D(x)\) and \(0\in _{a.e.} F(u_\xi , \xi )+ {\mathcal {N}}_{C(\xi )}(u_\xi )\), respectively. They then proposed a nonconvex Douglas–Rachford splitting method [27, 28] to solve problem (3.7). The detailed model and algorithm are given in [20, Section 5].

3.2 Progressive Hedging Algorithm for SVI

In the case when the random vector follows a discrete distribution, for solving the two-stage SVI (2.16)–(2.17) and the multistage SVI (2.31), Rockafellar and Sun [23, 24] extended the progressive hedging algorithm (PHA) [29] for multistage stochastic optimization problems to multistage SVIs and multistage stochastic Lagrangian variational inequalities. With the notation in Sect. 2.2, the progressive hedging algorithm for SVI is as follows:

(Algorithm 1: progressive hedging algorithm for SVI; displayed as a figure in the original)
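The following minimal Python sketch illustrates the progressive hedging iteration on a two-stage SVI with two equally likely scenarios. All data, the proximal parameter r, and the inner LCP solver are illustrative assumptions, not the implementation of [23]: each scenario subproblem is a strongly monotone LCP, the projection onto \({\mathscr {N}}\) averages only the first-stage component, and the multiplier update keeps \(E[w]=0\).

```python
import numpy as np

# Progressive hedging sketch for a two-stage SVI with C(xi) = R^2_+ and
# affine F(x, xi) = M_s x + q_s.  x = (x1, x2): x1 is the nonanticipative
# first stage, x2 the scenario-dependent second stage.  Data is made up.
probs = np.array([0.5, 0.5])
Ms = [np.array([[3.0, 1.0], [0.0, 2.0]])] * 2      # monotone: M + M^T is PD
qs = [np.array([-2.0, -1.0]), np.array([-1.0, 1.0])]
r = 1.0                                            # proximal parameter r > 0

def solve_lcp(A, b, iters=400, tau=0.2):
    """Solve 0 <= z perp A z + b >= 0 by a projected fixed-point iteration
    (A is positive definite here, so the iteration contracts for small tau)."""
    z = np.zeros(len(b))
    for _ in range(iters):
        z = np.maximum(0.0, z - tau * (A @ z + b))
    return z

x = [np.zeros(2), np.zeros(2)]                     # x^0 in the subspace N
w = [np.zeros(2), np.zeros(2)]                     # w^0 in M (E[w] = 0)
for _ in range(500):
    # scenario subproblems: 0 in F(xh, xi) + w + r (xh - x) + N_C(xh)
    xh = [solve_lcp(Ms[s] + r * np.eye(2), qs[s] + w[s] - r * x[s])
          for s in range(2)]
    x1 = probs @ np.array([xh[0][0], xh[1][0]])    # project onto N: average
    x = [np.array([x1, xh[s][1]]) for s in range(2)]   # the first stage only
    w = [w[s] + r * (xh[s] - x[s]) for s in range(2)]

# at the limit, each scenario satisfies 0 <= x perp F(x, xi) + w >= 0
residual = max(np.abs(np.minimum(x[s], Ms[s] @ x[s] + qs[s] + w[s])).max()
               for s in range(2))
```

For this toy data, the hand-computable limit is \(x_1=1.25/3\) with second stages \(0.5\) and \(0\) and multipliers \(\pm 0.25\); the iteration approaches it at the linear rate predicted by Theorem 4 (polyhedral sets, affine mappings).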

The convergence of Algorithm 1 is given in [23] as follows:

Theorem 4

[23, Theorem 2] As long as the (monotone) variational inequality (2.30) has at least one solution, the sequence of pairs \((x^{\nu }(\cdot ), w^{\nu }(\cdot ))\) generated by Algorithm 1 will converge to a pair \(({\bar{x}}(\cdot ), {\bar{w}}(\cdot ))\) satisfying (2.31) and thus furnish \({\bar{x}}(\cdot )\) as a solution to (2.30). The convergence will surely be at a linear rate if, in particular, the sets \(C(\xi )\) are polyhedral and the functions \(F(\cdot , \xi )\) are affine.

Moreover, in [24], Rockafellar and Sun considered the multistage Lagrangian stochastic variational inequality (LSVI) problem. With the notation in Sect. 2.2, consider an N-stage problem. Let \(X(\xi )\subset \mathbb {R}^n\) and \(Y(\xi )\subset \mathbb {R}^m\) be a pair of nonempty closed convex sets and

$$\begin{aligned} \mathcal{X}=\{x(\cdot )\in \mathcal{L}_n | x(\xi )\in X(\xi ), \forall \xi \}, \;\; \mathcal{Y}=\{y(\cdot )\in \mathcal{L}_m | y(\xi )\in Y(\xi ), \forall \xi \}, \end{aligned}$$

where, similarly to \(x(\cdot )\), \(y(\cdot ): \xi \rightarrow y(\xi ) = (y_1(\xi ), \cdots , y_N(\xi ))\in \mathbb {R}^{m_1}\times \cdots \times \mathbb {R}^{m_N}=\mathbb {R}^m\). As with \({\mathscr {N}}_n\subset \mathcal{L}_n\) and its complement \({\mathscr {M}}_n\), we define the nonanticipativity subspace \({\mathscr {N}}_m\subset \mathcal{L}_m\) and its complement \({\mathscr {M}}_m\). Then, we introduce continuously differentiable functions

$$\begin{aligned} L(\cdot , \cdot , \xi ) \text{ on } X(\xi ) \times Y(\xi ) \text{ such } \text{ that } L(x, y,\xi ) \text{ is } \text{ convex } \text{ in } x \text{ and } \text{ concave } \text{ in } y \end{aligned}$$

and define

$$\begin{aligned} \varLambda (x(\cdot ), y(\cdot ))= & {} {E}_\xi [L(x(\xi ), y(\xi ), \xi )] \\= & {} \sum _{\xi \in \varXi } \pi (\xi )L(x(\xi ), y(\xi ), \xi ) \text{ for } x(\cdot )\in \mathcal{X}, y(\cdot )\in \mathcal{Y} \end{aligned}$$

and

$$\begin{aligned} \mathcal{F}(x(\cdot ), y(\cdot )) = (\nabla _{x(\cdot )}\varLambda (x(\cdot ), y(\cdot )), -\nabla _{y(\cdot )}\varLambda (x(\cdot ), y(\cdot ))) \end{aligned}$$

arises from component mappings

$$\begin{aligned} F(x,y,\xi )=(\nabla _xL(x,y,\xi ), -\nabla _yL(x,y,\xi )). \end{aligned}$$

Then, the multistage LSVI is as follows

$$\begin{aligned}&\text{ find } {\bar{x}}(\cdot )\in \mathscr {N}_n, {\bar{y}}(\cdot )\in \mathscr {N}_m, \text{ for } \text{ which } \exists {\bar{w}}(\cdot )\in \mathscr {M}_n, {\bar{z}}(\cdot )\in \mathscr {M}_m, \text{ such } \text{ that } \nonumber \\&-\nabla _x\varLambda ({\bar{x}}, {\bar{y}}) - {\bar{w}}(\cdot ) \in N_X({\bar{x}}), \nabla _y\varLambda ({\bar{x}}, {\bar{y}})+{\bar{z}}(\cdot ) \in N_Y({\bar{y}}), \end{aligned}$$
(3.9)

which is equivalent to having, for all scenarios \(\xi \in \varXi \),

$$\begin{aligned}&-\nabla _x L({\bar{x}}(\xi ), {\bar{y}}(\xi ), \xi ) - {\bar{w}}(\xi )\in N_{X(\xi )}({\bar{x}}(\xi )), \\&\nabla _yL({\bar{x}}(\xi ), {\bar{y}}(\xi ), \xi ) +{\bar{z}}(\xi )\in N_{Y(\xi )}({\bar{y}}(\xi )). \end{aligned}$$

The related progressive hedging algorithm is given in [24] as follows.

(Algorithm 2: progressive hedging algorithm for the multistage LSVI; displayed as a figure in the original)

This version of progressive hedging inherits from Algorithm 1 the property that, as long as a solution exists, the sequence of iterates \((x^{\nu }(\cdot ), y^{\nu }(\cdot ), w^{\nu }(\cdot ), z^{\nu }(\cdot ))\) converges to a particular solution \(({\bar{x}}(\cdot ),{\bar{y}}(\cdot ),{\bar{w}}(\cdot ),{\bar{z}}(\cdot ))\). The authors also considered a variant of the algorithm in which the parameter r differs between the x part and the y part, and applied Algorithm 2 to multistage stochastic optimization problems.

3.3 Discrete Approximation Methods

When the random vector follows a continuous distribution, the PHA cannot be applied directly to the two-stage SVI. In this case, Chen et al. [30] proposed a discrete approximation method for the two-stage SLCP, and Chen et al. [31] investigated the sample average approximation (SAA) method for the two-stage stochastic generalized equation (SGE).

Discrete approximation for two-stage SLCP For the two-stage SLCP (2.18)–(2.19), Chen et al. [30] first investigated the existence and uniqueness of a solution under the following assumptions.

Assumption 3

There exists a positive continuous function \(\kappa (\xi )\) such that \({E}[\kappa (\xi )]<+\infty \) and for a.e. \(\xi \),

$$\begin{aligned} \begin{pmatrix} z^\mathrm{T}, u^\mathrm{T} \end{pmatrix} \begin{pmatrix} A &{} B(\xi )\\ N(\xi ) &{} M(\xi ) \end{pmatrix}\begin{pmatrix} z\\ u \end{pmatrix} \geqslant \kappa (\xi )(\Vert z\Vert ^2+\Vert u\Vert ^2), \; \;\; \forall z\in \mathbb {R}^n, \; u\in \mathbb {R}^{m}. \end{aligned}$$
(3.13)

Moreover, \({E}[\Vert B(\xi )\Vert ]<\infty \), \({E}[\Vert M(\xi )\Vert ]<\infty \), \({E}[\Vert N(\xi )\Vert ]<\infty \) and \({E}[\Vert q(\xi )\Vert ]<\infty \).

Under Assumption 3, some properties of the two-stage SLCP are given in [30] as follows:

Proposition 1

Let Assumption 3 hold. For any given x and \(\xi \in \varXi \), let \(D(x,\xi )\) be an m-dimensional diagonal matrix with

$$\begin{aligned} D_{jj}(x,\xi ) : = \left\{ \begin{array}{ll} 1, &{}\quad \mathrm{if } \; \big (M(\xi )y(\xi ) + N(\xi )x + q_2(\xi )\big )_j \leqslant y_j(\xi ),\\ 0, &{}\quad \mathrm{otherwise }. \end{array} \right. \end{aligned}$$

Let

$$\begin{aligned} W(x, \xi ) := [I - D(x, \xi ) (I- M(\xi ))]^{-1}D(x, \xi ) \end{aligned}$$
(3.14)

and

$$\begin{aligned} J(x, \xi ) := \{j : (M(\xi )y(\xi ) + N(\xi )x + q_2(\xi ))_j \leqslant y_j(\xi )\}. \end{aligned}$$

Then, the following assertions hold.

  1. (i)

    The two-stage SLCP (2.18)–(2.19) has a unique solution \((x^*, y^*(\cdot ))\in \mathbb {R}^n\times \mathcal{Y}\).

  2. (ii)

    The solution to the second stage of SLCP (2.18)–(2.19) can be written as

    $$\begin{aligned} {\bar{y}}(x,\xi ) =- W(x, \xi )(N(\xi )x + q_2(\xi )) \end{aligned}$$
    (3.15)

and \({\bar{y}}(\cdot , \xi )\) is globally Lipschitz continuous w.r.t. x.

  3. (iii)

The first stage of SLCP (2.18)–(2.19) can be reformulated as

    $$\begin{aligned} 0\leqslant & {} x \; \perp \; (A - {E}[B(\xi ) W(x,\xi )N(\xi )])x \nonumber \\&-{E}[B(\xi ) W(x,\xi )q_2(\xi )] + q_1\geqslant 0, \end{aligned}$$
    (3.16)

    where

    $$\begin{aligned} \Vert (A - {E}[B(\xi ) W(x,\xi )N(\xi )])^{-1}\Vert \leqslant \frac{1}{{E}[\kappa (\xi )]}<+\infty . \end{aligned}$$
  4. (iv)

    Let

    $$\begin{aligned} F(x):= & {} \min \big (x, (A - {E}[B(\xi ) W(x,\xi )N(\xi )])x\nonumber \\&-{E}[B(\xi ) W(x,\xi )q_2(\xi )] + q_1\big ). \end{aligned}$$
    (3.17)

    Then, F is Lipschitz continuous and every matrix \(V_x\) in the Clarke generalized Jacobian \(\partial F(x)\) (see definition in [32, Section 2.6]) is nonsingular with \(\Vert V_x^{-1}\Vert \leqslant {\bar{d}}\) for some constant \({\bar{d}}>0\) which is independent of x.

Besides the existence and uniqueness of the solution of the two-stage SVI, the global Lipschitz continuity and formula (3.15) for \({\bar{y}}(\cdot , \xi )\) allow us to substitute \({\bar{y}}(\cdot , \xi )\) into the first-stage stochastic function \(A x + {E}[B(\xi )y(\xi )] + q_1\) and rewrite the two-stage SVI as the one-stage SVI (3.16). Based on this, they proposed their discrete approximation method in [30] as follows.
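Formulas (3.14)–(3.15) can be checked numerically on a small made-up instance (an illustrative assumption, not data from [30]): solve a second-stage LCP, form the diagonal matrix D from the solution, and confirm that \(-W(N(\xi )x+q_2(\xi ))\) reproduces the solution.

```python
import numpy as np

# Check of (3.14)-(3.15) on made-up data: M satisfies the positivity
# condition (3.13) here, and b plays the role of N(xi) x + q_2(xi).
M = np.array([[2.0, 0.3], [0.1, 1.5]])
b = np.array([-1.0, 0.5])

y = np.zeros(2)                               # projected fixed-point LCP solver
for _ in range(500):                          # for 0 <= y perp M y + b >= 0
    y = np.maximum(0.0, y - 0.3 * (M @ y + b))

d = ((M @ y + b) <= y).astype(float)          # D_jj = 1 iff (M y + b)_j <= y_j
D = np.diag(d)
W = np.linalg.solve(np.eye(2) - D @ (np.eye(2) - M), D)   # W(x, xi) of (3.14)
y_formula = -W @ b                            # the right-hand side of (3.15)
```

For this instance the LCP solution is \(y=(0.5,0)\), \(D=\mathrm{diag}(1,0)\), and the formula recovers it exactly.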

Suppose \(\varXi \) is a compact and convex set. Let \(\{\varXi ^K_i\}\) be a partition of the support set \(\varXi \), where \(\varXi ^K_i\) is a compact and convex subset of \(\varXi \) such that

$$\begin{aligned} \bigcup _{i=1}^K \varXi ^K_i = \varXi , \;\; \text{ int } \varXi ^K_i \cap \text{ int } \varXi ^K_j = \varnothing , \;\;\forall \; i \ne j, \;\; i, j=1, \ldots , K, \end{aligned}$$

where \(\text{ int } S\) denotes the interior of S and K denotes the number of partitions. Note that since \(\varXi \) is assumed to be a compact set, each \(\varXi ^K_i\) is also a compact set. Let

$$\begin{aligned} {E}_{\varXi ^K_i}[H(\xi )] := \frac{1}{p_i^K}\int _{\xi \in \varXi ^K_i}H(\xi ) {\mathcal {P}}(d\xi ) \;\; \text{ with } \;\; p_i^K = {\mathcal {P}}(\varXi ^K_i) \end{aligned}$$
(3.18)

for \(H(\xi ) = M(\xi ), N(\xi ), B(\xi )\) or \(q_2(\xi )\). Let

$$\begin{aligned} \varDelta (\varXi _i^K):= \displaystyle {\max _{\xi _1, \xi _2\in \varXi _i^K}}\Vert \xi _1- \xi _2\Vert \end{aligned}$$
(3.19)

denote the diameter of \(\varXi _i^K\). We require \(\max _{i\in {\bar{K}}} \varDelta (\varXi _i^K) \rightarrow 0\) as \(K\rightarrow \infty \), where \({\bar{K}}:=\{1, \cdots , K\}\). Then, a discrete approximation of the two-stage SLCP (2.18)–(2.19) is

$$\begin{aligned}&0\leqslant x \perp A x + \sum _{i=1}^Kp_i^K{E}_{\varXi _i^K}[B(\xi )]\mathbf{y}_i + q_1\geqslant 0, \end{aligned}$$
(3.20)
$$\begin{aligned}&0\leqslant \mathbf{y}_i \perp {E}_{\varXi ^K_i}[M(\xi )]\mathbf{y}_i + {E}_{\varXi ^K_i}[N(\xi )]x + {E}_{\varXi ^K_i}[q_2(\xi )]\geqslant 0, \nonumber \\&\quad i = 1, \ldots , K. \end{aligned}$$
(3.21)

Moreover, let \((x^K, \mathbf{y}^K)\) denote the solution of (3.20)–(3.21), where we write \(\mathbf{y}^K\) for \((\mathbf{y}_1^K,\cdots ,\mathbf{y}_K^K)\). Let

$$\begin{aligned} y^K(\xi ) := \sum _{i=1}^K \mathbf{y}_i^K{\mathbf {1}}_{\varXi ^K_i}(\xi ). \end{aligned}$$
(3.22)
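A scalar sketch of the scheme (3.20)–(3.21), with all problem data made up for illustration: for \(\xi \sim U[0,1]\) the conditional expectations (3.18) over uniform cells have closed form, the second-stage LCPs are solvable by hand in one dimension, and the first-stage NCP is solved by bisection.

```python
import numpy as np

def solve_discrete(K):
    """Discrete approximation (3.20)-(3.21) for a scalar toy SLCP with
    xi ~ U[0,1], A = 2, B(xi) = 1 + xi, M(xi) = 2 + xi, N(xi) = -1,
    q_1 = -1 and q_2(xi) = -xi (made-up data)."""
    edges = np.linspace(0.0, 1.0, K + 1)
    p = np.diff(edges)                        # cell probabilities p_i^K
    m = 0.5 * (edges[:-1] + edges[1:])        # E[xi | cell i] for the uniform law
    B, M, q2 = 1.0 + m, 2.0 + m, -m           # conditional expectations (3.18)

    def y(x):                                 # 0 <= y_i perp M_i y_i - x + q2_i >= 0
        return np.maximum(0.0, (x - q2) / M)  # closed form in one dimension

    def F1(x):                                # first-stage NCP mapping of (3.20)
        return 2.0 * x + np.dot(p, B * y(x)) - 1.0

    lo, hi = 0.0, 1.0                         # F1 increases, F1(0) < 0 < F1(1)
    for _ in range(60):                       # bisection for the root of F1
        c = 0.5 * (lo + hi)
        lo, hi = (c, hi) if F1(c) < 0 else (lo, c)
    x = 0.5 * (lo + hi)
    return x, F1(x)

x64, _ = solve_discrete(64)
x512, res512 = solve_discrete(512)
```

Refining the partition changes the computed first stage only slightly, in line with the O(max diameter) bound of Theorem 6; for this data the limiting first-stage solution evaluates analytically to about 0.26558.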

The following theorem states the convergence of \((x^K, y^K( \xi ))\) to \((x^*, y^*(\xi ))\), the true solution of the two-stage SLCP (2.18)–(2.19), as \(K\rightarrow \infty \).

Theorem 5

Under Assumption 3, the following assertions hold.

  1. (i)

    The complementarity problem (3.20)–(3.21) has a unique solution \((x^K, \mathbf{y}^K)\).

  2. (ii)

    If, in addition, \(\max _{i\in \{1, \ldots , K\}} \varDelta (\varXi _i^K) \rightarrow 0\), then \(\{(x^K, y^K( \cdot ))\}\) is bounded on \(\mathbb {R}^n\times \mathcal{Y}\), where the boundedness of \(y^K( \cdot )\) is in the sense of the norm topology of \({\mathcal {L}}_1(\mathcal{Y})\).

  3. (iii)

    \(\{x^K, y^K( \cdot )\}\) converges to the true solution \((x^*, y^*( \cdot ))\) of problem (2.18)–(2.19), where the convergence of \(\{y^K( \cdot )\} \rightarrow y^*( \cdot ) \) is in the sense of the norm topology of \({\mathcal {L}}_2(\mathcal{Y})\).

To establish a quantitative convergence analysis, more assumptions are needed.

Assumption 4

\(M(\cdot )\), \(N(\cdot )\), \(q_2(\cdot )\) and \(B(\cdot )\) are Lipschitz continuous over a compact set containing \(\varXi \) with Lipschitz constant L.

Theorem 6

Under Assumptions 3 and 4, there exist a positive constant \(\gamma \) and nonnegative integrably bounded functions \(c(\xi )\) and \(h(\xi )\) such that

$$\begin{aligned} \Vert x^K - x^*\Vert \leqslant \gamma {E}[\Vert B(\xi )\Vert c(\xi )]L\max _{i\in {\bar{K}}} \varDelta (\varXi _i^K) \end{aligned}$$
(3.23)

and

$$\begin{aligned} \Vert y^K( \xi ) - y^*( \xi )\Vert \leqslant h(\xi )L\max _{i\in {\bar{K}}}\varDelta (\varXi _i^K), \;\; \text{ for } \; \text{ a.e. } \xi \in \varXi . \end{aligned}$$
(3.24)

Sample average approximation method for two-stage SGE In Chen et al. [31], the authors considered the following two-stage SGE:

$$\begin{aligned}&0\in {E}[\varPhi (x, y(\xi ), \xi )]+ \Gamma _1(x),\;x\in X, \end{aligned}$$
(3.25)
$$\begin{aligned}&0\in \varPsi (x, y(\xi ), \xi )+\Gamma _2(y(\xi ), \xi ), \;\; \text{ for } \text{ a.e. }\; \xi \in \varXi , \end{aligned}$$
(3.26)

where \(X\subseteq \mathbb {R}^n\) is a nonempty closed convex set, \(\xi \) is a random vector with support set \(\varXi \subset \mathbb {R}^d\) defined as above, \(\varPhi :\mathbb {R}^n\times \mathbb {R}^m\times \mathbb {R}^d\rightarrow \mathbb {R}^n\) and \(\varPsi :\mathbb {R}^n\times \mathbb {R}^m\times \mathbb {R}^d\rightarrow \mathbb {R}^m\) such that \(\varPhi (\cdot ,\cdot ,\xi )\) and \(\varPsi (\cdot ,\cdot ,\xi )\) are Lipschitz continuous with Lipschitz modulus \(\kappa _{\varPhi }(\xi )\) and \(\kappa _{\varPsi }(\xi )\), respectively, \(\Gamma _1:\mathbb {R}^n\rightrightarrows \mathbb {R}^n\), \(\Gamma _2:\mathbb {R}^m\times \varXi \rightrightarrows \mathbb {R}^m\) are multifunctions (point-to-set mappings) and \(y(\cdot ) \in \mathcal{Y}\) with \(\mathcal{Y}\) being the space of measurable functions from \(\varXi \) to \(\mathbb {R}^m\) such that the expected value in (3.25) is well defined.

Without assuming relatively complete recourse, the authors of [31] studied convergence and the exponential rate of convergence of the SAA

$$\begin{aligned}&0\in N^{-1}\sum _{j=1}^N \varPhi (x, y_j, \xi ^j) + \Gamma _1(x),\;x\in X, \end{aligned}$$
(3.27)
$$\begin{aligned}&0\in \varPsi (x, y_j, \xi ^j)+\Gamma _2(y_j, \xi ^j), \;\; j=1,...,N \end{aligned}$$
(3.28)

of the two-stage SGE (3.25)–(3.26), with \(y_j\) being a copy of the second-stage vector for \(\xi =\xi ^j\), \(j=1,\cdots ,N\), where N denotes the sample size and \(\xi ^1,\cdots ,\xi ^N\) is an independent and identically distributed (i.i.d.) sample of the random vector \(\xi \).
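The SAA scheme (3.27)–(3.28) can be sketched on a scalar toy two-stage SGE with made-up data: \(\Gamma _1={\mathcal {N}}_{[0,\infty )}\), first stage \(2x+E[(1+\xi )y(\xi )]-1\), second stage \(0\leqslant y\perp (2+\xi )y-x-\xi \geqslant 0\) with \(\xi \sim U[0,1]\). For this data, the true first-stage solution works out analytically to about 0.2656, which is used below only to check the sample-average solution.

```python
import numpy as np

rng = np.random.default_rng(0)

def solve_saa(N):
    """SAA (3.27)-(3.28) for the scalar toy SGE described above: per sample
    xi^j the second-stage solution y_j(x) has closed form, and the
    sample-average first-stage equation is solved by bisection."""
    xi = rng.uniform(0.0, 1.0, N)            # i.i.d. sample xi^1, ..., xi^N
    B, M, q2 = 1.0 + xi, 2.0 + xi, -xi

    def y(x):                                # second-stage solutions y_j of (3.28)
        return np.maximum(0.0, (x - q2) / M)

    def F1(x):                               # sample-average first stage (3.27)
        return 2.0 * x + np.mean(B * y(x)) - 1.0

    lo, hi = 0.0, 1.0                        # F1 is increasing; bisection
    for _ in range(60):
        c = 0.5 * (lo + hi)
        lo, hi = (c, hi) if F1(c) < 0 else (lo, c)
    return 0.5 * (lo + hi)

x_small, x_large = solve_saa(500), solve_saa(200000)
```

As N grows, the SAA solution stabilizes near the true first-stage solution, illustrating the almost sure convergence in Theorem 7.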

To investigate convergence without assuming relatively complete recourse, some notation and assumptions are needed. Denote by \(\mathcal{X}\) the set of \(x\in X\) such that the second-stage generalized equation (3.26) has a solution. For every \(\xi \in \varXi \), we denote by \({\bar{\mathcal{X}}}(\xi )\) the set of \(x\in X\) such that the second-stage problem

$$\begin{aligned} 0 \in \varPsi (x, y, \xi )+\Gamma _2(y, \xi ) \end{aligned}$$
(3.29)

has a solution; in particular, \(\cap _{\xi \in \varXi }{\bar{\mathcal{X}}}(\xi ) = \mathcal{X}\). Moreover, let \({\bar{\mathcal{X}}}_N:=\cap _{j=1}^N{\bar{\mathcal{X}}}(\xi ^j)\) denote the set of x such that problem (3.28) has a solution. The condition of relatively complete recourse means that \(X ={\bar{\mathcal{X}}}_N\).

Assumption 5

For a.e. \(\xi \in \varXi \), problem (3.29) has a unique solution for any \(x\in \mathcal{X}\).

Assumption 6

For every \(\xi \) and \(x\in {\bar{\mathcal{X}}}(\xi )\), there is a neighborhood \(\mathcal{V}\) of x and a measurable function \(v(\xi )\) such that \(\Vert {\hat{y}}(x',\xi )\Vert \leqslant v(\xi )\) for all \(x'\in \mathcal{V}\cap {\bar{\mathcal{X}}}(\xi )\), where \({\hat{y}}(x,\xi )\) denotes a solution of (3.29).

Lemma 2

Suppose that Assumptions 5 and 6 hold, and for a.e. \(\xi \in \varXi \) the multifunction \(\Gamma _2(\cdot ,\xi )\) is closed. Then for a.e. \(\xi \in \varXi \), the solution \({\hat{y}}(x,\xi )\) is a continuous function of \(x\in \mathcal{X}\).

By Lemma 2, the first-stage problem (3.25) can be written as the following generalized equation:

$$\begin{aligned} 0\in \phi (x)+ \Gamma _1(x),\;x\in \mathcal{X}, \end{aligned}$$
(3.30)

where

$$\begin{aligned} {\hat{\varPhi }}(x,\xi ):= \varPhi (x,{\hat{y}}(x,\xi ),\xi ) \,\;\mathrm{and}\;\, \phi (x):={E}[{\hat{\varPhi }}(x,\xi )] \end{aligned}$$
(3.31)

are continuous functions. Then, consider the SAA problem (3.27)–(3.28). Similarly to (3.30), by Lemma 2, the SAA problem (3.27)–(3.28) can be written as

$$\begin{aligned} 0\in {\hat{\phi }}_N(x)+ \Gamma _1(x),\;x\in {\bar{\mathcal{X}}}_N, \end{aligned}$$
(3.32)

where \({\hat{\phi }}_N(x):=N^{-1}\sum _{j=1}^N {\hat{\varPhi }}(x,\xi ^j)\) with \({\hat{\varPhi }}(x,\xi )\) defined in (3.31). Denote by \({\mathcal {S}}^*\) the set of solutions of the first-stage problem (3.30) and by \({\hat{{\mathcal {S}}}}_N\) the solution set of the SAA problem (3.32).

For \(\delta \in (0,1)\), consider a compact set \({\bar{\varXi }}_\delta \subset \varXi \) such that \({\mathcal {P}}({\bar{\varXi }}_\delta )\geqslant 1-\delta \), and the multifunction \(\varDelta _\delta : X\rightrightarrows {\bar{\varXi }}_\delta \) defined as

$$\begin{aligned} \varDelta _\delta (x) : = \{\xi \in {\bar{\varXi }}_\delta : x\in {\bar{\mathcal{X}}}(\xi )\}. \end{aligned}$$
(3.33)

Assumption 7

For any \(\delta \in (0,1)\), the multifunction \(\varDelta _\delta (\cdot )\) is outer semicontinuous.

The following lemma shows that this assumption holds under mild conditions.

Lemma 3

Suppose \(\varPsi (\cdot ,\cdot ,\cdot )\) is continuous, \(\Gamma _2(\cdot ,\cdot )\) is closed and Assumption 6 holds. Then, the multifunction \(\varDelta _\delta (\cdot )\) is outer semicontinuous.

Then, the almost sure convergence result is given as follows:

Theorem 7

Suppose that: (i) Assumptions 5–7 hold, (ii) the multifunctions \(\Gamma _1(\cdot )\) and \(\Gamma _2(\cdot ,\xi )\), \(\xi \in \varXi \), are closed, (iii) there is a compact subset \(X'\) of X such that \({\mathcal {S}}^*\subset X'\) and w.p.1 for all N large enough the set \({\hat{{\mathcal {S}}}}_N\) is nonempty and is contained in \(X'\), (iv) \(\Vert {\hat{\varPhi }}(x,\xi )\Vert _{x\in \mathcal{X}}\) is dominated by an integrable function, (v) the set \(\mathcal{X}\) is nonempty. Let \( {{\mathfrak {d}}}_N := {\mathbb {D}}\big ( {\bar{\mathcal{X}}}_N\cap X', \mathcal{X}\cap X' \big ). \) Then, \({\mathcal {S}}^*\) is nonempty and the following statements hold.

  1. (a)

    \({{\mathfrak {d}}}_N\rightarrow 0\) and \({\mathbb {D}}({\hat{{\mathcal {S}}}}_N,{\mathcal {S}}^*)\rightarrow 0\) w.p.1 as \(N\rightarrow \infty \).

  2. (b)

In addition, assume that: (vi) for any \(\delta >0\), \(\tau >0\) and a.e. \(\omega \in \varOmega \), there exist \(\gamma >0\) and \(N_0\) such that for any \(x\in \mathcal{X}\cap X' + \gamma \,\mathcal{B}\) and \(N\geqslant N_0\), there exists \(z_x\in \mathcal{X}\cap X'\) such that

    $$\begin{aligned} \Vert z_x - x\Vert \leqslant \tau ,\;\; \Gamma _1(x) \subseteq \Gamma _1(z_x) + \delta \mathcal{B},\;\; \mathrm{and}\;\; \Vert {\hat{\phi }}_N(z_x) - {\hat{\phi }}_N(x)\Vert \leqslant \delta . \end{aligned}$$
    (3.34)

    Then, w.p.1 for N large enough it follows that

    $$\begin{aligned} {\mathbb {D}}(\hat{{\mathcal {S}}}_N, {\mathcal {S}}^*) \leqslant \tau + R^{-1}\left( \,\sup _{x\in \mathcal{X}\cap X'} \Vert \phi (x) - {\hat{\phi }}_N(x)\Vert \right) , \end{aligned}$$
    (3.35)

    where for \(\epsilon \geqslant 0\) and \(t\geqslant 0\),

    $$\begin{aligned}&R(\epsilon ):= \inf _{x\in \mathcal{X}\cap X',\, d(x, {\mathcal {S}}^*)\geqslant \epsilon } d\big (0, \phi (x) + \Gamma _1(x)\big ), \\&R^{-1}(t): = \inf \{ \epsilon \in \mathbb {R}_+: R(\epsilon ) \geqslant t \}. \end{aligned}$$

Note that in the case when \(\Gamma _1(\cdot ) := {\mathcal {N}}_D(\cdot )\) with a nonempty polyhedral convex set D, the first and second inequalities of (3.34) hold automatically.

To derive the exponential rate of convergence based on a uniform large deviations theorem (cf. [33,34,35]), more assumptions are needed.

Assumption 8

For a.e. \(\xi \in \varXi \), there exists a unique, parametrically CD-regular [36] solution \({\bar{y}} = {\hat{y}}({\bar{x}}, \xi )\) of the second-stage generalized equation (2.17) for all \({\bar{x}} \in \mathcal{X}\).

Assumption 9

The set \(\mathcal{X}\) is convex, its interior \(\mathrm{int}(\mathcal{X})\ne \varnothing \), and for a.e. \(\xi \in \varXi \), the generalized equation,

$$\begin{aligned} 0\in G_{{\bar{x}}}(y) = \varPsi ({\bar{x}}, {\bar{y}}, \xi ) + J ( y - {\bar{y}} ) + \Gamma _2(y,\xi ), \; \text{ for } \text{ which } \; G_{{\bar{x}}}({\bar{y}}) \ni 0, \end{aligned}$$

has a locally Lipschitz continuous solution function at 0 for \({\bar{y}}\) with Lipschitz constant \(\kappa _{G}({\bar{x}},\xi )\) for any \({\bar{x}}\in \mathcal{X}\) and there exists a measurable function \({\bar{\kappa }}_G: \varXi \rightarrow \mathbb {R}_+\) such that \(\kappa _{G}({\bar{x}},\xi )\leqslant {\bar{\kappa }}_G(\xi )\) and \({E}[{\bar{\kappa }}_G(\xi )\kappa _\varPsi (\xi )]<~\infty \).

Let

$$\begin{aligned} M^i_x(t):={E}\left\{ \mathrm{exp}\big (t[{\hat{\varPhi }}_i(x,\xi )-\phi _i(x)]\big )\right\} \end{aligned}$$

be the moment generating function of the random variable \({\hat{\varPhi }}_i(x,\xi )-\phi _i(x)\), \(i=1, \dots , n\), and

$$\begin{aligned} M_\kappa (t):={E}\left\{ \mathrm{exp}\left( t\big [\kappa _\varPhi (\xi )+ \kappa _\varPhi (\xi )\kappa (\xi ) - {E}\big [\kappa _\varPhi (\xi )+ \kappa _\varPhi (\xi )\kappa (\xi )\big ]\big ] \right) \right\} . \end{aligned}$$

Assumption 10

For every \(x\in \mathcal{X}\) and \(i=1, \cdots , n\), the moment generating functions \(M^i_x(t)\) and \(M_\kappa (t)\) have finite values for all t in a neighborhood of zero.

Theorem 8

Suppose: (i) Assumptions 5 and 7–10 hold, (ii) \({\mathcal {S}}^*\) is nonempty and w.p.1 for N large enough, \({\hat{{\mathcal {S}}}}_N\) are nonempty, (iii) the multifunctions \(\Gamma _1(\cdot )\) and \(\Gamma _2(\cdot ,\xi )\), \(\xi \in \varXi \), are closed and monotone. Then, the following statements hold.

  1. (a)

    For sufficiently small \(\epsilon >0\), there exist positive constants \(\varrho =\varrho (\epsilon )\) and \(\varsigma =\varsigma (\epsilon )\), independent of N, such that

    $$\begin{aligned} P \left\{ \sup _{x\in \mathcal{X}}\big \Vert {\hat{\phi }}_N(x)-\phi (x)\big \Vert \geqslant \epsilon \right\} \leqslant \varrho (\epsilon ) \mathrm{e}^{-N\varsigma (\epsilon )}. \end{aligned}$$
    (3.36)
  2. (b)

    Assume in addition: (iv) the condition of part (b) in Theorem 7 holds and w.p.1 for N sufficiently large,

    $$\begin{aligned} {\mathcal {S}}^*\cap \mathrm{cl}\big (\mathrm{bd}(\mathcal{X}) \cap \mathrm{int}({\bar{\mathcal{X}}}_N)\big ) =\varnothing . \end{aligned}$$
    (3.37)

    (v) \(\phi (\cdot )\) has the following strong monotonicity property for every \(x^*\in {\mathcal {S}}^*\):

    $$\begin{aligned} (x-x^*)^\top (\phi (x)-\phi (x^*)) \geqslant g(\Vert x-x^*\Vert ),\;\forall x\in \mathcal{X}, \end{aligned}$$
    (3.38)

where \(g:\mathbb {R}_+\rightarrow \mathbb {R}_+\) is a function such that \({{\mathfrak {r}}}(\tau ):=g(\tau )/\tau \) is monotonically increasing for \(\tau >0\).

    Then, \({\mathcal {S}}^*=\{x^*\}\) is a singleton, and for any sufficiently small \(\epsilon >0\), there exists N sufficiently large such that

    $$\begin{aligned} P \left\{ {\mathbb {D}}(\hat{{\mathcal {S}}}_N, {\mathcal {S}}^*)\geqslant \epsilon \right\} \leqslant \varrho \left( {{\mathfrak {r}}}^{-1}(\epsilon )\right) \exp \left( -N\varsigma \big ({{\mathfrak {r}}}^{-1}(\epsilon )\big )\right) , \end{aligned}$$
    (3.39)

    where \(\varrho (\cdot )\) and \(\varsigma (\cdot )\) are defined in (3.36), and \({{\mathfrak {r}}}^{-1}(\epsilon ):=\inf \{\tau >0:{{\mathfrak {r}}}(\tau )\geqslant \epsilon \}\) is the inverse of \({{\mathfrak {r}}}(\tau )\).

Moreover, Chen et al. [31] investigated the convergence properties of the two-stage SGE (2.16)–(2.17) when \(\varPhi (x, y, \xi )\) and \(\varPsi (x, y, \xi )\) are continuously differentiable w.r.t. \((x, y)\) for a.e. \(\xi \in \varXi \), \(\Gamma _1(x):={\mathcal {N}}_{D}(x)\) and \(\Gamma _2(y):={\mathcal {N}}_{\mathbb {R}^m_+}(y)\), where \(D\subseteq \mathbb {R}^n\) is a nonempty polyhedral convex set. That is, they considered the mixed two-stage SVI-NCP

$$\begin{aligned}&0\in {E}[\varPhi (x, y(\xi ), \xi )] + {\mathcal {N}}_D(x), \end{aligned}$$
(3.40)
$$\begin{aligned}&0\leqslant y(\xi )\perp \varPsi (x, y(\xi ), \xi ) \geqslant 0, \;\; \text{ for } \text{ a.e. }\; \xi \in \varXi , \end{aligned}$$
(3.41)

and studied convergence analysis of its SAA problem

$$\begin{aligned}&0 \in N^{-1}\sum _{j=1}^N \varPhi (x, y(\xi ^j), \xi ^j) + {\mathcal {N}}_D(x), \end{aligned}$$
(3.42)
$$\begin{aligned}&0\leqslant y(\xi ^j) \perp \varPsi (x, y(\xi ^j), \xi ^j)\geqslant 0, \;\; j=1, \cdots , N. \end{aligned}$$
(3.43)
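To make the SAA system (3.42)–(3.43) concrete, here is a minimal numerical sketch (all problem data are hypothetical): a scalar second stage whose NCP has a closed-form solution, substituted into the first stage, which is then solved by a projected fixed-point iteration on \(D=[0,10]\).

```python
import numpy as np

# Toy SAA two-stage SVI-NCP (all data hypothetical): the second stage
#   0 <= y  _|_  m_j * y + q_j - x >= 0
# has a closed-form solution y_hat per scenario, and the first stage is
#   0 in (1/N) sum_j Phi(x, y_hat_j, xi_j) + N_D(x),  D = [0, 10],
# with Phi(x, y, xi_j) = 2x + c_j + y.
rng = np.random.default_rng(0)
N = 200
m_j = rng.uniform(1.0, 2.0, N)    # second-stage strong-monotonicity moduli
q_j = rng.normal(0.5, 0.2, N)
c_j = rng.normal(-2.0, 0.1, N)

def y_hat(x):
    # unique second-stage solutions, one per scenario (cf. (3.43))
    return np.maximum(0.0, (x - q_j) / m_j)

def phi_N(x):
    # SAA first-stage map (1/N) sum_j Phi(x, y_hat(x, xi_j), xi_j)
    return np.mean(2.0 * x + c_j + y_hat(x))

# projected fixed-point iteration x <- proj_D(x - t * phi_N(x))
x, step = 1.0, 0.3
for _ in range(200):
    x = min(max(x - step * phi_N(x), 0.0), 10.0)
```

Since the reduced map here is strongly monotone and Lipschitz, this projection iteration contracts to the unique SAA solution; it is only an illustration, not the convergence theory developed in the text.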

They first investigated the properties of the second-stage problems.

Assumption 11

For a.e. \(\xi \in \varXi \) and all \(x\in \mathcal{X}\cap D\), \( \varPsi (x, \cdot , \xi ) \) is strongly monotone; that is, there exists a positive-valued measurable function \(\kappa _y(\xi )\) such that for all \(y,u\in \mathbb {R}^m\),

$$\begin{aligned} \left\langle \varPsi (x, y, \xi ) -\varPsi (x, u, \xi ), y - u \right\rangle \geqslant \kappa _y(\xi )\Vert y - u\Vert ^2 \end{aligned}$$

with \({E}[\kappa _y(\xi )] < +\infty \).

Theorem 9

Let \(\varPsi : \mathbb {R}^n\times \mathbb {R}^m\times \varXi \rightarrow \mathbb {R}^m\) be Lipschitz continuous and continuously differentiable over \(\mathbb {R}^n\times \mathbb {R}^m \) for a.e. \(\xi \in \varXi \). Suppose Assumption 11 holds and \(\varPhi (x, y, \xi )\) is continuously differentiable w.r.t. \((x, y)\) for a.e. \(\xi \in \varXi \). Then, for a.e. \(\xi \in \varXi \) and \(x\in \mathcal{X}\), the following holds.

  1. (a)

    The second-stage SNCP (3.41) has a unique solution \({\hat{y}}(x, \xi )\) which is parametrically CD-regular and the mapping \(x \mapsto {\hat{y}}(x, \xi )\) is Lipschitz continuous over \(\mathcal{X}\cap X'\), where \(X'\) is a compact subset of \(\mathbb {R}^n\).

  2. (b)

    The Clarke Jacobian of \({\hat{y}}(x, \xi )\) w.r.t. x is as follows

    $$\begin{aligned} \partial {\hat{y}}(x,\xi )= \mathrm{conv}\left\{ \lim _{z\rightarrow x} \nabla _z {\hat{y}}(z,\xi ) : \nabla _z {\hat{y}}(z, \xi ) = -[I - J_{\alpha }(I-M(z, {\hat{y}}(z,\xi ), \xi ))]^{-1}D_{\alpha }L(z, {\hat{y}}(z,\xi ), \xi ) \right\} , \end{aligned}$$

    where \(M(x, y, \xi ) = \nabla _y \varPsi (x, y, \xi )\), \(L(x, {\hat{y}}(x, \xi ), \xi ) = \nabla _x \varPsi (x, {\hat{y}}(x, \xi ), \xi )\),

    $$\begin{aligned} \alpha = \{i:({\hat{y}}(x, \xi ))_i>(\varPsi (x, {\hat{y}}(x, \xi ), \xi ))_i\}, \end{aligned}$$

    \(J_{\alpha } \) is an m-dimensional diagonal matrix and

    $$\begin{aligned} (J_{\alpha })_{jj} : = \left\{ \begin{array}{ll} 1, &{}\quad \mathrm{if } \; j\in \alpha ,\\ 0, &{}\quad \mathrm{otherwise }. \end{array} \right. \end{aligned}$$
    (3.44)
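As a small numerical illustration of the index set \(\alpha \) and the diagonal matrix \(J_\alpha \) in (3.44), with hypothetical values of \({\hat{y}}\) and \(\varPsi \) at one point:

```python
import numpy as np

# hypothetical point: a solution y_hat of the second-stage NCP and the
# corresponding values of Psi(x, y_hat, xi) (complementary componentwise)
y_hat = np.array([0.8, 0.0, 1.3, 0.0])
psi   = np.array([0.0, 0.7, 0.0, 0.0])

alpha = y_hat > psi                        # index set {i : y_i > Psi_i}
J_alpha = np.diag(alpha.astype(float))     # 0/1 diagonal matrix of (3.44)
```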

Under Assumption 11, the two-stage SVI-NCP can be reformulated as a single-stage SVI with \({\hat{\varPhi }}(x, \xi ) = \varPhi (x, {\hat{y}}(x, \xi ), \xi )\) and \(\phi (x) = {E}[{\hat{\varPhi }}(x, \xi )]\) as follows

$$\begin{aligned} 0 \in \phi (x) + {\mathcal {N}}_C(x). \end{aligned}$$
(3.45)

To investigate the properties of two-stage SVI-NCP, more assumptions are needed. Let

$$\begin{aligned} \Theta (x, y(\xi ), \xi )= \begin{pmatrix} \varPhi (x, y(\xi ), \xi )\\ \varPsi (x, y(\xi ), \xi ) \end{pmatrix} \end{aligned}$$

and \(\nabla \Theta (x, y, \xi )\) be the Jacobian of \(\Theta \). Then,

$$\begin{aligned} \nabla \Theta (x, y, \xi ) =\begin{pmatrix} A(x, y, \xi ) &{} B(x, y, \xi )\\ L(x, y, \xi ) &{} M(x, y, \xi ) \end{pmatrix}, \end{aligned}$$

where \(A(x, y, \xi ) = \nabla _x \varPhi (x, y, \xi )\), \(B(x, y, \xi ) = \nabla _y \varPhi (x, y, \xi )\), \(L(x, y, \xi ) = \nabla _x \varPsi (x, y, \xi )\) and \(M(x, y, \xi ) = \nabla _y \varPsi (x, y, \xi )\).

Assumption 12

For a.e. \(\xi \in \varXi \), \(\Theta (x, y(\xi ), \xi )\) is strongly monotone with parameter \(\kappa (\xi )\) at \((x, y(\cdot ))\in C\times \mathcal{Y}\), where \({E}[\kappa (\xi )] < +\infty \).

Theorem 10

Let \(\mathrm{Sol}^*\) be the solution set of the mixed SVI-NCP (3.40)–(3.41). Suppose (i) there exists a compact set \(X'\) such that \(\mathrm{Sol}^*\cap (X'\times \mathcal{Y})\) is nonempty, (ii) Assumption 12 holds over \(\mathrm{Sol}^*\cap (X'\times \mathcal{Y})\) and (iii)

$$\begin{aligned}&{E}[\Vert A(x, {\hat{y}}(x, \xi ), \xi ) - B(x, {\hat{y}}(x, \xi ), \xi ) M(x, {\hat{y}}(x, \xi ), \xi )^{-1}L(x, {\hat{y}}(x, \xi ), \xi )\Vert ]\nonumber \\&<+\infty \end{aligned}$$
(3.46)

over \(\mathcal{X}\cap X'\). Then,

  1. (a)

    For any \((x, y(\cdot ))\in \mathrm{Sol}^*\), every matrix in \(\partial {\hat{\varPhi }}(x)\) is positive definite, and \({\hat{\varPhi }}\) and \(\phi \) are strongly monotone at x.

  2. (b)

    Any solution \(x^*\in {\mathcal {S}}^*\cap X'\) of SVI (3.45) is CD-regular and an isolated solution.

  3. (c)

    Moreover, if conditions (i) and (ii) are replaced by (iv) Assumption 12 holds over \(\mathbb {R}^n\times \mathcal{Y}\), then SVI (3.45) has a unique solution \(x^*\), and this solution is CD-regular.

Here the definition of CD-regular can be found in [36].

Then, the properties and the convergence analysis of the SAA problem can be investigated. Define

$$\begin{aligned} {\mathcal {G}}_N(x, y(\cdot )) := \begin{pmatrix} N^{-1}\sum _{j=1}^N \varPhi (x, y(\xi ^j), \xi ^j)\\ \varPsi (x, y(\xi ^1), \xi ^1)\\ \vdots \\ \varPsi (x, y(\xi ^N), \xi ^N) \end{pmatrix}. \end{aligned}$$

Theorem 11

Suppose Assumption 12 holds over \(C \times \mathcal{Y}\), and \(\varPhi (x, y, \xi )\) and \(\varPsi (x, y, \xi )\) are continuously differentiable w.r.t. \((x, y)\) for a.e. \(\xi \in \varXi \). Then,

  1. (a)

    \({\mathcal {G}}_N: C\times \mathcal{Y}\rightarrow C\times \mathcal{Y}\) is strongly monotone with modulus \(N^{-1}\sum _{j=1}^N\kappa (\xi ^j)\) and hemicontinuous.

  2. (b)

    The SAA two-stage SVI (3.42)–(3.43) has a unique solution.

Note that \( {\hat{\varPhi }}(x,\xi )=\varPhi (x, {\hat{y}}(x,\xi ),\xi )\), where \({\hat{y}}(x,\xi )\) is a solution of the second-stage problem (3.41). The first stage of the SAA problem, with the second-stage solution substituted in, can then be written as

$$\begin{aligned} 0\in N^{-1}\sum _{j=1}^N {\hat{\varPhi }}(x,\xi ^j) + {\mathcal {N}}_D(x). \end{aligned}$$
(3.47)

The almost sure convergence and the exponential rate of convergence established in [31] are as follows.

Theorem 12

Suppose conditions (i)–(iii) of Theorem 10 hold. Let \(x^*\) be a solution of SVI (3.45) and \(X'\) be a compact set such that \(x^*\in \mathrm{int}(X')\). Assume there exists \(\epsilon >0\) such that for N sufficiently large,

$$\begin{aligned} x^*\notin \mathrm{cl}( \mathrm{bd}(\mathcal{X}) \cap \mathrm{int}( {\bar{\mathcal{X}}}_N\cap X')). \end{aligned}$$
(3.48)

Then, there exists a solution \({\hat{x}}_N\) of the SAA problem (3.47) and a positive scalar \({\delta }\) such that \(\Vert {\hat{x}}_N - x^*\Vert \rightarrow 0\) as \(N\rightarrow \infty \) w.p.1 and for N sufficiently large w.p.1

$$\begin{aligned} \Vert {\hat{x}}_N - x^*\Vert \leqslant {\delta } \sup _{x\in \mathcal{X}\cap X'} \Vert {\hat{\phi }}_N(x) - \phi (x) \Vert . \end{aligned}$$
(3.49)

Theorem 13

Let \(X'\subset D\) be a convex compact subset such that \(\mathcal{B}_\delta (x^*)\subset X'\). Suppose the conditions in Theorem 12 and Assumption 10 hold. Then for any \(\epsilon >0\), there exist positive constants \({\delta }>0\) (independent of \(\epsilon \)), \(\varrho =\varrho (\epsilon )\) and \(\varsigma =\varsigma (\epsilon )\) (independent of N) such that

$$\begin{aligned} {{P}} \left\{ \sup _{x\in \mathcal{X}}\big \Vert {\hat{\phi }}_N(x)-\phi (x)\big \Vert \geqslant \epsilon \right\} \leqslant \varrho (\epsilon ) \mathrm{e}^{-N\varsigma (\epsilon )}, \end{aligned}$$
(3.50)

and

$$\begin{aligned} {{P}} \left\{ \Vert {\hat{x}}_N-x^*\Vert \geqslant \epsilon \right\} \leqslant \varrho (\epsilon /{\delta }) \mathrm{e}^{-N\varsigma (\epsilon /{\delta })}. \end{aligned}$$
(3.51)

4 Applications

Two-stage SVI has wide applications in economics, traffic networks, electricity markets, supply chains, finance and risk management under uncertainty. One type of two-stage SVI involves making a “here-and-now” decision at the present time to meet the uncertainty that is revealed at a later time. This is one motivation of both two-stage stochastic optimization and the two-stage stochastic Nash equilibrium problem (SNEP), such as Example 2. [37] discussed a scenario-based dynamic oligopolistic problem under uncertainty. In electricity markets, [15, 38,39,40] considered capacity expansion problems under an uncertain environment. [41] investigated the supply-side risk in an uncertain power market. [42, 43] discussed two-settlement markets consisting of a deterministic (first-stage) forward market and a stochastic (second-stage) spot market. [44] presented a stochastic complementarity model of equilibrium in an electric power market with uncertain power demand. [45] presented Nash equilibrium models of perfectly competitive capacity expansion involving risk-averse participants in the presence of discrete state uncertainties and pricing mechanisms of different kinds. [46] modeled a production and supply competition of a homogeneous product under uncertainty in an oligopolistic market by a two-stage SVI, and used the model to describe the market share observations in the world market of crude oil. Here we introduce two important applications: traffic equilibrium problems and noncooperative multi-agent games under an uncertain environment.

Traffic equilibrium problems Recently, Chen et al. [20] considered a traffic equilibrium problem under an uncertain environment (uncertain capacity and demand) by following the Ferris–Pang multicommodity formulation [47], which associates a (different) commodity with each destination node \(d\in \mathcal{D}\subset \mathcal{G}\). Let \(N({\mathcal {G}}, {\mathcal {A}})\) be an oriented network, and consider the random capacity vectors \(c_\xi ^j:=((c_\xi ^j)_1, \ldots , (c_\xi ^j)_{|\mathcal{A}|})^\mathrm{T}\), with \((c_\xi ^j)_a\) the maximum flow capacity for commodity j on arc a, and the demand vectors \(d_\xi ^j=((d_\xi ^j)_1, (d_\xi ^j)_2, \ldots , (d_\xi ^j)_{|\mathcal{G}|})\), with \((d_\xi ^j)_i\) the demand for commodity j at node i, \(a=1, \cdots , |\mathcal{A}|\), \(j=1, \cdots , |\mathcal{D}|\), \(i=1, \cdots , |\mathcal{G}|\). Let \(h^{od}_\xi \) be the demand for each origin(O)–destination(D) pair, and let \(R^{od}_\xi \) be the set of all (acyclic) routes r connecting o to d, with V the arc(a)/route(r) incidence matrix, i.e., \(V_{a,r}=1\) if arc \(a\in r\). A route flow \(f_\xi =\{f^r_\xi , r\in \cup _{od} R^{od}_\xi \}\) results in an arc flow \(u^a_\xi = \langle V_a, f_\xi \rangle \). Then, \(Vu^j_\xi = d_\xi ^j\), \(u^j_\xi \geqslant 0\), \(j\in \mathcal{D}\). Let

$$\begin{aligned} A=\begin{pmatrix} V & & \\ & \ddots & \\ & & V \end{pmatrix}\in \mathbb {R}^{|\mathcal{D}||{\mathcal {N}}|\times |\mathcal{D}||\mathcal{A}|}, \;\; u_\xi =\begin{pmatrix} u^1_\xi \\ \vdots \\ u^{|\mathcal{D}|}_\xi \end{pmatrix}\in \mathbb {R}^{|\mathcal{D}||\mathcal{A}|}, \;\; b_\xi =\begin{pmatrix} d_\xi ^1\\ \vdots \\ d_\xi ^{|\mathcal{D}|} \end{pmatrix}\in \mathbb {R}^{|\mathcal{D}||{\mathcal {N}}|}, \;\; c_\xi =\begin{pmatrix} c^1_\xi \\ \vdots \\ c^{|\mathcal{A}|}_\xi \end{pmatrix}\in \mathbb {R}^{|\mathcal{D}||\mathcal{A}|}. \end{aligned}$$

Then, the flow conservation constraints for each realization \(\xi \) can be written as

$$\begin{aligned} C(\xi ):=\{Au_\xi =b_\xi , \; 0\leqslant u_\xi , \; Pu_\xi \leqslant c_\xi \}, \end{aligned}$$

where \(P=(I, \cdots , I)\in \mathbb {R}^{|\mathcal{A}|\times |\mathcal{D}||\mathcal{A}|}\) and I is the \(|\mathcal{A}|\times |\mathcal{A}|\) identity matrix. We then consider the cost of the traffic equilibrium problem. The arc travel time function \(h(\xi , \cdot ):\mathbb {R}^{|\mathcal{D}||\mathcal{A}|}\rightarrow \mathbb {R}^{|\mathcal{A}|}\) is a stochastic vector, and each of its entries \(h_a(\xi , u_\xi )\) is assumed to follow a generalized Bureau of Public Roads (GBPR) function,

$$\begin{aligned} h_a(\xi , u_\xi ) = \eta _a + \tau _a\left( \frac{(Pu_\xi )_a}{(\gamma _\xi )_a}\right) ^{n_a}, \;\; a=1, \ldots , |\mathcal{A}|, \end{aligned}$$

where \(\eta _a, \tau _a, (\gamma _\xi )_a\) are given positive parameters. Let \(F(u_\xi , \xi )= P^\mathrm{T}h(\xi , u_\xi )\). Then, for \(n_a=1\),

$$\begin{aligned} \nabla _u F(u_\xi , \xi )= P^\mathrm{T}\mathrm{diag}\left( \frac{\tau _a}{(\gamma _\xi )_a}\right) P \end{aligned}$$

is symmetric positive semi-definite for any \(u_\xi \in \mathbb {R}^{|\mathcal{D}||\mathcal{A}|}_+ \supseteq C(\xi )\). Then, the stochastic VI formulation for Wardrop’s user equilibrium seeks an equilibrium arc flow \(u_\xi \in C(\xi )\) for a known event \(\xi \in \varXi \), such that

$$\begin{aligned} -F(u_\xi , \xi ) \in _{a.s.} {\mathcal {N}}_{C(\xi )}(u_\xi ). \end{aligned}$$
(3.1)
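The claimed symmetry and positive semi-definiteness of \(\nabla _u F(u_\xi , \xi )\) can be checked numerically; a minimal sketch with toy sizes and parameters (all values hypothetical):

```python
import numpy as np

nA, nD = 2, 3                       # |A| arcs, |D| commodities (toy sizes)
P = np.hstack([np.eye(nA)] * nD)    # P = (I, ..., I), shape |A| x |D||A|

tau   = np.array([1.5, 0.7])        # tau_a
gamma = np.array([2.0, 3.0])        # (gamma_xi)_a, one realization

# nabla_u F = P^T diag(tau_a / (gamma_xi)_a) P  for n_a = 1
G = P.T @ np.diag(tau / gamma) @ P
```

Because \(G = P^\mathrm{T}DP\) with a nonnegative diagonal D, it is symmetric positive semi-definite by construction; the numerical check simply confirms this.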

Moreover, let

$$\begin{aligned} x=\begin{pmatrix} x^1\\ \vdots \\ x^{|\mathcal{D}|} \end{pmatrix}\in \mathbb {R}^{|\mathcal{D}||\mathcal{A}|}, \;\; {\bar{b}}=\begin{pmatrix} {{\bar{d}}}^1\\ \vdots \\ {{\bar{d}}}^{|\mathcal{D}|} \end{pmatrix}\in \mathbb {R}^{|\mathcal{D}||{\mathcal {N}}|}, \;\; {{\bar{c}}}=\begin{pmatrix} {{\bar{c}}}^1\\ \vdots \\ {{\bar{c}}}^{|\mathcal{A}|} \end{pmatrix}\in \mathbb {R}^{|\mathcal{D}||\mathcal{A}|}, \end{aligned}$$

and

$$\begin{aligned} D:=\{Ax={{\bar{b}}}, \; 0\leqslant x, \; Px\leqslant {{\bar{c}}}\}, \;\; G(x) = P^\mathrm{T}{\bar{h}}(x), \end{aligned}$$

where \({{\bar{d}}}^i = {E}[d_\xi ^i]\), \(i=1, \ldots , |\mathcal{D}|\), \({{\bar{c}}}^a = {E}[c^a_\xi ]\), \(a=1, \ldots , |\mathcal{A}|\) and \({{\bar{h}}}\) is defined by

$$\begin{aligned} {\bar{h}}_a(x) = \eta _a + \tau _a (Px)_a^{n_a}{E}[(\gamma _\xi )_a^{-n_a}], \;\; a=1, \ldots , |\mathcal{A}|. \end{aligned}$$

Then, the deterministic VI formulation for Wardrop’s user equilibrium seeks a forecast arc flow \(x\in D\) satisfying

$$\begin{aligned} -G(x)\in {\mathcal {N}}_D(x). \end{aligned}$$
(3.2)
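The expectation \({E}[(\gamma _\xi )_a^{-n_a}]\) in \({\bar{h}}_a\) can be estimated by a sample average; a small sketch for one arc with hypothetical GBPR parameters. By Jensen's inequality, \({E}[\gamma ^{-1}]\geqslant ({E}[\gamma ])^{-1}\), so the expected travel time dominates the travel time evaluated at the mean capacity.

```python
import numpy as np

rng = np.random.default_rng(1)
eta, tau, n_a = 1.0, 2.0, 1.0            # GBPR parameters for one arc (toy)
gamma = rng.uniform(1.0, 3.0, 100_000)   # samples of (gamma_xi)_a

def h_bar(v):
    # bar h_a evaluated at v = (P x)_a, with E[gamma^{-n_a}] replaced
    # by its sample mean
    return eta + tau * v**n_a * np.mean(gamma**(-n_a))
```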

Similarly to (2.22)–(2.23), we can consider (3.1)–(3.2) as a two-stage SVI and reformulate it as a stochastic MPEC:

$$\begin{aligned} \begin{array}{ll} \min &{} {E}[\Vert u_\xi - x \Vert ^2]\\ \hbox {s.t.} &{} 0\in G(x)+ {\mathcal {N}}_D(x), \;\; 0\in _{a.e.} F(u_\xi , \xi )+ {\mathcal {N}}_{C(\xi )}(u_\xi ), \; \xi \in \varXi , \end{array} \end{aligned}$$
(3.3)

and then, by ERM approach, reformulate it as

$$\begin{aligned} \begin{array}{ll} \min &{} \frac{1}{\lambda \rho }\theta (x) + \frac{1}{\rho }{E}[r( u_\xi , \xi )] + {E}[\Vert u_\xi - x \Vert ^2]\\ \hbox {s.t.} &{} x\in D, \; u_\xi \in _{a.s.} C(\xi ), \; \xi \in \varXi , \end{array} \end{aligned}$$
(3.4)

where \(\theta (\cdot )\) and \(r( \cdot , \xi )\) are residual functions of \(0\in G(x)+ {\mathcal {N}}_D(x)\) and \(0\in _{a.e.} F(u_\xi , \xi )+ {\mathcal {N}}_{C(\xi )}(u_\xi )\), respectively.

Noncooperative multi-agent games In [48], Pang et al. formally introduced and studied a noncooperative multi-agent game under uncertainty, focusing mainly on a two-stage setting of the game in which each agent is risk-averse, as follows.

Consider a noncooperative game with n risk-averse players, each of whom, labeled \(i = 1,\cdots ,n,\) has a private strategy set \(X_i\subset \mathbb {R}^{s_i}\) that is closed and convex, a (deterministic) first-stage objective function \(\theta _i\) that depends on all players’ strategies \(x:=\{x_i\}_{i=1}^n\) and a second-stage risk-averse recourse function \(\phi _i(x) = {\mathcal {R}}_i(\psi _i(x_i, x_{-i}, \xi ))\), where \({\mathcal {R}}_i\) is a risk measure and \(\xi \) is a random vector defined as above. Then, the two-stage noncooperative game with risk-averse players is

$$\begin{aligned} \min _{x_i\in X_i}&\theta _i(x_i, x_{-i}) + {\mathcal {R}}_i(\psi _i(x_i, x_{-i}, \xi )) \end{aligned}$$
(3.5)

and

$$\begin{aligned} \psi _i(x, \xi ):=\min _{y_i\in Y_i} \;\; f_i(y_i, x, \xi ), \end{aligned}$$
(3.6)

for \(i=1, \cdots , n\).

When the risk measure \({\mathcal {R}}_i={E}\), the players are risk-neutral. Moreover, the risk measures \({\mathcal {R}}_i\), \(i=1, \cdots , n\), can be chosen from several deviation measures, including the standard deviation, lower and upper semideviations, mean absolute (semi)deviations, the median and deviations derived from CVaR. They then investigated several properties of mean-deviation composite games with quadratic recourse, such as the continuity, regularization and differentiability of the second-stage optimal value function, and the reformulation of the mean-deviation composite game.
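As one concrete instance of such a deviation-based choice, the following sketch estimates a CVaR-based composite \({E}[Z]+\lambda (\mathrm{CVaR}_\beta (Z)-{E}[Z])\) from samples (an illustrative form with toy data, not necessarily the exact measures used in [48]):

```python
import numpy as np

def cvar(z, beta=0.9):
    # sample CVaR_beta: average of the worst (1 - beta) fraction of losses
    q = np.quantile(z, beta)
    return z[z >= q].mean()

rng = np.random.default_rng(2)
z = rng.normal(0.0, 1.0, 100_000)           # loss samples (toy data)
lam = 0.5
r = z.mean() + lam * (cvar(z) - z.mean())   # mean-deviation composite
```

Since the tail average exceeds the mean, the composite r penalizes upper-tail losses relative to the risk-neutral choice \({\mathcal {R}}={E}\).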

Pang et al. [48] bypassed the SVI framework and dealt with the risk-averse SNEP relying largely on smoothing, regularization, a sampled solution approach and a best-response scheme. Now, however, by [23, 24] and the sample average approximation method [31], the two-stage SVI can be solved by the progressive hedging method.

5 Final Remarks

Although two-stage stochastic optimization has been investigated deeply and the two-stage SNEP has been applied widely, the research on two-stage SVI and multistage SVI has just begun. Several important questions remain to be investigated, e.g.,

  • Can we solve non-monotone two-stage SVI using PHM?

  • How could we achieve a better convergence rate by using sampling techniques?

  • How to extend the discrete approximation methods to multistage SVI when the random vector follows a continuous distribution?

  • How to extend the SVI to dynamic two-stage SVI?

  • How could we solve the two-stage or even multistage SVI more effectively?

  • In the case when we only have limited information about the distribution of the random vectors, could we model two-stage variational inequalities in the sense of distributional robustness, or even robustness?

In summary, there are many challenges and opportunities for developing both theoretical analysis and numerical algorithms when we handle the two-stage (multistage) VI in an uncertain environment.