Solving variational inequalities with monotone operators on domains given by Linear Minimization Oracles

Juditsky, Anatoli; Nemirovski, Arkadi

doi:10.1007/s10107-015-0876-3

Solving variational inequalities with monotone operators on domains given by Linear Minimization Oracles

Full Length Paper
Series A
Published: 22 March 2015

Volume 156, pages 221–256, (2016)
Cite this article

Mathematical Programming Submit manuscript

Anatoli Juditsky¹ &
Arkadi Nemirovski²

707 Accesses
9 Citations
Explore all metrics

Abstract

The standard algorithms for solving large-scale convex–concave saddle point problems, or, more generally, variational inequalities with monotone operators, are proximal type algorithms which at every iteration need to compute a prox-mapping, that is, to minimize over problem’s domain X the sum of a linear form and the specific convex distance-generating function underlying the algorithms in question. (Relative) computational simplicity of prox-mappings, which is the standard requirement when implementing proximal algorithms, clearly implies the possibility to equip X with a relatively computationally cheap Linear Minimization Oracle (LMO) able to minimize over X linear forms. There are, however, important situations where a cheap LMO indeed is available, but where no proximal setup with easy-to-compute prox-mappings is known. This fact motivates our goal in this paper, which is to develop techniques for solving variational inequalities with monotone operators on domains given by LMO. The techniques we discuss can be viewed as a substantial extension of the proposed in Cox et al. (Math Program Ser B 148(1–2):143–180, 2014) method of nonsmooth convex minimization over an LMO-represented domain.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

On the Adaptive Proximal Method for a Class of Variational Inequalities and Related Problems

Article 25 August 2020

Generalized Mirror Prox Algorithm for Monotone Variational Inequalities: Universality and Inexact Oracle

Article 05 July 2022

Computing Proximal Points of Convex Functions with Inexact Subgradients

Article 22 November 2016

Notes

In literature on v.i.’s, a variational inequality where a weak solution is sought is often called Minty’s v.i., and one where a strong solution is sought—Stampacchia’s v.i. Equivalence of the notions of weak and strong solutions in the case of continuous monotone operator is the finite dimensional version of the classical Minty’s Lemma (1967), see [6].
From now on, for a linear mapping $x\mapsto Bx: E\rightarrow F$, where $E,F$ are Euclidean spaces, $B^*$ denotes the conjugate of $B$, that is, a linear mapping $y\mapsto B^*y:F\rightarrow E$ uniquely defined by the identity $\langle Bx,y\rangle = \langle x,B^*y\rangle $ for all $x\in E,$ $y\in F$.
MD algorithm originates from [23, 24]; its modern proximal form was developed in [1]. MP was proposed in [25]. For the most present exposition of the algorithms, see [18, Chapters 5,6] and [27].
We could also take as $Y$ a convex compact subset of $F$ containing $Bx$ and define $y(x)$ as $Bx$ for $x\in X$ and as (any) strong solution to the ${\mathrm{VI}}(G(\cdot )-A^*x,Y)$ when $x\not \in X$.
As we have already mentioned, with our proximal setup, the $\omega $-size of $Y$ is $\le \sqrt{2}$, and (20) is satisfied with $M=2\sqrt{2}$.
For the primal v.i., (26) holds true for some $L>0$ and $M=0$. Moreover, with properly selected proximal setup for (45) the complexity bound (28) becomes ${\mathrm{Res}}(\mathcal{C}^N|Y)\le O(1)\sqrt{\ln (n)\ln (m)}/N$.

References

Beck, A., Teboulle, M.: Mirror descent and nonlinear projected subgradient methods for convex optimization. Oper. Res. Lett. 31(3), 167–175 (2003)
Article MathSciNet MATH Google Scholar
Candes, E.J., Plan, Y.: Matrix completion with noise. Proc. IEEE 98(6), 925–936 (2010)
Article Google Scholar
Candes, E.J., Plan, Y.: Tight oracle inequalities for low-rank matrix recovery from a minimal number of noisy random measurements. Inf. Theory IEEE Trans. 57(4), 2342–2359 (2011)
Article MathSciNet Google Scholar
Castellani, M., Mastroeni, G.: On the duality theory for finite dimensional variational inequalities. In: Giannessi, F., Maugeri, A. (eds.) Variational Inequalities Network Equilibrium Problems, pp. 21–31. Plenum Publishing, New York (1995)
Google Scholar
Chen, G., Teboulle, M.: Convergence analysis of a proximal-like minimization algorithm using Bregman functions. SIAM. J. Optim. 3(3), 538–543 (1993)
Article MathSciNet MATH Google Scholar
Chipot, M.: Variational Inequalities and Flow in Porous Media. Springer, New York (1984)
Book MATH Google Scholar
Cox, B., Juditsky, A., Nemirovski, A.: Dual subgradient algorithms for large-scale nonsmooth learning problems. Math. Program. Ser. B 148(1–2), 143–180 (2014)
Article MathSciNet MATH Google Scholar
Daniele, P.: Dual variational inequality and appluications to asymmetric traffic equalibrium probnlems with capacity constraints. Le Mathematiche XLIX(2), 111–211 (1994)
MathSciNet Google Scholar
Demyanov, V., Rubinov, A.: Approximate Methods in Optimization Problems. Elsevier, Amsterdam (1970)
Google Scholar
Dunn, J.C., Harshbarger, S.: Conditional gradient algorithms with open loop step size rules. J. Math. Anal. Appl. 62(2), 432–444 (1978)
Article MathSciNet MATH Google Scholar
Dudik, M., Harchaoui, Z., Malick, J.: Lifted coordinate descent for learning with trace-norm regularization. In: AISTATS-Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics, vol. 22, pp. 327–336 (2012)
Frank, M., Wolfe, P.: An algorithm for quadratic programming. Naval Res. Logist. Q. 3(1–2), 95–110 (1956)
Article MathSciNet Google Scholar
Freund, R., Grigasy. P.: New analysis and results for the Conditional Gradient method (2013) submitted to Mathematical Programming, E-print: http://web.mit.edu/rfreund/www/FW-paper-final
Harchaoui, Z., Douze, M., Paulin, M., Dudik, M., Malick, J.: Large-scale image classification with trace-norm regularization. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3386–3393 (2012)
Harchaoui, Z., Juditsky, A., Nemirovski, A.: Conditional Gradient algorithms for norm-regularized smooth convex optimization. Mathematical Programming. Online First (2014). doi:10.1007/s10107-014-0778-9. E-print: arXiv:1302.2325
Jaggi, M.: Revisiting Frank–Wolfe: projection-free sparse convex optimization. In: Proceedings of the 30th International Conference on Machine Learning (ICML 2013), pp. 427–435 (2013)
Jaggi, M., Sulovsky, M.: A simple algorithm for nuclear norm regularized problems. In: Proceedings of the 27th International Conference on Machine Learning (ICML 2010), pp. 471–478 (2010)
Juditsky, A., Nemirovski, A.: First order methods for nonsmooth large-scale convex minimization, I: general purpose methods; II: utilizing problem’s structure. In: Sra, S., Nowozin, S., Wright, S. (eds.) Optimization for Machine Learning, pp. 121–184. The MIT Press, Cambridge (2012)
Google Scholar
Juditsky, A., Kilinç Karzan, F., Nemirovski, A.: Randomized first order algorithms with applications to $\ell _1$-minimization. Math. Program. Ser. A 142(1–2), 269–310 (2013)
Article MATH Google Scholar
Juditsky, A., Kilinç Karzan, F., Nemirovski, A.: On unified view of nullspace-type conditions for recoveries associated with general sparsity structures. Linear Algebra Appl. 441(1), 124–151 (2014)
Article MathSciNet MATH Google Scholar
Lemarechal, C., Nemirovski, A., Nesterov, Yu.: New variants of bundle methods. Math. Program. 69(1), 111–148 (1995)
Article MATH Google Scholar
Mosco, U.: Dual variational inequalities. J. Math. Anal. Appl. 40, 202–206 (1972)
Article MathSciNet MATH Google Scholar
Nemirovski, A., Yudin, D.: Problem Complexity and Method Efficiency in Optimization. Nauka Publishers, Moscow (1978) (in Russian); John Wiley, New York (1983) (in English)
Nemirovskii, A.: Efficient iterative algorithms for variational inequalities with monotone operators. Ekonomika i Matematicheskie Metody 17(2), 344–359 (1981). (in Russian; Engllish translation: Matekon)
MathSciNet Google Scholar
Nemirovski, A.: Prox-method with rate of convergence $O(1/t)$ for variational inequalities with Lipschitz continuous monotone operators and smooth convex-concave saddle point problems. SIAM J. Optim. 15, 229–251 (2004)
Article MathSciNet MATH Google Scholar
Nemirovski, A., Onn, S., Rothblum, U.: Accuracy certificates for computational problems with convex structure. Math. Oper. Res. 35, 52–78 (2010)
Article MathSciNet MATH Google Scholar
Nesterov, Yu., Nemirovski, A.: On first order algorithms for $\ell _1$/nuclear norm minimization. Acta Numer. 22, 509–575 (2013)
Article MathSciNet MATH Google Scholar
Pshenichnyi, B.N., Danilin, Y.M.: Numerical Methods in Extremal Problems. Mir Publishers, Moscow (1978)
Google Scholar
Rockafellar, R.T.: Minimax theorems and conjugate saddle functions. Math. Scand. 14, 151–173 (1964)
MathSciNet MATH Google Scholar
Rockafellar, R.T.: Convex Analysis. Princeton University Press, Princeton (1970)
Book MATH Google Scholar
Shalev-Shwartz, S., Gonen, A., Shamir, O.: Large-scale convex minimization with a low-rank constraint (2011). E-print: arXiv:1106.1622

Download references

Author information

Authors and Affiliations

LJK, Université Grenoble Alpes, B.P. 53, 38041, Grenoble Cedex 9, France
Anatoli Juditsky
Georgia Institute of Technology, Atlanta, GA, 30332, USA
Arkadi Nemirovski

Authors

Anatoli Juditsky
View author publications
You can also search for this author in PubMed Google Scholar
Arkadi Nemirovski
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Anatoli Juditsky.

Additional information

Research of the first author was supported by the CNRS-Mastodons project GARGANTUA, and the LabEx PERSYVAL-Lab (ANR-11-LABX-0025). Research of the second author was supported by the NSF grants CMMI 1232623 and CCF 1415498.

Proofs

1.1 Proof of Theorems 1 and 2

We start with proving Theorem 2. In the notation of the theorem, we have

$$\begin{aligned} \begin{array}{lcl} &{}&{}\forall x\in X: \Phi (x)=Ay(x)+a,\\ (a):&{}&{}y(x)\in Y,\\ (b): &{}&{}\langle y(x)-y,A^*x-G(y(x))\rangle \ge 0\,\forall y\in Y. \end{array} \end{aligned}$$

(52)

For $\bar{x}\in X$, let $\bar{y}=y(\bar{x})$, and let $\widehat{y}=\sum \nolimits _t\lambda _ty_t$, so that $\bar{y},\widehat{y}\in Y$ by (52.a). Since $G$ is monotone, for all $t\in \{1,\ldots ,N\}$ we have

$$\begin{aligned} \begin{array}{lll} &{}&{}\langle \bar{y}-y_t,G(\bar{y})-G(y_t)\rangle \ge 0\\ \Rightarrow &{}&{}\langle \bar{y},G(\bar{y})\rangle \ge \langle y_t,G(\bar{y})\rangle +\langle \bar{y},G(y_t)\rangle -\langle y_t,G(y_t)\rangle \,\,\forall t\\ \Rightarrow &{} &{}\langle \bar{y},G(\bar{y})\rangle \ge \sum \limits _t\lambda _t\left[ \langle y_t,G(\bar{y})\rangle +\langle \bar{y},G(y_t)\rangle -\langle y_t,G(y_t)\rangle \right] \\ &{}&{}[\hbox {since}\,\, \lambda _t\ge 0 \,\,\hbox {and} \,\, \sum \limits _t\lambda _t=1], \end{array} \end{aligned}$$

and we conclude that

$$\begin{aligned} \langle \bar{y},G(\bar{y})\rangle -\langle \widehat{y},G(\bar{y})\rangle \ge \sum _{t=1}^N\lambda _t\left[ \langle \bar{y},G(y_t)\rangle -\langle y_t,G(y_t)\rangle \right] . \end{aligned}$$

(53)

We now have

$$\begin{aligned}&\langle \Phi (\bar{x}),\bar{x}-\sum \limits _t\lambda _tx_t\rangle \\&\quad =\langle A\bar{y}+a,\bar{x}-\sum \limits _t\lambda _tx_t\rangle =\langle \bar{y},A^*\bar{x}-\sum \limits _t\lambda _tA^*x_t\rangle +\langle a,\bar{x}-\sum \limits _t\lambda _tx_t\rangle \\&\quad =\langle \bar{y},A^*\bar{x}-G(\bar{y})\rangle +\langle \bar{y},G(\bar{y})-\sum \limits _t\lambda _tA^*x_t\rangle +\langle a,\bar{x}-\sum \limits _t\lambda _tx_t\rangle \\&\quad \ge \langle \widehat{y},A^*\bar{x}-G(\bar{y})\rangle +\langle \bar{y},G(\bar{y})-\sum \limits _t\lambda _tA^*x_t\rangle +\langle a,\bar{x}-\sum \limits _t\lambda _tx_t\rangle \\&\qquad [\hbox {by (52.b) with}\,\, y=\widehat{y}\,\, \hbox {and due to} \,\,\bar{y}=y(\bar{x})]\\&\quad =\langle \widehat{y},A^*\bar{x}\rangle +\left[ \langle G(\bar{y}),\bar{y}\rangle -\langle G(\bar{y}),\widehat{y}\rangle \right] -\langle \bar{y},\sum \limits _t\lambda _tA^*x_t\rangle +\langle a,\bar{x}-\sum \limits _t\lambda _tx_t\rangle \\&\quad \ge \langle \widehat{y},A^*\bar{x}\rangle +\sum \limits _t\lambda _t\left[ \langle \bar{y},G(y_t)\rangle -\langle y_t,G(y_t)\rangle \right] -\langle \bar{y},\sum \limits _t\lambda _tA^*x_t\rangle \\&\qquad +\langle a,\bar{x}-\sum \limits _t\lambda _tx_t\rangle \,\,[\hbox {by (53)}]\\&\quad =\sum \limits _t\lambda _t\langle y_t,A^*\bar{x}\rangle +\sum \limits _t\lambda _t\left[ \langle \bar{y},G(y_t)\rangle -\langle y_t,G(y_t)\rangle -\langle \bar{y},A^*x_t\rangle +\langle a,\bar{x}-x_t\rangle \right] \\&\qquad [\hbox {since}\,\, \widehat{y}=\sum \limits _t\lambda _ty_t\,\, \hbox {and}\,\, \sum \limits _t\lambda _t=1]\\&\quad =\sum \limits _t\lambda _t\left[ \langle Ay_t,\bar{x}-x_t\rangle +\langle Ay_t,x_t\rangle +\langle \bar{y},G(y_t)\rangle -\langle y_t,G(y_t)\rangle \right. \\&\qquad \left. -\langle \bar{y},A^*x_t\rangle +\langle a,\bar{x}-x_t\rangle \right] \\&\quad =\sum \limits _t\lambda _t\left[ \langle y_t,A^*x_t\rangle +\langle \bar{y},G(y_t)\rangle -\langle y_t,G(y_t)\rangle -\langle \bar{y},A^*x_t\rangle \right] \\&\qquad + \sum \limits _t\lambda _t\langle Ay_t+a,\bar{x}-x_t\rangle \\&\quad ={\sum }_t\lambda _t\langle A^*x_t-G(y_t),y_t-\bar{y}\rangle +\sum \limits _t\lambda _t\langle Ay_t+a,\bar{x}-x_t\rangle \ge -\epsilon \\&\qquad +\sum \limits _t\lambda _t\langle Ay_t+a,\bar{x}-x_t\rangle \\&\qquad [\hbox {by} (15) \,\,\hbox {due to} \bar{y}=y(\bar{x})\in Y(X)]. \end{aligned}$$

The bottom line is that

$$\begin{aligned} \langle \Phi (\bar{x}),\widehat{x}-\bar{x}\rangle \le \epsilon +\sum _{t=1}^N\lambda _t\langle Ay_t+a,x_t-\bar{x}\rangle \,\forall \bar{x}\in X, \end{aligned}$$

as stated in (16). Theorem 2 is proved.

To prove Theorem 1, let $y_t\in Y$, $1\le t\le N$, and $\lambda _1,\ldots ,\lambda _N$ be from the premise of the theorem, and let $x_t$, $1\le t\le N$, be specified as $x_t=x(y_t)$, so that $x_t$ is the minimizer of the linear form $\langle Ay_t+a,x\rangle $ over $x\in X$. Due to the latter choice, we have $\sum _{t=1}^N\lambda _t\langle Ay_t+a,x_t-\bar{x}\rangle \le 0$ for all $\bar{x}\in X$, while $\epsilon $ as defined by (15) is nothing but ${\mathrm{Res}}(\{y_t,\lambda _t,-\Psi (x_t)\}_{t=1}^N|Y(X))$. Thus, (16) in the case in question implies that

$$\begin{aligned} \forall \bar{x}\in X: \langle \Phi (\bar{x}),\sum \limits _{t=1}^N\lambda _t x_t - \bar{x}\rangle \le {\mathrm{Res}}(\{y_t,\lambda _t,-\Psi (x_t)\}_{t=1}^N|Y(X)), \end{aligned}$$

and (13) follows. Relation (14) is an immediate corollary of (13) and Lemma 2 as applied to $X$ in the role of $Y$, $\Phi $ in the role of $H(\cdot )$, and $\{x_t,\lambda _t,\Phi (x_t)\}_{t=1}^N$ in the role of $\mathcal{C}^N$. $\square $

1.2 Proof of Proposition 1

Observe that the optimality conditions in the optimization problem specifying $v={{\mathrm{Prox}}}_y(\xi )$ imply that

$$\begin{aligned} \langle \xi -\omega '(y)+\omega '(v),z-v\rangle \ge 0,\,\,\forall z\in Y, \end{aligned}$$

or

$$\begin{aligned} \langle \xi ,v-z\rangle \le \langle \omega '(v)-\omega '(y),z-v\rangle =\langle V'_y(v),z-v\rangle ,\,\,\forall z\in Y, \end{aligned}$$

which, using a remarkable identity [5]

$$\begin{aligned} \langle V'_y(v),z-v\rangle =V_y(z)-V_v(z)-V_y(v), \end{aligned}$$

can be rewritten equivalently as

$$\begin{aligned} v={{\mathrm{Prox}}}_y(\xi )\Rightarrow \langle \xi ,v-z\rangle \le V_y(z)-V_v(z)-V_y(v)\,\,\forall z\in Y. \end{aligned}$$

(54)

Setting $y=y_t$, $\xi =\gamma _t H_t(y_t)$, which results in $v=y_{t+1}$, we get

$$\begin{aligned} \forall z\in Y: \gamma _t\langle H_t(y_t),y_{t+1}-z\rangle \le V_{y_t}(z)-V_{y_{t+1}}(z)-V_{y_t}(y_{t+1}), \end{aligned}$$

whence,

$$\begin{aligned} \forall z\in Y: \gamma _t\langle H_t(y_t),y_{t}-z\rangle\le & {} V_{y_t}(z)-V_{y_{t+1}}(z)+\underbrace{\left[ \gamma _t\langle H_t(y_t),y_t-y_{t+1}\rangle -V_{y_t}(y_{t+1})\right] }_{\le \gamma _t\Vert H_t(y_t)\Vert _*\Vert y_t-y_{t+1}\Vert -\frac{1}{2}\Vert y_t-y_{t+1}\Vert ^2} \\\le & {} V_{y_t}(z)-V_{y_{t+1}}(z)+\frac{1}{2}\gamma _t^2\Vert H_t(y_t)\Vert _*^2. \end{aligned}$$

Summing up these inequalities over $t=1,\ldots ,N$ and taking into account that for $z\in Y'$, we have $V_{y_1}(z)\le \frac{1}{2}\Omega ^2[Y']$ and that $V_{y_{N+1}}(z)\ge 0$, we get (19). $\square $

1.3 Proof of Proposition 2

Applying (54) to $y=y_t$, $\xi =\gamma _t H_t(z_t)$, which results in $v=y_{t+1}$, we get

$$\begin{aligned} \forall z\in Y: \gamma _t\langle H_t(z_t),y_{t+1}-z\rangle \le V_{y_t}(z)-V_{y_{t+1}}(z)-V_{y_t}(y_{t+1}), \end{aligned}$$

whence, by the definition (23) of $d_t$,

$$\begin{aligned} \begin{array}{l} \forall z\in Y: \gamma _t\langle H_t(z_t),z_t-z\rangle \le V_{y_t}(z)-V_{y_{t+1}}(z)+d_t.\\ \end{array} \end{aligned}$$

(55)

Summing up the resulting inequalities over $t=1,\ldots ,N$ and taking into account that $V_{y_1}(z)\le \frac{1}{2}\Omega ^2[Y']$ for all $z\in Y'$ and $V_{y_{N+1}}(z)\ge 0$, we get

$$\begin{aligned} \forall z\in Y': \sum _{t=1}^n\lambda ^N_t\langle H_t(z_t),z_t-z\rangle \le {\frac{1}{2}\Omega ^2[Y']+\sum _{t=1}^N d_t\over \sum _{t=1}^N\gamma _t}. \end{aligned}$$

The right hand side in the latter inequality is independent of $z\in Y'$. Taking supremum of the left hand side over $z\in Y'$, we arrive at (24).

Moreover, invoking (54) with $y=y_t$, $\xi =\gamma _t H_t(y_t)$ and specifying $z$ as $y_{t+1}$, we get

$$\begin{aligned} \gamma _t\langle H_t(y_t),z_t-y_{t+1}\rangle \le V_{y_t}(y_{t+1})-V_{z_t}(y_{t+1})-V_{y_t}(z_t), \end{aligned}$$

whence

$$\begin{aligned} \begin{aligned} d_t&=\gamma _t\langle H_t(z_t),z_t-y_{t+1}\rangle - V_{y_t}(y_{t+1})\le \gamma _t\langle H_t(y_t),z_t-y_{t+1}\rangle \\&\quad +\gamma _t\langle H_t(z_t)-H_t(y_t),z_t-y_{t+1}\rangle - V_{y_t}(y_{t+1})\\&\le -V_{z_t}(y_{t+1})-V_{y_t}(z_t)+\gamma _t\langle H_t(z_t)-H_t(y_t),z_t-y_{t+1}\rangle \\&\le \gamma _t\Vert H_t(z_t)-H_t(y_t)\Vert _*\Vert z_t-y_{t+1}\Vert -{\frac{1}{2}}\Vert z_t-y_{t+1}\Vert ^2-{\frac{1}{2}}\Vert y_t-z_t\Vert ^2\\&\le {\frac{1}{2}}\left[ \gamma _t^2\Vert H_t(z_t)-H_t(y_t)\Vert _*^2-\Vert y_t-z_t\Vert ^2\right] ,\\ \end{aligned} \end{aligned}$$

(56)

as required in (25). $\square $

1.4 Proof of Lemma 3

1 $^0$. We start with the following standard fact:

Lemma 4

Let $Y$ be a nonempty closed convex set in Euclidean space $F$, $\Vert \cdot \Vert $ be a norm on $F$, and $\omega (\cdot )$ be a continuously differentiable function on $Y$ which is strongly convex, modulus 1, w.r.t. $\Vert \cdot \Vert $. Given $b\in F$ and $y\in Y$, let us set

$$\begin{aligned} \begin{array}{rcl} g_y(\xi )&{}=&{}\max \limits _{z\in Y}\left[ \langle z,\omega '(y)-\xi \rangle -\omega (z)\right] :F\rightarrow {\mathbf {R}}, \\ {{\mathrm{Prox}}}_y(\xi )&{}=&{}\mathop {\hbox {argmax }}\limits _{z\in Y}\left[ \langle z,\omega '(y)-\xi \rangle -\omega (z)\right] . \\ \end{array} \end{aligned}$$

The function $g_y$ is convex with Lipschitz continuous gradient $\nabla g_y(\xi )=-{{\mathrm{Prox}}}_y(\xi )$:

$$\begin{aligned} \Vert \nabla g_y(\xi )-\nabla g_y(\xi ')\Vert \le \Vert \xi -\xi '\Vert _*\,\,\forall \xi ,\xi ', \end{aligned}$$

(57)

where $\Vert \cdot \Vert _*$ is the norm conjugate to $\Vert \cdot \Vert $.

Indeed, since $\omega $ is strongly convex and continuously differentiable on $Y$, ${{\mathrm{Prox}}}_y(\cdot )$ is well defined, and from optimality conditions it holds

$$\begin{aligned} \langle \omega '({{\mathrm{Prox}}}_y(\xi ))+\xi -\omega '(y),{{\mathrm{Prox}}}_y(\xi )-z\rangle \le 0\,\,\forall z\in Y. \end{aligned}$$

(58)

Consequently, $g_y(\cdot )$ is well defined; this function clearly is convex, and the vector $-{{\mathrm{Prox}}}_y(\xi )$ clearly is a subgradient of $g_y$ at $\xi $. If now $\xi ',\xi ''\in F$, then, setting $z'={{\mathrm{Prox}}}_y(\xi ')$, $z''={{\mathrm{Prox}}}_y(\xi '')$ and invoking (58), we get

$$\begin{aligned} \langle \omega '(z')+\xi '-\omega '(y),z'-z''\rangle \le 0,\,\,\langle -\omega '(z'')-\xi ''+\omega '(y),z'-z''\rangle \le 0 \end{aligned}$$

whence, summing the inequalities up,

$$\begin{aligned} \langle \xi '-\xi '',z'-z''\rangle \le \langle \omega '(z')-\omega '(z''),z''-z'\rangle \le -\Vert z'-z''\Vert ^2, \end{aligned}$$

implying that $\Vert z'-z''\Vert \le \Vert \xi '-\xi ''\Vert _*$. Thus, a subgradient field $-{{\mathrm{Prox}}}_y(\cdot )$ of $g_y(\cdot )$ is Lipschitz continuous with constant 1 from $\Vert \cdot \Vert _*$ into $\Vert \cdot \Vert $, whence $g_y$ is continuously differentiable and (57) takes place. $\square $

2 $^0$. To derive Lemma 3 from Lemma 4, note that $f_y(x)$ is obtained from $g_y(\cdot )$ by affine substitution of variables and adding linear form:

$$\begin{aligned} f_y(x)=g_y(\gamma [{G(y)}- A^*x])+\gamma \langle a,x\rangle {+\omega (y)-\langle \omega '(y),y\rangle .} \end{aligned}$$

whence $\nabla f_y(x)=-\gamma A\nabla g_y(\gamma [{G(y)}-A^*x])+\gamma a=\gamma A {{\mathrm{Prox}}}_y(\gamma [{G(y)}-A^*x])+\gamma a$, as required in (39), and

$$\begin{aligned}&\Vert \nabla f_y(x')-\nabla f_y(x'')\Vert _{E,*}\\&\quad =\gamma \Vert A\left[ \nabla g_y(\gamma [{G(y)}-A^*x'])-\nabla g_y(\gamma [{G(y)}-A^*x''])\right] \Vert _{E,*}\\&\quad \le (\gamma L_A)\Vert \nabla g_y(\gamma [{G(y)}-A^*x'])-\nabla g_y(\gamma [{G(y)}-A^*x''])\Vert \\&\quad \le (\gamma L_A)\Vert \gamma [{G(y)}-A^*x']-\gamma [{G(y)}-A^*x'']\Vert _*\\&\quad \le (\gamma L_A)^2\Vert x'-x''\Vert _E=(L_A/L_G)^2\Vert x'-x''\Vert _E \end{aligned}$$

[we have used (57) and equivalences in (38)], as required in (40). $\square $

1.5 Review of Conditional Gradient algorithm

The required description of CGA and its complexity analysis are as follows.

As applied to minimizing a smooth – with Lipschitz continuous gradient

$$\begin{aligned} \Vert \nabla f(u)-\nabla f(u')\Vert _{E,*}\le \mathcal{L}\Vert u-u'\Vert _E,\,\,\forall u,u'\in X, \end{aligned}$$

convex function $f$ over a convex compact set $X\subset E$, the generic CGA is the recurrence of the form

$$\begin{aligned} \begin{array}{rcl} u_1&{}\in &{} X\\ u_{s+1}&{}\in &{} X \,\,\hbox { satisfies }\,\, f(u_{s+1})\le f(u_s+\gamma _s [u_s^+-u_s]),\,s=1,2,\ldots \\ \gamma _s&{}=&{}{2\over s+1},\,u_s^+\in \mathop {\hbox {Argmin }}\nolimits _{u\in X}\langle f'(u_s),u\rangle .\\ \end{array} \end{aligned}$$

The standard results on this recurrence (see, e.g., proof of Theorem 1 in [15]) state that if $f_*=\min _X f$, then

$$\begin{aligned} \begin{array}{ll} (a)&{}\epsilon _{t+1}:=f(u_{t+1})-f_*\le \epsilon _{t}-\gamma _t\delta _t+{2\mathcal{L}R^2\gamma _t^2},\,t=1,2,...\\ &{}\delta _t:=\max \nolimits _{u\in X} \langle \nabla f(u_t),u_t-u\rangle ;\\ (b)&{}\epsilon _t\le {2\mathcal{L}R^2\over t+1},t=2,3,\ldots \\ \end{array} \end{aligned}$$

(59)

where $R$ is the smallest of the radii of $\Vert \cdot \Vert _E$-balls containing $X$. From (59.$a$) it follows that

$$\begin{aligned} \gamma _\tau \delta _\tau \le \epsilon _\tau -\epsilon _{\tau +1}+2\mathcal{L}R^2\gamma _\tau ^2,\,\tau =1,2,\ldots ; \end{aligned}$$

summing up these inequalities over $\tau =t,t+1,\ldots ,2t$, where $t>1$, we get

$$\begin{aligned} \left[ \min _{\tau \le 2t}\delta _\tau \right] \sum _{\tau =t}^{2t} \gamma _\tau \le \epsilon _t+2\mathcal{L}R^2\sum _{\tau =t}^{2t}\gamma _\tau ^2, \end{aligned}$$

which combines with (59.$b$) to imply that

$$\begin{aligned} \min _{\tau \le 2t}\delta _\tau \le O(1)\mathcal{L}R^2{{1\over t}+\sum _{\tau =t}^{2t}{1\over \tau ^2}\over \sum _{\tau =t}^{2t}{1\over \tau }}\le O(1){\mathcal{L}R^2\over t}. \end{aligned}$$

It follows that given $\epsilon <\mathcal{L}R^2$, it takes at most $O(1){\mathcal{L}R^2\over \epsilon }$ steps of CGA to generate a point $u^\epsilon \in X$ with $\max _{u\in X} \langle \nabla f(u^\epsilon ),u^\epsilon -u\rangle \le \epsilon $.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Juditsky, A., Nemirovski, A. Solving variational inequalities with monotone operators on domains given by Linear Minimization Oracles. Math. Program. 156, 221–256 (2016). https://doi.org/10.1007/s10107-015-0876-3

Download citation

Received: 17 December 2013
Accepted: 14 February 2015
Published: 22 March 2015
Issue Date: March 2016
DOI: https://doi.org/10.1007/s10107-015-0876-3

Mathematics Subject Classification

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Solving variational inequalities with monotone operators on domains given by Linear Minimization Oracles

Abstract

Access this article

Similar content being viewed by others

On the Adaptive Proximal Method for a Class of Variational Inequalities and Related Problems

Generalized Mirror Prox Algorithm for Monotone Variational Inequalities: Universality and Inexact Oracle

Computing Proximal Points of Convex Functions with Inexact Subgradients

Notes

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Proofs

1.1 Proof of Theorems 1 and 2

1.2 Proof of Proposition 1

1.3 Proof of Proposition 2

1.4 Proof of Lemma 3

Lemma 4

1.5 Review of Conditional Gradient algorithm

Rights and permissions

About this article

Cite this article

Mathematics Subject Classification

Navigation

Solving variational inequalities with monotone operators on domains given by Linear Minimization Oracles

Abstract

Access this article

Similar content being viewed by others

On the Adaptive Proximal Method for a Class of Variational Inequalities and Related Problems

Generalized Mirror Prox Algorithm for Monotone Variational Inequalities: Universality and Inexact Oracle

Computing Proximal Points of Convex Functions with Inexact Subgradients

Notes

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Proofs

Proofs

1.1 Proof of Theorems 1 and 2

1.2 Proof of Proposition 1

1.3 Proof of Proposition 2

1.4 Proof of Lemma 3

Lemma 4

1.5 Review of Conditional Gradient algorithm

Rights and permissions

About this article

Cite this article

Share this article

Mathematics Subject Classification

Search

Navigation