Solving variational inequalities with monotone operators on domains given by Linear Minimization Oracles

  • Full Length Paper
  • Series A

Mathematical Programming

Abstract

The standard algorithms for solving large-scale convex–concave saddle point problems, or, more generally, variational inequalities with monotone operators, are proximal type algorithms which at every iteration need to compute a prox-mapping, that is, to minimize over the problem’s domain X the sum of a linear form and the specific convex distance-generating function underlying the algorithms in question. The (relative) computational simplicity of prox-mappings, which is the standard requirement when implementing proximal algorithms, clearly implies the possibility to equip X with a relatively computationally cheap Linear Minimization Oracle (LMO) able to minimize linear forms over X. There are, however, important situations where a cheap LMO is indeed available, but where no proximal setup with easy-to-compute prox-mappings is known. This fact motivates our goal in this paper, which is to develop techniques for solving variational inequalities with monotone operators on domains given by an LMO. The techniques we discuss can be viewed as a substantial extension of the method of nonsmooth convex minimization over an LMO-represented domain proposed in Cox et al. (Math Program Ser B 148(1–2):143–180, 2014).
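
To make the contrast concrete, here is a minimal Python sketch (ours, not taken from the paper) for the unit nuclear-norm ball: the LMO needs only the leading singular pair of the linear form, computable by a partial SVD or power iterations, whereas the Euclidean prox-mapping (projection) requires a full SVD. The helper names below are assumptions introduced for the example.

```python
# Hypothetical illustration: LMO vs. prox-mapping on X = {x : ||x||_nuclear <= 1}.
import numpy as np
from scipy.sparse.linalg import svds


def lmo_nuclear_ball(eta):
    """min_{x in X} <eta, x> is attained at the rank-one matrix -u1 v1^T,
    where (u1, v1) is the leading singular pair of eta: one partial SVD suffices."""
    u, _, vt = svds(eta, k=1)
    return -np.outer(u[:, 0], vt[0, :])


def proj_l1_ball(v, r=1.0):
    """Euclidean projection of a vector onto the l1-ball of radius r."""
    if np.abs(v).sum() <= r:
        return v
    u = np.sort(np.abs(v))[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(u) + 1) > css - r)[0][-1]
    theta = (css[rho] - r) / (rho + 1.0)
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)


def proj_nuclear_ball(x):
    """Euclidean prox (projection) onto X: needs a *full* SVD of x."""
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    return (u * proj_l1_ball(s)) @ vt
```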

Notes

  1. In the literature on v.i.’s, a variational inequality where a weak solution is sought is often called Minty’s v.i., and one where a strong solution is sought is called Stampacchia’s v.i. The equivalence of the notions of weak and strong solutions in the case of a continuous monotone operator is the finite-dimensional version of the classical Minty’s Lemma (1967); see [6].

  2. From now on, for a linear mapping \(x\mapsto Bx: E\rightarrow F\), where \(E,F\) are Euclidean spaces, \(B^*\) denotes the conjugate of \(B\), that is, a linear mapping \(y\mapsto B^*y:F\rightarrow E\) uniquely defined by the identity \(\langle Bx,y\rangle = \langle x,B^*y\rangle \) for all \(x\in E,\) \(y\in F\).

  3. The MD algorithm originates from [23, 24]; its modern proximal form was developed in [1]. MP was proposed in [25]. For a recent exposition of these algorithms, see [18, Chapters 5, 6] and [27].

  4. We could also take as \(Y\) a convex compact subset of \(F\) containing \(Bx\) and define \(y(x)\) as \(Bx\) for \(x\in X\) and as (any) strong solution to the \({\mathrm{VI}}(G(\cdot )-A^*x,Y)\) when \(x\not \in X\).

  5. As we have already mentioned, with our proximal setup, the \(\omega \)-size of \(Y\) is \(\le \sqrt{2}\), and (20) is satisfied with \(M=2\sqrt{2}\).

  6. For the primal v.i., (26) holds true for some \(L>0\) and \(M=0\). Moreover, with a properly selected proximal setup for (45), the complexity bound (28) becomes \({\mathrm{Res}}(\mathcal{C}^N|Y)\le O(1)\sqrt{\ln (n)\ln (m)}/N\).

References

  1. Beck, A., Teboulle, M.: Mirror descent and nonlinear projected subgradient methods for convex optimization. Oper. Res. Lett. 31(3), 167–175 (2003)

  2. Candes, E.J., Plan, Y.: Matrix completion with noise. Proc. IEEE 98(6), 925–936 (2010)

  3. Candes, E.J., Plan, Y.: Tight oracle inequalities for low-rank matrix recovery from a minimal number of noisy random measurements. IEEE Trans. Inf. Theory 57(4), 2342–2359 (2011)

  4. Castellani, M., Mastroeni, G.: On the duality theory for finite dimensional variational inequalities. In: Giannessi, F., Maugeri, A. (eds.) Variational Inequalities Network Equilibrium Problems, pp. 21–31. Plenum Publishing, New York (1995)

  5. Chen, G., Teboulle, M.: Convergence analysis of a proximal-like minimization algorithm using Bregman functions. SIAM. J. Optim. 3(3), 538–543 (1993)

  6. Chipot, M.: Variational Inequalities and Flow in Porous Media. Springer, New York (1984)

  7. Cox, B., Juditsky, A., Nemirovski, A.: Dual subgradient algorithms for large-scale nonsmooth learning problems. Math. Program. Ser. B 148(1–2), 143–180 (2014)

  8. Daniele, P.: Dual variational inequality and applications to asymmetric traffic equilibrium problems with capacity constraints. Le Matematiche XLIX(2), 111–211 (1994)

  9. Demyanov, V., Rubinov, A.: Approximate Methods in Optimization Problems. Elsevier, Amsterdam (1970)

  10. Dunn, J.C., Harshbarger, S.: Conditional gradient algorithms with open loop step size rules. J. Math. Anal. Appl. 62(2), 432–444 (1978)

  11. Dudik, M., Harchaoui, Z., Malick, J.: Lifted coordinate descent for learning with trace-norm regularization. In: AISTATS-Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics, vol. 22, pp. 327–336 (2012)

  12. Frank, M., Wolfe, P.: An algorithm for quadratic programming. Naval Res. Logist. Q. 3(1–2), 95–110 (1956)

  13. Freund, R., Grigas, P.: New analysis and results for the Conditional Gradient method. Submitted to Mathematical Programming (2013). E-print: http://web.mit.edu/rfreund/www/FW-paper-final

  14. Harchaoui, Z., Douze, M., Paulin, M., Dudik, M., Malick, J.: Large-scale image classification with trace-norm regularization. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3386–3393 (2012)

  15. Harchaoui, Z., Juditsky, A., Nemirovski, A.: Conditional Gradient algorithms for norm-regularized smooth convex optimization. Mathematical Programming. Online First (2014). doi:10.1007/s10107-014-0778-9. E-print: arXiv:1302.2325

  16. Jaggi, M.: Revisiting Frank–Wolfe: projection-free sparse convex optimization. In: Proceedings of the 30th International Conference on Machine Learning (ICML 2013), pp. 427–435 (2013)

  17. Jaggi, M., Sulovsky, M.: A simple algorithm for nuclear norm regularized problems. In: Proceedings of the 27th International Conference on Machine Learning (ICML 2010), pp. 471–478 (2010)

  18. Juditsky, A., Nemirovski, A.: First order methods for nonsmooth large-scale convex minimization, I: general purpose methods; II: utilizing problem’s structure. In: Sra, S., Nowozin, S., Wright, S. (eds.) Optimization for Machine Learning, pp. 121–184. The MIT Press, Cambridge (2012)

  19. Juditsky, A., Kilinç Karzan, F., Nemirovski, A.: Randomized first order algorithms with applications to \(\ell _1\)-minimization. Math. Program. Ser. A 142(1–2), 269–310 (2013)

  20. Juditsky, A., Kilinç Karzan, F., Nemirovski, A.: On unified view of nullspace-type conditions for recoveries associated with general sparsity structures. Linear Algebra Appl. 441(1), 124–151 (2014)

  21. Lemarechal, C., Nemirovski, A., Nesterov, Yu.: New variants of bundle methods. Math. Program. 69(1), 111–148 (1995)

  22. Mosco, U.: Dual variational inequalities. J. Math. Anal. Appl. 40, 202–206 (1972)

  23. Nemirovski, A., Yudin, D.: Problem Complexity and Method Efficiency in Optimization. Nauka Publishers, Moscow (1978) (in Russian); John Wiley, New York (1983) (in English)

  24. Nemirovskii, A.: Efficient iterative algorithms for variational inequalities with monotone operators. Ekonomika i Matematicheskie Metody 17(2), 344–359 (1981) (in Russian; English translation: Matekon)

  25. Nemirovski, A.: Prox-method with rate of convergence \(O(1/t)\) for variational inequalities with Lipschitz continuous monotone operators and smooth convex-concave saddle point problems. SIAM J. Optim. 15, 229–251 (2004)

  26. Nemirovski, A., Onn, S., Rothblum, U.: Accuracy certificates for computational problems with convex structure. Math. Oper. Res. 35, 52–78 (2010)

  27. Nesterov, Yu., Nemirovski, A.: On first order algorithms for \(\ell _1\)/nuclear norm minimization. Acta Numer. 22, 509–575 (2013)

  28. Pshenichnyi, B.N., Danilin, Y.M.: Numerical Methods in Extremal Problems. Mir Publishers, Moscow (1978)

  29. Rockafellar, R.T.: Minimax theorems and conjugate saddle functions. Math. Scand. 14, 151–173 (1964)

  30. Rockafellar, R.T.: Convex Analysis. Princeton University Press, Princeton (1970)

  31. Shalev-Shwartz, S., Gonen, A., Shamir, O.: Large-scale convex minimization with a low-rank constraint (2011). E-print: arXiv:1106.1622

Author information

Corresponding author

Correspondence to Anatoli Juditsky.

Additional information

Research of the first author was supported by the CNRS-Mastodons project GARGANTUA, and the LabEx PERSYVAL-Lab (ANR-11-LABX-0025). Research of the second author was supported by the NSF grants CMMI 1232623 and CCF 1415498.

Proofs

1.1 Proof of Theorems 1 and 2

We start with proving Theorem 2. In the notation of the theorem, we have

$$\begin{aligned} \begin{array}{lcl} &{}&{}\forall x\in X: \Phi (x)=Ay(x)+a,\\ (a):&{}&{}y(x)\in Y,\\ (b): &{}&{}\langle y(x)-y,A^*x-G(y(x))\rangle \ge 0\,\forall y\in Y. \end{array} \end{aligned}$$
(52)

For \(\bar{x}\in X\), let \(\bar{y}=y(\bar{x})\), and let \(\widehat{y}=\sum \nolimits _t\lambda _ty_t\), so that \(\bar{y},\widehat{y}\in Y\) by (52.a). Since \(G\) is monotone, for all \(t\in \{1,\ldots ,N\}\) we have

$$\begin{aligned} \begin{array}{lll} &{}&{}\langle \bar{y}-y_t,G(\bar{y})-G(y_t)\rangle \ge 0\\ \Rightarrow &{}&{}\langle \bar{y},G(\bar{y})\rangle \ge \langle y_t,G(\bar{y})\rangle +\langle \bar{y},G(y_t)\rangle -\langle y_t,G(y_t)\rangle \,\,\forall t\\ \Rightarrow &{} &{}\langle \bar{y},G(\bar{y})\rangle \ge \sum \limits _t\lambda _t\left[ \langle y_t,G(\bar{y})\rangle +\langle \bar{y},G(y_t)\rangle -\langle y_t,G(y_t)\rangle \right] \\ &{}&{}[\hbox {since}\,\, \lambda _t\ge 0 \,\,\hbox {and} \,\, \sum \limits _t\lambda _t=1], \end{array} \end{aligned}$$

and we conclude that

$$\begin{aligned} \langle \bar{y},G(\bar{y})\rangle -\langle \widehat{y},G(\bar{y})\rangle \ge \sum _{t=1}^N\lambda _t\left[ \langle \bar{y},G(y_t)\rangle -\langle y_t,G(y_t)\rangle \right] . \end{aligned}$$
(53)

We now have

$$\begin{aligned}&\langle \Phi (\bar{x}),\bar{x}-\sum \limits _t\lambda _tx_t\rangle \\&\quad =\langle A\bar{y}+a,\bar{x}-\sum \limits _t\lambda _tx_t\rangle =\langle \bar{y},A^*\bar{x}-\sum \limits _t\lambda _tA^*x_t\rangle +\langle a,\bar{x}-\sum \limits _t\lambda _tx_t\rangle \\&\quad =\langle \bar{y},A^*\bar{x}-G(\bar{y})\rangle +\langle \bar{y},G(\bar{y})-\sum \limits _t\lambda _tA^*x_t\rangle +\langle a,\bar{x}-\sum \limits _t\lambda _tx_t\rangle \\&\quad \ge \langle \widehat{y},A^*\bar{x}-G(\bar{y})\rangle +\langle \bar{y},G(\bar{y})-\sum \limits _t\lambda _tA^*x_t\rangle +\langle a,\bar{x}-\sum \limits _t\lambda _tx_t\rangle \\&\qquad [\hbox {by (52.b) with}\,\, y=\widehat{y}\,\, \hbox {and due to} \,\,\bar{y}=y(\bar{x})]\\&\quad =\langle \widehat{y},A^*\bar{x}\rangle +\left[ \langle G(\bar{y}),\bar{y}\rangle -\langle G(\bar{y}),\widehat{y}\rangle \right] -\langle \bar{y},\sum \limits _t\lambda _tA^*x_t\rangle +\langle a,\bar{x}-\sum \limits _t\lambda _tx_t\rangle \\&\quad \ge \langle \widehat{y},A^*\bar{x}\rangle +\sum \limits _t\lambda _t\left[ \langle \bar{y},G(y_t)\rangle -\langle y_t,G(y_t)\rangle \right] -\langle \bar{y},\sum \limits _t\lambda _tA^*x_t\rangle \\&\qquad +\langle a,\bar{x}-\sum \limits _t\lambda _tx_t\rangle \,\,[\hbox {by (53)}]\\&\quad =\sum \limits _t\lambda _t\langle y_t,A^*\bar{x}\rangle +\sum \limits _t\lambda _t\left[ \langle \bar{y},G(y_t)\rangle -\langle y_t,G(y_t)\rangle -\langle \bar{y},A^*x_t\rangle +\langle a,\bar{x}-x_t\rangle \right] \\&\qquad [\hbox {since}\,\, \widehat{y}=\sum \limits _t\lambda _ty_t\,\, \hbox {and}\,\, \sum \limits _t\lambda _t=1]\\&\quad =\sum \limits _t\lambda _t\left[ \langle Ay_t,\bar{x}-x_t\rangle +\langle Ay_t,x_t\rangle +\langle \bar{y},G(y_t)\rangle -\langle y_t,G(y_t)\rangle \right. \\&\qquad \left. -\langle \bar{y},A^*x_t\rangle +\langle a,\bar{x}-x_t\rangle \right] \\&\quad =\sum \limits _t\lambda _t\left[ \langle y_t,A^*x_t\rangle +\langle \bar{y},G(y_t)\rangle -\langle y_t,G(y_t)\rangle -\langle \bar{y},A^*x_t\rangle \right] \\&\qquad + \sum \limits _t\lambda _t\langle Ay_t+a,\bar{x}-x_t\rangle \\&\quad ={\sum }_t\lambda _t\langle A^*x_t-G(y_t),y_t-\bar{y}\rangle +\sum \limits _t\lambda _t\langle Ay_t+a,\bar{x}-x_t\rangle \ge -\epsilon \\&\qquad +\sum \limits _t\lambda _t\langle Ay_t+a,\bar{x}-x_t\rangle \\&\qquad [\hbox {by} (15) \,\,\hbox {due to} \bar{y}=y(\bar{x})\in Y(X)]. \end{aligned}$$

The bottom line is that

$$\begin{aligned} \langle \Phi (\bar{x}),\widehat{x}-\bar{x}\rangle \le \epsilon +\sum _{t=1}^N\lambda _t\langle Ay_t+a,x_t-\bar{x}\rangle \,\forall \bar{x}\in X, \end{aligned}$$

as stated in (16). Theorem 2 is proved.

To prove Theorem 1, let \(y_t\in Y\), \(1\le t\le N\), and \(\lambda _1,\ldots ,\lambda _N\) be from the premise of the theorem, and let \(x_t\), \(1\le t\le N\), be specified as \(x_t=x(y_t)\), so that \(x_t\) is the minimizer of the linear form \(\langle Ay_t+a,x\rangle \) over \(x\in X\). Due to the latter choice, we have \(\sum _{t=1}^N\lambda _t\langle Ay_t+a,x_t-\bar{x}\rangle \le 0\) for all \(\bar{x}\in X\), while \(\epsilon \) as defined by (15) is nothing but \({\mathrm{Res}}(\{y_t,\lambda _t,-\Psi (x_t)\}_{t=1}^N|Y(X))\). Thus, (16) in the case in question implies that

$$\begin{aligned} \forall \bar{x}\in X: \langle \Phi (\bar{x}),\sum \limits _{t=1}^N\lambda _t x_t - \bar{x}\rangle \le {\mathrm{Res}}(\{y_t,\lambda _t,-\Psi (x_t)\}_{t=1}^N|Y(X)), \end{aligned}$$

and (13) follows. Relation (14) is an immediate corollary of (13) and Lemma 2 as applied to \(X\) in the role of \(Y\), \(\Phi \) in the role of \(H(\cdot )\), and \(\{x_t,\lambda _t,\Phi (x_t)\}_{t=1}^N\) in the role of \(\mathcal{C}^N\). \(\square \)

1.2 Proof of Proposition 1

Observe that the optimality conditions in the optimization problem specifying \(v={{\mathrm{Prox}}}_y(\xi )\) imply that

$$\begin{aligned} \langle \xi -\omega '(y)+\omega '(v),z-v\rangle \ge 0,\,\,\forall z\in Y, \end{aligned}$$

or

$$\begin{aligned} \langle \xi ,v-z\rangle \le \langle \omega '(v)-\omega '(y),z-v\rangle =\langle V'_y(v),z-v\rangle ,\,\,\forall z\in Y, \end{aligned}$$

which, using a remarkable identity [5]

$$\begin{aligned} \langle V'_y(v),z-v\rangle =V_y(z)-V_v(z)-V_y(v), \end{aligned}$$

can be rewritten equivalently as

$$\begin{aligned} v={{\mathrm{Prox}}}_y(\xi )\Rightarrow \langle \xi ,v-z\rangle \le V_y(z)-V_v(z)-V_y(v)\,\,\forall z\in Y. \end{aligned}$$
(54)

Setting \(y=y_t\), \(\xi =\gamma _t H_t(y_t)\), which results in \(v=y_{t+1}\), we get

$$\begin{aligned} \forall z\in Y: \gamma _t\langle H_t(y_t),y_{t+1}-z\rangle \le V_{y_t}(z)-V_{y_{t+1}}(z)-V_{y_t}(y_{t+1}), \end{aligned}$$

whence,

$$\begin{aligned} \forall z\in Y: \gamma _t\langle H_t(y_t),y_{t}-z\rangle\le & {} V_{y_t}(z)-V_{y_{t+1}}(z)+\underbrace{\left[ \gamma _t\langle H_t(y_t),y_t-y_{t+1}\rangle -V_{y_t}(y_{t+1})\right] }_{\le \gamma _t\Vert H_t(y_t)\Vert _*\Vert y_t-y_{t+1}\Vert -\frac{1}{2}\Vert y_t-y_{t+1}\Vert ^2} \\\le & {} V_{y_t}(z)-V_{y_{t+1}}(z)+\frac{1}{2}\gamma _t^2\Vert H_t(y_t)\Vert _*^2. \end{aligned}$$

Summing up these inequalities over \(t=1,\ldots ,N\) and taking into account that for \(z\in Y'\), we have \(V_{y_1}(z)\le \frac{1}{2}\Omega ^2[Y']\) and that \(V_{y_{N+1}}(z)\ge 0\), we get (19). \(\square \)
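
As an illustration of the recurrence analysed above, here is a minimal Python sketch (ours, not the paper's code) of Mirror Descent instantiated with the entropy proximal setup on the probability simplex, where the prox-mapping has the closed form of a multiplicative update; the operator \(H\) (taken time-invariant for simplicity) and the stepsizes are assumed to be user-supplied.

```python
# Hypothetical sketch: MD on the simplex with the entropy distance-generating function.
import numpy as np


def entropy_prox(y, xi):
    """Prox_y(xi) for omega(z) = sum_i z_i ln z_i on the simplex: z proportional to y * exp(-xi)."""
    z = y * np.exp(-xi)
    return z / z.sum()


def mirror_descent(H, y1, gammas):
    """Run y_{t+1} = Prox_{y_t}(gamma_t * H(y_t)) and return the
    gamma-weighted average of the iterates (the approximate solution)."""
    y = np.asarray(y1, dtype=float)
    ys, ws = [], []
    for g in gammas:
        ys.append(y)
        ws.append(g)
        y = entropy_prox(y, g * H(y))
    ws = np.array(ws) / np.sum(ws)   # lambda_t^N = gamma_t / sum_tau gamma_tau
    return sum(w * yt for w, yt in zip(ws, ys))
```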

1.3 Proof of Proposition 2

Applying (54) to \(y=y_t\), \(\xi =\gamma _t H_t(z_t)\), which results in \(v=y_{t+1}\), we get

$$\begin{aligned} \forall z\in Y: \gamma _t\langle H_t(z_t),y_{t+1}-z\rangle \le V_{y_t}(z)-V_{y_{t+1}}(z)-V_{y_t}(y_{t+1}), \end{aligned}$$

whence, by the definition (23) of \(d_t\),

$$\begin{aligned} \begin{array}{l} \forall z\in Y: \gamma _t\langle H_t(z_t),z_t-z\rangle \le V_{y_t}(z)-V_{y_{t+1}}(z)+d_t.\\ \end{array} \end{aligned}$$
(55)

Summing up the resulting inequalities over \(t=1,\ldots ,N\) and taking into account that \(V_{y_1}(z)\le \frac{1}{2}\Omega ^2[Y']\) for all \(z\in Y'\) and \(V_{y_{N+1}}(z)\ge 0\), we get

$$\begin{aligned} \forall z\in Y': \sum _{t=1}^N\lambda ^N_t\langle H_t(z_t),z_t-z\rangle \le {\frac{1}{2}\Omega ^2[Y']+\sum _{t=1}^N d_t\over \sum _{t=1}^N\gamma _t}. \end{aligned}$$

The right hand side of the latter inequality is independent of \(z\in Y'\). Taking the supremum of the left hand side over \(z\in Y'\), we arrive at (24).

Moreover, invoking (54) with \(y=y_t\), \(\xi =\gamma _t H_t(y_t)\) (so that \(v=z_t\)) and specifying \(z\) as \(y_{t+1}\), we get

$$\begin{aligned} \gamma _t\langle H_t(y_t),z_t-y_{t+1}\rangle \le V_{y_t}(y_{t+1})-V_{z_t}(y_{t+1})-V_{y_t}(z_t), \end{aligned}$$

whence

$$\begin{aligned} \begin{aligned} d_t&=\gamma _t\langle H_t(z_t),z_t-y_{t+1}\rangle - V_{y_t}(y_{t+1})\le \gamma _t\langle H_t(y_t),z_t-y_{t+1}\rangle \\&\quad +\gamma _t\langle H_t(z_t)-H_t(y_t),z_t-y_{t+1}\rangle - V_{y_t}(y_{t+1})\\&\le -V_{z_t}(y_{t+1})-V_{y_t}(z_t)+\gamma _t\langle H_t(z_t)-H_t(y_t),z_t-y_{t+1}\rangle \\&\le \gamma _t\Vert H_t(z_t)-H_t(y_t)\Vert _*\Vert z_t-y_{t+1}\Vert -{\frac{1}{2}}\Vert z_t-y_{t+1}\Vert ^2-{\frac{1}{2}}\Vert y_t-z_t\Vert ^2\\&\le {\frac{1}{2}}\left[ \gamma _t^2\Vert H_t(z_t)-H_t(y_t)\Vert _*^2-\Vert y_t-z_t\Vert ^2\right] ,\\ \end{aligned} \end{aligned}$$
(56)

as required in (25). \(\square \)
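
Analogously, a minimal Python sketch (again ours, under the same entropy-on-the-simplex assumption and with a time-invariant operator \(H\)) of the Mirror Prox recurrence \(z_t={\mathrm{Prox}}_{y_t}(\gamma_t H(y_t))\), \(y_{t+1}={\mathrm{Prox}}_{y_t}(\gamma_t H(z_t))\) analysed above:

```python
# Hypothetical sketch: Mirror Prox on the simplex with the entropy setup.
import numpy as np


def entropy_prox(y, xi):
    z = y * np.exp(-xi)
    return z / z.sum()


def mirror_prox(H, y1, gammas):
    """Extra-gradient style recurrence; the approximate solution is the
    gamma-weighted average of the intermediate points z_t."""
    y = np.asarray(y1, dtype=float)
    zs, ws = [], []
    for g in gammas:
        z = entropy_prox(y, g * H(y))   # extrapolation: z_t
        y = entropy_prox(y, g * H(z))   # update: y_{t+1} uses H at the extrapolated point
        zs.append(z)
        ws.append(g)
    ws = np.array(ws) / np.sum(ws)
    return sum(w * zt for w, zt in zip(ws, zs))
```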

1.4 Proof of Lemma 3

1 \(^0\). We start with the following standard fact:

Lemma 4

Let \(Y\) be a nonempty closed convex set in Euclidean space \(F\), \(\Vert \cdot \Vert \) be a norm on \(F\), and \(\omega (\cdot )\) be a continuously differentiable function on \(Y\) which is strongly convex, modulus 1, w.r.t. \(\Vert \cdot \Vert \). Given \(b\in F\) and \(y\in Y\), let us set

$$\begin{aligned} \begin{array}{rcl} g_y(\xi )&{}=&{}\max \limits _{z\in Y}\left[ \langle z,\omega '(y)-\xi \rangle -\omega (z)\right] :F\rightarrow {\mathbf {R}}, \\ {{\mathrm{Prox}}}_y(\xi )&{}=&{}\mathop {\hbox {argmax }}\limits _{z\in Y}\left[ \langle z,\omega '(y)-\xi \rangle -\omega (z)\right] . \\ \end{array} \end{aligned}$$

The function \(g_y\) is convex with Lipschitz continuous gradient \(\nabla g_y(\xi )=-{{\mathrm{Prox}}}_y(\xi )\):

$$\begin{aligned} \Vert \nabla g_y(\xi )-\nabla g_y(\xi ')\Vert \le \Vert \xi -\xi '\Vert _*\,\,\forall \xi ,\xi ', \end{aligned}$$
(57)

where \(\Vert \cdot \Vert _*\) is the norm conjugate to \(\Vert \cdot \Vert \).

Indeed, since \(\omega \) is strongly convex and continuously differentiable on \(Y\), \({{\mathrm{Prox}}}_y(\cdot )\) is well defined, and the optimality conditions yield

$$\begin{aligned} \langle \omega '({{\mathrm{Prox}}}_y(\xi ))+\xi -\omega '(y),{{\mathrm{Prox}}}_y(\xi )-z\rangle \le 0\,\,\forall z\in Y. \end{aligned}$$
(58)

Consequently, \(g_y(\cdot )\) is well defined; this function clearly is convex, and the vector \(-{{\mathrm{Prox}}}_y(\xi )\) clearly is a subgradient of \(g_y\) at \(\xi \). If now \(\xi ',\xi ''\in F\), then, setting \(z'={{\mathrm{Prox}}}_y(\xi ')\), \(z''={{\mathrm{Prox}}}_y(\xi '')\) and invoking (58), we get

$$\begin{aligned} \langle \omega '(z')+\xi '-\omega '(y),z'-z''\rangle \le 0,\,\,\langle -\omega '(z'')-\xi ''+\omega '(y),z'-z''\rangle \le 0 \end{aligned}$$

whence, summing the inequalities up,

$$\begin{aligned} \langle \xi '-\xi '',z'-z''\rangle \le \langle \omega '(z')-\omega '(z''),z''-z'\rangle \le -\Vert z'-z''\Vert ^2, \end{aligned}$$

implying that \(\Vert z'-z''\Vert \le \Vert \xi '-\xi ''\Vert _*\). Thus, the subgradient field \(-{{\mathrm{Prox}}}_y(\cdot )\) of \(g_y(\cdot )\) is Lipschitz continuous with constant 1 from \(\Vert \cdot \Vert _*\) to \(\Vert \cdot \Vert \), whence \(g_y\) is continuously differentiable and (57) holds. \(\square \)
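
For a quick numerical sanity check of the nonexpansiveness property (57) (our own illustration, using the Euclidean setup \(\omega (z)=\frac{1}{2}\Vert z\Vert _2^2\) on the box \(Y=[0,1]^n\), where \({\mathrm{Prox}}_y(\xi )\) reduces to clipping \(y-\xi \) to the box):

```python
# Hypothetical check of (57): with omega(z) = ||z||_2^2 / 2 and Y = [0,1]^n,
# Prox_y(xi) is the Euclidean projection of y - xi onto Y (coordinate-wise clipping),
# and xi -> Prox_y(xi) must be 1-Lipschitz from ||.||_2 to ||.||_2.
import numpy as np


def prox_box(y, xi):
    return np.clip(y - xi, 0.0, 1.0)


rng = np.random.default_rng(0)
y = rng.uniform(0.0, 1.0, size=20)
for _ in range(1000):
    xi1, xi2 = rng.normal(size=20), rng.normal(size=20)
    lhs = np.linalg.norm(prox_box(y, xi1) - prox_box(y, xi2))
    assert lhs <= np.linalg.norm(xi1 - xi2) + 1e-12
```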

2 \(^0\). To derive Lemma 3 from Lemma 4, note that \(f_y(x)\) is obtained from \(g_y(\cdot )\) by an affine substitution of variables and the addition of an affine function of \(x\):

$$\begin{aligned} f_y(x)=g_y(\gamma [{G(y)}- A^*x])+\gamma \langle a,x\rangle {+\omega (y)-\langle \omega '(y),y\rangle ,} \end{aligned}$$

whence \(\nabla f_y(x)=-\gamma A\nabla g_y(\gamma [{G(y)}-A^*x])+\gamma a=\gamma A {{\mathrm{Prox}}}_y(\gamma [{G(y)}-A^*x])+\gamma a\), as required in (39), and

$$\begin{aligned}&\Vert \nabla f_y(x')-\nabla f_y(x'')\Vert _{E,*}\\&\quad =\gamma \Vert A\left[ \nabla g_y(\gamma [{G(y)}-A^*x'])-\nabla g_y(\gamma [{G(y)}-A^*x''])\right] \Vert _{E,*}\\&\quad \le (\gamma L_A)\Vert \nabla g_y(\gamma [{G(y)}-A^*x'])-\nabla g_y(\gamma [{G(y)}-A^*x''])\Vert \\&\quad \le (\gamma L_A)\Vert \gamma [{G(y)}-A^*x']-\gamma [{G(y)}-A^*x'']\Vert _*\\&\quad \le (\gamma L_A)^2\Vert x'-x''\Vert _E=(L_A/L_G)^2\Vert x'-x''\Vert _E \end{aligned}$$

[we have used (57) and equivalences in (38)], as required in (40). \(\square \)

1.5 Review of Conditional Gradient algorithm

The required description of CGA and its complexity analysis are as follows.

As applied to minimizing a smooth convex function \(f\), with Lipschitz continuous gradient

$$\begin{aligned} \Vert \nabla f(u)-\nabla f(u')\Vert _{E,*}\le \mathcal{L}\Vert u-u'\Vert _E,\,\,\forall u,u'\in X, \end{aligned}$$

over a convex compact set \(X\subset E\), the generic CGA is the recurrence of the form

$$\begin{aligned} \begin{array}{rcl} u_1&{}\in &{} X\\ u_{s+1}&{}\in &{} X \,\,\hbox { satisfies }\,\, f(u_{s+1})\le f(u_s+\gamma _s [u_s^+-u_s]),\,s=1,2,\ldots \\ \gamma _s&{}=&{}{2\over s+1},\,u_s^+\in \mathop {\hbox {Argmin }}\nolimits _{u\in X}\langle f'(u_s),u\rangle .\\ \end{array} \end{aligned}$$

The standard results on this recurrence (see, e.g., proof of Theorem 1 in [15]) state that if \(f_*=\min _X f\), then

$$\begin{aligned} \begin{array}{ll} (a)&{}\epsilon _{t+1}:=f(u_{t+1})-f_*\le \epsilon _{t}-\gamma _t\delta _t+{2\mathcal{L}R^2\gamma _t^2},\,t=1,2,...\\ &{}\delta _t:=\max \nolimits _{u\in X} \langle \nabla f(u_t),u_t-u\rangle ;\\ (b)&{}\epsilon _t\le {2\mathcal{L}R^2\over t+1},t=2,3,\ldots \\ \end{array} \end{aligned}$$
(59)

where \(R\) is the smallest of the radii of \(\Vert \cdot \Vert _E\)-balls containing \(X\). From (59.\(a\)) it follows that

$$\begin{aligned} \gamma _\tau \delta _\tau \le \epsilon _\tau -\epsilon _{\tau +1}+2\mathcal{L}R^2\gamma _\tau ^2,\,\tau =1,2,\ldots ; \end{aligned}$$

summing up these inequalities over \(\tau =t,t+1,\ldots ,2t\), where \(t>1\), we get

$$\begin{aligned} \left[ \min _{\tau \le 2t}\delta _\tau \right] \sum _{\tau =t}^{2t} \gamma _\tau \le \epsilon _t+2\mathcal{L}R^2\sum _{\tau =t}^{2t}\gamma _\tau ^2, \end{aligned}$$

which combines with (59.\(b\)) to imply that

$$\begin{aligned} \min _{\tau \le 2t}\delta _\tau \le O(1)\mathcal{L}R^2{{1\over t}+\sum _{\tau =t}^{2t}{1\over \tau ^2}\over \sum _{\tau =t}^{2t}{1\over \tau }}\le O(1){\mathcal{L}R^2\over t}. \end{aligned}$$

It follows that given \(\epsilon <\mathcal{L}R^2\), it takes at most \(O(1){\mathcal{L}R^2\over \epsilon }\) steps of CGA to generate a point \(u^\epsilon \in X\) with \(\max _{u\in X} \langle \nabla f(u^\epsilon ),u^\epsilon -u\rangle \le \epsilon \).
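
In code, the recurrence and the stopping criterion just described can be sketched as follows (a minimal Python rendering under the assumption that the gradient oracle grad_f and the LMO lmo are supplied by the user; this is an illustration, not the authors' implementation):

```python
# Hypothetical sketch of the generic Conditional Gradient recurrence above.
import numpy as np


def conditional_gradient(grad_f, lmo, u1, eps, max_iter=10_000):
    """Open-loop steps gamma_s = 2/(s+1); stops when the gap
    delta_s = max_{u in X} <grad f(u_s), u_s - u> falls below eps."""
    u = np.asarray(u1, dtype=float)
    for s in range(1, max_iter + 1):
        g = grad_f(u)
        u_plus = lmo(g)                        # argmin_{u in X} <g, u>
        delta = float(np.vdot(g, u - u_plus))  # = max_{u' in X} <g, u - u'>
        if delta <= eps:
            break
        gamma = 2.0 / (s + 1)
        # any u_{s+1} with f(u_{s+1}) <= f(u_s + gamma*(u_s^+ - u_s)) is admissible;
        # here we simply take the convex combination itself
        u = u + gamma * (u_plus - u)
    return u, delta
```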

About this article

Cite this article

Juditsky, A., Nemirovski, A. Solving variational inequalities with monotone operators on domains given by Linear Minimization Oracles. Math. Program. 156, 221–256 (2016). https://doi.org/10.1007/s10107-015-0876-3
