Abstract
The standard algorithms for solving large-scale convex–concave saddle point problems, or, more generally, variational inequalities with monotone operators, are proximal-type algorithms which at every iteration need to compute a prox-mapping, that is, to minimize over the problem’s domain X the sum of a linear form and the specific convex distance-generating function underlying the algorithm in question. (Relative) computational simplicity of prox-mappings, which is the standard requirement when implementing proximal algorithms, clearly implies the possibility to equip X with a relatively computationally cheap Linear Minimization Oracle (LMO) capable of minimizing linear forms over X. There are, however, important situations where a cheap LMO is indeed available, but no proximal setup with easy-to-compute prox-mappings is known. This fact motivates our goal in this paper, which is to develop techniques for solving variational inequalities with monotone operators on domains given by an LMO. The techniques we discuss can be viewed as a substantial extension of the method of nonsmooth convex minimization over an LMO-represented domain proposed in Cox et al. (Math Program Ser B 148(1–2):143–180, 2014).
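To make the gap between the two oracle types concrete, consider the nuclear-norm ball, a typical LMO-friendly domain in low-rank matrix recovery (cf. [2, 3]): there, linear minimization needs only the leading singular pair of the linear form, while even a Euclidean projection needs a full singular value decomposition. The sketch below is ours, purely for illustration, and all function names in it are hypothetical.

```python
# Illustration (ours, not from the paper): LMO vs. projection on the
# nuclear-norm ball {X : ||X||_nuc <= r}.
import numpy as np
from scipy.sparse.linalg import svds

def lmo_nuclear_ball(C, r):
    """argmin_{||X||_nuc <= r} <C, X>: one leading singular pair suffices."""
    u, s, vt = svds(C, k=1)                     # top singular triple of C
    return -r * np.outer(u[:, 0], vt[0, :])

def project_nuclear_ball(X, r):
    """Euclidean projection onto the same ball: needs a full SVD."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    if s.sum() <= r:
        return X
    # project the singular values onto the l1-ball of radius r
    d = np.sort(s)[::-1]
    css = np.cumsum(d)
    k = np.arange(1, len(s) + 1)
    rho = np.max(np.nonzero(d - (css - r) / k > 0)[0]) + 1
    theta = (css[rho - 1] - r) / rho
    return U @ np.diag(np.maximum(s - theta, 0)) @ Vt
```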
Notes
In the literature on v.i.’s, a variational inequality where a weak solution is sought is often called Minty’s v.i., and one where a strong solution is sought is called Stampacchia’s v.i. The equivalence of the notions of weak and strong solutions in the case of a continuous monotone operator is the finite-dimensional version of the classical Minty’s Lemma (1967), see [6].
From now on, for a linear mapping \(x\mapsto Bx: E\rightarrow F\), where \(E,F\) are Euclidean spaces, \(B^*\) denotes the conjugate of \(B\), that is, a linear mapping \(y\mapsto B^*y:F\rightarrow E\) uniquely defined by the identity \(\langle Bx,y\rangle = \langle x,B^*y\rangle \) for all \(x\in E,\) \(y\in F\).
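As a quick numerical sanity check of this definition (ours, not from the paper): for real matrices carrying the standard inner products, the conjugate \(B^*\) is simply the transpose \(B^T\).

```python
# Check (ours): <Bx, y> = <x, B^T y> for a random real matrix B.
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 3))                  # linear map E = R^3 -> F = R^4
x, y = rng.standard_normal(3), rng.standard_normal(4)
assert np.isclose(np.dot(B @ x, y), np.dot(x, B.T @ y))
```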
We could also take as \(Y\) a convex compact subset of \(F\) containing \(BX\) and define \(y(x)\) as \(Bx\) for \(x\in X\) and as (any) strong solution to the \({\mathrm{VI}}(G(\cdot )-A^*x,Y)\) when \(x\not \in X\).
As we have already mentioned, with our proximal setup, the \(\omega \)-size of \(Y\) is \(\le \sqrt{2}\), and (20) is satisfied with \(M=2\sqrt{2}\).
References
Beck, A., Teboulle, M.: Mirror descent and nonlinear projected subgradient methods for convex optimization. Oper. Res. Lett. 31(3), 167–175 (2003)
Candes, E.J., Plan, Y.: Matrix completion with noise. Proc. IEEE 98(6), 925–936 (2010)
Candes, E.J., Plan, Y.: Tight oracle inequalities for low-rank matrix recovery from a minimal number of noisy random measurements. IEEE Trans. Inf. Theory 57(4), 2342–2359 (2011)
Castellani, M., Mastroeni, G.: On the duality theory for finite dimensional variational inequalities. In: Giannessi, F., Maugeri, A. (eds.) Variational Inequalities Network Equilibrium Problems, pp. 21–31. Plenum Publishing, New York (1995)
Chen, G., Teboulle, M.: Convergence analysis of a proximal-like minimization algorithm using Bregman functions. SIAM. J. Optim. 3(3), 538–543 (1993)
Chipot, M.: Variational Inequalities and Flow in Porous Media. Springer, New York (1984)
Cox, B., Juditsky, A., Nemirovski, A.: Dual subgradient algorithms for large-scale nonsmooth learning problems. Math. Program. Ser. B 148(1–2), 143–180 (2014)
Daniele, P.: Dual variational inequality and applications to asymmetric traffic equilibrium problems with capacity constraints. Le Matematiche XLIX(2), 111–211 (1994)
Demyanov, V., Rubinov, A.: Approximate Methods in Optimization Problems. Elsevier, Amsterdam (1970)
Dunn, J.C., Harshbarger, S.: Conditional gradient algorithms with open loop step size rules. J. Math. Anal. Appl. 62(2), 432–444 (1978)
Dudik, M., Harchaoui, Z., Malick, J.: Lifted coordinate descent for learning with trace-norm regularization. In: AISTATS-Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics, vol. 22, pp. 327–336 (2012)
Frank, M., Wolfe, P.: An algorithm for quadratic programming. Naval Res. Logist. Q. 3(1–2), 95–110 (1956)
Freund, R., Grigas, P.: New analysis and results for the Conditional Gradient method (2013). Submitted to Mathematical Programming. E-print: http://web.mit.edu/rfreund/www/FW-paper-final
Harchaoui, Z., Douze, M., Paulin, M., Dudik, M., Malick, J.: Large-scale image classification with trace-norm regularization. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3386–3393 (2012)
Harchaoui, Z., Juditsky, A., Nemirovski, A.: Conditional Gradient algorithms for norm-regularized smooth convex optimization. Mathematical Programming. Online First (2014). doi:10.1007/s10107-014-0778-9. E-print: arXiv:1302.2325
Jaggi, M.: Revisiting Frank–Wolfe: projection-free sparse convex optimization. In: Proceedings of the 30th International Conference on Machine Learning (ICML 2013), pp. 427–435 (2013)
Jaggi, M., Sulovsky, M.: A simple algorithm for nuclear norm regularized problems. In: Proceedings of the 27th International Conference on Machine Learning (ICML 2010), pp. 471–478 (2010)
Juditsky, A., Nemirovski, A.: First order methods for nonsmooth large-scale convex minimization, I: general purpose methods; II: utilizing problem’s structure. In: Sra, S., Nowozin, S., Wright, S. (eds.) Optimization for Machine Learning, pp. 121–184. The MIT Press, Cambridge (2012)
Juditsky, A., Kılınç-Karzan, F., Nemirovski, A.: Randomized first order algorithms with applications to \(\ell _1\)-minimization. Math. Program. Ser. A 142(1–2), 269–310 (2013)
Juditsky, A., Kılınç-Karzan, F., Nemirovski, A.: On unified view of nullspace-type conditions for recoveries associated with general sparsity structures. Linear Algebra Appl. 441(1), 124–151 (2014)
Lemarechal, C., Nemirovski, A., Nesterov, Yu.: New variants of bundle methods. Math. Program. 69(1), 111–148 (1995)
Mosco, U.: Dual variational inequalities. J. Math. Anal. Appl. 40, 202–206 (1972)
Nemirovski, A., Yudin, D.: Problem Complexity and Method Efficiency in Optimization. Nauka Publishers, Moscow (1978) (in Russian); John Wiley, New York (1983) (in English)
Nemirovskii, A.: Efficient iterative algorithms for variational inequalities with monotone operators. Ekonomika i Matematicheskie Metody 17(2), 344–359 (1981) (in Russian; English translation: Matekon)
Nemirovski, A.: Prox-method with rate of convergence \(O(1/t)\) for variational inequalities with Lipschitz continuous monotone operators and smooth convex-concave saddle point problems. SIAM J. Optim. 15, 229–251 (2004)
Nemirovski, A., Onn, S., Rothblum, U.: Accuracy certificates for computational problems with convex structure. Math. Oper. Res. 35, 52–78 (2010)
Nesterov, Yu., Nemirovski, A.: On first order algorithms for \(\ell _1\)/nuclear norm minimization. Acta Numer. 22, 509–575 (2013)
Pshenichnyi, B.N., Danilin, Y.M.: Numerical Methods in Extremal Problems. Mir Publishers, Moscow (1978)
Rockafellar, R.T.: Minimax theorems and conjugate saddle functions. Math. Scand. 14, 151–173 (1964)
Rockafellar, R.T.: Convex Analysis. Princeton University Press, Princeton (1970)
Shalev-Shwartz, S., Gonen, A., Shamir, O.: Large-scale convex minimization with a low-rank constraint (2011). E-print: arXiv:1106.1622
Acknowledgments
Research of the first author was supported by the CNRS-Mastodons project GARGANTUA, and the LabEx PERSYVAL-Lab (ANR-11-LABX-0025). Research of the second author was supported by the NSF grants CMMI 1232623 and CCF 1415498.
Proofs
1.1 Proof of Theorems 1 and 2
We start by proving Theorem 2. In the notation of the theorem, we have
For \(\bar{x}\in X\), let \(\bar{y}=y(\bar{x})\), and let \(\widehat{y}=\sum \nolimits _t\lambda _ty_t\), so that \(\bar{y},\widehat{y}\in Y\) by (52.a). Since \(G\) is monotone, for all \(t\in \{1,\ldots ,N\}\) we have
and we conclude that
We now have
The bottom line is that
as stated in (16). Theorem 2 is proved.
To prove Theorem 1, let \(y_t\in Y\), \(1\le t\le N\), and \(\lambda _1,\ldots ,\lambda _N\) be from the premise of the theorem, and let \(x_t\), \(1\le t\le N\), be specified as \(x_t=x(y_t)\), so that \(x_t\) is the minimizer of the linear form \(\langle Ay_t+a,x\rangle \) over \(x\in X\). Due to the latter choice, we have \(\sum _{t=1}^N\lambda _t\langle Ay_t+a,x_t-\bar{x}\rangle \le 0\) for all \(\bar{x}\in X\), while \(\epsilon \) as defined by (15) is nothing but \({\mathrm{Res}}(\{y_t,\lambda _t,-\Psi (x_t)\}_{t=1}^N|Y(X))\). Thus, (16) in the case in question implies that
and (13) follows. Relation (14) is an immediate corollary of (13) and Lemma 2 as applied to \(X\) in the role of \(Y\), \(\Phi \) in the role of \(H(\cdot )\), and \(\{x_t,\lambda _t,\Phi (x_t)\}_{t=1}^N\) in the role of \(\mathcal{C}^N\). \(\square \)
1.2 Proof of Proposition 1
Observe that the optimality conditions in the optimization problem specifying \(v={{\mathrm{Prox}}}_y(\xi )\) imply that
$$\langle \xi +\omega '(v)-\omega '(y),z-v\rangle \ge 0\quad \forall z\in Y,$$
or
$$\langle \xi ,v-z\rangle \le \langle \omega '(v)-\omega '(y),z-v\rangle \quad \forall z\in Y,$$
which, using a remarkable identity [5]
$$\langle \omega '(v)-\omega '(y),z-v\rangle =V_y(z)-V_v(z)-V_y(v),$$
can be rewritten equivalently as
$$\langle \xi ,v-z\rangle \le V_y(z)-V_v(z)-V_y(v)\quad \forall z\in Y. \qquad \mathrm{(54)}$$
Setting \(y=y_t\), \(\xi =\gamma _t H_t(y_t)\), which results in \(v=y_{t+1}\), we get
$$\gamma _t\langle H_t(y_t),y_{t+1}-z\rangle \le V_{y_t}(z)-V_{y_{t+1}}(z)-V_{y_t}(y_{t+1})\quad \forall z\in Y,$$
whence, by the strong convexity of \(\omega \) (so that \(V_{y_t}(y_{t+1})\ge {1\over 2}\Vert y_t-y_{t+1}\Vert ^2\)) and the Young inequality,
$$\gamma _t\langle H_t(y_t),y_t-z\rangle \le V_{y_t}(z)-V_{y_{t+1}}(z)+{\gamma _t^2\over 2}\Vert H_t(y_t)\Vert _*^2\quad \forall z\in Y.$$
Summing up these inequalities over \(t=1,\ldots ,N\) and taking into account that for \(z\in Y'\), we have \(V_{y_1}(z)\le \frac{1}{2}\Omega ^2[Y']\) and that \(V_{y_{N+1}}(z)\ge 0\), we get (19). \(\square \)
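For the reader's convenience, here is a minimal sketch, ours rather than the paper's, of the recurrence just analyzed, written for the standard entropy setup on the probability simplex, where the prox-mapping is explicit; the paper's \(Y\), \(\omega \) and stepsizes may of course differ.

```python
# Sketch (ours): the recurrence y_{t+1} = Prox_{y_t}(gamma_t H_t(y_t)) in the
# entropy setup on the simplex, where Prox has a closed form.
import numpy as np

def prox_entropy(y, xi):
    """Prox_y(xi) for omega(v) = sum_i v_i ln v_i on the simplex."""
    w = y * np.exp(-(xi - xi.min()))   # shift xi for numerical stability
    return w / w.sum()

def mirror_descent(H, y1, gammas):
    """Runs the recurrence and returns the gamma-weighted average of the y_t."""
    y, avg, total = y1.copy(), np.zeros_like(y1), 0.0
    for g in gammas:
        avg, total = avg + g * y, total + g
        y = prox_entropy(y, g * H(y))
    return avg / total
```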
1.3 Proof of Proposition 2
Applying (54) to \(y=y_t\), \(\xi =\gamma _t H_t(z_t)\), which results in \(v=y_{t+1}\), we get
$$\gamma _t\langle H_t(z_t),y_{t+1}-z\rangle \le V_{y_t}(z)-V_{y_{t+1}}(z)-V_{y_t}(y_{t+1})\quad \forall z\in Y,$$
whence, by the definition (23) of \(d_t\),
Summing up the resulting inequalities over \(t=1,\ldots ,N\) and taking into account that \(V_{y_1}(z)\le \frac{1}{2}\Omega ^2[Y']\) for all \(z\in Y'\) and \(V_{y_{N+1}}(z)\ge 0\), we get
The right hand side in the latter inequality is independent of \(z\in Y'\). Taking supremum of the left hand side over \(z\in Y'\), we arrive at (24).
Moreover, invoking (54) with \(y=y_t\), \(\xi =\gamma _t H_t(y_t)\), which results in \(v=z_t\), and specifying \(z\) as \(y_{t+1}\), we get
$$\gamma _t\langle H_t(y_t),z_t-y_{t+1}\rangle \le V_{y_t}(y_{t+1})-V_{z_t}(y_{t+1})-V_{y_t}(z_t),$$
whence
as required in (25). \(\square \)
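The two-prox recursion underlying this proof can be illustrated the same way. The sketch below is again ours and reuses prox_entropy from the previous snippet; in line with the substitutions above, \(\xi =\gamma _t H_t(y_t)\) produces the extrapolation point \(z_t\), and \(\xi =\gamma _t H_t(z_t)\) produces \(y_{t+1}\), as in the Mirror Prox scheme of [34].

```python
# Sketch (ours): the two-prox (extra-gradient) recursion of Mirror Prox type,
# reusing prox_entropy from the previous snippet.
def mirror_prox(H, y1, gammas):
    y, avg, total = y1.copy(), np.zeros_like(y1), 0.0
    for g in gammas:
        z = prox_entropy(y, g * H(y))        # extrapolation: v = z_t
        avg, total = avg + g * z, total + g  # approximate solution averages z_t
        y = prox_entropy(y, g * H(z))        # update: v = y_{t+1}
    return avg / total
```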
1.4 Proof of Lemma 3
1 \(^0\). We start with the following standard fact:
Lemma 4
Let \(Y\) be a nonempty closed convex set in Euclidean space \(F\), \(\Vert \cdot \Vert \) be a norm on \(F\), and \(\omega (\cdot )\) be a continuously differentiable function on \(Y\) which is strongly convex, modulus 1, w.r.t. \(\Vert \cdot \Vert \). Given \(b\in F\) and \(y\in Y\), let us set
The function \(g_y\) is convex with Lipschitz continuous gradient \(\nabla g_y(\xi )=-{{\mathrm{Prox}}}_y(\xi )\):
$$\Vert \nabla g_y(\xi ')-\nabla g_y(\xi '')\Vert \le \Vert \xi '-\xi ''\Vert _*\quad \forall \xi ',\xi ''\in F, \qquad \mathrm{(57)}$$
where \(\Vert \cdot \Vert _*\) is the norm conjugate to \(\Vert \cdot \Vert \).
Indeed, since \(\omega \) is strongly convex and continuously differentiable on \(Y\), \({{\mathrm{Prox}}}_y(\cdot )\) is well defined, and the optimality conditions imply that
$$\langle \xi +\omega '({{\mathrm{Prox}}}_y(\xi ))-\omega '(y),z-{{\mathrm{Prox}}}_y(\xi )\rangle \ge 0\quad \forall z\in Y,\ \forall \xi \in F. \qquad \mathrm{(58)}$$
Consequently, \(g_y(\cdot )\) is well defined; this function clearly is convex, and the vector \(-{{\mathrm{Prox}}}_y(\xi )\) clearly is a subgradient of \(g_y\) at \(\xi \). If now \(\xi ',\xi ''\in F\), then, setting \(z'={{\mathrm{Prox}}}_y(\xi ')\), \(z''={{\mathrm{Prox}}}_y(\xi '')\) and invoking (58), we get
$$\langle \xi '+\omega '(z')-\omega '(y),z''-z'\rangle \ge 0,\qquad \langle \xi ''+\omega '(z'')-\omega '(y),z'-z''\rangle \ge 0,$$
whence, summing the inequalities up,
$$\langle \xi ''-\xi ',z'-z''\rangle \ge \langle \omega '(z')-\omega '(z''),z'-z''\rangle \ge \Vert z'-z''\Vert ^2,$$
implying that \(\Vert z'-z''\Vert \le \Vert \xi '-\xi ''\Vert _*\). Thus, the subgradient field \(-{{\mathrm{Prox}}}_y(\cdot )\) of \(g_y(\cdot )\) is Lipschitz continuous with constant 1 from \(\Vert \cdot \Vert _*\) into \(\Vert \cdot \Vert \), whence \(g_y\) is continuously differentiable and (57) takes place. \(\square \)
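Lemma 4 is also easy to probe numerically. The check below, ours, takes the simplest instance in which everything is explicit: \(Y\) the Euclidean unit ball and \(\omega (z)={1\over 2}\Vert z\Vert _2^2\), so that \({{\mathrm{Prox}}}_y(\xi )\) is the metric projection of \(y-\xi \) onto \(Y\) and \(\Vert \cdot \Vert =\Vert \cdot \Vert _*=\Vert \cdot \Vert _2\).

```python
# Numerical check (ours) of the nonexpansiveness in (57) for the Euclidean
# setup: Prox_y(xi) = Proj_Y(y - xi) with Y the unit ball.
import numpy as np

def prox_euclidean(y, xi):
    v = y - xi
    return v / max(1.0, np.linalg.norm(v))    # projection onto the unit ball

rng = np.random.default_rng(1)
y = rng.standard_normal(5)
y /= 2 * np.linalg.norm(y)                    # a point of Y
for _ in range(1000):
    xi1, xi2 = rng.standard_normal(5), rng.standard_normal(5)
    lhs = np.linalg.norm(prox_euclidean(y, xi1) - prox_euclidean(y, xi2))
    assert lhs <= np.linalg.norm(xi1 - xi2) + 1e-12
```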
2 \(^0\). To derive Lemma 3 from Lemma 4, note that \(f_y(x)\) is obtained from \(g_y(\cdot )\) by affine substitution of variables and adding a linear form:
$$f_y(x)=g_y(\gamma [{G(y)}-A^*x])+\gamma \langle a,x\rangle ,$$
whence \(\nabla f_y(x)=-\gamma A\nabla g_y(\gamma [{G(y)}-A^*x])+\gamma a=\gamma A {{\mathrm{Prox}}}_y(\gamma [{G(y)}-A^*x])+\gamma a\), as required in (39), and
[we have used (57) and equivalences in (38)], as required in (40). \(\square \)
1.5 Review of Conditional Gradient algorithm
The required description of CGA and its complexity analysis are as follows.
As applied to minimizing a smooth convex function \(f\) over a convex compact set \(X\subset E\), smoothness meaning Lipschitz continuity of the gradient,
$$\Vert \nabla f(x)-\nabla f(x')\Vert _{E,*}\le \mathcal{L}\Vert x-x'\Vert _E\quad \forall x,x'\in X,$$
the generic CGA is the recurrence of the form
$$u_{t+1}=u_t+{2\over t+1}\big (u^+_t-u_t\big ),\qquad u^+_t\in \mathop {\mathrm{Argmin}}_{u\in X}\langle \nabla f(u_t),u\rangle .$$
The standard results on this recurrence (see, e.g., proof of Theorem 1 in [15]) state that if \(f_*=\min _X f\), then
where \(R\) is the smallest of the radii of \(\Vert \cdot \Vert _E\)-balls containing \(X\). From (59.\(a\)) it follows that
summing up these inequalities over \(\tau =t,t+1,\ldots ,2t\), where \(t>1\), we get
which combines with (59.\(b\)) to imply that
It follows that given \(\epsilon <\mathcal{L}R^2\), it takes at most \(O(1){\mathcal{L}R^2\over \epsilon }\) steps of CGA to generate a point \(u^\epsilon \in X\) with \(\max _{u\in X} \langle \nabla f(u^\epsilon ),u^\epsilon -u\rangle \le \epsilon \).
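To close the review, here is a compact sketch, ours, of a generic CGA with the open-loop stepsizes \(\gamma _\tau =2/(\tau +1)\) in the spirit of [10, 21]; note that the very LMO call that drives the method also supplies the accuracy certificate \(\max _{u\in X}\langle \nabla f(u_\tau ),u_\tau -u\rangle \) appearing in the bound above.

```python
# Sketch (ours) of generic CGA with open-loop steps; lmo(g) is assumed to
# return argmin_{u in X} <g, u>.
import numpy as np

def cga(grad_f, lmo, u0, n_steps, eps):
    u, gap = u0.copy(), np.inf
    for tau in range(1, n_steps + 1):
        g = grad_f(u)
        v = lmo(g)                           # the only access to X
        gap = np.dot(g, u - v)               # = max_u <grad f(u_tau), u_tau - u>
        if gap <= eps:                       # certificate: gap >= f(u) - f_*
            break
        u = u + (2.0 / (tau + 1)) * (v - u)  # open-loop stepsize rule
    return u, gap
```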