A framework for parallel second order incremental optimization algorithms for solving partially separable problems

Kaya, Kamer; Öztoprak, Figen; Birbil, Ş. İlker; Cemgil, A. Taylan; Şimşekli, Umut; Kuru, Nurdan; Koptagel, Hazal; Öztürk, M. Kaan

doi:10.1007/s10589-018-00057-7

A framework for parallel second order incremental optimization algorithms for solving partially separable problems

Published: 02 January 2019

Volume 72, pages 675–705, (2019)
Cite this article

Computational Optimization and Applications Aims and scope Submit manuscript

Kamer Kaya¹,
Figen Öztoprak²,
Ş. İlker Birbil ORCID: orcid.org/0000-0001-7472-7032³,
A. Taylan Cemgil⁴,
Umut Şimşekli⁵,
Nurdan Kuru¹,
Hazal Koptagel⁴ &
…
M. Kaan Öztürk¹

596 Accesses
1 Citation
3 Altmetric
Explore all metrics

Abstract

We propose Hessian Approximated Multiple Subsets Iteration (HAMSI), which is a generic second order incremental algorithm for solving large-scale partially separable convex and nonconvex optimization problems. The algorithm is based on a local quadratic approximation, and hence, allows incorporating curvature information to speed-up the convergence. HAMSI is inherently parallel and it scales nicely with the number of processors. We prove the convergence properties of our algorithm when the subset selection step is deterministic. Combined with techniques for effectively utilizing modern parallel computer architectures, we illustrate that a particular implementation of the proposed method based on L-BFGS updates converges more rapidly than a parallel gradient descent when both methods are used to solve large-scale matrix factorization problems. This performance gain comes only at the expense of using memory that scales linearly with the total size of the optimization variables. We conclude that HAMSI may be considered as a viable alternative in many large scale problems, where first order methods based on variants of gradient descent are applicable.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Fig. 1

Fig. 2

Fig. 3

The Frank-Wolfe Algorithm: A Short Introduction

Article Open access 13 December 2023

A New Insight on Augmented Lagrangian Method with Applications in Machine Learning

Article 13 April 2024

Preconditioned golden ratio primal-dual algorithm with linesearch

Article 16 April 2024

Notes

https://github.com/spartensor/hamsi-mf.
An already used random color is chosen if possible, otherwise, a new color is chosen.
cf. http://www.7-cpu.com/.

References

Berahas, A.S., Nocedal, J., Takáč, M.: A multi-batch L-BFGS method for machine learning. In: Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, December 5–10, 2016, Barcelona, Spain, pp. 1055–1063 (2016)
Bertsekas, D.P.: Incremental least squares methods and the extended Kalman filter. SIAM J. Optim. 6(3), 807–822 (1996)
Article MathSciNet MATH Google Scholar
Bertsekas, D.P.: Incremental gradient, subgradient, and proximal methods for convex optimization: a survey. Optim. Mach. Learn. 1–38, 2011 (2010)
Google Scholar
Blatt, D., Hero, A.O., Gauchman, H.: A convergent incremental gradient method with a constant step size. SIAM J. Optim. 18(1), 29–51 (2007)
Article MathSciNet MATH Google Scholar
Bozdağ, D., Çatalyürek, Ü.V., Gebremedhin, A.H., Manne, F., Boman, E.G., Özgüner, F.: Distributed-memory parallel algorithms for distance-2 coloring and related problems in derivative computation. SIAM J. Sci. Comput. 32(4), 2418–2446 (2010)
Article MathSciNet MATH Google Scholar
Byrd, R.H., Hansen, S.L., Nocedal, J., Singer, Y.: A stochastic quasi-Newton method for large-scale optimization. SIAM J. Optim. 26(2), 1008–1031 (2016)
Article MathSciNet MATH Google Scholar
Byrd, R.H., Nocedal, J., Schnabel, R.B.: Representations of quasi-Newton matrices and their use in limited memory methods. Math. Program. 63(1–3), 129–156 (1994)
Article MathSciNet MATH Google Scholar
Cichocki, A., Zdunek, R., Phan, A.H., Amari, S.: Nonnegative Matrix and Tensor Factorization. Wiley, New York (2009)
Book MATH Google Scholar
Daneshmand, A., Facchinei, F., Kungurtsev, V., Scutari, G.: Hybrid random/deterministic parallel algorithms for convex and nonconvex big data optimization. IEEE Trans. Signal Process. 63(15), 3914–3929 (2015)
Article MathSciNet MATH Google Scholar
Facchinei, F., Scutari, G., Sagratella, S.: Parallel selective algorithms for nonconvex big data optimization. IEEE Trans. Signal Process. 63(7), 1874–1889 (2015)
Article MathSciNet MATH Google Scholar
Gebremedhin, A.H., Manne, F., Pothen, A.: Parallel distance-k coloring algorithms for numerical optimization. In: Euro-Par 2002 Parallel Processing—8th International Conference, pp. 912–921 (2002)
Gebremedhin, A.H., Manne, F., Pothen, A.: What color is your Jacobian? Graph coloring for computing derivatives. SIAM Rev. 47(4), 629–705 (2005)
Article MathSciNet MATH Google Scholar
Gebremedhin, A.H., Nguyen, D., Patwary, MdMA, Pothen, A.: ColPack: software for graph coloring and related problems in scientific computing. ACM Trans. Math. Softw. 40(1), 1:1–1:31 (2013)
Article MathSciNet MATH Google Scholar
Gemulla, R., Nijkamp, E., Haas, P.J., Sismanis, Y.: Large-scale matrix factorization with distributed stochastic gradient descent. In: ACM SIGKDD (2011)
Gower, R.M., Goldfarb, D., Richtárik, P.: Stochastic block BFGS: squeezing more curvature out of data. In: Balcan, M.F., Weinberger, K.Q. (eds.) Proceedings of the 33rd International Conference on Machine Learning, volume 48 of Proceedings of Machine Learning Research, New York, USA, 20–22 June 2016. PMLR, pp. 1869–1878
Gürbüzbalaban, M., Ozdaglar, A., Parrilo, P.: A globally convergent incremental Newton method. Math. Program. 151(1), 283–313 (2015)
Article MathSciNet MATH Google Scholar
Harper, F.M., Konstan, J.A.: The movielens datasets: history and context. ACM Trans. Interact. Intell. Syst. 5(4), 19:1–19:19 (2015)
Article Google Scholar
Kschischang, F.R., Frey, B.J., Loeliger, H.-A.: Factor graphs and the sum–product algorithm. IEEE Trans. Inf. Theory 47(2), 498–519 (2001)
Article MathSciNet MATH Google Scholar
Lian, X., Huang, Y., Li, Y., Liu, J.: Asynchronous parallel stochastic gradient for nonconvex optimization. In: Advances in Neural Information Processing Systems, pp. 2737–2745 (2015)
Liu, J., Wright, S.J., Ré, C., Bittorf, V., Sridhar, S.: An asynchronous parallel stochastic coordinate descent algorithm. J. Mach. Learn. Res. 16(1), 285–322 (2015)
MathSciNet MATH Google Scholar
Mangasarian, O.L., Solodov, M.V.: Serial and parallel backpropation convergence via nonmonotone perturbed minimization. Optim. Methods Softw. 4, 103–116 (1994)
Article Google Scholar
Mareček, J., Richtárik, P., Takáč, M.: Distributed block coordinate descent for minimizing partially separable functions. In: Al-Baali, M., Grandinetti, L., Purnama, A. (eds.) Numerical Analysis and Optimization, pp. 261–288. Springer, Berlin (2015)
Chapter Google Scholar
Matula, D.W.: A min-max theorem for graphs with application to graph coloring. SIAM Rev. 10, 481–482 (1968)
Google Scholar
Mokhtari, A., Eisen, M., Ribeiro, A.: IQN: an incremental quasi-Newton method with local superlinear convergence rate. arXiv preprint arXiv:1702.00709 (2017)
Moritz, P., Nishihara, R., Jordan, M.I.: A linearly-convergent stochastic L-BFGS algorithm. In: Artificial Intelligence and Statistics, pp. 249–258 (2016)
Mota, J.F.C., Xavier, J.M.F., Aguiar, P.M.Q., Püschel, M.: D-ADMM: a communication-efficient distributed algorithm for separable optimization. IEEE Trans. Signal Process. 61(10), 2718–2723 (2013)
Article MathSciNet MATH Google Scholar
Pan, X., Lam, M., Tu, S., Papailiopoulos, D., Zhang, S., Jordan, M.I., Ramchandran, K., Ré, C.: Cyclades: conflict-free asynchronous machine learning. In: Lee, D.D., Sugiyama, M., Luxburg, U.V., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29, pp. 2568–2576. Curran Associates Inc, Red Hook (2016)
Google Scholar
Recht, B., Re, C., Wright, S., Feng, N.: HOGWILD: a lock-free approach to parallelizing stochastic gradient descent. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems, vol. 24, pp. 693–701. Curran Associates Inc., Red Hook (2011)
Google Scholar
Richtárik, P., Takáč, M.: Parallel coordinate descent methods for big data optimization. Math. Program. 156(1–2), 433–484 (2016)
Article MathSciNet MATH Google Scholar
Roux, N.L., Schmidt, M., Bach, F.R.: A stochastic gradient method with an exponential convergence\_rate for finite training sets. In: Pereira, F., Burges, C.J.C., Bottou, L., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems, pp. 2663–2671. Curran Associates Inc., Red Hook (2012)
Google Scholar
Scherrer, C., Halappanavar, M., Tewari, A., Haglin, D.: Scaling up coordinate descent algorithms for large $\ell _1$ regularization problems. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012, Edinburgh, Scotland, UK, June 26–July 1 (2012)
Schraudolph, N.N., Yu, J., Gunter, S.: A stochastic quasi-Newton method for online convex optimization. In: Proceedings of the 11th International Conference Artificial Intelligence and Statistics (AISTATS), pp. 433–440 (2007)
Shamir, O., Srebro, N., Zhang, T.: Communication efficient distributed optimization using an approximate Newton-type method. In: International Conference on Machine Learning (ICML) (2014)
Singh, A.P., Gordon, G.J.: A unified view of matrix factorization models. In: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, Part II, number 5212, pp. 358–373. Springer, Berlin (2008)
Sohl-Dickstein, J., Poole, B., Ganguli, S.: Fast large-scale optimization by unifying stochastic gradient and quasi-Newton methods. In: Proceedings of the 31th International Conference on Machine Learning (ICML), pp. 604–612 (2014)
Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comput. Optim. Appl. 11(1), 23–35 (1998)
Article MathSciNet MATH Google Scholar
Tseng, P.: An incremental gradient (-projection) method with momentum term and adaptive stepsize rule. SIAM J. Optim. 8(2), 506–531 (1998)
Article MathSciNet MATH Google Scholar
Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-Newton methods for nonconvex stochastic optimization. SIAM J. Optim. 27(2), 927–956 (2017)
Article MathSciNet MATH Google Scholar
Yousefian, F., Nedić, A., Shanbhag, U.V.: Stochastic quasi-Newton methods for non-strongly convex problems: convergence and rate analysis. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4496–4503. IEEE (2016)
Zuckerman, D.: Linear degree extractors and the inapproximability of max clique and chromatic number. Theory Comput. 3, 103–128 (2007)
Article MathSciNet MATH Google Scholar

Download references

Acknowledgements

This work is supported by the Scientific and Technological Research Council of Turkey (TUBITAK) Grant No. 113M492.

Author information

Authors and Affiliations

Faculty of Engineering and Natural Sciences, Sabanci University, 34956, Istanbul, Turkey
Kamer Kaya, Nurdan Kuru & M. Kaan Öztürk
Department of Industrial Engineering, Istanbul Bilgi University, 34060, Istanbul, Turkey
Figen Öztoprak
Econometric Institute, Erasmus University Rotterdam, 3000 DR, Rotterdam, The Netherlands
Ş. İlker Birbil
Department of Computer Engineering, Boğaziçi University, 34342, Istanbul, Turkey
A. Taylan Cemgil & Hazal Koptagel
LTCI, Télécom ParisTech, Université Paris-Saclay, 75013, Paris, France
Umut Şimşekli

Authors

Kamer Kaya
View author publications
You can also search for this author in PubMed Google Scholar
Figen Öztoprak
View author publications
You can also search for this author in PubMed Google Scholar
Ş. İlker Birbil
View author publications
You can also search for this author in PubMed Google Scholar
A. Taylan Cemgil
View author publications
You can also search for this author in PubMed Google Scholar
Umut Şimşekli
View author publications
You can also search for this author in PubMed Google Scholar
Nurdan Kuru
View author publications
You can also search for this author in PubMed Google Scholar
Hazal Koptagel
View author publications
You can also search for this author in PubMed Google Scholar
M. Kaan Öztürk
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Ş. İlker Birbil.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A: Omitted proofs

1.1 Proof of Lemma 1

By using Assumption A.2, we have

$$\begin{aligned} \begin{array}{rl} \left\| \nabla f_{S_{[l]}}(x^{(t,l-1)}) - \nabla f_{S_{[l]}}(x^{(t)}) \right\| &{} \le L\Vert x^{(t,l-1)} - x^{(t)} \Vert \\ &{} = L\Vert x^{(t,l-1)} - x^{(t, l-2)} + x^{(t, l-2)} - x^{(t, l-3)} \\ &{}\quad + \dots + x^{(t,1)} - x^{(t)}\Vert \\ &{} \le L \sum \limits _{j=1}^{\ell -1}\Vert x^{(t,j)} - x^{(t, j-1)}\Vert . \end{array} \end{aligned}$$

Note for $j=1, \dots , \ell -1$ that

$$\begin{aligned} \left\| x^{(t,j)} - x^{(t, j-1)}\right\| = \left\| x^{(t,j-1)} - (H^{(t)} + \beta _tI)^{-1}\nabla f_{S_{[j]}}(x^{(t,j-1)}) - x^{(t, j-1)}\right\| \le M_t C, \end{aligned}$$

where the last inequality holds by Assumption A.4. Therefore, we have

$$\begin{aligned} \left\| \nabla f_{S_{[l]}}(x^{(t,l-1)}) - \nabla f_{S_{[l]}}(x^{(t)}) \right\| \le LM_tC(\ell - 1). \end{aligned}$$

1.2 Proof of Lemma 2

At iteration $t + 1$, we have

$$\begin{aligned} \begin{array}{rl} x^{(t+1)} &{} = x^{(t)} - \sum \limits _{\ell =1}^{K+1} \left( H^{(t)} + \beta _tI\right) ^{-1}\nabla f_{S_{[\ell ]}}(x^{(t, \ell -1)}) \\ &{} = x^{(t)} - (H^{(t)} + \beta _tI)^{-1}\nabla f(x^{(t)}) \\ &{} \quad +\, (H^{(t)} + \beta _tI)^{-1}\sum \limits _{\ell =1}^{K+1} \left( \nabla f_{S_{[\ell ]}}(x^{(t)}) - \nabla f_{S_{[\ell ]}}(x^{(t,\ell -1)})\right) . \end{array} \end{aligned}$$

This shows that

$$\begin{aligned} x^{(t+1)} - x^{(t)} = \Delta _t - (H^{(t)} + \beta _tI)^{-1}\nabla f(x^{(t)}), \end{aligned}$$

(11)

where

$$\begin{aligned} \Delta _t\equiv (H^{(t)} + \beta _tI)^{-1}\sum _{\ell =1}^{K+1} \left( \nabla f_{S_{[\ell ]}}(x^{(t)}) - \nabla f_{S_{[\ell ]}}(x^{(t,\ell -1)})\right) . \end{aligned}$$

Using now (9) implies

$$\begin{aligned} \begin{array}{rl} \Vert \Delta _t\Vert &{} \le M_t\sum \limits _{\ell =1}^{K+1} \left\| \nabla f_{S_{[\ell ]}}(x^{(t)}) - \nabla f_{S_{[\ell ]}}(x^{(t,\ell -1)})\right\| \\ &{} \le M_t\sum \limits _{\ell =1}^{K+1}LM_tC(\ell -1) = \frac{1}{2}LM^2_tC K(K+1) = BM_t^2. \\ \end{array} \end{aligned}$$

(12)

Then, we obtain

$$\begin{aligned} \begin{array}{rl} \Vert x^{(t+1)} - x^{(t)}\Vert &{} = \Vert \Delta _t - (H^{(t)} + \beta _tI)^{-1}\nabla f(x^{(t)})\Vert \\ &{} \le \Vert \Delta _t\Vert + \Vert (H^{(t)} + \beta _tI)^{-1}\Vert \nabla f(x^{(t)})\Vert \\ &{} \le BM_t^2 + CM_t \le \frac{B + C(M+1)}{M+1}M_t. \end{array} \end{aligned}$$

1.3 Proof of Theorem 3

As f is a twice differentiable function, we have

$$\begin{aligned} f(x^{(t+1)}) - f(x^{(t)}) \le {\nabla f(x^{(t)})}^{T} (x^{(t+1)} - x^{(t)}) + \frac{LK}{2} \Vert x^{(t+1)} - x^{(t)}\Vert ^2. \end{aligned}$$

Using now Lemma 2 along with (11) and (12), we obtain

$$\begin{aligned} \begin{array}{rl} f(x^{(t+1)}) - f(x^{(t)}) &{} \le \nabla f(x^{(t)})^T \Delta _t - \nabla f(x^{(t)})^T (H^{(t)} \\ &{}\quad + \beta _tI)^{-1}\nabla f(x^{(t)}) + \frac{LK}{2}\Vert x^{(t+1)} - x^{(t)}\Vert ^2 \\ &{} \le \Vert \nabla f(x^{(t)})\Vert \Vert |\Delta _t\Vert - U_t\Vert \nabla f(x^{(t)}) \Vert ^2 \\ &{}\quad + \frac{LK}{2}\left( \frac{B + C(M+1)}{M+1}\right) ^2M^2_t \\ &{} \le - U_t\Vert \nabla f(x^{(t)}) \Vert ^2 + \bar{B}M^2_t, \end{array} \end{aligned}$$

(13)

where $\bar{B} \equiv CB + \frac{LK}{2}\left( \frac{B + C(M+1)}{M+1}\right) ^2$. Due to Assumption A.1, we can write $\inf _{x \in \mathbb {R}^n} f(x) = f^* > -\infty $. Thus, we obtain

$$\begin{aligned} 0 \le f(x^{(t+1)}) - f^* \le f(x^{(t)}) - f^* + \bar{B}M^2_t. \end{aligned}$$

Relation (10) and Lemma 2.2 in Mangasarian and Solodov [21] together show that the sequence $\{f(x^{(t)})\}$ converges. By using (13), we further have

$$\begin{aligned} \begin{array}{rl} f(x^{(1)}) - f^* &{} \ge f(x^{(1)}) - f(x^{(t)}) = \sum \limits _{j=1}^{t-1}\left( f(x^{(t)}) - f(x^{(t+1)})\right) \\ &{} \ge \sum \limits _{j=1}^{t-1}U_j \Vert \nabla f(x^{(j)}) \Vert ^2 - \bar{B}\sum \limits _{j=1}^{t-1}M^2_j \\ &{} \ge \underset{1 \le j \le t-1}{\inf } \Vert \nabla f(x^{(j)}) \Vert ^2 \sum \limits _{j=1}^{t-1}U_j - \bar{B}\sum \limits _{j=1}^{t-1}M^2_j. \end{array} \end{aligned}$$

Let now $t \rightarrow \infty $, then

$$\begin{aligned} f(x^{(1)}) - f^* \ge \inf _{j \ge 1}\Vert \nabla f(x^{(j)}) \Vert ^2 \sum _{j=1}^{\infty }U_j - \bar{B}\sum _{j=1}^{\infty }M^2_j. \end{aligned}$$

(14)

Using again Assumption A.1 and conditon (10), we obtain

$$\begin{aligned} \inf _{t \ge 1}\Vert \nabla f(x^{(t)}) \Vert = 0. \end{aligned}$$

(15)

Now, suppose for contradiction that the sequence $\{\nabla f(x^{(t)})\}$ does not converge to zero. Then, there exists an increasing sequence of integers such that for some $\varepsilon > 0$, we have $\Vert \nabla f(x^{(t_\tau )})\Vert \ge \varepsilon $ for all $\tau $. On the other hand, the relation (15) implies that there exist some $j > t_\tau $ such that $\Vert \nabla f(x^{(j)})\Vert \le \frac{\varepsilon }{2}$. Let $j_\tau $ be the least integer for each $\tau $ satisfying these inequalities. Then, we have

$$\begin{aligned} \begin{array}{rl} \frac{\varepsilon }{2} &{} \le \Vert \nabla f(x^{(t_\tau )})\Vert - \Vert \nabla f(x^{(j_\tau )})\Vert \\ &{} \le \Vert \nabla f(x^{(t_\tau )}) - \nabla f(x^{(j_\tau )})\Vert \\ &{} \le LK\Vert x^{(t_\tau )} - x^{(j_\tau )}\Vert \le LK\frac{B+C(M+1)}{M+1}\sum \limits _{k=t_\tau }^{j_\tau -1} M_k, \end{array} \end{aligned}$$

where the last inequality follows from Lemma 2. Since $0 < M \le U$, there exists $\zeta \le \frac{M}{U} \le 1$ such that

$$\begin{aligned} M + \beta _k \ge \zeta U + \beta _k \ge \zeta (U + \beta _k) \implies M_k \le \frac{1}{\zeta } U_k. \end{aligned}$$

Thus, we obtain

$$\begin{aligned} 0 < \hat{B} \equiv \frac{\varepsilon (M+1)\zeta }{2LK(B+C(M+1))} \le \sum _{k=t_\tau }^{j_\tau - 1} U_k. \end{aligned}$$

Then, using together with inequality (13), we obtain

$$\begin{aligned} \begin{array}{rl} f(x^{(t_\tau )}) - f(x^{(j_\tau )}) &{} \ge \sum \limits _{k=t_\tau }^{j_\tau -1} U_k\Vert \nabla f(x^{(k)})\Vert ^2 - \bar{B}\sum \limits _{k=t_\tau }^{j_\tau -1}M_k^2 \\ &{} \ge \hat{B} \underset{t_\tau \le k \le j_\tau -1}{\inf } \Vert \nabla f(x^{(k)})\Vert ^2 - \bar{B}\sum \limits _{k=t_\tau }^{\infty }M_k^2. \end{array} \end{aligned}$$

Since the left-hand-side of the inequality converges and the condition (10) holds, we have

$$\begin{aligned} \lim _{\tau \uparrow \infty }~\underset{t_\tau \le k \le j_\tau -1}{\inf } \Vert \nabla f(x^{(k)})\Vert ^2 = 0. \end{aligned}$$

(16)

But our choice of $t_\tau $ and $j_\tau $ guarantees $\Vert \nabla f(x^{(k)})\Vert > \frac{\varepsilon }{2}$ for all $t_\tau \le k \le j_\tau $, and hence, we arrive at a contradiction with (16). Therefore, ${\nabla f(x^{(t)})}$ converges, and with the continuity of the gradient, we conclude for each accumulation point $x^*$ of the sequence $\{x^{(t)}\}$ that $\nabla f(x^*) = 0$ holds.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Kaya, K., Öztoprak, F., Birbil, Ş.İ. et al. A framework for parallel second order incremental optimization algorithms for solving partially separable problems. Comput Optim Appl 72, 675–705 (2019). https://doi.org/10.1007/s10589-018-00057-7

Download citation

Received: 15 November 2017
Published: 02 January 2019
Issue Date: 15 April 2019
DOI: https://doi.org/10.1007/s10589-018-00057-7

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

A framework for parallel second order incremental optimization algorithms for solving partially separable problems

Abstract

Access this article

Similar content being viewed by others

The Frank-Wolfe Algorithm: A Short Introduction

A New Insight on Augmented Lagrangian Method with Applications in Machine Learning

Preconditioned golden ratio primal-dual algorithm with linesearch

Notes

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Appendix A: Omitted proofs

1.1 Proof of Lemma 1

1.2 Proof of Lemma 2

1.3 Proof of Theorem 3

Rights and permissions

About this article

Cite this article

Keywords

Navigation

A framework for parallel second order incremental optimization algorithms for solving partially separable problems

Abstract

Access this article

Similar content being viewed by others

The Frank-Wolfe Algorithm: A Short Introduction

A New Insight on Augmented Lagrangian Method with Applications in Machine Learning

Preconditioned golden ratio primal-dual algorithm with linesearch

Notes

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Appendix A: Omitted proofs

Appendix A: Omitted proofs

1.1 Proof of Lemma 1

1.2 Proof of Lemma 2

1.3 Proof of Theorem 3

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation