Abstract
This paper develops Jellyfish, an algorithm for solving data-processing problems with matrix-valued decision variables regularized to have low rank. Particular examples of problems solvable by Jellyfish include matrix completion problems and least-squares problems regularized by the nuclear norm or \(\gamma _2\)-norm. Jellyfish implements a projected incremental gradient method with a biased, random ordering of the increments. This biased ordering allows for a parallel implementation that admits a speed-up nearly proportional to the number of processors. On large-scale matrix completion tasks, Jellyfish is orders of magnitude more efficient than existing codes. For example, on the Netflix Prize data set, prior art computes rating predictions in approximately 4 hours, while Jellyfish solves the same problem in under 3 minutes on a 12-core workstation.
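To make the abstract's core idea concrete, the following is a minimal serial sketch of an incremental gradient pass for matrix completion in the factored form the paper works with. It is an assumption-laden illustration, not the paper's implementation: the function name and parameters (`factored_sgd`, `mu`, `step`) are invented for this sketch, and Jellyfish's distinguishing features, the biased random ordering of increments and the lock-free parallel execution, are deliberately omitted.

```python
import numpy as np

def factored_sgd(entries, shape, rank=5, mu=0.1, step=0.05, epochs=50, seed=0):
    """Serial incremental-gradient sketch for matrix completion.

    Approximately minimizes
        sum_{(i,j) observed} (A_ij - u_i . v_j)^2 + mu * (||U||_F^2 + ||V||_F^2)
    by stochastic gradient steps over a random ordering of the observed
    entries. (Jellyfish itself uses a biased ordering so that the steps
    can run on many cores without locking; that is omitted here.)
    """
    entries = np.asarray(entries, dtype=float)  # rows are (i, j, A_ij)
    m, n = shape
    rng = np.random.default_rng(seed)
    U = rng.standard_normal((m, rank)) / np.sqrt(rank)
    V = rng.standard_normal((n, rank)) / np.sqrt(rank)
    for _ in range(epochs):
        rng.shuffle(entries)             # fresh random order each pass
        for i, j, a in entries:
            i, j = int(i), int(j)
            r = U[i] @ V[j] - a          # residual at this observed entry
            gU = r * V[j] + mu * U[i]    # gradient w.r.t. row u_i
            gV = r * U[i] + mu * V[j]    # gradient w.r.t. row v_j
            U[i] -= step * gU
            V[j] -= step * gV
    return U, V
```

Because each step touches only one row of `U` and one row of `V`, steps on entries with disjoint rows and columns commute, which is exactly the structure a biased ordering can exploit to run increments in parallel.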
Notes
We were also unable to find any implementations of similar algorithms written in a low-level language such as C. While coding directly in C or Fortran would likely yield a version of NNLS which is considerably faster than the Matlab version, we doubt that it would yield the hundred-fold speedups necessary to be competitive with Jellyfish.
Acknowledgments
This work was supported in part by ONR Contract N00014-11-M-0478. BR is additionally supported by ONR award N00014-11-1-0723 and NSF award CCF-1139953. CR is additionally supported by the Air Force Research Laboratory (AFRL) under prime contract no. FA8750-09-C-0181, the NSF CAREER award under IIS-1054009, ONR award N000141210041, and gifts or research awards from Google, Greenplum, Johnson Controls, Inc., LogicBlox, and Oracle. Any opinions, findings, and conclusions or recommendations expressed in this work are those of the authors and do not necessarily reflect the views of any of the above sponsors, including DARPA, AFRL, or the US government.
Recht, B., Ré, C. Parallel stochastic gradient algorithms for large-scale matrix completion. Math. Prog. Comp. 5, 201–226 (2013). https://doi.org/10.1007/s12532-013-0053-8