Abstract
This paper develops Jellyfish, an algorithm for solving data-processing problems with matrix-valued decision variables regularized to have low rank. Particular examples of problems solvable by Jellyfish include matrix completion problems and least-squares problems regularized by the nuclear norm or \(\gamma _2\)-norm. Jellyfish implements a projected incremental gradient method with a biased, random ordering of the increments. This biased ordering allows for a parallel implementation that admits a speed-up nearly proportional to the number of processors. On large-scale matrix completion tasks, Jellyfish is orders of magnitude more efficient than existing codes. For example, on the Netflix Prize data set, prior art computes rating predictions in approximately 4 hours, while Jellyfish solves the same problem in under 3 minutes on a 12-core workstation.
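To make the abstract's core idea concrete, the following is a minimal serial sketch of an incremental gradient pass for matrix completion in the factored form the paper works with. It is an assumption-laden illustration, not the paper's implementation: the function name and parameters (`factored_sgd`, `mu`, `step`) are invented for this sketch, and Jellyfish's distinguishing features, the biased random ordering of increments and the lock-free parallel execution, are deliberately omitted.

```python
import numpy as np

def factored_sgd(entries, shape, rank=5, mu=0.1, step=0.05, epochs=50, seed=0):
    """Serial incremental-gradient sketch for matrix completion.

    Approximately minimizes
        sum_{(i,j) observed} (A_ij - u_i . v_j)^2 + mu * (||U||_F^2 + ||V||_F^2)
    by stochastic gradient steps over a random ordering of the observed
    entries. (Jellyfish itself uses a biased ordering so that the steps
    can run on many cores without locking; that is omitted here.)
    """
    entries = np.asarray(entries, dtype=float)  # rows are (i, j, A_ij)
    m, n = shape
    rng = np.random.default_rng(seed)
    U = rng.standard_normal((m, rank)) / np.sqrt(rank)
    V = rng.standard_normal((n, rank)) / np.sqrt(rank)
    for _ in range(epochs):
        rng.shuffle(entries)             # fresh random order each pass
        for i, j, a in entries:
            i, j = int(i), int(j)
            r = U[i] @ V[j] - a          # residual at this observed entry
            gU = r * V[j] + mu * U[i]    # gradient w.r.t. row u_i
            gV = r * U[i] + mu * V[j]    # gradient w.r.t. row v_j
            U[i] -= step * gU
            V[j] -= step * gV
    return U, V
```

Because each step touches only one row of `U` and one row of `V`, steps on entries with disjoint rows and columns commute, which is exactly the structure a biased ordering can exploit to run increments in parallel.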
Notes
We were also unable to find any implementations of similar algorithms written in a low-level language such as C. While coding directly in C or Fortran would likely yield a version of NNLS which is considerably faster than the Matlab version, we doubt that it would yield the hundred-fold speedups necessary to be competitive with Jellyfish.
Acknowledgments
This work was supported in part by ONR Contract N00014-11-M-0478. BR is additionally supported by ONR award N00014-11-1-0723 and NSF award CCF-1139953. CR is additionally supported by the Air Force Research Laboratory (AFRL) under prime contract no. FA8750-09-C-0181, the NSF CAREER award under IIS-1054009, ONR award N000141210041, and gifts or research awards from Google, Greenplum, Johnson Controls, Inc., LogicBlox, and Oracle. Any opinions, findings, and conclusions or recommendations expressed in this work are those of the authors and do not necessarily reflect the views of any of the above sponsors, including DARPA, AFRL, or the US government.
Recht, B., Ré, C. Parallel stochastic gradient algorithms for large-scale matrix completion. Math. Prog. Comp. 5, 201–226 (2013). https://doi.org/10.1007/s12532-013-0053-8