Machine Learning, Volume 106, Issue 3, pp 419–457

Online optimization for max-norm regularization

Abstract

The max-norm regularizer has been extensively studied in the last decade, as it promotes an effective low-rank estimation of the underlying data. However, max-norm regularized problems are typically formulated and solved in a batch manner, which prevents them from processing big data due to possible memory bottlenecks. In this paper, we therefore propose an online algorithm that is scalable to large problems. In particular, we consider the matrix decomposition problem as an example, although a simple variant of the algorithm and analysis can be adapted to other important problems, such as matrix completion. The crucial technique in our implementation is to reformulate the max-norm into an equivalent matrix factorization form, where the factors consist of a (possibly overcomplete) basis component and a coefficient component. In this way, we may maintain the basis component in memory and alternately optimize over it and the coefficients for each sample. Since the size of the basis component is independent of the sample size, our algorithm is appealing when manipulating a large collection of samples. We prove that the sequence of solutions (i.e., the basis component) produced by our algorithm converges asymptotically to a stationary point of the expected loss function. Numerical studies demonstrate encouraging results for the robustness of our algorithm compared to widely used nuclear norm solvers.
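The scheme sketched in the abstract — keep a fixed-size basis in memory and, for each incoming sample, alternate between fitting that sample's coefficients and updating the basis — can be illustrated as follows. This is a minimal sketch, not the authors' exact algorithm: it assumes the standard max-norm factorization bound (the max-norm of X = LRᵀ is controlled by the largest row norms of L and R), uses a simple ridge fit for the coefficients, and takes a projected stochastic gradient step on the basis; all function names, step sizes, and radii are illustrative.

```python
import numpy as np

def fit_coefficients(L, z, lam=0.1):
    """With the basis L fixed, fit one sample's coefficient vector:
    min_r 0.5 * ||z - L r||^2 + (lam / 2) * ||r||^2  (closed-form ridge solve)."""
    d = L.shape[1]
    return np.linalg.solve(L.T @ L + lam * np.eye(d), L.T @ z)

def project_rows(L, radius=1.0):
    """Scale each row of L back into the Euclidean ball of the given radius,
    enforcing the row-norm bound that the max-norm factorization imposes."""
    norms = np.linalg.norm(L, axis=1, keepdims=True)
    scale = np.minimum(1.0, radius / np.maximum(norms, 1e-12))
    return L * scale

def online_maxnorm(samples, p, d, lam=0.1, step=0.1, radius=1.0, seed=0):
    """Process samples one at a time; only the p-by-d basis stays in memory,
    so the storage cost does not grow with the number of samples."""
    rng = np.random.default_rng(seed)
    L = rng.standard_normal((p, d)) / np.sqrt(d)  # basis kept in memory
    for z in samples:
        r = fit_coefficients(L, z, lam)            # step 1: coefficients
        grad = np.outer(L @ r - z, r)              # grad of 0.5||z - L r||^2 in L
        L = project_rows(L - step * grad, radius)  # step 2: projected SGD on basis
    return L
```

Note that each iteration touches only one sample and the basis, which is what makes the memory footprint independent of the sample size, as the abstract emphasizes.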

Keywords

Low-rank matrix · Max-norm · Stochastic optimization · Matrix factorization

Copyright information

© The Author(s) 2017

Authors and Affiliations

  1. Department of Computer Science, Rutgers University, Piscataway, USA
  2. H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, USA
  3. Department of Statistics and Biostatistics, Rutgers University, Piscataway, USA