1-Bit matrix completion: PAC-Bayesian analysis of a variational approximation

Abstract

We study the completion of a (possibly) low-rank matrix with binary entries, the so-called 1-bit matrix completion problem. Our approach relies on tools from machine learning theory: empirical risk minimization and its convex relaxations. We propose an algorithm to compute a variational approximation of the pseudo-posterior. Thanks to the convex relaxation, the corresponding minimization problem is bi-convex, and the method therefore works well in practice. We analyze the performance of this variational approximation through PAC-Bayesian learning bounds. Unlike previous works, which focused on upper bounds on the estimation error of the target matrix M under various matrix norms, this analysis yields a PAC bound on the prediction error of our algorithm. We focus mainly on the convex relaxation induced by the hinge loss, for which we present a complete analysis, an extensive simulation study, and a test on the MovieLens data set. We also discuss a variational approximation for the logistic loss.
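
To make the bi-convexity concrete, here is a minimal sketch in Python (NumPy). It is not the variational-Bayes algorithm analyzed in the paper: it is a simplified point-estimate analogue that factorizes the matrix as M = U V^T and alternates subgradient steps on the hinge-loss empirical risk, with a Frobenius penalty standing in for the prior term. With V fixed, the subproblem in U is convex, and vice versa. The function name hinge_mf and all parameter values are illustrative assumptions, not the paper's settings.

    import numpy as np

    def hinge_mf(Y, mask, rank=5, lam=0.1, lr=0.05, n_iters=200, seed=0):
        # Simplified point-estimate sketch of the bi-convex hinge-loss
        # relaxation (NOT the paper's variational-Bayes algorithm).
        # Y    : (m, p) array with entries in {-1, +1}
        # mask : (m, p) boolean array, True where Y is observed
        rng = np.random.default_rng(seed)
        m, p = Y.shape
        U = 0.1 * rng.standard_normal((m, rank))
        V = 0.1 * rng.standard_normal((p, rank))
        for _ in range(n_iters):
            # Subgradient of sum_ij max(0, 1 - Y_ij * M_ij) over observed
            # entries: -Y_ij where the margin is violated, 0 elsewhere.
            G = np.where(mask & (Y * (U @ V.T) < 1.0), -Y, 0.0)
            U -= lr * (G @ V + lam * U)    # convex subproblem in U (V fixed)
            G = np.where(mask & (Y * (U @ V.T) < 1.0), -Y, 0.0)
            V -= lr * (G.T @ U + lam * V)  # convex subproblem in V (U fixed)
        return U, V

    # Toy check: predict the signs of a rank-2 matrix from 30% of its entries.
    rng = np.random.default_rng(1)
    M_true = rng.standard_normal((40, 2)) @ rng.standard_normal((2, 30))
    Y, mask = np.sign(M_true), rng.random((40, 30)) < 0.3
    U, V = hinge_mf(Y, mask, rank=2)
    print("held-out sign accuracy:", (np.sign(U @ V.T)[~mask] == Y[~mask]).mean())

This block-alternating structure is what the abstract refers to as bi-convexity: the joint problem is not convex, but each block update solves a convex problem.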

Keywords

Matrix completion · PAC-Bayesian bounds · Variational Bayes · Supervised classification · Risk convexification · Oracle inequalities

Copyright information

© The Author(s) 2017

Authors and Affiliations

  1. CREST, ENSAE, Université Paris-Saclay, Paris, France
