1-Bit matrix completion: PAC-Bayesian analysis of a variational approximation

Abstract

We focus on the completion of a (possibly) low-rank matrix with binary entries, the so-called 1-bit matrix completion problem. Our approach relies on tools from machine learning theory: empirical risk minimization and its convex relaxations. We propose an algorithm to compute a variational approximation of the pseudo-posterior. Thanks to the convex relaxation, the corresponding minimization problem is bi-convex, and the resulting method works well in practice. We study the performance of this variational approximation through PAC-Bayesian learning bounds. In contrast to previous works, which focused on upper bounds on the estimation error of the target matrix M under various matrix norms, this analysis allows us to derive a PAC bound on the prediction error of our algorithm. We focus essentially on the convex relaxation given by the hinge loss, for which we provide a complete analysis, a simulation study, and a test on the MovieLens data set. We also discuss a variational approximation for the logistic loss.
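
The bi-convex structure mentioned in the abstract can be made concrete with a small sketch. The code below is an illustration under assumptions, not the paper's algorithm: it alternates convex (sub)gradient half-steps on the two factors of a low-rank parametrization M = UVᵀ of the hinge-loss empirical risk, whereas the paper optimizes a variational (mean-field) approximation of the pseudo-posterior. All names (hinge_risk, fit_1bit) and parameter choices are illustrative.

```python
import numpy as np

def hinge_risk(U, V, rows, cols, y, lam=1e-3):
    """Empirical hinge risk over the observed entries, plus a Frobenius
    penalty on both factors (a common surrogate for a low-rank prior)."""
    margins = y * np.sum(U[rows] * V[cols], axis=1)  # y_k * (U V^T)_{i_k, j_k}
    return np.mean(np.maximum(0.0, 1.0 - margins)) + lam * (np.sum(U**2) + np.sum(V**2))

def fit_1bit(rows, cols, y, shape, rank=5, lam=1e-3, lr=0.1, n_iter=200, seed=0):
    """Alternating subgradient descent: the objective is convex in U for
    fixed V and convex in V for fixed U (bi-convex), so we alternate
    half-steps on each factor."""
    rng = np.random.default_rng(seed)
    n1, n2 = shape
    U = 0.1 * rng.standard_normal((n1, rank))
    V = 0.1 * rng.standard_normal((n2, rank))
    m = len(y)
    for _ in range(n_iter):
        for update_U in (True, False):
            margins = y * np.sum(U[rows] * V[cols], axis=1)
            coef = -(y * (margins < 1.0)) / m  # hinge subgradient weights
            if update_U:
                grad = 2.0 * lam * U
                np.add.at(grad, rows, coef[:, None] * V[cols])
                U -= lr * grad
            else:
                grad = 2.0 * lam * V
                np.add.at(grad, cols, coef[:, None] * U[rows])
                V -= lr * grad
    return U, V  # predict an unseen entry (i, j) as sign(U[i] @ V[j])
```

Here rows and cols are integer index arrays of the observed positions and y ∈ {−1, +1}ᵐ holds the observed signs; each half-step is a convex problem in one factor with the other held fixed, which is the sense in which the overall objective is bi-convex.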

Keywords

Matrix completion · PAC-Bayesian bounds · Variational Bayes · Supervised classification · Risk convexification · Oracle inequalities

Acknowledgements

We would like to thank Vincent Cottet’s Ph.D. supervisor, Professor Nicolas Chopin, for his kind support during the project, and the three anonymous referees for their helpful and constructive comments.

Copyright information

© The Author(s) 2017

Authors and Affiliations

  1. CREST, ENSAE, Université Paris-Saclay, Paris, France
