1-Bit matrix completion: PAC-Bayesian analysis of a variational approximation
We focus on the completion of a (possibly) low-rank matrix with binary entries, the so-called 1-bit matrix completion problem. Our approach relies on tools from machine learning theory: empirical risk minimization and its convex relaxations. We propose an algorithm to compute a variational approximation of the pseudo-posterior. Thanks to the convex relaxation, the corresponding minimization problem is bi-convex, and thus the method works well in practice. We study the performance of this variational approximation through PAC-Bayesian learning bounds. In contrast to previous works, which focused on upper bounds on the estimation error of the underlying matrix M in various matrix norms, we are able to derive from this analysis a PAC bound on the prediction error of our algorithm. We focus essentially on convex relaxation through the hinge loss, for which we present a complete analysis, a complete simulation study and a test on the MovieLens data set. We also discuss a variational approximation to deal with the logistic loss.
Keywords: Matrix completion, PAC-Bayesian bounds, Variational Bayes, Supervised classification, Risk convexification, Oracle inequalities
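To make the bi-convexity mentioned in the abstract concrete, here is a minimal sketch of hinge-loss matrix factorization fitted by alternating subgradient steps. It is not the variational approximation studied in the paper: it only illustrates that, once the matrix is written as a product of two factors, the hinge risk is convex in each factor when the other is held fixed. The function names, the ridge penalty `lam`, the step size, and the synthetic rank-1 data are all assumptions made for illustration.

```python
import numpy as np


def hinge_risk(U, V, rows, cols, y):
    """Mean hinge loss max(0, 1 - y * <U_i, V_j>) over the observed entries."""
    scores = np.sum(U[rows] * V[cols], axis=1)
    return float(np.mean(np.maximum(0.0, 1.0 - y * scores)))


def fit_hinge_factorization(rows, cols, y, shape, rank=2, lam=0.01,
                            step=1.0, n_iter=500, seed=0):
    """Alternating subgradient steps on hinge risk + (lam / 2) * (|U|^2 + |V|^2).

    Hypothetical helper: with V fixed the objective is convex in U, and
    with U fixed it is convex in V, so each half-step is a convex update.
    """
    rng = np.random.default_rng(seed)
    U = 0.1 * rng.standard_normal((shape[0], rank))
    V = 0.1 * rng.standard_normal((shape[1], rank))
    n = len(y)
    for _ in range(n_iter):
        # U-step (V fixed): only entries with margin < 1 contribute.
        active = (y * np.sum(U[rows] * V[cols], axis=1) < 1.0) * y
        grad_U = np.zeros_like(U)
        np.add.at(grad_U, rows, -active[:, None] * V[cols])
        U -= step * (grad_U / n + lam * U)
        # V-step (U fixed), using the freshly updated U.
        active = (y * np.sum(U[rows] * V[cols], axis=1) < 1.0) * y
        grad_V = np.zeros_like(V)
        np.add.at(grad_V, cols, -active[:, None] * U[rows])
        V -= step * (grad_V / n + lam * V)
    return U, V


# Synthetic example (assumed data): a rank-1 sign matrix with ~70% of entries observed.
rng = np.random.default_rng(1)
u_true = rng.choice([-1.0, 1.0], size=20)
v_true = rng.choice([-1.0, 1.0], size=15)
Y = np.outer(u_true, v_true)
mask = rng.random(Y.shape) < 0.7
rows, cols = np.nonzero(mask)
y = Y[rows, cols]

U0, V0 = fit_hinge_factorization(rows, cols, y, Y.shape, n_iter=0)  # initial point only
U, V = fit_hinge_factorization(rows, cols, y, Y.shape)

risk_before = hinge_risk(U0, V0, rows, cols, y)
risk_after = hinge_risk(U, V, rows, cols, y)
train_acc = float(np.mean(np.sign(np.sum(U[rows] * V[cols], axis=1)) == y))
print(risk_before, risk_after, train_acc)
```

Predictions for unobserved entries would be read off as the sign of the corresponding entry of `U @ V.T`. The 0-1 classification error is what one ultimately cares about; the hinge loss is used here as its convex surrogate.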