An oracle inequality for quasi-Bayesian nonnegative matrix factorization


The aim of this paper is to provide some theoretical understanding of quasi-Bayesian aggregation methods for nonnegative matrix factorization. We derive an oracle inequality for an aggregated estimator. The result holds for a very general class of prior distributions and shows how the prior affects the rate of convergence.
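To make the objects named in the abstract concrete, the following is a sketch of the generic form that quasi-Bayesian (exponentially weighted) aggregation usually takes in this setting. The notation is an assumption introduced here for illustration and does not appear in the text above: $Y$ is the observed matrix, $M$ the unknown mean matrix, $U, V$ nonnegative factors, $\pi$ the prior, $\lambda > 0$ a temperature parameter, $\mathcal{K}$ the Kullback-Leibler divergence, and $C$ an unspecified constant.

```latex
% Quasi-posterior and aggregated estimator (assumed notation)
\[
  \hat\rho_\lambda(\mathrm{d}U, \mathrm{d}V)
    \;\propto\; \exp\!\bigl(-\lambda \,\| Y - UV^\top \|_F^2\bigr)\,
    \pi(\mathrm{d}U, \mathrm{d}V),
  \qquad
  \hat M_\lambda \;=\; \int UV^\top \,\hat\rho_\lambda(\mathrm{d}U, \mathrm{d}V).
\]

% Typical shape of an oracle inequality for such an estimator
\[
  \mathbb{E}\,\| \hat M_\lambda - M \|_F^2
  \;\le\;
  \inf_{\rho}
  \left\{
    \int \| UV^\top - M \|_F^2 \,\rho(\mathrm{d}U, \mathrm{d}V)
    \;+\; C\,\frac{\mathcal{K}(\rho, \pi)}{\lambda}
  \right\}.
\]
```

In a bound of this shape the prior enters only through the divergence term $\mathcal{K}(\rho, \pi)$, which is one way to read the statement that the result shows how the prior affects the rate of convergence. The exact constants and conditions are those of the paper itself, not of this sketch.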


Correspondence to P. Alquier.

About this article

Cite this article

Alquier, P., Guedj, B. An oracle inequality for quasi-Bayesian nonnegative matrix factorization. Math. Meth. Stat. 26, 55–67 (2017).

Keywords

  • nonnegative matrix factorization
  • oracle inequality
  • PAC-Bayesian bounds

2010 Mathematics Subject Classification

  • primary 62H99
  • secondary 62F35
  • 68T05
  • 65C05