Mathematical Methods of Statistics, Volume 26, Issue 1, pp 55–67

An oracle inequality for quasi-Bayesian nonnegative matrix factorization

  • P. Alquier
  • B. Guedj


The aim of this paper is to provide some theoretical understanding of quasi-Bayesian aggregation methods of nonnegative matrix factorization. We derive an oracle inequality for an aggregated estimator. This result holds for a very general class of prior distributions and shows how the prior affects the rate of convergence.
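
For orientation, results of this kind in the PAC-Bayesian literature typically take the generic shape sketched below. The notation is assumed for illustration and does not come from this page: $Y$ is the observed matrix, $M^0$ its unknown mean, $\pi$ the prior, $\lambda > 0$ an inverse-temperature parameter, and $\|\cdot\|_F$ the Frobenius norm. This is not the statement proved in the paper; constants and conditions are omitted.

```latex
% Quasi-posterior (Gibbs measure) and aggregated estimator, in assumed notation
\[
\hat{\rho}_\lambda(\mathrm{d}M) \propto \exp\!\left(-\lambda \|Y - M\|_F^2\right) \pi(\mathrm{d}M),
\qquad
\hat{M}_\lambda = \int M \, \hat{\rho}_\lambda(\mathrm{d}M).
\]

% Generic shape of an oracle inequality for the aggregate (constants omitted)
\[
\mathbb{E}\,\|\hat{M}_\lambda - M^0\|_F^2
\;\le\;
\inf_{\rho} \left\{ \int \|M - M^0\|_F^2 \, \rho(\mathrm{d}M)
+ \frac{\mathrm{KL}(\rho, \pi)}{\lambda} \right\}.
\]
```

In bounds of this shape the prior enters through the Kullback-Leibler term: the more mass $\pi$ places near good factorizations, the smaller that term can be made, which is one way a prior can affect the rate of convergence.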


nonnegative matrix factorization, oracle inequality, PAC-Bayesian bounds

2010 Mathematics Subject Classification

primary 62H99; secondary 62F35, 68T05, 65C05





Copyright information

© Allerton Press, Inc. 2017

Authors and Affiliations

  1. CREST, ENSAE, Univ. Paris Saclay, Paris, France
  2. Modal Project-Team, Inria Lille – Nord Europe Research Center, Paris, France
