Dictionary filtering: a probabilistic approach to online matrix factorisation

  • Ömer Deniz Akyildiz
  • Joaquín Míguez
Original Paper


This paper investigates a link between matrix factorisation algorithms and recursive linear filters. In particular, we describe a probabilistic model in which sequential inference naturally leads to a matrix factorisation procedure. Using this probabilistic model, we derive a matrix-variate recursive linear filter that can be run efficiently in high-dimensional settings and leads to the factorisation of the data matrix into a dictionary matrix and a coefficient matrix. The resulting algorithm, referred to as the dictionary filter, is inherently online and has easy-to-tune parameters. We provide an extension of the proposed method for the cases where the dataset of interest is time-varying and nonstationary, thereby showing the adaptability of the proposed framework to non-standard problem settings. Numerical results, which are provided for image restoration and video modelling problems, demonstrate that the proposed method is a viable alternative to existing methods.
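The abstract gives no update equations, so the following is only a minimal sketch of the general idea, assuming a linear-Gaussian model in which each data column is the dictionary times a coefficient vector plus noise. It alternates a ridge least-squares coefficient step with a Kalman-style (recursive least squares) dictionary update. The function name `dictionary_filter_sketch` and the parameters `lam`, `p0` and `ridge` are illustrative assumptions, not the authors' notation or their exact algorithm.

```python
import numpy as np

def dictionary_filter_sketch(Y, r, lam=1.0, p0=10.0, ridge=1e-3, seed=0):
    """Factorise Y (m x n) as C @ X, processing one column at a time.

    Hypothetical sketch: a recursive least-squares update for the
    dictionary C, not the published dictionary filter itself.
    """
    rng = np.random.default_rng(seed)
    m, n = Y.shape
    C = rng.standard_normal((m, r))  # dictionary estimate (filter mean)
    P = p0 * np.eye(r)               # r x r covariance factor shared by all rows of C
    X = np.zeros((r, n))             # coefficient matrix, filled column by column
    for k in range(n):
        y = Y[:, k]
        # Coefficient step: ridge-regularised least squares given the current C.
        x = np.linalg.solve(C.T @ C + ridge * np.eye(r), C.T @ y)
        # Dictionary step: rank-one Kalman-style correction along the residual.
        Px = P @ x
        s = lam + x @ Px             # scalar innovation variance (assumed noise level lam)
        C = C + np.outer(y - C @ x, Px) / s
        P = P - np.outer(Px, Px) / s
        X[:, k] = x
    return C, X
```

Each column is visited once and only an r x r matrix is stored besides the factors, which is what makes a scheme of this kind online. In this sketch `lam` and `p0` play the role of the tunable parameters: a larger `lam` makes the dictionary update more conservative.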


Keywords: Online matrix factorisation, Kalman filtering, Stochastic optimisation




Copyright information

© Springer-Verlag London Ltd., part of Springer Nature 2018

Authors and Affiliations

  1. Department of Signal Theory and Communications, Universidad Carlos III de Madrid, Leganés, Spain
