Statistics and Computing, Volume 22, Issue 4, pp 945–957

A hierarchical model for ordinal matrix factorization

Article

Abstract

This paper proposes a hierarchical probabilistic model for ordinal matrix factorization. Unlike previous approaches, the model accounts for the ordinal nature of the data and takes a principled approach to incorporating priors for the hidden variables. Two inference algorithms are presented, one based on Gibbs sampling and one based on variational Bayes. Importantly, both algorithms can be applied to the factorization of very large matrices with missing entries.
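To make "modelling the ordinal nature of the data" concrete, here is a minimal sketch of one standard construction, the ordered-probit (cumulative-link) likelihood. The abstract does not state which link function or parameterization the paper uses, so the probit link, the threshold values, and the names `ordinal_probs` and `expected_rating` are assumptions made purely for illustration.

```python
import math


def normal_cdf(x):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def ordinal_probs(user_factors, item_factors, thresholds):
    """Probability of each ordinal rating level under an ordered-probit link.

    Assumption (not taken from the paper): the latent score is the inner
    product of the user and item factor vectors, and level k is observed
    when the noisy score falls between consecutive thresholds.
    """
    score = sum(u * v for u, v in zip(user_factors, item_factors))
    # Pad the thresholds so the first and last levels are open-ended.
    cuts = [-math.inf] + list(thresholds) + [math.inf]
    cdf = [normal_cdf(c - score) for c in cuts]
    return [cdf[k + 1] - cdf[k] for k in range(len(cuts) - 1)]


def expected_rating(probs):
    """Mean rating when the levels are labelled 1, 2, ..., K."""
    return sum((k + 1) * p for k, p in enumerate(probs))


# Hypothetical two-dimensional factors and four thresholds (five levels).
probs = ordinal_probs([0.8, -0.2], [1.0, 0.5], [-1.5, -0.5, 0.5, 1.5])
prediction = expected_rating(probs)
```

In this construction the thresholds, rather than the numeric labels, decide how far apart adjacent rating levels are, which is what distinguishes an ordinal likelihood from treating ratings as real-valued targets.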

The model is evaluated on a collaborative filtering task, where users have rated a collection of movies and the system is asked to predict their ratings for other movies. Evaluation uses the Netflix data set, which consists of around 100 million ratings. With root mean-squared error (RMSE) as the evaluation metric, the results show that the proposed model outperforms alternative factorization techniques. The results also show that Gibbs sampling outperforms variational Bayes on this task, despite the large number of ratings and model parameters. Matlab implementations of the proposed algorithms are available from cogsys.imm.dtu.dk/ordinalmatrixfactorization.
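As a small illustration of the evaluation metric, the sketch below computes RMSE over observed ratings only, so missing entries of the rating matrix never enter the sum. The triple format `(user, item, rating)`, the function name `rmse`, and the constant predictor are hypothetical and are not taken from the authors' Matlab code.

```python
import math


def rmse(observed, predict):
    """Root mean-squared error over observed (user, item, rating) triples.

    `predict` is any callable mapping (user, item) to a predicted rating.
    Only the entries listed in `observed` contribute to the error.
    """
    squared_error = 0.0
    count = 0
    for user, item, rating in observed:
        squared_error += (predict(user, item) - rating) ** 2
        count += 1
    if count == 0:
        raise ValueError("no observed ratings to evaluate")
    return math.sqrt(squared_error / count)


# Hypothetical held-out ratings and a baseline that always predicts 3.0.
held_out = [(0, 0, 4.0), (0, 1, 2.0), (1, 0, 5.0)]
baseline_error = rmse(held_out, lambda user, item: 3.0)
```

Any predictor can be passed as `predict`, for example the posterior mean rating of a factorization model, which makes it easy to compare several methods on the same held-out set.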

Keywords

Large scale machine learning; Collaborative filtering; Ordinal regression; Low rank matrix decomposition; Hierarchical modelling; Bayesian inference; Variational Bayes; Gibbs sampling

Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  1. Microsoft Research Cambridge, Cambridge, UK
  2. Engineering Department, University of Cambridge, Cambridge, UK
  3. Informatics and Mathematical Modelling, Technical University of Denmark, Lyngby, Denmark
