Large-Scale Parallel Collaborative Filtering for the Netflix Prize

Zhou, Yunhong; Wilkinson, Dennis; Schreiber, Robert; Pan, Rong

doi:10.1007/978-3-540-68880-8_32

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5034)

Included in the following conference series:

International Conference on Algorithmic Applications in Management

Abstract

Many recommendation systems suggest items to users by applying collaborative filtering (CF) to historical records of items that the users have viewed, purchased, or rated. Two major problems that most CF approaches must contend with are scalability and the sparseness of user profiles. To tackle these issues, we describe a CF algorithm, alternating-least-squares with weighted-λ-regularization (ALS-WR), which is implemented on a parallel Matlab platform. We show empirically that the performance of ALS-WR, measured by root mean squared error (RMSE), improves monotonically with both the number of features and the number of ALS iterations. We applied the ALS-WR algorithm to a large-scale CF problem, the Netflix Challenge, with 1000 hidden features and obtained an RMSE score of 0.8985, which is one of the best results achieved by a pure method. In addition, by combining it with parallel versions of other known methods, we achieved a performance improvement of 5.91% over Netflix’s own CineMatch recommendation system. Our method is simple and scales well to very large datasets.
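
The abstract names the algorithm but does not spell out its update rule, so the following is only a minimal sketch of how alternating least squares with a weighted λ penalty is usually formulated, not the authors' parallel Matlab implementation. It assumes the objective is the squared error on observed ratings plus λ times each factor vector's squared norm, weighted by the number of ratings that user or movie has. The function names (`als_wr`, `update_side`, `objective`, `rmse`, `solve`), the random initialization, the default parameter values, and the toy data are all hypothetical choices made for illustration.

```python
import math
import random


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def update_side(ratings_by_row, other, k, lam):
    """Re-solve every row's factor vector while the other side is held fixed."""
    result = {}
    for row, rated in ratings_by_row.items():
        n = len(rated)  # rating count scales the penalty: the "weighted" lambda (assumed form)
        a = [[lam * n if p == q else 0.0 for q in range(k)] for p in range(k)]
        b = [0.0] * k
        for col, r in rated.items():
            v = other[col]
            for p in range(k):
                b[p] += r * v[p]
                for q in range(k):
                    a[p][q] += v[p] * v[q]
        result[row] = solve(a, b)  # normal equations of the regularized least-squares fit
    return result


def objective(ratings, users, movies, lam):
    """Squared error on observed ratings plus the count-weighted penalty."""
    loss, n_u, n_m = 0.0, {}, {}
    for (u, m), r in ratings.items():
        loss += (r - dot(users[u], movies[m])) ** 2
        n_u[u] = n_u.get(u, 0) + 1
        n_m[m] = n_m.get(m, 0) + 1
    penalty = sum(n * dot(users[u], users[u]) for u, n in n_u.items())
    penalty += sum(n * dot(movies[m], movies[m]) for m, n in n_m.items())
    return loss + lam * penalty


def rmse(ratings, users, movies):
    """Root mean squared error over the given (user, movie) -> rating pairs."""
    errs = [(r - dot(users[u], movies[m])) ** 2 for (u, m), r in ratings.items()]
    return math.sqrt(sum(errs) / len(errs))


def als_wr(ratings, k=2, lam=0.05, iters=10, seed=0):
    """Alternate user and movie updates; return factors and the objective per iteration."""
    rng = random.Random(seed)
    by_user, by_movie = {}, {}
    for (u, m), r in ratings.items():
        by_user.setdefault(u, {})[m] = r
        by_movie.setdefault(m, {})[u] = r
    # Small random start for the movie factors (an assumption of this sketch).
    movies = {m: [rng.uniform(0.1, 1.0) for _ in range(k)] for m in by_movie}
    users, history = {}, []
    for _ in range(iters):
        users = update_side(by_user, movies, k, lam)
        movies = update_side(by_movie, users, k, lam)
        history.append(objective(ratings, users, movies, lam))
    return users, movies, history


if __name__ == "__main__":
    # Hypothetical toy ratings keyed by (user, movie).
    toy = {("u1", "m1"): 5.0, ("u1", "m2"): 3.0, ("u2", "m1"): 4.0,
           ("u2", "m3"): 1.0, ("u3", "m2"): 2.0, ("u3", "m3"): 5.0}
    users, movies, history = als_wr(toy)
    print("training RMSE:", rmse(toy, users, movies))
```

Because each half-step solves its least-squares subproblem exactly, the regularized training objective recorded in `history` cannot increase from one iteration to the next. That is a weaker statement than the abstract's empirical claim, which concerns RMSE on the Netflix data and is not reproduced by this toy example. Each row's solve is independent of the others, which is presumably what makes the method amenable to parallelization.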

Author information

Authors and Affiliations

HP Labs, 1501 Page Mill Rd, Palo Alto, CA 94304
Yunhong Zhou, Dennis Wilkinson, Robert Schreiber & Rong Pan

Editor information

Rudolf Fleischer, Jinhui Xu

About this paper

Cite this paper

Zhou, Y., Wilkinson, D., Schreiber, R., Pan, R. (2008). Large-Scale Parallel Collaborative Filtering for the Netflix Prize. In: Fleischer, R., Xu, J. (eds) Algorithmic Aspects in Information and Management. AAIM 2008. Lecture Notes in Computer Science, vol 5034. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68880-8_32

DOI: https://doi.org/10.1007/978-3-540-68880-8_32
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-68865-5
Online ISBN: 978-3-540-68880-8
eBook Packages: Computer Science, Computer Science (R0)
