Using rating matrix compression techniques to speed up collaborative recommendations
Collaborative filtering is a popular recommendation technique. Although researchers have focused on the accuracy of the recommendations, real applications also need efficient algorithms. An index structure can be used to store the rating matrix and compute recommendations quickly. In this paper we study how compression techniques can reduce the size of this index structure and, at the same time, speed up recommendations. We show how coding techniques commonly used in Information Retrieval can be effectively applied to collaborative filtering, reducing the matrix size by up to 75% and almost doubling the recommendation speed. Additionally, we propose a novel identifier reassignment technique that achieves high compression rates, reducing the size of an already compressed matrix by 40%. It is a very simple approach based on assigning the smallest identifiers to the items and users with the highest number of ratings, and it can be computed efficiently using two-pass indexing. The proposed compression techniques can significantly reduce the storage and time costs of recommender systems, which are two important factors in many real applications.
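The identifier reassignment idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`reassign_ids`, `build_index`, `vbyte_encode`, `compressed_size`) are hypothetical, ratings are assumed to be `(user, item, rating)` triples, and variable-byte coding of identifier gaps is assumed here as one representative Information Retrieval coding scheme, since the abstract does not name the codes used. Only item identifiers are measured, to keep the sketch short.

```python
from collections import Counter


def reassign_ids(ratings):
    """Pass 1: count ratings, then give the smallest ids to the most-rated entities."""
    user_counts = Counter(u for u, _, _ in ratings)
    item_counts = Counter(i for _, i, _ in ratings)

    def rank(counts):
        # Sort by descending rating count; ties are broken by the original id.
        ordered = sorted(counts, key=lambda k: (-counts[k], k))
        return {old: new for new, old in enumerate(ordered, start=1)}

    return rank(user_counts), rank(item_counts)


def build_index(ratings, user_map, item_map):
    """Pass 2: build per-user lists of (item_id, rating), sorted by the mapped item id."""
    index = {}
    for u, i, r in ratings:
        index.setdefault(user_map[u], []).append((item_map[i], r))
    for postings in index.values():
        postings.sort()
    return index


def vbyte_encode(n):
    """Variable-byte code: 7 data bits per byte, high bit set on the final byte."""
    chunks = [n & 0x7F]
    n >>= 7
    while n:
        chunks.append(n & 0x7F)
        n >>= 7
    chunks.reverse()
    chunks[-1] |= 0x80  # stop bit marks the last byte of the number
    return bytes(chunks)


def compressed_size(index):
    """Bytes needed to store the item ids of every list as variable-byte gaps."""
    total = 0
    for postings in index.values():
        prev = 0
        for item_id, _ in postings:
            total += len(vbyte_encode(item_id - prev))  # gap to the previous id
            prev = item_id
    return total
```

Because heavily rated items receive small identifiers, the gaps between consecutive identifiers in most lists tend to shrink, and small gaps need fewer bytes under a variable-length code. Comparing `compressed_size` on an index built with identity maps against one built with the maps from `reassign_ids` gives a rough feel for the effect on a given dataset.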
Keywords: Recommender systems, Collaborative filtering, Rating matrix compression, Identifier assignment
This research was supported by the Ministry of Education and Science of Spain and FEDER funds of the European Union (Project TIN2009-14203).