Information Retrieval, Volume 16, Issue 6, pp 680–696

Using rating matrix compression techniques to speed up collaborative recommendations

  • Vreixo Formoso
  • Diego Fernández
  • Fidel Cacheda
  • Victor Carneiro
Article

Abstract

Collaborative filtering is a popular recommendation technique. Although researchers have focused on the accuracy of the recommendations, real applications also need efficient algorithms. An index structure can be used to store the rating matrix and compute recommendations very quickly. In this paper we study how compression techniques can reduce the size of this index structure and, at the same time, speed up recommendations. We show how coding techniques commonly used in Information Retrieval can be effectively applied to collaborative filtering, reducing the matrix size by up to 75% and almost doubling the recommendation speed. Additionally, we propose a novel identifier reassignment technique that achieves high compression rates, reducing the size of an already compressed matrix by 40%. It is a very simple approach based on assigning the smallest identifiers to the items and users with the highest number of ratings, and it can be computed efficiently using two-pass indexing. The proposed compression techniques can significantly reduce the storage and time costs of recommender systems, two important factors in many real applications.
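
As a rough illustration of the identifier reassignment idea described above, the following sketch counts ratings in a first pass and rewrites identifiers in a second pass, so that the most-rated users and items receive the smallest identifiers. It then compares the storage cost of each user's item list when stored as gaps under variable-byte coding. The function names, the toy data, the tie-breaking rule, and the choice of variable-byte coding are all assumptions made for this example; the abstract does not state which codes or data layout the paper uses.

```python
from collections import Counter


def reassign_ids(ratings):
    """Two passes over (user, item, rating) triples (hypothetical helper).

    Pass 1 counts ratings per user and per item. Pass 2 rewrites every
    triple so that the most-rated user and item get identifier 1.
    """
    user_counts = Counter(u for u, _, _ in ratings)
    item_counts = Counter(i for _, i, _ in ratings)

    def rank(counts):
        # Descending count; ties broken by original id (an assumption).
        ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        return {old: new for new, (old, _) in enumerate(ordered, start=1)}

    user_map, item_map = rank(user_counts), rank(item_counts)
    remapped = [(user_map[u], item_map[i], r) for u, i, r in ratings]
    return remapped, user_map, item_map


def vbyte_len(n):
    """Bytes needed to store n with variable-byte coding (7 bits per byte)."""
    size = 1
    while n >= 128:
        n >>= 7
        size += 1
    return size


def posting_bytes(ratings):
    """Bytes needed to store each user's sorted item ids as v-byte gaps."""
    by_user = {}
    for u, i, _ in ratings:
        by_user.setdefault(u, []).append(i)
    total = 0
    for items in by_user.values():
        prev = 0
        for i in sorted(items):
            total += vbyte_len(i - prev)  # store the gap, not the raw id
            prev = i
    return total


# Toy data: the most-rated item happens to have a large original id.
ratings = [
    (10, 900, 5), (10, 5000, 3),
    (20, 900, 4),
    (30, 900, 2), (30, 5000, 1), (30, 70000, 4),
]
remapped, user_map, item_map = reassign_ids(ratings)
print(posting_bytes(ratings), posting_bytes(remapped))
```

On this toy input the item lists shrink because frequently rated items end up with small, closely spaced identifiers, which produce small gaps. The figures reported in the abstract come from the paper's own experiments, not from a sketch like this one.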

Keywords

Recommender systems, Collaborative filtering, Rating matrix compression, Identifier assignment

Notes

Acknowledgments

This research was supported by the Ministry of Education and Science of Spain and FEDER funds of the European Union (Project TIN2009-14203).

Copyright information

© Springer Science+Business Media New York 2012

Authors and Affiliations

  • Vreixo Formoso (1)
  • Diego Fernández (1)
  • Fidel Cacheda (1)
  • Victor Carneiro (1)
  1. Department of Information and Communication Technologies, Facultad de Informática, Coruña, Spain
