Information Systems Frontiers, Volume 19, Issue 5, pp 1161–1176

Index partitioning through a bipartite graph model for faster similarity search in recommendation systems

Article

Abstract

Scalability is an important factor for recommendation systems on large e-commerce sites, which contain millions of products visited by millions of users. Similarity search is the core operation in recommendation systems. In this paper, we describe a framework that alleviates the performance bottleneck of similarity search in very large-scale recommendation systems. The framework employs an inverted index for real-time similarity search and handles dynamic updates. As the inverted index gets larger, retrieving recommendations becomes computationally expensive. Various works are devoted to solving this problem, for example by clustering or by preprocessing to compute recommendations offline. Our solution is based on bipartite graph partitioning, which minimizes the affinity between entities in different partitions. The number of operations in similarity search is reduced by executing the search only within the partitions closest to the query. Partitions are balanced so that their computational loads are almost the same, which is important for reducing the computational cost. Sequential experiments with several different recommendation approaches and large datasets consisting of millions of users and items validate the scalability of the proposed recommendation framework. Partitioning reduces accuracy only slightly, if at all. In our collaborative filtering experiments, we even observe slight improvements in recommendation accuracy.
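The idea of searching only the partitions closest to the query can be illustrated with a minimal Python sketch. It is not the implementation from the paper: the partition assignment `item_partition` is assumed to be given (in the paper it would come from bipartite graph partitioning, which is not shown here), and the function names, the overlap-count scoring, and the rule for picking the closest partition are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def build_partitioned_index(item_features, item_partition):
    """Build one inverted index (feature -> list of item ids) per partition.

    item_partition is assumed to be precomputed; this sketch does not
    perform any graph partitioning itself.
    """
    indexes = defaultdict(lambda: defaultdict(list))
    for item, features in item_features.items():
        part = item_partition[item]
        for feature in features:
            indexes[part][feature].append(item)
    return indexes


def closest_partitions(indexes, query_features, n=1):
    """Rank partitions by how many postings match the query (illustrative heuristic)."""
    overlap = {
        part: sum(len(index.get(f, ())) for f in query_features)
        for part, index in indexes.items()
    }
    return sorted(overlap, key=lambda p: (-overlap[p], p))[:n]


def search(indexes, query_features, partitions, top_k=3):
    """Score items by the number of shared features, scanning only the given partitions."""
    scores = defaultdict(int)
    for part in partitions:
        index = indexes.get(part, {})
        for feature in query_features:
            for item in index.get(feature, ()):
                scores[item] += 1
    # Highest score first; ties broken by item id for a deterministic order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]


if __name__ == "__main__":
    # Toy data: four items with tag-like features, split into two partitions.
    item_features = {
        "a": {"red", "shoe"},
        "b": {"red", "boot"},
        "c": {"blue", "phone"},
        "d": {"blue", "case", "phone"},
    }
    item_partition = {"a": 0, "b": 0, "c": 1, "d": 1}
    indexes = build_partitioned_index(item_features, item_partition)
    query = {"blue", "phone"}
    parts = closest_partitions(indexes, query, n=1)
    print(parts, search(indexes, query, parts))
```

Because the query is evaluated against one partition's posting lists instead of the full index, the number of postings scanned shrinks roughly in proportion to the number of partitions skipped, provided the partitions are balanced and related items end up in the same partition.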

Keywords

Recommendation systems, Content-based recommendations, Collaborative filtering, Inverted index, Bipartite graph, Graph partitioning

Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  1. Rakuten Institute of Technology, Tokyo, Japan
