Index partitioning through a bipartite graph model for faster similarity search in recommendation systems

Abstract

Scalability of a recommendation system is an important factor for large e-commerce sites containing millions of products visited by millions of users. Similarity search is the core operation in recommendation systems. In this paper, we present a framework that alleviates the performance bottleneck of similarity search in very large-scale recommendation systems. The framework employs an inverted index for real-time similarity search and handles dynamic updates. As the inverted index grows, retrieving recommendations becomes computationally expensive. Various works have been devoted to solving this problem, for example by clustering or by preprocessing to compute recommendations offline. Our solution is based on bipartite graph partitioning that minimizes the affinity between entities in different partitions. The number of operations in similarity search is reduced by executing the search only within the partitions closest to the query. Partitions are balanced so that their computational loads are almost the same, which is important for reducing the computational cost. Sequential experiments with several different recommendation approaches and large datasets consisting of millions of users and items validate the scalability of the proposed recommendation framework. Partitioning reduces accuracy only by a small factor, if at all; in our collaborative filtering experiments, we even observe slight improvements in recommendation accuracy.
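
The idea of restricting similarity search to the partitions closest to the query can be illustrated with a minimal Python sketch. It is not the implementation from the paper: the toy data, the function names (`build_partitioned_index`, `rank_partitions`, `search`), and the closeness measure (the total query weight on features that occur in a partition) are assumptions made for illustration, and the partition assignment is written by hand where the paper would obtain it from a bipartite graph partitioner.

```python
from collections import defaultdict
from math import sqrt


def build_partitioned_index(item_features, item_partition):
    """Build one inverted index (feature -> {item: weight}) per partition."""
    indexes = defaultdict(lambda: defaultdict(dict))
    for item, features in item_features.items():
        part = item_partition[item]
        for feature, weight in features.items():
            indexes[part][feature][item] = weight
    return indexes


def rank_partitions(query, indexes):
    """Order partitions by the query weight on features they contain.

    This closeness measure is a hypothetical stand-in, not the paper's.
    """
    scores = {
        part: sum(w for f, w in query.items() if f in index)
        for part, index in indexes.items()
    }
    return sorted(scores, key=lambda p: (-scores[p], p))


def search(query, indexes, norms, top_parts=1, k=3):
    """Cosine similarity search restricted to the top_parts closest partitions."""
    q_norm = sqrt(sum(w * w for w in query.values()))
    dots = defaultdict(float)
    for part in rank_partitions(query, indexes)[:top_parts]:
        for feature, q_weight in query.items():
            # Scan only the posting lists stored in the selected partition.
            for item, weight in indexes[part].get(feature, {}).items():
                dots[item] += q_weight * weight
    scored = [(dot / (q_norm * norms[item]), item) for item, dot in dots.items()]
    return [item for _, item in sorted(scored, key=lambda s: (-s[0], s[1]))[:k]]


# Toy data: items described by the users who interacted with them.
item_features = {
    "i1": {"u1": 1.0, "u2": 1.0},
    "i2": {"u1": 1.0, "u2": 1.0, "u3": 1.0},
    "i3": {"u4": 1.0, "u5": 1.0},
    "i4": {"u3": 1.0, "u4": 1.0, "u5": 1.0},
}
# Hand-written assignment; assumed to come from a graph partitioner in practice.
item_partition = {"i1": 0, "i2": 0, "i3": 1, "i4": 1}

norms = {
    item: sqrt(sum(w * w for w in feats.values()))
    for item, feats in item_features.items()
}
indexes = build_partitioned_index(item_features, item_partition)
query = {"u2": 1.0, "u3": 1.0}

print(search(query, indexes, norms, top_parts=1))  # closest partition only
print(search(query, indexes, norms, top_parts=2))  # all partitions
```

In this toy case the two best matches are the same whether one partition or both are searched, while the restricted search scans fewer posting lists. That is the trade-off the abstract describes, although real data gives no such guarantee.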

Notes

  1. http://rit.rakuten.co.jp/opendata.html

  2. http://grouplens.org/datasets/movielens/

  3. https://github.com/ocelma/python-recsys

Corresponding author

Correspondence to Ali Cevahir.

Cite this article

Cevahir, A. Index partitioning through a bipartite graph model for faster similarity search in recommendation systems. Inf Syst Front 19, 1161–1176 (2017). https://doi.org/10.1007/s10796-016-9646-x

Keywords

  • Recommendation systems
  • Content-based recommendations
  • Collaborative filtering
  • Inverted index
  • Bipartite graph
  • Graph partitioning