Abstract
Scalability of a recommendation system is an important factor for large e-commerce sites containing millions of products visited by millions of users. Similarity search is the core operation in recommendation systems. In this paper, we present a framework to alleviate the performance bottleneck of similarity search in very large-scale recommendation systems. The framework employs an inverted index for real-time similarity search and handles dynamic updates. As the inverted index grows, retrieving recommendations becomes computationally expensive. Various works are devoted to solving this problem, for example by clustering or by preprocessing to compute recommendations offline. Our solution is based on bipartite graph partitioning that minimizes the affinity between entities in different partitions. The number of operations in similarity search is reduced by executing the search only within the partitions closest to the query. Partitions are balanced so that their computational loads are almost the same, which is significant for reducing the computational cost. Sequential experiments with several different recommendation approaches and large datasets consisting of millions of users and items validate the scalability of the proposed recommendation framework. Partitioning reduces accuracy only by a small factor, if at all, and our collaborative filtering experiments even show slight improvements in recommendation accuracy.
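The idea of searching only the partitions closest to the query can be illustrated with a minimal sketch. It is not the implementation described in the paper: the function names, the toy data, the hand-assigned partitions (which the paper would obtain from bipartite graph partitioning), and the use of summed feature weights as a partition summary are all assumptions made for illustration.

```python
from collections import defaultdict
import heapq


def build_partitioned_index(items, part_of):
    """Build one inverted index and one summary vector per partition.

    items:   {item_id: {feature: weight}} sparse item vectors (hypothetical data)
    part_of: {item_id: partition_id}, assumed to be given by a partitioner
    """
    indexes = defaultdict(lambda: defaultdict(list))     # partition -> feature -> postings
    summaries = defaultdict(lambda: defaultdict(float))  # partition -> summed weights
    for item_id, vec in items.items():
        p = part_of[item_id]
        for feat, w in vec.items():
            indexes[p][feat].append((item_id, w))
            summaries[p][feat] += w
    return indexes, summaries


def dot(query, vec):
    """Dot product of two sparse vectors stored as dicts."""
    return sum(w * vec.get(f, 0.0) for f, w in query.items())


def search(query, indexes, summaries, n_parts=1, k=3):
    """Score items only inside the n_parts partitions closest to the query."""
    # Assumption: closeness is the dot product with the partition summary.
    closest = heapq.nlargest(n_parts, summaries, key=lambda p: dot(query, summaries[p]))
    scores = defaultdict(float)
    for p in closest:
        for feat, qw in query.items():
            # Walk only the posting lists of features present in the query.
            for item_id, w in indexes[p].get(feat, ()):
                scores[item_id] += qw * w
    return heapq.nlargest(k, scores.items(), key=lambda kv: kv[1])


# Toy example: four items described by the users who interacted with them.
items = {
    "i1": {"u1": 1.0, "u2": 1.0},
    "i2": {"u2": 1.0, "u3": 1.0},
    "i3": {"u4": 1.0, "u5": 1.0},
    "i4": {"u5": 1.0},
}
part_of = {"i1": 0, "i2": 0, "i3": 1, "i4": 1}  # two balanced partitions
indexes, summaries = build_partitioned_index(items, part_of)
results = search({"u2": 1.0, "u3": 1.0}, indexes, summaries, n_parts=1, k=2)
```

With `n_parts=1`, the posting lists of partition 1 are never touched, which is where the savings would come from. Raising `n_parts` trades speed for a wider search.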
Cite this article
Cevahir, A. Index partitioning through a bipartite graph model for faster similarity search in recommendation systems. Inf Syst Front 19, 1161–1176 (2017). https://doi.org/10.1007/s10796-016-9646-x
DOI: https://doi.org/10.1007/s10796-016-9646-x