Abstract
Burdened by their popularity, recommender systems must process ever larger datasets while still delivering high-quality results within reasonable time. To meet these ever-growing requirements, industrial recommender systems often turn to parallel hardware and distributed computing. While the MapReduce paradigm is generally accepted for massively parallel data processing, it often entails complex algorithm reorganization and suboptimal efficiency because intermediate values are typically read from and written to hard disk. This work implements an in-memory, content-based recommendation algorithm and shows how it can be parallelized and efficiently distributed across many homogeneous machines in a distributed-memory environment. By focusing on data parallelism and carefully constructing the definition of work in the context of recommender systems, we are able to partition the complete calculation process into any number of independent and equally sized jobs. An empirically validated performance model is developed to predict parallel speedup; it promises high efficiencies for realistic hardware configurations. For the MovieLens 10M dataset we note efficiency values up to 71% for a configuration of 200 computing nodes (eight cores per node).
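As a rough illustration of the two ideas named in the abstract (splitting the total work into equally sized independent jobs, and judging the result by parallel efficiency), the following minimal sketch may help. It is not the authors' implementation: the function names `partition` and `efficiency` are hypothetical, a "work unit" is treated as an abstract integer index rather than the paper's own definition of work, and efficiency uses the textbook definition (speedup divided by the number of workers).

```python
def partition(total_work: int, jobs: int) -> list[range]:
    """Split `total_work` abstract work units into `jobs` contiguous ranges.

    Assumption: work units are interchangeable, so balancing the count
    balances the load. Range sizes differ by at most one unit.
    """
    if jobs <= 0:
        raise ValueError("jobs must be positive")
    base, extra = divmod(total_work, jobs)
    ranges = []
    start = 0
    for j in range(jobs):
        # The first `extra` jobs absorb one leftover unit each.
        size = base + (1 if j < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges


def efficiency(serial_time: float, parallel_time: float, workers: int) -> float:
    """Textbook parallel efficiency: (T_serial / T_parallel) / workers."""
    speedup = serial_time / parallel_time
    return speedup / workers
```

Because each range is independent of the others, every job could in principle be handed to a separate process or node without communication during the computation; how the paper actually defines and distributes its jobs is described in the full text.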
Acknowledgements
The described research activities were funded by a PhD grant to Simon Dooms of the Agency for Innovation by Science and Technology (IWT Vlaanderen). The computational resources (Stevin Supercomputer Infrastructure) and services used in this work were provided by Ghent University, the Hercules Foundation and the Flemish Government—department EWI.
Cite this article
Dooms, S., Audenaert, P., Fostier, J. et al. In-memory, distributed content-based recommender system. J Intell Inf Syst 42, 645–669 (2014). https://doi.org/10.1007/s10844-013-0276-1
DOI: https://doi.org/10.1007/s10844-013-0276-1