Journal of Intelligent Information Systems

Volume 42, Issue 3, pp 645–669

In-memory, distributed content-based recommender system

  • Simon Dooms
  • Pieter Audenaert
  • Jan Fostier
  • Toon De Pessemier
  • Luc Martens


Burdened by their popularity, recommender systems increasingly take on larger datasets while they are expected to deliver high-quality results within a reasonable time. To meet these ever-growing requirements, industrial recommender systems often turn to parallel hardware and distributed computing. While the MapReduce paradigm is generally accepted for massively parallel data processing, it often entails complex algorithm reorganization and suboptimal efficiency because intermediate values are typically read from and written to hard disk. This work implements an in-memory, content-based recommendation algorithm and shows how it can be parallelized and efficiently distributed across many homogeneous machines in a distributed-memory environment. By focusing on data parallelism and carefully constructing the definition of work in the context of recommender systems, we are able to partition the complete calculation process into any number of independent and equally sized jobs. An empirically validated performance model is developed to predict parallel speedup and promises high efficiencies for realistic hardware configurations. For the MovieLens 10M dataset we note efficiency values up to 71% for a configuration of 200 computing nodes (eight cores per node).
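
The two quantities the abstract relies on, equally sized jobs and parallel efficiency, can be illustrated with a minimal sketch. It is not the implementation described in the paper: the unit of work is left abstract (an integer count), and the function names `partition_work` and `parallel_efficiency` are hypothetical. Efficiency is taken here in its usual textbook sense, speedup divided by the number of workers.

```python
def partition_work(total_units: int, num_jobs: int) -> list[range]:
    """Split `total_units` work items into `num_jobs` contiguous index
    ranges whose sizes differ by at most one item.

    What counts as one "unit of work" is an assumption left to the caller;
    the paper constructs its own definition, which is not reproduced here.
    """
    if num_jobs <= 0:
        raise ValueError("num_jobs must be positive")
    if total_units < 0:
        raise ValueError("total_units must be non-negative")
    base, extra = divmod(total_units, num_jobs)
    jobs = []
    start = 0
    for job_index in range(num_jobs):
        # The first `extra` jobs take one additional unit each.
        size = base + (1 if job_index < extra else 0)
        jobs.append(range(start, start + size))
        start += size
    return jobs


def parallel_efficiency(serial_time: float, parallel_time: float,
                        num_workers: int) -> float:
    """Return speedup / num_workers, where speedup = serial / parallel time."""
    if parallel_time <= 0 or num_workers <= 0:
        raise ValueError("parallel_time and num_workers must be positive")
    speedup = serial_time / parallel_time
    return speedup / num_workers


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to exercise the functions.
    for job in partition_work(10, 3):
        print(list(job))
    print(parallel_efficiency(serial_time=100.0, parallel_time=12.5,
                              num_workers=10))
```

Because each job is a plain index range, jobs of this kind share no state and could be handed to separate processes or machines independently, which is the property the abstract emphasizes.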


Recommender system, Distributed, Parallel, Speedup



The described research activities were funded by a PhD grant to Simon Dooms of the Agency for Innovation by Science and Technology (IWT Vlaanderen). The computational resources (Stevin Supercomputer Infrastructure) and services used in this work were provided by Ghent University, the Hercules Foundation and the Flemish Government—department EWI.



Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  • Simon Dooms (1, corresponding author)
  • Pieter Audenaert (2)
  • Jan Fostier (2)
  • Toon De Pessemier (1)
  • Luc Martens (1)

  1. Wica, iMinds-Ghent University, Ghent, Belgium
  2. IBCN, iMinds-Ghent University, Ghent, Belgium
