In-memory, distributed content-based recommender system

Abstract

Burdened by their own popularity, recommender systems must process ever larger datasets while still delivering high-quality results within reasonable time. To meet these growing requirements, industrial recommender systems often turn to parallel hardware and distributed computing. Although the MapReduce paradigm is widely accepted for massively parallel data processing, it often requires complex reorganization of the algorithm and yields suboptimal efficiency, because intermediate values are typically written to and read from hard disk. This work implements an in-memory, content-based recommendation algorithm and shows how it can be parallelized and efficiently distributed across many homogeneous machines in a distributed-memory environment. By focusing on data parallelism and carefully defining the unit of work in the context of recommender systems, we are able to partition the complete calculation into any number of independent, equally sized jobs. We develop an empirically validated performance model that predicts parallel speedup and indicates high efficiency for realistic hardware configurations. For the MovieLens 10M dataset, we observe efficiencies of up to 71% on a configuration of 200 computing nodes (eight cores per node).
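Only the abstract is available here, so the authors' exact algorithm is not shown. The following is a minimal Python sketch under stated assumptions: item content is represented as numeric feature vectors, item-item similarity is cosine similarity, and a user's rating for an item is predicted as the similarity-weighted average of that user's other ratings. The unit of work is assumed to be one (user, item) prediction; flattening the user × item index space and cutting it into contiguous ranges yields any number of independent jobs whose sizes differ by at most one. The names `predict`, `job_bounds` and `run_job` are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Ratings = List[Dict[int, float]]  # ratings[user] maps item -> rating


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two content feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def predict(user_ratings: Dict[int, float], item: int, features: List[Vector]) -> float:
    """Hypothetical scoring rule: similarity-weighted average of the user's other ratings."""
    num = den = 0.0
    for rated_item, rating in user_ratings.items():
        if rated_item == item:
            continue  # never use the target item as its own neighbour
        s = cosine(features[item], features[rated_item])
        if s > 0:  # only positively similar items contribute
            num += s * rating
            den += s
    return num / den if den else 0.0


def job_bounds(total: int, jobs: int, j: int) -> Tuple[int, int]:
    """Half-open range [start, end) of job j; job sizes differ by at most one."""
    base, extra = divmod(total, jobs)
    start = j * base + min(j, extra)
    return start, start + base + (1 if j < extra else 0)


def run_job(j: int, jobs: int, ratings: Ratings, features: List[Vector]) -> Dict[Tuple[int, int], float]:
    """Compute every (user, item) prediction in job j's slice of the flattened index space."""
    n_items = len(features)
    start, end = job_bounds(len(ratings) * n_items, jobs, j)
    out = {}
    for k in range(start, end):
        user, item = divmod(k, n_items)  # row-major: k = user * n_items + item
        out[(user, item)] = predict(ratings[user], item, features)
    return out


if __name__ == "__main__":
    features = [[1.0, 0.0, 1.0], [1.0, 1.0, 0.0], [0.0, 1.0, 1.0], [1.0, 0.0, 0.0]]  # toy data
    ratings = [{0: 5.0, 1: 3.0}, {2: 4.0}, {1: 2.0, 3: 5.0}]
    merged: Dict[Tuple[int, int], float] = {}
    for j in range(4):  # four independent jobs; execution order does not matter
        merged.update(run_job(j, 4, ratings, features))
    assert merged == run_job(0, 1, ratings, features)  # same result as one serial job
    print(len(merged))  # 12
```

Each job only reads the shared, in-memory rating and feature data and writes its own slice of predictions, so jobs can run on separate cores or nodes without communicating until the results are gathered. Equal prediction counts do not guarantee equal run times, though: in this sketch the cost of one prediction grows with the number of items the user has rated, so a practical partitioning might weight work by that count instead.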
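The abstract reports parallel efficiency rather than raw speedup. With the standard definitions S(P) = T(1)/T(P) and E(P) = S(P)/P, the 71% figure can be turned into an implied speedup. The sketch below assumes efficiency is measured against a single core, so P = 200 × 8 = 1600 (the abstract does not state the baseline), and uses Amdahl's law as a simple stand-in model; it is not the performance model developed in the paper.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """S(P) = T(1) / T(P)."""
    return t_serial / t_parallel


def efficiency(speedup_value: float, workers: int) -> float:
    """E(P) = S(P) / P."""
    return speedup_value / workers


def amdahl_speedup(serial_fraction: float, workers: int) -> float:
    """Amdahl's law: S(P) = 1 / (f + (1 - f) / P)."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / workers)


def implied_serial_fraction(eff: float, workers: int) -> float:
    """Solve E = 1 / (P*f + 1 - f) for f, giving f = (1/E - 1) / (P - 1)."""
    return (1.0 / eff - 1.0) / (workers - 1)


if __name__ == "__main__":
    cores = 200 * 8  # assumption: efficiency is measured relative to one core
    eff = 0.71       # efficiency reported in the abstract
    print(f"implied speedup: {eff * cores:.0f}")  # 1136
    f = implied_serial_fraction(eff, cores)
    print(f"serial fraction under Amdahl: {f:.2e}")  # 2.55e-04
    print(f"round-trip efficiency: {efficiency(amdahl_speedup(f, cores), cores):.2f}")  # 0.71
```

Under this one-parameter model, 71% efficiency on 1600 cores corresponds to a serial fraction of only about 0.026%, which illustrates why splitting the work into fully independent jobs matters at this scale. A fuller model would also have to account for effects such as load imbalance and communication, which Amdahl's law ignores.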

Figs. 1–14

Notes

  1. http://www.grouplens.org/node/73
  2. http://www.netflixprize.com
  3. http://kddcup.yahoo.com/datasets.php
  4. http://mahout.apache.org
  5. https://github.com/sidooms/DistributedCB

Acknowledgements

The described research activities were funded by a PhD grant to Simon Dooms of the Agency for Innovation by Science and Technology (IWT Vlaanderen). The computational resources (Stevin Supercomputer Infrastructure) and services used in this work were provided by Ghent University, the Hercules Foundation and the Flemish Government—department EWI.

Author information

Corresponding author

Correspondence to Simon Dooms.

About this article

Cite this article

Dooms, S., Audenaert, P., Fostier, J. et al. In-memory, distributed content-based recommender system. J Intell Inf Syst 42, 645–669 (2014). https://doi.org/10.1007/s10844-013-0276-1

Keywords

  • Recommender system
  • Distributed
  • Parallel
  • Speedup