Abstract
In data science, recommendation is one of the most popular methodologies and has been deployed in many industries. However, one of the most challenging problems today is how to recommend items from massive streaming data. This paper therefore aims to build a real-time recommendation engine using the Lambda architecture. The Apache Hadoop and Apache Spark frameworks were used to process the MovieLens datasets from GroupLens Research, which comprise 100 K and 20 M ratings. Using the alternating least squares (ALS) and k-means algorithms, the top K recommended movies and the top K trending movies for each user were produced as results. Additionally, the mean squared error (MSE) and the within-cluster sum of squared errors (WCSS) were computed to evaluate the performance of the ALS and k-means algorithms, respectively. The results were judged acceptable because the MSE and WCSS values are low relative to the size of the data. However, they can still be improved by tuning some parameters.
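The two evaluation metrics named in the abstract can be illustrated with a minimal sketch. This is plain Python on toy data, not the paper's Spark implementation; the function names `mean_squared_error` and `wcss` and all sample values are assumptions made for illustration only.

```python
from typing import Sequence


def mean_squared_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average of squared differences between actual and predicted ratings."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def wcss(
    points: Sequence[Sequence[float]],
    centroids: Sequence[Sequence[float]],
    assignments: Sequence[int],
) -> float:
    """Sum of squared Euclidean distances from each point to its assigned centroid."""
    total = 0.0
    for point, cluster in zip(points, assignments):
        centroid = centroids[cluster]
        total += sum((x - c) ** 2 for x, c in zip(point, centroid))
    return total


# Hypothetical ratings: what users gave vs. what a model predicted.
actual_ratings = [4.0, 3.0, 5.0]
predicted_ratings = [3.5, 3.0, 4.0]
mse_value = mean_squared_error(actual_ratings, predicted_ratings)

# Hypothetical 2-D points, two centroids, and a cluster index per point.
toy_points = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0)]
toy_centroids = [(0.0, 1.0), (10.0, 10.0)]
toy_assignments = [0, 0, 1]
wcss_value = wcss(toy_points, toy_centroids, toy_assignments)
```

Lower values of either metric indicate a closer fit. Note that MSE is an average while WCSS is a sum, so WCSS grows with the number of points, which is presumably why the abstract compares the values against the size of the data.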
Cite this article
Numnonda, T. A real-time recommendation engine using lambda architecture. Artif Life Robotics 23, 249–254 (2018). https://doi.org/10.1007/s10015-017-0424-8
DOI: https://doi.org/10.1007/s10015-017-0424-8