
A distributed group recommendation system based on extreme gradient boosting and big data technologies

Published in Applied Intelligence

Abstract

Personalized recommendation systems have emerged as useful tools for recommending appropriate items to individual users. However, some items tend to be consumed by groups of users, such as tourist attractions or television programs. With this purpose in mind, Group Recommender Systems (GRSs) are tailored to help groups of users find suitable items according to their preferences and needs. In general, these systems often confront the sparsity problem, which negatively affects their efficiency. Moreover, as the numbers of users, items, groups, and ratings in the system grow, the data becomes too big to be processed efficiently by traditional systems. Thus, there is an increasing need for distributed recommendation approaches able to manage the issues related to Big Data and the sparsity problem. In this paper, we propose a distributed group recommendation system designed on top of Apache Spark to handle large-scale data. It integrates a novel recommendation method that combines a dimensionality reduction technique with supervised and unsupervised learning in order to deal efficiently with the curse of dimensionality, detect groups of users, and improve prediction quality. Experimental results on three real-world data sets show that our proposal significantly outperforms its competitors.


Notes

  1. https://webscope.sandbox.yahoo.com/catalog.php?datatype=r

  2. https://grouplens.org/datasets/movielens/1m/

  3. http://grouplens.org/datasets/movielens/100k/

  4. https://www.netflixprize.com/

References

  1. Castro J, Lu J, Zhang G, Dong Y, Martínez L (2018) Opinion dynamics-based group recommender systems. IEEE Trans Syst Man Cybern Syst Hum 48(12):2394–2406. https://doi.org/10.1109/TSMC.2017.2695158


  2. Ekstrand MD, Riedl JT, Konstan JA (2011) Collaborative filtering recommender systems. Found Trends Hum-Comput Interact 4(2):81–173

  3. Dakhel AM, Malazi HT, Mahdavi M (2018) A social recommender system using item asymmetric correlation. Appl Intell 48(3):527–540


  4. Hammou BA, Lahcen AA (2017) FRAIPA: A fast recommendation approach with improved prediction accuracy. Expert Syst Appl 87:90–97


  5. Zhang F, Gong T, Lee VE, Zhao G, Rong C, Qu G (2016) Fast algorithms to evaluate collaborative filtering recommender systems. Knowl-Based Syst 96:96–103


  6. Christensen IA, Schiaffino S (2011) Entertainment recommender systems for group of users. Expert Syst Appl 38(11):14127–14135


  7. Liu B (2007) Web data mining: exploring hyperlinks, contents, and usage data. Springer Science & Business Media

  8. Ricci F, Rokach L, Shapira B (2015) Recommender systems: introduction and challenges. In: Recommender systems handbook. Springer, Boston, pp 1–34


  9. Castro J, Yera R, Martínez L (2018) A fuzzy approach for natural noise management in group recommender systems. Expert Syst Appl 94:237–249


  10. Boratto L, Carta S, Fenu G (2016) Discovery and representation of the preferences of automatically detected groups: Exploiting the link between group modeling and clustering. Fut Gener Comput Syst 64:165–174


  11. Boratto L, Carta S, Fenu G (2017) Investigating the role of the rating prediction task in granularity-based group recommender systems and big data scenarios. Inf Sci 378:424–443


  12. Apache Spark, https://spark.apache.org. Last accessed July 10, 2018

  13. Zaharia M, Chowdhury M, Das T, Dave A, Ma J, McCauley M, Stoica I (2012) Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing. In: Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation. USENIX Association, pp 2–2

  14. Zaharia M, Chowdhury M, Franklin MJ, Shenker S, Stoica I (2010) Spark: cluster computing with working sets. HotCloud 10(10-10):95

  15. Apache Cassandra, https://cassandra.apache.org. Last accessed July 10, 2018

  16. Lakshman A, Malik P (2010) Cassandra: a decentralized structured storage system. ACM SIGOPS Oper Syst Rev 44(2):35–40

  17. Apache Spark’s scalable machine learning library (MLlib), https://spark.apache.org/mllib/. Last accessed July 10, 2018

  18. Meng X, Bradley J, Yavuz B, Sparks E, Venkataraman S, Liu D, Xin D (2016) MLlib: machine learning in Apache Spark. J Mach Learn Res 17(1):1235–1241

  19. Shani G, Gunawardana A (2011) Evaluating recommendation systems. In: Recommender systems handbook. Springer, Boston, pp 257–297


  20. McCarthy JF, Anagnost T (1998) MusicFX: an arbiter of group preferences for computer-supported cooperative workouts. In: Proceedings of the 1998 ACM conference on computer-supported cooperative work (CSCW’98)

  21. Chao DL, Balthrop J, Forrest S (2005) Adaptive radio: achieving consensus using negative preferences. In: Proceedings of the 2005 international ACM SIGGROUP conference on Supporting group work. ACM, pp 120–123

  22. O’connor M, Cosley D, Konstan JA, Riedl J (2001) PolyLens: a recommender system for groups of users. In: ECSCW 2001. Springer, Dordrecht, pp 199–218

  23. Aggarwal CC (2016) Recommender systems. Springer International Publishing, Cham, pp 1–28


  24. Ardissono L, Goy A, Petrone G, Segnan M, Torasso P (2001) Tailoring the recommendation of tourist information to heterogeneous user groups. In: Workshop on adaptive hypermedia. Springer, Berlin, pp 280–295


  25. Yu Z, Zhou X, Hao Y, Gu J (2006) TV program recommendation for multiple viewers based on user profile merging. User Model User-Adapt Interact 16(1):63–82

  26. Quijano-Sanchez L, Recio-Garcia JA, Diaz-Agudo B, Jimenez-Diaz G (2013) Social factors in group recommender systems. ACM Trans Intell Syst Technol (TIST) 4(1):8

  27. Chen YL, Cheng LC, Chuang CN (2008) A group recommendation system with consideration of interactions among group members. Expert Syst Appl 34(3):2082–2090

  28. Agarwal A, Chakraborty M, Chowdary CR (2017) Does order matter? Effect of order in group recommendation. Expert Syst Appl 82:115–127

  29. Hammou BA, Lahcen AA, Mouline S (2018) APRA: an approximate parallel recommendation algorithm for Big Data. Knowl-Based Syst 157:10–19

  30. Garcia I, Pajares S, Sebastia L, Onaindia E (2012) Preference elicitation techniques for group recommender systems. Inf Sci 189:155–175

  31. Castro J, Yera R, Martínez L (2017) An empirical study of natural noise management in group recommendation systems. Decis Support Syst 94:1–11

  32. Wagstaff K, Cardie C, Rogers S, Schrödl S (2001) Constrained k-means clustering with background knowledge. In: Proceedings of ICML, pp 577–584

  33. Zhang YW, Zhou YY, Wang FT, Sun Z, He Q (2018) Service recommendation based on quotient space granularity analysis and covering algorithm on Spark. Knowl-Based Syst 147:25–35

  34. Kashef R, Kamel MS (2009) Enhanced bisecting k-means clustering using intermediate cooperation. Pattern Recogn 42(11):2557–2569

  35. Chen T, Guestrin C (2016) Xgboost: A scalable tree boosting system. In: Proceedings of the 22nd acm sigkdd international conference on knowledge discovery and data mining. ACM, pp 785–794

  36. Harper FM, Konstan JA (2016) The MovieLens datasets: history and context. ACM Trans Interact Intell Syst (TiiS) 5(4):19

  37. Hu R, Dou W, Liu J (2014) ClubCF: a clustering-based collaborative filtering approach for big data application. IEEE Trans Emerg Top Comput 2(3):302–313


Author information


Corresponding author

Correspondence to Badr Ait Hammou.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix: Extreme gradient boosting


XGBoost (eXtreme Gradient Boosting) is a powerful machine learning technique. It is a tree ensemble, consisting of a set of classification and regression trees (CARTs).

Mathematically, the XGBoost model is described as follows:

$$ \hat y_{i}=\sum\limits_{k=1}^{K} f_{k}(x_{i}), \quad f_{k} \in \mathcal{F} $$
(35)

where K is the number of trees, \(f_k\) is a function in the functional space \(\mathcal{F}\), and \(\mathcal{F}\) is the set of all possible CARTs.
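Conceptually, the ensemble prediction is simply the sum of the individual tree outputs. A minimal sketch (the callables below stand in for fitted CARTs; all names are ours, not from the paper):

```python
def ensemble_predict(trees, x):
    """Prediction of a K-tree ensemble: the sum of each tree's output f_k(x)."""
    return sum(f_k(x) for f_k in trees)

# Hypothetical two-"tree" ensemble; each callable stands in for a fitted CART.
trees = [lambda x: 0.5 * x, lambda x: x + 1.0]
print(ensemble_predict(trees, 2.0))  # 0.5*2 + (2+1) = 4.0
```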

The objective function to optimize is written as follows:

$$ obj=\sum\limits_{i=1}^{n} l(y_{i},\hat y_{i})+ \sum\limits_{{k=1}}^{K} {\Omega}(f_{k}) $$
(36)

where l is the loss function measuring the difference between the predicted value \(\hat y_i\) and the target value \(y_i\), and Ω is the regularization term.

With regard to the training task, XGBoost is trained in an additive manner. The prediction value \(\hat y^{(t)}_i\) of the ith instance at the tth iteration is given by:

$$ \hat y^{(t)}_{i}=\hat y_{i}^{(t-1)}+f_{t}(x_{i}) $$
(37)

Therefore, the objective function at the tth iteration is defined as follows:

$$ obj^{(t)}=\sum\limits_{i=1}^{n} l(y_{i},\hat y^{(t-1)}_{i}+f_{t}(x_{i}))+ {\Omega}(f_{t}) $$
(38)

To approximate the objective function, a second-order Taylor expansion is employed and the constant terms are dropped. The resulting formulation is:

$$ obj^{(t)}=\sum\limits_{i=1}^{n} [ g_{i} f_{t}(x_{i}) + \frac{1}{2} h_{i} {f^{2}_{t}}(x_{i})]+ {\Omega}(f_{t}) $$
(39)

where gi and hi are defined as:

$$ \begin{array}{@{}rcl@{}} g_{i}&=&\partial_{\hat y^{(t-1)}_{i}} l(y_{i},\hat y^{(t-1)}_{i})\\ h_{i}&=&\partial^{2}_{\hat y^{(t-1)}_{i}} l(y_{i},\hat y^{(t-1)}_{i}) \end{array} $$
(40)
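For a concrete case, assume the squared-error loss \(l(y,\hat y)=\frac{1}{2}(y-\hat y)^2\); then \(g_i = \hat y_i - y_i\) and \(h_i = 1\). A minimal sketch under that assumption (the function name is ours):

```python
def grad_hess_squared_error(y_true, y_pred):
    """g_i and h_i of l(y, y_hat) = 0.5*(y - y_hat)**2, differentiated w.r.t. y_hat."""
    g = [yp - yt for yt, yp in zip(y_true, y_pred)]  # first derivative
    h = [1.0 for _ in y_true]  # second derivative is constant for squared error
    return g, h

g, h = grad_hess_squared_error([1.0, 2.0], [1.5, 1.5])
print(g, h)  # [0.5, -0.5] [1.0, 1.0]
```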

Let \(I_j\) be the set of instances assigned to leaf j. After defining the regularization term as \({\Omega}(f_t)=\gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^2\), (39) can be rewritten as follows:

$$ \begin{array}{@{}rcl@{}} obj^{(t)}&=&\sum\limits_{{i=1}}^{n} [ g_{i} f_{t}(x_{i}) + \frac{1}{2} h_{i} {f^{2}_{t}}(x_{i})]+ \gamma T + \frac{1}{2} \lambda \sum\limits_{{j=1}}^{T} {w^{2}_{j}}\\ &=&\sum\limits_{{j=1}}^{T}[(\sum\limits_{{i \in I_{j}}} g_{i}) w_{j} + \frac{1}{2}(\sum\limits_{{i \in I_{j}}} h_{i}+\lambda) {w_{j}^{2}}] + \gamma T \end{array} $$
(41)

where T is the number of leaves in the tree, λ and γ are the regularization parameters, and wj is the weight of leaf j.

Given a fixed tree structure, the optimal weight wj∗ and the objective are computed as follows:

$$ \begin{array}{@{}rcl@{}} w_{j}^{*}&=&-\frac{{\sum}_{{i \in I_{j}}} g_{i}}{{\sum}_{{i \in I_{j}}} h_{i}+\lambda}\\ obj^{*}&=&-\frac{1}{2} \sum\limits_{{j=1}}^{T} \frac{({\sum}_{{i \in I_{j}}} g_{i})^{2}}{{\sum}_{{i \in I_{j}}} h_{i}+\lambda}+\gamma T \end{array} $$
(42)

A smaller objective value indicates a better tree structure.
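The optimal weight and structure score above translate directly into code. A minimal sketch (function names are ours), where each leaf is represented by the gradient and hessian lists of its instances:

```python
def leaf_weight(g, h, lam):
    """Optimal weight w_j* of a leaf: -(sum of g) / (sum of h + lambda)."""
    return -sum(g) / (sum(h) + lam)

def structure_score(leaves, lam, gamma):
    """obj* for a fixed tree structure; 'leaves' is a list of (g, h) pairs per leaf."""
    T = len(leaves)
    return -0.5 * sum(sum(g) ** 2 / (sum(h) + lam) for g, h in leaves) + gamma * T

# One leaf with three instances, lambda = 1, gamma = 0:
leaves = [([1.0, 2.0, 3.0], [1.0, 1.0, 1.0])]
print(leaf_weight(*leaves[0], lam=1.0))             # -6 / 4 = -1.5
print(structure_score(leaves, lam=1.0, gamma=0.0))  # -0.5 * 36 / 4 = -4.5
```

Note how λ shrinks the leaf weight toward zero: for gradient sum G and hessian sum H, the weight is −G/(H + λ), so larger λ gives smaller, more regularized leaves.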

Regarding the construction of trees, XGBoost adopts a greedy algorithm, which starts from a tree of depth 0 and iteratively splits the leaves of the tree.

The gain after adding a split is measured as follows:

$$ \begin{array}{@{}rcl@{}} Gain&=&\frac{1}{2}\left[\frac{({\sum}_{{i \in I_{R}}} g_{i})^{2}}{{\sum}_{{i \in I_{R}}} h_{i}+\lambda}+\frac{({\sum}_{{i \in I_{L}}} g_{i})^{2}}{{\sum}_{{i \in I_{L}}} h_{i}+\lambda}\right.\\ &&\left.-\frac{({\sum}_{{i \in I}} g_{i})^{2}}{{\sum}_{{i \in I}} h_{i}+\lambda}\right]-\gamma \end{array} $$
(43)

where IR and IL denote the instance sets of right and left nodes after the split, respectively [35].
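The gain computation can be sketched as follows, using the squared sum of gradients per node as in the standard XGBoost derivation [35] (function names are ours):

```python
def split_gain(g_left, h_left, g_right, h_right, lam, gamma):
    """Gain of splitting a node into left/right children, minus the complexity cost gamma."""
    def score(g, h):
        # Structure score contribution of one node: (sum g)^2 / (sum h + lambda)
        return sum(g) ** 2 / (sum(h) + lam)
    parent = score(g_left + g_right, h_left + h_right)
    return 0.5 * (score(g_left, h_left) + score(g_right, h_right) - parent) - gamma

# lambda = 1, gamma = 0; all hessians equal to 1:
gain = split_gain([-2.0, -1.0], [1.0, 1.0],
                  [1.0, 2.0, 3.0], [1.0, 1.0, 1.0],
                  lam=1.0, gamma=0.0)
print(gain)  # 0.5 * (9/3 + 36/4 - 9/6) = 5.25
```

A split is only kept when the gain is positive, i.e. when the improvement in structure score exceeds the complexity penalty γ of adding a leaf.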


Cite this article

Ait Hammou, B., Ait Lahcen, A. & Mouline, S. A distributed group recommendation system based on extreme gradient boosting and big data technologies. Appl Intell 49, 4128–4149 (2019). https://doi.org/10.1007/s10489-019-01482-9
