
A distributed group recommendation system based on extreme gradient boosting and big data technologies

Published in Applied Intelligence

Abstract

Personalized recommendation systems have emerged as useful tools for recommending appropriate items to individual users. However, some items tend to be consumed by groups of users, such as tourist attractions or television programs. With this purpose in mind, Group Recommender Systems (GRSs) are tailored to help groups of users find suitable items according to their preferences and needs. In general, these systems often confront the sparsity problem, which negatively affects their efficiency. Moreover, as the numbers of users, items, groups, and ratings in the system grow, the data becomes too big to be processed efficiently by traditional systems. Thus, there is an increasing need for distributed recommendation approaches able to manage the issues related to Big Data and the sparsity problem. In this paper, we propose a distributed group recommendation system designed on top of Apache Spark to handle large-scale data. It integrates a novel recommendation method that combines a dimensionality reduction technique with supervised and unsupervised learning in order to deal efficiently with the curse of dimensionality, detect groups of users, and improve prediction quality. Experimental results on three real-world data sets show that our proposal significantly outperforms its competitors.


Notes

  1. https://webscope.sandbox.yahoo.com/catalog.php?datatype=r

  2. https://grouplens.org/datasets/movielens/1m/

  3. http://grouplens.org/datasets/movielens/100k/

  4. https://www.netflixprize.com/

References

  1. Castro J, Lu J, Zhang G, Dong Y, Martínez L (2018) Opinion dynamics-based group recommender systems. IEEE Trans Syst Man Cybern Syst Hum 48(12):2394–2406. https://doi.org/10.1109/TSMC.2017.2695158


  2. Ekstrand MD, Riedl JT, Konstan JA (2011) Collaborative filtering recommender systems. Found Trends Hum-Comput Interact 4(2):81–173

  3. Dakhel AM, Malazi HT, Mahdavi M (2018) A social recommender system using item asymmetric correlation. Appl Intell 48(3):527–540


  4. Hammou BA, Lahcen AA (2017) FRAIPA: A fast recommendation approach with improved prediction accuracy. Expert Syst Appl 87:90–97


  5. Zhang F, Gong T, Lee VE, Zhao G, Rong C, Qu G (2016) Fast algorithms to evaluate collaborative filtering recommender systems. Knowl-Based Syst 96:96–103


  6. Christensen IA, Schiaffino S (2011) Entertainment recommender systems for group of users. Expert Syst Appl 38(11):14127–14135


  7. Liu B (2007) Web data mining: exploring hyperlinks, contents, and usage data. Springer Science & Business Media

  8. Ricci F, Rokach L, Shapira B (2015) Recommender systems: introduction and challenges. In: Recommender systems handbook. Springer, Boston, pp 1–34


  9. Castro J, Yera R, Martínez L (2018) A fuzzy approach for natural noise management in group recommender systems. Expert Syst Appl 94:237–249


  10. Boratto L, Carta S, Fenu G (2016) Discovery and representation of the preferences of automatically detected groups: Exploiting the link between group modeling and clustering. Fut Gener Comput Syst 64:165–174


  11. Boratto L, Carta S, Fenu G (2017) Investigating the role of the rating prediction task in granularity-based group recommender systems and big data scenarios. Inf Sci 378:424–443


  12. Apache Spark, https://spark.apache.org. Last accessed July 10, 2018

  13. Zaharia M, Chowdhury M, Das T, Dave A, Ma J, McCauley M, Stoica I (2012) Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing. In: Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation. USENIX Association, pp 2–2

  14. Zaharia M, Chowdhury M, Franklin MJ, Shenker S, Stoica I (2010) Spark: cluster computing with working sets. HotCloud 10(10-10):95

  15. Apache Cassandra, https://cassandra.apache.org. Last accessed July 10, 2018

  16. Lakshman A, Malik P (2010) Cassandra: a decentralized structured storage system. ACM SIGOPS Oper Syst Rev 44(2):35–40

  17. Apache Spark’s scalable machine learning library (MLlib), https://spark.apache.org/mllib/. Last accessed July 10, 2018

  18. Meng X, Bradley J, Yavuz B, Sparks E, Venkataraman S, Liu D, Xin D (2016) MLlib: machine learning in Apache Spark. J Mach Learn Res 17(1):1235–1241

  19. Shani G, Gunawardana A (2011) Evaluating recommendation systems. In: Recommender systems handbook. Springer, Boston, pp 257–297


  20. McCarthy JF, Anagnost T (1998) MusicFX: an arbiter of group preferences for computer-supported cooperative workouts. In: Proceedings of the 1998 ACM conference on computer-supported cooperative work (CSCW’98)

  21. Chao DL, Balthrop J, Forrest S (2005) Adaptive radio: achieving consensus using negative preferences. In: Proceedings of the 2005 international ACM SIGGROUP conference on Supporting group work. ACM, pp 120–123

  22. O’connor M, Cosley D, Konstan JA, Riedl J (2001) PolyLens: a recommender system for groups of users. In: ECSCW 2001. Springer, Dordrecht, pp 199–218

  23. Aggarwal CC (2016) Recommender systems. Springer International Publishing, Cham, pp 1–28


  24. Ardissono L, Goy A, Petrone G, Segnan M, Torasso P (2001) Tailoring the recommendation of tourist information to heterogeneous user groups. In: Workshop on adaptive hypermedia. Springer, Berlin, pp 280–295


  25. Yu Z, Zhou X, Hao Y, Gu J (2006) TV program recommendation for multiple viewers based on user profile merging. User Model User-Adapt Interact 16(1):63–82

  26. Quijano-Sanchez L, Recio-Garcia JA, Diaz-Agudo B, Jimenez-Diaz G (2013) Social factors in group recommender systems. ACM Trans Intell Syst Technol (TIST) 4(1):8

  27. Chen YL, Cheng LC, Chuang CN (2008) A group recommendation system with consideration of interactions among group members. Expert Syst Appl 34(3):2082–2090

  28. Agarwal A, Chakraborty M, Chowdary CR (2017) Does order matter? Effect of order in group recommendation. Expert Syst Appl 82:115–127

  29. Hammou BA, Lahcen AA, Mouline S (2018) APRA: an approximate parallel recommendation algorithm for Big Data. Knowl-Based Syst 157:10–19

  30. Garcia I, Pajares S, Sebastia L, Onaindia E (2012) Preference elicitation techniques for group recommender systems. Inf Sci 189:155–175

  31. Castro J, Yera R, Martínez L (2017) An empirical study of natural noise management in group recommendation systems. Decis Support Syst 94:1–11

  32. Wagstaff K, Cardie C, Rogers S, Schrödl S (2001) Constrained k-means clustering with background knowledge. In: Proceedings of ICML, pp 577–584

  33. Zhang YW, Zhou YY, Wang FT, Sun Z, He Q (2018) Service recommendation based on quotient space granularity analysis and covering algorithm on Spark. Knowl-Based Syst 147:25–35

  34. Kashef R, Kamel MS (2009) Enhanced bisecting k-means clustering using intermediate cooperation. Pattern Recogn 42(11):2557–2569

  35. Chen T, Guestrin C (2016) Xgboost: A scalable tree boosting system. In: Proceedings of the 22nd acm sigkdd international conference on knowledge discovery and data mining. ACM, pp 785–794

  36. Harper FM, Konstan JA (2016) The MovieLens datasets: history and context. ACM Trans Interact Intell Syst (TiiS) 5(4):19

  37. Hu R, Dou W, Liu J (2014) ClubCF: a clustering-based collaborative filtering approach for big data application. IEEE Trans Emerg Top Comput 2(3):302–313


Author information


Corresponding author

Correspondence to Badr Ait Hammou.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix: Extreme gradient boosting


XGBoost (eXtreme Gradient Boosting) is a powerful machine learning technique. It is a tree ensemble, consisting of a set of classification and regression trees (CARTs).

Mathematically, the XGBoost model is described as follows:

$$ \hat y_{i}=\sum\limits_{k=1}^{K} f_{k}(x_{i}), \quad f_{k} \in \mathcal{F} $$
(35)

where K is the number of trees, \(f_k\) is a function in the functional space \(\mathcal{F}\), and \(\mathcal{F}\) is the set of all possible CARTs.
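Conceptually, the ensemble prediction is simply the sum of the individual tree outputs. A minimal sketch (the callables below stand in for fitted CARTs; all names are ours, not from the paper):

```python
def ensemble_predict(trees, x):
    """Prediction of a K-tree ensemble: the sum of each tree's output f_k(x)."""
    return sum(f_k(x) for f_k in trees)

# Hypothetical two-"tree" ensemble; each callable stands in for a fitted CART.
trees = [lambda x: 0.5 * x, lambda x: x + 1.0]
print(ensemble_predict(trees, 2.0))  # 0.5*2 + (2+1) = 4.0
```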

The objective function to optimize is written as follows:

$$ obj=\sum\limits_{i=1}^{n} l(y_{i},\hat y_{i})+ \sum\limits_{{k=1}}^{K} {\Omega}(f_{k}) $$
(36)

where l is the loss function measuring the difference between the predicted value \(\hat y_i\) and the target value \(y_i\), and Ω is the regularization term.

With regard to the training task, XGBoost is trained in an additive manner. The prediction value \(\hat y^{(t)}_i\) of the ith instance at the tth iteration is given by:

$$ \hat y^{(t)}_{i}=\hat y_{i}^{(t-1)}+f_{t}(x_{i}) $$
(37)

Therefore, the objective function at the tth iteration is defined as follows:

$$ obj^{(t)}=\sum\limits_{i=1}^{n} l(y_{i},\hat y^{(t-1)}_{i}+f_{t}(x_{i}))+ {\Omega}(f_{t}) $$
(38)

To approximate the objective function, a second-order Taylor expansion is employed and the constant terms are dropped. The resulting formulation is:

$$ obj^{(t)}=\sum\limits_{i=1}^{n} [ g_{i} f_{t}(x_{i}) + \frac{1}{2} h_{i} {f^{2}_{t}}(x_{i})]+ {\Omega}(f_{t}) $$
(39)

where gi and hi are defined as:

$$ \begin{array}{@{}rcl@{}} g_{i}&=&\partial_{\hat y^{(t-1)}_{i}} l(y_{i},\hat y^{(t-1)}_{i})\\ h_{i}&=&\partial^{2}_{\hat y^{(t-1)}_{i}} l(y_{i},\hat y^{(t-1)}_{i}) \end{array} $$
(40)
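For a concrete case, assume the squared-error loss \(l(y,\hat y)=\frac{1}{2}(y-\hat y)^2\); then \(g_i = \hat y_i - y_i\) and \(h_i = 1\). A minimal sketch under that assumption (the function name is ours):

```python
def grad_hess_squared_error(y_true, y_pred):
    """g_i and h_i of l(y, y_hat) = 0.5*(y - y_hat)**2, differentiated w.r.t. y_hat."""
    g = [yp - yt for yt, yp in zip(y_true, y_pred)]  # first derivative
    h = [1.0 for _ in y_true]  # second derivative is constant for squared error
    return g, h

g, h = grad_hess_squared_error([1.0, 2.0], [1.5, 1.5])
print(g, h)  # [0.5, -0.5] [1.0, 1.0]
```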

Let \(I_j\) be the set of instances assigned to leaf j. After defining the regularization term as \({\Omega}(f_t)=\gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^2\), (39) can be rewritten as follows:

$$ \begin{array}{@{}rcl@{}} obj^{(t)}&=&\sum\limits_{{i=1}}^{n} [ g_{i} f_{t}(x_{i}) + \frac{1}{2} h_{i} {f^{2}_{t}}(x_{i})]+ \gamma T + \frac{1}{2} \lambda \sum\limits_{{j=1}}^{T} {w^{2}_{j}}\\ &=&\sum\limits_{{j=1}}^{T}[(\sum\limits_{{i \in I_{j}}} g_{i}) w_{j} + \frac{1}{2}(\sum\limits_{{i \in I_{j}}} h_{i}+\lambda) {w_{j}^{2}}] + \gamma T \end{array} $$
(41)

where T is the number of leaves in the tree, λ and γ are the regularization parameters, and wj is the weight of leaf j.

Given a fixed tree structure, the optimal weight wj∗ and the objective are computed as follows:

$$ \begin{array}{@{}rcl@{}} w_{j}^{*}&=&-\frac{{\sum}_{{i \in I_{j}}} g_{i}}{{\sum}_{{i \in I_{j}}} h_{i}+\lambda}\\ obj^{*}&=&-\frac{1}{2} \sum\limits_{{j=1}}^{T} \frac{({\sum}_{{i \in I_{j}}} g_{i})^{2}}{{\sum}_{{i \in I_{j}}} h_{i}+\lambda}+\gamma T \end{array} $$
(42)

A smaller objective value indicates a better tree structure.
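The optimal weight and structure score above translate directly into code. A minimal sketch (function names are ours), where each leaf is represented by the gradient and hessian lists of its instances:

```python
def leaf_weight(g, h, lam):
    """Optimal weight w_j* of a leaf: -(sum of g) / (sum of h + lambda)."""
    return -sum(g) / (sum(h) + lam)

def structure_score(leaves, lam, gamma):
    """obj* for a fixed tree structure; 'leaves' is a list of (g, h) pairs per leaf."""
    T = len(leaves)
    return -0.5 * sum(sum(g) ** 2 / (sum(h) + lam) for g, h in leaves) + gamma * T

# One leaf with three instances, lambda = 1, gamma = 0:
leaves = [([1.0, 2.0, 3.0], [1.0, 1.0, 1.0])]
print(leaf_weight(*leaves[0], lam=1.0))             # -6 / 4 = -1.5
print(structure_score(leaves, lam=1.0, gamma=0.0))  # -0.5 * 36 / 4 = -4.5
```

Note how λ shrinks the leaf weight toward zero: for gradient sum G and hessian sum H, the weight is −G/(H + λ), so larger λ gives smaller, more regularized leaves.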

Regarding the construction of trees, XGBoost adopts a greedy algorithm, which starts from a tree of depth 0 and iteratively splits the leaves of the tree.

The gain after adding a split is measured as follows:

$$ \begin{array}{@{}rcl@{}} Gain&=&\frac{1}{2}\left[\frac{({\sum}_{{i \in I_{R}}} g_{i})^{2}}{{\sum}_{{i \in I_{R}}} h_{i}+\lambda}+\frac{({\sum}_{{i \in I_{L}}} g_{i})^{2}}{{\sum}_{{i \in I_{L}}} h_{i}+\lambda}\right.\\ &&\left.-\frac{({\sum}_{{i \in I}} g_{i})^{2}}{{\sum}_{{i \in I}} h_{i}+\lambda}\right]-\gamma \end{array} $$
(43)

where IR and IL denote the instance sets of right and left nodes after the split, respectively [35].
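The gain computation can be sketched as follows, using the squared sum of gradients per node as in the standard XGBoost derivation [35] (function names are ours):

```python
def split_gain(g_left, h_left, g_right, h_right, lam, gamma):
    """Gain of splitting a node into left/right children, minus the complexity cost gamma."""
    def score(g, h):
        # Structure score contribution of one node: (sum g)^2 / (sum h + lambda)
        return sum(g) ** 2 / (sum(h) + lam)
    parent = score(g_left + g_right, h_left + h_right)
    return 0.5 * (score(g_left, h_left) + score(g_right, h_right) - parent) - gamma

# lambda = 1, gamma = 0; all hessians equal to 1:
gain = split_gain([-2.0, -1.0], [1.0, 1.0],
                  [1.0, 2.0, 3.0], [1.0, 1.0, 1.0],
                  lam=1.0, gamma=0.0)
print(gain)  # 0.5 * (9/3 + 36/4 - 9/6) = 5.25
```

A split is only kept when the gain is positive, i.e. when the improvement in structure score exceeds the complexity penalty γ of adding a leaf.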


Cite this article

Ait Hammou, B., Ait Lahcen, A. & Mouline, S. A distributed group recommendation system based on extreme gradient boosting and big data technologies. Appl Intell 49, 4128–4149 (2019). https://doi.org/10.1007/s10489-019-01482-9
