Scalable Collaborative Filtering Based on Splitting-Merging Clustering Algorithm

  • Nabil Belacel
  • Guillaume Durand
  • Serge Leger
  • Cajetan Bouchard
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11352)


Recommender systems apply information filtering technologies to identify a set of items that could be of interest to a user. Collaborative filtering (CF) is one of the best-known and most successful filtering techniques in recommender systems and has been widely applied. However, the usual CF techniques face issues that limit their application, especially when dealing with highly sparse and large-scale data. For instance, CF algorithms using the k-Nearest Neighbor approach are very effective at filtering interesting items for users, but at the same time they are computationally expensive, and their cost grows non-linearly with the number of users and items in the database. To address these scalability issues, some researchers propose using clustering methods. K-means is among the best-known clustering algorithms, but it depends on the chosen number of clusters and on the initial centroids, which can lead to inaccurate recommendations and increased computation time. In this paper, we show, by comparison with K-means based approaches, how a clustering algorithm called K-means+, which considers the statistical nature of the data, can improve recommendation performance with reasonable computation time. The results show that the proposed K-means+ method yields predictions of substantially better quality. They also provide significant evidence that the proposed Splitting-Merging clustering based CF is more scalable than the conventional one.
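To make the clustering idea concrete, the following is a minimal sketch of cluster-based CF using plain K-means. It is not the K-means+ splitting-merging method of the paper, whose details are not given here. The function names (`kmeans`, `predict`), the toy rating matrix, the use of 0 for "unrated", and the choice of the first k users as initial centroids are all illustrative assumptions.

```python
from math import sqrt

def distance(a, b):
    """Euclidean distance between two equal-length rating vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(rows, k, iterations=20):
    """Plain Lloyd-style K-means; the first k rows seed the centroids (assumption)."""
    centroids = [list(r) for r in rows[:k]]
    labels = [0] * len(rows)
    for _ in range(iterations):
        # Assignment step: each user joins the nearest centroid.
        labels = [min(range(k), key=lambda c: distance(r, centroids[c]))
                  for r in rows]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def predict(ratings, labels, user, item):
    """Average the item's ratings over the user's cluster peers (0 means unrated)."""
    peers = [r[item] for u, (r, lab) in enumerate(zip(ratings, labels))
             if lab == labels[user] and u != user and r[item] > 0]
    if peers:
        return sum(peers) / len(peers)
    # Fallback: the user's own mean rating when no peer rated the item.
    rated = [x for x in ratings[user] if x > 0]
    return sum(rated) / len(rated) if rated else 0.0

# Toy data (hypothetical): rows are users, columns are items, 0 = unrated.
ratings = [
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
]
labels = kmeans(ratings, k=2)
print(labels, predict(ratings, labels, user=0, item=2))
```

Because a prediction only consults users in the same cluster, the neighbor search is bounded by the cluster size rather than by the whole user base. The sketch also shows the two dependencies the abstract points out: both `k` and the seed centroids are fixed by hand, and a different choice can produce a different partition. Treating unrated items as 0 in the distance is a simplification made for brevity.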


Keywords: Recommender systems · Collaborative filtering · Information filtering · Clustering · Splitting-merging clustering



Copyright information

© Her Majesty the Queen in Right of Canada as represented by NRC Canada 2019

Authors and Affiliations

  • Nabil Belacel (1, corresponding author)
  • Guillaume Durand (1)
  • Serge Leger (1)
  • Cajetan Bouchard (1)
  1. National Research Council, Digital Technologies Research Center, Ottawa, Canada
