Leveraging clustering to improve collaborative filtering

Information Systems Frontiers


Extensive work has recently been done on matrix factorization (MF) techniques because they provide accurate rating prediction models for recommendation systems. Extensions such as neighbour-aware models have been shown to improve rating prediction further, but these models often suffer from long computation times. In this paper, we propose a novel method that applies clustering algorithms to the latent vectors of users and items. Our method captures the common interests between clusters of users and clusters of items in a latent space, which yields a cluster-level rating matrix. A matrix factorization technique is then applied to this cluster-level rating matrix to predict future cluster-level interests. We then aggregate the traditional user-item rating predictions with our cluster-level rating predictions to improve rating prediction accuracy. Our method is a general “wrapper” that can be applied to any collaborative filtering method. In our experiments, we show that our approach, when applied to a variety of existing matrix factorization techniques, improves their rating predictions and also yields better rating predictions for cold-start users. Above all, we show that higher-quality and more numerous clusters achieve better rating prediction accuracy.
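The pipeline described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' CBMF implementation (Algorithm 1 is not reproduced on this page); it only assumes a plain gradient-descent MF, a simple k-means on the latent vectors, a cluster-level matrix built from mean observed ratings, and a fixed blending weight. The function names (`factorize`, `kmeans`, `cluster_ratings`, `blended_predictions`), the weight `alpha`, and all hyperparameter values are hypothetical choices made for illustration.

```python
import numpy as np

def factorize(R, mask, k=2, steps=200, lr=0.01, reg=0.05, seed=0):
    """Plain MF by full-batch gradient descent on the observed cells only."""
    rng = np.random.default_rng(seed)
    P = 0.1 * rng.standard_normal((R.shape[0], k))
    Q = 0.1 * rng.standard_normal((R.shape[1], k))
    mu = R[mask].mean()                              # global mean baseline
    for _ in range(steps):
        E = np.where(mask, R - mu - P @ Q.T, 0.0)    # error on observed cells
        P, Q = P + lr * (E @ Q - reg * P), Q + lr * (E.T @ P - reg * Q)
    return mu, P, Q

def kmeans(X, n_clusters, iters=20, seed=0):
    """Basic k-means on latent vectors; returns one cluster label per row."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=n_clusters, replace=False)]
    for _ in range(iters):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for c in range(n_clusters):
            if np.any(labels == c):                  # keep old centre if empty
                centers[c] = X[labels == c].mean(axis=0)
    return labels

def cluster_ratings(R, mask, cu, ci, n_cu, n_ci):
    """Mean observed rating for every (user cluster, item cluster) pair."""
    total = np.zeros((n_cu, n_ci))
    count = np.zeros((n_cu, n_ci))
    idx = (cu[:, None], ci[None, :])
    np.add.at(total, idx, np.where(mask, R, 0.0))
    np.add.at(count, idx, mask.astype(float))
    seen = count > 0
    return np.where(seen, total / np.maximum(count, 1), 0.0), seen

def blended_predictions(R, mask, n_cu=2, n_ci=2, k=2, alpha=0.5):
    """Blend user-item MF predictions with cluster-level MF predictions."""
    mu, P, Q = factorize(R, mask, k)                 # user-item model
    cu, ci = kmeans(P, n_cu), kmeans(Q, n_ci)        # cluster latent vectors
    C, seen = cluster_ratings(R, mask, cu, ci, n_cu, n_ci)
    cmu, CP, CQ = factorize(C, seen, k=1)            # cluster-level model
    user_item = mu + P @ Q.T
    cluster = (cmu + CP @ CQ.T)[cu[:, None], ci[None, :]]
    return alpha * user_item + (1 - alpha) * cluster # assumed linear blend

# Toy 4x4 rating matrix; 0 marks an unobserved rating.
R = np.array([[5, 4, 0, 1],
              [4, 5, 1, 0],
              [1, 0, 5, 4],
              [0, 1, 4, 5]], dtype=float)
mask = R > 0
pred = blended_predictions(R, mask)
```

Because the cluster-level prediction depends only on cluster membership, a user with few ratings still receives a cluster-informed estimate, which is one plausible reading of why the abstract reports gains for cold-start users. The linear blend with a fixed `alpha` is only the simplest possible aggregation and may differ from the one used in the paper.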




  2. The implementation package is publicly accessible at:




This work was made possible by the facilities of the Shared Hierarchical Academic Research Computing Network (SHARCNET) and Compute/Calcul Canada. The authors would like to thank the reviewers of the 2013 ACM Recommender Systems conference (RecSys’13) for their valuable comments.

Author information



Corresponding author

Correspondence to Nima Mirbakhsh.



This section lists the parameters used to train the proposed CBMF model (Algorithm 1) on each dataset in our experiments. Table 4 shows the selected values.

Table 4 Employed parameters in Algorithm 1 in the datasets


Cite this article

Mirbakhsh, N., Ling, C.X. Leveraging clustering to improve collaborative filtering. Inf Syst Front 20, 111–124 (2018).
