A Comparative Analysis of Similarity Metrics on Sparse Data for Clustering in Recommender Systems

  • Rodolfo Bojorque
  • Remigio Hurtado
  • Andrés Inga
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 787)


This work analyzes the behavior of similarity metrics on sparse data in recommender systems (RS). In RS, clustering is an important technique for forming groups of users or items in order to personalize and optimize recommendations. Most clustering techniques try to minimize the Euclidean distance between the samples and their centroid, but this approach has a drawback on sparse data: it treats missing values as zeros. We present a comparative analysis of similarity metrics, namely Pearson Correlation, Jaccard, Mean Square Difference, Jaccard Mean Square Difference, and Mean Jaccard Difference, as alternatives to Euclidean distance. We report results on the FilmTrust and MovieLens 100K datasets, both of which are free, public, and highly sparse. Our results show that, on sparse data, similarity measures achieve better accuracy than Euclidean distance in terms of Mean Absolute Error and the Within-Cluster criterion.
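The drawback of Euclidean distance on sparse ratings, and several of the similarity measures listed above, can be illustrated with a minimal Python sketch. It assumes each user is a dictionary from item IDs to ratings on a hypothetical 1 to 5 scale, with unrated items simply absent; the users `alice`, `bob`, and `carol` and their ratings are made-up toy data, not drawn from FilmTrust or MovieLens. The MSD and JMSD functions follow a common formulation (MSD over co-rated items with ratings rescaled to [0, 1], JMSD = Jaccard × (1 − MSD)) and may differ in detail from the paper's definitions; Mean Jaccard Difference is omitted because its formula is not given here.

```python
import math

# Assumed representation: {item_id: rating}; unrated items are simply absent.
Ratings = dict[str, float]


def euclidean_zero_filled(u: Ratings, v: Ratings) -> float:
    """Euclidean distance after treating every missing rating as 0."""
    items = set(u) | set(v)
    return math.sqrt(sum((u.get(i, 0.0) - v.get(i, 0.0)) ** 2 for i in items))


def jaccard(u: Ratings, v: Ratings) -> float:
    """Items rated by both users over items rated by either; rating values are ignored."""
    union = set(u) | set(v)
    return len(set(u) & set(v)) / len(union) if union else 0.0


def pearson(u: Ratings, v: Ratings) -> float:
    """Pearson correlation over co-rated items only; 0.0 when it is undefined."""
    common = set(u) & set(v)
    if len(common) < 2:
        return 0.0
    mean_u = sum(u[i] for i in common) / len(common)
    mean_v = sum(v[i] for i in common) / len(common)
    num = sum((u[i] - mean_u) * (v[i] - mean_v) for i in common)
    den = math.sqrt(sum((u[i] - mean_u) ** 2 for i in common)
                    * sum((v[i] - mean_v) ** 2 for i in common))
    return num / den if den else 0.0


def msd_similarity(u: Ratings, v: Ratings, r_min: float = 1.0, r_max: float = 5.0) -> float:
    """1 - mean squared difference over co-rated items, ratings rescaled to [0, 1] (assumed)."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    span = r_max - r_min
    msd = sum(((u[i] - v[i]) / span) ** 2 for i in common) / len(common)
    return 1.0 - msd


def jmsd(u: Ratings, v: Ratings, r_min: float = 1.0, r_max: float = 5.0) -> float:
    """Jaccard Mean Square Difference: Jaccard(u, v) * (1 - MSD(u, v))."""
    return jaccard(u, v) * msd_similarity(u, v, r_min, r_max)


alice = {"m1": 5.0, "m2": 4.0}
bob = {"m1": 5.0, "m2": 4.0, "m3": 5.0, "m4": 5.0}  # agrees with alice on every item she rated
carol = {"m5": 1.0}                                # shares no rated item with alice

for name, other in [("bob", bob), ("carol", carol)]:
    print(name,
          round(euclidean_zero_filled(alice, other), 2),
          jaccard(alice, other), pearson(alice, other), jmsd(alice, other))
# bob 7.07 0.5 1.0 0.5
# carol 6.48 0.0 0.0 0.0
```

Zero-filled Euclidean distance places carol, who has no rated item in common with alice, closer to her than bob, who matches alice on every item she rated. Jaccard, Pearson, and JMSD all rank bob first, because they only compare what both users actually rated.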
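The abstract contrasts centroid-based clustering driven by Euclidean distance with clustering driven by a similarity measure, evaluated by Mean Absolute Error. One way to sketch the latter, assuming a k-means-style loop that is not necessarily the authors' procedure, is to assign each user to the most similar centroid, recompute each centroid as per-item means over observed ratings only, and predict a held-out rating from the user's centroid. Jaccard is used as the similarity here; the toy users, the manual choice of seeds, the helper names, and the fallback rating of 3.0 for items a centroid has never seen are all illustrative assumptions. The paper's Within-Cluster criterion is not reproduced.

```python
from typing import Callable

# Assumed representation: user -> {item: rating}; unrated items are absent.
Ratings = dict[str, float]
Similarity = Callable[[Ratings, Ratings], float]


def jaccard(u: Ratings, v: Ratings) -> float:
    """Items rated by both over items rated by either."""
    union = set(u) | set(v)
    return len(set(u) & set(v)) / len(union) if union else 0.0


def centroid(members: list[Ratings]) -> Ratings:
    """Per-item mean over the members who actually rated each item (no zero-filling)."""
    sums: dict[str, float] = {}
    counts: dict[str, int] = {}
    for ratings in members:
        for item, value in ratings.items():
            sums[item] = sums.get(item, 0.0) + value
            counts[item] = counts.get(item, 0) + 1
    return {item: sums[item] / counts[item] for item in sums}


def assign(users: dict[str, Ratings], centroids: list[Ratings], sim: Similarity) -> dict[str, int]:
    """Assign each user to the centroid with the HIGHEST similarity, not the smallest distance."""
    return {
        name: max(range(len(centroids)), key=lambda k: sim(ratings, centroids[k]))
        for name, ratings in users.items()
    }


def cluster(users, seeds, sim, max_iter=10):
    """Hypothetical k-means-style loop: recompute centroids, reassign, stop when labels are stable."""
    centroids = list(seeds)
    labels = assign(users, centroids, sim)
    for _ in range(max_iter):
        for k in range(len(centroids)):
            members = [users[name] for name, label in labels.items() if label == k]
            if members:  # keep the previous centroid if a cluster empties out
                centroids[k] = centroid(members)
        new_labels = assign(users, centroids, sim)
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids


def mae(test, labels, centroids, fallback=3.0):
    """Mean Absolute Error of predicting each held-out rating from the user's centroid."""
    errors = [
        abs(centroids[labels[user]].get(item, fallback) - true_rating)
        for (user, item), true_rating in test.items()
    ]
    return sum(errors) / len(errors)


users = {
    "u1": {"a": 5.0, "b": 4.0},
    "u2": {"a": 4.0, "b": 5.0, "c": 1.0},
    "u3": {"d": 2.0, "e": 1.0},
    "u4": {"d": 1.0, "e": 2.0, "c": 2.0},
}
labels, centroids = cluster(users, [users["u1"], users["u3"]], jaccard)
test = {("u1", "c"): 2.0, ("u4", "a"): 3.0}
print(labels)                        # {'u1': 0, 'u2': 0, 'u3': 1, 'u4': 1}
print(mae(test, labels, centroids))  # 0.5
```

Because `sim` is a parameter, the same loop can be rerun with any of the measures compared in the paper, which is the kind of substitution the abstract describes; swapping it for a zero-filled Euclidean distance would also require changing `max` to `min` in `assign`.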


Keywords: Clustering · Recommender systems · Similarity measures



Copyright information

© Springer International Publishing AG, part of Springer Nature 2019

Authors and Affiliations

  • Rodolfo Bojorque (1, 2)
  • Remigio Hurtado (1, 2)
  • Andrés Inga (1)

  1. Universidad Politécnica Salesiana, Cuenca, Ecuador
  2. Universidad Politécnica de Madrid, Madrid, Spain
