A Comparative Analysis of Similarity Metrics on Sparse Data for Clustering in Recommender Systems

  • Conference paper
Advances in Artificial Intelligence, Software and Systems Engineering (AHFE 2018)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 787)

Abstract

This work examines how similarity metrics behave on sparse data in recommender systems (RS). Clustering is an important technique in RS for forming groups of users or items in order to personalize and optimize recommendations. Most clustering techniques try to minimize the Euclidean distance between the samples and their centroid, but this approach has a drawback on sparse data because it treats a missing value as zero. We present a comparative analysis of similarity metrics, namely Pearson Correlation, Jaccard, Mean Square Difference, Jaccard Mean Square Difference, and Mean Jaccard Difference, as alternatives to Euclidean distance. We report results for the FilmTrust and MovieLens 100K datasets, both of which are free, public, and highly sparse. We show that using similarity measures yields better accuracy on sparse data in terms of Mean Absolute Error and Within-Cluster.
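The drawback of zero-filled Euclidean distance described in the abstract can be illustrated with a small sketch. The formulas below follow definitions commonly used in the collaborative filtering literature (Jaccard over rated-item sets, Mean Square Difference over co-rated items, and their product as JMSD); the paper's exact formulations are not reproduced on this page, so they are assumptions, as are the toy rating vectors, the 1 to 5 rating scale, and all function names. Mean Jaccard Difference is omitted because its definition is not given here.

```python
import numpy as np

# Hypothetical toy ratings on an assumed 1-5 scale; np.nan marks an unrated item.
R_MIN, R_MAX = 1.0, 5.0
u = np.array([5.0, np.nan, 4.0, np.nan, 1.0, np.nan])
v = np.array([5.0, 3.0, 4.0, np.nan, np.nan, np.nan])
w = np.array([np.nan, np.nan, np.nan, 2.0, np.nan, 1.0])
z = np.array([np.nan, 1.0, np.nan, np.nan, np.nan, np.nan])


def euclidean_zero_filled(a, b):
    """Euclidean distance after replacing every missing rating with 0."""
    return float(np.linalg.norm(np.nan_to_num(a) - np.nan_to_num(b)))


def jaccard(a, b):
    """Share of items rated by both users among items rated by either."""
    rated_a, rated_b = ~np.isnan(a), ~np.isnan(b)
    union = np.sum(rated_a | rated_b)
    return float(np.sum(rated_a & rated_b) / union) if union else 0.0


def msd_similarity(a, b):
    """1 minus the mean squared difference of normalized co-rated ratings."""
    both = ~np.isnan(a) & ~np.isnan(b)
    if not both.any():
        return 0.0  # assumption: no co-rated items means no evidence of similarity
    diff = (a[both] - b[both]) / (R_MAX - R_MIN)
    return float(1.0 - np.mean(diff ** 2))


def jmsd(a, b):
    """Jaccard weighted by MSD similarity."""
    return jaccard(a, b) * msd_similarity(a, b)


def pearson(a, b):
    """Pearson correlation restricted to co-rated items."""
    both = ~np.isnan(a) & ~np.isnan(b)
    if both.sum() < 2:
        return 0.0
    x, y = a[both] - a[both].mean(), b[both] - b[both].mean()
    denom = np.sqrt(np.sum(x ** 2) * np.sum(y ** 2))
    return float(np.sum(x * y) / denom) if denom > 0 else 0.0


if __name__ == "__main__":
    for name, (a, b) in {"u,v": (u, v), "w,z": (w, z)}.items():
        print(name, euclidean_zero_filled(a, b), jaccard(a, b),
              msd_similarity(a, b), jmsd(a, b), pearson(a, b))
```

In this toy data, users u and v agree exactly on their two co-rated items, while w and z share no rated item at all. Zero-filled Euclidean distance still places w and z closer together than u and v, because the missing entries are compared as if they were ratings of zero; the similarity measures use only the observed ratings and rank the pairs the other way.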

Corresponding author

Correspondence to Rodolfo Bojorque.

Copyright information

© 2019 Springer International Publishing AG, part of Springer Nature

Cite this paper

Bojorque, R., Hurtado, R., Inga, A. (2019). A Comparative Analysis of Similarity Metrics on Sparse Data for Clustering in Recommender Systems. In: Ahram, T. (eds) Advances in Artificial Intelligence, Software and Systems Engineering. AHFE 2018. Advances in Intelligent Systems and Computing, vol 787. Springer, Cham. https://doi.org/10.1007/978-3-319-94229-2_28
