A Comparative Analysis of Similarity Metrics on Sparse Data for Clustering in Recommender Systems

Bojorque, Rodolfo; Hurtado, Remigio; Inga, Andrés

doi:10.1007/978-3-319-94229-2_28

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 787)

Included in the following conference series:

International Conference on Applied Human Factors and Ergonomics

Abstract

This work examines how similarity metrics behave on sparse data in recommender systems (RS). Clustering is an important RS technique for forming groups of users or items in order to personalize and optimize recommendations. Most clustering techniques try to minimize the Euclidean distance between the samples and their centroid, but this has a drawback on sparse data because a missing value is treated as zero. We present a comparative analysis of similarity metrics (Pearson Correlation, Jaccard, Mean Square Difference, Jaccard Mean Square Difference and Mean Jaccard Difference) as alternatives to Euclidean distance. We report results for the FilmTrust and MovieLens 100K datasets, both of which are free, public and highly sparse. We show that using similarity measures gives better accuracy on sparse data in terms of Mean Absolute Error and Within-Cluster.
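
The drawback of zero-filled Euclidean distance can be illustrated with a minimal sketch. The formulas below are the definitions of Jaccard, Mean Square Difference (MSD) and Jaccard Mean Square Difference (JMSD) commonly used in the collaborative filtering literature; the chapter's exact formulations are not given in this abstract, so they should be read as assumptions. The function names, the toy rating vectors and the 1 to 5 rating scale are likewise hypothetical, and Mean Jaccard Difference is omitted because its definition is not stated here.

```python
from math import sqrt
from typing import Optional, Sequence

# A user's ratings over a fixed item list; None marks an unrated item.
Ratings = Sequence[Optional[float]]


def euclidean_zero_filled(u: Ratings, v: Ratings) -> float:
    """Euclidean distance where every missing rating is replaced by 0."""
    return sqrt(sum(
        ((0.0 if a is None else a) - (0.0 if b is None else b)) ** 2
        for a, b in zip(u, v)
    ))


def jaccard(u: Ratings, v: Ratings) -> float:
    """Share of items rated by both users among items rated by either."""
    both = sum(1 for a, b in zip(u, v) if a is not None and b is not None)
    either = sum(1 for a, b in zip(u, v) if a is not None or b is not None)
    return both / either if either else 0.0


def msd_similarity(u: Ratings, v: Ratings,
                   r_min: float = 1.0, r_max: float = 5.0) -> float:
    """1 minus the mean squared normalized difference over co-rated items.

    The 1..5 rating scale is an assumption; adjust r_min and r_max per dataset.
    """
    diffs = [((a - b) / (r_max - r_min)) ** 2
             for a, b in zip(u, v) if a is not None and b is not None]
    return 1.0 - sum(diffs) / len(diffs) if diffs else 0.0


def jmsd(u: Ratings, v: Ratings,
         r_min: float = 1.0, r_max: float = 5.0) -> float:
    """Jaccard weighted by MSD similarity."""
    return jaccard(u, v) * msd_similarity(u, v, r_min, r_max)


# Toy users: b agrees with a on the co-rated items, c disagrees with a.
a = [5, 4, None, None, None, None]
b = [5, 4, 5, None, None, None]
c = [1, 2, None, None, None, None]

for name, other in (("b", b), ("c", c)):
    print(name,
          "euclidean:", round(euclidean_zero_filled(a, other), 3),
          "jmsd:", round(jmsd(a, other), 3))
```

In this toy case the zero-filled Euclidean distance places the disagreeing user c closer to a than the agreeing user b, only because b rated one extra item. JMSD, which looks only at co-rated items and the overlap of rated items, ranks b as the more similar user.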

Author information

Authors and Affiliations

Universidad Politécnica Salesiana, Cuenca, Ecuador
Rodolfo Bojorque, Remigio Hurtado & Andrés Inga
Universidad Politécnica de Madrid, Madrid, Spain
Rodolfo Bojorque & Remigio Hurtado

Corresponding author

Correspondence to Rodolfo Bojorque.

Editor information

Editors and Affiliations

University of Central Florida, Orlando, Florida, USA
Tareq Z. Ahram

About this paper

Cite this paper

Bojorque, R., Hurtado, R., Inga, A. (2019). A Comparative Analysis of Similarity Metrics on Sparse Data for Clustering in Recommender Systems. In: Ahram, T. (eds) Advances in Artificial Intelligence, Software and Systems Engineering. AHFE 2018. Advances in Intelligent Systems and Computing, vol 787. Springer, Cham. https://doi.org/10.1007/978-3-319-94229-2_28

DOI: https://doi.org/10.1007/978-3-319-94229-2_28
Published: 29 June 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-94228-5
Online ISBN: 978-3-319-94229-2
eBook Packages: Intelligent Technologies and Robotics (R0)
