Abstract
This work examines how similarity metrics behave on sparse data in recommender systems (RS). Clustering is an important technique in RS for forming groups of users or items in order to personalize and optimize recommendations. Most clustering techniques try to minimize the Euclidean distance between the samples and their centroid, but this approach has a drawback on sparse data because it treats a missing value as zero. We present a comparative analysis of similarity metrics, namely Pearson Correlation, Jaccard, Mean Square Difference, Jaccard Mean Square Difference, and Mean Jaccard Difference, as alternatives to Euclidean distance. We report results on the FilmTrust and MovieLens 100K datasets, both of which are free, public, and highly sparse. We show that, on sparse data, using similarity measures improves accuracy as measured by Mean Absolute Error and Within-Cluster.
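The drawback described above can be illustrated with a small sketch. It is not the chapter's implementation: the formulas follow the commonly used definitions of Jaccard, Mean Square Difference (MSD), and Jaccard Mean Square Difference (JMSD = Jaccard × (1 − MSD)), which are assumed here rather than taken from the text. The rating bounds `r_min` and `r_max`, the function names, and the three toy users are hypothetical.

```python
import math

def jaccard(u, v):
    """Items rated by both users divided by items rated by either user."""
    either = u.keys() | v.keys()
    return len(u.keys() & v.keys()) / len(either) if either else 0.0

def msd(u, v, r_min=1.0, r_max=5.0):
    """Mean squared difference of normalized ratings on co-rated items only.

    Returns None when the users share no rated item (assumed convention).
    """
    common = u.keys() & v.keys()
    if not common:
        return None
    span = r_max - r_min
    return sum(((u[i] - v[i]) / span) ** 2 for i in common) / len(common)

def jmsd(u, v, r_min=1.0, r_max=5.0):
    """Assumed definition: Jaccard * (1 - MSD); 0 when nothing is co-rated."""
    d = msd(u, v, r_min, r_max)
    return 0.0 if d is None else jaccard(u, v) * (1.0 - d)

def euclidean_zero_filled(u, v):
    """Euclidean distance when every missing rating is replaced by zero."""
    items = u.keys() | v.keys()
    return math.sqrt(sum((u.get(i, 0.0) - v.get(i, 0.0)) ** 2 for i in items))

# Hypothetical sparse rating profiles (item id -> rating on a 1..5 scale).
alice = {"m1": 5.0, "m2": 4.0}
bob = {"m1": 5.0, "m2": 4.0, "m3": 5.0, "m4": 5.0}  # agrees with alice on co-rated items
carol = {"m5": 1.0, "m6": 1.0}                      # shares no item with alice

for name, other in (("bob", bob), ("carol", carol)):
    print(name, "euclidean:", euclidean_zero_filled(alice, other),
          "jmsd:", jmsd(alice, other))
```

In this toy case the zero-filled Euclidean distance places alice closer to carol, with whom she shares no rated item, than to bob, who matches her on every co-rated item. JMSD ranks the pairs the other way round, because it only compares ratings that both users actually gave.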
© 2019 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Bojorque, R., Hurtado, R., Inga, A. (2019). A Comparative Analysis of Similarity Metrics on Sparse Data for Clustering in Recommender Systems. In: Ahram, T. (eds) Advances in Artificial Intelligence, Software and Systems Engineering. AHFE 2018. Advances in Intelligent Systems and Computing, vol 787. Springer, Cham. https://doi.org/10.1007/978-3-319-94229-2_28
DOI: https://doi.org/10.1007/978-3-319-94229-2_28
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-94228-5
Online ISBN: 978-3-319-94229-2
eBook Packages: Intelligent Technologies and Robotics (R0)