Abstract
Tagging recommender systems give users the freedom to explore tags and obtain recommendations. Releasing and sharing tagging datasets would accelerate both commercial and research work on recommender systems. However, releasing the original datasets raises serious privacy concerns, because an adversary with only a little background information may re-identify a user, and that user's sensitive information, from a tagging dataset. Several privacy techniques have recently been proposed to address the problem, but most of them lack a strict privacy notion and rarely prevent individuals from being re-identified. This paper proposes a privacy-preserving tag release algorithm, PriTop. The algorithm is designed to satisfy differential privacy, a strict privacy notion, with the goal of protecting users in a tagging dataset. PriTop comprises three privacy-preserving operations: private topic model generation structures the uncontrolled tags; private weight perturbation adds Laplace noise to the weights to hide the numbers of tags; and private tag selection finds the most suitable replacement tags for the original tags, so that the exact tags are hidden. We present extensive experimental results on four real-world datasets: Delicious, MovieLens, Last.fm and BibSonomy. The recommendation algorithm is successful in all cases, and our results further suggest that PriTop can retain the utility of the datasets while preserving privacy.
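As an illustration of the weight perturbation step, the sketch below adds Laplace noise with scale sensitivity/epsilon to a set of per-topic weights, which is the standard Laplace mechanism. It is a minimal sketch under stated assumptions and not the paper's implementation: the function names, the example weights, and the default sensitivity of 1.0 are all hypothetical, since the abstract does not specify them.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from Laplace(0, scale).

    The difference of two independent exponential variables with
    mean `scale` follows a Laplace distribution with that scale.
    """
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def perturb_weights(weights, epsilon, sensitivity=1.0, rng=None):
    """Return a copy of `weights` with Laplace noise added to each value.

    `sensitivity` is an assumed bound on how much one user's data can
    change a single weight; the abstract does not state this value.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    scale = sensitivity / epsilon  # smaller epsilon means more noise
    return {topic: w + laplace_noise(scale, rng) for topic, w in weights.items()}


if __name__ == "__main__":
    # Hypothetical per-topic tag counts for one user.
    topic_weights = {"music": 12.0, "travel": 3.0, "health": 1.0}
    print(perturb_weights(topic_weights, epsilon=1.0, rng=random.Random(0)))
```

A smaller epsilon gives a larger noise scale and therefore stronger privacy at the cost of utility. Noisy weights can come out negative; whether to clamp them is a post-processing choice that this sketch leaves open.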
Additional information
This manuscript is an extended version of a PAKDD best student paper.
Cite this article
Zhu, T., Li, G., Zhou, W. et al. Privacy-preserving topic model for tagging recommender systems. Knowl Inf Syst 46, 33–58 (2016). https://doi.org/10.1007/s10115-015-0832-9
DOI: https://doi.org/10.1007/s10115-015-0832-9