
A differential privacy framework for matrix factorization recommender systems

Published in User Modeling and User-Adapted Interaction

Abstract

Recommender systems rely on personal information about user behavior to generate recommendations. Thus, they inherently have the potential to hamper user privacy and disclose sensitive information. Several works have studied how neighborhood-based recommendation methods can incorporate user privacy protection. However, privacy-preserving latent factor models, in particular those represented by matrix factorization techniques, the state of the art in recommender systems, have received little attention. In this paper, we address the problem of privacy-preserving matrix factorization by utilizing differential privacy, a rigorous and provable approach to privacy in statistical databases. We propose a generic framework and evaluate several ways in which differential privacy can be applied to matrix factorization. In doing so, we specifically address the privacy-accuracy trade-off offered by each of the algorithms. We show that, of all the algorithms considered, input perturbation results in the best recommendation accuracy, while guaranteeing a solid level of privacy protection against attacks that aim to gain knowledge about either specific user ratings or even the existence of these ratings. Our analysis additionally highlights the system aspects that should be addressed when applying differential privacy in practice and when considering potential privacy-preserving solutions.
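The input-perturbation approach highlighted above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the function name, the clamping step, and the fixed seed are assumptions. Each observed rating is perturbed with Laplace noise calibrated to the rating range and the privacy budget epsilon before any factorization takes place.

```python
import numpy as np

def perturb_ratings(ratings, epsilon, r_min=1.0, r_max=5.0, seed=0):
    """Input perturbation: add Laplace noise to each observed rating.

    The sensitivity of a single rating is the rating range (r_max - r_min),
    so noise is drawn from Laplace(0, (r_max - r_min) / epsilon).
    Noisy values are clamped back into the valid rating range.
    """
    rng = np.random.default_rng(seed)
    scale = (r_max - r_min) / epsilon
    noisy = ratings + rng.laplace(0.0, scale, size=ratings.shape)
    return np.clip(noisy, r_min, r_max)

# Example: perturb a small vector of 1-5 star ratings with epsilon = 1.
ratings = np.array([4.0, 2.0, 5.0, 3.0])
noisy = perturb_ratings(ratings, epsilon=1.0)
```

The factorization itself then runs unchanged on the noisy matrix, which is what makes this variant attractive: the privacy mechanism is decoupled from the learning algorithm.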



Notes

  1. http://grouplens.org/datasets/movielens/.

  2. http://www.netflixprize.com/.

3. We used Matlab (specifically, the crossvalind function) to split the data.

  4. Results obtained for the MovieLens-100K dataset exhibit a similar trend and are not shown, but only summarized in Table 4.

  5. Due to technical limitations (computational time and memory requirements), the experiments with the input perturbation approach could not be conducted on the MovieLens-10M and Netflix datasets. Therefore, ISGD results are not shown in Table 5 and ISGD curves are missing from Fig. 4b, c.

6. We approached the authors, but the differentially private implementation of the kNN algorithm outlined in McSherry and Mironov (2009) was unfortunately not publicly available, so we were unable to reproduce the exact results reported therein.

7. Due to memory limitations, the kNN implementation was not feasible for the MovieLens-10M dataset.

References

  • Berkovsky, S., Eytani, Y., Kuflik, T., Ricci, F.: Hierarchical neighborhood topology for privacy enhanced collaborative filtering. In: Proceedings of Workshop on Privacy-Enhanced Personalization, PEP 2006, Montreal, Canada, pp. 6–13 (2006)

  • Berkovsky, S., Kuflik, T., Ricci, F.: The impact of data obfuscation on the accuracy of collaborative filtering. Expert Systems with Applications 39(5), 5033–5042 (2012)


  • Berlioz, A., Friedman, A., Kâafar, M.A., Boreli, R., Berkovsky, S.: Applying differential privacy to matrix factorization. In: Proceedings of the 9th ACM Conference on Recommender Systems, RecSys 2015, Vienna, Austria, pp. 107–114 (2015). doi:10.1145/2792838.2800173

  • Bhaskar, R., Laxman, S., Smith, A.D., Thakurta, A.: Discovering frequent patterns in sensitive data. In: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2010, Washington, DC, USA, pp. 503–512 (2010). doi:10.1145/1835804.1835869

  • Bilge, A., Gunes, I., Polat, H.: Robustness analysis of privacy-preserving model-based recommendation schemes. Expert Systems with Applications 41(8), 3671–3681 (2014)


  • Calandrino, J.A., Kilzer, A., Narayanan, A., Felten, E.W., Shmatikov, V.: “You Might Also Like”: Privacy risks of collaborative filtering. In: Proceedings of the 32nd IEEE Symposium on Security and Privacy, S&P 2011, Berkeley, CA, USA, pp. 231–246 (2011). doi:10.1109/SP.2011.40

  • Canny, J.F.: Collaborative filtering with privacy. In: Proceedings of the 23rd IEEE Symposium on Security and Privacy, S&P 2002, Berkeley, CA, USA, pp. 45–57 (2002). doi:10.1109/SECPRI.2002.1004361

  • Chaudhuri, K., Monteleoni, C., Sarwate, A.D.: Differentially private empirical risk minimization. Journal of Machine Learning Research 12, 1069–1109 (2011)


  • Cheng, Z., Hurley, N.: Trading robustness for privacy in decentralized recommender systems. In: Proceedings of the 21st Conference on Innovative Applications of Artificial Intelligence, IAAI 2009, Pasadena, CA, USA (2009)

  • Dwork, C.: Differential privacy: A survey of results. In: Proceedings of the 5th International Conference on Theory and Applications of Models of Computation, TAMC 2008, Xi’an, China, pp. 1–19 (2008). doi:10.1007/978-3-540-79228-4_1

  • Dwork, C., McSherry, F., Nissim, K., Smith, A.: Differential privacy – a primer for the perplexed. In: Joint UNECE/Eurostat work session on statistical data confidentiality. Tarragona, Spain (2011)

  • Dwork, C., McSherry, F., Nissim, K., Smith, A.D.: Calibrating noise to sensitivity in private data analysis. In: Proceedings of the 3rd Theory of Cryptography Conference, TCC 2006, New York, NY, USA, pp. 265–284 (2006). doi:10.1007/11681878_14

  • Erlingsson, Ú., Pihur, V., Korolova, A.: RAPPOR: Randomized aggregatable privacy-preserving ordinal response. In: Proceedings of the ACM SIGSAC Conference on Computer and Communications Security, CCS 2014, Scottsdale, AZ, USA, pp. 1054–1067 (2014). doi:10.1145/2660267.2660348

  • Friedman, A., Knijnenburg, B., Vanhecke, K., Martens, L., Berkovsky, S.: Privacy aspects of recommender systems. In: Ricci, F., Rokach, L., Shapira, B. (eds.) Recommender Systems Handbook, pp. 649–688. Springer (2015)

  • Friedman, A., Schuster, A.: Data mining with differential privacy. In: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2010, Washington, DC, USA, pp. 493–502 (2010). doi:10.1145/1835804.1835868

  • Hardt, M., Talwar, K.: On the geometry of differential privacy. In: Proceedings of the 42nd ACM Symposium on Theory of Computing, STOC 2010, Cambridge, MA, USA, pp. 705–714 (2010). doi:10.1145/1806689.1806786

  • Harper, F.M., Konstan, J.A.: The movielens datasets: History and context. ACM Transactions on Interactive Intelligent Systems 5(4), 19 (2016)


  • Hay, M., Machanavajjhala, A., Miklau, G., Chen, Y., Zhang, D.: Principled evaluation of differentially private algorithms using DPBench. In: Proceedings of the International Conference on Management of Data, SIGMOD 2016, San Francisco, CA, USA, pp. 139–154 (2016). doi:10.1145/2882903.2882931

  • Jeckmans, A.J., Beye, M., Erkin, Z., Hartel, P., Lagendijk, R.L., Tang, Q.: Privacy in recommender systems. In: Ramzan, N., van Zwol, R., Lee, J.S., Clüver, K., Hua, X.S. (eds.) Social Media Retrieval, pp. 263–281. Springer (2013)

  • Kifer, D., Machanavajjhala, A.: No free lunch in data privacy. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2011, Athens, Greece, 2011, pp. 193–204 (2011). doi:10.1145/1989323.1989345

  • Klösgen, W.: Anonymization techniques for knowledge discovery in databases. In: Proceedings of the 1st International Conference on Knowledge Discovery and Data Mining, KDD 1995, Montreal, Canada, pp. 186–191 (1995)

  • Kobsa, A.: Privacy-enhanced web personalization. In: Brusilovsky, P., Kobsa, A., Nejdl, W. (eds.) The Adaptive Web, pp. 628–670. Springer (2007)

  • Koren, Y., Bell, R.: Advances in collaborative filtering. In: Ricci, F., Rokach, L., Shapira, B. (eds.) Recommender Systems Handbook, pp. 77–118. Springer (2015)

  • Koren, Y., Bell, R., Volinsky, C.: Matrix factorization techniques for recommender systems. Computer 42(8), 30–37 (2009)


  • Kosinski, M., Stillwell, D., Graepel, T.: Private traits and attributes are predictable from digital records of human behavior. Proceedings of the National Academy of Sciences 110(15), 5802–5805 (2013)


  • Lam, S.K., Frankowski, D., Riedl, J.: Do you trust your recommendations? An exploration of security and privacy issues in recommender systems. In: Proceedings of the International Conference on Emerging Trends in Information and Communication Security, ETRICS 2006, Freiburg, Germany, pp. 14–29 (2006). doi:10.1007/11766155_2

  • Li, T., Unger, T.: Willing to pay for quality personalization? Trade-off between quality and privacy. European Journal of Information Systems 21(6), 621–642 (2012)


  • Machanavajjhala, A., Korolova, A., Sarma, A.D.: Personalized social recommendations - accurate or private? Proceedings of the VLDB Endowment 4(7), 440–450 (2011)


  • McSherry, F., Mironov, I.: Differentially private recommender systems: Building privacy into the Netflix Prize contenders. In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2009, pp. 627–636 (2009). doi:10.1145/1557019.1557090

  • Narayanan, A., Shmatikov, V.: Robust de-anonymization of large sparse datasets. In: Proceedings of the 29th IEEE Symposium on Security and Privacy, (S&P 2008), Oakland, CA, USA, pp. 111–125 (2008). doi:10.1109/SP.2008.33

  • Netflix spilled your Brokeback Mountain secret. http://www.wired.com/threatlevel/2009/12/netflix-privacy-lawsuit/. Accessed: July 2016

  • Nikolaenko, V., Ioannidis, S., Weinsberg, U., Joye, M., Taft, N., Boneh, D.: Privacy-preserving matrix factorization. In: Proceedings of the ACM SIGSAC Conference on Computer and Communications Security, CCS 2013, Berlin, Germany, pp. 801–812 (2013). doi:10.1145/2508859.2516751

  • Ning, X., Desrosiers, C., Karypis, G.: A comprehensive survey of neighborhood-based recommendation methods. In: Ricci, F., Rokach, L., Shapira, B. (eds.) Recommender Systems Handbook, pp. 37–76. Springer (2015)

  • Parameswaran, R., Blough, D.M.: Privacy preserving collaborative filtering using data obfuscation. In: Proceedings of the IEEE International Conference on Granular Computing, GrC 2007, San Jose, CA, USA, pp. 380–386 (2007). doi:10.1109/GRC.2007.129

  • Polat, H., Du, W.: Achieving private recommendations using randomized response techniques. In: Proceedings of the 10th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2006, Singapore, pp. 637–646 (2006). doi:10.1007/11731139_73

  • Ricci, F., Rokach, L., Shapira, B. (eds.): Recommender Systems Handbook, 2nd edn. Springer (2015)

  • Said, A., Berkovsky, S., De Luca, E.W., Hermanns, J.: Challenge on context-aware movie recommendation: Camra2011. In: Proceedings of the ACM Conference on Recommender Systems, RecSys 2011, Chicago, IL, USA, pp. 385–386 (2011). doi:10.1145/2043932.2044015

  • Sandhu, R.S., Coyne, E.J., Feinstein, H.L., Youman, C.E.: Role-based access control models. IEEE Computers 29(2), 38–47 (1996)


  • Sarwar, B.M., Karypis, G., Konstan, J.A., Riedl, J.: Analysis of recommendation algorithms for e-commerce. In: Proceedings of the ACM Conference on Electronic Commerce, Minneapolis, MN, USA, pp. 158–167 (2000). doi:10.1145/352871.352887

  • Sun, X., Kashima, H., Matsuzaki, T., Ueda, N.: Averaged stochastic gradient descent with feedback: An accurate, robust, and fast training method. In: Proceedings of the 10th IEEE International Conference on Data Mining, ICDM 2010, Sydney, Australia, pp. 1067–1072 (2010). doi:10.1109/ICDM.2010.26

  • Sweeney, L.: \(k\)-anonymity: A model for protecting privacy. International Journal on Uncertainty, Fuzziness and Knowledge-based Systems 10(5), 557–570 (2002)


  • Vallet, D., Friedman, A., Berkovsky, S.: Matrix factorization without user data retention. In: Proceedings of the 18th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2014, Tainan, Taiwan, pp. 569–580 (2014). doi:10.1007/978-3-319-06608-0_47

  • Weinsberg, U., Bhagat, S., Ioannidis, S., Taft, N.: BlurMe: Inferring and obfuscating user gender based on ratings. In: Proceedings of the 6th ACM Conference on Recommender Systems, RecSys 2012, Dublin, Ireland, pp. 195–202 (2012). doi:10.1145/2365952.2365989

  • Zhou, Y., Wilkinson, D.M., Schreiber, R., Pan, R.: Large-scale parallel collaborative filtering for the Netflix Prize. In: Proceedings of the 4th International Conference on Algorithmic Aspects in Information and Management, AAIM 2008, Shanghai, China, pp. 337–348 (2008). doi:10.1007/978-3-540-68880-8_32


Author information

Corresponding author

Correspondence to Shlomo Berkovsky.

Appendix: Parameterisation

Here we briefly describe the parameterization of the differentially private algorithms. We detail the results obtained for the MovieLens-100K dataset and the bounded differential privacy case; the same methodology was also applied to the other datasets.

The goal of the parameterization was to set the most appropriate values of the MF and privacy parameters. To start with, the learning rate was set to \(\gamma =0.01\), as in other MF implementations (Koren and Bell 2015). Next, we optimized the regularization parameter \(\lambda \) and the number of latent factors d. For this, we defined a fixed test set of ratings and repeated the MF predictions for various combinations of values of \(\lambda \) and d. These combinations covered the exhaustive set of pairs within the ranges \(\lambda \in [0.01,0.15]\) and \(d \in [1,25]\). For each combination of the parameters, the RMSE of the predictions for the same test set was computed. Since a 3D plot of the RMSE is hard to read, Fig. 6a, b shows its 2D projections for fixed values of \(\lambda \) and d. The best-performing combination of \(\lambda =0.08\) and \(d=3\) was used in the unbounded experiments with the MovieLens-100K dataset.
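The exhaustive sweep over lambda and d can be sketched as below. This is a minimal illustration under assumed details, not the paper's Matlab code: the SGD trainer, function names, initialization, and the tiny example data are all illustrative; only the idea (train once per parameter pair, score each on a fixed test set, keep the pair with lowest RMSE) follows the text.

```python
import numpy as np

def train_mf(train, d, lam, gamma=0.01, iters=10, seed=0):
    """Plain SGD matrix factorization; `train` is a list of (user, item, rating)."""
    rng = np.random.default_rng(seed)
    n_users = max(u for u, _, _ in train) + 1
    n_items = max(i for _, i, _ in train) + 1
    P = rng.normal(0.0, 0.1, (n_users, d))   # user latent factors
    Q = rng.normal(0.0, 0.1, (n_items, d))   # item latent factors
    for _ in range(iters):
        for u, i, r in train:
            err = r - P[u] @ Q[i]
            P[u] += gamma * (err * Q[i] - lam * P[u])
            Q[i] += gamma * (err * P[u] - lam * Q[i])
    return P, Q

def rmse(test, P, Q):
    """Root mean squared error of the factor model on held-out ratings."""
    return float(np.sqrt(np.mean([(r - P[u] @ Q[i]) ** 2 for u, i, r in test])))

def grid_search(train, test, lams, ds):
    """Exhaustive sweep; returns (best RMSE, best lambda, best d)."""
    return min((rmse(test, *train_mf(train, d, lam)), lam, d)
               for lam in lams for d in ds)

# Toy example with made-up ratings; the paper sweeps
# lambda in [0.01, 0.15] and d in [1, 25] on MovieLens-100K.
train = [(0, 0, 4.0), (0, 1, 2.0), (1, 0, 5.0), (1, 1, 1.0), (2, 0, 3.0)]
test = [(2, 1, 2.0)]
best_rmse, best_lam, best_d = grid_search(train, test, [0.01, 0.08], [1, 2, 3])
```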

Having set the parameters \(\lambda \) and d, we turned to the number of SGD/ALS iterations, k. For this, we gradually increased the number of iterations from \(k=1\) to \(k=15\) and, for each value of k, computed the RMSE obtained for the fixed test set. The results of this experiment are shown in Fig. 6c. As expected, the RMSE values stabilise beyond a certain value of k; in this case, the RMSE is reasonably stable after \(k=7\), so we set the number of iterations to \(k=10\).
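The iteration sweep can be expressed as recording the test RMSE after every SGD epoch and picking the smallest k past which the curve stays flat. Again this is a sketch under assumptions: the trainer, the function names, the flatness tolerance, and the toy data are illustrative, not taken from the paper.

```python
import numpy as np

def rmse_curve(train, test, d=3, lam=0.08, gamma=0.01, k_max=15, seed=0):
    """Run SGD factorization for k_max epochs, recording test RMSE after each."""
    rng = np.random.default_rng(seed)
    ratings = train + test
    n_users = max(u for u, _, _ in ratings) + 1
    n_items = max(i for _, i, _ in ratings) + 1
    P = rng.normal(0.0, 0.1, (n_users, d))
    Q = rng.normal(0.0, 0.1, (n_items, d))
    curve = []
    for _ in range(k_max):
        for u, i, r in train:
            err = r - P[u] @ Q[i]
            P[u] += gamma * (err * Q[i] - lam * P[u])
            Q[i] += gamma * (err * P[u] - lam * Q[i])
        curve.append(float(np.sqrt(np.mean(
            [(r - P[u] @ Q[i]) ** 2 for u, i, r in test]))))
    return curve

def pick_k(curve, tol=0.01):
    """Smallest epoch count after which successive RMSE changes stay below tol."""
    for k in range(1, len(curve)):
        if all(abs(curve[j] - curve[j - 1]) < tol for j in range(k, len(curve))):
            return k
    return len(curve)

# Toy data; the paper uses the MovieLens-100K test split and k in [1, 15].
train = [(0, 0, 4.0), (0, 1, 2.0), (1, 0, 5.0), (1, 1, 1.0), (2, 0, 3.0)]
test = [(2, 1, 2.0)]
curve = rmse_curve(train, test, d=2, k_max=15)
k = pick_k(curve)
```

Choosing a k slightly above the stabilisation point (10 rather than 7 in the appendix) gives some slack without wasting much computation.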

Cite this article

Friedman, A., Berkovsky, S. & Kaafar, M.A. A differential privacy framework for matrix factorization recommender systems. User Model User-Adap Inter 26, 425–458 (2016). https://doi.org/10.1007/s11257-016-9177-7
