A differential privacy framework for matrix factorization recommender systems

Abstract

Recommender systems rely on personal information about user behavior to generate recommendations. Thus, they inherently have the potential to hamper user privacy and disclose sensitive information. Several works have studied how neighborhood-based recommendation methods can incorporate user privacy protection. However, privacy-preserving latent factor models, in particular those represented by matrix factorization techniques, the state of the art in recommender systems, have received little attention. In this paper, we address the problem of privacy-preserving matrix factorization by utilizing differential privacy, a rigorous and provable approach to privacy in statistical databases. We propose a generic framework and evaluate several ways in which differential privacy can be applied to matrix factorization. In doing so, we specifically address the privacy-accuracy trade-off offered by each of the algorithms. We show that, of all the algorithms considered, input perturbation results in the best recommendation accuracy, while guaranteeing a solid level of privacy protection against attacks that aim to gain knowledge about either specific user ratings or even the existence of these ratings. Our analysis additionally highlights the system aspects that should be addressed when applying differential privacy in practice, and when considering potential privacy-preserving solutions.
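As a rough illustration of the input perturbation idea (not the paper's exact mechanism; the sensitivity calibration and post-processing here are simplifying assumptions), each observed rating can be perturbed with Laplace noise before factorization:

```python
import numpy as np

def perturb_ratings(ratings, epsilon, r_min=1.0, r_max=5.0, seed=0):
    """Add Laplace noise scaled to the rating range (the sensitivity when a
    single rating may change), then clamp back into the valid range.
    Illustrative sketch only; the paper's mechanism may differ."""
    rng = np.random.default_rng(seed)
    scale = (r_max - r_min) / epsilon
    noisy = ratings + rng.laplace(0.0, scale, size=ratings.shape)
    return np.clip(noisy, r_min, r_max)

ratings = np.array([4.0, 3.0, 5.0, 2.0])
private_ratings = perturb_ratings(ratings, epsilon=1.0)
```

The factorization then runs on `private_ratings` unchanged, which is what makes input perturbation attractive: the privacy cost is paid once, up front, independently of the learning algorithm.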


Notes

  1. http://grouplens.org/datasets/movielens/.

  2. http://www.netflixprize.com/.

  3. We used Matlab, specifically the crossvalind function, to split the data.

  4. Results obtained for the MovieLens-100K dataset exhibit a similar trend and are not shown; they are summarized in Table 4.

  5. Due to technical limitations (computational time and memory requirements), the experiments with the input perturbation approach could not be conducted on the MovieLens-10M and Netflix datasets. Therefore, ISGD results are not shown in Table 5 and ISGD curves are missing from Fig. 4b, c.

  6. We approached the authors, but the differentially private implementation of the kNN algorithm outlined in McSherry and Mironov (2009) was unfortunately not publicly available, so we were not able to reproduce the exact results reported therein.

  7. Due to memory limitations, a kNN implementation for the MovieLens-10M dataset was not feasible.

References

  1. Berkovsky, S., Eytani, Y., Kuflik, T., Ricci, F.: Hierarchical neighborhood topology for privacy enhanced collaborative filtering. In: Proceedings of Workshop on Privacy-Enhanced Personalization, PEP 2006, Montreal, Canada, pp. 6–13 (2006)

  2. Berkovsky, S., Kuflik, T., Ricci, F.: The impact of data obfuscation on the accuracy of collaborative filtering. Expert Systems with Applications 39(5), 5033–5042 (2012)

  3. Berlioz, A., Friedman, A., Kâafar, M.A., Boreli, R., Berkovsky, S.: Applying differential privacy to matrix factorization. In: Proceedings of the 9th ACM Conference on Recommender Systems, RecSys 2015, Vienna, Austria, pp. 107–114 (2015). doi:10.1145/2792838.2800173

  4. Bhaskar, R., Laxman, S., Smith, A.D., Thakurta, A.: Discovering frequent patterns in sensitive data. In: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2010, Washington, DC, USA, pp. 503–512 (2010). doi:10.1145/1835804.1835869

  5. Bilge, A., Gunes, I., Polat, H.: Robustness analysis of privacy-preserving model-based recommendation schemes. Expert Systems With Applications 41(8), 3671–3681 (2014)

  6. Calandrino, J.A., Kilzer, A., Narayanan, A., Felten, E.W., Shmatikov, V.: “You Might Also Like”: Privacy risks of collaborative filtering. In: Proceedings of the 32nd IEEE Symposium on Security and Privacy, S&P 2011, Berkeley, CA, USA, pp. 231–246 (2011). doi:10.1109/SP.2011.40

  7. Canny, J.F.: Collaborative filtering with privacy. In: Proceedings of the 23rd IEEE Symposium on Security and Privacy, S&P 2002, Berkeley, CA, USA, pp. 45–57 (2002). doi:10.1109/SECPRI.2002.1004361

  8. Chaudhuri, K., Monteleoni, C., Sarwate, A.D.: Differentially private empirical risk minimization. Journal of Machine Learning Research 12, 1069–1109 (2011)

  9. Cheng, Z., Hurley, N.: Trading robustness for privacy in decentralized recommender systems. In: Proceedings of the 21st Conference on Innovative Applications of Artificial Intelligence, IAAI 2009, Pasadena, CA, USA (2009)

  10. Dwork, C.: Differential privacy: A survey of results. In: Proceedings of the 5th International Conference on Theory and Applications of Models of Computation, TAMC 2008, Xi’an, China, pp. 1–19 (2008). doi:10.1007/978-3-540-79228-4_1

  11. Dwork, C., McSherry, F., Nissim, K., Smith, A.: Differential privacy – a primer for the perplexed. In: Joint UNECE/Eurostat work session on statistical data confidentiality. Tarragona, Spain (2011)

  12. Dwork, C., McSherry, F., Nissim, K., Smith, A.D.: Calibrating noise to sensitivity in private data analysis. In: Proceedings of the 3rd Theory of Cryptography Conference, TCC 2006, New York, NY, USA, pp. 265–284 (2006). doi:10.1007/11681878_14

  13. Erlingsson, Ú., Pihur, V., Korolova, A.: RAPPOR: Randomized aggregatable privacy-preserving ordinal response. In: Proceedings of the ACM SIGSAC Conference on Computer and Communications Security, CCS 2014, Scottsdale, AZ, USA, pp. 1054–1067 (2014). doi:10.1145/2660267.2660348

  14. Friedman, A., Knijnenburg, B., Vanhecke, K., Martens, L., Berkovsky, S.: Privacy aspects of recommender systems. In: Ricci, F., Rokach, L., Shapira, B. (eds.) Recommender Systems Handbook, pp. 649–688. Springer (2015)

  15. Friedman, A., Schuster, A.: Data mining with differential privacy. In: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2010, Washington, DC, USA, pp. 493–502 (2010). doi:10.1145/1835804.1835868

  16. Hardt, M., Talwar, K.: On the geometry of differential privacy. In: Proceedings of the 42nd ACM Symposium on Theory of Computing, STOC 2010, Cambridge, MA, USA, pp. 705–714 (2010). doi:10.1145/1806689.1806786

  17. Harper, F.M., Konstan, J.A.: The MovieLens datasets: History and context. ACM Transactions on Interactive Intelligent Systems 5(4), 19 (2016)

  18. Hay, M., Machanavajjhala, A., Miklau, G., Chen, Y., Zhang, D.: Principled evaluation of differentially private algorithms using DPBench. In: Proceedings of the International Conference on Management of Data, SIGMOD 2016, San Francisco, CA, USA, pp. 139–154 (2016). doi:10.1145/2882903.2882931

  19. Jeckmans, A.J., Beye, M., Erkin, Z., Hartel, P., Lagendijk, R.L., Tang, Q.: Privacy in recommender systems. In: Ramzan, N., van Zwol, R., Lee, J.S., Clüver, K., Hua, X.S. (eds.) Social Media Retrieval, pp. 263–281. Springer (2013)

  20. Kifer, D., Machanavajjhala, A.: No free lunch in data privacy. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2011, Athens, Greece, pp. 193–204 (2011). doi:10.1145/1989323.1989345

  21. Klösgen, W.: Anonymization techniques for knowledge discovery in databases. In: Proceedings of the 1st International Conference on Knowledge Discovery and Data Mining, KDD 1995, Montreal, Canada, pp. 186–191 (1995)

  22. Kobsa, A.: Privacy-enhanced web personalization. In: Brusilovsky, P., Kobsa, A., Nejdl, W. (eds.) The Adaptive Web, pp. 628–670. Springer (2007)

  23. Koren, Y., Bell, R.: Advances in collaborative filtering. In: Ricci, F., Rokach, L., Shapira, B. (eds.) Recommender Systems Handbook, pp. 77–118. Springer (2015)

  24. Koren, Y., Bell, R., Volinsky, C.: Matrix factorization techniques for recommender systems. Computer 42(8), 30–37 (2009)

  25. Kosinski, M., Stillwell, D., Graepel, T.: Private traits and attributes are predictable from digital records of human behavior. Proceedings of the National Academy of Sciences 110(15), 5802–5805 (2013)

  26. Lam, S.K., Frankowski, D., Riedl, J.: Do you trust your recommendations? An exploration of security and privacy issues in recommender systems. In: Proceedings of the International Conference on Emerging Trends in Information and Communication Security, ETRICS 2006, Freiburg, Germany, pp. 14–29 (2006). doi:10.1007/11766155_2

  27. Li, T., Unger, T.: Willing to pay for quality personalization? Trade-off between quality and privacy. European Journal of Information Systems 21(6), 621–642 (2012)

  28. Machanavajjhala, A., Korolova, A., Sarma, A.D.: Personalized social recommendations - accurate or private? Proceedings of the VLDB Endowment 4(7), 440–450 (2011)

  29. McSherry, F., Mironov, I.: Differentially private recommender systems: Building privacy into the Netflix Prize contenders. In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2009, Paris, France, pp. 627–636 (2009). doi:10.1145/1557019.1557090

  30. Narayanan, A., Shmatikov, V.: Robust de-anonymization of large sparse datasets. In: Proceedings of the 29th IEEE Symposium on Security and Privacy, (S&P 2008), Oakland, CA, USA, pp. 111–125 (2008). doi:10.1109/SP.2008.33

  31. Netflix spilled your Brokeback Mountain secret. http://www.wired.com/threatlevel/2009/12/netflix-privacy-lawsuit/. Accessed: July 2016

  32. Nikolaenko, V., Ioannidis, S., Weinsberg, U., Joye, M., Taft, N., Boneh, D.: Privacy-preserving matrix factorization. In: Proceedings of the ACM SIGSAC Conference on Computer and Communications Security, CCS 2013, Berlin, Germany, pp. 801–812 (2013). doi:10.1145/2508859.2516751

  33. Ning, X., Desrosiers, C., Karypis, G.: A comprehensive survey of neighborhood-based recommendation methods. In: Ricci, F., Rokach, L., Shapira, B. (eds.) Recommender Systems Handbook, pp. 37–76. Springer (2015)

  34. Parameswaran, R., Blough, D.M.: Privacy preserving collaborative filtering using data obfuscation. In: Proceedings of the IEEE International Conference on Granular Computing, GrC 2007, San Jose, CA, USA, pp. 380–386 (2007). doi:10.1109/GRC.2007.129

  35. Polat, H., Du, W.: Achieving private recommendations using randomized response techniques. In: Proceedings of the 10th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2006, Singapore, Singapore, pp. 637–646 (2006). doi:10.1007/11731139_73

  36. Ricci, F., Rokach, L., Shapira, B. (eds.): Recommender Systems Handbook, 2nd edn. Springer (2015)

  37. Said, A., Berkovsky, S., De Luca, E.W., Hermanns, J.: Challenge on context-aware movie recommendation: CAMRa2011. In: Proceedings of the ACM Conference on Recommender Systems, RecSys 2011, Chicago, IL, USA, pp. 385–386 (2011). doi:10.1145/2043932.2044015

  38. Sandhu, R.S., Coyne, E.J., Feinstein, H.L., Youman, C.E.: Role-based access control models. IEEE Computer 29(2), 38–47 (1996)

  39. Sarwar, B.M., Karypis, G., Konstan, J.A., Riedl, J.: Analysis of recommendation algorithms for e-commerce. In: Proceedings of the ACM Conference on Electronic Commerce, Minneapolis, MN, USA, pp. 158–167 (2000). doi:10.1145/352871.352887

  40. Sun, X., Kashima, H., Matsuzaki, T., Ueda, N.: Averaged stochastic gradient descent with feedback: An accurate, robust, and fast training method. In: Proceedings of the 10th IEEE International Conference on Data Mining, ICDM 2010, Sydney, Australia, pp. 1067–1072 (2010). doi:10.1109/ICDM.2010.26

  41. Sweeney, L.: \(k\)-anonymity: A model for protecting privacy. International Journal on Uncertainty, Fuzziness and Knowledge-based Systems 10(5), 557–570 (2002)

  42. Vallet, D., Friedman, A., Berkovsky, S.: Matrix factorization without user data retention. In: Proceedings of the 18th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2014, Tainan, Taiwan, pp. 569–580 (2014). doi:10.1007/978-3-319-06608-0_47

  43. Weinsberg, U., Bhagat, S., Ioannidis, S., Taft, N.: BlurMe: Inferring and obfuscating user gender based on ratings. In: Proceedings of the 6th ACM Conference on Recommender Systems, RecSys 2012, Dublin, Ireland, pp. 195–202 (2012). doi:10.1145/2365952.2365989

  44. Zhou, Y., Wilkinson, D.M., Schreiber, R., Pan, R.: Large-scale parallel collaborative filtering for the Netflix Prize. In: Proceedings of 4th International Conference on Algorithmic Aspects in Information and Management, AAIM 2008, Shanghai, China, pp. 337–348 (2008). doi:10.1007/978-3-540-68880-8_32

Author information


Correspondence to Shlomo Berkovsky.

Appendix: Parameterisation

Here we briefly describe the parameterization of the differentially private algorithms. We detail the results obtained for the MovieLens-100K dataset and the bounded differential privacy case; the same methodology was applied to the other datasets as well.

The goal of the parameterization was to set the most appropriate values of the MF and privacy parameters. To start with, the learning rate was set to \(\gamma =0.01\), as in other MF implementations (Koren and Bell 2015). Next, we optimized the regularization parameter \(\lambda \) and the number of latent factors d. For this, we defined a fixed test set of ratings and repeated the MF predictions for various combinations of \(\lambda \) and d, covering the exhaustive set of pairs in the ranges \(\lambda \in [0.01,0.15]\) and \(d \in [1,25]\). For each combination, we computed the RMSE of the predictions on the test set. Since a 3D plot of the RMSE is hard to read, Fig. 6a, b shows its 2D projections for fixed values of \(\lambda \) and d. The best performing combination, \(\lambda =0.08\) and \(d=3\), was used in the unbounded experiments with the MovieLens-100K dataset.
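As a minimal sketch of this grid search, the following uses toy data, a reduced value grid, and a plain SGD factorizer standing in for the paper's implementation (all assumptions, not the actual experimental code):

```python
import numpy as np

def sgd_mf(R, mask, d, lam, gamma=0.01, iters=30, seed=0):
    """Plain SGD matrix factorization fitted on the observed entries (mask True)."""
    rng = np.random.default_rng(seed)
    n, m = R.shape
    P = 0.1 * rng.standard_normal((n, d))
    Q = 0.1 * rng.standard_normal((m, d))
    obs = np.argwhere(mask)
    for _ in range(iters):
        for u, i in obs:
            err = R[u, i] - P[u] @ Q[i]
            pu = P[u].copy()  # use pre-update user factors for the item step
            P[u] += gamma * (err * Q[i] - lam * P[u])
            Q[i] += gamma * (err * pu - lam * Q[i])
    return P, Q

def rmse(R, mask, P, Q):
    pred = P @ Q.T
    return np.sqrt(np.mean((R[mask] - pred[mask]) ** 2))

# Toy ratings with a fixed train/test split; the (lambda, d) grid below is a
# small subset of the ranges searched in the paper.
rng = np.random.default_rng(1)
R = np.clip(rng.normal(3.5, 1.0, size=(20, 15)), 1.0, 5.0)
train = rng.random(R.shape) < 0.7
test = ~train

grid = [(lam, d) for lam in (0.01, 0.08, 0.15) for d in (1, 3, 5)]
scores = {(lam, d): rmse(R, test, *sgd_mf(R, train, d=d, lam=lam))
          for lam, d in grid}
best_lam, best_d = min(scores, key=scores.get)
```

The exhaustive pairing over the two ranges corresponds to the `grid` comprehension; the fixed test set guarantees that all parameter combinations are scored on identical held-out ratings.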

Having set the parameters \(\lambda \) and d, we turned to the number of SGD/ALS iterations, k. We gradually increased the number of iterations from \(k=1\) to \(k=15\) and, for each value of k, computed the RMSE on the fixed test set. The results of this experiment are shown in Fig. 6c. As expected, the RMSE values stabilise beyond a certain value of k; in this case, the RMSE is reasonably stable after \(k=7\), so we set the number of iterations to \(k=10\).
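The "stable after a certain k" criterion can be sketched as picking the first iteration at which the successive RMSE improvement falls below a tolerance; the tolerance value and the sample curve below are illustrative assumptions, not numbers from the paper:

```python
def choose_iterations(rmse_curve, tol=0.005):
    """Return the first iteration count after which successive RMSE
    improvements fall below tol; if none qualifies, use the full curve."""
    for k in range(1, len(rmse_curve)):
        if abs(rmse_curve[k - 1] - rmse_curve[k]) < tol:
            return k + 1  # 1-indexed iteration count
    return len(rmse_curve)

# A hypothetical RMSE-per-iteration curve that flattens out, as in Fig. 6c.
curve = [1.10, 0.98, 0.93, 0.915, 0.910, 0.908, 0.9075, 0.9073]
k = choose_iterations(curve)
```

In practice one would then add a small safety margin on top of the detected knee, which is consistent with settling on \(k=10\) when the curve is stable from \(k=7\).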


About this article

Cite this article

Friedman, A., Berkovsky, S. & Kaafar, M.A. A differential privacy framework for matrix factorization recommender systems. User Model User-Adap Inter 26, 425–458 (2016). https://doi.org/10.1007/s11257-016-9177-7

Keywords

  • Recommender System
  • Matrix Factorization
  • Stochastic Gradient Descent
  • Alternating Least Squares
  • Differential Privacy