
1 Introduction

Recommendation systems are often used to help users find products or services that may interest them. Collaborative Filtering (CF) is a prominent technique used in recommendation systems. CF-based recommendation systems collect and analyse user information to offer a better, personalized user experience. However, aggregating and analysing user information can cause privacy violations. Narayanan et al. [13] demonstrated how analysing an individual's historical ratings can reveal sensitive information such as the user's political preferences, medical conditions and even religious disposition. Therefore, it is crucial for recommendation systems to protect the privacy of their users while simultaneously providing high-quality recommendations.

Differential privacy (DP) has become a popular tool in various domains for protecting the privacy of users even when the adversary possesses a substantial amount of auxiliary information about the aggregated data [5]. Several studies have proposed differentially private CF mechanisms [11, 14, 18] to safeguard against privacy attacks in recommendation systems. However, most of the existing mechanisms assume that the data aggregator (DA) is trusted. Unfortunately, many DAs are inclined to collect more data than required and abuse the privacy of users for their own benefit. Due to these concerns over untrusted DAs, many researchers [1, 10, 15, 16] have adopted Local Differential Privacy (LDP) for collaborative filtering. LDP-based CF requires each user to perturb their data locally and send the perturbed data to the DA. However, this approach yields low prediction accuracy compared to DP-based CF because each user's data is noised locally, as opposed to adding noise to aggregates of the users' data. Therefore, it is necessary to design an LDP-based recommendation system in which each user can protect the privacy of their data from the DA while the DA can still make recommendations with satisfactory prediction accuracy.

Our work aims to design a novel LDP-based recommendation system which yields high data utility under a strong privacy guarantee. We perturb each user's original ratings locally using the Bounded Laplace mechanism (BLP) before sending them to the DA. Furthermore, we reduce the prediction error by using Matrix Factorization (MF) with a Mixture of Gaussians (MoG) noise model at the DA. We estimate the added BLP noise using MoG [4], and the Expectation-Maximization (EM) method is used to estimate the parameters of the MoG. We demonstrate that our BLP-based recommendation system can provide substantial privacy protection while achieving satisfactory recommendation accuracy. The contributions of our work are as follows:

  • We use the Bounded Laplace mechanism (BLP) to perturb each user's ratings locally on their device. To the best of our knowledge, this is the first work which uses BLP to perturb users' ratings in a recommendation system. BLP ensures that the perturbed ratings fall within a predefined output domain without violating the principles of LDP. Additionally, BLP does not require complex computations on the user's side, in contrast to some existing solutions which require users to calculate their latent factors locally on their devices.

  • We significantly improve the rating prediction accuracy of LDP-based recommendation systems. Local rating perturbation induces a large error which grows linearly with the number of users and items. However, compared to the standard Laplace mechanism, BLP introduces limited noise to the aggregated ratings. Additionally, MoG is used to model the noise before MF to further increase the prediction accuracy. We demonstrate empirically using the Movielens and Jester datasets that our proposed method can achieve satisfactory prediction accuracy under a strong privacy guarantee and outperforms the works of [1] and [16].

  • The communication cost of our proposed method is significantly lower than that of other existing solutions such as [16]: our method only requires users to transmit the perturbed ratings to the DA once, so no additional communication cost is introduced, unlike methods that involve multiple iterations of information exchange between a user and the DA.

2 Related Work

LDP is used to protect users' privacy against an untrusted DA in many applications. For example, Google uses LDP to collect each user's Chrome usage statistics privately [6]. Likewise, LDP is also used in CF to protect the privacy of users. For instance, [15] introduced an LDP-based rating perturbation algorithm which perturbs a user's preference within an item category. Even though this mechanism hides a user's preference towards an item from an untrusted data aggregator, it can still reveal the user's preference towards an item category. Hua et al. [10] proposed another LDP-based matrix factorization scheme for an untrusted DA. In their method, item profile vectors are first learned using a private matrix factorization algorithm. These item vectors are then sent to each user to derive user profile vectors. As a user's profile vector does not depend on other users' data, each user can easily compute their profile vector locally instead of centrally. Users send their updated item profile vectors back to the DA, which are then used to update the global item profile vectors. The method uses objective function perturbation to achieve differential privacy. However, it adds additional processing and communication overhead on the user side.

Shin et al. [16] also proposed a method similar to [10] which requires the DA to send item profile vectors to each user. However, [16] used a randomized response perturbation mechanism instead of objective perturbation, and users send back gradients instead of latent factors to the DA. Their method also induces more communication and processing cost as users locally compute their user profile vectors over multiple iterations. Another LDP-based rating perturbation mechanism was proposed by [1], where the original ratings are perturbed using the Laplace mechanism. However, this method uses clamping to restrict out-of-range ratings and relies on off-the-shelf optimization solvers such as SGD (Stochastic Gradient Descent) and ALS (Alternating Least Squares) in its MF algorithm.

3 Local Differential Privacy Based Recommendation System

In this work, we consider an untrustworthy data aggregator with whom the users are not willing to share any sensitive information. In our proposed system, the original ratings are perturbed using the Bounded Laplace mechanism and the perturbed ratings are aggregated by the DA. At the DA, we use MF with MoG for noise estimation and rating prediction. Our proposed rating prediction model helps the DA to reconstruct the original ratings from the perturbed ratings without violating the privacy of users. Dwork et al. [5] proved that any mechanism that satisfies \(\varepsilon \)-differential privacy is resilient to post-processing. This implies that the perturbed ratings from our local differentially private mechanism can be utilised in further processing without incurring any additional privacy risk. Figure 1 shows the system architecture of the proposed recommendation system.

Fig. 1. Local differential privacy based recommendation

3.1 LDP Rating Perturbation

Rating Normalization. As different recommendation systems use distinct rating scales, we adopt the Min-Max scaling approach to normalize ratings to the interval [0, 1] and thereby obtain a generalized theoretical model. Given an actual rating \(r^\circ \), the normalized true rating r is generated as:

$$\begin{aligned} r=\frac{r^\circ -r^\circ _{min}}{r^\circ _{max}-r^\circ _{min}} \end{aligned}$$
(1)

in which \(r^\circ _{max}\) is the highest possible rating score on the rating scale, and \(r^\circ _{min}\) is the lowest. The local sensitivity is the maximum change the rating perturbation mechanism can cause in a rating dataset, which is the difference between the maximum and the minimum rating. In a normalized dataset, the maximum rating is 1 and the minimum rating is 0. Therefore, the local sensitivity of the rating perturbation mechanism is \(\varDelta {r}=1\).
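As a small illustration, Eq. (1) and its inverse (needed to map predictions back to the original scale) can be written as follows; the default bounds are the Movielens scale from Sect. 4.1 and are illustrative only:

```python
def normalize(r, r_min=0.5, r_max=5.0):
    """Min-Max scaling of Eq. (1); defaults use the Movielens scale."""
    return (r - r_min) / (r_max - r_min)

def denormalize(r, r_min=0.5, r_max=5.0):
    """Inverse mapping, used to report predictions on the original scale."""
    return r * (r_max - r_min) + r_min
```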

3.2 Bounded Laplace Mechanism

Our system perturbs each user's normalized rating using the BLP mechanism. The Bounded Laplace mechanism sanitizes the output of the Laplace mechanism subject to bounding constraints: it satisfies \(\varepsilon \)-DP by rejecting out-of-bound values and re-sampling noise for a given input rating r until a value within the given bound is obtained. The BLP mechanism is defined as follows:

Definition 1

(Bounded Laplace Mechanism). Given a scale parameter b and a rating domain interval (l, u), the Bounded Laplace mechanism \(M_{BLP}:R\rightarrow {R^*}\) is given by the conditional probability density function:

$$\begin{aligned} f_{r^*|r}(r^*|r) = {\left\{ \begin{array}{ll} \frac{1}{C_r(b)}\frac{1}{2b}exp(-\frac{|r^*-r|}{b}) ,&{} \text {if } r^* \in [l,u]\\ 0, &{} \text {if } r^* \notin [l,u] \end{array}\right. } \end{aligned}$$
(2)

where \(C_r(b) = \int _{l}^{u}\frac{1}{2b}exp(-\frac{|r^*-r|}{b})dr^*\) is a normalization constant dependent on the input rating r, and \(r^*\) is the perturbed output.

The normalization constant \(C_r(b)\) can be given as:

$$\begin{aligned} C_r(b)=1-\frac{1}{2}\bigg (exp(-\frac{r-l}{b})+exp(-\frac{u-r}{b}) \bigg ) \end{aligned}$$
(3)
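For completeness, splitting the integral at r and integrating each exponential tail in closed form yields Eq. (3):

$$\begin{aligned} C_r(b)&=\int _{l}^{r}\frac{1}{2b}exp(-\frac{r-r^*}{b})dr^* + \int _{r}^{u}\frac{1}{2b}exp(-\frac{r^*-r}{b})dr^* \\&=\frac{1}{2}\bigg (1-exp(-\frac{r-l}{b})\bigg )+\frac{1}{2}\bigg (1-exp(-\frac{u-r}{b})\bigg ) \end{aligned}$$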

It has been shown in [9] that when the local sensitivity is \(\varDelta {f}=u-l\), the BLP mechanism satisfies \(\varepsilon \)-local differential privacy. Using BLP in our proposed mechanism ensures that the perturbed output range is limited to [l, u]. However, the mechanism still guarantees that an adversary is unable to obtain any information about the original data by observing the output, and thus it preserves the privacy of the user. The privacy budget \(\varepsilon \) is determined by the DA and is shared with each user when they register with the DA. The BLP mechanism (as given in Algorithm 1) runs every time a user wants to send a rating to the DA.

Algorithm 1. BLP rating perturbation
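A minimal sketch of the rejection-sampling loop in Algorithm 1 follows; the scale \(b=\varDelta {r}/\varepsilon \) assumes, as above, that the sensitivity equals the domain width (here \(u-l=1\) after normalization):

```python
import numpy as np

def blp_perturb(r, epsilon, l=0.0, u=1.0):
    """Bounded Laplace mechanism (Definition 1) via rejection sampling.

    Draws r* = r + Laplace(0, b) and re-samples until r* lies in
    [l, u], which realises the conditional density of Eq. (2).
    Assumes sensitivity u - l (= 1 for normalized ratings).
    """
    b = (u - l) / epsilon  # Laplace scale for sensitivity u - l
    while True:
        r_star = r + np.random.laplace(loc=0.0, scale=b)
        if l <= r_star <= u:
            return r_star

# Example: a normalized rating of 0.8 perturbed with a tight budget.
# print(blp_perturb(0.8, epsilon=0.5))
```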

3.3 Noise Estimation with MoG

Let \(R_{m \times n}\) be the original normalized rating matrix and \(R^*_{m \times n}\) be the perturbed rating matrix of m users over n items. The perturbed ratings can be decomposed as:

$$\begin{aligned} R^*=R+E \end{aligned}$$
(4)

where \(E_{m \times n}\) consists of BLP noise. Each element in the noised rating matrix can be represented as:

$$\begin{aligned} r^*_{ij}=r_{ij}+e_{ij} = (u_i^T)v_j+e_{ij} \end{aligned}$$
(5)

where \(u_i\) is a column vector in the user latent factor matrix U and \(v_j\) is a column vector in the item latent factor matrix V. As any unknown noise distribution can be modelled as a mixture of Gaussians, we assume that the noise \(e_{ij}\) in Eq. (4) is drawn from a MoG distribution [4]:

$$\begin{aligned} p(e_{ij} \mid \varPi , \varSigma ) = \sum _{k=1}^K \pi _k \mathcal {N}(e_{ij} \mid 0,\sigma _k^2) \end{aligned}$$
(6)

where \(\varPi =(\pi _1,\pi _2,\ldots ,\pi _K)\), \(\varSigma =(\sigma _1,\sigma _2,\ldots ,\sigma _K)\), \(\sigma ^2_k\) is the variance of Gaussian component k, and K is the total number of Gaussian components. \(\pi _k\) is the mixing proportion with \(\sum _{k=1}^K \pi _k=1\). Therefore, the probability of each perturbed rating \(r^*_{ij}\) of \(R^*\) can be represented as:

$$\begin{aligned} p(r^*_{ij}\mid u_i,v_j,\varPi ,\varSigma ) =\sum _{k=1}^{K}\pi _k\mathcal {N}(r^*_{ij}\mid (u_i^T)v_j,\sigma _k^2) \end{aligned}$$
(7)

The likelihood of \(R^*\) can thus be given as:

$$\begin{aligned} p(R^* \mid U,V,\varPi ,\varSigma )=\prod _{i,j \in \varOmega } \sum _{k=1}^{K}\pi _k\mathcal {N}(r^*_{ij}\mid (u_i^T)v_j,\sigma _k^2) \end{aligned}$$
(8)

where \(\varOmega \) is the set of non-missing data points in the perturbed rating matrix \(R^*\). Given a dataset \(R^*\), our goal is to compute the parameters \(U, V,\varPi \) and \(\varSigma \) that maximize the log-likelihood of \(R^*\):

$$\begin{aligned} \begin{aligned}&\max _{U,V, \varPi ,\varSigma }\ \log p(R^* \mid U,V,\varPi , \varSigma ) \\&= \sum _{i,j \in \varOmega }\log \sum _{k=1}^{K} \pi _k\mathcal {N}(r^*_{ij} \mid (u_i^T)v_j,\sigma _k^2) \end{aligned} \end{aligned}$$
(9)
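For concreteness, the objective in Eq. (9) can be evaluated with a numerically stable log-sum-exp; in this sketch the array `mask` (marking the observed entries of \(R^*\)) and all shapes are illustrative assumptions:

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import norm

def mog_mf_loglik(R_star, mask, U, V, pi, sigma2):
    """Log-likelihood of Eq. (9) over observed entries (mask == 1).

    R_star: m x n perturbed ratings; U: m x d, V: n x d latent factors;
    pi: (K,) mixing proportions; sigma2: (K,) component variances.
    """
    resid = R_star - U @ V.T  # r*_ij - u_i^T v_j
    # log pi_k + log N(resid | 0, sigma_k^2), stacked over components
    log_comp = np.stack([np.log(pi[k]) +
                         norm.logpdf(resid, 0.0, np.sqrt(sigma2[k]))
                         for k in range(len(pi))], axis=-1)
    return (logsumexp(log_comp, axis=-1) * mask).sum()
```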

3.4 Expectation Maximization for MoG

As the maximum log-likelihood objective given in Eq. (9) admits no closed-form solution, the Expectation-Maximization (EM) algorithm is used to estimate the model parameters \(U,V,\varPi \) and \(\varSigma \). The EM algorithm introduced in [4] has two steps, Expectation and Maximization. In the E-step, we compute the posterior responsibility for each noise point \(e_{ij}\) using the current model parameters \(U,V,\varPi \) and \(\varSigma \):

$$\begin{aligned} \gamma _{ijk} = \frac{\pi _k\mathcal {N}(r^*_{ij} \mid (u_i^T)v_j,\sigma _k^2)}{\sum _{k=1}^{K} \pi _k\mathcal {N}(r^*_{ij} \mid (u_i^T)v_j,\sigma _k^2)} \end{aligned}$$
(10)

The posterior responsibility \(\gamma _{ijk}\) reflects the probability that Gaussian component k generated the noise data point \(e_{ij}\). In the M-step, we re-estimate each model parameter \(U,V,\varPi ,\varSigma \) using the posterior responsibilities so as to maximize the expected log-likelihood in Eq. (11) [12].

$$\begin{aligned} \max _{U,V, \varPi ,\varSigma } \sum _{i,j \in \varOmega } \sum _{k=1}^{K} \gamma _{ijk}\bigg (\log \pi _k -\log \sqrt{2\pi }\sigma _k-\frac{(r^*_{ij}-(u_i^T)v_j)^2}{2\sigma _k^2}\bigg ) \end{aligned}$$
(11)

To solve the problem given in Eq. (11), we first update \(\varPi \) and \(\varSigma \):

$$\begin{aligned} N_k= \sum _{\forall i,j} \gamma _{ijk} \end{aligned}$$
$$\begin{aligned} \pi _k=\frac{N_k}{N} \end{aligned}$$
$$\begin{aligned} \sigma _k^2=\frac{1}{N_k}\sum _{\forall i,j} \gamma _{ijk}(r^*_{ij}-(u_i^T)v_j)^2 \end{aligned}$$
(12)

where \(N_k\) is the sum of posterior responsibilities for the kth Gaussian component and N is the total number of observed data points. The portion of Eq. (11) related to U and V can be rewritten as:

$$\begin{aligned}&\sum _{i,j \in \varOmega } \sum _{k=1}^{K} \gamma _{ijk}\bigg (-\frac{(r^*_{ij}-(u_i^T)v_j)^2}{2\sigma _k^2}\bigg ) \nonumber \\ =&-\sum _{i,j \in \varOmega } \bigg (\sum _{k=1}^{K}\frac{\gamma _{ijk}}{2\sigma _k^2} \bigg ) (r^*_{ij}-(u_i^T)v_j)^2 \nonumber \\ =&- \mid \mid W \odot (R^*-UV^T) \mid \mid ^2_{L_2} \end{aligned}$$
(13)

where W is the weight matrix in which the element \(w_{ij}\) is the weight for the perturbed rating \(r^*_{ij}\) and can be defined as:

$$\begin{aligned} w_{ij} = {\left\{ \begin{array}{ll} \sqrt{\sum _{k=1}^{K}\frac{\gamma _{ijk}}{2\sigma _k^2}} ,&{} \text {if } i,j \in \varOmega \\ 0, &{} \text {if } i,j \notin \varOmega \end{array}\right. } \end{aligned}$$
(14)

The problem defined by Eq. (13) is equivalent to a weighted L2 low-rank matrix factorization problem, and any weighted L2-norm solver such as WPCA [3], WLRA [17] or DN [2] can be used to solve it. We used WPCA in our evaluation. The process of our noise estimation and rating prediction is given in Algorithm 2, with a simplified sketch shown after it. Convergence is achieved when the change between two consecutive U latent factor matrices is smaller than a predefined threshold or when the maximum number of iterations is reached.

Algorithm 2. Noise estimation and rating prediction
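Below is a minimal sketch of this EM loop under stated assumptions: the helper `weighted_l2_mf` is a simple alternating weighted least squares stand-in for the WPCA solver we actually used, and the initialization, component count and tolerances are illustrative choices rather than tuned experimental settings:

```python
import numpy as np
from scipy.stats import norm

def weighted_l2_mf(R, W, U, V):
    """One alternating pass minimizing ||W o (R - U V^T)||^2 (Eq. 13);
    a simple stand-in for WPCA / WLRA solvers."""
    W2, d = W ** 2, U.shape[1]
    for i in range(R.shape[0]):  # update each user factor u_i
        A = (V * W2[i][:, None]).T @ V + 1e-8 * np.eye(d)
        U[i] = np.linalg.solve(A, V.T @ (W2[i] * R[i]))
    for j in range(R.shape[1]):  # update each item factor v_j
        A = (U * W2[:, j][:, None]).T @ U + 1e-8 * np.eye(d)
        V[j] = np.linalg.solve(A, U.T @ (W2[:, j] * R[:, j]))
    return U, V

def mog_mf(R_star, mask, K=3, d=5, iters=50, tol=1e-4):
    """EM for MoG noise estimation around MF (Eqs. 10-14)."""
    m, n = R_star.shape
    rng = np.random.default_rng(0)
    U = 0.1 * rng.standard_normal((m, d))
    V = 0.1 * rng.standard_normal((n, d))
    pi, sigma2 = np.full(K, 1.0 / K), np.linspace(0.01, 1.0, K)
    for _ in range(iters):
        U_prev = U.copy()
        resid = R_star - U @ V.T
        # E-step: responsibilities gamma_ijk over observed entries (Eq. 10)
        lik = np.stack([pi[k] * norm.pdf(resid, 0.0, np.sqrt(sigma2[k]))
                        for k in range(K)], axis=-1)
        gamma = lik / np.maximum(lik.sum(axis=-1, keepdims=True), 1e-12)
        gamma *= mask[..., None]
        # M-step: mixing proportions and variances (Eq. 12)
        Nk = np.maximum(gamma.sum(axis=(0, 1)), 1e-12)
        pi = Nk / mask.sum()
        sigma2 = (gamma * resid[..., None] ** 2).sum(axis=(0, 1)) / Nk
        # M-step for U, V: weighted L2 MF with weights from Eq. (14)
        W = np.sqrt((gamma / (2.0 * sigma2)).sum(axis=-1))
        U, V = weighted_l2_mf(R_star, W, U, V)
        if np.linalg.norm(U - U_prev) < tol:  # convergence check on U
            break
    return U, V  # predicted ratings: U @ V.T
```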

4 Evaluation

In this section, we discuss the evaluation of our proposed BLP based MF with MoG approach (BLP-MoG-MF). To demonstrate the effectiveness of our proposed approach, we compare it with the following methods:

  • Non-Private Matrix Factorization (Non-Private MF): This is the baseline method we compare our approach against. It does not perturb any user's ratings and uses SGD-based matrix factorization for rating prediction. The RMSE value of the baseline method reflects the lower bound on prediction error obtainable without any privacy constraints.

  • Input Perturbation SGD Method (ISGD) [1]: The ISGD method perturbs ratings using the Laplace mechanism and clamps the resulting perturbed ratings using a clamping parameter locally on the user's device. The DA uses SGD-based MF for rating prediction.

  • Private Gradient-Matrix Factorization (PG-MF) [16]: In this method, the DA initially computes item latent factors and sends them to each user. Each user then computes their latent factors locally on their device and submits a perturbed gradient to the DA. The DA updates the item latent factors using the aggregated perturbed gradients from all users.

4.1 Datasets

We used two popular public rating datasets in our evaluation: Movielens [8] and Jester [7]. Among the several versions of the Movielens dataset, we used the one which consists of 100k ratings of 1682 movies rated by 943 users. The minimum rating given is 0.5 and the maximum rating is 5. The Jester dataset consists of 2M ratings of 100 jokes rated by 73,421 users. The minimum rating given in this dataset is −10 and the maximum rating is +10.

4.2 Evaluation Metrics

We measure the accuracy of prediction using the metric Root Mean Squared Error (RMSE) given by:

$$\begin{aligned} RMSE=\sqrt{\frac{\sum _{i=0}^{n-1}{(r_i-\hat{r_i})^2}}{n}} \end{aligned}$$
(15)

in which \(r_i\) is the actual rating, \(\hat{r_i}\) is the predicted rating and n is the total number of ratings. We use 10-fold cross-validation to train and evaluate our proposed BLP-MoG-MF approach on both the Movielens and Jester datasets over various privacy budgets \(\varepsilon \). The prediction accuracy depends on the privacy budget \(\varepsilon \): higher values of \(\varepsilon \) lead to weaker privacy protection. As the noise introduced through the Bounded Laplace mechanism is random, the computed RMSE is averaged across multiple runs.
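Equation (15) translates directly into code; a trivial sketch with illustrative array names:

```python
import numpy as np

def rmse(r_true, r_pred):
    """Root Mean Squared Error of Eq. (15)."""
    r_true, r_pred = np.asarray(r_true), np.asarray(r_pred)
    return np.sqrt(np.mean((r_true - r_pred) ** 2))
```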

4.3 Results

Bounded Laplace Noise Distribution. In this experiment, we generated noise samples using the Laplace and Bounded Laplace mechanisms. We generated 100,000 random noise samples for each mechanism on the Movielens dataset, setting the privacy budget \(\varepsilon \) to 0.1 and 1 respectively. Figure 2(a) and (b) illustrate the frequency of noise under the Laplace and Bounded Laplace mechanisms. The noise samples generated using the Laplace mechanism approximate a Laplace distribution, while the noise samples generated using BLP follow an unknown continuous distribution. As BLP follows a conditional probability density function (see Definition 1), it no longer produces noise that approximates a Laplace distribution. Hence, MoG is effective in estimating the noise generated by BLP.
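This comparison can be reproduced along the following lines, reusing the `blp_perturb` sketch from Sect. 3.2 (the input rating 0.8 and the bin count are arbitrary illustrative choices):

```python
import numpy as np
import matplotlib.pyplot as plt

# Laplace noise vs. BLP noise (perturbed minus original rating)
eps, r, n = 1.0, 0.8, 100_000
lap_noise = np.random.laplace(0.0, 1.0 / eps, size=n)
blp_noise = np.array([blp_perturb(r, eps) - r for _ in range(n)])

fig, (ax1, ax2) = plt.subplots(1, 2)
ax1.hist(lap_noise, bins=100); ax1.set_title("Laplace")
ax2.hist(blp_noise, bins=100); ax2.set_title("Bounded Laplace")
plt.show()
```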

Fig. 2. Laplace vs Bounded Laplace noise distribution

Prediction Accuracy over Various Privacy Budgets. In this experiment, we compare the prediction accuracy of BLP-MoG-MF with the other two LDP-based methods. First, we compare the prediction accuracy of BLP-MoG-MF with PG-MF [16] by varying the privacy budget \(\varepsilon \) from 0.1 to 1.6 on the Movielens dataset. Figure 3 shows the RMSE values for both methods and the baseline. As expected, the prediction accuracy of all methods except Non-Private MF improves with increasing privacy budget, because a larger privacy budget permits a greater amount of privacy leakage, which in turn increases the utility, i.e. the prediction accuracy.

Secondly, we compare BLP-MoG-MF with ISGD [1] by varying the privacy budget \(\varepsilon \) from 0.1 to 3 on both the Movielens and Jester datasets. Figure 4(a) and (b) illustrate the RMSE values for both methods and Non-Private MF. The RMSE values on the Jester dataset are larger than those on Movielens as Jester is sparser than Movielens. Figure 4(a) and (b) show that BLP-MoG-MF outperforms ISGD for all values of the privacy budget \(\varepsilon \). The results also show that BLP-MoG-MF yields a larger improvement in prediction accuracy on Jester than on Movielens for all values of \(\varepsilon \): on Jester, the RMSE improves by \(35 \%\) and \(28 \%\) when the privacy budget \(\varepsilon \) is 0.1 and 1 respectively, whereas the improvement on Movielens is \(21 \%\) and \(16 \%\) for the same budgets. This implies that BLP-MoG-MF outperforms ISGD by an even greater margin when the data is sparse.

Communication Cost. We compare the communication cost of our approach with [16] and [1] in Table 1. In the BLP-MoG-MF and ISGD approaches, regardless of the number of items a user rates, the user transmits each perturbed rating to the DA only once. The PG-MF method requires the user to transmit the perturbed gradients to the DA. BLP-MoG-MF and ISGD do not require the DA to transmit any information back to the user, whereas PG-MF requires the DA to transmit the updated item latent factor matrix back to the user. The estimated transmission size for the PG-MF method is approximately 0.15 MB for the Movielens dataset [1], whereas BLP-MoG-MF and ISGD transmit approximately 1 byte of data each time a user sends a rating to the DA.

Fig. 3. PG-MF vs BLP-MoG-MF RMSE comparison

Fig. 4. BLP-MoG-MF vs ISGD RMSE comparison

Table 1. Communication cost comparison

5 Conclusion

In this work, we propose a local differentially private matrix factorization with Mixture of Gaussians (BLP-MoG-MF) method under the consideration of an untrustworthy data aggregator. Our proposed recommendation system guarantees strong user privacy and completely hides a user's preference for an item from the DA. It also achieves better prediction accuracy than the existing LDP-based solutions [16] and [1]. Additionally, our method does not incur any additional communication cost on the user side. In future work, we intend to explore approaches to improve the robustness of the local minima obtained for the non-convex cost function used in MF with MoG.