Introduction

With the rapid development of the mobile Internet, digital information, the Internet of Things and other technologies, mobile commerce has changed significantly. Against this background, point-of-interest (POI) recommendation services are widely used. POIs usually represent location-based service venues such as amusement parks, restaurants, chess rooms, cinemas, tea shops and supermarkets. Like traditional commodity recommendation services [1,2,3,4], a typical POI recommendation service analyzes users’ behavior at checked-in POIs, calculates the similarity between users, and recommends unchecked-in POIs to the target users [5, 6]. Many techniques, such as matrix factorization, attentional memory networks and graph convolutional networks, have been applied to POI recommendation [7,8,9,10,11,12]. Nowadays, POI recommender systems are embedded in many popular platforms, both foreign (e.g., Foursquare and Gowalla) and domestic (e.g., Meituan and Dianping). Users can share their experiences, ratings, and comments on each POI on these platforms. The platforms provide POI recommendation services to target users by analyzing users’ massive ratings, comments, and related behavior information. Despite the benefits of collecting data, users expose sensitive and private information to possibly untrustworthy entities. These entities can process, analyze and mine the data to extract useful information, but they may also sell or share the collected data with third parties or use it maliciously. Some scholars have recognized the importance of privacy preservation in POI recommendation and have designed methods to protect users’ information [13,14,15]. These methods adopt strategies such as privacy parameter optimization, tuning the influence of disturbances, and controlling modeling errors [16, 17] to balance privacy protection against recommendation quality. 
Although the existing methods promote the advancement of POI recommendation, there are still some shortcomings:

  1. Most privacy-preserving techniques used in current POI recommendation methods are centralized (i.e., they assume a trusted third party) and relatively simple, which may lead to a risk of information disclosure.

  2. The ability to respond to complex contexts is insufficient; that is, current POI recommendation methods do not fully consider multiple factors, which may lead to poor recommendation performance.

The purpose of this paper is to address the above deficiencies and provide more accurate POI recommendation results while keeping users’ private information well protected. Unlike existing approaches, we analyze the main causes of these issues and propose a hybrid POI recommendation model based on local differential privacy (LDP). Firstly, the randomized response techniques k-RR [18] and RAPPOR [19] are introduced to disturb users’ ratings and social relationships, respectively. Secondly, user preference, social relationship, forgetting feature, check-in trajectory, geographical correlation of POIs and categories of POIs are combined to improve recommendation performance. Detailed numerical analysis on three real-world datasets shows that the presented method outperforms other state-of-the-art methods.

The innovations and contributions are as follows:

  1. To protect user privacy, we disturb users’ ratings and social relationships using local differential privacy techniques. To solve the problem of missing check-in times after disturbance, a virtual check-in time generation method is proposed.

  2. We propose a hybrid POI recommendation model containing three sub-models to improve recommendation quality. Sub-model 1 designs a novel similarity calculation method combining user preference, social relationship, forgetting feature, and check-in trajectory. Sub-model 2 focuses on the geographical correlation of POIs. Sub-model 3 focuses on the categories of POIs.

  3. We use an emotional score (i.e., one that considers emotional intensity and emotional polarity) to reflect the user preference expressed in comments. Based on the forgetting feature, an effective forgetting function is designed.

To sum up, the advantage of our approach is that it can effectively protect users’ privacy information and provide high-quality recommendation results at the same time. Theoretically, our study contributes to the effective and safe usage of multidimensional data science and analytics for privacy-preserving POI recommender system design. Practically, our findings can be used to improve the quality of POI recommendation services. The rest of this paper is organized as follows. We introduce the related works in “Literature review”. A hybrid POI recommendation model based on local differential privacy is constructed in “The hybrid POI recommendation model based on LDP”. The experimental results on three real-world datasets are described in “Experimental results”. In “Discussion”, we discuss the experimental results. Finally, we conclude the whole paper in “Conclusions and future work”.

Literature review

In recent years, personalized recommendation methods have developed rapidly. Most of them focus on accuracy and ignore problems related to security and users’ privacy. Therefore, some scholars have turned to privacy-preserving recommendation methods that can deliver safe and effective recommendation services. Despite efforts to overcome these issues with different risk reduction techniques, none has been completely successful in ensuring the security of users’ private information. In this section, we introduce some representative privacy-preserving recommendation methods that are committed to protecting individual privacy during the recommendation process.

Liu et al. [20] presented an efficient privacy-preserving social POI recommender system (PPS-POI-Rec), which generates recommendation results through cooperation between the SNS provider and the LBS provider while protecting user privacy information during the recommendation process. Yin et al. [21] proposed a privacy-preserving POI recommendation method using a differential privacy technique. They set a threshold to classify location sensitivity levels by analyzing users’ trajectories and check-in frequencies. Chen et al. [22] proposed a privacy-preserving POI recommendation (PriRec) framework. They analyzed the features of static and dynamic data derived from users and designed a linear model and a feature interaction model. In terms of privacy, they presented a secure iterative solution method to protect user privacy. Wang et al. [23] proposed a group preference-based privacy-preserving POI recommendation scheme. They designed anonymous ad hoc wireless peer-to-peer communication to protect users’ privacy. Kuang et al. [24] constructed a model of users’ check-in sequences based on a hidden Markov model (HMM) and used the EM algorithm to estimate the parameters. They then presented a weighted noise injection method to protect users’ location information and predicted the user’s next movement. Wang et al. [25] proposed a privacy-preserving POI recommendation method using deep learning in location-based social networks. Taiwo et al. [26] proposed a novel privacy-preserving framework using a homomorphic encryption scheme for cross-domain recommender systems, which provides a generic template for other secure cross-domain recommender systems. Huo et al. [27] proposed the GLP algorithm and the FRP algorithm to protect geographical location and friend relationships, respectively. Specifically, they designed a virtual circle to obscure the exact location of a user and used Laplacian differential privacy to disturb the friend relationship. Finally, they integrated the two privacy-preserving algorithms into recommender systems. Zhang et al. [28] proposed an LDP-friendly POI recommendation method based on an improved Hawkes process (HawkesRec). They also introduced the LDP technique to protect user privacy information. Selvi and Kavitha [29] proposed a stacked discriminative de-noising convolution auto-encoder–decoder with a two-way recommendation scheme to deal with the issues that derive from the lack of security constraints. Himeur et al. [30] discussed the security and privacy challenges in recommender systems. They pointed out that blockchain technology is a promising strategy for promoting security and privacy preservation in recommender systems, not only because of its salient security and privacy features, but also because of its resilience, adaptability, fault tolerance and trust characteristics.

To sum up, scholars use different privacy-preserving techniques to protect users’ information and combine factors such as user profiles, social relationships, geographical information, and temporal information to improve the performance of privacy-preserving recommendation. However, the privacy protection strategies are relatively simple, and most of the protection techniques are centralized. In addition, existing methods do not sufficiently combine multiple types of information at the same time, which leads to poor performance.

The hybrid POI recommendation model based on LDP

Local differential privacy (LDP) was proposed as a distributed variant of differential privacy that perturbs each user’s data locally on the client side [31,32,33]. It also inherits the comprehensive characteristics of centralized differential privacy. By using randomized response techniques such as W-RR [34], MeanEst [35], and Harmony-mean [36], LDP can resist privacy attacks from untrusted third parties with arbitrary background knowledge.

Definition 1

Given a privacy-preserving algorithm M and any two items t and t’ (\(t,t^{\prime} \in Dom(M)\)) derived from the user-relevant dataset, M satisfies \(\varepsilon\)-local differential privacy if, for every possible output \(t^{*}\) (\(t^{*} \in Ran(M)\)), the following inequality holds:

$$ \Pr \left[ {M(t) = t^{*} } \right] \le e^{\varepsilon } \times \Pr \left[ {M(t^{\prime}) = t^{*} } \right] $$
(1)

in which \(\varepsilon\) represents the privacy budget, which is greater than 0. The smaller the privacy budget, the stronger the privacy protection.

The recommendation process of the proposed hybrid POI recommendation model based on local differential privacy (LDP) is shown in Fig. 1. In the user terminal module, we use LDP techniques to disturb the user’s relevant information. Attackers can usually infer a user’s preference for a certain POI from his/her rating information. In the same way, attackers can infer the probability that a target user checks in at a certain POI from the check-in behavior of associated users in the social network. Therefore, we mainly protect users’ ratings and social relationships. In the server module, three sub-models are designed to calculate different similarities. Their functions can be briefly described as follows. Sub-model 1 integrates user preference, social relationship, forgetting feature, and check-in trajectory into the similarity calculation and obtains a check-in probability. Sub-model 2 obtains a check-in probability by analyzing the geographical correlation of POIs. Sub-model 3 analyzes the categories of POIs and obtains a check-in probability. Finally, we generate comprehensive recommendation results. The adopted LDP techniques and the three sub-models are introduced in detail below.

Fig. 1 The recommendation process of the hybrid POI recommendation model based on LDP

User information protection

In this section, we introduce two LDP techniques to disturb users’ ratings and social relationships, respectively. In particular, we propose a virtual check-in time generation method to solve the issue of missing check-in times after disturbance.

Ratings disturbance using k-RR

This paper uses the randomized response technique k-RR to disturb users’ ratings. k-RR, proposed by Kairouz et al. [18], overcomes the limitation of W-RR, which only deals with binary variables. When a variable has k (k > 2) candidate values, k-RR can handle it and output the corresponding result directly.

Definition 2

Given a candidate set \(\chi\) with \(|\chi | = k\), for any input \(R \in \chi\), the probability of the response output \(R^{\prime} \in \chi\) is:

$$ P(R^{\prime}|R) = \frac{1}{{k - 1 + e^{\varepsilon } }}\begin{cases} e^{\varepsilon }, & \text{if } R^{\prime} = R \\ 1, & \text{if } R^{\prime} \ne R \end{cases} $$
(2)

The mechanism reports the true value with probability \(\frac{{e^{\varepsilon } }}{{k - 1 + e^{\varepsilon } }}\) and each of the other k-1 values with probability \(\frac{1}{{k - 1 + e^{\varepsilon } }}\). When k = 2, the form is the same as that of W-RR.

Assume that user ui (i = 1,2,…,n) has checked in at POI lj (j = 1,2,…,m) and that the corresponding rating is \(R_{{u_{i} l_{j} }}\). \(R_{{u_{i} l_{j} }} = 0\) indicates that the POI has not been checked in, while \(R_{{u_{i} l_{j} }}\) > 0 means that it has been checked in, and the specific ratings {1,…, R} correspond to the degree of satisfaction. The response output is as follows:

$$ P(R_{{u_{i} l_{j} }}^{{\prime}} |R_{{u_{i} l_{j} }} ) = \frac{1}{{k - 1 + e^{\varepsilon } }}\begin{cases} e^{\varepsilon }, & \text{if } R_{{u_{i} l_{j} }}^{{\prime}} = R_{{u_{i} l_{j} }} \\ 1, & \text{if } R_{{u_{i} l_{j} }}^{{\prime}} \ne R_{{u_{i} l_{j} }} \end{cases} $$
(3)

in which \(R_{{u_{i} l_{j} }}^{{\prime}}\) represents the disturbed rating.
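The rating disturbance in formulas (2) and (3) can be sketched in a few lines of Python. This is a minimal illustration rather than the authors’ implementation: the function names are hypothetical, and the candidate set {0, 1, …, 5} (0 for an unchecked-in POI plus five rating levels, so k = 6) is an assumption made only for the example.

```python
import math
import random
from typing import List, Tuple


def krr_probabilities(k: int, epsilon: float) -> Tuple[float, float]:
    """Return (probability of keeping the true value,
    probability of each other single value), as in formula (2)."""
    denom = k - 1 + math.exp(epsilon)
    return math.exp(epsilon) / denom, 1.0 / denom


def krr_perturb(rating: int, candidates: List[int], epsilon: float,
                rng: random.Random) -> int:
    """Report the true rating with probability e^eps / (k - 1 + e^eps);
    otherwise report one of the other k - 1 candidates uniformly."""
    p_keep, _ = krr_probabilities(len(candidates), epsilon)
    if rng.random() < p_keep:
        return rating
    others = [c for c in candidates if c != rating]
    return rng.choice(others)


# Hypothetical usage: ratings 0..5, where 0 means "not checked in" (assumption).
rng = random.Random(42)
disturbed = krr_perturb(4, list(range(6)), epsilon=1.0, rng=rng)
```

Because the ratio between the two probabilities returned by `krr_probabilities` is exactly \(e^{\varepsilon}\), any two inputs produce a given output with probabilities that differ by at most that factor, which is the condition in formula (1).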

For a POI that a user has not checked in at in the raw data (rating 0 and no check-in time), the random disturbance may change the rating to each of the other k-1 ratings with probability \(\frac{1}{{k - 1 + e^{\varepsilon } }}\). We therefore need to add a virtual check-in time to match the corresponding rating in the disturbed data. The basic principle is as follows: in accordance with the time-series characteristics of the user’s check-in trajectory, we find the two checked-in POIs that are closest to the unchecked-in POI in the raw data, and use their check-in times to generate the virtual check-in time for the unchecked-in POI, which becomes a checked-in POI in the disturbed data.

Step 1: calculate the distance d(la, lb) between unchecked-in POI la and other checked-in POI lb \((b \in [1,m])\) in raw data. The formula of d(la, lb) is as follows:

$$ \begin{aligned} d\left( {l_{a} ,l_{b} } \right) = 2R\arcsin \sqrt {\sin^{2} \left(\frac{{lat_{{l_{a} }} - lat_{{l_{b} }} }}{2}\right) + \cos (lat_{{l_{a} }} )\cos (lat_{{l_{b} }} )\sin^{2} \left(\frac{{lon_{{l_{a} }} - lon_{{l_{b} }} }}{2}\right)}\end{aligned} $$
(4)

in which R represents the radius of the Earth and the two-tuple \((lat,lon)\) represents the latitude and longitude of a POI.

Step 2: determine lc and lg which are closest to la;

Step 3: obtain the check-in time \(t_{{l_{c} }}\) and \(t_{{l_{g} }}\) of lc and lg respectively, suppose that \(t_{{l_{c} }}\) is earlier than \(t_{{l_{g} }}\);

Step 4: generate virtual check-in time \(t_{{l_{a} }}\):

$$ t_{{l_{a} }} = \frac{{d(l_{a} ,l_{c} )}}{{d(l_{a} ,l_{c} ){ + }d(l_{a} ,l_{g} )}}(t_{{l_{g} }} - t_{{l_{c} }} ) + t_{{l_{c} }} $$
(5)
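Steps 1 to 4 can be sketched as follows. The code is a minimal illustration under stated assumptions: coordinates are given in degrees, the Earth radius is taken as 6371 km, timestamps are plain numbers, and the function names and the tie-handling for zero distances are hypothetical choices that the text does not specify.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius; any consistent unit works


def haversine(a, b, radius=EARTH_RADIUS_KM):
    """Formula (4): great-circle distance between two (lat, lon) points in degrees."""
    lat_a, lon_a = math.radians(a[0]), math.radians(a[1])
    lat_b, lon_b = math.radians(b[0]), math.radians(b[1])
    h = (math.sin((lat_a - lat_b) / 2) ** 2
         + math.cos(lat_a) * math.cos(lat_b) * math.sin((lon_a - lon_b) / 2) ** 2)
    return 2 * radius * math.asin(math.sqrt(h))


def virtual_checkin_time(target, visited):
    """Formula (5). `visited` is a list of ((lat, lon), timestamp) pairs for the
    user's checked-in POIs and must contain at least two entries."""
    # Steps 1-2: the two checked-in POIs closest to the unchecked-in POI.
    nearest = sorted(visited, key=lambda item: haversine(target, item[0]))[:2]
    # Step 3: order them so that t_c is the earlier check-in time.
    (loc_c, t_c), (loc_g, t_g) = sorted(nearest, key=lambda item: item[1])
    d_c = haversine(target, loc_c)
    d_g = haversine(target, loc_g)
    if d_c + d_g == 0:
        return t_c  # assumption: all three POIs coincide, reuse the earlier time
    # Step 4: interpolate between the two check-in times by relative distance.
    return d_c / (d_c + d_g) * (t_g - t_c) + t_c
```

The generated time always lies between the two neighbouring check-in times, and it moves closer to the check-in time of whichever neighbour is nearer to the unchecked-in POI.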

Social relationship disturbance using RAPPOR

In this paper, the randomized response technique RAPPOR [19] is introduced to disturb users’ social relationships. Let A(ui,ub) = 1 denote that user ui follows user ub, and A(ui,ub) = 0 denote that user ui does not follow user ub. The set of users followed by ui can then be expressed as a bit vector, e.g., \(A_{{u_{i} }}\) = {0, 0, 1, 0, 0, …, 1, 0, 0, 0, 1}. First, we disturb the initial relationship A(ui,ub) to obtain the permanent randomized response result A’(ui,ub). The disturbance is carried out according to the following formula, in which \(f \in [0,1]\) is the probability of replacing the true value with a random bit:

$$ A^{\prime}(u_{i} ,u_{b} ) = \begin{cases} 1, & \text{with probability } 0.5f \\ 0, & \text{with probability } 0.5f \\ A(u_{i} ,u_{b} ), & \text{with probability } 1 - f \end{cases} $$
(6)

Then, we make a second disturbance that disturbs A’(ui,ub) to obtain the instantaneous randomized response result F(ui,ub). The second disturbance mode is carried out as the following formula.

$$ P(F(u_{i} ,u_{b} ) = 1) = \begin{cases} p, & \text{if } A^{\prime}(u_{i} ,u_{b} ) = 1 \\ q, & \text{if } A^{\prime}(u_{i} ,u_{b} ) = 0 \end{cases} $$
(7)

in which \(p \in [0,1]\) and \(q \in [0,1]\) represent the probability of F(ui,ub) = 1 when A’(ui,ub) is 1 and 0 respectively.
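The two-stage disturbance in formulas (6) and (7) can be sketched as below. This is an illustrative sketch only: the function names are hypothetical, and the follow vector is treated as a plain list of bits. In RAPPOR the permanent response is normally generated once and stored, whereas this sketch regenerates it on every call for brevity.

```python
import random
from typing import List


def permanent_response(bit: int, f: float, rng: random.Random) -> int:
    """Formula (6): 1 with probability f/2, 0 with probability f/2,
    and the true bit with probability 1 - f."""
    u = rng.random()
    if u < 0.5 * f:
        return 1
    if u < f:
        return 0
    return bit


def instantaneous_response(permanent_bit: int, p: float, q: float,
                           rng: random.Random) -> int:
    """Formula (7): report 1 with probability p if the permanent bit is 1,
    and with probability q if it is 0."""
    prob_one = p if permanent_bit == 1 else q
    return 1 if rng.random() < prob_one else 0


def perturb_follow_vector(follows: List[int], f: float, p: float, q: float,
                          rng: random.Random) -> List[int]:
    """Apply both disturbance stages to every entry A(u_i, u_b) of one user."""
    return [instantaneous_response(permanent_response(b, f, rng), p, q, rng)
            for b in follows]


# Hypothetical usage with illustrative parameter values (not from the paper).
rng = random.Random(7)
reported = perturb_follow_vector([0, 0, 1, 0, 1], f=0.5, p=0.75, q=0.25, rng=rng)
```

Setting f = 0, p = 1 and q = 0 switches the disturbance off and returns the original vector, which is a convenient sanity check.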

Three recommendation sub-models

The sub-model combining multiple factors of user

In this sub-model, we fully consider user preference, social relationship, forgetting feature and check-in trajectory.

The analysis of user preference

We measure user preference from two aspects: ratings and comments. For comments, this paper uses an emotional score to reflect the preference they express. First, we distinguish the emotional degree (i.e., emotional intensity levels). Then, we consider the emotional polarity, which is either positive or negative [37].

To be specific, in each comment, emotional words are assigned values (1–5 points) according to their emotional intensity levels. Then, positive and negative emotions are assigned their corresponding values based on the emotional polarity. When a privative (negation word) appears before an emotional word, the user’s real emotion is obtained by multiplying the score by a negative factor. Specifically, when the absolute value of the emotional score exceeds 4, we multiply it by -1 to obtain the final emotional score; when the absolute value of the emotional score is 1 to 3 points, we multiply it by -0.5. We also consider special cases in which the preceding privative only slightly reverses the emotional polarity; for example, when a privative appears before a highly positive or highly negative adjective, we multiply the score by -0.5 instead of reversing it directly. Finally, we account for modal verbs such as “may” and “should”, whose appearance may weaken the emotion of a comment; in this case, we multiply the score by 0.5 to weaken the corresponding emotional level. Based on the above principles, we segment the users’ comments and calculate the emotional scores. The emotional score of a comment is calculated from all the emotional words that appear in it.

Assume that user ui comments on POI lj, the obtained comment contains w emotional words. So the formula of emotional score can be expressed as:

$$ R_{{u_{i} l_{j} }}^{{\prime\prime}} = \frac{{\sum\nolimits_{\sigma = 1}^{w} {Emotionalscore(word_{\sigma } )} }}{w} $$
(8)

Note that \(R_{{u_{i} l_{j} }}^{{\prime\prime}}\) can take both positive and negative values. Usually, we only consider the case in which the emotional score is positive, and negative values are set to 0. Finally, user preference \(R^{{\prime\prime}}_{{u_{i} l_{j} }}\) can be calculated as follows:

$$ R^{{\prime\prime}}_{{u_{i} l_{j} }} = R_{{u_{i} l_{j} }}^{{\prime\prime}} + R_{{u_{i} l_{j} }} $$
(9)

in which \(R_{{u_{i} l_{j} }}\) represents historical rating.
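As a minimal sketch of formulas (8) and (9), the following Python functions average the signed per-word scores of one comment, set a negative result to 0, and add the historical rating. The function names are hypothetical, the per-word scores are assumed to have already been adjusted for privatives and modal verbs, and returning 0 for a comment without emotional words is an assumption.

```python
from typing import List


def emotional_score(word_scores: List[float]) -> float:
    """Formula (8): mean of the signed emotional scores of the w emotional words."""
    if not word_scores:
        return 0.0  # assumption: no emotional words means no contribution
    return sum(word_scores) / len(word_scores)


def user_preference(word_scores: List[float], rating: float) -> float:
    """Formula (9): emotional score (negative values set to 0) plus the rating."""
    return max(emotional_score(word_scores), 0.0) + rating


# Hypothetical comment with three emotional words scored 4, 2 and -3.
preference = user_preference([4, 2, -3], rating=5)
```

For the example above, the emotional score is (4 + 2 - 3) / 3 = 1, so the resulting user preference is 6.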

The analysis of user social relationship

This paper measures user social relationship based on trust transfer. If user ui follows user ub, it means that user ui directly trusts user ub, otherwise, it is an indirect trust relationship.

Given user set U = {u1,..,un}, the trust relationship between ui and ub can be expressed as:

$$ T(u_{i} ,u_{b} ) = \begin{cases} DTrust(u_{i} ,u_{b} ), & \text{if } A(u_{i} ,u_{b} ) = 1 \\ IDTrust(u_{i} ,u_{b} ), & \text{if } A(u_{i} ,u_{b} ) = 0 \end{cases} $$
(10)

in which A(ui,ub) = 1 indicates a direct trust relationship between ui and ub; A(ui,ub) = 0 indicates an indirect trust relationship between ui and ub.

The direct trust relationship DTrust(ui,ub) is measured by overlap degree of followed users and checked-in POIs.

$$ DTrust(u_{i} ,u_{b} ) = \frac{{|A_{{u_{i} }} \cap A_{{u_{b} }} |}}{{|A_{{u_{i} }} \cup A_{{u_{b} }} |}} + \frac{{|L_{{u_{i} }} \cap L_{{u_{b} }} |}}{{|L_{{u_{i} }} \cup L_{{u_{b} }} |}} $$
(11)

in which \(A_{{u_{i} }}\) and \(A_{{u_{b} }}\) represent the user sets followed by ui and ub, respectively, and \(L_{{u_{i} }}\) and \(L_{{u_{b} }}\) represent the checked-in POI sets of ui and ub, respectively.

The indirect trust relationship IDTrust(ui,ub) is calculated through trust transfer. If there are multiple paths, we select the path with the largest value of IDTrust(ui,ub). In particular, as the length of the transfer path increases, the user’s indirect trust gradually declines. According to the six degrees of separation theory in social networks, the maximum length of a trust transfer path is set to 6. IDTrust(ui,ub) is calculated as follows:

$$ IDTrust(u_{i} ,u_{b} ) = \begin{cases} \prod\limits_{i = 1}^{b - 1} {DTrust(u_{i} ,u_{i + 1} )}, & len(d) \le 6 \\ 0, & len(d) > 6 \end{cases} $$
(12)

in which len(d) indicates the path’s length.
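The trust computation in formulas (10) to (12) can be illustrated with the sketch below. It reads the overlap degree in formula (11) as a ratio of set sizes, represents the social network as a dictionary from each user to the set of users he or she follows, and explores follow paths with a simple depth-first search. These representation choices and all function names are assumptions made for illustration; the search is exponential in the worst case and is not meant for large graphs.

```python
from typing import Dict, Set


def jaccard(a: Set, b: Set) -> float:
    """Overlap degree of two sets; defined as 0 when both are empty (assumption)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def direct_trust(u: str, v: str, follows: Dict[str, Set[str]],
                 visited: Dict[str, Set]) -> float:
    """Formula (11): overlap of followed users plus overlap of checked-in POIs."""
    return (jaccard(follows.get(u, set()), follows.get(v, set()))
            + jaccard(visited.get(u, set()), visited.get(v, set())))


def indirect_trust(src: str, dst: str, follows: Dict[str, Set[str]],
                   visited: Dict[str, Set], max_len: int = 6) -> float:
    """Formula (12): largest product of direct trust values along any follow
    path from src to dst with at most max_len edges; 0 if no such path exists."""
    best = 0.0

    def walk(node: str, product: float, path: frozenset) -> None:
        nonlocal best
        if node == dst:
            best = max(best, product)
            return
        if len(path) - 1 == max_len:  # path already uses max_len edges
            return
        for nxt in follows.get(node, set()):
            if nxt not in path:
                step = direct_trust(node, nxt, follows, visited)
                walk(nxt, product * step, path | {nxt})

    walk(src, 1.0, frozenset({src}))
    return best


def trust(u: str, v: str, follows: Dict[str, Set[str]], visited: Dict[str, Set]) -> float:
    """Formula (10): direct trust if u follows v, otherwise indirect trust."""
    if v in follows.get(u, set()):
        return direct_trust(u, v, follows, visited)
    return indirect_trust(u, v, follows, visited)
```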

The similarity of user trajectory

This paper calculates similarity of user trajectory based on geographic location of checked-in POIs. The trajectory similarity simtrack(ui,ub) can be expressed as follows:

$$ sim_{track} (u_{i} ,u_{b} ) = \frac{1}{{\sum {d(l_{{u_{i} }} ,l_{{u_{b} }} )} }} $$
(13)

in which \(d(l_{{u_{i} }} ,l_{{u_{b} }} )\) can be calculated by formula (4).

The comprehensive preference

As we know, the user’s interest changes over time due to the forgetting feature. Therefore, an effective forgetting function is proposed in this paper to deal with the issue. The formula is as follows:

$$ h(t) = z\left(\frac{{t - t_{\min } }}{{t_{\max } - t_{\min } }}\right)^{2} + 1 - z $$
(14)

in which t denotes the check-in time, z (\(0 \le z \le 1\)) denotes the forgetting coefficient, and tmin and tmax represent the earliest and latest check-in times, respectively. In formula (14), the parameter z adjusts the range over which user interest varies: h(t) rises from 1 - z for the earliest check-in to 1 for the latest. According to the average variation of users’ interests, we set the value of z to 0.5.
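Formula (14) can be checked with a few lines of Python. The sketch below is illustrative only; the function name is hypothetical, and returning 1 when all check-ins share the same time is an assumption, since the formula is undefined in that case.

```python
def forgetting_weight(t: float, t_min: float, t_max: float, z: float = 0.5) -> float:
    """Formula (14): weight of a check-in at time t, with forgetting coefficient z."""
    if t_max == t_min:
        return 1.0  # assumption: a single check-in time carries full weight
    x = (t - t_min) / (t_max - t_min)
    return z * x ** 2 + 1 - z


# With z = 0.5, the weights at the start, middle and end of the time span are
# 0.5, 0.625 and 1.0, so recent check-ins count more than old ones.
weights = [forgetting_weight(t, 0, 10) for t in (0, 5, 10)]
```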

After that, user preference, social relationship, forgetting feature and check-in trajectory are integrated into the final similarity calculation. The formula can be expressed as follows:

$$ sim(u_{i} ,u_{b} ) = \frac{{\sum {(R^{{\prime\prime}}_{{u_{i} l_{j} }} - \overline{R}_{{u_{i} }} )h_{{u_{i} }} (t)\sum {(R^{{\prime\prime}}_{{u_{b} l_{j} }} - \overline{R}_{{u_{b} }} )h_{{u_{b} }} (t)} } }}{{\sqrt {\sum {((R^{{\prime\prime}}_{{u_{i} l_{j} }} - \overline{R}_{{u_{i} }} )h_{{u_{i} }} (t))^{2} } } \sqrt {\sum {((R^{\prime\prime}_{{u_{b} l_{j} }} - \overline{R}_{{u_{b} }} )h_{{u_{b} }} (t))^{2} } } }}sim_{track} (u_{i} ,u_{b} )T(u_{i} ,u_{b} ) $$
(15)

in which \(\overline{R}_{{u_{i} }}\) and \(\overline{R}_{{u_{b} }}\) denote the arithmetic mean of ratings of user ui and ub, respectively.

Finally, Rpreference(ui,lo) denotes the comprehensive preference score of user ui, and its formula is as follows:

$$ R_{preference} (u_{i} ,l_{o} ) = \sum\limits_{{u_{b} \in U}} {sim(u_{i} ,u_{b} )C(u_{b} ,l_{o} )} $$
(16)

in which C(ub,lo) represents the check-in record. If POI lo has been checked-in by user ub, then C(ub,lo) = 1; otherwise C(ub,lo) = 0. For subsequent calculation, we change the comprehensive preference score into check-in probability Ppreference(ui,lo) based on probabilistic processing.
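Formula (16) amounts to summing the similarities of all users who have checked in at the candidate POI. The sketch below illustrates this with hypothetical function names. The text does not specify the probabilistic processing, so the second function is only one possible choice: it assumes that negative scores are clipped to 0 and that the scores are then divided by their sum.

```python
from typing import Dict, Iterable, Set


def preference_scores(sims: Dict[str, float], checkins: Dict[str, Set[str]],
                      pois: Iterable[str]) -> Dict[str, float]:
    """Formula (16): for each candidate POI, sum sim(u_i, u_b) over the users
    u_b who have checked in there (C(u_b, l_o) = 1)."""
    return {poi: sum(sim for user, sim in sims.items()
                     if poi in checkins.get(user, set()))
            for poi in pois}


def to_probabilities(scores: Dict[str, float]) -> Dict[str, float]:
    """Assumed normalization: clip negatives to 0 and divide by the total."""
    clipped = {poi: max(score, 0.0) for poi, score in scores.items()}
    total = sum(clipped.values())
    if total == 0:
        return {poi: 0.0 for poi in clipped}
    return {poi: score / total for poi, score in clipped.items()}


# Hypothetical similarities of the target user to users "b" and "c".
scores = preference_scores({"b": 0.5, "c": 0.25},
                           {"b": {"x", "y"}, "c": {"y"}},
                           ["x", "y", "z"])
probabilities = to_probabilities(scores)
```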

The sub-model based on geographical correlation of POIs

The geographical correlation of POIs also influences the probability of checking in at a POI. We use two-dimensional kernel density estimation to calculate the check-in probability. Kernel density estimation [38, 39] learns from a user’s historical check-in locations and estimates the unknown probability distribution without requiring the user’s reference location or current location.

Assume that \(l_{v} = (lat_{v} ,lon_{v} )\) is a two-dimensional vector, in which latv and lonv represent the latitude and longitude, respectively. Then the check-in probability is:

$$ P_{geo} (u_{i} ,l_{o} ) = \frac{1}{{2\pi m\sigma^{2} }}\sum\limits_{v = 1}^{m} {{\text{e}}^{{( - \frac{1}{{2\sigma^{2} }}(l_{o} - l_{v} )^{T} (l_{o} - l_{v} ))}} } $$
(17)

in which \(\sigma\) represents the smoothing parameter. The optimal parameter is:

$$ \sigma = m^{{ - \frac{1}{6}}} \sqrt {\frac{1}{2}\hat{\sigma }^{T} \hat{\sigma }} $$
(18)

in which \(\hat{\sigma }\) represents marginal standard deviation.
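Formulas (17) and (18) can be sketched in plain Python as follows. The sketch treats (lat, lon) pairs as planar coordinates, exactly as the formulas do, and it assumes that the marginal standard deviation is the population standard deviation of each coordinate. The function names are hypothetical, and the code does not guard against the degenerate case in which all historical locations coincide and the bandwidth becomes 0.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # (lat, lon)


def marginal_std(values: List[float]) -> float:
    """Population standard deviation of one coordinate (assumed definition)."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


def bandwidth(points: List[Point]) -> float:
    """Formula (18): sigma = m^(-1/6) * sqrt(0.5 * sigma_hat^T sigma_hat)."""
    m = len(points)
    s_lat = marginal_std([p[0] for p in points])
    s_lon = marginal_std([p[1] for p in points])
    return m ** (-1.0 / 6.0) * math.sqrt(0.5 * (s_lat ** 2 + s_lon ** 2))


def geo_probability(candidate: Point, points: List[Point],
                    sigma: Optional[float] = None) -> float:
    """Formula (17): Gaussian kernel density at the candidate POI, estimated
    from the m historical check-in locations in `points`."""
    m = len(points)
    if sigma is None:
        sigma = bandwidth(points)
    total = sum(
        math.exp(-((candidate[0] - p[0]) ** 2 + (candidate[1] - p[1]) ** 2)
                 / (2 * sigma ** 2))
        for p in points
    )
    return total / (2 * math.pi * m * sigma ** 2)
```

The estimated density is highest near clusters of historical check-ins and decays smoothly with distance, so POIs close to places the user has already visited receive larger values.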

The sub-model based on categories of POIs

In general, the categories of POIs also influence user check-in behavior. In this paper, we use the categories of checked-in POIs to build a multi-layer TF-IDF tree for convenience of calculation. The tree has many nodes, and each node represents a category of checked-in POIs (i.e., each category has a parent category and contains sub-categories). The value of a node reflects the preference degree for that category. Because a sub-category generally reflects the fine-grained interest of a user, its category level should be higher.

Assume that c denotes a category, the preference value of this category can be expressed as follows:

$$ {\text{TF}-\text{IDF}}(c) = \frac{{n_{c} }}{n}\log \frac{|L|}{{|L_{c} |}} $$
(19)

in which nc is the check-in times of POIs of category c, n is the check-in times of all POIs, |L| is the total number of POIs, and |Lc| is the number of POIs of category c.

The preference degree of user ui for a POI can be calculated by weighting the preference value of the corresponding category. We let \(C = \{ C_{1}^{{l_{o} }} ,C_{2}^{{l_{o} }} ,...,C_{H}^{{l_{o} }} \}\) denote category set of POI lo, in which H denotes category level. Rcate(ui,lo) denotes the preference of ui for POI lo. The calculation formula is as follows:

$$ R_{cate} (u_{i} ,l_{o} ) = \sum\limits_{{h \in \{ 1,2,...,H\} }} {\varphi {\text{TF - IDF}}(C_{h}^{{l_{o} }} )} $$
(20)

in which the weight \(\varphi = \frac{\ln h}{H}\) grows with h, so that higher category levels receive larger weights. When the category level is higher, the difference between the parent category and the sub-category of the recommendation results is smaller. Then, we change the preference degree into the check-in probability Pcate(ui,lo) based on probabilistic processing.
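Formulas (19) and (20) can be sketched as follows. The code is a minimal illustration: the function names and the input layout (one pair of counts per category level) are hypothetical, and the natural logarithm is assumed in formula (19) because the text does not state the base.

```python
import math
from typing import List, Tuple


def tf_idf(n_c: int, n: int, num_pois: int, num_pois_c: int) -> float:
    """Formula (19): (n_c / n) * log(|L| / |L_c|), natural log assumed."""
    return (n_c / n) * math.log(num_pois / num_pois_c)


def category_preference(level_stats: List[Tuple[int, int]], n: int,
                        num_pois: int) -> float:
    """Formula (20). level_stats[h - 1] = (n_c, |L_c|) for the category of the
    candidate POI at level h = 1..H; each level is weighted by ln(h) / H."""
    big_h = len(level_stats)
    return sum(
        (math.log(h) / big_h) * tf_idf(n_c, n, num_pois, l_c)
        for h, (n_c, l_c) in enumerate(level_stats, start=1)
    )


# Hypothetical two-level category path with illustrative counts.
score = category_preference([(10, 50), (4, 10)], n=20, num_pois=100)
```

Note that the weight of level 1 is ln(1)/H = 0, so under this weighting the top-level category does not contribute to the score.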

The generation of privacy-preserving POI recommendation results

With privacy-preserving, the user’s raw rating data are changed into disturbed rating data, and the user’s social relationship data are changed into disturbed social relationship data. After that, the three sub-models are fused to calculate the comprehensive check-in probability of a POI. The formula can be expressed as follows:

$$ \begin{aligned} P_{final} {(}u_{i} {,}l_{o} ) & = \alpha P_{preference} {(}u_{i} {,}l_{o} )\\ & \quad + \theta P_{geo} {(}u_{i} {,}l_{o} {) + }\lambda P_{cate} {(}u_{i} {,}l_{o} {)} \end{aligned} $$
(21)

in which \(\alpha {,}\theta {,}\lambda \in [0,1],\alpha \,{ + }\,\theta \,{ + }\,\lambda { = }1\).

In the recommendation process, we calculate the user's final check-in probability \(P_{final} (u_{i} ,l_{o} )\), sort the unchecked-in POIs in descending order of this probability, and then recommend the Top-K POIs.
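The fusion in Eq. (21) and the Top-K selection can be sketched as follows. The function names and the dictionary layout are hypothetical; the default weights are the values reported in the experimental section, and the three input probabilities are assumed to have been computed already by the three sub-models.

```python
def fuse(p_pref, p_geo, p_cate, alpha=0.65, theta=0.20, lam=0.15):
    """Eq. (21): weighted sum of the three sub-model probabilities."""
    if abs(alpha + theta + lam - 1.0) > 1e-9:
        raise ValueError("alpha + theta + lambda must equal 1")
    return alpha * p_pref + theta * p_geo + lam * p_cate


def top_k(candidates, k):
    """Rank unchecked-in POIs by fused probability and keep the first k.

    candidates maps a POI id to (P_preference, P_geo, P_cate).
    """
    scored = {poi: fuse(*probs) for poi, probs in candidates.items()}
    return sorted(scored, key=scored.get, reverse=True)[:k]


# Hypothetical sub-model outputs for three unchecked-in POIs.
candidates = {"a": (0.9, 0.1, 0.1), "b": (0.2, 0.9, 0.9), "c": (0.5, 0.5, 0.5)}
print(top_k(candidates, 2))
```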

Experimental results

To test the stability and generalization ability of the proposed method, we introduce cross-validation [40]. In the experiments, we use fivefold cross-validation, that is, the reported results are the mean values over 5 runs.
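The fivefold protocol can be sketched as below. This is only an illustration: the text does not say whether the split is made per check-in record or per user, so the sketch assumes a random split over record indices, and `run_once` is a hypothetical callable that trains on one index set and returns a metric on the other.

```python
import random


def k_fold_indices(n_items, k=5, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    idx = list(range(n_items))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def cross_validated_mean(n_items, run_once, k=5):
    """Average the metric returned by run_once(train, test) over k runs."""
    scores = [run_once(train, test) for train, test in k_fold_indices(n_items, k)]
    return sum(scores) / len(scores)


# Dummy metric: size of the held-out fold.
print(cross_validated_mean(10, lambda train, test: len(test)))
```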

Datasets

In the experiments, three datasets are used to verify the performance of the proposed method: two benchmark datasets, Yelp and Gowalla, and one dataset crawled from the Meituan platform. These datasets are briefly described as follows.

Yelp (Footnote 1) is a popular local business platform where users rate and comment on restaurants, shopping, nightlife, home services, etc. In this paper, we use the open dataset containing 30,887 users, 18,995 POIs and 860,888 check-in records.

Gowalla (Footnote 2) is a well-known location-based social platform where users share their check-in locations. The open dataset used here contains 18,737 users, 32,510 POIs and 1,278,274 check-in records. The main categories of POIs are community, entertainment, food, nightlife, outdoors, shopping and travel.

Meituan (Footnote 3) is a leading localized life service platform in China where users rate and comment on gourmet restaurants, hotels, home decoration, beauty salons, etc. We crawled data from the Meituan platform from March 2019 to September 2019. The dataset contains 14,362 users, 11,233 POIs and 501,436 check-in records.

Tables 1, 2 and 3 show partial data derived from the Yelp, Gowalla and Meituan datasets, respectively.

Table 1 The data derived from Yelp dataset
Table 2 The data derived from Gowalla dataset
Table 3 The data derived from Meituan dataset

Evaluation metrics

Metrics of recommendation

In previous studies, scholars adopt many metrics to verify the performance of recommendation methods. Therefore, we select four widely used metrics including Precision, Recall, F-score, and Normalized discounted cumulative gain (nDCG) [1, 2, 7,8,9] to evaluate the performance of recommendation methods. The metrics are described as follows.

Precision is defined as the ratio of the number of recommended POIs which correctly appeared in the positive set to the total number of the recommended POIs. The higher the precision is, the better the recommendation performance will be.

$$ {\text{Precision}} = \frac{{|{\text{Positive POIs}} \cap {\text{Recommended POIs}}|}}{{|{\text{Recommended POIs}}|}} $$
(22)

Recall is defined as the ratio of the number of recommended POIs which correctly appeared in the positive set to the total number of the positive POIs. The higher the recall is, the better the recommendation performance will be.

$$ {\text{Recall}} = \frac{{|{\text{Positive POIs}} \cap {\text{Recommended POIs}}|}}{{|{\text{Positive POIs}}|}} $$
(23)

F-score is defined as the harmonic mean of precision and recall and measures the performance comprehensively. A higher F-score corresponds to better recommendation performance.

$$ {\text{F-score}} = \frac{{{2} \times {\text{Precision}} \times {\text{Recall}}}}{{\text{Precision + Recall}}} $$
(24)
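Eqs. (22) to (24) can be sketched with set operations. The function name is hypothetical, and returning 0 when a denominator is empty is an added convention rather than part of the definitions.

```python
def precision_recall_f(recommended, positive):
    """Eqs. (22)-(24) for one user's recommendation list."""
    recommended, positive = set(recommended), set(positive)
    hits = len(recommended & positive)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(positive) if positive else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


# Hypothetical POI ids: 4 recommended, 3 actually visited, 2 in common.
print(precision_recall_f([1, 2, 3, 4], [2, 4, 6]))
```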

nDCG measures ranking quality by rewarding relevant POIs that appear near the top of the recommendation list. The higher the nDCG is, the better the recommendation performance will be. The \({\text{nDCG}}_{k}\) can be expressed as follows:

$$ {\text{nDCG}}_{k} = \frac{{{\text{DCG}}_{k} }}{{{\text{IDCG}}_{k} }} $$
(25)
$$ {\text{DCG}}_{k} = \sum\limits_{i = 1}^{k} {\frac{{2^{{rel_{i} }} - 1}}{{\log_{2} (i + 1)}}} $$
(26)

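in which \(rel_{i}\) denotes the graded relevance of the result ranked at position i (generally, scholars use binary relevance), k is the length of the recommendation list, and \({\text{IDCG}}_{k}\) denotes the ideal value of \({\text{DCG}}_{k}\), i.e., the value obtained when the results are ranked in the best possible order.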
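Eqs. (25) and (26) can be sketched as follows. The function names are hypothetical, and the ideal ranking is assumed to be obtained by sorting the relevance values of the recommended list itself, which is a common simplification; an evaluation that also counts relevant POIs missing from the list would compute \({\text{IDCG}}_{k}\) differently.

```python
import math


def dcg_at_k(relevances, k):
    """Eq. (26): sum of (2^rel_i - 1) / log2(i + 1) over the first k ranks."""
    return sum((2 ** rel - 1) / math.log2(i + 1)
               for i, rel in enumerate(relevances[:k], start=1))


def ndcg_at_k(relevances, k):
    """Eq. (25): DCG_k divided by the DCG_k of the ideal ordering."""
    # Assumption: the ideal ordering is the same list sorted by relevance.
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Binary relevance of a hypothetical ranked list: hit, miss, hit.
print(ndcg_at_k([1, 0, 1], 3))
```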

Metrics of privacy-preserving

This paper uses privacy gain [27] to measure the performance of privacy-preserving. The privacy gain is the difference between the information entropy of the disturbed output data and that of the raw data. The greater the privacy gain is, the stronger the privacy protection will be.

Definition 3. Assume that the probability distribution of a discrete random variable X is:

$$ \left( \begin{gathered} x \hfill \\ \beta (x) \hfill \\ \end{gathered} \right) = \left( \begin{gathered} x_{1} ,x_{2} ,...,x_{n} \hfill \\ \beta_{1} ,\beta_{2} ,...\beta_{n} \hfill \\ \end{gathered} \right),\beta_{i} \in [0,1],\sum\limits_{i = 1}^{n} {\beta_{i} } = 1 $$
(27)

Then the information entropy of X is calculated as follows:

$$ H = - \sum\limits_{i = 1}^{n} {\beta_{i} } \times \ln \beta_{i} $$
(28)

The privacy gain of each user is calculated as follows:

$$ {\text{Privacy}}{\kern 1pt} {\kern 1pt} {\text{gain}}_{{u_{i} }} = H_{{u_{i} }}^{^{\prime}} - H_{{u_{i} }}^{{}} $$
(29)

in which \(H_{{u_{i} }}^{^{\prime}}\) is the information entropy of the disturbed output data and \(H_{{u_{i} }}^{{}}\) is the information entropy of the raw data.

The average privacy gain over all users, where n here denotes the number of users, is:

$$ {\text{Privacy}}{\kern 1pt} {\kern 1pt} {\text{gain}} = \frac{{\sum\nolimits_{i = 1}^{n} {{\text{Privacy}}{\kern 1pt} {\kern 1pt} {\text{gain}}_{{u_{i} }} } }}{n} $$
(30)
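Eqs. (28) to (30) can be sketched as below. The function names are hypothetical, and the probabilities \(\beta_{i}\) are assumed to be estimated as the empirical frequencies of the distinct values in a user's data, which the text does not state explicitly.

```python
import math
from collections import Counter


def entropy(values):
    """Eq. (28) with beta_i taken as the empirical frequency of each value."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def privacy_gain(raw, disturbed):
    """Eq. (29): entropy of the disturbed data minus entropy of the raw data."""
    return entropy(disturbed) - entropy(raw)


def average_privacy_gain(pairs):
    """Eq. (30): mean per-user gain over (raw, disturbed) pairs."""
    return sum(privacy_gain(raw, dist) for raw, dist in pairs) / len(pairs)


# Hypothetical ratings of one user before and after disturbance.
print(privacy_gain([5, 5, 5, 5], [5, 4, 5, 4]))
```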

Experimental design and compared baselines

In this paper, three main groups of experiments are designed to comprehensively evaluate the proposed method. The first group compares the method without privacy-preserving against other recommendation methods. The second group compares the privacy-preserving version of the method against other privacy-preserving methods. The third group evaluates the impact of privacy-preserving on the recommendation performance of the proposed method. In addition, we analyze the effect of data sparsity on recommendation results. The compared baselines are briefly introduced below.

POI recommendation methods without privacy-preserving

The compared methods are described as follows.

Pearson [41] is a traditional method that calculates user similarity based on Pearson correlation coefficient.

IRenMF [42] is a POI recommendation method using matrix factorization technique.

CoRe [43] is a POI recommendation method integrating geographical location and social relationship.

UFC [44] is a POI recommendation method combining user preference, friendship, and check-in relevance.

DSPR [45] is a POI recommendation method integrating user preference and real-time needs.

POI recommendation methods with privacy-preserving

The compared methods are described as follows:

PMLS [21]: this method achieves user privacy protection using Laplace mechanism in the recommendation process.

PPNPR [24]: this method uses the weighted noise injection to protect location information in the recommendation process.

PRGS [27]: this method protects information of geographical location and friend relationship in the recommendation process.

HawkesRec [28]: this method utilizes improved Hawkes process and local differential privacy to achieve user privacy protection in the recommendation process.

Experimental setup and results

First group of experiments

To test the performance of the proposed recommendation method without privacy-preserving, we first need to set the parameter values. The hybrid POI recommendation model proposed in this paper (named MFRM) contains three parameters \(\alpha {,}\theta {,}\lambda\), with \(\alpha {,}\theta {,}\lambda \in [0,1],\alpha { + }\theta { + }\lambda { = }1\). We select the F-score as the objective function and search for the optimal value of each parameter. To find the optimal value range quickly, we first set the interval between candidate values to 0.02. After iterative calculation on the three real-world datasets (recommendation list length K = 15), we find that the F-score is the largest, that is, the recommendation performance is the best, when \(\alpha \in (0.64,0.66)\), \(\theta \in (0.19,0.21)\) and \(\lambda \in (0.14,0.16)\). We then set the interval to 0.01 and perform more fine-grained iterations; the results are shown in Table 4. Accordingly, the parameters in this paper are set as \(\alpha = 0.65,\theta = 0.20,\lambda = 0.15\). Furthermore, we analyze the sensitivity of the recommendation performance to the parameters. As shown in Table 4, parameter \(\alpha\) plays the major role in influencing the recommendation results: the results improve as \(\alpha\) grows, but decrease once \(\alpha\) surpasses a certain threshold. In addition, changes in parameters \(\theta ,\lambda\) have little influence on the results, and smaller values of them bring better recommendation performance.

Table 4 F-score varies with different parameters
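The parameter search described above can be sketched as an exhaustive grid search over the weight simplex. This is an illustration, not the authors' procedure: `evaluate_f_score` is a hypothetical callable that would train and evaluate the model for one weight triple, and the sketch enumerates the whole grid at a single step size instead of the coarse-then-fine schedule used in the paper.

```python
def weight_grid(step=0.01):
    """Yield (alpha, theta, lam) with each weight in [0, 1] and sum equal to 1."""
    n = round(1 / step)
    for a in range(n + 1):
        for t in range(n + 1 - a):
            yield a / n, t / n, (n - a - t) / n


def best_weights(evaluate_f_score, step=0.01):
    """Return the weight triple with the largest F-score on the grid."""
    return max(weight_grid(step), key=lambda w: evaluate_f_score(*w))


# Toy objective standing in for a real evaluation run.
def toy(alpha, theta, lam):
    return -((alpha - 0.65) ** 2 + (theta - 0.20) ** 2 + (lam - 0.15) ** 2)


print(best_weights(toy, step=0.05))
```

Integer grid indices are used so that the three weights always sum to 1 without accumulating floating-point drift.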

We then observe the recommendation performance with different lengths of the recommendation list K (i.e., K set to 5, 10, 15 and 20, respectively) on the three datasets. The experimental results are shown in Figs. 2, 3, and 4.

Fig. 2 Comparison results on Yelp dataset
Fig. 3 Comparison results on Gowalla dataset
Fig. 4 Comparison results on Meituan dataset

As shown in Fig. 2, on Yelp dataset, when the number of recommendations is 15, compared with Pearson, MFRM increases the values of precision, recall, F-score and nDCG by 52.68%, 53.52%, 52.90% and 32.57%, respectively; compared with IRenMF, MFRM increases the values of precision, recall, F-score and nDCG by 34.36%, 31.47%, 33.57% and 17.44%, respectively; compared with UFC, MFRM increases the values of precision, recall, F-score and nDCG by 20.28%, 15.54%, 8.98% and 11.03%, respectively; compared with CoRe, MFRM increases the values of precision, recall, F-score and nDCG by 13.63%, 6.08%, 11.57% and 5.45%, respectively; compared with DSPR, MFRM increases the values of precision, recall, F-score and nDCG by 12.38%, 4.70%, 10.28% and 7.27%, respectively. Similarly, when the number of recommendations is 5, 10 or 20, our method also performs better than the other five recommendation methods.

By analyzing the results on Gowalla dataset (the number of recommendations is 15), shown in Fig. 3, we find that the proposed method is superior to the other methods on all metrics. Compared with Pearson, MFRM increases the values of precision, recall, F-score and nDCG by 71.26%, 54.58%, 7.90% and 32.94%, respectively; compared with IRenMF, MFRM increases the values of precision, recall, F-score and nDCG by 30.51%, 30.77%, 30.58% and 16.96%, respectively; compared with UFC, MFRM increases the values of precision, recall, F-score and nDCG by 24.31%, 15.41%, 22.79% and 12.58%, respectively; compared with CoRe, MFRM increases the values of precision, recall, F-score and nDCG by 13.84%, 6.22%, 12.60% and 7.07%, respectively; compared with DSPR, MFRM increases the values of precision, recall, F-score and nDCG by 11.56%, 3.59%, 10.21% and 8.87%, respectively. When the number of recommendations is 5, 10 or 20, our method also generates the best results.

As illustrated in Fig. 4, on Meituan dataset, when the number of recommendations is 15, compared with Pearson, MFRM increases the values of precision, recall, F-score and nDCG by 43.63%, 33.30%, 40.65% and 36.98%, respectively; compared with IRenMF, MFRM increases the values of precision, recall, F-score and nDCG by 24.67%, 27.72%, 25.55% and 21.21%, respectively; compared with UFC, MFRM increases the values of precision, recall, F-score and nDCG by 14.92%, 15.43%, 15.07% and 10.15%, respectively; compared with CoRe, MFRM increases the values of precision, recall, F-score and nDCG by 16.91%, 23.38%, 18.64% and 7.72%, respectively; compared with DSPR, MFRM increases the values of precision, recall, F-score and nDCG by 11.44%, 15.56%, 12.63% and 5.29%, respectively. When the number of recommendations is 5, 10 or 20, our method is still the best.

Second group of experiments

In the second group of experiments, we test the performance of the proposed privacy-preserving POI recommendation method (named LDP-MFRM). The values of parameters \(\alpha ,\theta ,\lambda\) are also set as 0.65, 0.20 and 0.15, respectively. Then we observe the effect of privacy budget \({\varepsilon }_{1}\) of check-in rating privacy-preserving and \({\varepsilon }_{2}\) of social relationship privacy-preserving on recommendation results. The recommendation results varying with privacy budgets \({\varepsilon }_{1}\) and \({\varepsilon }_{2}\) are shown in Tables 5, 6 and 7.

Table 5 Recommendation results vary with privacy budgets \({\varepsilon }_{1}\) and \({\varepsilon }_{2}\) on Yelp dataset
Table 6 Recommendation results vary with privacy budgets \({\varepsilon }_{1}\) and \({\varepsilon }_{2}\) on Gowalla dataset
Table 7 Recommendation results vary with privacy budgets \({\varepsilon }_{1}\) and \({\varepsilon }_{2}\) on Meituan dataset

Furthermore, we observe how the privacy-preserving effect varies with privacy budgets \(\varepsilon_{1}\) and \(\varepsilon_{2}\). We calculate the privacy gains on the Yelp, Gowalla and Meituan datasets respectively, and the results are shown in Fig. 5.

Fig. 5 Privacy gains vary with privacy budgets \(\varepsilon_{1}\) and \(\varepsilon_{2}\)

As shown in Tables 5, 6, 7 and Fig. 5, when \({\varepsilon }_{1}<0.5\) and \({\varepsilon }_{2}<0.5\), local differential privacy seriously reduces the recommendation performance, that is, the utility of the disturbed data is low, so values in this interval are not considered in this paper. When \({\varepsilon }_{1}>2.5\) and \({\varepsilon }_{2}>2.5\), the recommendation performance is the best, but the privacy gains are low, so acceptable privacy-preserving cannot be achieved. When \({\varepsilon }_{1}\in (1.2,2.0)\) and \({\varepsilon }_{2}\in (1.0,1.8)\), we achieve both good recommendation performance and a strong privacy-preserving effect.

We then compare LDP-MFRM with four other privacy-preserving recommendation methods to observe recommendation performance. Following the privacy budget settings of the compared methods, we set \(\varepsilon =0.5\) for PRGS, \(\varepsilon =0.7\) for PMLS, \(\varepsilon =0.6\) for PPNPR, and \(\varepsilon =1.8\) for HawkesRec. The comparison of the average privacy gain of each method on Yelp, Gowalla and Meituan datasets is shown in Table 8.

Table 8 Comparison of average privacy gain of each method on different datasets

As shown in Table 8, LDP-MFRM achieves a higher average privacy gain than the other methods, which indicates that LDP-MFRM has a better privacy-preserving effect.

After that, we observe recommendation performance with different lengths of recommendation list K (i.e. set K as 5, 10, 15 and 20, respectively) on three datasets. The comparison results are shown in Figs. 6, 7 and 8.

Fig. 6 Comparison results on Yelp dataset
Fig. 7 Comparison results on Gowalla dataset
Fig. 8 Comparison results on Meituan dataset

As shown in Fig. 6, on Yelp dataset, when the number of recommendations is 15, compared with PRGS, LDP-MFRM increases the values of precision, recall, F-score and nDCG by 7.28%, 3.57%, 6.28% and 2.81%, respectively; compared with PMLS, LDP-MFRM increases the values of precision, recall, F-score and nDCG by 11.31%, 4.28%, 9.43% and 7.66%, respectively; compared with PPNPR, LDP-MFRM increases the values of precision, recall, F-score and nDCG by 17.57%, 11.85%, 16.04% and 11.80%, respectively; compared with HawkesRec, LDP-MFRM increases the values of precision, recall, F-score and nDCG by 3.17%, 8.44%, 4.58% and 5.94%, respectively. In the same way, when the number of recommendations is 5, 10 or 20, our method is still the best.

As shown in Fig. 7, on Gowalla dataset, when the number of recommendations is 15, compared with PRGS, LDP-MFRM increases the values of precision, recall, F-score and nDCG by 11.14%, 3.10%, 8.89% and 2.62%, respectively; compared with PMLS, LDP-MFRM increases the values of precision, recall, F-score and nDCG by 17.68%, 3.79%, 13.79% and 7.12%, respectively; compared with PPNPR, LDP-MFRM increases the values of precision, recall, F-score and nDCG by 21.99%, 11.11%, 18.95% and 11.07%, respectively; compared with HawkesRec, LDP-MFRM increases the values of precision, recall, F-score and nDCG by 7.59%, 7.82%, 7.66% and 4.97%, respectively. In the same way, when the number of recommendations is 5, 10 or 20, our method also surpasses the other methods on all four metrics.

As shown in Fig. 8, on Meituan dataset, when the number of recommendations is 15, compared with PRGS, LDP-MFRM increases the values of precision, recall, F-score and nDCG by 4.19%, 7.02%, 4.96% and 3.96%, respectively; compared with PMLS, LDP-MFRM increases the values of precision, recall, F-score and nDCG by 15.59%, 13.12%, 14.92% and 9.31%, respectively; compared with PPNPR, LDP-MFRM increases the values of precision, recall, F-score and nDCG by 17.67%, 13.65%, 16.57% and 8.65%, respectively; compared with HawkesRec, LDP-MFRM increases the values of precision, recall, F-score and nDCG by 7.44%, 7.75%, 7.52% and 5.73%, respectively. In the same way, when the number of recommendations is 5, 10 or 20, our method is still the best.

Third group of experiments

In this experiment, we observe the effect of privacy-preserving on recommendation performance by comparing MFRM with LDP-MFRM. The metric values of LDP-MFRM are the arithmetic means over the different privacy budgets. We again set the number of recommendations to 5, 10, 15, and 20. The comparison results on the three datasets are shown in Figs. 9, 10, and 11.

Fig. 9 Comparison results on Yelp dataset
Fig. 10 Comparison results on Gowalla dataset
Fig. 11 Comparison results on Meituan dataset

As shown in Fig. 9, on Yelp dataset, when the number of recommendations is 15, compared with MFRM, LDP-MFRM reduces the values of precision, recall, F-score and nDCG by 4.75%, 2.54%, 2.03% and 3.20%, respectively. In the same way, when the number of recommendations is 5, 10 or 20, LDP-MFRM is inferior to MFRM.

As shown in Fig. 10, on Gowalla dataset, when the number of recommendations is 15, compared with MFRM, LDP-MFRM reduces the values of precision, recall, F-score and nDCG by 5.80%, 2.97%, 4.98% and 6.57%, respectively. In the same way, when the number of recommendations is 5,10 or 20, the recommendation performance of LDP-MFRM is slightly lower than MFRM’s.

As shown in Fig. 11, on Meituan dataset, when the number of recommendations is 15, compared with MFRM, LDP-MFRM reduces the values of precision, recall, F-score and nDCG by 10.51%, 2.53%, 8.20% and 7.26% respectively. In the same way, when the number of recommendations is 5, 10 or 20, LDP-MFRM is inferior to MFRM.

Effect of data sparsity

Differences in data sparsity have a strong impact on POI recommendation performance. Compared with sparse data, sufficient data provide more useful information, which helps improve the recommendation performance. To investigate the effect of data sparsity on recommendation results, we vary the proportion of training data over the range [70%, 90%] in steps of 10%, and we use the F-score to analyze the changes in recommendation results. The results are shown in Table 9: as the proportion of training data increases, the performance of all methods improves gradually, which shows that sufficient data help improve the performance. In addition, our proposed method still outperforms the other methods at every level of data sparsity.

Table 9 Comparison of the performance of all privacy-preserving POI recommendation methods in different proportion of training data

Discussion

The experimental results show that our proposed method outperforms the compared methods in both privacy-preserving and recommendation performance. In this section, we discuss the results and draw the implications of this study.

Analyzing the difference of POI recommendation methods in the first group of experiments

By examining Pearson and IRenMF, we find that the former calculates user similarity based only on users' check-in ratings, while the latter utilizes the characteristics of users and POIs to calculate similarity. Our method, by contrast, not only combines the comprehensive preference (e.g., user preference, social relationship, forgetting feature and check-in trajectory), but also considers the geographical correlation of POIs and the categories of POIs. UFC integrates user preference and check-in correlation to recommend POIs, whereas MFRM considers more factors, which brings better performance. Comparing CoRe with MFRM, we find that CoRe analyzes geographical location and social relationship, while MFRM further considers the impact of user comments and categories of POIs. Comparing DSPR with MFRM, we see that the former utilizes context information to recommend POIs, while the latter further combines social relationship, geographical correlation of POIs and categories of POIs. In addition, we analyze the recommendation performance with different lengths of the POI recommendation list K. The comprehensive metric increases with the number of recommendations, but only up to a certain number of recommendations; beyond that point, excessive recommendations lead to performance deterioration. Furthermore, we consider the impact of dataset differences on recommendation performance. The experimental results on the Meituan dataset are better than those on the Yelp and Gowalla datasets, which are sparser. It can be inferred that the sparser the dataset is, the worse the recommendation performance will be.

Analyzing the difference of privacy-preserving POI recommendation methods in the second group of experiments

PMLS and PPNPR only protect the information of geographical location, and HawkesRec only protects the information of check-in ratings. PRGS protects the information of geographical location and friend relationship simultaneously. Our method, in contrast, utilizes the LDP technique to protect both check-in ratings and social relationships. Generally, an attacker who obtains a user's check-in ratings can easily infer the user's preference and the corresponding geographical locations. Therefore, disturbing the check-in ratings can prevent the disclosure of the user's real preference and location. In addition, attackers can also infer user preference by analyzing the user's social relationships, so it is necessary to protect these relationships as well. Based on the above analysis, our method has a better privacy-preserving effect. Besides, in the recommendation process, PMLS, PPNPR, HawkesRec and PRGS fail to consider the impact of complex factors, while our method considers more factors and brings better recommendation performance.

Impact of privacy-preserving on POI recommendation performance

By analyzing the experimental results, we find that privacy-preserving reduces the recommendation performance to a certain extent, but the sparser the dataset, the smaller the reduction. Therefore, when facing massive data (in reality, such datasets are usually very sparse), the performance of our hybrid POI recommendation method based on local differential privacy can be expected to be almost the same as that of the method without privacy-preserving.

Implication

The findings demonstrate that combining multiple types of information helps in recommending POIs and achieving better performance. By analyzing multiple types of information, we can predict users' preferences and characteristics more accurately. The more information we obtain, the more accurate the user portrait we can build. Although privacy-preserving reduces the recommendation performance to a certain extent, the effect is acceptable when dealing with massive data. Compared with other privacy-preserving strategies, disturbing ratings and social relationships can obscure more relevant information and achieve a better privacy-preserving effect. In addition, a suitable length of the recommendation list should be set. In reality, a POI is less likely to arouse user interest when it stays at a low position in the recommendation list.

Theoretically, our study contributes to the effective and safe usage of multidimensional data science and analytics for privacy-preserving POI recommender system design. Practically, our findings can be used to improve the quality of POI recommendation services.

Conclusions and future work

With the continuous application of POI recommender systems, the issues of privacy disclosure and unsatisfactory recommendation results are gradually exposed. To solve the shortcomings of existing POI recommendation methods in privacy-preserving and recommendation performance, this paper proposes a hybrid POI recommendation model based on local differential privacy.

The innovations of this paper can be described as follows.

  1. We introduce random response techniques to disturb user check-in rating and social relationship respectively, and then design a virtual check-in time generation method to solve the problem of missing check-in time after disturbance. Furthermore, these privacy-preserving strategies can be applied to information protection of multiple-attribute decision-making problem [46, 47].

  2. Three sub-models with their own characteristics are combined to generate recommendation results. Specifically, we design corresponding sub-models for processing different influential factors, and then achieve higher quality recommendation services by integrating the results generated by these sub-models. This hybrid POI recommendation model enriches the previous research.

The limitations of this study are as follows. First, the proposed method does not consider the balance among recommendation accuracy, diversity, and novelty. Second, the relationships between users are modeled in a simple way, although they are complex in reality. In future work, we will extend our privacy-preserving POI recommendation method in two directions. First, we will further investigate local differential privacy methods to improve the privacy and usability of disturbed data. Second, we will subdivide users' social relationships and design a self-adjusting model that meets users' demands by balancing recommendation accuracy, diversity, and novelty.