Introduction

With the maturing of positioning technology and the growing capability of smart mobile devices, location-based social networks (LBSN) have become widely used in daily life [1]. A point of interest (POI) is an information point in a geographic information system; it can be a landmark such as a store, a scenic spot, or a site. In LBSN, users can check in at a POI to indicate that they have visited it [2]. Using a user's existing check-in records to recommend POIs that the user is likely to favor has become a research topic that attracts much attention [3].

However, POI recommendation is more complicated than classic recommendation [4,5,6]. The first difficulty is a more severe data sparsity problem. Among the massive number of POIs provided by LBSN, the POIs actually visited by any one user are almost negligible, which results in extremely sparse data sets. In addition, user check-in data is a kind of implicit feedback: a check-in record only shows that the user has visited a place, so the data set contains positive samples but lacks negative samples. Compared with watching movies or shopping online, visiting a location in the real world takes more time and effort, and for privacy reasons users tend not to leave personal visit records [7]. The second difficulty is the cold-start problem that is commonly encountered in recommender system tasks. There are mainly three types of cold start in POI recommendation: locations that have never been visited are called cold-start POIs; users who have never visited any location are called cold-start users; and users who move to an unfamiliar place to live or travel also encounter the cold-start problem [8]. Finally, there is the issue of dynamic user preferences, that is, user preferences change with the passage of time and with changes in the environment [9]. Therefore, a variety of influencing factors must be considered to improve recommendation performance. At the same time, in an Internet information age where user privacy is increasingly valued, POI recommendation in LBSN that takes privacy protection into account is even more challenging [10].

In recent years, the rapid development of deep learning has brought new opportunities for POI recommendation systems. To address the above problems and difficulties, a POI recommendation method for LBSN based on deep learning is proposed. The innovations of the proposed method are:

  1) To address data sparsity, an Embedding model is used to vectorize the initial information. For the low-dimensional features of different information sources, different neural network layers are used to extract high-dimensional features. This helps to ensure the accuracy of the next POI recommendation.

  2) To solve the information compression loss caused by overly long input sequences, a long- and short-term attention mechanism (LSA) is proposed. It mines the user's long-term preferences from all of the user's historical POI check-in sequences while focusing on short-term preferences in the current POI check-in sequence. In this way, it captures the dynamic changes of user preferences and improves the efficiency and accuracy of recommendation.

The rest of this paper is organized as follows. The second section reviews related research in this field; the third section introduces the related technologies, including the embedding model and the attention model; the fourth section presents the proposed POI recommendation method based on LSTM-attention; the fifth section evaluates the feasibility and performance of the proposed method by comparing it with existing POI recommendation methods; the sixth section concludes the paper.

Related research

Because the POI recommendation task is constrained by physical distance in real life, current research usually focuses on the influence of location factors arising from spatial information [11,12,13,14]. Many algorithms have been proposed for the POI recommendation task. Reference [15] proposed a POI recommendation method for LBSN with a special three-layer network structure. It achieves good recommendation performance but ignores the ranking of the recommended POIs. Reference [16] proposed a context- and preference-aware model to address poor context awareness in the POI recommendation process; it incorporates contextual influence and user preferences into POI recommendation. These methods predict by approximating the absolute score of each location, which effectively extracts the global characteristics of the relationship between users and POIs, but their ability to express local features is somewhat weaker [17]. Reference [18] therefore proposed a unified framework called VCG that enhances POI recommendation by incorporating visual content and geographic influence into LBSN. The test results show that the method is effective. However, it gives little consideration to users' geographic preferences and does not consider user privacy. Reference [19] proposed a POI recommendation scheme based on group preferences. The scheme combines matrix factorization and clustering, and can provide quality recommendations without sacrificing user privacy. Although it alleviates the data sparsity caused by privacy concerns, it still suffers from the cold-start problem and poor interpretability.

With the introduction of deep learning algorithms, their advantages have gradually become apparent. First, deep learning uses nonlinear activation functions to model data nonlinearly. This compensates for the simple linear assumptions of traditional recommendation methods, which limit model capacity [20]. Second, a deep neural network (DNN) helps to represent the characteristics of the input data and can obtain a large amount of descriptive information about users and items. In practical applications, DNNs are used for representation learning, which not only reduces the workload of manual feature extraction but also integrates multi-source and multi-modal data into the recommendation system [21]. In addition, users' visiting behavior in real life is highly correlated and even causal. Therefore, many researchers apply deep learning to the recommendation task to overcome the problems of traditional recommendation models. Reference [22] proposed a real-time POI embedding model for the POI recommendation task in LBSN. A convolutional neural network (CNN) is used to mine the textual information of POIs and learn their internal representation. Combining real-time POI embedding with matrix factorization makes the proposed POI recommendation algorithm more comprehensive. Reference [23] proposed a POI recommendation method based on a recurrent neural network (RNN). It takes into account the location interests and context information of similar users and forms a comprehensive feature representation of user interest and context, which is more conducive to accurate POI recommendation. However, the method gives little consideration to factors such as privacy protection, and its accuracy may be reduced in actual applications. Reference [24] proposed a deep representation learning-based model (DRLM). Four different original features are generated for each POI by constructing four co-occurrence matrices, and principal component analysis is used to generate the semantic feature of each POI from its four original features. This improves model performance and gives a good recommendation effect, but it cannot solve the cold-start problem well. Reference [25] proposed a POI recommendation model based on the semantics of user context behavior. Meta-paths of a heterogeneous information network are used to represent the complex semantic relationships between users and POIs, and a learning-to-rank fusion method is used to sort the recommendation results. The results show that the recommendation performance is good in simple scenarios, but how to integrate the context information into the model remains a problem to be solved. At present, usually only a single factor is considered when deep learning is used for item recommendation, whereas the personalized POI recommendation task requires more factors to improve the recommendation effect [26]. Existing research cannot handle the fact that users' POI preferences are updated dynamically with time and location while also ensuring user privacy. The method in this paper is proposed to address this.

Related technology

Embedding model

The Word Embedding algorithm was proposed by Bengio et al. in 2003. It is a natural language model of a three-layer neural network. The main process is to transform the sparse matrix into a dense matrix through a series of linear transformations [27]. The purpose is to associate independent location vectors, so as to automatically learn and obtain the internal connections between locations. The network structure of the model is shown in Fig. 1.

Fig. 1 Embedding model structure

  (1) Input layer. The input layer takes the one-hot encoding vector of each location. For example, assuming that there are Q locations in total, the one-hot encoding vector of each location has size 1 × Q. The one-hot encoding vector of the first location is [1,0,0,…,0], that is, the entry at the corresponding position is 1 and the rest are 0; the other locations follow by analogy. C in Fig. 1 denotes the number of context locations.

  (2) Hidden layer. The one-hot encoding vectors of the C context locations are each multiplied by the shared initial input weight matrix \(\mathbf{w}_{{Q \times K}}\) (where K is set manually). The resulting vectors are summed element-wise and averaged to obtain the hidden layer output vector \(\mathbf{h}_{i}\), whose dimension is 1 × K.

  (3) Output layer. The output vector \(\mathbf{h}_{i}\) of the hidden layer is multiplied by the initial output weight matrix \(\mathbf{w^{\prime}}_{{K \times Q}}\) to obtain the output vector \(\mathbf{y}_{j}\) of the output layer. The dimension of \(\mathbf{y}_{j}\) is 1 × Q. Finally, the Q-dimensional probability distribution is obtained by applying the activation function.

  (4) For the Q-dimensional probability distribution, the location whose index has the highest probability is taken as the predicted target location. This prediction is compared with the one-hot vector of the real location, and the weight matrices \(\mathbf{w}_{{Q \times K}}\) and \(\mathbf{w^{\prime}}_{{K \times Q}}\) are updated iteratively through the loss function and the stochastic gradient descent algorithm. The one-hot encoding vector of each location is then multiplied by the updated \(\mathbf{w}_{{Q \times K}}\) matrix to obtain that location's Embedding vector.
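The forward pass described in steps (1) to (3), and the embedding lookup at the end of step (4), can be illustrated with a minimal sketch. It assumes that the output activation is a softmax, uses toy sizes (Q = 5, K = 3) with random weights, and omits the training update; the names `forward`, `w_in`, and `w_out` are illustrative and do not come from the original method.

```python
import math
import random

def one_hot(index, size):
    """Return a 1 x Q one-hot vector as a plain list."""
    return [1.0 if i == index else 0.0 for i in range(size)]

def vec_mat(vec, mat):
    """Multiply a row vector (length n) by an n x m matrix."""
    return [sum(vec[i] * mat[i][j] for i in range(len(vec)))
            for j in range(len(mat[0]))]

def softmax(scores):
    """Numerically stable softmax (assumed activation of the output layer)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def forward(context_ids, w_in, w_out):
    """Average the context embeddings, then score every location."""
    q = len(w_in)
    k = len(w_in[0])
    hidden = [0.0] * k
    for idx in context_ids:
        projected = vec_mat(one_hot(idx, q), w_in)   # selects row idx of w_in
        hidden = [h + p for h, p in zip(hidden, projected)]
    hidden = [h / len(context_ids) for h in hidden]  # 1 x K hidden vector
    return hidden, softmax(vec_mat(hidden, w_out))   # 1 x K and 1 x Q

random.seed(0)
Q, K = 5, 3  # toy sizes: 5 locations, 3-dimensional embeddings
w_in = [[random.uniform(-0.5, 0.5) for _ in range(K)] for _ in range(Q)]
w_out = [[random.uniform(-0.5, 0.5) for _ in range(Q)] for _ in range(K)]

hidden, probs = forward([0, 2], w_in, w_out)           # C = 2 context locations
predicted = max(range(Q), key=lambda j: probs[j])      # index with highest probability
embedding_of_location_2 = vec_mat(one_hot(2, Q), w_in) # row 2 of w_in
```

Multiplying a one-hot vector by the input weight matrix simply selects one row of that matrix, which is why the trained matrix can serve directly as an embedding table.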

Attention mechanism model

The attention mechanism is a signal processing mechanism modeled on human visual attention. It computes weights for the feature vectors output by the bidirectional long short-term memory (Bi-LSTM) network at different times and highlights the important features, so that the entire network model performs better [28]. In behavior recognition, adding an attention mechanism during training lets the neural network focus on key actions and objects. For example, in playing basketball, actions such as jumping and raising hands and objects such as the basketball and the hoop are assigned more weight by the attention mechanism to deepen the model's memory of them. When the model encounters these types of actions or objects again, it gives priority to predicting these behaviors and narrows the scope of recognition. It then adjusts the weight distribution according to the relationship between the actions to achieve more accurate recognition [29, 30]. The attention model is shown in Fig. 2.

Fig. 2 Attention mechanism model

In Fig. 2, \(\mathbf{f}_{o}^{t}\) represents the \(t\)th feature vector output from the Bi-LSTM network. The feature vector is passed to the attention model, and the initial state vector \(\mathbf{s}_{t}\) (1024 × 1) is obtained through the hidden layer of the attention model. The weight coefficient \(\gamma _{t}\) represents the proportion of the initial state vector \(\mathbf{s}_{t}\) in the final output state vector \(\mathbf{Y}\). The products of each initial state vector \(\mathbf{s}_{t}\) and its weight coefficient \(\gamma _{t}\) are summed to obtain the final output state vector \(\mathbf{Y}\). The calculation is as follows:

$$ \begin{gathered} e_{t} = \tanh \left( {\mathbf{w}_{t} \mathbf{s}_{t} + b_{t} } \right) \hfill \\ \gamma _{t} = \frac{{\exp \left( {e_{t} } \right)}}{{\sum\limits_{{j = 0}}^{t} {e_{j} } }} \hfill \\ \mathbf{Y} = \sum\limits_{{t = 0}}^{{19}} {\gamma _{t} \mathbf{s}_{t} } . \hfill \\ \end{gathered} $$
(1)

In the formula, \(b_{t}\) is an energy bias, \(e_{t}\) is the energy value determined by the state vector \(\mathbf{s}_{t}\) of the \(t\)th feature vector, and \(\mathbf{w}_{t}\) is the weight matrix from one unit layer to another. The weight coefficient, which determines the influence of each part on the recognition result, is the ratio of the exponential of that part's energy value to the cumulative sum of the energy values up to that part. This realizes the transition from the initial state to the attention state and yields the final output state vector \(\mathbf{Y}\). Finally, \(\mathbf{Y}\) is passed through a fully connected layer to reduce the impact of feature positions on classification, and a Softmax classifier maps the outputs of multiple neurons to the (0,1) interval to achieve multi-class classification.
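As a rough illustration of the weighted pooling in Eq. (1), the sketch below computes one energy per time step and combines the state vectors. Two simplifications are assumed and are not taken from the text: the weights are normalized with a standard softmax over the exponentiated energies (Eq. (1) as printed sums the raw energies in the denominator), and a single weight vector `w` and bias `b` are shared across time steps. The function name `attention_pool` is hypothetical.

```python
import math

def attention_pool(states, w, b):
    """Weight each state vector s_t by gamma_t and sum them into Y."""
    # Energy e_t = tanh(w . s_t + b): one scalar per time step.
    energies = [math.tanh(sum(wi * si for wi, si in zip(w, s)) + b)
                for s in states]
    # Assumed softmax normalisation so that the weights sum to 1.
    exps = [math.exp(e) for e in energies]
    total = sum(exps)
    gammas = [x / total for x in exps]
    dim = len(states[0])
    pooled = [sum(g * s[d] for g, s in zip(gammas, states)) for d in range(dim)]
    return gammas, pooled

states = [[0.1, 0.2], [0.9, 0.8], [0.4, 0.1]]  # toy Bi-LSTM outputs for 3 steps
gammas, pooled = attention_pool(states, w=[1.0, 1.0], b=0.0)
```

In this toy input the second state has the largest energy, so it receives the largest weight in the pooled vector.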

Proposed POI recommendation method based on LSTM-attention

The main problem of the existing LSTM model and Embedding model is that they can only model single-source information. The POI recommendation scenario, however, contains a great deal of social and semantic information. Users do not simply move from one location to another: in addition to following their own preferences, they may also be influenced by friends. Moreover, users do not necessarily like every POI they have visited, so their past review information is very important. A pure LSTM model can only model the user's behavior sequence. Feeding the locations that the user has visited into the network captures only the user's interest in those locations. Without the user's historical comments, changes in the user's interests cannot be described in detail, and the influence of the user's friends on the user's choices cannot be taken into account. With Embedding alone, existing recommendation models can vectorize users and friends, POIs, and comments independently and apply them to the POI recommendation problem, but using only one of these kinds of information cannot produce fine-grained recommendations. If three kinds of vectors are trained separately, however, it is difficult to balance their weights and proportions, which often causes model training to collapse; even if training succeeds, the model is difficult to apply successfully to the test set. The proposed model adopts end-to-end training and considers privacy protection. It regards the relationship between a user and the user's friends as an inherent attribute. The historical comment information is vectorized, combined with the POI visited at the time, and input into the LSTM network to obtain the user's interest information as a dynamic attribute. Then, the three are combined to recommend the next location for the user.

Problem definition

For the POI recommendation problem, define \(U\) as the user set, \(P\) as the POI set, and \(T\) as the label set. Each user \(u\), point of interest \(v\), and label \(g\) is a \(d\)-dimensional quantity. Each point of interest \(v\) has latitude and longitude information \(\left\{ {x_{v} ,y_{v} } \right\}\) and a label set \(\left\{ {g_{1}^{v} ,g_{2}^{v} , \ldots } \right\}\). Each user \(u\) is associated with a label set \(\left\{ {g_{1}^{u} ,g_{2}^{u} , \ldots } \right\}\), a friend list \(\left\{ {f_{1}^{u} ,f_{2}^{u} , \ldots } \right\}\), and a historical visit record \(\left\{ {p_{{t_{1} }}^{u} ,p_{{t_{2} }}^{u} , \ldots } \right\}\), where \(p_{{t_{i} }}^{u}\) indicates that user \(u\) visited point of interest \(p\) at time \(t_{i}\).

In the POI recommendation problem, \(U = \left\{ {U_{1} ,U_{2} , \ldots ,U_{N} } \right\}\) represents the set of users, \(U_{n}\) is a user in the data set, and \(N\) is the number of users. \(P = \left\{ {P_{1} ,P_{2} , \ldots ,P_{M} } \right\}\) represents the set of locations, \(P_{m}\) is a location in the data set, and \(M\) is the number of POIs.

Definition 1

A user's POI sequence is composed of multiple locations. Because the research concerns next POI recommendation, the locations are arranged in chronological order. The POI sequence of user \(U_{n}\) is represented as \(U_{n} = \left\{ {P_{{s1}} ,P_{{s2}} , \ldots ,P_{{sh}} } \right\}\), \(U_{n} \in U\), where \(P_{{s1}}\) is the first place the user visited in chronological order and \(P_{{sh}}\) is the last.

Definition 2

Under the implicit feedback model, the places that the user has visited are marked as 1, and the places that have not been visited are marked as 0. Unlike the explicit feedback model, there is no preference rating or score for a location. Implicit feedback involves positive and negative samples of the user: \(O\) represents a positive sample of the user (a place the user has been to), \(O \in U_{n}\); \(X\) represents a negative sample of the user (a place the user has not been to), \(X \in P/U_{n}\), where the symbol / denotes the difference between the two sets.

Definition 3

For next POI recommendation, which is a sequential problem, \(\psi\) is used to represent the user's target location. If the locations that the user has visited consecutively are \(P_{{s1}} ,\;P_{{s2}} ,\;P_{{s3}}\), then the next location that the user visits is the one to be predicted. Therefore, \(P_{{s4}}\) is the target location, denoted by \(\psi\).
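Definitions 1 to 3 can be made concrete with a small sketch. It assumes that a user's check-ins are already sorted chronologically and that the last check-in is held out as the target \(\psi\); the helper names `split_next_poi` and `implicit_feedback` and the toy POI identifiers are hypothetical.

```python
def split_next_poi(visits):
    """Split a chronological visit list into (history, target psi)."""
    if len(visits) < 2:
        raise ValueError("need at least two check-ins to form a target")
    return visits[:-1], visits[-1]

def implicit_feedback(visits, all_pois):
    """Label visited POIs 1 (positive) and unvisited POIs 0 (negative)."""
    visited = set(visits)
    return {poi: 1 if poi in visited else 0 for poi in all_pois}

all_pois = ["P1", "P2", "P3", "P4", "P5"]   # hypothetical POI set P
user_visits = ["P2", "P4", "P2", "P5"]       # chronological check-ins of one user

history, psi = split_next_poi(user_visits)
labels = implicit_feedback(history, all_pois)
positives = [p for p in all_pois if labels[p] == 1]   # visited places
negatives = [p for p in all_pois if labels[p] == 0]   # P minus visited places
```

Here the history is P2, P4, P2 and the held-out target is P5, which is still a negative sample with respect to the history because the user has not yet visited it.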

Long- and short-term attention mechanism

The attention mechanism can filter out, from a large amount of information, the key information that matters most to the current task. In sequence-aware recommendation, the attention mechanism can, to a certain extent, solve the information compression loss caused by overly long input sequences [31]. To better capture the user's long-term and short-term preferences, a long- and short-term attention mechanism (LSA) is proposed that uses all of the user's check-in sequences, that is, the user's set of historical POI check-in sequences together with the current POI check-in sequence. The long-term preference of the user is mined from all of the user's historical POI check-in sequences, while the short-term preference is captured from the current POI check-in sequence. The structure is shown in Fig. 3.

Fig. 3 LSA structure

Assume that the user has a total of \(m\) POI check-in sequences, and the \(m\)th is the current POI check-in sequence. The overall hidden states of the user's first \(m - 1\) historical POI check-in sequences are denoted as \(\left\{ {\mathbf{h}_{\tau }^{1} ,\mathbf{h}_{\tau }^{2} , \ldots ,\mathbf{h}_{\tau }^{{m - 1}} } \right\}\). The hidden states obtained at each moment of the user's current sequence are denoted as \(\left\{ {\mathbf{h}_{1}^{m} ,\mathbf{h}_{2}^{m} , \ldots ,\mathbf{h}_{\tau }^{m} } \right\}\). \(\left\{ {\lambda _{1} ,\lambda _{2} , \ldots ,\lambda _{\tau } } \right\}\) represents the weights of the moments of the current sequence calculated by the network. \(\mathbf{h}_{{{\text{LSA}}}}^{m}\) is the final attention representation of the current sequence. \(\tau\) refers to the end time of each POI check-in sequence.

LSA mainly includes two parts, a forward network used to calculate the attention score of each moment, and an attention synthesis function used to weigh the attention representation of the sequence. The specific forward propagation process is expressed as

$$ \begin{aligned} \mathbf{h}_{{{\text{LSA}}}}^{m} & = \sum\limits_{{i = 1}}^{\tau } {\lambda _{i} \mathbf{h}_{i}^{m} } \\ \lambda _{i} & = \mathbf{w}_{0} \sigma \left( {\mathbf{w}_{1} \mathbf{h}_{i}^{m} + \mathbf{w}_{2} \mathbf{h}_{\tau }^{m} + \mathbf{w}_{3} \mathbf{\bar{h}}_{\tau } + \mathbf{b}_{a} } \right). \\ \end{aligned} $$
(2)

In the formula, \(\lambda _{i}\) is the weight of the \(i\)th moment in the current POI check-in sequence. \(\mathbf{h}_{{{\text{LSA}}}}^{m}\) is the attention representation of the current POI check-in sequence obtained by the attention mechanism. \(\mathbf{h}_{i}^{m}\) represents the hidden state at the \(i\)th moment in the current POI check-in sequence. \(\mathbf{h}_{\tau }^{m}\) represents the overall representation of the current POI check-in sequence. \(\mathbf{\bar{h}}_{\tau }\) is the long-term preference of the user. \(\mathbf{w}_{*}\) and \(\mathbf{b}_{a}\) are the weight matrices and the bias vector, respectively. \(\sigma\) is the activation function.

The introduction of \(\mathbf{\bar{h}}_{\tau }\) means that the calculation of the attention score will consider the user's long-term preferences. The user may have multiple historical POI check-in sequences. The overall representation of the \(j\)th sequence is denoted as \(\mathbf{h}_{\tau }^{j}\). To capture the user's long-term preference reflected in the historical sequence, the overall representation of all historical POI check-in sequences of the user is averaged to obtain the user's long-term preference representation. The specific calculation is as follows:

$$ \mathbf{\bar{h}}_{\tau } = \frac{1}{{m - 1}}\sum\limits_{{j = 1}}^{{m - 1}} {\mathbf{h}_{\tau }^{j} } . $$
(3)

In practical applications, to save the amount of calculation and speed up the generation of the recommendation list, the long-term preference representation vector \(\mathbf{\bar{h}}_{\tau }\) obtained from the historical POI check-in sequence can be stored for each user in the system.

If the current sequence is the user's first POI sign-in sequence, the value of \(\mathbf{\bar{h}}_{\tau }\) is 0. At this time, the forward network calculation of LSA becomes

$$ \lambda _{i} = \mathbf{w}_{0} \sigma \left( {\mathbf{w}_{1} \mathbf{h}_{i}^{m} + \mathbf{w}_{2} \mathbf{h}_{\tau }^{m} + \mathbf{b}_{a} } \right) $$
(4)

For ease of presentation, the calculation process of the above-mentioned LSA is abbreviated as

$$ \mathbf{h}_{{{\text{LSA}}}}^{m} = {\rm LSA}\left( {\left\{ {\mathbf{h}_{\tau }^{1} ,\mathbf{h}_{\tau }^{2} , \ldots ,\mathbf{h}_{\tau }^{{m - 1}} } \right\},\left\{ {\mathbf{h}_{1}^{m} ,\mathbf{h}_{2}^{m} , \ldots ,\mathbf{h}_{\tau }^{m} } \right\}} \right). $$
(5)
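The computation in Eqs. (2) to (5) can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: \(\sigma\) is taken to be the sigmoid function, the overall representation \(\mathbf{h}_{\tau }^{m}\) is taken to be the last hidden state of the current sequence, \(\mathbf{w}_{0}\) is a vector, and \(\mathbf{w}_{1}\), \(\mathbf{w}_{2}\), \(\mathbf{w}_{3}\) are square matrices. The function names and toy values are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def mat_vec(mat, vec):
    return [dot(row, vec) for row in mat]

def long_term_preference(history_reprs, dim):
    """Eq. (3): mean of the overall representations of past sequences."""
    if not history_reprs:
        return [0.0] * dim          # first sequence: no history, as in Eq. (4)
    n = len(history_reprs)
    return [sum(h[d] for h in history_reprs) / n for d in range(dim)]

def lsa(history_reprs, current_states, w0, w1, w2, w3, b_a):
    """Eq. (2): score every step of the current sequence, then pool."""
    dim = len(current_states[0])
    h_tau = current_states[-1]       # assumed overall representation of the sequence
    h_bar = long_term_preference(history_reprs, dim)
    weights = []
    for h_i in current_states:
        pre = [a + b + c + d for a, b, c, d in zip(
            mat_vec(w1, h_i), mat_vec(w2, h_tau), mat_vec(w3, h_bar), b_a)]
        activated = [1.0 / (1.0 + math.exp(-x)) for x in pre]  # sigma assumed sigmoid
        weights.append(dot(w0, activated))
    pooled = [sum(lam * h[d] for lam, h in zip(weights, current_states))
              for d in range(dim)]
    return weights, pooled

identity = [[1.0, 0.0], [0.0, 1.0]]
history = [[0.2, 0.4], [0.6, 0.0]]   # overall representations of 2 past sequences
current = [[0.1, 0.3], [0.5, 0.7]]   # hidden states of the current sequence
weights, h_lsa = lsa(history, current, [1.0, 1.0],
                     identity, identity, identity, [0.0, 0.0])
```

Because the long-term vector depends only on past sequences, it can be computed once per user and reused, which matches the caching remark above.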

Model framework

The user vector is obtained in a way similar to Word2Vec: a one-hot vector is taken as input and passed through the Embedding layer to obtain the user's vector. Similarly, the friends of a user correspond to k one-hot vectors, from which the vectors of the k friends are obtained. In the implementation, a k-hot vector is input directly, and the resulting vector representing the user's relationship network is used as the input of this layer. For the user's evaluation of each point of interest, the pre-trained vectors of the words are used as input. Then, through the convolutional layer and the region of interest (ROI) pooling layer, the evaluation information is compressed into a one-dimensional vector as part of the input of the LSA. This vector, the geographic location vector of the point of interest, and the vector of the point of interest are concatenated as the input of the LSA layer. The overall network architecture is shown in Fig. 4.

Fig. 4 The overall architecture of POI-LSA network

Through the above process, inputs from different sources are extracted into high-dimensional features: the user's own features, social features, evaluation features, and behavior sequence features [32, 33]. After these features are concatenated, the vector representation of the point of interest to be visited next is obtained through the activation layer. At the same time, to ensure the accuracy of the model, the high-dimensional features are passed through the fully connected layer, which outputs geographic location information and visit ranking information. This ensures that the current location and the next location to be visited deviate only slightly in geographic position.

For the low-dimensional features of different information sources, the model uses different neural network layers to extract high-dimensional features. User features are directly connected to the final fully connected and activation layers. The social features are the information of the user's friends; high-dimensional features are extracted by a CNN and then converted into fixed-length high-dimensional features by the ROI pooling layer. User comments on points of interest, like social information, pass through the CNN layer to extract high-dimensional features and then through the ROI pooling layer to obtain fixed-length high-dimensional features. The triple <comment feature, geographic location, point of interest feature> is then fed as a whole into the LSA network as the user's historical behavior feature, because LSA can express the user's long-term interests well and can also accurately capture the user's interests in the recent period. After the high-dimensional features pass through the tanh activation layer, the user's current interest features are obtained. This feature expresses the user's next point of interest. At the same time, this high-dimensional feature should also express the user's range of activities. Using the fully connected layer, the corresponding geographic location information can be extracted from the high-dimensional features.

Loss function and optimization algorithm

The loss function of the model is divided into two parts: the distance between the predicted interest point vector and the real vector and the distance between the predicted geographic location and the real geographic location. The gap between the predicted vector and the real vector can describe the accuracy of the predicted vector and is the most important part of the loss function. The difference between the predicted geographic location and the actual location is also added to the loss function as part of the adjustment. This is because the user’s own range of activities is limited, and the predicted POI should not deviate too much from the user’s regular range of activities. Otherwise, the recommendation will become meaningless [34]. The specific loss function is as follows:

(1) The difference between the predicted vector and the real vector is measured by cosine similarity. The predicted interest point vector is \(\tilde{p}_{l}\). The \(m\) points of interest visited by the user after the current moment \(l\) are used to determine the prediction accuracy:

$$ {\rm Loss}_{{\rm sim}} = \sum\limits_{{i = 1}}^{m} {\frac{{\tilde{p}_{l} \cdot p_{{v_{l} + i}} }}{{\left| {\tilde{p}_{l} } \right| \times \left| {p_{{v_{l} + i}} } \right|}}} . $$
(6)

In the formula, \(p_{v}\) and \(p_{l}\) represent the historical visit records of \(v\) and \(l\).

(2) Euclidean distance is used to measure the closeness of geographic locations. The predicted geographic location is denoted \(\left\{ {\tilde{x}_{l} ,\tilde{y}_{l} } \right\}\), and its distance to the next \(m\) points of interest is measured. The loss function is as follows:

$$ {\rm Loss}_{{\rm loc}} = \sum\limits_{{i = 1}}^{m} {\sqrt {\left( {\tilde{x}_{l} - x_{{v_{l} + i}} } \right)^{2} + \left( {\tilde{y}_{l} - y_{{v_{l} + i}} } \right)^{2} } } . $$
(7)

Finally, the loss function of the model is as follows:

$$ {\text{Loss}}_{{{\text{total}}}} = {\text{Loss}}_{{{\text{sim}}}} + {\text{Loss}}_{{{\text{loc}}}} $$
(8)
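Equations (6) to (8) can be evaluated directly, as in the sketch below. It follows the formulas as printed and uses toy vectors and planar coordinates; the function names are illustrative. Note that Eq. (6) as printed sums cosine similarities, so an implementation that minimizes the total loss would presumably negate this term or use one minus the similarity, a choice the text does not specify.

```python
import math

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def loss_sim(pred_vec, future_vecs):
    """Eq. (6) as printed: summed cosine similarity to the next m POI vectors."""
    return sum(cosine(pred_vec, v) for v in future_vecs)

def loss_loc(pred_xy, future_xys):
    """Eq. (7): summed Euclidean distance to the next m POI coordinates."""
    return sum(math.hypot(pred_xy[0] - x, pred_xy[1] - y) for x, y in future_xys)

def loss_total(pred_vec, future_vecs, pred_xy, future_xys):
    """Eq. (8): unweighted sum of both terms."""
    return loss_sim(pred_vec, future_vecs) + loss_loc(pred_xy, future_xys)

future_vecs = [[1.0, 0.0], [0.0, 2.0]]     # toy vectors of the next m = 2 POIs
future_xys = [(3.0, 4.0), (0.0, 1.0)]      # toy coordinates of the same POIs

sim = loss_sim([1.0, 0.0], future_vecs)
loc = loss_loc((0.0, 0.0), future_xys)
total = loss_total([1.0, 0.0], future_vecs, (0.0, 0.0), future_xys)
```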

The proposed method uses adaptive moment estimation (Adam) as the optimization algorithm. The algorithm uses first-order and second-order moment estimates of the gradient to dynamically adjust the learning rate of each parameter. Its advantage is that, after adjustment, the learning rate in each iteration stays within a certain range, so the parameters are relatively stable. The update is expressed as follows:

$$ \begin{gathered} \alpha _{t} = \mu \times \alpha _{{t - 1}} + \left( {1 - \mu } \right) \times g_{t} \hfill \\ \beta _{t} = v \times \beta _{{t - 1}} + \left( {1 - v} \right) \times g^{2} _{t} \hfill \\ \end{gathered} $$
(9)
$$ \hat{\alpha }_{t} = \frac{{\alpha _{t} }}{{1 - \mu ^{t} }},\;\hat{\beta }_{t} = \frac{{\beta _{t} }}{{1 - v^{t} }} $$
(10)
$$ \Delta \theta _{t} = - \frac{{\hat{\alpha }_{t} }}{{\sqrt {\hat{\beta }_{t} + \varsigma } }} \times \eta . $$
(11)

Equation (9) gives the first-order and second-order moment estimates of the gradient, where \(g_{t}\) is the gradient at step \(t\) and \(\mu\) and \(v\) are the decay rates of the two estimates. Equation (10) is a correction to the first and second moment estimates, which can be approximated as unbiased estimates of the expectation. Equation (11) is a dynamic constraint on the learning rate \(\eta\), in which \(\varsigma\) is a very small number that prevents the denominator from being 0.
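A single update following Eqs. (9) to (11) can be written for a scalar parameter as in the sketch below. The default values of \(\mu\), \(v\), \(\eta\), and \(\varsigma\) are commonly used Adam settings and are assumptions, since the text does not state them; the toy objective \(\theta^{2}\) is likewise only for illustration. The sketch keeps \(\varsigma\) inside the square root, as in Eq. (11) as printed.

```python
import math

def adam_step(theta, grad, alpha, beta, t, eta=0.001, mu=0.9, v=0.999, eps=1e-8):
    """One update of a scalar parameter following Eqs. (9)-(11)."""
    alpha = mu * alpha + (1.0 - mu) * grad               # first moment, Eq. (9)
    beta = v * beta + (1.0 - v) * grad * grad            # second moment, Eq. (9)
    alpha_hat = alpha / (1.0 - mu ** t)                  # bias correction, Eq. (10)
    beta_hat = beta / (1.0 - v ** t)
    delta = -alpha_hat / math.sqrt(beta_hat + eps) * eta  # step, Eq. (11)
    return theta + delta, alpha, beta

# Toy objective f(theta) = theta^2 with gradient 2 * theta.
theta, alpha, beta = 1.0, 0.0, 0.0
for t in range(1, 201):
    theta, alpha, beta = adam_step(theta, 2.0 * theta, alpha, beta, t, eta=0.05)
```

At the first step the corrected moments cancel the gradient scale, so the parameter moves by roughly \(\eta\) regardless of the gradient magnitude, which is the bounded step size mentioned above.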

Experimental results and analysis

In the experiment, the Gowalla and Brightkite data sets are used, both of which come from location-based social networking sites. These sites provide users with services such as location sharing, activity sharing, and route sharing through check-ins. From the Gowalla data set, 3,500 users and 3,230 points of interest were randomly selected, with 61,742 check-in records; the density of the extracted data set was \(6.89 \times 10^{-3}\). From the Brightkite data set, 6,000 users and 3,525 points of interest were randomly selected, with 51,815 check-in records; the density was \(3.17 \times 10^{-3}\). The description of the two data sets is shown in Table 1.

Table 1 Datasets’ description

Evaluation index

To accurately evaluate the performance of each method on the Next POI recommendation task, Precision is used to measure its performance. The calculation is as follows:

$$ \begin{gathered} {\text{Precision}} = \frac{{\sum\nolimits_{{n = 1}}^{N} {\phi _{n} } }}{N} \hfill \\ \phi _{n} = \left\{ \begin{gathered} 1,\;O \in {\rm sort}_{n}^{\kappa } \hfill \\ 0,\;O \notin {\rm sort}_{n}^{\kappa } \hfill \\ \end{gathered} \right.. \hfill \\ \end{gathered} $$
(12)

Each user has one positive sample and multiple negative samples in the test set, where \(O\) represents the positive sample of user \(n\). Every positive and negative sample receives a score, that is, the model predicts a score for each candidate next destination of the user. \({\text{sort}}_{n}^{\kappa }\) represents the set of the \(\kappa\) POI locations with the highest predicted scores among the positive and negative samples of user \(n\) in the test set. If the location of the positive sample of user \(n\) is in \({\text{sort}}_{n}^{\kappa }\), then \(\phi _{n}\) is recorded as 1, which means that the model predicts the user's POI location correctly; otherwise, it is recorded as 0. \(N\) represents the total number of users. Generally, the larger the value of \(\kappa\), the higher the measured precision.
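The metric in Eq. (12) amounts to a top-\(\kappa\) hit rate over users, which the following sketch computes. The user identifiers, POI identifiers, and scores are made-up toy values, and the helper names `top_k` and `precision_at_k` are illustrative.

```python
def top_k(scores, k):
    """Return the k POIs with the highest predicted scores."""
    ranked = sorted(scores, key=lambda poi: scores[poi], reverse=True)
    return ranked[:k]

def precision_at_k(predictions, positives, k):
    """Eq. (12): fraction of users whose positive sample is in the top k."""
    hits = sum(1 for user, scores in predictions.items()
               if positives[user] in top_k(scores, k))
    return hits / len(predictions)

# Predicted scores for each user's positive and negative test samples.
predictions = {
    "u1": {"P1": 0.9, "P2": 0.4, "P3": 0.1},
    "u2": {"P1": 0.2, "P2": 0.7, "P3": 0.6},
}
positives = {"u1": "P1", "u2": "P3"}   # the one positive sample per user

p_at_1 = precision_at_k(predictions, positives, 1)
p_at_2 = precision_at_k(predictions, positives, 2)
```

With these toy scores, only the first user is a hit at \(\kappa = 1\), while both users are hits at \(\kappa = 2\), which illustrates why the measured value grows with \(\kappa\).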

Parameter discussion

In the proposed method, the learning rate \(\varepsilon\) controls how fast the objective function decreases, and \(\chi\) determines the weight of the regularization term. The influence of \(\varepsilon\) and \(\chi\) on the recommendation results for the Gowalla and Brightkite datasets is shown in Fig. 5. In addition to Precision, two further evaluation indicators are used: Recall and Mean Average Precision (MAP).
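Recall and MAP are not defined explicitly in this section. The following sketch uses their common textbook definitions for a ranked list truncated at \(k\); the function names, the truncation, and the normalization of average precision by \(\min(|\text{relevant}|, k)\) are assumptions, and the exact variant used in the experiments may differ.

```python
from typing import Hashable, List, Sequence, Set


def recall_at_k(ranked: Sequence[Hashable], relevant: Set[Hashable], k: int) -> float:
    """Share of the relevant POIs that appear among the first k ranked POIs."""
    if not relevant:
        return 0.0
    hits = sum(1 for poi in ranked[:k] if poi in relevant)
    return hits / len(relevant)


def average_precision(ranked: Sequence[Hashable], relevant: Set[Hashable], k: int) -> float:
    """Average of the precision values at each rank where a relevant POI occurs."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, poi in enumerate(ranked[:k], start=1):
        if poi in relevant:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / min(len(relevant), k)  # assumed normalization


def mean_average_precision(
    rankings: List[Sequence[Hashable]], relevants: List[Set[Hashable]], k: int
) -> float:
    """Mean of the per-user average precision values."""
    aps = [average_precision(r, rel, k) for r, rel in zip(rankings, relevants)]
    return sum(aps) / len(aps)
```

For example, with the hypothetical ranking `["a", "b", "c", "d"]`, relevant set `{"a", "c"}`, and `k = 3`, both relevant POIs are retrieved, so the recall is 1, while the average precision is \((1/1 + 2/3)/2 = 5/6\).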

Fig. 5 The influence of \(\varepsilon\) and \(\chi\) on recommendation effect

As can be seen in Fig. 5a–c, on both the Gowalla and Brightkite datasets, precision, recall, and MAP are negatively correlated with \(\varepsilon\). When \(\varepsilon = 10^{-5}\), all three reach their maximum; on the Gowalla dataset, for example, the values are 0.047, 0.25, and 0.135, respectively. Similarly, Fig. 5d–f shows that on both datasets precision, recall, and MAP increase as long as \(\chi\) does not exceed \(10^{-4}\). When \(\chi\) exceeds \(10^{-4}\), both MAP and recall decrease significantly. Therefore, \(\chi\) is set to \(10^{-4}\) and \(\varepsilon\) is set to \(10^{-5}\).

Performance comparison with comparison method

To verify the accuracy of the proposed method, it is compared with the methods in References [15, 24, 25]. Since the accuracy of the recommendation results depends on the sparsity of the dataset, datasets of different sparsity yield different prediction accuracy. For each of the two datasets, ten tests were performed at every sparsity level, and the average precision is shown in Fig. 6. The sparsity of the training set is controlled by randomly filtering out part of the training data with a random function.
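The sparsity control and the averaging over ten runs can be sketched as follows. This is only an illustration under stated assumptions: the record layout `(user_id, poi_id)`, the names `subsample` and `averaged_precision`, the `keep_ratio` parameter, and the `evaluate` callback (which stands in for training a model on the reduced training set and returning its precision) are all hypothetical.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence, Tuple

CheckIn = Tuple[int, int]  # (user_id, poi_id); hypothetical record layout


def subsample(
    records: Sequence[CheckIn], keep_ratio: float, rng: random.Random
) -> List[CheckIn]:
    """Keep a random fraction of the training records, without replacement."""
    n_keep = round(len(records) * keep_ratio)
    return rng.sample(list(records), n_keep)


def averaged_precision(
    records: Sequence[CheckIn],
    keep_ratio: float,
    evaluate: Callable[[List[CheckIn]], float],
    runs: int = 10,
    seed: int = 0,
) -> float:
    """Average the precision returned by `evaluate` over several random subsamples."""
    rng = random.Random(seed)
    results = [evaluate(subsample(records, keep_ratio, rng)) for _ in range(runs)]
    return mean(results)
```

Using a seeded `random.Random` instance keeps the ten subsamples reproducible while still drawing a different subset in each run.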

Fig. 6 Comparison results of precision under different sparsity

It can be seen from Fig. 6 that the precision of the proposed method is significantly better than that of the comparison methods at every sparsity level. When the data sparsity is 20%, its precision is nearly 50% higher than that of the traditional method in Reference [15] and about 14% to 21% higher than that of the deep learning methods in References [24] and [25]. As the amount of data increases, the gap between the proposed method and the comparison methods widens significantly. When the sparsity is 100%, it exceeds the deep learning methods by about 28% to 32%. It can be argued that the proposed method is very effective at mining users' points of interest for the next POI recommendation task.

The sequence length chosen when the dataset is preprocessed also affects the recommendation accuracy. Therefore, the effectiveness of the proposed method is verified by setting up location sequences of different lengths and comparing it with the other methods. In the experiment, the location sequence length ranges from 4 to 14. The precision of each method for different location sequence lengths on the same dataset is shown in Fig. 7.
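The preprocessing into fixed-length location sequences is not spelled out here. One plausible reading, sketched below, slides a window over each user's time-ordered check-ins and pairs every window with the POI visited next; the function name `make_samples` and the sliding-window scheme are assumptions, not the documented procedure.

```python
from typing import List, Sequence, Tuple


def make_samples(visits: Sequence[int], seq_len: int) -> List[Tuple[List[int], int]]:
    """Split one user's time-ordered POI ids into (history, next POI) pairs."""
    samples = []
    for end in range(seq_len, len(visits)):
        history = list(visits[end - seq_len:end])  # the seq_len most recent POIs
        samples.append((history, visits[end]))     # target: the POI visited next
    return samples


# A hypothetical user with six check-ins, split with sequence length 4.
print(make_samples([1, 2, 3, 4, 5, 6], 4))
# [([1, 2, 3, 4], 5), ([2, 3, 4, 5], 6)]
```

Under this scheme a longer sequence length gives the model more history per sample but produces fewer samples per user, and users with too few check-ins contribute none.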

Fig. 7 Comparison results of precision under different location sequence length

It can be seen from Fig. 7 that the method in Reference [15] does not perform well on sequential data. Because it integrates social and geographic influences into a matrix factorization framework, it cannot directly analyze changes in the location sequence, so its precision suffers. The precision of the proposed method is also better than that of the other two deep learning methods at every location sequence length. The proposed method performs best when the location sequence length is 6, where its precision reaches its highest value of 0.24. When the location sequence length exceeds 6, the recommendation performance decreases, but it remains higher than that of the other methods. The sequence length therefore has to be chosen carefully, because it affects the performance of the recommendation method.

Similarly, the number of iterations also affects the performance of a recommendation method: some methods converge quickly and others slowly. The precision of each method for different numbers of iterations on the same dataset is shown in Fig. 8.

Fig. 8 Comparison results of precision under different iterations

It can be seen from Fig. 8 that the precision of each method reaches its maximum when the number of iterations is 500. The precision of the proposed method is 0.27, which is better than that of the comparison methods, because its LSA model can accurately capture the user's recent interest characteristics. Reference [24] uses DRLM analysis to generate semantic features of each POI and then performs location recommendation. Reference [25] uses meta-paths of a heterogeneous information network to represent the complex semantic relationships between users and POIs, and combines them with a learning-to-rank fusion method to sort the recommendation results. The precision of these two methods does not differ much: at 500 iterations, it is 0.16 and 0.18, respectively. Since the two deep learning methods do not consider issues such as dynamic user preferences, their recommendation performance still leaves room for improvement.

In addition, the dimension of the POI vector also affects the performance of the recommendation method, so the method has to be re-evaluated each time the dimension is changed in the experiment. Under the same sparsity, the precision for Embedding dimensions of 8, 16, 32, and 64 is shown in Fig. 9.

Fig. 9 Comparison of precision under different Embedding dimensions

It can be clearly seen from Fig. 9 that each method performs best when the Embedding dimension is 32. The precision of the proposed method is 0.34, which is 20.59% higher than that of Reference [25]. When the Embedding dimension is 16, the precision is the lowest. With a lower Embedding dimension, the ability to handle sparse data is weaker, and the recommendation results are not ideal.

The time consumption of a recommendation method is an important indicator. The time consumption of each method under different sparsity is shown in Fig. 10.

Fig. 10 Comparison of time consumption under different sparsity

It can be seen from Fig. 10 that the proposed method takes less time than the methods in References [24, 25], with a maximum of no more than 130 ms. Because it uses the LSA model, it can quickly obtain the key information that assists POI recommendation, which makes it less time-consuming than the other deep learning methods. However, it takes slightly longer than the method in Reference [15]. That method is simple and computationally light, so it is fast: when the sparsity is 40%, it takes only 36 ms. It can be argued that the proposed method offers a reasonable guarantee of real-time performance for POI recommendation.
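How the recommendation time was measured is not stated. A simple way to obtain a comparable figure in milliseconds is to average the wall-clock time of repeated recommendation calls, as in the sketch below; the helper `mean_latency_ms` and the `recommend` callback are hypothetical and only stand in for a trained model's inference routine.

```python
import time
from typing import Callable, Iterable


def mean_latency_ms(recommend: Callable[[int], object], user_ids: Iterable[int]) -> float:
    """Average wall-clock time of one recommendation call, in milliseconds."""
    users = list(user_ids)
    start = time.perf_counter()
    for user in users:
        recommend(user)  # result is discarded; only the elapsed time matters
    elapsed = time.perf_counter() - start
    return 1000.0 * elapsed / len(users)
```

Averaging over many users smooths out timer resolution and scheduling noise, which matters when individual calls take only tens of milliseconds.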

Conclusion

Traditional POI recommendation focuses on the user's check-in frequency, check-in time, and so on, and rarely pays attention to the associations between consecutive behaviors reflected in the POI check-in sequence. Its capture of the dynamic variability and periodic characteristics of user access preferences is also unsatisfactory. For this reason, a POI recommendation method using deep learning in LBSN that considers privacy protection is proposed. Embedding is used to represent user information, friend relationships, POI information, and so on as vectors. This information is jointly fed into the LSA model for analysis and processing to capture the user's interest characteristics and interest changes. The time and geographic location information of the user's historical behavior is then used to recommend the next point of interest for the user. The proposed method is demonstrated experimentally on the Gowalla and Brightkite datasets.

The context information used in the proposed method includes only geographic distance context, time context, and POI classification information. In future research, we will try to incorporate more contextual information related to the POI check-in sequence into the recommendation model, or propose a general model structure that can effectively integrate all relevant contextual information.