Spatio-Temporal Representation Learning with Social Tie for Personalized POI Recommendation

Recommending a limited number of Point-of-Interests (POIs) a user will visit next has become increasingly important to both users and POI holders for Location-Based Social Networks (LBSNs). However, POI recommendation is a challenging task since complex sequential patterns and rich contexts are contained in extremely sparse user check-in data. Recent studies show that embedding techniques effectively incorporate POI contextual information to alleviate the data sparsity issue, and Recurrent Neural Network (RNN) has been successfully employed for sequential prediction. Nevertheless, existing POI recommendation approaches are still limited in capturing user personalized preference due to separate embedding learning or network modeling. To this end, we propose a novel unified spatio-temporal neural network framework, named PPR, which leverages users’ check-in records and social ties to recommend personalized POIs for querying users by joint embedding and sequential modeling. Specifically, PPR first learns user and POI representations by joint modeling User-POI relation, sequential patterns, geographical influence, and social ties in a heterogeneous graph and then models user personalized sequential patterns using the designed spatio-temporal neural network based on LSTM model for the personalized POI recommendation. Furthermore, we extend PPR to an end-to-end recommendation model by jointly learning node representations and modeling user personalized sequential preference. Extensive experiments on three real-world datasets demonstrate that our model significantly outperforms state-of-the-art baselines for successive POI recommendation in terms of Accuracy, Precision, Recall and NDCG. The source code is available at: https://www.anonymous.4open.science/r/DSE-1BEC.


Introduction
Emerging LBSNs have become an important means for people to share their experiences, write comments, and interact with friends. With the prosperity of LBSNs, many users check in at various POIs via mobile devices in real time. As a result, a large amount of check-in data is being generated, which is crucial for understanding users' preferences and behaviors. POI recommendation not only helps users explore attractive and interesting places, but also guides location-based service providers on where to launch advertisements to target customers. Because of its significance to both users and businesses, the problem of using spatio-temporal information effectively to recommend a limited number of POIs that users are likely to visit next has been attracting increasing attention in both industry and academia.
In particular, several studies [2,10,14,20,24] have been conducted to recommend successive POIs for users based on users' spatio-temporal check-in sequence in LBSNs. Based on Markov chain model, LORE [24] and NLPMM [2] explore users' successive check-in patterns by considering temporal and spatial information. ST-RNN [10] employs RNN to capture the users' sequential check-in behaviors. In a follow-up work, STGN [28] carefully designs the time gates and distance gates in LSTM to model users' sequential visiting behaviors by enhancing long short term memory. Additionally, some models [1,4] based on Word2Vec [15] framework to capture the preference and mobility pattern of users and the relationship among POIs also achieved decent performance. GE [20] uses graph embedding to combine the sequential effect, geographical influence, temporal effect and semantic effect in a unified way for location-based recommendation. Recently, SAE-NAD [14] utilizes a self-attentive encoder to differentiate the user preference and a neighbor-aware decoder to incorporate the geographical context information for POI recommendation.
However, location-based POI recommendation still faces three major challenges. The first is data sparsity. Unlike general e-commerce, music, and movie recommendation, where interactions can be collected and verified entirely online, location-based POI recommendation is tied to physical POI entities: a check-in record is generated only when a user actually visits one. Therefore, the check-in records in the POI recommendation task are much sparser. This issue has plagued many POI recommendation models based on collaborative filtering. Furthermore, data sparsity makes it difficult to capture a user's sequential pattern, because the check-in sequence is very short or is not continuous in time. The second is contextual factors. POI recommendation may be affected by various contextual factors, including social tie influence, geographical influence, and temporal context. In fact, social ties are often available in LBSNs, and recent studies show that the social networks associated with users are important in the POI recommendation task, since users are more likely to be influenced by their close friends (who keeps company with the wolf will learn to howl). In this work, we incorporate social ties, check-in time intervals, and sequential and geographical effects into the user-POI interaction graph to jointly learn user and POI representations. The last is dynamic and personalized preferences. Users' preferences change dynamically over time, and at different times and under different circumstances users may prefer different POIs. For example, some users prefer to visit gourmet restaurants in their local area, but when they travel to a new city, some prefer to visit cultural landscapes while others prefer natural landscapes. Capturing this trend dynamically and accurately has proved to be essential for the personalized POI recommendation task. However, effectively modeling personalized sequential transitions from sparse check-in data is challenging.
To address the aforementioned challenges, we build on advances in embedding techniques and RNNs and propose our model, named PPR, which is a spatio-temporal representation learning framework for personalized and successive POI recommendation. First, we jointly model the user-POI relation, sequential effect, geographical influence and social ties by constructing a heterogeneous graph, and then develop a densifying trick that adds second-order neighbors to nodes with low in/out-degrees to alleviate the data sparsity issue. Then, we learn user and POI representations by embedding the densified heterogeneous graph into a shared low-dimensional space. Furthermore, to better capture users' dynamic and personalized preferences, we design a spatio-temporal neural network whose personalized input sequence is formed by concatenating the user embedding, POI embedding and POI category.
The main contributions of this paper are summarized as follows:
- We propose a novel PPR model for personalized POI recommendation, which incorporates users' check-in records and social ties. We construct a heterogeneous graph by jointly taking user-POI relation, sequential pattern, geographical effect and social ties into consideration to learn the representations of users and POIs.
- We propose a spatio-temporal neural network to model users' dynamic and personalized preference by concatenating user embedding, POI embedding and POI category to generate a personalized behavior sequence.
- We conduct extensive experiments to compare our method with state-of-the-art baselines, and our method significantly outperforms them on the successive POI recommendation task.

Related Work
General POI Recommendation. The most well-known approaches to personalized recommendation are collaborative filtering (CF) and Matrix Factorization (MF). The conventional CF techniques have been widely studied for POI recommendation. LARS [6] employs item-based CF to make POI recommendations with the consideration of travel penalty. FCF [22] is a friend-based CF model based on the commonly visited POIs among friends, which considers the social influence. UTE [23] is a collaborative recommendation model that incorporates temporal and geographical information. However, such methods suffer from the data sparsity problem, which makes it difficult for them to identify similar users. Recommendation models based on MF and embedding learning [7,11,12] have been intensively studied. Rank-GeoFM [9] fits the users' preference rankings for POIs to learn the latent embeddings. By incorporating the geographical context, it utilizes a geographical factorization method for calculating the recommendation score. TSG-MF [26] models the multi-tag influences by extracting a user-tag matrix and the social influences via social regularization, and uses a normalized function to model geographical influences.
Next POI Recommendation. In the literature, next POI recommendation issues have been studied in [19,28], in which the main objective is to exploit the user's check-in sequence between different POIs and dynamic preference.
Markov Chains (MC) Based Methods. MC based models aim to predict the next behavior according to the historical sequential behaviors. FPMC-LR [3] considers a first-order Markov chain for POI transitions and distance constraints. HMM [21] exploits check-in category information to capture the latent user movement pattern by using a mixed hidden Markov chain. LORE [24] incrementally mines sequential patterns and represents them as a dynamic location-location transition graph. By utilizing an additive Markov chain, LORE fuses the sequential, geographical and social influence in a unified way.
Graph-Based Methods. Graph-based approaches have also been explored in the literature on next POI recommendation. GE [20] jointly captures the latent relations among the POI, region, time slot and words related to the POIs by constructing four bipartite graphs. HME [5] projects the entities into a hyperbolic space after studying multiple contextual subgraphs. Although the above approaches achieve promising performance, they cannot model the sequential patterns effectively.

RNN-Based Methods.
Recently, RNNs such as LSTM or GRU have demonstrated groundbreaking performance on sequential prediction problems. ST-RNN [10] utilizes an RNN structure to model the temporal contexts by carefully designing the time-specific and distance-specific transition matrices. NEXT [25] encodes the sequential relations within the pre-trained POI embeddings by adopting the DeepWalk [16] technique. Time-LSTM [30] employs LSTM with time gates to capture the time intervals among users' behaviors. CAPE [1] first uses a check-in context layer to capture the geographical influence of POIs and a text content layer to model the characteristics of POIs from text content. Then, CAPE employs an RNN as the recommendation component to predict successive POIs. PEU-RNN [13] proposes an LSTM based model that combines the user and POI embeddings, which are learned from Word2Vec. ASPPA [27] proposes to identify the semantic subsequences of POIs and discover their sequential patterns. Recently, STGN [28] extends the LSTM gating mechanism with spatial and temporal gates to capture the user's space and time preference. However, these approaches fail to capture users' personalized preferences.

Problem Definition
In this section, we first give the key concepts used in this paper. Then, the problem definition for personalized POI recommendation is formulated.

Definition 1 (POI).
A POI is a uniquely identified venue in the form of p, , cat , where p is the POI identifier, cat denotes the category of the POI, and represents the geographical coordinates of the POI (i.e., longitude and latitude).

Definition 2 (Check-in record). A check-in record is a triple $\langle u, v, t \rangle$, meaning that user $u$ visits POI $v$ at time $t$.
The collection of all users is denoted as U , and the collection of all POIs is denoted as V .

Definition 4 (Social Ties). Social ties among users is defined as a graph
where $U$ is the set of users, and $E_u$ is the set of edges between the users. Each edge $e_{ij} \in E_u$ represents users $u_i$ and $u_j$ being friends in LBSNs and is associated with a weight $w_{ij} > 0$, which indicates their tie strength.

Problem 1 (Successive POI Recommendation). Given users' check-in records and their social ties, and a querying user $u$ with his/her current check-in $\langle u, v, t \rangle$, our goal is to recommend the top-k POIs that user $u$ would be interested in during the next $\tau$ time period.

Methodology
In this section, we first present the details of the proposed framework PPR. Then we describe how to use the PPR model to make personalized POI recommendations.

Heterogeneous Graph Construction
We first introduce the heterogeneous User-POI graph to model users' sequential check-ins and social relationships. Specifically, we employ a heterogeneous graph $G = (V, U, E, W)$ to jointly model the multiple relations between users and POIs. $U$ and $V$ are the user collection and POI collection respectively, and $E$ is the set of all edges between nodes in $G$, which are categorized into three edge types, i.e., $E_u$, $E_v$, and $E_{u,v}$. As mentioned in Definition 4, each edge $e_{i,j} \in E_u$ represents that users $u_i$ and $u_j$ are friends. Each edge $e_{i,j} \in E_v$ denotes that at least one user visits POI $v_j$ after visiting POI $v_i$, and each edge $e_{i,j} \in E_{u,v}$ indicates that user $u_i$ visits POI $v_j$ at least once. Notice that each edge $e \in E_u \cup E_{u,v}$ is a bi-directed edge and each edge $e \in E_v$ is a directed edge, and each edge is associated with a weight $w \in W$ ($w > 0$), which indicates the strength of the relation.
Modeling User-POI Relation. Intuitively, if user $u_i$ visits POI $v_j$ more frequently, $u_i$ has a stronger relation with $v_j$ than with other POIs. Therefore, we formulate the weight between user $u_i$ and POI $v_j$ as: where $freq(\cdot, \cdot)$ denotes the check-in frequency of user $u_i$ visiting POI $v_j$. Since we aim to build a directed graph to accommodate the following work, we define $w_{i,j} = w_{j,i}$ for the bi-directed edge $e_{i,j} \in E_{u,v}$ between user $u_i$ and POI $v_j$.
Modeling Sequential and Geographical Effect. Compared with general POI recommendation, successive POI recommendation pays more attention to sequential patterns. When making POI recommendations, a user's recent check-in behaviors have a greater impact than those from a long time ago [20]. To further model the sequential effect, we carefully design a weighting strategy for the edges in $E_v$. Let $\Delta t^u_{k,k+1}$ be the time interval between two consecutive check-in records in the trajectory $T_u$ of user $u$. $l^u_{k,k+1}$ is the flag that indicates the status of a pair of consecutive check-in records in the trajectory $T_u$, which is defined as: where $\theta$ is a predefined time threshold. Then the weight for the edge $e_{i,j}$ is defined as: Namely, the weight $w$ for the edge from POI $v_i$ to POI $v_j$ is the total number of times that all users visit $v_i$ first and then $v_j$ in their trajectories.
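The counting described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `sequential_edge_weights`, the data layout, and the choice to count a consecutive pair when its interval is less than or equal to θ (rather than strictly less) are not taken from the paper.

```python
from collections import defaultdict


def sequential_edge_weights(trajectories, theta_hours=24.0):
    """Count POI-to-POI transitions whose time gap is within theta.

    trajectories: dict mapping a user id to a time-sorted list of
    (poi_id, timestamp_in_hours) check-ins.
    Assumption: the flag for a consecutive pair is 1 when the interval
    is <= theta_hours and 0 otherwise.
    """
    weights = defaultdict(int)
    for checkins in trajectories.values():
        # Walk over consecutive check-in pairs of one user.
        for (v_i, t_i), (v_j, t_j) in zip(checkins, checkins[1:]):
            if t_j - t_i <= theta_hours:
                weights[(v_i, v_j)] += 1
    return dict(weights)


# Hypothetical toy data: two users, three POIs.
trajectories = {
    "u1": [("cafe", 0.0), ("museum", 3.0), ("park", 40.0)],
    "u2": [("cafe", 10.0), ("museum", 12.5)],
}
weights = sequential_edge_weights(trajectories)
```

In this toy example the transition from the museum to the park is dropped because its 37-hour gap exceeds the 24-hour threshold, while the cafe-to-museum edge is counted once per user.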
Furthermore, geographical influence indicates the impact of geographical distance on users' spatial behaviors. According to [5,8], the geographical distance between two successive POIs follows a power-law distribution, which means users are more willing to visit POIs close to their current location. Therefore, we incorporate the geographical distance into our model as follows: where $N(v_i)$ represents the set of out-neighbor POIs of POI $v_i$ in $E_v$, $d_{i,j}$ denotes the Euclidean distance between POIs $v_i$ and $v_j$, and $\kappa$ is the negative exponent (i.e., $\kappa < 0$). Finally, we combine the sequential and geographical influence as follows: In this way, the sequential, time interval and geographical information are all reflected in graph $G$.
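As an illustration of a power-law distance weighting of this kind, the sketch below normalizes $d_{i,j}^{\kappa}$ over the out-neighbors of each POI. The normalized form, the function name `geographical_weights`, and the toy coordinates are assumptions made for the example; the paper's exact formula and the way it is combined with the sequential weight are not reproduced here.

```python
import math


def geographical_weights(coords, out_neighbors, kappa=-2.0):
    """Power-law distance weights, normalized per source POI.

    coords: dict mapping a POI id to its (x, y) coordinates.
    out_neighbors: dict mapping a POI id to a list of out-neighbor POI ids.
    Assumption: w_geo(i, j) = d_ij**kappa / sum_k d_ik**kappa, with k
    ranging over the out-neighbors of i and all distances non-zero.
    """
    weights = {}
    for v_i, neighbors in out_neighbors.items():
        raw = {
            v_j: math.dist(coords[v_i], coords[v_j]) ** kappa
            for v_j in neighbors
        }
        total = sum(raw.values())
        for v_j, value in raw.items():
            weights[(v_i, v_j)] = value / total
    return weights


# Hypothetical POIs on a line: b is 1 unit from a, c is 2 units from a.
coords = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (2.0, 0.0)}
geo = geographical_weights(coords, {"a": ["b", "c"]})
```

With $\kappa = -2$, the nearer POI receives four times the raw weight of the one twice as far away, which matches the intuition that users favor nearby places.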
Modeling Social Tie Strength. Users in an LBSN have multiple types of relations with other users, such as friends, family and colleagues. The preferences of a user in a social network are easily affected by his/her close friends or by other users who have some kind of relation with him/her. Recently, these social ties have been incorporated into POI recommendation systems [25] to improve the recommendation performance. In this work, we propose to assign the weight between users based on their historical check-in interactions. Specifically, for two socially connected users $u_i$ and $u_j$, we assign the edge weight $w_{i,j}$ as: where $\varepsilon$ is a small constant that handles the case in which two users are connected but have no commonly visited POIs, $f_{u_i,v}$ denotes the frequency of user $u_i$ visiting POI $v$, and $|T_{u_i} \cap T_{u_j}|$ represents the number of POIs commonly visited by users $u_i$ and $u_j$. Therefore, the common preferences between socially connected users are also taken into account in the User-POI graph $G$.
Densifying Graph. Most recommendation models need to take data sparsity into consideration, and the check-in data in the POI recommendation area is especially sparse. To address the data sparsity issue, we propose to construct a dense graph based on the graph $G$. Specifically, we regard each user and POI as a node, and expand the neighbors of those nodes with low in/out-degrees by adding higher-order neighbors. In this work, we only consider adding second-order neighbors. If the out-degree of a node $v_i$ in $G$ is less than a predefined threshold $\rho$, we create an edge from node $v_i$ to its second-order out-neighbor node $v_j$ and assign the weight as follows: where $N(v_i)$ is the set of out-neighbors of node $v_i$, and $d_k^{(o)}$ is the out-degree of the node $v_k$. Nodes whose in-degree is less than $\rho$ are densified in the same way. After densifying the User-POI graph, we obtain a denser network, denoted by $G_{dense}$. Then we use $G_{dense}$ instead of $G$ and exploit an embedding technique to learn the nodes' representation vectors.
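A minimal sketch of second-order densification for out-edges is given below. It assumes a LINE-style weight, $w_{i,j} = \sum_{k \in N(v_i)} w_{i,k} \, w_{k,j} / d_k^{(o)}$, with $d_k^{(o)}$ taken as the weighted out-degree of $v_k$ and the threshold $\rho$ compared against the number of out-edges. Both readings, as well as the name `densify_out_edges`, are assumptions for illustration rather than the paper's stated formula.

```python
def densify_out_edges(graph, rho):
    """Add second-order out-neighbors to nodes with few out-edges.

    graph: dict mapping a node to a dict {out_neighbor: weight}.
    Assumption: the new weight is sum_k w_ik * w_kj / d_k, where d_k is
    the sum of the out-edge weights of the intermediate node k.
    Returns a new graph; the input is left unchanged.
    """
    dense = {node: dict(nbrs) for node, nbrs in graph.items()}
    for v_i, nbrs in graph.items():
        if len(nbrs) >= rho:
            continue  # node already has enough out-neighbors
        for v_k, w_ik in nbrs.items():
            second = graph.get(v_k, {})
            d_k = sum(second.values())
            for v_j, w_kj in second.items():
                # Skip self-loops and edges that already exist in G.
                if v_j == v_i or v_j in nbrs:
                    continue
                dense[v_i][v_j] = dense[v_i].get(v_j, 0.0) + w_ik * w_kj / d_k
    return dense


# Hypothetical graph: a -> b, and b -> c, b -> d.
graph = {"a": {"b": 1.0}, "b": {"c": 2.0, "d": 2.0}, "c": {}, "d": {}}
dense = densify_out_edges(graph, rho=2)
```

Here node `a` has a single out-edge, so it gains edges to `c` and `d` through `b`, each with weight 1.0 * 2.0 / 4.0 = 0.5. The in-degree case can be handled by applying the same function to the reversed graph.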

Learning Latent Representation
Inspired by LINE [17], which learns first- and second-order relation representations of homogeneous networks, we extend it to learn heterogeneous node representations on our constructed heterogeneous graph $G_{dense}$.
Specifically, we regard each user or POI as a node $v$ and ignore its node type. In graph $G_{dense}$, each node plays two roles: the node itself and a specific "context" of other nodes. We use $\vec{v}_i$ to denote the embedding vector of node $v_i$ when it is treated as a node, and $\vec{v}'_i$ to denote the embedding vector of $v_i$ when it is treated as a specific "context". In particular, we use a binary cross-entropy loss to encourage nodes and their "contexts" that are connected by an edge to have similar embeddings. Therefore, we minimize the following objective function: where $\sigma(\cdot)$ is the sigmoid function, the superscript $T$ denotes vector transpose, $Neg(v_i)$ is a set of negative edges sampled w.r.t. node $v_i$ in $G_{dense}$, and $w_n$ denotes the negative sampling ratio, which is a tunable hyper-parameter to balance the positive and negative samples. By minimizing the objective function $O$ with ASGD (asynchronous stochastic gradient descent) optimization and the edge sampling technique, we can learn a $d$-dimensional embedding vector for each user and POI in $G_{dense}$. Additionally, the representation learning is highly efficient and is able to scale to very large graphs because of the use of edge sampling. After representation learning, all users and POIs are mapped into a low-dimensional space. However, the latent representations only capture the users' preferences or POIs' characteristics in a general way. Although they can model sequence transition patterns and geographical influence, some personalized preferences may not be preserved in the node representations.
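To make the shape of such a loss concrete, the sketch below computes a negative-sampling binary cross-entropy for a single sampled edge. The per-edge form, $-\log \sigma(\vec{v}'^{T}_j \vec{v}_i) - w_n \sum_{n \in Neg(v_i)} \log \sigma(-\vec{v}'^{T}_n \vec{v}_i)$, is an assumption modeled on LINE-style objectives, and the helper names are hypothetical; it is not claimed to be the exact objective used by PPR.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def edge_loss(node_vec, context_vec, negative_context_vecs, w_n=1.0):
    """Binary cross-entropy for one positive edge plus sampled negatives.

    node_vec: embedding of v_i in its "node" role.
    context_vec: embedding of the linked v_j in its "context" role.
    negative_context_vecs: context embeddings of sampled non-neighbors.
    Assumption: w_n scales the summed negative term.
    """
    positive = -math.log(sigmoid(dot(context_vec, node_vec)))
    negative = -sum(
        math.log(sigmoid(-dot(neg, node_vec))) for neg in negative_context_vecs
    )
    return positive + w_n * negative


# An aligned positive pair should cost less than a misaligned one.
aligned = edge_loss([1.0, 0.0], [1.0, 0.0], [[-1.0, 0.0]])
misaligned = edge_loss([1.0, 0.0], [-1.0, 0.0], [[1.0, 0.0]])
```

Minimizing this quantity pulls the node vector toward the context vectors of its neighbors and pushes it away from those of the sampled negatives.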

Modeling User Dynamic and Personalized Preference
Furthermore, the categories of POIs are very useful to make a better representation of venues and improve the recommendation performance. In order to model user dynamic and personalized preference, we propose to concatenate user embedding, POI embedding and POI category to generate a new and more personalized embedding to represent a check-in record. More concretely, we use one-hot encoding to represent the POI category information.
Additionally, to better model users' dynamic preferences and sequential behavior patterns, we utilize the LSTM model to construct a spatio-temporal neural network. As illustrated in Fig. 1, $h_t$ and $c_t$ denote the hidden state and cell state of the LSTM at time $t$ respectively. Given a user $u$ and his/her trajectory sequence $T_u$, we first concatenate the user embedding with the embeddings and categories of the POIs that he/she visited, which yields a new embedding sequence. Second, we feed the LSTM network with these new embedding sequences of all users. Specifically, we utilize the first $i - 1$ POIs as input to train the network, and predict the $(i + 1)$-th POI as the recommended POI based on the current $i$-th POI. At the output layer, we also connect a multi-layer perceptron (MLP). Therefore, we use the following objective function to train the model: where $h_t$ is the hidden representation at time step $t$, and $MSE(\cdot, \cdot)$ is a criterion that measures the mean squared error (i.e., the squared L2 norm) between each element.
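The input construction described above can be illustrated without any deep learning library. The sketch below builds, for one user, the concatenated input vectors and the next-POI targets that a sequence model would be trained on, together with a plain MSE helper. The LSTM and MLP themselves are omitted, and the function names and toy embeddings are hypothetical.

```python
def one_hot(index, size):
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def build_training_pairs(user_vec, trajectory, poi_vecs, poi_category, num_categories):
    """Build (input, target) pairs for one user's trajectory.

    Each input is [user embedding ; POI embedding ; one-hot POI category]
    for the current check-in; each target is the embedding of the POI
    visited next.
    """
    inputs, targets = [], []
    for current, nxt in zip(trajectory, trajectory[1:]):
        category = one_hot(poi_category[current], num_categories)
        inputs.append(user_vec + poi_vecs[current] + category)
        targets.append(poi_vecs[nxt])
    return inputs, targets


def mse(pred, target):
    """Mean squared error between two equal-length vectors."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


# Hypothetical 2-dimensional embeddings and two POI categories.
user_vec = [0.1, 0.2]
poi_vecs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
poi_category = {"a": 0, "b": 1, "c": 0}
inputs, targets = build_training_pairs(
    user_vec, ["a", "b", "c"], poi_vecs, poi_category, num_categories=2
)
```

Because the user embedding is repeated at every step, two users with the same POI sequence still produce different inputs, which is what makes the sequence personalized.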

Personalized POI Recommendation
As described in Sect. 4.3, the user embedding and the sequence of the first $i$ POI embeddings are used to train the spatio-temporal neural network. For the querying user $u$, the embedding vector of the $(i + 1)$-th POI can be predicted by the trained network. Therefore, for each POI $v$, we calculate its recommendation score as follows: Finally, we rank all POIs by their recommendation scores and select the top-k POIs as the candidates that user $u$ is most likely to visit in the next $\tau$ time period.
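The ranking step can be sketched as follows. The scoring function here, the negative squared Euclidean distance between the predicted embedding and each POI embedding, is an assumption chosen to be consistent with MSE training; the paper's own score may differ, and `recommend_top_k` is a hypothetical name.

```python
import heapq


def recommend_top_k(predicted_vec, poi_vecs, k):
    """Return the k POI ids whose embeddings are closest to the prediction.

    Assumption: score(v) = -||predicted_vec - poi_vecs[v]||^2, so a
    higher score means a closer embedding.
    """
    def score(poi):
        vec = poi_vecs[poi]
        return -sum((p - q) ** 2 for p, q in zip(predicted_vec, vec))

    # Iterating over the dict yields POI ids; nlargest sorts by score.
    return heapq.nlargest(k, poi_vecs, key=score)


# Hypothetical predicted embedding and candidate POIs.
poi_vecs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
top2 = recommend_top_k([0.9, 0.1], poi_vecs, k=2)
```

For large POI sets, `heapq.nlargest` avoids a full sort, although scoring every POI remains linear in the number of candidates.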

Datasets
We conduct extensive experiments on three public real-world large-scale datasets: Foursquare, Gowalla and Brightkite. The basic statistics of these three datasets are summarized in Table 1. We preprocess these datasets with the same method as [29], filtering out the POIs visited by fewer than five users and the users with fewer than ten check-in records. Note that there are 35 POI categories in Foursquare, while no category information is attached to the Gowalla and Brightkite datasets.

Evaluation Metrics
To evaluate the recommendation model performance, we use four widely-used metrics, i.e., Accuracy (Acc@k), Precision (Pre@k), Recall (Rec@k) and Normalized Discounted Cumulative Gain (NDCG@k), which are also used to evaluate top-k POI recommendation in [1,18,27,28]. Let $\#hit@k$ denote the number of hits in the test set, and $|D_{Test}|$ be the number of all test records. Acc@k is defined as:
$$\mathrm{Acc}@k = \frac{\#hit@k}{|D_{Test}|}$$
Let $R_k$ denote the top-k POIs with the highest recommendation scores, and $T_k$ be the ground truth of the corresponding record. Pre@k and Rec@k are defined as:
$$\mathrm{Pre}@k = \frac{|R_k \cap T_k|}{k}, \qquad \mathrm{Rec}@k = \frac{|R_k \cap T_k|}{|T_k|}$$
To better measure the ranking quality, we further utilize NDCG@k, which assigns higher scores to POIs at top-ranked positions, to evaluate the model. The NDCG@k for each test case is defined as:
$$\mathrm{NDCG}@k = \frac{\mathrm{DCG}@k}{\mathrm{IDCG}@k}, \qquad \mathrm{DCG}@k = \sum_{i=1}^{k} \frac{rel_i}{\log_2(i+1)}$$
where IDCG@k is the DCG@k of the ideal ranking and $rel_i$ refers to the graded relevance of the result ranked at position $i$. We use binary relevance in our experiments, i.e., $rel_i = 1$ if the recommended POI is in the ground truth, and $rel_i = 0$ otherwise.
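For reference, the four metrics can be computed as in the following sketch. It uses the standard binary-relevance definitions and assumes that a test case counts as a hit for Acc@k when at least one ground-truth POI appears in the top-k list; the function names are hypothetical.

```python
import math


def precision_recall_at_k(ranked, truth, k):
    """Pre@k and Rec@k for one test case (ranked has at least k items)."""
    hits = len(set(ranked[:k]) & set(truth))
    return hits / k, hits / len(truth)


def ndcg_at_k(ranked, truth, k):
    """NDCG@k with binary relevance for one test case."""
    truth = set(truth)
    dcg = sum(
        1.0 / math.log2(i + 1)
        for i, poi in enumerate(ranked[:k], start=1)
        if poi in truth
    )
    ideal = sum(1.0 / math.log2(i + 1) for i in range(1, min(k, len(truth)) + 1))
    return dcg / ideal if ideal > 0 else 0.0


def accuracy_at_k(test_cases, k):
    """Fraction of test cases with at least one ground-truth POI in the top k."""
    hits = sum(1 for ranked, truth in test_cases if set(ranked[:k]) & set(truth))
    return hits / len(test_cases)


# Hypothetical test case: only "b" of the two true POIs is in the top 3.
ranked, truth = ["a", "b", "c", "d"], ["b", "d"]
pre, rec = precision_recall_at_k(ranked, truth, k=3)
ndcg = ndcg_at_k(ranked, truth, k=3)
acc = accuracy_at_k([(ranked, truth), (["x", "y"], ["z"])], k=3)
```

In this example the single hit sits at rank 2, so its discounted gain is $1/\log_2 3$, while the ideal ranking would place both true POIs at ranks 1 and 2.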

Baselines
We compare our model against the following baselines for successive POI recommendation:
- Rank-GeoFM [9]: It is a ranking based geographical factorization model, which learns the embeddings of users and POIs by combining geographical and temporal influence in a weighting scheme.
- ST-RNN [10]: ST-RNN is an RNN-based model with spatial and temporal contexts for next POI recommendation.
- GE [20]: GE jointly learns the embeddings of POIs, regions, time slots and words in a shared low-dimensional space by constructing four bipartite graphs.
- PEU-RNN [13]: It is an LSTM based model that combines the user and POI embeddings, which are learned from Word2Vec, for modeling the dynamic user preference and successive transition influence.
- SAE-NAD [14]: SAE-NAD exploits a self-attentive encoder to differentiate the user preference and a neighbor-aware decoder to incorporate the geographical context information for POI recommendation.
Notice that STGN [28] and ASPPA [27] are not compared in our experiments because their source code is not publicly available. However, our PPR consistently outperforms ASPPA and STGN in terms of Acc@k on both the Foursquare and Gowalla datasets according to the experimental results reported in [27].

Parameter Setting
Following [24,28,29], we utilize the first 80% of each user's chronological check-ins as the training set and the remaining 20% as the test set. We use the source code released by their authors for baselines. We set the learning rate to 0.0025 in graph embedding, the embedding dimension $d$ to 128, the number of negative samples to 5, the threshold $\theta$ to 24 h, $\kappa$ to −2, $\varepsilon$ to 0.5 and the in/out-degree threshold $\rho$ to 400. Following [5], we uniformly set the next time period as $\tau$ = 6 h for all methods unless stated otherwise, and other parameters of all baselines are tuned to be optimal. In the experiment, we use a two-layer stacked LSTM with a hidden state size of 128. The learning rate of the LSTM is set to 0.001 with epoch decay, which reduces the learning rate to 1/10 of its original value once 75% of the training rounds have been completed.

Performance Comparison
First, we evaluate the overall performance of our model PPR compared with five baselines on three real-world datasets. We repeat 10 runs for all methods on each dataset and report the average Acc@k, Pre@k, Rec@k and NDCG@k in Table 2, Table 3 and Table 4, respectively.
From Table 2, we observe that PPR is significantly better than all baselines in terms of the four evaluation metrics on the Foursquare dataset. Specifically, PPR achieves 0.3008 in Acc@5 and 0.3935 in Acc@10, improving by 22.5% and 22.2% over the second-best baselines Rank-GeoFM and SAE-NAD, respectively. Additionally, our PPR only slightly outperforms the strong baselines (e.g., SAE-NAD) in Pre@k, but it is significantly better than them in Rec@k.
As depicted in Table 3, our PPR also significantly outperforms the baselines in terms of Acc@k, Pre@k, Rec@k and NDCG@k on the Gowalla dataset in nearly all cases. In particular, PPR performs better than the second-best baseline by 14.6% in Acc@k and 9.2% in NDCG@k on average. The one exception is Rec@10, where PPR performs slightly worse than PEU-RNN. A possible explanation is that PEU-RNN uses a distance constraint, which may significantly reduce the set of candidate POIs as k increases.
As we can see in Table 4, PPR consistently and significantly outperforms all baselines in terms of all evaluation metrics on the Brightkite dataset. PPR achieves state-of-the-art performance, e.g., 0.8717 in Acc@5 and 0.8485 in Rec@5. More specifically, our PPR achieves about 21.3%, 24.4%, 22.2% and 22.4% improvement over the state-of-the-art RNN-based method PEU-RNN in terms of Acc@5, Pre@5, Rec@5 and NDCG@5, respectively. Furthermore, all methods achieve better performance on Brightkite than on the other datasets. This is because users in Brightkite have more check-in records on average than users in Foursquare and Gowalla, which may enable all methods to model users' behavior and preference more accurately.

Fig. 2. Performance comparison of variations
To explore the benefits of incorporating the sequential and geographical effect, densifying technique and modeling personalized preference into PPR respectively, we compare our model with four carefully designed variations, i.e., PPR-RL, PPR-Seq, PPR-Den and PPR-GRU. We show the results in terms of Acc@5, P re@5, Rec@5, and NDCG@5 on three datasets in Fig. 2.
Based on the results, we have the following observations. First, PPR achieves the best performance in most cases on the three datasets, indicating that PPR benefits from jointly considering the various contextual factors and personalized preference. Second, the contributions of different components to recommendation performance are different. The sequential and geographical effect and the modeling of personalized preference have comparable importance: the latter contributes more on Gowalla, and the former contributes more on Foursquare. Both of them are necessary for improving performance. Furthermore, the comparison of PPR and PPR-Den shows that the densifying trick helps alleviate the data sparsity issue. Third, our PPR and PPR-GRU exhibit decent performance compared to the other variations, which indicates that sequential patterns and users' dynamic and personalized preferences play an important role in location-based recommendation.

Sensitivity of Hyper-parameters
We now investigate the sensitivity of our model compared against three strong baselines (i.e., Rank-GeoFM, PEU-RNN, and SAE-NAD) with respect to the important parameters, including embedding dimension d, the number of recommended POIs k, and next time period τ . To clearly show the influence of these parameters, we report Acc@5 with different parameter settings on Foursquare and Gowalla datasets. Figure 3 and Fig. 4 show the experimental results.
As shown in Figs. 3(a) and 4(a), PPR achieves best performance compared to the three strong baselines with the increasing number of dimension d. Meanwhile, PPR achieves the best result when d = 128, and then begins to decline as d further increases. From the results in Figs. 3(b) and 4(b), we can see that the recommendation accuracy of all methods increases as k increases. This is expected, because the more results are recommended, the easier they are to fall into the ground truth. However, we also observe that our PPR exhibits an increasing performance improvement compared to all baselines, as k increases. In Figs. 3(c) and 4(c), as τ increases, our PPR is also consistently better than the strong baselines. More specifically, PPR improves the recommendation accuracy more significantly for near future prediction (e.g., τ = 2 vs. τ = 12), indicating that our PPR can effectively capture users' personalized preferences, especially short-term preferences.

Conclusion
In this work, we propose a novel spatio-temporal representation learning model for personalized POI recommendation. By incorporating the user-POI relation, sequential effect, geographical effect and social ties, we construct a heterogeneous network. Afterwards, we exploit the embedding technique to learn the latent representation of users and POIs. In light of recent success of RNN on sequential prediction problem, we feed the spatio-temporal network with concatenated user and POI embedding sequences for capturing the users' dynamic and personalized preference. The results on three real-world datasets demonstrate the superiority of our proposal over state-of-the-art baselines. Furthermore, we explore the importance of each factor in improving recommendation performance. We observe that sequential effect, geographical effect, and users' dynamic and personalized preference play a vital role in POI recommendation task.