1 Introduction

In the past decade, advances in mobile computing have led to the widespread popularity of location-based services (LBS) in mobile networks. Many companies have launched LBS applications on mobile devices, such as electronic maps, online ride-hailing, online reviews and check-in services, which make it much easier for people to acquire information and access the services they want. One of the key techniques underlying LBS applications is next Point-of-Interest (POI) recommendation, which seeks to accurately predict the POI that a user is interested in or may visit in the future [1].

The spatial information (i.e., latitude–longitude coordinates) and temporal information (i.e., timestamps) play a far more important role in next POI recommendation in LBS applications than in the recommendation of common items [2, 3]. Note that people’s trajectories in a short period tend to stay within a small region and to focus on a few main POIs (e.g., home and company) in many daily scenarios. Typically, three features are the most practical for POI recommendation in LBS applications.

  1. The content information helps identify the correlation among users such as attributes [4, 5], and the correlation among POIs such as distances [6], such that the recommendation can be conducted upon the correlations or similarities of users and POIs.

  2. The collaborative information contributes to the factorization-based POI recommendation strategies, which locate the possibly interesting POIs by the historical implicit feedback [7, 8].

  3. The sequence information captures the users’ moving patterns and their preferences [9], where the spatial-temporal features can be appended for task-aware optimization [10, 11, 12].

Advances in localization and navigation techniques have made it possible to acquire high-precision user positions and a huge mass of “check-in” data [13], which facilitates data analysis regarding the above three features. For instance, a typical approach is to extract a user’s moving patterns, identify their travel interests via sessions of latitude–longitude traces, and derive the sequence of POIs that the user has visited from the set of “check-in” data [14]. It is then feasible to learn descriptive embeddings for modeling the geographical attributes [15] and to use recurrent neural networks [16] to exploit the sequence information for next POI recommendation.

However, many state-of-the-art recommender systems do not work well for the next POI recommendation task when certain important data are missing (e.g., data on various moving patterns), which raises four types of challenges.

  1. Lack of User Attributes. For privacy concerns, many LBS applications do not require users to log in or register specific attributes. Instead, the server allocates a unique ID to each client and stores the records in cookies.

  2. Lack of POI Labels. There can be a vast number of POIs that may attract users, while it is not possible to obtain comprehensive labels for all the POIs. For instance, the label of a POI can be a restaurant, shopping mall, office building, hospital, etc.

  3. Discontinuous Mobility Traces. People travel from one POI to another, whereas the mobility traces only expose partial observations of the entire trajectory.

  4. Various Mobility Patterns. People have various mobility patterns due to their different capabilities of traveling over distance (e.g., travel within a small region, or over a wide range). It is necessary to learn the mobility patterns and infer the user preferences within the sessions of observations.

These four challenges are common in the anonymous LBS applications, and thus the anonymous recommendation of POI should be addressed to provide the next POI for users.

In this paper, we propose the Geographical Attentive Recommendation via Graph (GARG) for the task of the anonymous recommendation of POI, which combines the state-of-the-art attention mechanism and the graph convolution network (GCN) to address the above challenges. Specifically, we implement a collaborative preference module in GARG to learn the embeddings as the user attributes over the collaborative information. Besides, GARG employs a geographical preference module to investigate the content-aware information and overcome the lack of labels, where GCN is leveraged upon the content information, i.e., distance, between each pair of POIs to adaptively identify the correlations. Beyond that, we utilize a gated recurrent unit (GRU) network with attention mechanism in the geographical preference module to tackle the discontinuous observations and recognize the mobility patterns via the sequence information. Overall, we make the following contributions in this work.

  • Data-Driven Approach. We observe from the real-world check-in data that people have their own mobility patterns and tend to visit a series of POIs within a certain area.

  • Automatic Identification of Geographical Influence. Previous works design different functions over the distances between paired POIs to model the influence of historical POIs on the target POI. GARG automatically learns the POI correlation from location data and a mass of check-ins.

  • Efficient Capture of Users’ Current Activity Area. Users’ check-ins cover a long range of time. Some users might move to another city and some might temporarily travel far away. GARG can adaptively differentiate the contribution of historical POIs and efficiently recognize users’ current activity areas.

We conduct evaluations over three real-world datasets with various moving patterns to compare the performance of GARG with several state-of-the-art POI recommendation strategies. Experimental results confirm that GARG makes a remarkable improvement in the precision and recall metrics, while the fine-grained analyses indicate that the modules of GARG all contribute to the recommendation task. Furthermore, GARG can be easily embedded into existing mobile applications for improving customer satisfaction owing to its generality and efficiency.

The rest of the paper is organized as follows. Related works are introduced in Sect. 2. In Sect. 3, we analyze the characteristics of data from three LBS applications and point out the aspects to be considered in the next POI recommendation. Section 4 presents the overview of the GARG architecture and explains the rationality of each component. Experimental results and conclusions are provided in Sects. 5 and 6, respectively.

2 Related Work

Due to the widespread use of recommender systems in mobile LBS applications, researchers have proposed efficient recommendation strategies using either the implicit feedback or the sequences of POIs.

POI Recommendation by Implicit Feedback. Some POI recommender systems treat all POIs in users’ historical traces equally as implicit feedback and attempt to learn the embeddings by reconstructing these implicit feedback records [17]. Building on the naïve probabilistic matrix factorization methods [18], many works incorporate extended features to improve accuracy. For learning user embeddings, existing works assume that users have similar interests to those who share the same attributes [19]. For POI embeddings, the most significant issue is to model the geographical similarity or influence [8, 20] between each pair of POIs. UCF+G [21], Rank-GeoFM [22] and GeoIE [6] assume that POIs close to each other are strongly correlated, and capture the geographical influence by manually defined functions over distances. Moreover, owing to the spatial-clustering phenomenon in POI recommendation, researchers pay attention to inferring users’ activity areas from their historical behavior. MGMPFM [23] utilizes a multi-center Gaussian model to learn regions of activity, GeoMF [24] introduces activity vectors of users and influence vectors of POIs to augment the factorization model, and Geo-ALM [25] fuses users’ preferences for the POIs and for the regions that the POIs belong to. Recent works consider the user-POI interactions, POI-POI relations and other information, such as social relationships, from the viewpoint of graphs. HRec [26] leverages social relationships to enhance user representations and adopts a graph learning approach to learn the user/POI representations from three graphs. JLGE [27] jointly learns the embeddings from six graphs of the user-POI-time period relationships. However, these methods fail to learn the sequence information and may lose efficacy when the attribute features are absent.

POI Recommendation by Sequence. Recently, many researchers have noticed the importance of sequence information for recommending POIs [15, 28, 29]. In addition to the context information contained in the sequence, the time interval and spatial transformation should not be ignored. Some researchers model the spatio-temporal context and sequence information by augmenting traditional matrix factorization [30]. STELLAR [31] extends matrix factorization with time vectors to explicitly model the POI-time interactions. Inspired by the great success of sequence-to-sequence models in natural language processing, other efforts have turned to recurrent neural networks. ST-RNN [32] adapts the RNN structure to model local temporal and spatial contexts with time-specific transition matrices for different time intervals and distance-specific transition matrices for different geographical distances. Beyond the legacy long short-term memory (LSTM) [33] network, ST-LSTM [12] proposes a new recurrent cell with time-aware and space-aware gates to account for spatio-temporal characteristics. TMCA [34] introduces the attention mechanism to learn a weight for each POI in the traces and produces the prediction by focusing on the POIs with high weights. Further, GT-HAN [35, 36] adopts the attention mechanism to capture the geographical relations and employs a bi-LSTM to capture sequence dependence. Some works utilize side information, such as comments, to enhance the user/POI representations. NEXT [29] incorporates meta-data information, time interval and visit time, and leverages the DeepWalk method to encode such knowledge. MMR [37] employs a graph embedding technique to learn embeddings from a heterogeneous graph, and adopts an LSTM with attention to capture sequence patterns. Although these strategies are adept at representing sequences, a major issue is that they do not make full use of the collaborative information to identify the POI-wise correlation.

The GARG proposed in this paper fills this gap via a comprehensive consideration of both the collaborative information and the sequence information, and therefore the performance can be further improved.

3 Measurements and Observations over Anonymous LBS Datasets

In this section, we discuss the characteristics of data from LBS applications and show the aspects to be considered in POI recommendation. Previous works have pointed out the “clustering phenomenon”: people tend to visit places concentrated in a region, especially within a short period. In addition, many users routinely revisit familiar places while also exploring new places in their living area. Moreover, users’ preferences for POIs combine tastes shared with other users and individual ones.

Anonymous LBS Datasets. We use three real-world datasets to investigate the user behaviors and mobility patterns within the LBS applications.

  • Gowalla (Footnote 1) [14] records the traces of users on POIs from 7 main categories, i.e., Community, Entertainment, Food, Nightlife, Outdoors, Shopping and Travel. It can be inferred that the dataset tends to describe users’ interests via the sessions of traces.

  • Foursquare (Footnote 2) [38] contains check-in traces in New York and was originally used for studying the spatial-temporal regularity of user activity in LBS networks.

  • Brightkite (Footnote 3) [2] was proposed to analyze users’ mobility patterns, exposing that users move back and forth among a few POIs.

The last two datasets expose users’ habits more than their interests; meanwhile, Foursquare tends to have a shorter average distance of POI traces than Brightkite. These three datasets are the most commonly used anonymous datasets for POI-recommendation, where the user attributes are not provided and the annotations to POIs are not available in all datasets.

Observation 1. There are clustering effects among users’ POI trajectories. The basic assumption behind collaborative filtering is that users with similar preferences might be interested in similar items. We adopt the Jaccard similarity coefficient as depicted by Eq. 1 to measure the similarity of users’ trajectories:

$$\begin{aligned} \text {Sim}(u_i,u_j)=\frac{|P_{u_i}\cap P_{u_j}|}{|P_{u_i}\cup P_{u_j}|}, \end{aligned}$$
(1)

where \(P_{u_i}\) is the set of POIs visited by user \(u_i\). For each user, we take the average similarity to the five users with the most similar POI trajectories (highest Jaccard similarity coefficients) as the user-specific similarity index. Figures 1, 2 and 3 depict the distribution of this similarity on the three datasets, respectively, indicating that for each user we can always find users with relatively similar trajectories. Therefore, collaborative filtering works to some degree for inferring the POIs that users would potentially be interested in. We also take a further look at the correlation between similarity and the distance between estimated activity centers. We take the average latitude and longitude of a user’s visited POIs as the user’s activity center and calculate the geographical distance between centers. Given the longitude and latitude of two points \(P_1(lng_1,lat_1), P_2(lng_2,lat_2)\), the distance between them is calculated as follows [21]:

$$\begin{aligned} \begin{aligned}&\phi _1 = (90.0-lat_1)\times \frac{\pi }{180.0}, \quad \phi _2 = (90.0-lat_2)\times \frac{\pi }{180.0}\\&\theta _1 = lng_1\times \frac{\pi }{180.0}, \quad \theta _2 = lng_2\times \frac{\pi }{180.0}\\&\text {dist}(P_1, P_2) = R_e\times \arccos {(\sin {\phi _1}\sin {\phi _2}\cos {(\theta _1-\theta _2)} + \cos {\phi _1}\cos {\phi _2})} \end{aligned} \end{aligned}$$
(2)

where \(R_e\) is the radius of the Earth, i.e., 6371 km. As shown in Fig. 4, it is reasonable to infer that users with higher similarity are likely to live in the same neighborhood.
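The measurements in Observation 1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the code used for the figures: the function names, the dictionary `visited` (user to set of POI ids) and the clamping of the arccosine argument are our own choices, and points are passed as (longitude, latitude) pairs in degrees.

```python
import math

EARTH_RADIUS_KM = 6371.0  # R_e in Eq. 2


def jaccard(pois_a, pois_b):
    """Eq. 1: |A intersect B| / |A union B| for two sets of visited POIs."""
    a, b = set(pois_a), set(pois_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_index(user, visited, k=5):
    """Average Jaccard similarity between `user` and its k most similar users."""
    sims = sorted(
        (jaccard(visited[user], visited[other]) for other in visited if other != user),
        reverse=True,
    )
    top = sims[:k]
    return sum(top) / len(top) if top else 0.0


def activity_center(coords):
    """Mean (lng, lat) of a user's visited POIs, used as the activity center."""
    lngs, lats = zip(*coords)
    return sum(lngs) / len(lngs), sum(lats) / len(lats)


def dist_km(p1, p2):
    """Eq. 2: great-circle distance between two (lng, lat) points in degrees."""
    (lng1, lat1), (lng2, lat2) = p1, p2
    phi1 = math.radians(90.0 - lat1)   # colatitude of P_1
    phi2 = math.radians(90.0 - lat2)   # colatitude of P_2
    theta1, theta2 = math.radians(lng1), math.radians(lng2)
    cos_angle = (math.sin(phi1) * math.sin(phi2) * math.cos(theta1 - theta2)
                 + math.cos(phi1) * math.cos(phi2))
    # Assumption: clamp to [-1, 1] so rounding error cannot leave acos's domain.
    return EARTH_RADIUS_KM * math.acos(max(-1.0, min(1.0, cos_angle)))
```

For example, `dist_km(activity_center(a), activity_center(b))` gives the center-to-center distance for two users whose POI coordinates are the lists `a` and `b`.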

Fig. 1 Distribution of trajectory similarity on Gowalla

Fig. 2 Distribution of trajectory similarity on Foursquare

Fig. 3 Distribution of trajectory similarity on Brightkite

Fig. 4 Correlation of distance and similarity of POI trajectories between users

Observation 2. Users tend to present revisiting behaviors and various mobility patterns. The decisions of people can be categorized into exploration and exploitation. For general items such as movies, people hardly ever watch one movie repeatedly but tend to explore new ones. In contrast, we find that most people not only explore unvisited places but also frequently revisit familiar places.

Fig. 5 The column-wise cumulative probabilities of the most frequently visited 10 POIs from 10 randomly selected users. Each row represents the data of a selected user, and each column corresponds to one of the POIs. The expected frequency of 10 POIs under uniform distribution assumption is also given in the last row

We randomly select 10 users from the Foursquare dataset and depict the cumulative probabilities of their 10 most frequently visited POIs in Fig. 5 by columns. We also present the expected frequency (probability) of 10 POIs if users visited POIs according to a uniform distribution, as shown in the last row of Fig. 5. The results reveal a big gap between the cumulative probabilities of the 10 POIs in the real-world scenario and under the uniform distribution, indicating that users tend to have revisiting behaviors in real life.

Observation 3. There are usually main interests within the POI trajectories, while the POI trajectories are usually monitored discretely and partially. Different users have their own preferences for visiting different kinds of places, so we analyze users’ visiting frequency for different types of POIs. We randomly sample 100 users from the Foursquare dataset and count their visits to different types of places. Figure 6 shows the proportion of different types of places visited by each user. For simplicity, we keep only the 19 most popular types and group the remainder into “Other Places”. For example, some users might be students or teaching staff since they always go to campus, and some might be in the habit of exercising given their frequent visits to the stadium. It is worth noting that, according to the check-ins, not all users frequently go back home. The reason might be that the LBS application captures only part of users’ trajectories, which makes it harder to extract users’ mobility patterns and infer sequential behavior.

Fig. 6 Repeat visiting behaviors to familiar POIs

4 Anonymous Recommendation of POI by GARG

Based on the three observations in Sect. 3, we introduce a novel recommender system, named Geographical Attentive Recommendation via Graph (GARG) for addressing the challenges in recommending POIs in anonymous LBS applications. GARG consists of two modules: (1) the collaborative preference module learns the interest-aware embeddings and attempts to model the general user/POI preference via the collaborative information; and (2) the geographical preference module manages the sequential and the content-aware information for identifying location-based geographical correlation among POIs. Figure 7 presents an overview of GARG, and the detailed design will be presented in this section.

Fig. 7 The architecture of GARG, consisting of a collaborative preference module and a geographical preference module

4.1 Problem Statement

Suppose the sets of users and POIs on the LBS platform are denoted by U and P, respectively. For each POI \(p\in P\), the only accessible feature is the geographical coordinates \(\delta _p\) of latitude and longitude. We can then retrieve the physical distance \(d_{pp'}\) between POI p and POI \(p'\) and construct the adjacency matrix \(A_P\) based on the distance between each pair of POIs. Let \(H_u=[x_1, x_2, \ldots , x_{N_u}]\) represent the sequence of POIs visited by user \(u\in U\), of length \(N_u\) and in chronological order. A POI recommender system is expected to recommend to each user u several POIs that she/he will be interested in but has not visited yet.

4.2 Embedding Layer

We project the user preferences into a latent space of dimensionality h, where each user \(u\in U\) corresponds to an embedding vector \(q_u\in {\mathbb {R}}^{h}\). Meanwhile, we map the POIs to embeddings in the same latent space, so that we can measure similarity and correlation between users and POIs, or between pairs of POIs, by the dot product. Specifically, we build two sets of embeddings for the POIs, namely the preference embeddings, denoted as \(l_p\) for \(p\in P\), and the geographically influenced embeddings, denoted as \(g_p\) for \(p\in P\), to capture the collaborative aspect and the content aspect of POIs, respectively. Therefore, the historical POI sequence \(H_u\) can be transformed into a sequence of embeddings \(E_u = [l_{x_1},l_{x_2},\ldots ,l_{x_{N_u}}]\).

4.3 Collaborative Preference Module

Many works regard the users’ check-in records as implicit feedback, and they assume that if user u once visited POI p, the probability that u is interested in p is quite high. Under this assumption, the POI recommendation task turns into a classic recommendation problem, i.e., to learn representative user embeddings and POI embeddings for the optimal implicit feedback reconstruction. Specifically, we map the users and POIs into embeddings of the latent vectors, i.e., \(q_u\) and \(l_p\), to represent the user u’s characteristics and POI p’s features, respectively. We keep the embeddings normalized, and we leverage the inner product of \(q_u\) and \(l_p\) to measure the general preference of user u to POI p without explicitly taking spatio-temporal constraint into consideration, named as the collaborative preference. Let \(y_{up}^{\text {C}}\) denote the collaborative preference between user u and POI p, which is computed by

$$\begin{aligned} y_{up}^{\text {C}}={q_u}^\intercal l_p. \end{aligned}$$
(3)
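As a small illustration of Eq. 3, the sketch below scores every user-POI pair at once with NumPy. It is not the GARG implementation: the toy sizes, the random embeddings and the choice of L2 normalization for "keeping the embeddings normalized" are assumptions made for the example.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit L2 norm (assumed form of the normalization)."""
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)


def collaborative_scores(Q, L):
    """Eq. 3 for all pairs: entry (u, p) is q_u^T l_p on normalized embeddings."""
    return l2_normalize(Q) @ l2_normalize(L).T


rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))   # 4 toy users, latent dimensionality h = 8
L = rng.normal(size=(6, 8))   # 6 toy POIs in the same latent space
Y_C = collaborative_scores(Q, L)   # shape (4, 6)
```

With unit-norm rows, each score is a cosine similarity and therefore lies in [-1, 1].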

4.4 Geographical Preference Module

Sequence information in POI recommendation. The collaborative information generates the latent feature vectors of users and POIs based on the user-POI visiting history, which only records the preference/interaction frequency of users on POIs. However, such schemes do not take users’ mobility patterns over POIs into account. Besides, the order in which users visit POIs contains rich information about the correlation/similarity between POIs. Owing to the mobility patterns or private interests of individual users, POIs appearing in similar contexts tend to be more similar. Furthermore, we observe that the latest activity area of a user is an essential factor for reducing the number of candidate POIs, as people may not frequently visit POIs far away.

Learning sequential features by GRU. To take full advantage of the sequence information, we apply the recurrent neural network to extract sequence information and recurrently propagate the evidence for inferring the users’ mobility patterns and interests. Specifically, we leverage the advanced gated recurrent unit (GRU) network to tackle the sequential data, i.e.,

$$\begin{aligned} c_{i} = \text {GRU}(c_{i-1}, l_{x_{i}}; {\varvec{\theta }}^{\text {GRU}}), \end{aligned}$$
(4)

where \(c_{i}\) stands for the hidden state at the i-th position of the sequence, and \(\varvec{\theta }^{\text {GRU}}\) represents the trainable parameters of the GRU cell with dimensionality h. We could then evaluate the preference of user u for POI p from the sequential aspect by the dot product.
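To make the recurrence in Eq. 4 concrete, the following NumPy sketch runs a textbook GRU cell over a sequence of POI embeddings. It is an assumption-laden illustration: the gate equations follow a common GRU formulation (conventions for the update gate differ between libraries), biases are omitted for brevity, and the parameter names in `theta` are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_gru(h, rng):
    """Random stand-in for theta^GRU; input and state both have dimension h."""
    return {name: rng.normal(scale=0.1, size=(h, h))
            for name in ("Wz", "Uz", "Wr", "Ur", "Wh", "Uh")}


def gru_step(c_prev, x, theta):
    """One application of Eq. 4: c_i = GRU(c_{i-1}, l_{x_i}; theta)."""
    z = sigmoid(theta["Wz"] @ x + theta["Uz"] @ c_prev)            # update gate
    r = sigmoid(theta["Wr"] @ x + theta["Ur"] @ c_prev)            # reset gate
    cand = np.tanh(theta["Wh"] @ x + theta["Uh"] @ (r * c_prev))   # candidate state
    return (1.0 - z) * c_prev + z * cand


def encode_sequence(E_u, theta):
    """Run the GRU over E_u (shape N_u x h); returns the states c_1..c_{N_u}."""
    c = np.zeros(E_u.shape[1])   # assumed zero initial state
    states = []
    for x in E_u:
        c = gru_step(c, x, theta)
        states.append(c)
    return np.stack(states)
```

Calling `encode_sequence(E_u, theta)` on an array of shape (N_u, h) returns an array of the same shape whose last row plays the role of \(c_{N_u}\).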

Learning sequence representations by attention. However, as we do not have access to the explicit attitudes of a user toward POIs from the implicit feedback sequence, some POIs in the sequence may not correlate strongly with the prediction of future POIs, such as POIs that left a negative impression on the user or POIs that the user just visited by chance. The naïve GRU network cannot identify and skip the impact of these irrelevant POIs. Inspired by the great success of neural translation techniques [39], we adopt the attention mechanism over the GRU network to extract the information behind the sequence. The attention mechanism assigns different weights to the POIs at different positions and computes the weighted sum of the outputs of all positions as the representation of the sequence. We first use the GRU to capture the sequential behavior, as depicted by Eq. (4). GARG then adopts an item-level attention mechanism with parameter \(W^{\text {A}}\) to analyze to what extent the output at the last position of the GRU attends to the outputs at the other positions, and thereby identify the POIs that are important for the prediction.

$$\begin{aligned} \alpha _i& = \frac{c_{i}^{\intercal }W^{\text {A}}c_{N_u}}{\sum _{i'=1}^{N_u}c_{i'}^{\intercal }W^{\text {A}}c_{N_u}} \end{aligned}$$
(5)
$$\begin{aligned} s_u& = \sum _{i=1}^{N_u}\alpha _i c_i \end{aligned}$$
(6)

where \(s_u\) indicates the sequence representation, and \(\alpha _i\) (\(i\in \{1,2,\ldots ,N_u\}\)) stands for the weight at each position. The rationale behind this design is that we assume the most recent POI will roughly limit the mobility regions of the user, while some of the previous POIs implicitly depict her mobility pattern and interest.
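Eqs. 5 and 6 can be read as a bilinear attention pooling over the GRU states, sketched below with NumPy. The function name and the toy inputs are assumptions; the normalization divides by the plain sum of scores exactly as Eq. 5 is printed, so the weights sum to one but are not forced to be positive, and the sketch does not guard against a sum close to zero.

```python
import numpy as np


def attention_pool(C, W_A):
    """Eqs. 5-6: weight each state c_i by its bilinear score against c_{N_u}.

    C has shape (N_u, h) with rows c_1..c_{N_u}; W_A has shape (h, h).
    """
    scores = C @ W_A @ C[-1]        # scores[i] = c_i^T W^A c_{N_u}
    alpha = scores / scores.sum()   # Eq. 5: normalize by the sum of the scores
    s_u = alpha @ C                 # Eq. 6: weighted sum of the states
    return alpha, s_u


C_toy = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
alpha, s_u = attention_pool(C_toy, np.eye(2))
```

In this toy case the last state receives half of the total weight, which matches the intuition that the most recent POI anchors the representation.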

Content-aware correlation between POIs. Intuitively, people are likely to know the POIs around those they once visited and the premise for users to visit new POIs is to know those places. Hence, people within the region would have a high probability of visiting them one after another. It is essential to pay attention to POIs’ influential power and model the potential strong correlation with their neighbors. As we only have access to the latitude–longitude of these POIs, the most significant factor to model the correlation between pairs of POIs is the distance. We could construct the geographical graph and extract the adjacent measurement matrix \(A_P\) by the distances, where each element of \(A_P\) is computed by:

$$\begin{aligned} a_{pp'}=\left\{ \begin{aligned} \exp (-\frac{d_{pp'}^2}{\sigma ^2}) & p\ne p'\text { and }d_{pp'} \ge \epsilon \\ 0&\qquad \text {otherwise}\\ \end{aligned} \right. \end{aligned}$$
(7)

where the distance \(d_{pp'}\) is calculated by Eq. 2, and \(\sigma \) and \(\epsilon \) are thresholds that control the scope and degree of influence: the closer the distance, the greater the influence. Assuming that the content-aware correlation is negatively correlated with the distance, many researchers have manually proposed mapping relations to compute the influence [6, 12]. However, they ignore the fact that the attributes of the POIs, which can be represented by the sequential features and the collaborative embeddings, also determine the correlation between POIs.

Learning Content-Aware Correlation by GCN. To take full advantage of the content-aware correlation and the collaborative information of POIs, we use the advanced Graph Convolutional Network (GCN) [40] to model the geographical influence of the neighboring POIs for each POI \(p\in P\). GCN in GARG is built upon the adjacent measurement matrix \(A_P\), and learns the geographical features by integrating the knowledge from neighboring POIs. Each layer of GCN propagates the knowledge from each POI to its one-hop neighbors (i.e., the corresponding adjacent value greater than zero). Regarding the k-th layer of the GCN, it maintains trainable parameter \(W^{(k)}\) and generates the output \(G^{(k)}\), following:

$$\begin{aligned} G^{(k)}=\text {ReLU}({\tilde{D}}^{-\frac{1}{2}}{\tilde{A}}\tilde{D}^{-\frac{1}{2}} G^{(k-1)} W^{(k)}), \end{aligned}$$
(8)

where \({\tilde{A}} =A_P+{\mathbf {I}}_{|P|}\) is the adjacent matrix with added self-connections, \({\mathbf {I}}_{|P|}\) is the identity matrix, and \({\tilde{D}}\) is a diagonal matrix with \({\tilde{D}}_{pp}=\sum _{p'\in P} {\tilde{A}}_{pp'}\). Note that the input embeddings of the GCN component, \(G^{(0)}\), are initialized by the preference embeddings. Empirically, the correlation between a pair of POIs decays with the length of the propagation route between them in the geographical graph. In this paper, we apply a two-layer GCN for GARG to capture the content-aware correlation, namely \(g_p \in G^{(2)}\) for \(p\in P\). Therefore, instead of merely using the sequence information, we could measure the preference of user u for POI p by combining the sequence information of the historical check-ins and the content-aware correlation, namely

$$\begin{aligned} y_{up}^{\text {S}}={s_u}^{\intercal } g_p. \end{aligned}$$
(9)
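The propagation rule of Eq. 8 and the score of Eq. 9 can be sketched with dense NumPy arrays as follows. This is a toy illustration under assumptions: the adjacency matrix is supplied by the caller, the weight matrices stand in for the trainable \(W^{(k)}\), and a dense matrix product replaces whatever sparse implementation a real system would need for a large POI set.

```python
import numpy as np


def normalized_adjacency(A_P):
    """D~^{-1/2} (A_P + I) D~^{-1/2}, the propagation matrix in Eq. 8."""
    A_tilde = A_P + np.eye(A_P.shape[0])           # add self-connections
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    return d_inv_sqrt[:, None] * A_tilde * d_inv_sqrt[None, :]


def gcn_forward(A_P, G0, weights):
    """Apply Eq. 8 once per weight matrix; two matrices yield G^(2)."""
    A_hat = normalized_adjacency(A_P)
    G = G0                                          # G^(0): preference embeddings
    for W in weights:
        G = np.maximum(A_hat @ G @ W, 0.0)          # ReLU
    return G


def geographical_scores(s_u, G):
    """Eq. 9 for every POI at once: entry p is s_u^T g_p."""
    return G @ s_u
```

For a two-layer model as used in the paper, `weights` would hold two h-by-h matrices and the rows of the result would be the embeddings \(g_p\).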

Fig. 8 Pre@K of the compared methods on Gowalla

Fig. 9 Rec@K of the compared methods on Gowalla

Fig. 10 mAP@K of the compared methods on Gowalla

4.5 Optimization

To measure the intention that user u would like to visit a target POI p, we take both the collaborative preference and geographical preference into account, namely

$$\begin{aligned} y_{up}=y_{up}^{\text {C}}+\beta y_{up}^{\text {S}}, \end{aligned}$$
(10)

where \(\beta \) is a trainable factor that balances these two types of preferences. Under various scenarios, \(\beta \) would differ to cope with the sequences in the datasets. We apply negative sampling to prevent overfitting on the historical sequences and enhance the representational capability of the embeddings. We organize the loss function as the negative log-likelihood:

$$\begin{aligned} {\mathcal {L}}(U,H) = -\sum _{u\in U}\sum _{p\in H_u}\frac{\Big (\log (\sigma (y_{up}))+ \log (1-\sigma (y_{u\bar{p}}))\Big )}{|U||H_u|}. \end{aligned}$$
(11)

where \({\bar{p}}\) represents one negative randomly sampled POI relative to the existing POIs at each position of the sequence \(H_u\) and \(\sigma (\cdot )\) is the sigmoid function. Considering that the negative sampling strategies do impact the performance of the model, we investigate the different sampling methods in Sect. 5.4. We adopt the Adam algorithm to adjust the learning rate for minimizing this likelihood objective function. To prevent overfitting, during each iteration, we re-sample the negative samples randomly and construct the negative set. We pretrain the embeddings of users and POIs with matrix factorization method. In this paper, we fix the dimensionality of all latent embeddings, i.e., h, as 128.
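The objective in Eqs. 10 and 11 can be written down directly once the scores of the observed and sampled POIs are available. The NumPy sketch below is a simplified illustration, not the training code: it takes precomputed score arrays as input, assumes one negative per position as in Eq. 11, and ignores numerical safeguards (for example clipping before the logarithm) that a production implementation would likely add.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def combined_score(y_c, y_s, beta):
    """Eq. 10: y_up = y^C_up + beta * y^S_up."""
    return y_c + beta * y_s


def user_loss(y_pos, y_neg):
    """Inner sum of Eq. 11 for one user, divided by |H_u|.

    y_pos[i] is the score of the i-th visited POI and y_neg[i] the score of
    the negative POI sampled for that position.
    """
    terms = np.log(sigmoid(y_pos)) + np.log(1.0 - sigmoid(y_neg))
    return -terms.mean()


def total_loss(per_user_scores):
    """Eq. 11: average the per-user losses over all users in U."""
    losses = [user_loss(pos, neg) for pos, neg in per_user_scores]
    return sum(losses) / len(losses)
```

When all scores are zero, every term equals \(2\log 2\), which is a convenient sanity check for an untrained model.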

4.6 Analytical Results

We then analyze the outputs of GARG to show whether the prediction of GARG is feasible. Specifically, we find that GARG can precisely identify the preferences of users, and yield the POIs that are reachable by the users. The analytical results are mainly two-fold:

Proposition 1

Given two POIs that are identical except for distance, GARG recommends the one near the users’ recent activity area.

Proof

User u’s preference score for POI p combines the general preference and the geographical preference. Under the assumption that the two POIs are identical other than distance, the general preference part \(y_{up}^{\text {C}}\) is equal for both. For the geographical preference part, the sequence representation \(s_u\) attentively aggregates the sequence information from u’s historical trajectory, which reflects her recent activity area, while \(g_p=f(p'\in {\mathbb {N}}_p)\) integrates the POIs in p’s neighborhood. Thus, the geographical preference score should be higher if p belongs to u’s recent activity area. \(\square \)

Proposition 2

GARG falls into sequence model category but fuses geographical information.

Proof

The classical sequence model combines general preference and sequential preference. FPMC [41] consists of the inner product of user and item factors (capturing general preference) and the inner product of previous and next item factors (capturing sequential dynamics). NARM [42] only concentrates on sequential preference and utilizes GRU with attention mechanism to capture sequence information. If the coefficient \(\beta \) approximates positive infinity and the adjacent matrix is an identity matrix, GARG degenerates into NARM. If the adjacent matrix is an identity matrix and all the attention is on the last POI, GARG degenerates into FPMC. However, GARG enjoys the benefits of GCN and attention over GRU to fully capture sequence and geographical preference. \(\square \)

5 Evaluation

5.1 Setups

Datasets. We use the three real-world datasets introduced in Sect. 3 to examine the performance of GARG. Following the data preprocessing approach deployed by [6], we filter out the users and items with fewer than 40 records for Gowalla; for Foursquare, we filter out those with fewer than 10 records; for Brightkite, we filter out the users with fewer than 40 records and the items with fewer than 10 records. The preprocessed datasets are split into training, validation and test sets in the proportion 8:1:1 by the check-in timestamps.
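The preprocessing described above might look as follows in plain Python. Several details are assumptions, since the text does not fix them: check-ins are represented as (user, POI, timestamp) tuples, the frequency filter is applied in a single pass over the raw counts, and the 8:1:1 split is taken over all check-ins sorted globally by timestamp.

```python
from collections import Counter


def filter_checkins(checkins, min_user=40, min_poi=40):
    """Keep records whose user and POI both have enough check-ins.

    Assumption: counts are computed once on the raw data (single pass).
    """
    user_n = Counter(u for u, _, _ in checkins)
    poi_n = Counter(p for _, p, _ in checkins)
    return [r for r in checkins
            if user_n[r[0]] >= min_user and poi_n[r[1]] >= min_poi]


def chronological_split(checkins, ratios=(0.8, 0.1, 0.1)):
    """Sort by timestamp and cut into train/validation/test blocks."""
    ordered = sorted(checkins, key=lambda r: r[2])
    n_train = int(len(ordered) * ratios[0])
    n_val = int(len(ordered) * ratios[1])
    train = ordered[:n_train]
    val = ordered[n_train:n_train + n_val]
    test = ordered[n_train + n_val:]   # remainder goes to the test set
    return train, val, test
```

Under these assumptions, the Brightkite setting would correspond to `filter_checkins(data, min_user=40, min_poi=10)`.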

As illustrated in Sect. 4.4, the adjacent measurement matrix depends on the thresholds \(\sigma \) and \(\epsilon \), which control the scope and degree of influence. For the Gowalla and Brightkite datasets, the thresholds \(\sigma \) and \(\epsilon \) are set to 10 and 0.5, respectively, inspired by the parameter setting in [43]. Since POIs visited by users in Foursquare tend to concentrate in a narrower area, we set \(\sigma \) to 5 (Table 1).

Table 1 Statistics of the datasets
Fig. 11 Pre@K of the compared methods on Foursquare

Fig. 12 Rec@K of the compared methods on Foursquare

Fig. 13 mAP@K of the compared methods on Foursquare

Fig. 14 Pre@K of the compared methods on Brightkite

Fig. 15 Rec@K of the compared methods on Brightkite

Fig. 16 mAP@K of the compared methods on Brightkite

Baseline methods As many POI recommendation strategies have been proposed for specific LBS applications, we select several state-of-the-art methods and compare them with GARG in our settings. For recommendation by implicit feedback, we run or reproduce the following six strategies: (1) UCF+G [21] fuses user preference and geographical influence by combining a user-based collaborative filtering approach with a power-law function of distance that captures the check-in probability. (2) MGMPFM [23] combines a probabilistic factor model and a multi-center Gaussian model to learn regions of activity. (3) GeoMF [24] integrates geographical influence by modeling users’ activity regions and pre-defined POIs’ influence areas. (4) RankGeoFM [22] is a ranking-based MF model that uses two latent matrices to represent user preference and users’ geographical preference, respectively, and includes the geographical influence of neighboring POIs. (5) GeoIE [6] differentiates the impact between each pair of POIs in both directions and embeds users and POIs into a latent space to calculate similarities between users and POIs. (6) SAE-NAD [8] deploys a self-attentive mechanism to encode the session for maximizing the utilization of collaborative information. For recommendation by sequence information, we examine the following four strategies: (7) ST-LSTM [12] modifies the LSTM cell by integrating a spatial gate via the distance and a temporal gate via the time difference between two consecutive POIs. (8) TMCA [34] applies the attention mechanism over the recurrent neural network to learn the session-level representation for capturing users’ mobility patterns. (9) GT-HAN [35, 36] models the geographical relations by leveraging the attention mechanism and employs a bi-LSTM to capture sequence dependence. (10) NEXT [29] models the influence of the last POI on the next move based on the time interval and adopts the DeepWalk mechanism to pretrain the POI embeddings.
These baseline methods are tuned on the validation set to ensure a fair comparison, and their performance on the test set is reported.

Evaluation Criteria We evaluate the performance of the model and the baselines in terms of three kinds of criteria.

  • Recall@K: The recall metric measures the proportion of the ground-truth POIs (those the users actually visit) that are covered by the K recommended POIs.

  • Precision@K: The precision metric measures the proportion of the K recommended POIs that the users actually visit.

  • mAP@K: Mean average precision (mAP) is the mean, over all recommendation lists of K POIs, of the average precision score. Suppose the POI recommended at rank k is the \(k'\)-th correct prediction in the list (\(k'\) is set to zero if the POI is not a correct prediction); the precision score of this POI is then \(\frac{k'}{k}\). The mAP score is defined as \({\mathbb {E}} [\frac{1}{K}\sum _{k=1}^{K}\frac{k'}{k}]\).

Specifically, we set K to 1, 5, 10, 20 to measure the performance.
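The three criteria can be sketched in a few lines. The code below follows the definitions given above, including the normalization of the average precision by K (some other mAP variants normalize by the number of relevant items instead); the function names and the dictionary-based input layout are assumptions made for illustration.

```python
from typing import Callable, Dict, Sequence, Set


def precision_at_k(ranked: Sequence[int], truth: Set[int], k: int) -> float:
    """Fraction of the top-k recommended POIs that the user actually visits."""
    return sum(1 for p in ranked[:k] if p in truth) / k


def recall_at_k(ranked: Sequence[int], truth: Set[int], k: int) -> float:
    """Fraction of the visited POIs that appear among the top-k recommendations."""
    if not truth:
        return 0.0
    return sum(1 for p in ranked[:k] if p in truth) / len(truth)


def average_precision_at_k(ranked: Sequence[int], truth: Set[int], k: int) -> float:
    """(1/K) * sum over ranks of k'/k, where k' counts the correct POIs up to a correct rank k."""
    hits, total = 0, 0.0
    for rank, poi in enumerate(ranked[:k], start=1):
        if poi in truth:  # incorrect ranks contribute zero (k' = 0)
            hits += 1
            total += hits / rank
    return total / k


def mean_over_users(
    metric: Callable[[Sequence[int], Set[int], int], float],
    rankings: Dict[str, Sequence[int]],
    truths: Dict[str, Set[int]],
    k: int,
) -> float:
    """Average a per-user metric over all users (the expectation in the mAP formula)."""
    return sum(metric(rankings[u], truths[u], k) for u in rankings) / len(rankings)


ranked = [3, 7, 1, 9, 4]   # recommended POI ids, best first
truth = {7, 9, 5}          # POIs the user actually visits
ap5 = average_precision_at_k(ranked, truth, 5)  # (1/2 + 2/4) / 5
```

In this toy case two of the five recommendations are correct, so Precision@5 is 2/5 and Recall@5 is 2/3.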

5.2 Performance of GARG

We compare the performance of GARG with the other state-of-the-art methods based on three kinds of criteria on the three datasets.

Geographical correlation captured by GARG. As shown in Figs. 8, 9, 10, 14, 15 and 16, GARG improves the performance remarkably over the baselines on the Gowalla and Brightkite datasets. Specifically, the Gowalla dataset exposes rich check-in histories of users among nearby POIs, so the GCN can accurately capture the geographical correlation between pairs of POIs, which helps GARG perform best on all three metrics. In contrast, the strategies based solely on implicit feedback or sequence information do not work well, as they cannot adaptively model the geographical influence. Meanwhile, the users in the Brightkite dataset tend to move back and forth among several regions. The GRU network in GARG is capable of identifying these moving patterns and thus improves the accuracy of GARG by targeting the POIs from specific regions.

Collaborative Preference Correlation Captured by GARG In addition to geographical correlation, the collaborative preference correlation plays an important role in preference representation. On the Foursquare dataset, GARG performs similarly to UCF+G and outperforms the other baselines, as plotted in Figs. 11, 12 and 13. We examine the results and find that the distances between POIs in the Foursquare dataset are far smaller than those in the other datasets. Geographical information therefore seems to have less value on the Foursquare dataset, which is consistent with the result in Sect. 5.4. In this scenario, the collaborative preference module of GARG captures the user interests and recommends the POIs that the users are most likely to visit. This also explains why some collaborative filtering-based strategies (UCF+G, GeoMF) perform quite well here compared to their performance on the other two datasets (Figs. 14, 15 and 16).

In all, the comparison over the three datasets confirms the efficiency and accuracy of the proposed GARG.

5.3 Case Study

We sample several cases from the dataset, together with the results provided by GARG, for demonstration. Figures 17, 18 and 19 show three typical scenarios labeled with the historical check-ins (divided into recently visited POIs and previously visited POIs), the true-positive cases, the true-negative cases and the false-positive cases. Specifically, the user of Fig. 17 always visits POIs within a narrow range, and the POIs recommended by GARG are also concentrated in that small area; the user of Fig. 18 has a wide activity area, and GARG recommends distributed POIs with high precision; the user of Fig. 19 has clearly changed her activity area, and GARG recognizes this change thanks to the GRU with the attention mechanism, so the POIs it recommends are all around her new activity area. These examples further illustrate that GARG captures both collaborative information and sequence information and can automatically recognize different scenarios.

Fig. 17 The prediction of GARG with ground truths over a mobility pattern of narrow activity area

Fig. 18 The prediction of GARG with ground truths over a mobility pattern of broad activity area

Fig. 19 The prediction of GARG with ground truths over a mobility pattern of changed activity area

5.4 Fine-Grained Analyses of GARG

Efficiency of the Modules in GARG In this section, we further analyze the rationality of the GARG structure by examining the contribution of each component. We compare GARG with three variants obtained by removing the collaborative preference module (denoted as GARG w/o CP), removing the GRU (denoted as GARG w/o GRU) and removing the GCN (denoted as GARG w/o GCN), respectively. To ensure comparability, we adopt the same hyper-parameter setting for all variants. Results are shown in Figs. 20, 21 and 22. GARG outperforms the three ablated architectures, indicating that (1) the collaborative information and sequence information complement each other, (2) the GRU with the attention mechanism adaptively captures the mobility pattern contained in the sequence, and (3) the GCN component automatically models the geographical influence. Overall, this ablation study (removing a component of the model to see how that affects performance) confirms the ability of GARG to fully integrate the content-aware information and sequence information.

Fig. 20 Efficiency of modules in GARG on Gowalla

Fig. 21 Efficiency of modules in GARG on Foursquare

Parameter Sensitivity In this section, we study the influence of the variable h, which is the dimension of the latent embeddings, the hidden state of the GRU cells and the transformation matrices. In our experiment, h is set to 32, 64, 128 and 256. Figures 23, 24 and 25 report the performance of GARG for the different values of h on the three datasets, respectively. The results show that all evaluation metrics behave similarly as h varies. For the Gowalla and Brightkite datasets, the performance first increases with h and peaks at \(h = 128\). For the Foursquare dataset, the performance peaks at \(h = 256\). Overall, we fix \(h = 128\).

Fig. 22 Efficiency of modules in GARG on Brightkite

Fig. 23 Influence of latent factor dimensions in GARG on Gowalla

Fig. 24 Influence of latent factor dimensions in GARG on Foursquare

Fig. 25 Influence of latent factor dimensions in GARG on Brightkite

Impact of Different RNN Cells The RNN mechanism is widely adopted to capture transition patterns from sequence data. There are three common RNN cells, i.e., vanilla RNN, LSTM and GRU. LSTM employs a gating mechanism to alleviate the vanishing gradient problem, and GRU can be considered a variant of LSTM. To study the effect of these three RNN cells, we replace the GRU cell used in GARG with a vanilla RNN cell (denoted as GARG w/ RNN) and an LSTM cell (denoted as GARG w/ LSTM), respectively, and keep all other settings the same. Figures 26, 27 and 28 show that the performance of the model with GRU cells is usually better than that with vanilla RNN and LSTM cells.

Fig. 26 Impact of RNN cells on Gowalla

Fig. 27 Impact of RNN cells on Foursquare

Fig. 28 Impact of RNN cells on Brightkite

Negative Sampling Strategy The negative sampling technique has an impact on the modeling performance. We investigate three negative sampling strategies, i.e., random sampling, importance sampling and adversarial sampling. Random sampling draws the negative sample for each positive sample uniformly from all POIs other than the positive one. Importance sampling uses the frequency with which each POI is visited in the training set as its importance. The negative samples in adversarial sampling [44] are produced by a generator. Figures 29, 30 and 31 show the performance of GARG with the different negative sampling methods. Importance sampling yields the poorest performance, because the embeddings of some unpopular POIs cannot be sufficiently learned. Random sampling outperforms adversarial sampling on the Gowalla and Brightkite datasets, while adversarial sampling performs better on the Foursquare dataset. Overall, random sampling is competitive with the alternatives. We believe the reason is that random sampling gives each embedding an equal probability of being trained, and it is therefore efficient in our setting.
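The difference between the first two strategies can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, excluding all of a user’s visited POIs (rather than only the current positive sample) is an assumption, and adversarial sampling [44] is not covered because it requires a trained generator.

```python
import random
from typing import Dict, Sequence, Set


def sample_random_negative(
    all_pois: Sequence[str], positives: Set[str], rng: random.Random
) -> str:
    """Random sampling: every POI outside the positives is equally likely."""
    candidates = [p for p in all_pois if p not in positives]
    return rng.choice(candidates)


def sample_importance_negative(
    all_pois: Sequence[str],
    positives: Set[str],
    visit_counts: Dict[str, int],
    rng: random.Random,
) -> str:
    """Importance sampling: probability proportional to the training visit frequency."""
    candidates = [p for p in all_pois if p not in positives]
    weights = [visit_counts.get(p, 0) for p in candidates]
    if sum(weights) == 0:
        # No frequency information for any candidate: fall back to uniform sampling.
        return rng.choice(candidates)
    return rng.choices(candidates, weights=weights, k=1)[0]


rng = random.Random(0)
pois = ["a", "b", "c", "d"]
visit_counts = {"a": 50, "b": 10, "c": 0, "d": 0}
uniform_draws = [sample_random_negative(pois, {"a"}, rng) for _ in range(200)]
popular_draws = [sample_importance_negative(pois, {"a"}, visit_counts, rng) for _ in range(50)]
```

In this toy setting the importance sampler never selects the unvisited POIs "c" and "d" as negatives, which mirrors the observation above that the embeddings of unpopular POIs receive too few updates.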

Fig. 29 Impact of negative sampling strategy on Gowalla

Fig. 30 Impact of negative sampling strategy on Foursquare

Fig. 31 Impact of negative sampling strategy on Brightkite

Fig. 32 The average attention signals on three datasets

Attention Visualization We visualize the mean attention signals on the 20 most recently visited POIs for the three datasets, as illustrated in Fig. 32. The higher the attention signal, the more the corresponding POI contributes to the sequence representation. The result is consistent with our expectation that the most recently visited POI is the most relevant to future predictions. Meanwhile, the non-zero attention on the earlier POIs indicates the necessity of the attention mechanism for modeling sequence information. Besides, there are obvious differences among the three datasets, which supports our statement in Sect. 5.2. For the Foursquare dataset, the attention signals tend to be distributed more evenly, possibly because the POIs are distributed in a narrow area. For the Brightkite dataset, the back-and-forth mobility pattern might explain why the average attention signals on the last three POIs are larger than those on Gowalla.

6 Conclusion

Mobile LBS applications have facilitated our daily life, while the data sparsity problem and the lack of POI labels significantly degrade the performance of common POI recommender systems. In this paper, we propose GARG, which combines collaborative, sequential and content-aware information to provide accurate POI recommendation for anonymous mobile LBS applications. Specifically, GARG employs a collaborative preference module to learn the user embeddings by reconstructing the historical implicit-feedback check-ins. In addition, GARG applies a geographical preference module, which uses a GRU network to capture the sequence information and a GCN to learn the content-aware correlation between POIs. Evaluations over three real-world check-in datasets with various mobility patterns demonstrate the improvement of GARG over state-of-the-art POI recommender systems. GARG can be flexibly embedded into existing mobile applications owing to its generality and automatic learning ability.