Social Relationship Link Inference Based on Graph Convolutional Networks

Social relationship link inference, one branch of social relationship inference, aims to infer whether a social relationship exists between users. Most previous works applied unsupervised random-walk sampling on graphs, which suffers from sampling bias and loses much information. This paper proposes SLiGCN, a social relationship link inference model based on graph convolutional networks, which learns the spatiotemporal information of check-ins and the impact of related users' trajectories. It employs end-to-end supervised learning: a recurrent neural network first extracts spatiotemporal sequence features from trajectories, a graph convolutional network then fuses the features of neighboring nodes, and a fully connected network finally infers social relationship links. The model is evaluated with AUC on three real-world datasets. The experimental results show that, compared with baseline models, it not only avoids hand-crafted feature construction that requires much prior knowledge, but also achieves a 10% improvement on average.


Introduction
In recent years, online social networks (OSNs), e.g. Facebook, WeChat, Twitter and Foursquare, have developed rapidly. On these platforms, users can check in at scenic spots, share their current location, and follow and chat with other users, thus generating a large amount of check-in and social relationship data. These data can assist in the comprehension of human behaviors and habits. Cho et al. [1] showed that social relationships are related to travel trajectories. In particular, 10 to 30% of travel is influenced by social relationships. People who have similar weekend trajectories are more likely to be close friends or family members, whereas those who have similar workday trajectories are more likely to be colleagues. This indicates that social relationships can be deduced from check-in data. This research direction is known as social relationship link inference, which provides a novel approach to solving a variety of real-world problems, such as friend recommendation, transportation scheduling and criminal organization discovery. It is helpful in providing high-quality Internet services, optimizing social resource scheduling, and combating criminal organizations.
The study of social relationship inference contains three main directions: (1) inferring the existence of a social relationship between users (social relationship link inference); (2) inferring the type of social relationship (social relationship type inference); (3) inferring whether a unidirectional social relationship will develop into a bidirectional one (social relationship interaction prediction). Current research mainly focuses on social relationship link inference. Its methods can be divided into two categories: (1) inference via hand-crafted features. Scellato et al. [2] constructed features, e.g. the number of common arrivals, the number of mutual friends, and the distance from home, then used random forests to perform inference. This kind of method relies on a large amount of prior knowledge to construct features, and the quality of these features directly affects the performance of the model; (2) inference via embeddings obtained from graph sampling. Perozzi et al. [3] applied random walk sampling on the graph, fed the sampled node sequences into the skip-gram model, and trained the model using hierarchical softmax. This kind of method relies on the sampling strategy and suffers from sampling bias.
This paper explores the check-in trajectories in a city from two aspects: friends (with social relationships) and strangers (without social relationships) on three real-world datasets, i.e. Brightkite, Gowalla and Foursquare. Figure 1 illustrates that the probability that friends have common arrivals is much higher than that of strangers, and the probability of co-occurrence of friends is much higher than that of strangers. This conclusion also holds true for one-degree related friends (users with at least one mutual friend). This is because friends will visit a location together or successively and this behavior happens among their respective friends as well.
This result verifies the conclusion that users' social relationship links can be deduced from the check-in data, and also demonstrates the importance of considering the check-in trajectories of users' friends in link inference.
Previous works [2,3] suffer from sampling bias and the complexity of hand-crafted feature construction. Moreover, without spatiotemporal and sequence modeling of trajectories, many models used only the location information of check-in data or treated trajectories as discrete points during feature extraction, which caused significant information loss. Deep learning's capacity for feature mining and graph convolutional networks' capacity for feature fusion can solve these problems.
Deep learning [4], a popular research direction in machine learning, can automatically learn intrinsic patterns and representations from large amounts of data. Thanks to the popularization of electronic devices and the rapid development of hardware in recent years, massive amounts of data are generated daily. Meanwhile, improved computing power supports more efficient data operations, which brings the advantages of deep learning into play and has allowed it to quickly surpass other machine learning methods in various tasks.
Graph convolutional networks [5], proposed in recent years, can apply convolution operations on graphs of arbitrary structure. They combine the ability of deep learning to efficiently capture hidden patterns with the ability of convolutional neural networks to discover feature hierarchies. Thus they can learn the local patterns of each node at different scales from the graph structure and node features in non-Euclidean spaces. By combining a node's features with its neighbors' features, graph convolutional networks generate a new embedding that not only represents this node, but also incorporates its neighbors' features and the structural information of the graph.
In view of the aforementioned conclusion and the advantages of graph convolutional networks, this paper proposes SLiGCN, a social relationship link inference model based on graph convolutional networks. Specifically, it includes: (1) the spatiotemporal sequence features of check-in trajectory data are extracted by using recurrent neural networks as the initial node features of graph convolutional networks; (2) social relationships and trajectories are used to construct edges. Social edges are created for users with social relationships, and co-occurrence edges are added for users who have spatiotemporal co-occurrences; (3) neighbors' features are fused to obtain user embeddings in new representation space via graph convolutional networks; (4) the fully connected network and softmax are used to perform binary classification on embeddings of user pair to determine whether there is a social relationship link.
Our method differs from previous unsupervised learning methods based on random graph sampling. We propose an end-to-end supervised learning model that mines spatiotemporal sequence data, which avoids sampling bias and fully exploits the correlation between labels and features. In addition, the graph convolutional network fuses the check-in trajectories of related users to enhance inference. In experiments on three real-world datasets, our model not only avoids hand-crafted feature construction that requires a lot of prior knowledge, but also outperforms baseline models.

Social Relationship Link Inference
Due to the development of the Internet and OSNs, research on social relationship link inference has gradually emerged. Eagle et al. [6] first found that people's travel trajectories are influenced by social relationships by analyzing cell phone mobile data. Cho et al. [1] further found that people's travels are characterized by cyclical and social behaviors. Social relationships can explain 10 to 30% of travel behavior, with a larger percentage of long-distance travel. Scellato et al. [2] used supervised learning to infer missing edges by constructing features such as the number of common arrivals, the number of mutual friends, and the distance from home. Liben-Nowell et al. [7] investigated the influence of several node proximity metrics on link inference. Based on this, Backstrom et al. [8] employed graph random sampling to learn an objective function that assigns strengths to edges, making random sampling more likely to visit nodes that will establish new links in the future, and performed social relationship link inference through edge strengths. Perozzi et al. [3] treated sampled edge paths as sentences and used deep unsupervised learning to obtain hidden space representations of nodes for downstream prediction tasks. Wu et al. [9] applied the skip-gram method to extract the user's trajectory feature using supervised learning, and then mixed the neighbors' features by unsupervised learning graph convolutional networks, and finally performed social relationship link inference by the obtained user features.

Graph Convolutional Networks
The graph convolutional network is an improvement of the traditional convolutional neural network that extends its scope of application from Euclidean spaces such as images to non-Euclidean spaces of arbitrary structure. Graph convolutional networks fall into two main branches: spectral-based and spatial-based. Bruna et al. [5] applied the convolution operation on graphs in the spectral domain through the spectrum of the graph Laplacian, which opened the study of convolution in the spectral domain. Thereafter, Defferrard et al. [10] optimized the process of solving the eigenmatrices using Chebyshev polynomial approximations as filters. The spatial-based approach is more intuitive, and research on it started earlier than on the spectral-based approach. As the first significant work in the spatial-based domain, Micheli et al. [11] solved the problem of interdependence between nodes through a non-recursive hierarchical architecture based on the message passing mechanism in RecGNNs. Atwood et al. [12] considered graph convolution as a diffusion process, assuming that information propagation between nodes follows a certain propagation probability and stabilizes after several rounds of propagation, and thus proposed diffusion-convolutional neural networks.
The above methods have the following problems: (1) methods using hand-crafted features require a large amount of prior knowledge, and their performance depends on feature construction; (2) methods using random graph sampling suffer from sampling bias; (3) methods using unsupervised learning are inefficient in discovering hidden patterns; (4) none of them uses the spatiotemporal information of check-ins or considers the trajectories of related users.
Therefore, our model extracts the spatiotemporal information of check-ins and fuses related users' features through the feature propagation capability of graph convolutional networks. With end-to-end supervised learning, it has better generalization ability and avoids manual feature construction.

Problem Definition
We use $U$ to denote the user set, and each element $u \in U$ represents a user in the OSN. $L$ denotes the location set, and each element $l \in L$ represents a geographic location. $R$ denotes the social relationship set, and each element $r = (u_i, u_j)$ represents the social relationship between user $u_i$ and user $u_j$. $R'$ denotes the co-occurrence relation set, and each element $r' = (u_i, u_j)$ indicates that users $u_i$ and $u_j$ have co-occurrences.
Definition 3: (Relation graph) The social relationships of users in the OSN are organized as an undirected graph $G = (V, E)$, which is called the social relationship graph, where the node set $V$ is the user set $U$ and the edge set $E$ is the union of the social relations $R$ and the co-occurrence relations $R'$.
Problem 1: (Social relationship link inference) Given a check-in set $C$, a known social relationship set $R$ and any two users $u_i$ and $u_j$, the goal is to determine whether there is a social relationship between the two users.

Methodology
The architecture of SLiGCN is shown in Fig. 2.
The SLiGCN model takes the check-in set $C$, the known social relationship set $R$ and the user pairs $(u_i, u_j)$ to be inferred as input, and outputs whether there are social relationships between the user pairs. It contains three parts: the check-in trajectory feature extraction module, the graph convolutional network module and the social relationship link inference module. The spatiotemporal trajectory features obtained from the check-in trajectory feature extraction module are fused with the neighbors' features by the graph convolutional network module, and finally the social relationship link inference module infers the existence of a social relationship for each user pair. This end-to-end model is able to treat known social relations as labels, back-propagate to update the parameters of each module, learn high-dimensional feature representations, and improve the model accuracy.

Check-in Trajectory Feature Extraction Module
Expressing the historical common arrival locations and spatiotemporal co-occurrences in check-in trajectories requires both spatial and temporal information in sequence. Relying solely on spatial information or disregarding sequence information would lose information.
The check-in trajectory feature extraction module therefore extracts both the spatiotemporal information and the sequence information of check-in trajectories to obtain a spatiotemporal sequence feature representation.
For user $u$, locations in the check-in trajectory $S_u$ are mapped to a low-dimensional space. Specifically, each check-in location $l_u$, after deduplication, is mapped to a contiguous integer index, transformed into a one-hot vector, and sent to the embedding layer to obtain the location embedding $l^*_u$. To extract spatiotemporal information and express co-occurrences, this module adds a temporal dimension. Specifically, each check-in time $t$ is first converted to local time to eliminate the time-zone effect, and the hour of the week is then extracted as the time embedding $t^*$. After these two steps, the location embedding $l^*$ and the time embedding $t^*$ together form the check-in trajectory $S^*$.
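The preprocessing described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the tuple layout `(user, location, utc_timestamp, utc_offset_minutes)` and all function names are assumptions, and the learned embedding layer that follows the integer encoding is omitted.

```python
from datetime import datetime, timedelta, timezone


def build_location_index(check_ins):
    """Map each distinct location to a contiguous integer, in order of first appearance."""
    index = {}
    for _user, location, _ts, _offset in check_ins:
        if location not in index:
            index[location] = len(index)
    return index


def hour_of_week(utc_timestamp, utc_offset_minutes):
    """Shift a UTC timestamp to local time and return the hour of the week (0..167).

    Monday 00:00 local time maps to 0 and Sunday 23:00 maps to 167.
    """
    local = utc_timestamp + timedelta(minutes=utc_offset_minutes)
    return local.weekday() * 24 + local.hour


def encode_trajectory(check_ins, user, location_index):
    """Return one user's check-ins, sorted by time, as (location_id, hour_of_week) pairs."""
    own = sorted((c for c in check_ins if c[0] == user), key=lambda c: c[2])
    return [(location_index[loc], hour_of_week(ts, off)) for _u, loc, ts, off in own]
```

Each `(location_id, hour_of_week)` pair would then be turned into embeddings and passed to the recurrent network in chronological order.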
The spatiotemporal information embedding is then fed into a recurrent neural network to mine the spatiotemporal sequence features. Two common recurrent neural networks are considered, i.e. LSTM [13] and GRU [14]. LSTM consists of an input gate, an output gate and a forget gate, and it passes information and updates the state through the cell state and the hidden state. GRU is a lightweight LSTM that replaces the forget gate and the input gate with an update gate. LSTM is designed to solve the long-term dependence problem of sequences, but it incurs additional computing overhead. Our experiments verified that GRU and LSTM perform essentially the same in our model. Because LSTM requires a longer training time, we choose GRU as our recurrent neural network.
Equations (1), (2) and (3) represent the update gate, the reset gate and the candidate hidden state, respectively. Given the location embedding and time embedding $[l^*_k, t^*_k]$ and the hidden state of the previous step $h_{t-1}$, the hidden state $h_t$ is obtained as shown in Eq. (4), where $z_t$ and $r_t$ are the update gate and the reset gate, respectively, $\sigma$ is the sigmoid function, $\odot$ is the Hadamard product, and $W_z$, $W_r$, $W$ are the learnable parameters of the GRU. The final hidden state $h^{(u)}$ is obtained after the GRU is applied to the check-in trajectory $S^*$.
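For reference, one standard GRU formulation that is consistent with the symbols defined above is sketched below, in the order update gate, reset gate, candidate hidden state, hidden state. It is an assumption and may differ from the authors' exact equations: bias terms are omitted, $x_t$ is introduced here to denote the concatenated embedding $[l^*_k, t^*_k]$ of the check-in processed at step $t$, and some formulations swap the roles of $z_t$ and $1 - z_t$ in the last line.

```latex
\begin{align*}
z_t &= \sigma\left(W_z \cdot [h_{t-1}, x_t]\right) \\
r_t &= \sigma\left(W_r \cdot [h_{t-1}, x_t]\right) \\
\tilde{h}_t &= \tanh\left(W \cdot [r_t \odot h_{t-1}, x_t]\right) \\
h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t
\end{align*}
```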

Graph Convolutional Network Module
In the check-in trajectory feature extraction module, the hidden state $h^{(u)}$ is obtained for each user from the spatiotemporal sequence features that the GRU extracts from check-in trajectories. This hidden state not only characterizes the spatiotemporal information, but also incorporates the sequence information. Hence, it is employed as the initial node feature of the graph convolutional network.
As mentioned earlier, users' travel trajectories are influenced by social relationships. For example, friends will be invited or recommended to visit the same location. Thus, when determining whether there is a social relationship between two users, it is necessary to look into not only the travel trajectories of both users, but also those of their respective friends. Concatenating features during fusion leads to an oversized feature space and varying feature lengths, which makes subsequent inference difficult. Simply adding or subtracting features also loses information. The graph convolutional network is able to perform convolution operations on graphs and fuse the features of neighbor nodes to obtain a higher-level feature representation. Hence, we choose the graph convolutional network for feature fusion.
During the construction of the relation graph $G$, if only the social edges of the known social relationships $R$ are used, feature fusion cannot be applied to users with sparse social edges. Therefore, co-occurrence edges are added to enhance the fusion of neighbor nodes. Specifically, if two users have checked in at the same location within a given time difference, they are considered to have a co-occurrence, and the corresponding co-occurrence edge is added to the relation graph. The relation graph used by the graph convolutional network is formed by the social edges and co-occurrence edges together.
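The edge construction described above can be sketched as follows. The function names, the `(user, location, unix_time)` tuple layout and the inclusive treatment of the time-difference boundary are assumptions made for illustration.

```python
from collections import defaultdict


def co_occurrence_edges(check_ins, max_gap_seconds):
    """Return undirected user pairs that checked in at the same location
    within max_gap_seconds of each other.

    check_ins: iterable of (user, location, unix_time) tuples.
    """
    by_location = defaultdict(list)
    for user, location, ts in check_ins:
        by_location[location].append((ts, user))

    edges = set()
    for visits in by_location.values():
        visits.sort()
        for i, (ts_i, user_i) in enumerate(visits):
            # Scan forward only while later visits are still inside the window.
            for ts_j, user_j in visits[i + 1:]:
                if ts_j - ts_i > max_gap_seconds:
                    break
                if user_i != user_j:
                    edges.add(tuple(sorted((user_i, user_j))))
    return edges


def relation_graph(social_edges, co_edges):
    """Union social edges and co-occurrence edges into one undirected adjacency map."""
    adjacency = defaultdict(set)
    for a, b in set(social_edges) | set(co_edges):
        adjacency[a].add(b)
        adjacency[b].add(a)
    return dict(adjacency)
```

Sorting the visits of each location by time lets the inner loop stop as soon as the gap exceeds the threshold, so popular locations do not require comparing every pair of visits.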
A CNN [15] extracts features by stacking layers, with low-level layers capturing simple features and high-level layers capturing complex features, and a larger kernel gives a larger receptive field. GCN is similar to CNN: it stacks layers, connected by activation functions, to improve its fitting ability and performance. The slight difference is that a one-layer graph convolutional network captures only the features of first-order neighbors, whereas a two-layer network also captures the features of higher-order neighbors. Stacking more layers therefore not only improves the fitting ability of GCN, but also enlarges its receptive field.
The propagation rule of GCN is shown in Eq. (5).
where $\tilde{A} = A + I_N$ is the adjacency matrix with self-connections, $I_N$ is the identity matrix, $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$ is the degree matrix, $W^{(k)}$ is the independently learnable parameter matrix of the $k$-th layer, $\sigma$ is the activation function, $H^{(k)}$ is the feature matrix of the $k$-th layer, and $H^{(0)}$ is the initial feature matrix.
The spatiotemporal sequence features of users obtained by the check-in trajectory feature extraction module are combined into the initial feature matrix $H^{(0)}$. The social edges and co-occurrence edges are combined into the adjacency matrix $\tilde{A}$, and then $H$ is obtained through layer propagation in Eq. (5).
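The symbols defined above match the widely used GCN propagation rule $H^{(k+1)} = \sigma(\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} H^{(k)} W^{(k)})$. The sketch below implements that rule in plain Python on the assumption that this is the form of Eq. (5). The helper names are hypothetical, the input adjacency matrix is assumed to have a zero diagonal, and a practical model would use a tensor library with learned weights.

```python
import math


def normalized_adjacency(adjacency):
    """Return D~^{-1/2} (A + I) D~^{-1/2} for a square 0/1 adjacency matrix."""
    n = len(adjacency)
    a_tilde = [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    degree = [sum(row) for row in a_tilde]
    return [[a_tilde[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
            for i in range(n)]


def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def relu(matrix):
    """Element-wise ReLU."""
    return [[max(0.0, v) for v in row] for row in matrix]


def gcn_layer(a_hat, features, weights, activation=relu):
    """One propagation step: activation(a_hat @ features @ weights)."""
    return activation(matmul(matmul(a_hat, features), weights))
```

On a three-node path graph where only the first node carries a non-zero feature, one layer leaves the third node at zero while a second layer makes it non-zero, which illustrates how stacking layers enlarges the receptive field.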

Social Relationship Link Inference Module
After feature fusion in the graph convolutional network module, social relationship links are inferred from user embeddings in this module. Previous work applied the vector inner product with the sigmoid function for inference. This approach depends on the quality of the extracted embeddings, and because the sigmoid function has a large gradient when the input is close to 0, a small perturbation can affect the model performance. To avoid this, we use a fully connected layer, which offers stronger robustness and fitting ability.
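A minimal sketch of such a fully connected inference head is given below. It assumes that the two user embeddings are combined by subtraction, that a single linear layer produces two logits, and that the second softmax output is read as the link probability. The weight layout and function names are illustrative choices rather than details taken from the model.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def link_probability(h_i, h_j, weights, bias):
    """Probability that a link exists between two users.

    weights has shape (2, d) and bias has shape (2,); class 1 means "link".
    """
    diff = [a - b for a, b in zip(h_i, h_j)]
    logits = [sum(w * x for w, x in zip(row, diff)) + b
              for row, b in zip(weights, bias)]
    return softmax(logits)[1]


def cross_entropy(y_true, y_prob, eps=1e-12):
    """Binary cross-entropy for one pair: -[y log p + (1 - y) log(1 - p)]."""
    p = min(max(y_prob, eps), 1.0 - eps)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1.0 - p))
```

Unlike an inner product followed by a sigmoid, the linear layer can weight each embedding dimension of the difference separately.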
For users $u_i$ and $u_j$, the embeddings obtained from the graph convolutional network are $H_i$ and $H_j$, respectively. The inference method is shown in Eq. (6). The difference of the two embeddings is fed into the fully connected layer, which preserves the per-dimension differences. The inference result $\hat{y}_{ij}$ is obtained by the softmax function. Finally, back-propagation with the cross-entropy loss shown in Eq. (7) is used to update the parameters.

$$L = -\left[ y_{ij} \log \hat{y}_{ij} + (1 - y_{ij}) \log (1 - \hat{y}_{ij}) \right] \tag{7}$$

Datasets
The proposed model is evaluated on the following three real-world datasets.
Gowalla: Gowalla is a location-based social network service that allows users to unlock spot collections or create spot landmarks by checking in through an application or website. Accounts can connect to social platforms like Twitter or Facebook. The dataset was collected by Cho et al. [1].
Brightkite: Brightkite is a location-based social network site that allows users to check in at their current location and find who is nearby or who has been there before. The dataset was collected by Cho et al. [1] through a public API with 58,228 users, 214,078 social relationships and 4,491,143 check-ins between April 2008 and October 2010. The original social network is a directed graph. The collected dataset was converted to an undirected graph by creating bidirectional edges for users with social relationships.
Foursquare: Foursquare is a search and discovery application. The application gives personalized place recommendations near the user's current location based on the user's previous browsing history and check-in history. Foursquare collects location information not only from check-ins, but also from online location-sharing systems based on GPS, cellular and Wi-Fi. The dataset was collected by Yang et al. [16] from 114,324 users, 607,333 social connections and 22,809,624 check-ins between April 2012 and January 2014.
The above datasets have worldwide users and check-ins in various countries. However, since most social relationships exist within the same city, there is no need to perform social relationship link inference on the whole datasets. For this reason, we keep only the users whose check-ins fall within a city with high frequency. Specifically, the boundaries of the city are first obtained to mark the check-ins that fall within the city. Then users whose marked check-ins account for less than 10% of their total number of check-ins are eliminated. These users are considered not to reside in this city and are not included in model training.
Besides, to ensure the reliability of the results, users with fewer than one friend or fewer than 10 check-ins in the dataset are removed. The check-in data of the eliminated users are also excluded. This data subset generation method has been frequently used in previous works [16,17]. The statistics of the three preprocessed datasets are shown in Table 1. We randomly split the social relationships into a training set, a validation set and a test set, accounting for 80%, 10% and 10%, respectively, to validate our model. In addition, the number of positive records (friends) in social relationships is much smaller than the number of negative records (strangers), so using all negative records for model training would skew the data and consume a lot of training time. Therefore, we randomly sample negative records so that their number matches the number of positive records.
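The balancing and splitting steps above can be sketched as follows. This is an illustration under assumptions: the function names are hypothetical, candidate negative pairs are enumerated exhaustively (which is only practical for small user sets), and whether negatives are drawn before or after the split is a choice the text does not fix.

```python
import random
from itertools import combinations


def sample_negative_pairs(users, positive_pairs, rng):
    """Draw as many non-friend pairs as there are friend pairs.

    Raises ValueError if fewer non-friend pairs exist than friend pairs.
    """
    positives = {tuple(sorted(p)) for p in positive_pairs}
    candidates = [p for p in combinations(sorted(users), 2) if p not in positives]
    return rng.sample(candidates, len(positives))


def split_80_10_10(pairs, rng):
    """Shuffle pairs and split them into train, validation and test sets."""
    shuffled = list(pairs)
    rng.shuffle(shuffled)
    n_train = int(0.8 * len(shuffled))
    n_val = int(0.1 * len(shuffled))
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])
```

For datasets with many users, rejection sampling of random pairs would avoid materializing all candidate pairs.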

Evaluation Metrics
The proposed model learns vector representations from check-in trajectories and the graph convolutional network. The user vector pairs are fed into the social relationship link inference module to obtain a score for the likelihood that a social relationship exists. AUC, which measures how well these scores rank friends above strangers, is therefore used to evaluate the proposed model's performance.
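AUC can be read as the probability that a randomly chosen positive pair receives a higher score than a randomly chosen negative pair. The sketch below computes it directly from that definition. It is a quadratic-time illustration with a hypothetical function name, not the evaluation code used in the experiments.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

For example, with labels `[1, 1, 0, 0]` and scores `[0.9, 0.4, 0.5, 0.1]`, three of the four positive-negative pairs are ranked correctly, so the AUC is 0.75.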

Experiment Setup
The length of the location embedding is 64, and the GRU has 128 hidden units. The graph convolutional network module uses two layers with 128 and 256 hidden units, the inter-layer activation function is ReLU, and the co-occurrence time difference is set to 1 h. The proposed model is trained using the Adam optimizer with the learning rate set to 1e-3. The maximum number of training rounds is 100.

Baselines
DeepWalk [3]: Node paths are first generated by random sampling on the graph and subsequently fed into the skip-gram model. Node embeddings are obtained by maximizing the probability distribution through hierarchical softmax, and similarity comparison is used for inference.
Walk2friends [18]: Users and locations are organized as a user-location bipartite graph from check-in trajectories, then a skip-gram based graph embedding method with random graph sampling is applied to obtain user embeddings. Finally, user embeddings are used for inference with similarity comparison.
Metapath2vec [19]: It learns node embeddings on heterogeneous graphs through meta-path based random sampling as well as skip-gram, and uses embedding similarity for inference at last.
Heter-GCN [9]: A user-location heterogeneous graph is built from user check-in trajectories with manually added user-user, user-location and location-location edges, which represent frequently co-occurring user pairs, locations frequently visited by a user, and locations that are popularly visited in succession, respectively. The training process contains two steps. The trajectory is first segmented by a skip-gram-like method to extract embeddings using supervised learning. Then user embeddings are updated through an unsupervised graph convolutional network, and inference is performed with similarity comparison.
SLiGCN-1: It removes the graph convolutional network module from SLiGCN to verify the effect of graph convolutional network module. Check-in trajectories features are extracted and then directly fed into the social relationship link inference module to infer whether they have social relationship.
SLiGCN-2: GraphSAGE [20] is applied in SLiGCN-2 instead of graph convolution to compare two feature fusion methods, graph convolution and graph sampling. GraphSAGE is a framework for inductive representation learning on large graphs. It samples node features in the local neighborhood of each node and then learns how to aggregate the information as it is passed through the GNN layers.
SLiGCN-3: Graph attention network [21] is a combination of a graph neural network and attention mechanism. Attention aids in extracting only useful information from huge amounts of data. Graph attention network is used in SLiGCN-3 to replace graph convolution network for feature fusion.

Results and Discussion
We performed several experiments on the three datasets for each model and took the best run as the performance of that model on that dataset. The results are shown in Table 2. SLiGCN outperforms the baseline models on all three datasets owing to its use of spatiotemporal features and the graph convolutional network, which fuses the trajectory features of related users.
DeepWalk, Walk2friends and Metapath2vec all employ feature similarity for inference after random graph sampling and differ only in their sampling strategies and training methods. Thus, the difference in performance among the three is not significant. Metapath2vec outperforms the other two models on the Brightkite and Foursquare datasets, while it performs worse than Walk2friends on the Gowalla dataset. This is because the proportion of the two types of heterogeneous nodes, i.e. user nodes and check-in nodes, in the Gowalla dataset is much lower than in the other two datasets, so Metapath2vec cannot take advantage of random sampling on the heterogeneous graph.
Heter-GCN outperforms DeepWalk, Walk2friends and Metapath2vec on all three datasets because it employs graph convolutional network to fuse the features of user nodes and location nodes on the heterogeneous graph. However, it ignores the temporal information of check-in trajectories and does not extract spatiotemporal features. In addition, it applies a two-step training process, i.e. using supervised learning to extract features and unsupervised learning to do feature fusion, resulting in a poor fitting ability. Therefore Heter-GCN is less effective than SLiGCN-1 as well as SLiGCN.
From the result comparison between SLiGCN-1 and Heter-GCN, it is found that the spatiotemporal features of check-in trajectories can significantly enhance the performance of social relationship link inference.
SLiGCN-2 outperforms SLiGCN-1 owing to feature fusion with GraphSAGE, but sampling bias makes it less effective than SLiGCN. SLiGCN-3 uses a graph attention network for feature fusion instead of the graph convolutional network, but its performance is not satisfactory. This is because the attention mechanism provides stronger fitting ability than the graph convolution used in SLiGCN, while the check-in trajectory features and graph structures of users are quite diverse, which leads to overfitting.
Finally, the graph convolutional network is added to do feature fusion on the homogeneous graph. SLiGCN outperforms SLiGCN-1, SLiGCN-2 and SLiGCN-3 on all three datasets. It is worth mentioning that since the Brightkite

More Discussion
Effect of the co-occurrence time difference: To prevent users with sparse social edges from failing feature fusion, we add co-occurrence edges between user pairs that co-occur within a certain time difference. We performed several experiments on the three datasets with time differences of 1, 2, 5, 8, 12 and 24 h and took the best result for each value as the experimental result. As shown in Fig. 3, on the Gowalla and Foursquare datasets, the model performs best at 1 h, deteriorates between 2 and 8 h, and then stabilizes. On the Brightkite dataset, the model performs best at 2 h, deteriorates between 2 and 8 h, and then stabilizes. In general, co-occurrences with shorter time differences are more likely to happen between friends, whereas those with larger time differences may happen accidentally between strangers. We therefore choose 1 h, which gives the best average performance, as the co-occurrence time difference.

Conclusion
To address the problems and limitations in previous works on social relationship link inference, this paper developed a social relationship link inference model based on graph convolutional networks, namely, SLiGCN, which mines spatiotemporal features with check-in trajectory data, then fuses neighbor node features through graph convolutional network, and finally feeds into the social relationship link inference module composed of fully connected network to obtain inference results. The proposed model extracted high-dimensional spatiotemporal features directly from the original time-series data and used known social relationships for supervised back-propagation training, which can accurately infer the existence of social relationships among users. It outperformed other comparable models while avoiding hand-crafted features.