Abstract
The available spatial data are rapidly growing and diversifying. One may obtain large quantities of information such as annotated points/places of interest (POIs), check-in comments on those POIs, geotagged microblog comments, and demarcated regions of interest (ROIs). All these sources interplay with each other and together build a more complete picture of the spatial and social dynamics at play in a region. However, building a single fused representation of these data entries has so far been rudimentary, for example, allowing spatial joins. In this paper, we extend the concept of semantic embedding for POIs and devise the first semantic embedding of ROIs, in particular one that captures both their spatial and their semantic components. To accomplish this, we develop a multipartite network model that captures the relationships between the diverse components and, through random-walk-based approaches, use it to embed the ROIs. Through extensive experiments, we demonstrate the effectiveness of this embedding at simultaneously capturing both the spatial and semantic relationships between ROIs. Applications such as popularity region prediction demonstrate the benefit of using ROI embeddings as features in comparison with baselines.
1 Introduction
In the last decade, location-based social networks (LBSNs) such as Facebook, Instagram, Foursquare, and Twitter have attracted billions of users, who can check in at points of interest (POIs) and promptly share their experiences in the physical world via mobile devices. It is crucial for such service providers to leverage the data they collect to make personalized recommendations that help their users explore new places and to facilitate targeted advertising for generating revenue [3, 8, 26]. Recent literature suggests that distributed representations (embeddings) of POIs can further improve the results [18, 43, 47, 56]. Note that a POI is a single point/place on the map of Earth (e.g., New York Stock Exchange, New York).
Recently, interest in studying regions of interest (ROIs) [45] has been rising [39], where the social dynamics occurring at the POIs located in a particular region are considered as a whole. Picturing the semantic and spatial features of different regions, intertwined with people’s activities, can yield important information such as functional behavior, distinctive features, and social effects, which can be further utilized in urban planning and region-level recommendation.
An example application is shown in Fig. 1: ROIs 02000000 (blue) and 08000005 (green) are semantically as well as spatially correlated with ROI 09000000 (yellow) in Manhattan, New York City. The semantic category information of ROI 09000000 is also presented in Fig. 1, where Outdoors and Recreation, College and Education, Nightlife and Pubs, Travel and Transport, and Professional Services are the top five categories based on the cosine similarity metric. A careful look at the map reveals that ROI 09000000 contains the Statue of Liberty, Ellis Island, Battery Park, and the World Trade Center, which have been visited by more than 3.5 million visitors on average over the last 5 years [30]; this is a major reason for Outdoors and Recreation being the top category. New York University, The King’s College, Pace University, etc., are also located within the region, which explains College and Education in second place. The next three categories are intuitive, since Lower Manhattan is a hub of popular old pubs, financial offices, hotels, and a well-connected subway, transport, and ferry system. Although ROI 08000005 is geospatially distant from ROI 09000000, the two are semantically similar in terms of Outdoors and Recreation, College and Education, and Travel and Transport because of Central Park, the New York University Midtown Campus, Pace University, Grand Central Terminal, and major subway connections, respectively.
An effective approach to capturing both semantic and spatial features at the same time is to embed them in a latent semantic space, as elaborated by [43, 47, 48] for POIs. Hence, embedding ROIs with semantic features would also be an effective method for ROI analysis. Nevertheless, existing solutions only consider the semantic embedding of POIs, not of ROIs. A naive way to extend POI semantic embedding to ROIs is to aggregate the features of all POIs inside an ROI and then treat that ROI simply as an aggregated POI. However, as also verified by our experiments, this approach is not effective at capturing spatial and semantic information simultaneously, because simple aggregation loses interesting correlations between the two. We reduce the ROI embedding problem to a tripartite graph embedding problem with entities (a) ROIs, (b) POIs, and (c) words, whose goal is to minimize the difference between the probability distribution of the embedded entities in latent space and that of the information graph network, based on its edge connections. The ROI embedding model facilitates online analysis and discovery of the (dis)similarities between any pair of ROIs from the perspective of human understanding.
To further motivate why ROI embedding is needed, consider the advantages of using embeddings over raw information or semantic keyword-based search. First, computationally efficient embeddings are generic, aggregated latent features that can easily be integrated into downstream tasks. Second, to comply with data retention policies and maintain security standards, it is essential to limit access to raw information and move toward a generic and lossy embedding. Third, semantic ROI embeddings provide measurable techniques to characterize a region and can account for its change over time.
Application-wise, incorporating ROI embeddings as features for ad services can have a significant impact, as localized crowd engagement/activity in neighborhoods can promote economic growth. ROI features are another step toward improving localized search results. Another far-reaching application of ROI features is vacation home rental recommendation based on users’ neighborhood preferences. Semantic embedding of ROIs also enables users to filter listings by scores on categories such as Travel and Transport, Shops and Services, Arts and Entertainment, Schools, or Nightlife to find listings with neighborhood information.
The main challenges of ROI semantic embedding, compared with POI semantic embedding, are as follows:

1. Geographic influence: Recent studies on POI embedding can effectively classify POI categories and use them as features for prediction and recommendation applications. However, evaluating the influence of POIs on their neighborhood region is challenging and has not yet been addressed in the literature.

2. Capturing social effect: Social responses from microblog sites are highly dynamic and often capture the popularity of places in a region. Discovering spatial features from social behavior is complex and involves significant challenges.

3. Semantic challenges: Leveraging the textual information associated with places and regions to obtain semantic features is a nontrivial task. We model the problem as a tripartite graph network embedding to learn ROI embeddings.

4. Data challenges: It is difficult to obtain a large, open dataset of POIs with textual information from location-based social networks. Currently available public datasets are either geographically sparse or not suitable for our problem statement. We resort to scraping and crawling to create appropriate datasets for our investigation.
We summarize the contributions of this paper.

We formulate region of interest (ROI) semantic embedding, which simultaneously embeds ROIs into semantic space and spatial space. That is, an ROI is embedded near other ROIs in latent space that are both semantically close, in terms of the dominant places of interest and the social discussion within those regions, and geospatially close, i.e., neighboring or overlapping ROIs.

We propose tripartite network embedding (TNE) to learn a low-dimensional representation of ROIs. For property-preserving embedding, TNE introduces (a) Mergeable Indirect Graphs: TNE creates more informative homogeneous graphs that preserve transitive relations and proposes a compatibility test for merging multiple homogeneous graphs; (b) Community-aware Random Walks: TNE alleviates the moderately connected community problem in graphs; and (c) Heterogeneous Negative Sampling: TNE proposes noise distributions on heterogeneous graphs that enhance learnability. TNE is easily extendable to multipartite network embedding problems.

We introduce a semantic category annotation for ROIs that identifies the feature similarities of ROIs with predefined categories for semantic understanding. It also helps us evaluate ROI embeddings in our experiments.

We present extensive experiments on real-world datasets to show the qualitative advantage of ROI embedding with TNE. We compare TNE with state-of-the-art baselines to justify our embedding process from both spatial and semantic perspectives.
The rest of the paper is organized as follows: Section 2 presents our problem formulation and baseline approaches inspired by the state-of-the-art literature. In Sect. 3, we present our model TNE (tripartite network embedding), followed by experiments in Sect. 4 and related work in Sect. 5; Sect. 6 concludes the paper.
2 Preliminaries
This section introduces the problem formulation with the necessary definitions and notation used in the paper. We then formally state the semantic ROI embedding problems and describe our information graph network. Finally, we list a few baseline approaches to compare with our tripartite network embedding model, TNE.
2.1 Problem formulation
Assume we have three sets of data: points of interest (POIs), regions of interest (ROIs), and geotagged documents. We define each next.
A region of interest (ROI), denoted r, is an area on the map of Earth demarcated by a circle, rectangle, or polygon, e.g., ROI 09000000 in Fig. 1. An ROI r = (id, geofeatures, name, properties) is a tuple of an identifier, polygonal geofeatures, a name, and optional properties such as state and country. ROIs are stored as GeoJSON [17]. We represent a set of ROIs as \(R =\{r_1,r_2,\ldots ,r_{|R|}\}\).
A point of interest (POI), denoted p, is a specific point location on the map of Earth, e.g., Empire State Building, New York. A POI p = (id, coord, name, properties) is a tuple of an identifier, a latitude–longitude geocoordinate, a name, and optional properties such as keywords, description, address, and category. It is also stored as a GeoJSON object. A set of POIs is represented as \(P =\{p_1,p_2,\ldots ,p_{|P|}\}\).
A geotagged document, denoted d, is a textual record associated with a geolocation either by origin or by reference, e.g., check-in comments, reviews, and microblogs. A geotagged document d = (id, text, coord, properties) is a tuple of an identifier, text, a latitude–longitude geocoordinate, and optional properties such as timestamp and user information. We are mainly interested in two types of geotagged documents: (a) microblogs and (b) social reviews. Microblog documents associated with an ROI r are denoted \(D_r\), and social review documents associated with a POI p are denoted \(D_p\). Documents are associated with POIs and ROIs based on their geotagged locations. We define the set of all geotagged documents as \(D = D_R \cup D_P\), where \(D_P = \bigcup _{p\in P} D_p\) and \(D_R = \bigcup _{r\in R} D_r\). Words from D form the vocabulary \(W = \{w:w \in d.text, d \in D \}\).
We capture relations among multiple entities (i.e., POI, ROI, and Words) through the information graph described in Sect. 2.3. In Table 1, we summarize all the notations used in this paper.
2.2 Problem statements
Problem 1
(Semantic Embedding of ROI) Given a set of ROIs R, a set of POIs P, an associated set of geotagged documents D, and an embedding dimension n, the goal of semantic embedding of ROIs is to embed each ROI \(r \in R\) as a vector \({\vec {{\varvec{r}}}} \in {\mathbb {R}}^n\) such that the cosine distance between \(\vec {{\varvec{r_i}}}\) and \(\vec {{\varvec{r_j}}}\) captures the similarity of \(r_i\) and \(r_j\) in both spatial and semantic aspects.
The objective of ROI semantic embedding is to capture geographic information and the crowd’s semantic perspective on the region. If an ROI stands out in some semantic feature, such as recreational activities, offices and services, residential use, or any combination of activities, the embedding must capture it. We introduce an application of ROI embedding, Problem 2: Semantic Category Annotation, for evaluating ROI embeddings.
Problem 2
(Semantic Category Annotation of ROI) Given a semantic embedding \({\vec {{\varvec{r}}}}\) of ROI r and a set of categories \({\mathcal {C}}=\{c_1,\ldots , c_{|{\mathcal {C}}|}\}\), where each category \(c_i=\{w_1,w_2,\ldots \}\) is represented by a set of words, the goal is to annotate ROI r with semantic categories \( \varvec{Sem_{r}} = \{c_i:score_i \mid c_i \in {\mathcal {C}}\}\) with corresponding similarity scores.
The aim of Problem 2 is to semantically annotate any ROI r from its generated semantic embedding \({\vec {{\varvec{r}}}}\). Word representations in semantic space capture meaning through context, placing synonyms and related words in close proximity. First, we propose a systematic approach to defining semantic categories that adheres to the categories defined in Table 2. In our model, we describe a category c with a set of words \(\{w_1,w_2,\ldots ,w_k\}\) that captures meaningful information about that category. For example, the category Travel and Transport is described with the words travel, trip, station, train, ferry, car, airport, pier, etc. We take the normalized average of these word vectors (each word is represented by a vector obtained via a word embedding process, e.g., Word2Vec) to represent the category, which we dub the semantic category vector \(\vec {{\varvec{c}}}\). The cosine similarity between a semantic category vector \(\vec {{\varvec{c}}}\) and an ROI vector \({\vec {{\varvec{r}}}}\), i.e., the normalized dot product \(\langle {\vec {{\varvec{r}}}},\vec {{\varvec{c}}}\rangle \), determines the closeness of the ROI to that semantic category. The goal of this study is to determine how well we can annotate an ROI with semantic categories \({\mathcal {C}}=\{\vec {{\varvec{c_1}}},\ldots ,\vec {{\varvec{c_9}}}\}\) and whether the annotation adheres to real-world scenarios. An example of ROI semantic category annotation is given in the bottom corner of Fig. 1.
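The annotation step above can be sketched in a few lines. This is a minimal illustration, assuming that ROI and word vectors live in the same embedding space; the toy 3-d vectors, the category word lists, and the helper names `category_vector` and `annotate` are hypothetical and not taken from the paper.

```python
import numpy as np

def category_vector(words, word_vecs):
    """Semantic category vector: normalized average of the vectors of its describing words."""
    vecs = [word_vecs[w] for w in words if w in word_vecs]  # skip out-of-vocabulary words
    if not vecs:
        raise ValueError("no category word has an embedding")
    mean = np.mean(vecs, axis=0)
    return mean / np.linalg.norm(mean)

def annotate(roi_vec, categories, word_vecs):
    """Sem_r: {category: cosine similarity with the ROI vector}, highest score first."""
    r = roi_vec / np.linalg.norm(roi_vec)
    scores = {name: float(r @ category_vector(words, word_vecs))
              for name, words in categories.items()}
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))

# Hypothetical toy word vectors; real ones would come from the learned embedding.
word_vecs = {
    "train":   np.array([1.0, 0.1, 0.0]),
    "ferry":   np.array([0.9, 0.0, 0.1]),
    "airport": np.array([0.8, 0.2, 0.0]),
    "park":    np.array([0.0, 1.0, 0.1]),
    "garden":  np.array([0.1, 0.9, 0.0]),
}
categories = {
    "Travel and Transport": ["train", "ferry", "airport"],
    "Outdoors and Recreation": ["park", "garden"],
}
roi_vec = np.array([0.2, 0.95, 0.05])   # hypothetical ROI embedding
sem_r = annotate(roi_vec, categories, word_vecs)
print(sem_r)
```

Normalizing the category vector once means the score is a plain dot product with the unit ROI vector, which keeps scoring many ROIs against a fixed set of categories cheap.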
2.3 Information graph network
We define an information graph network \({{\,\mathrm{\mathcal {G}}\,}}=(G_{rp},G_{rw},G_{pw}, G_r,G_w)\), a combination of graphs over POI, ROI, and word entities that captures spatial and semantic information, as illustrated in Fig. 2. Note that the vocabulary of semantic words \(W =\{w : w \in d.text, d \in D \}\) comes from the geotagged documents.
In our model, the information graph \({{\,\mathrm{\mathcal {G}}\,}}\) is formed of multiple subgraphs of two types: heterogeneous (bipartite) subgraphs and homogeneous subgraphs. We define the three bipartite subgraphs and two homogeneous subgraphs as follows.
Definition 1
(ROI–POI Bipartite Graph: \(G_{rp}\)) An ROI–POI graph, denoted as \(G_{rp} = (R \cup P, E_{rp})\), is a bipartite graph with edges \(E_{rp}\). An edge \(\{ e=(r_i,p_j) \in E_{rp}\}\) exists iff \(p_j\) is located within \(r_i\), and the weight of edge is \(\omega (r_i,p_j)=1\).
Definition 2
(ROI–Word Bipartite Graph: \(G_{rw}\)) An ROI–word graph, denoted \(G_{rw} = (R \cup W, E_{rw})\), is a bipartite graph with edges \(E_{rw}\). An edge \(\{ e=(r_i,w_j) \in E_{rw}\}\) exists iff \(w_j\) is mentioned in any document of \(D_{r_i}\), and the edge weight \(\varpi (r_i,w_j)\) is calculated with tf-idf scores.
Definition 3
(POI–Word Bipartite Graph: \(G_{pw}\)) A POI–word graph, denoted \(G_{pw} = (P \cup W, E_{pw})\), is a bipartite graph with edges \(E_{pw}\). An edge \(\{ e=(p_i,w_j) \in E_{pw}\}\) exists iff \(w_j\) is mentioned in any document of \(D_{p_i}\), and the edge weight \(\varpi (p_i,w_j)\) is calculated with tf-idf scores.
Definition 4
(ROI Graph: \(G_{r}\)) An ROI graph, denoted \(G_r = (R, E_r)\), is a homogeneous graph network of ROIs where an edge \(e \in E_r\) between two ROIs denotes that they spatially overlap or are neighboring regions.
Definition 5
(Word Graph: \(G_{w}\)) A word graph, denoted \(G_w = (W, E_w)\), is a homogeneous graph network of words where an edge \(e \in E_w\) between two words signifies their co-occurrence in geotagged documents.
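The five subgraphs of Definitions 1–5 can be illustrated on toy data. The sketch below is a simplification under stated assumptions, not the paper’s pipeline: ROIs are axis-aligned rectangles rather than GeoJSON polygons, "neighboring" means the rectangles overlap or share a border, all documents of an entity are concatenated into one text for tf-idf, the idf is smoothed (as in scikit-learn’s `smooth_idf`) so every mentioned word keeps a positive weight, and co-occurrence is counted per document. All function names and data are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

def tokenize(text):
    return text.lower().split()   # naive whitespace tokenizer, no stop-word removal

def tfidf_edges(entity_docs):
    """(entity, word) -> tf-idf weight; all documents of one entity are treated as one text."""
    tokens = {e: [w for d in docs for w in tokenize(d)] for e, docs in entity_docs.items()}
    n = len(tokens)
    df = Counter(w for toks in tokens.values() for w in set(toks))
    edges = {}
    for e, toks in tokens.items():
        for w, c in Counter(toks).items():
            idf = math.log((1 + n) / (1 + df[w])) + 1   # smoothed idf keeps every weight > 0
            edges[(e, w)] = (c / len(toks)) * idf
    return edges

def contains(box, pt):
    """Point-in-rectangle test; box = (x_min, y_min, x_max, y_max)."""
    x0, y0, x1, y1 = box
    return x0 <= pt[0] <= x1 and y0 <= pt[1] <= y1

def touch_or_overlap(a, b):
    """True if two rectangles overlap or share a border (treated as neighbors)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def build_information_graph(rois, pois, roi_docs, poi_docs):
    E_rp = {(r, p): 1.0 for r, box in rois.items()
            for p, pt in pois.items() if contains(box, pt)}          # G_rp, unit weights
    E_rw = tfidf_edges(roi_docs)                                     # G_rw
    E_pw = tfidf_edges(poi_docs)                                     # G_pw
    E_r = {frozenset((a, b)) for a, b in combinations(rois, 2)
           if touch_or_overlap(rois[a], rois[b])}                    # G_r
    E_w = Counter()                                                  # G_w: co-occurrence counts
    for docs in list(roi_docs.values()) + list(poi_docs.values()):
        for d in docs:
            for a, b in combinations(sorted(set(tokenize(d))), 2):
                E_w[(a, b)] += 1
    return E_rp, E_rw, E_pw, E_r, E_w

# Hypothetical toy data: ROIs as bounding boxes, POIs as (x, y) points.
rois = {"r1": (0, 0, 2, 2), "r2": (2, 0, 4, 2), "r3": (10, 10, 11, 11)}
pois = {"p1": (1, 1), "p2": (3, 1), "p3": (10.5, 10.5)}
roi_docs = {"r1": ["ferry to the park"], "r2": ["park and pubs"], "r3": ["train station"]}
poi_docs = {"p1": ["ferry pier"], "p2": ["pubs at night"], "p3": ["train"]}
E_rp, E_rw, E_pw, E_r, E_w = build_information_graph(rois, pois, roi_docs, poi_docs)
```

In this toy example, "ferry" receives a higher weight than "park" for r1 because it is mentioned in fewer ROIs, which is the behavior the tf-idf weights are meant to provide.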
2.4 Baseline approaches
State-of-the-art methods are not tailored for multipartite embedding. Hence, we extend state-of-the-art network embedding methods to form multiple comparable baselines, against which we compare our TNE model in the experiments of Sect. 4. The last baseline, TNE_nw, uses the TNE model but is trained on a non-weighted version of the information graph. We use this baseline to show the importance of edge weights in our model.

1.
GE_poi (POI Aggregation): The work of Xie et al. [47] produces state-of-the-art POI embeddings \(\{{\vec {{\varvec{p}}}} : p\in P\}\) for POI recommendation. Although their objective differs from ours, we matched their technical model to our information graph network \({{\,\mathrm{\mathcal {G}}\,}}{\setminus } (G_{rw},G_r) \) to generate POI embeddings for a fair comparison. Based on the edges of graph \(G_{rp}\), we aggregate the embeddings \({\vec {{\varvec{p}}}}\) of POIs in the same ROI via normalized vector summation to obtain the ROI embedding vector \({\vec {{\varvec{r}}}}\). This baseline also shows what happens if equal importance is given to all POIs in a region. Mathematically, \({\vec {{\varvec{r}}}}\) is calculated as:
$$\begin{aligned} {\vec {{\varvec{r}}}} = \frac{\sum _{p:\,(r,p) \in E_{rp}} {\vec {{\varvec{p}}}}}{\Big \Vert \sum _{p:\,(r,p) \in E_{rp}} {\vec {{\varvec{p}}}} \Big \Vert }, \quad r \in R \end{aligned}$$
This approach is expected to perform well at POI embedding and at capturing basic (non-weighted) semantic relations among ROIs. Since GE_poi does not model crowd engagement within ROIs, it is expected to solve the problem only partially.

2.
CrossMap (Region–Word graphs): This approach closely follows the CrossMap work of Zhang et al. [54] for popular event exploration. Inspired by their model, we leverage only the information from social engagements and their relations with ROIs to generate ROI embeddings. From our information graph \({{\,\mathrm{\mathcal {G}}\,}}\), we use \(G_{rw},G_r,G_w\) to generate ROI embeddings for this baseline. CrossMap captures crowd engagement with ROIs, but neither the geospatial correlation nor the POI effects are considered for ROIs in this approach.

3.
BiNE (Multiple Bipartite Networks): BiNE [16] is a method proposed for learning vertex representation in bipartite graphs. We treat this as another baseline for learning ROI embedding from bipartite graphs \(G_{rp},G_{pw}\) and \(G_{rw}\). It will be interesting to see if this baseline can capture the spatial affinity and semantic relation of ROIs. We expect BiNE to fail in capturing geospatial correlation as transitivity property is not incorporated in this approach. In the related work (i.e., Sect. 5), we explicate the rationale of using BiNE as another stateoftheart baseline model for comparisons.

4.
TNE_wcr (TNE without Community Random Walk): This version of our model TNE does not take advantage of our communityaware random walk strategy and uses the traditional random walk strategy. Including this baseline model in our experiments helps recognize the impact of incorporating the communityaware random walk in TNE.

5.
TNE_nw (Non-weighted TNE): This version of our model TNE does not use tf-idf weights over the \(G_{pw}\) graph for measuring the popularity of POIs. This baseline demonstrates the modeling advantage of these weights, in comparison with Jenkins et al. [22], which does not use such weights, among other differences.
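The GE_poi aggregation formula can be sketched directly. The snippet below assumes POI embeddings are already available (here hypothetical 2-d vectors) and that \(E_{rp}\) is a list of (ROI, POI) pairs; the function name `aggregate_roi_embeddings` is illustrative only.

```python
import numpy as np

def aggregate_roi_embeddings(poi_vecs, E_rp):
    """GE_poi: each ROI vector is the unit-normalized sum of the vectors of its POIs."""
    sums = {}
    for r, p in E_rp:                       # one (ROI, POI) pair per containment edge
        sums[r] = sums.get(r, 0.0) + poi_vecs[p]
    roi_vecs = {}
    for r, s in sums.items():
        norm = np.linalg.norm(s)
        if norm > 0:                        # skip degenerate ROIs whose POI vectors cancel out
            roi_vecs[r] = s / norm
    return roi_vecs

# Hypothetical POI embeddings and ROI-POI edges.
poi_vecs = {"p1": np.array([1.0, 0.0]), "p2": np.array([0.0, 1.0]), "p3": np.array([3.0, 4.0])}
E_rp = [("r1", "p1"), ("r1", "p2"), ("r2", "p3")]
roi_vecs = aggregate_roi_embeddings(poi_vecs, E_rp)
```

Every POI contributes with the same weight, which is exactly the equal-importance assumption this baseline is meant to expose; ROIs without any POI receive no vector at all.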
3 TNE: tripartite network embedding
In this section, we present our approach TNE, tripartite graph network representation learning, which can be generalized to a multipartite network embedding model. The primary focus of TNE is learning ROI embeddings, i.e., Problem 1; Problem 2 is an application of the former. Our network embedding model TNE is (a) a microscopic structure-preserving network embedding, (b) transitive property-preserving, and (c) community-aware. We explain each of these features as we unravel our model step by step.
3.1 Direct relation models
Relationships among vertices that are directly visible from the edge set of the information network are captured by what we call direct relation models. We classify direct relation models by the types of vertices that the edges connect: (a) heterogeneous relation models and (b) homogeneous relation models.
3.1.1 Direct heterogeneous relation model
The basic building block of any multipartite/tripartite network is the bipartite network, which represents relationships between two sets of vertices of different entity types. Our tripartite information network \({{\,\mathrm{\mathcal {G}}\,}}\) contains three bipartite networks: \(G_{rp}, G_{rw}, G_{pw}\). In our model, a bipartite graph network is a heterogeneous vertex network representing direct or first-order relations, which we dub the direct heterogeneous relation model.
In any structure-preserving network embedding, it is desirable that the closeness between two well-connected vertices be high. Even if the connected vertices are different in nature (e.g., a POI and a word in \(G_{pw}\)), their proximity in the network is direct relational information that must be preserved in the embedding. For the sake of exposition, consider a bipartite network \(G_{uv}=(U \cup V, E_{uv})\), where \(U =\{u_1,u_2,\ldots ,u_{|U|}\}\) and \(V=\{v_1,v_2,\ldots ,v_{|V|}\}\) are two sets of vertices of different types, and \(E_{uv} \subset U \times V\) is the edge set. Also, let the embedding representations of vertices \(u_i\) and \(v_j\) be \(\vec {{\varvec{u_i}}} \in {\mathbb {R}}^n\) and \(\vec {{\varvec{v_j}}} \in {\mathbb {R}}^n\), respectively. In our model, we consider a Euclidean embedding space, where we define the closeness between any two vertices \(u_i\) and \(v_j\) as the conditional probability \({\overline{\Pr }}(v_j \mid u_i)\).
The existing literature, including the pioneering embedding work of word2vec [28], shows the importance of using the inner product as a similarity measure and transforming it into probability space with the sigmoid function. The microscopic structure of network connections is captured by the conditional probability between vertices.
where \(\varpi \) is the edge weight function, i.e., \(\varpi (u_i,v_j)\) is the weight of edge \(e_{u_iv_j} \in E_{uv}\), and \(deg_{u_i} = \sum _{e_{u_iv_k} \in E_{uv}} \varpi (u_i,v_k)\).
The objective of the model is to learn the embedding vectors by minimizing the difference between the empirical and the model pairwise distributions.
where \(D_{KL}\) is the KL divergence, which measures the difference between probability distributions. The term \(\sum \Pr (v_j \mid u_i) \log {\Pr (v_j \mid u_i)}\) in Eq. 3 is an information entropy term that depends only on the edge weights, i.e., on the function \(\varpi \), and is therefore constant with respect to the embedding vectors. From the final expression, we obtain all the variables of the optimization function, i.e., the vectors \(\vec {{\varvec{u_i}}},\vec {{\varvec{v_j}}}\) from \({\overline{\Pr }}(\cdot )\).
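Because the equations themselves are not reproduced here, the following sketch rests on an explicit assumption: that, after dropping the constant entropy term, the objective takes the familiar LINE-style weighted form \(O_{uv} = -\sum _{(u_i,v_j) \in E_{uv}} \varpi (u_i,v_j) \log \sigma (\vec {{\varvec{u_i}}} \cdot \vec {{\varvec{v_j}}})\), with \({\overline{\Pr }}(v_j \mid u_i)\) taken as the sigmoid of the inner product, as the text above suggests. It shows only the shape of the gradient update on a hypothetical toy graph; TNE’s heterogeneous negative sampling is deliberately omitted, so this is not a complete training procedure.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def first_order_loss(U, V, edges):
    """Assumed objective: -sum over edges of w * log sigmoid(u_i . v_j)."""
    return -sum(w * np.log(sigmoid(U[i] @ V[j])) for i, j, w in edges)

def sgd_pass(U, V, edges, lr=0.1):
    """One pass of per-edge gradient descent; d/dx[-log sigmoid(x)] = -(1 - sigmoid(x))."""
    for i, j, w in edges:
        g = w * (1.0 - sigmoid(U[i] @ V[j]))   # step size along the descent direction
        ui = U[i].copy()                       # use the pre-update u_i for the v_j step
        U[i] += lr * g * V[j]
        V[j] += lr * g * ui

rng = np.random.default_rng(0)
U = rng.normal(scale=0.1, size=(3, 8))   # e.g., 3 POI vectors (hypothetical)
V = rng.normal(scale=0.1, size=(4, 8))   # e.g., 4 word vectors (hypothetical)
edges = [(0, 0, 0.5), (0, 1, 0.2), (1, 2, 0.9), (2, 3, 0.4)]  # (u index, v index, tf-idf weight)
before = first_order_loss(U, V, edges)
for _ in range(50):
    sgd_pass(U, V, edges)
after = first_order_loss(U, V, edges)
```

Without negative samples this objective is minimized by pushing all connected pairs toward ever larger inner products, which is why practical embedding methods pair it with a noise distribution.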
KL divergence is a particular case of a broader class of divergences called f-divergences. KL divergence is asymmetric and commonly used by embedding methods that preserve local and micro structures [21]. Other divergences include reverse KL (RKL) divergence, Jensen–Shannon (JS) divergence, Hellinger distance [19], and \(\chi ^2\) distance. Optimizing reverse KL, in contrast, can capture global or macro network structures. JS divergence is symmetric, and some research works suggest using JS distance as a cost function in the empirical domain for optimization purposes [15, 27]. \(\chi ^2\) distance behaves similarly to KL with respect to preserving local structure. Depending on whether we intend to capture micro structure, macro structure, or both with equal importance, we can choose the appropriate divergence.
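The (a)symmetry properties discussed above can be checked numerically on two small discrete distributions. The distributions below are hypothetical; the \(\chi ^2\) variant shown is the Pearson form \(\sum (p-q)^2/q\), one of several conventions.

```python
import numpy as np

def kl(p, q):
    """KL(p || q) for strictly positive discrete distributions."""
    return float(np.sum(p * np.log(p / q)))

def js(p, q):
    """Jensen-Shannon divergence: average KL to the midpoint distribution."""
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def hellinger(p, q):
    return float(np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)))

def chi2(p, q):
    """Pearson chi-square divergence sum((p - q)^2 / q)."""
    return float(np.sum((p - q) ** 2 / q))

p = np.array([0.7, 0.2, 0.1])   # e.g., an empirical neighbor distribution of a vertex
q = np.array([0.4, 0.4, 0.2])   # e.g., the model distribution
print("KL", kl(p, q), "reverse KL", kl(q, p))
print("JS", js(p, q), js(q, p))
print("Hellinger", hellinger(p, q), "chi2", chi2(p, q))
```

KL and reverse KL give different values for the same pair, whereas JS and Hellinger are symmetric by construction.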
In our case, the optimization objective for the bipartite graphs \(G_{rp},G_{rw}\), and \(G_{pw}\) with the KL-divergence method is as follows:
3.1.2 Direct homogeneous relation model
In many information networks, direct homogeneous graphs are uncommon. For example, consider the information network of Yahoo Answers or Quora. Users of these sites post questions, which are then answered by other users. There are direct relational graphs between users–questions, questions–answers, and answers–users, but there are no direct relations among users. There are, of course, information networks in which direct homogeneous graphs are present. It is important to utilize the information from such graphs, because more information helps learning [10, 41] by reducing the uncertainty in learning the model weights.
In our scenario, \(G_r\) and \(G_w\) are the two homogeneous graphs in \({{\,\mathrm{\mathcal {G}}\,}}\), i.e., their edges connect vertices of the same type. The edges in these graphs signify explicit proximity between connected vertices. Although the information from these explicit relations is very informative, it is not sufficient for embedding because of its sparsity. The embedding model can still be significantly enhanced by incorporating implicit information via indirect relation graphs, as discussed in Sect. 3.2, and then merging direct and indirect homogeneous graphs, as shown in Sect. 3.2.2.
3.2 Indirect relation models
In this section, we focus on modeling indirect, deducible relations that contribute meaningful information to the embedding. Recent work suggests that deducible information helps improve semantic properties [16, 23, 51]. Heterogeneous networks consisting of bipartite graphs have no explicit relations among vertices of the same type. To understand the importance of indirect relations, take the example of POIs in our data. The POI set P has no explicit edges between any two POIs, yet some POIs are similar based on the reception they receive from people. A subset of words can form a topic that commonly describes similar POIs (which is true in real-world scenarios), so it is very likely that there is a significant number of paths between similar POIs in the bipartite graph \(G_{pw}\). Generating all paths between all pairs of a large number of vertices is infeasible. To alleviate this issue, it is common practice to generate many random walks that mimic a corpus of vertices, with the intuition that important vertices are repeated according to their popularity.
3.2.1 Indirect homogeneous graphs
Random walks on bipartite graphs have periodicity issues [1]. The common strategy for addressing this problem is to construct two homogeneous graphs from the bipartite graph by utilizing the second-order proximity between vertices of the same type [11]. Accordingly, we construct \(G^v_{u} = (U,E^v_{u})\), a homogeneous graph on vertices U, by utilizing transitive relations through vertices V in the bipartite graph \(G_{uv}\). We define the second-order proximity between two vertices \(u_i\) and \(u_j\) by the weight \(\varpi (u_i,u_j)\), where \(e_{u_iu_j} \in E^v_{u}\) such that there exist edges \(e_{u_iv_k}\) and \(e_{v_ku_j}\) in \(G_{uv}\).
Similarly, we construct homogeneous graph \(G^u_v=(V,E^u_{v})\) with relations via vertices U from \(G_{uv}\).
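The construction of \(G^v_u\) and \(G^u_v\) can be sketched with matrix products. The exact weight formula is not shown in this section, so the sketch assumes the common choice \(\varpi (u_i,u_j) = \sum _k \varpi (u_i,v_k)\,\varpi (v_k,u_j)\), i.e., the entries of \(W W^T\) for the weighted biadjacency matrix W (the paper’s definition may differ). Self-loops are dropped; the function name and toy weights are hypothetical.

```python
import numpy as np

def indirect_homogeneous(W):
    """Second-order graphs from a |U| x |V| weighted biadjacency matrix W.

    Returns (G_u_via_v, G_v_via_u). Assumed weight: sum over shared neighbors of
    the product of the two edge weights, i.e., W @ W.T and W.T @ W.
    """
    G_u_via_v = W @ W.T
    G_v_via_u = W.T @ W
    np.fill_diagonal(G_u_via_v, 0.0)   # drop self-loops u_i -> v_k -> u_i
    np.fill_diagonal(G_v_via_u, 0.0)
    return G_u_via_v, G_v_via_u

# Hypothetical G_pw: 3 POIs x 4 words with tf-idf-like weights.
W_pw = np.array([
    [0.5, 0.2, 0.0, 0.0],
    [0.4, 0.0, 0.0, 0.3],
    [0.0, 0.0, 0.9, 0.0],
])
G_p_via_w, G_w_via_p = indirect_homogeneous(W_pw)   # G^w_p on POIs, G^p_w on words
```

POIs 0 and 1 become linked because they share word 0, while POI 2 stays isolated because it shares no word with the others.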
3.2.2 Merging homogeneous graphs
Our information network \({{\,\mathrm{\mathcal {G}}\,}}\) consists of three bipartite graphs \(G_{rp},G_{rw},G_{pw}\). We now generate a homogeneous graph \(G^p_r\) on ROIs R with indirect relations via POIs P, and a homogeneous graph \(G^w_r\) on ROIs R with indirect relations via words W. Similarly, homogeneous graphs \(G^r_p, G^w_p\) are obtained on POIs P with indirect relations via ROIs R and words W, respectively, and homogeneous graphs \(G^r_w, G^p_w\) on words W with indirect relations via ROIs R and POIs P, respectively. In total, the six indirect homogeneous graphs \(G^p_r, G^w_r\), \(G^r_p, G^w_p\), \(G^r_w, G^p_w\) are obtained from the three bipartite graphs.
The homogeneous graphs \(G^p_r\) and \(G^w_r\), both on ROIs R, provide implicit relations among their vertices. We use all the information from the direct and indirect homogeneous graphs by simply appending the edges from \(G^p_r, G^w_r, G_r\) to form a single graph \(G'_r\) for modeling random walks. However, we must first determine whether these graphs are compatible rather than contrasting. Intuitively, incompatible graphs can differ sharply in their hub and authority vertices, which can lead to information dilution and loss of quality. In such cases, a wise decision is to merge only the most effective, i.e., the most compatible, set of homogeneous graphs; this decision lies with the data scientist. To measure the compatibility of graphs effectively, we use the hub and authority matrices of both graphs. A close look at the HITS [25] algorithm reveals that it is an iterative power method that computes the dominant eigenvectors of \(M \cdot M^T\) and \(M^T \cdot M\), where M is the adjacency matrix of a graph. The hub matrix is \({\mathcal {H}} = M \cdot M^T\), and the authority matrix is \({\mathcal {A}} = M^T \cdot M\). Also, constant initialization of the hub/authority scores enables us to perform power iteration on \({\mathcal {H}}\) and \({\mathcal {A}}\) and choose the matrices from any iteration. Let \({\mathcal {H}}^p_r,{\mathcal {H}}^w_r\) and \({\mathcal {A}}^p_r,{\mathcal {A}}^w_r\) be the hub and authority score matrices of the two homogeneous graphs on ROIs R.
Finding the similarity or distance between labeled graphs is relatively easy: we can leverage simple methods such as edit distance and matrix similarity, or more complex methods such as coupled vertex-edge scoring [53] and MCES [37]. In our model, we use the Frobenius distance between matrices, and two graphs qualify for merging if the sum of the distances between their hub matrices and between their authority matrices is less than a positive threshold \(\phi \).
Similar to \(G'_r\) homogeneous graph of ROIs, we construct \(G'_p\) and \(G'_w\) by merging \((G^r_p, G^w_p)\) and \((G^r_w, G^p_w, G_w)\), respectively.
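The compatibility test and the merge can be sketched as follows. Two choices here are assumptions rather than details given in the text: each hub/authority matrix is scaled to unit Frobenius norm before comparison (so graphs built from differently scaled weights remain comparable), and "appending the edges" is implemented by summing weighted adjacency matrices. The adjacency matrices and the threshold `phi=1.0` are hypothetical.

```python
import numpy as np

def hub_authority(M):
    """Hub matrix H = M M^T and authority matrix A = M^T M (the matrices HITS power-iterates on)."""
    return M @ M.T, M.T @ M

def unit(X):
    """Scale a matrix to unit Frobenius norm (assumed normalization)."""
    norm = np.linalg.norm(X)          # Frobenius norm for 2-D arrays
    return X / norm if norm > 0 else X

def compatible(M1, M2, phi):
    """Merge test: sum of Frobenius distances between hub and between authority matrices < phi."""
    H1, A1 = hub_authority(M1)
    H2, A2 = hub_authority(M2)
    dist = np.linalg.norm(unit(H1) - unit(H2)) + np.linalg.norm(unit(A1) - unit(A2))
    return bool(dist < phi), float(dist)

def merge(*Ms):
    """Append the edges of compatible graphs by summing their weighted adjacency matrices."""
    return np.sum(Ms, axis=0)

# Hypothetical adjacency matrices of G^p_r, G^w_r and G_r over the same three ROIs.
M_via_p = np.array([[0, 2, 0], [2, 0, 1], [0, 1, 0]], dtype=float)
M_via_w = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
M_direct = np.array([[0, 0, 1], [0, 0, 0], [1, 0, 0]], dtype=float)

ok, dist = compatible(M_via_p, M_via_w, phi=1.0)
G_r_merged = merge(M_via_p, M_via_w) if ok else M_via_p
```

In this toy case, the two indirect graphs share the same hub structure (ROI 1 is central in both) and pass the test, whereas `M_direct` has a very different hub structure and would be rejected at the same threshold.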
3.2.3 Communityaware random walks
The homogeneous graphs constructed from bipartite networks are used to generate a corpus of random walks. DeepWalk [35] generates such random walks and uses them to learn embeddings. BiNE [16] points out that DeepWalk [35] does not capture the characteristics of real-world networks, because the distribution of vertices in the random walks does not match that of the graph network. One solution is to generate random walks based on the importance of vertices, measured by their hub and authority scores.
A community is defined as a subset of vertices within the graph such that connections among these vertices are denser than connections with the rest of the network [36]. If the number of connections, or the reachability, between vertices within a few hops is high, they must have a stronger bond. In real-world scenarios, we often have edges that act as bridges between communities or subcommunities. Sparsity and lack of information in the training data are often responsible for the appearance of bridges within a community. Even if there is a moderate number of bridges, centrality-biased random walks will seldom cross them. We propose a \(\delta \)-hop community-aware random walk, in which a step of the random walk can mutate, with probability \(\alpha \), into a jump within the \(\delta \)-hop connected community.
The motivation for a \(\delta \)-hop community is to include strongly connected bridges and avoid weakly connected ones. We use \(M^3\), where M is the adjacency matrix, because three hops is the smallest number of hops in which an internal node of a well-connected community can reach an internal node of another community via a bridge.
Hence, with a small \(\delta =3\) and a low step-jump mutation probability \(\alpha =0.1\), a jump is likely to remain within the community while still alleviating the moderately connected community problem. Like other biased random-walk models, ours follows the “rich gets richer” principle, and the mutated step-jump acts as a welfare strategy within the algorithm.
Algorithm 1 summarizes the community-aware random walk that prepares corpus \({{{\mathcal {D}}}}_u\) from the homogeneous graph \(G'_u\). Statistics suggest that the mean length of English sentences varies between 20 and 25 words and follows a normal distribution [52], and sentences in technical writing are typically shorter. Taking inspiration from this, we draw the lengths of the sequences in corpus \({{{\mathcal {D}}}}_u\) from a normal distribution with mean \(\mu =15\) and standard deviation \(\sigma =10\). The number of sequences starting at a vertex depends on its popularity (centrality), capped at 5 by the variable maxStart.
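A compact sketch of a \(\delta \)-hop community-aware walk follows. Algorithm 1 itself is not reproduced in this section, so the details are assumptions: an ordinary step picks a neighbor in proportion to edge weight, a mutated step jumps uniformly to a vertex with a non-zero entry in \(M^\delta \), walk lengths are drawn from \({\mathcal N}(15, 10)\) and clipped to at least 2, and the number of walks per start vertex is degree centrality scaled so that the most central vertex gets maxStart walks. The function name is hypothetical.

```python
import numpy as np

def community_aware_walks(M, delta=3, alpha=0.1, mu=15, sigma=10,
                          max_start=5, seed=0):
    """Sketch of a delta-hop community-aware random walk on a homogeneous graph.

    M: (n, n) non-negative weighted adjacency matrix of G'_u.
    Returns a corpus: a list of walks, each a list of vertex indices.
    """
    rng = np.random.default_rng(seed)
    n = M.shape[0]
    # Vertices reachable by a walk of exactly `delta` hops (non-zero entries of M^delta).
    reach = np.linalg.matrix_power((M > 0).astype(np.int64), delta) > 0
    np.fill_diagonal(reach, False)
    # Degree centrality decides how many walks start at each vertex, capped at max_start.
    deg = M.sum(axis=1)
    n_walks = np.ceil(max_start * deg / deg.max()).astype(int)

    corpus = []
    for start in range(n):
        for _ in range(n_walks[start]):
            # Walk length ~ Normal(mu, sigma), clipped to at least 2 vertices.
            length = max(2, int(round(rng.normal(mu, sigma))))
            walk = [start]
            while len(walk) < length:
                cur = walk[-1]
                community = np.flatnonzero(reach[cur])
                if community.size > 0 and rng.random() < alpha:
                    nxt = rng.choice(community)         # mutated step: jump inside the delta-hop community
                else:
                    w = M[cur]
                    if w.sum() == 0:                    # dead end: stop this walk early
                        break
                    nxt = rng.choice(n, p=w / w.sum())  # ordinary step, biased by edge weight
                walk.append(int(nxt))
            corpus.append(walk)
    return corpus
```

On a toy graph of two triangles joined by a single bridge, most steps stay inside a triangle, while the occasional jump (probability 0.1) can cross the bridge even when the weighted step would not.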
3.2.4 Corpus generation
Applying the community-aware random walk of Algorithm 1 to \(G'_r,G'_p,G'_w\), we obtain corpora \({{{\mathcal {D}}}}_r,{{{\mathcal {D}}}}_p,{{{\mathcal {D}}}}_w\), respectively.
For a sequence \({\mathcal {S}}\) in corpus \({{{\mathcal {D}}}}_r\), an ROI \(r_{i}\) at index c in \({\mathcal {S}}\) is denoted \(r^c_i\). In a sequence \({\mathcal {S}}\), a context window of size m around c consists of the ROIs at positions \(c-m\) to \(c+m\), i.e., \(\{r^{c-m}_{\odot }, r^{c-m+1}_{\odot }, \ldots , r^{c}_{\odot }, r^{c+1}_{\odot }, \ldots , r^{c+m}_{\odot }\}\), where \(\odot \) is in the range [1, P]. We can now apply the skip-gram model to the corpora, as in Word2Vec [29] embedding, to optimize the embedding of each entity. To optimize the embeddings of ROIs \({\vec {{\varvec{r}}}}\), POIs \({\vec {{\varvec{p}}}}\), and Words \({\vec {{\varvec{w}}}}\), we minimize the objective functions \(O'_{r},O'_{p}\), and \(O'_{w}\), respectively. Note that, in addition to the embedding vector of each entity, we also maintain a corresponding context vector for that entity.
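The window definition above amounts to the standard skip-gram pair extraction, sketched below. The helper name and the example ROI identifiers are hypothetical.

```python
def skipgram_pairs(sequence, m):
    """(center, context) pairs for a window of m positions on each side:
    the entity at index c is paired with those at indices c-m .. c+m (excluding c)."""
    pairs = []
    for c, center in enumerate(sequence):
        for j in range(max(0, c - m), min(len(sequence), c + m + 1)):
            if j != c:
                pairs.append((center, sequence[j]))
    return pairs

# A hypothetical ROI sequence from corpus D_r
print(skipgram_pairs(["r7", "r3", "r9"], m=1))
# [('r7', 'r3'), ('r3', 'r7'), ('r3', 'r9'), ('r9', 'r3')]
```

Windows are truncated at the ends of a sequence, so the first and last entities get fewer context pairs.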
Similarly, we optimize for POIs \({\vec {{\varvec{p}}}}\) with function \(O'_{p}\).
Finally, we optimize for Words \({\vec {{\varvec{w}}}}\) with function \(O'_{w}\).
3.3 Negative sampling
The conditional probabilities \({\overline{\Pr }}(v_j|u_i)\) in Eqs. 3 and 4 and \({\overline{\Pr }}(u_j|u_i)\) in Eqs. 8, 9, and 10 are computationally expensive, because their denominators sum over the entire set of vertices. The state-of-the-art way to estimate them empirically is negative sampling (e.g., as specified in [29]), where the denominator is estimated by sampling random vertices. The numerator (defined by explicitly similar vertices) can be calculated directly.
In particular, negative sampling helps learn a better embedding by selecting negative vertices that are closely connected yet differ significantly in probability. Our negative sampling is popularity-biased, which both speeds up learning and alleviates vanishing gradients [6]. We use the transition probabilities of a random walk from one vertex to another; this strategy replicates the popularity/ranking-based scheme that we leverage for negative sampling [55]. In a random walk at vertex \(u_i\) adjacent to vertex \(u_j\), the probability of moving from \(u_i\) to \(u_j\) is the weight of edge \((u_i,u_j)\) divided by the sum of the weights of all edges incident to \(u_i\). We compute the (i, j)th cell of the transition matrix T from the adjacency matrix M of the graph as:

\[ T_{u_i,u_j} = \frac{M_{u_i,u_j}}{\sum _{u_k} M_{u_i,u_k}} \]

where \(M_{u_i,u_j}\) is the weight of the edge between \(u_i\) and \(u_j\).
Naturally, T is a right stochastic matrix. We also remove self-loops, if any exist initially. Based on T, we simulate a \(\delta \)-hop random walk through the matrix power \(T^\delta \). For some dense graphs, the matrix can converge to a steady-state distribution within a few hops. For our purpose, we restrict \(\delta \) to \(\delta _{\text {max}} = 5\). Row \(u_i\) of \(T^{\delta _{\text {max}}}\), denoted \(T^{\delta _{\text {max}}}_{u_i}\), acts as the noise distribution for selecting negative candidates for target vertex \(u_i\). We denote the K negative samples for target \(u_i\) as \(N^K_{G_u}(u_i)\).
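The noise distribution for homogeneous graphs can be sketched as follows, assuming a dense NumPy adjacency matrix (a sparse representation would be needed at scale). The helper names and the example matrix are hypothetical. The text does not say whether positive neighbors, or the target itself, are filtered out of the candidates, so this sketch does not filter them.

```python
import numpy as np

def noise_matrix(M, delta_max=5):
    """T^delta_max, where T is the row-normalised (right-stochastic) transition
    matrix of a homogeneous graph with self-loops removed."""
    M = np.array(M, dtype=float)               # copy, so the caller's matrix is untouched
    np.fill_diagonal(M, 0.0)                   # remove self-loops
    row_sums = M.sum(axis=1, keepdims=True)
    T = np.divide(M, row_sums, out=np.zeros_like(M), where=row_sums > 0)
    return np.linalg.matrix_power(T, delta_max)

def sample_negatives(noise, i, K, rng):
    """Draw K negative candidates for target vertex i from row i of the noise matrix."""
    p = noise[i] / noise[i].sum()
    return rng.choice(len(p), size=K, p=p)

M = np.array([[1, 3, 1, 0],                    # M[0, 0] is a self-loop and is dropped
              [3, 0, 1, 0],
              [1, 1, 0, 2],
              [0, 0, 2, 0]])
S = noise_matrix(M)
negatives = sample_negatives(S, i=0, K=3, rng=np.random.default_rng(0))
```

Because a power of a right-stochastic matrix is itself right-stochastic, every row of `S` sums to 1 whenever no vertex is isolated.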
Next, we extend this negative sampling technique from homogeneous graphs to bipartite graphs. First, we assume a transitive property in bipartite graphs to model hops between vertices of the same type: if \(u_i\) is connected to \(v_k\) and \(v_k\) is connected to \(u_j\), we assume an edge between \(u_i\) and \(u_j\) in graph \(G_{uv}\), with weight \(\varpi (u_i,u_j) = \sum _{v_k \in V} \varpi (u_i,v_k) \cdot \varpi (v_k,u_j)\). Once the edges and weights between connected \(u_i\)s and \(u_j\)s are defined, T is easy to obtain. We then compute the \(\delta _{\text {max}}\)-hop noise matrix \(T^{\delta _{\text {max}}}\) to perform negative sampling where the seed and target vertices are of the same type; we call this homogeneous negative sampling. For a seed vertex \(u_i\) in a bipartite graph, the K negative samples \(N^K_{G_{uv}}(u_i)\) are obtained from the noise distribution row \(S(u_i)\) of S, where
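For an undirected bipartite graph with weight matrix B (rows indexed by U, columns by V), the transitive weights above are exactly the entries of \(B B^T\). A minimal sketch, assuming dense matrices and dropping self-loops as in the homogeneous case; the function names and the example matrix are hypothetical.

```python
import numpy as np

def transitive_projection(B):
    """B: (|U|, |V|) weight matrix of a bipartite graph G_uv.
    Returns the implied U-U weights w(u_i, u_j) = sum_k w(u_i, v_k) * w(v_k, u_j),
    i.e. B @ B.T for an undirected bipartite graph, with self-loops removed."""
    B = np.asarray(B, dtype=float)
    W = B @ B.T
    np.fill_diagonal(W, 0.0)
    return W

def homogeneous_noise(B, delta_max=5):
    """Noise matrix for homogeneous negative sampling (seed and samples both in U)."""
    W = transitive_projection(B)
    row_sums = W.sum(axis=1, keepdims=True)
    T = np.divide(W, row_sums, out=np.zeros_like(W), where=row_sums > 0)
    return np.linalg.matrix_power(T, delta_max)

B = np.array([[1, 2],    # u_0 connects to v_0 (weight 1) and v_1 (weight 2)
              [0, 1],
              [3, 0]])
W = transitive_projection(B)   # W[0, 1] == 2, W[0, 2] == 3, W[1, 2] == 0
```

Here u_1 and u_2 share no V neighbor, so they get no transitive edge, while u_0 links to both through v_1 and v_0, respectively.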
For negative sampling on bipartite graphs where the seed vertex and the sampled target vertices are of different types (e.g., seed \(u_i\) and target \(v_l\)), which we call heterogeneous negative sampling, we apply the usual transition probabilities to the already obtained noise matrix \(T^{\delta _{\text {max}}}\). When a vertex \(u_i\) is connected to \(u_j\) within \(\delta _{\text {max}}\) hops, all vertices adjacent to \(u_j\), i.e., \(V' = \{v_l \mid e_{u_j,v_l} \in E_{uv}\}\), are considered for heterogeneous negative sampling. The \((u_i, v_l)\) entry of the noise distribution matrix of a bipartite graph is calculated as
where \(\frac{M_{u_j,v_l}}{\sum _{v_m \in V}M_{u_j,v_m} }\) is the transition probability from \(u_j\) to \(v_l\).
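The equation for this entry is not reproduced in the text. Under our reading of the surrounding description, which chains the \(\delta _{\text {max}}\)-hop noise over U with the one-step transition probability from \(u_j\) to \(v_l\), the entry would be \(\sum _{u_j} T^{\delta _{\text {max}}}_{u_i,u_j} \cdot M_{u_j,v_l} / \sum _{v_m} M_{u_j,v_m}\), which is a single matrix product. The sketch below implements that assumed form; it is not the paper's verified formula, and the names and example values are hypothetical.

```python
import numpy as np

def heterogeneous_noise(noise_uu, B):
    """noise_uu: (|U|, |U|) matrix T^delta_max over U vertices.
    B: (|U|, |V|) bipartite weight matrix M_uv.
    Entry (u_i, v_l) = sum_j noise_uu[u_i, u_j] * B[u_j, v_l] / sum_m B[u_j, v_m]."""
    B = np.asarray(B, dtype=float)
    row_sums = B.sum(axis=1, keepdims=True)
    P_uv = np.divide(B, row_sums, out=np.zeros_like(B), where=row_sums > 0)  # u_j -> v_l step
    return noise_uu @ P_uv

B = np.array([[1, 2], [0, 1], [3, 0]], dtype=float)
noise_uu = np.full((3, 3), 1 / 3)   # hypothetical, fully mixed T^delta_max over U
S_het = heterogeneous_noise(noise_uu, B)
```

With an identity matrix in place of `noise_uu` (zero hops), the result reduces to the one-step transition matrix from U to V, which is a quick sanity check on the construction.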
For each edge \((u_i, v_j)\) in a graph with target vertex \(u_i\) and K negative samples, we approximate the conditional probability \({\overline{\Pr }}(v_j|u_i)\) as follows, where \(\vec {{\varvec{\varsigma _j}}}\) is the context vector of \(v_j\):
Similarly, we approximate \({\overline{\Pr }}(u_j|u_i)\) as follows, where \(\vec {{\varvec{\varkappa _j}}}\) is the context vector of \(u_j\):
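The approximation equations themselves are not reproduced here. Assuming they follow the standard negative-sampling form of Word2Vec [29], the log-probability of a positive pair is estimated from one positive context vector and K negative context vectors, as sketched below. The name `u_vec` for the embedding vector of \(u_i\) is our own placeholder.

```python
import numpy as np

def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)) = -log(1 + exp(-x))
    return -np.logaddexp(0.0, -x)

def approx_log_prob(u_vec, ctx_pos, ctx_negs):
    """Negative-sampling estimate of log Pr(v_j | u_i):
    log sigma(ctx_pos . u) + sum_k log sigma(-ctx_neg_k . u),
    where ctx_negs holds the K context vectors of the sampled negatives."""
    return float(log_sigmoid(ctx_pos @ u_vec) + np.sum(log_sigmoid(-(ctx_negs @ u_vec))))
```

The estimate approaches 0 (probability 1) when the positive context vector aligns with `u_vec` and the negative ones point away from it.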
3.4 Optimization and model update
An intuitive approach to optimization is to minimize the sum of all objective functions. A more sophisticated multi-objective optimization could be applied; however, choosing a multi-objective optimization for the embedding scenario requires further study and could form a separate research work of its own. We therefore use an unweighted linear combination of the objectives in Eqs. 4, 8, 9, and 10 as a single global objective.
We present our tripartite joint optimization in Algorithm 2. In the preparation phase, community-aware random walks generate corpora \({{{\mathcal {D}}}}_r,{{{\mathcal {D}}}}_p,{{{\mathcal {D}}}}_w\), and the negative sampling module prepares the noise distribution matrices. In the joint embedding training phase, edges are sampled from each graph simultaneously, and the embedding vectors and context vectors are updated using stochastic gradient descent.
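The joint training phase can be sketched as plain SGD on the negative-sampling loss, summed without weights over the graphs. This is an illustrative sketch, not Algorithm 2. The gradients are those of the standard negative-sampling loss, and the function names, the tuple layout of `graphs`, the learning rate and K are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgns_step(emb, ctx, i, j, negs, lr=0.025):
    """One SGD step on edge (i, j) with negative samples `negs`.

    emb: (n_src, d) embedding vectors of the seed type (updated in place).
    ctx: (n_dst, d) context vectors of the target type (updated in place).
    Returns the negative-sampling loss before the update.
    """
    u = emb[i].copy()
    c_pos = ctx[j].copy()
    c_neg = ctx[negs]                          # fancy indexing returns a copy
    s_pos = sigmoid(c_pos @ u)
    s_neg = sigmoid(c_neg @ u)                 # shape (K,)
    loss = -np.log(s_pos) - np.sum(np.log(1.0 - s_neg))
    # Gradients of the loss, all computed from the pre-update vectors.
    grad_u = -(1.0 - s_pos) * c_pos + s_neg @ c_neg
    grad_pos = -(1.0 - s_pos) * u
    grad_neg = s_neg[:, None] * u
    emb[i] -= lr * grad_u
    ctx[j] -= lr * grad_pos
    np.add.at(ctx, negs, -lr * grad_neg)       # accumulates if a negative repeats
    return float(loss)

def joint_step(graphs, rng, K=5, lr=0.025):
    """Sample one edge per graph and update; the global loss is the unweighted
    sum of per-graph losses. Each entry of `graphs` is (emb, ctx, edges, noise).
    Passing the same `emb` array for several graphs (e.g. the ROI vectors in
    G_rp and G_rw) is what couples the embeddings during training."""
    total = 0.0
    for emb, ctx, edges, noise in graphs:
        i, j = edges[rng.integers(len(edges))]
        p = noise[i] / noise[i].sum()
        negs = rng.choice(len(p), size=K, p=p)
        total += sgns_step(emb, ctx, i, j, negs, lr)
    return total
```

Sharing arrays across graphs, rather than copying them, is the design choice that lets updates from POI and word edges shape the same ROI vectors.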
The complexity of training depends on the density/sparsity of the graph network. To avoid the expensive computation of centrality and of the \(\delta \)-hop adjacency matrix, we perform walks on the graph based on degree centrality. The context size for a vertex is \(b\cdot m\), where the batch size b is much smaller than the maximum degree of the vertex and m is the context defined in Sect. 3.2.4. Overall, the computational complexity of our algorithm is \(O\big ((|E_{rp}|+|E_{rw}|+|E_{pw}|) \cdot b \cdot m \cdot (ns+1)\big )\), where ns is the number of negative samples.
TNE supports incremental updates as we collect new data from social networks and either create a new information graph or update the old one. In this case, the embeddings previously generated by TNE should be used instead of randomly initialized embedding vectors. Hyperparameters such as the learning rate should be tuned based on the age of the previously trained dataset and the volume of the new dataset: as the volume of new data grows and the previous dataset ages, the learning rate can be gradually increased for optimal performance.
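Warm-starting an incremental update might look like the following sketch. The function name, the dictionary input format, and the word2vec-style uniform initialization for unseen vertices are assumptions for illustration.

```python
import numpy as np

def warm_start(old_vectors, vertices, dim, seed=0):
    """old_vectors: {vertex_id: vector} from a previous TNE run.
    vertices: vertex ids of the updated graph, in row order.
    Known vertices reuse their old vector; new ones get small random values."""
    rng = np.random.default_rng(seed)
    init = rng.uniform(-0.5 / dim, 0.5 / dim, size=(len(vertices), dim))
    for row, v in enumerate(vertices):
        if v in old_vectors:
            init[row] = old_vectors[v]
    return init
```

Training then continues from this matrix, with the learning rate adjusted as described above.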
4 Experiments
In this section, we first describe the real-world New York City (NYC) dataset used in our experiments. We then present five experiments demonstrating the multifaceted effectiveness of ROI embeddings with TNE in terms of spatial correlation, semantic association, and predictive capability. The experiments are summarized as follows:

1. Model validation with POI classification: We perform this POI embedding experiment to confirm that TNE, TNE_nw, and the state-of-the-art baselines perform this task equally well, as expected. In the follow-up experiments, we show that the baseline methods cannot perform on par with TNE or TNE_nw on ROI experiments, validating the need for TNE.

2. Geospatial affinity of ROIs: This experiment evaluates whether the ROI embeddings of each model preserve the spatial correlations among ROIs present in the original data. By spatial correlation, we mean neighboring or spatially overlapping ROIs.

3. Semantic category annotation of ROIs: In this experiment, we perform a ranking evaluation of the category annotations derived from ROI embeddings against crowdsourced ground truth, using normalized discounted cumulative gain (NDCG) [44] as the performance measure.

4. Semantic category difference from ROI embedding: This experiment is similar to the previous one, except that here we evaluate the semantic difference between a pair of ROIs from their embeddings.

5. Popularity prediction of regions: We introduce a region popularity prediction experiment with the simplest regression models to demonstrate that ROI embeddings from TNE_nw and TNE capture features better than the extended baselines, including temporal features. The aim is not to chase the lowest error with complex models, but to show perceptible differences even with simple feature-based models.
4.1 Dataset
The dataset mirrors the information graph \({{\,\mathrm{\mathcal {G}}\,}}\) presented in Fig. 2. As described in Sect. 2, our real-world dataset consists of three entities: (a) POI, (b) ROI, and (c) Word. We will release an anonymized, processed version of the dataset, adhering to the copyright terms of the sources, to support further research in this field.
POI–Word Data. We used the check-in dataset from [49] and the NYC government site [32] to collect POI data. Our dataset comprises 38,008 POIs. Each POI is associated with a geolocation, name, category, description, and comments. The words from the name, description, and all available comments of each POI are cleaned and tokenized in a preprocessing step, and each association between a word and a POI creates an edge between them. As mentioned in Sect. 2.1, the weights of the POI–Word edges are calculated from their TF-IDF scores.
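The exact TF-IDF formula of Sect. 2.1 is not reproduced here, so the sketch below uses one common variant: term frequency normalized by document length, times \(\log (N/df)\), treating each POI's token list as one document. The function name and the sample tokens are hypothetical; the same routine would apply to ROI–Word edges with each ROI's tweets as one document.

```python
import math
from collections import Counter

def tfidf_edges(docs):
    """docs: {entity_id: [tokens]} -> {(entity_id, word): tf-idf weight}.
    Under this variant, words that occur in every document get weight 0."""
    n_docs = len(docs)
    df = Counter(word for tokens in docs.values() for word in set(tokens))
    edges = {}
    for entity, tokens in docs.items():
        counts = Counter(tokens)
        for word, count in counts.items():
            tf = count / len(tokens)
            idf = math.log(n_docs / df[word])
            edges[(entity, word)] = tf * idf
    return edges

edges = tfidf_edges({"p1": ["pizza", "pizza", "park"], "p2": ["park", "museum"]})
```

In this example, "park" appears in both POI documents and therefore contributes no weight, while "pizza" is distinctive for p1.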
ROI–POI Data. The ROI data are obtained from publicly available GeoJSON files [31] of NYC, demarcated with multipolygonal features. Each GeoJSON file describes one distinct geographical division of NYC. Figure 3 shows some of these divisions, such as boroughs, city councils, election districts, fire battalions, police precincts, and health districts. Each geographical division consists of several non-overlapping ROIs, and each ROI is treated as a separate, unique ROI in our dataset. Overall, we have 12 geographical divisions, listed in Table 3 together with their numbers of ROIs. The total number of ROIs in our dataset is 456.
A POI is associated with an ROI iff the geolocation of the POI lies within the polygonal boundary of the ROI's multipolygonal spatial feature. Note that, for a non-overlapping set of ROIs, the POIs form a many-to-one, onto relation (function) with the ROIs. Introducing overlapping ROIs, however, makes the information graph \({{\,\mathrm{\mathcal {G}}\,}}\) more interesting, because POIs shared among two or more overlapping ROIs increase the complexity of the graph. Every edge in the ROI–POI graph is assigned a weight of 1.0.
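The ROI–POI association rule can be sketched with an even-odd ray-casting point-in-polygon test over GeoJSON MultiPolygon coordinates: a list of polygons, each a list of rings, each a list of [longitude, latitude] positions, with the first ring the exterior and later rings holes. This pure-Python version is an illustrative assumption (a GIS library would typically be used in practice), the helper names are hypothetical, and points lying exactly on a boundary may be classified either way.

```python
def point_in_ring(x, y, ring):
    """Even-odd ray casting: is point (x, y) inside the closed ring?"""
    inside = False
    n = len(ring)
    for k in range(n):
        x1, y1 = ring[k][0], ring[k][1]               # ignore an optional altitude
        x2, y2 = ring[(k + 1) % n][0], ring[(k + 1) % n][1]
        if (y1 > y) != (y2 > y):                      # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def in_multipolygon(lon, lat, multipolygon):
    """True if the point is inside some polygon's exterior ring and none of its holes."""
    for polygon in multipolygon:
        exterior, holes = polygon[0], polygon[1:]
        if point_in_ring(lon, lat, exterior) and not any(
                point_in_ring(lon, lat, h) for h in holes):
            return True
    return False

def roi_poi_edges(pois, rois):
    """pois: {poi_id: (lon, lat)}; rois: {roi_id: MultiPolygon coordinates}.
    Every ROI-POI edge gets weight 1.0, as in the text."""
    return {(r, p): 1.0
            for r, mp in rois.items()
            for p, (lon, lat) in pois.items()
            if in_multipolygon(lon, lat, mp)}
```

Because ROIs from different divisions overlap, one POI can produce edges to several ROIs, which is exactly the shared-POI structure described above.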
ROI–Word Data. The relationship between ROIs and words is obtained from geotagged tweets collected over a period of time. As with POI–Word pairs, the weight of an ROI–Word edge is determined from its TF-IDF score. Initially, we used Twitter's 1% sample stream to collect geotagged documents for one month and prepared a corpus of documents, each associated with an ROI. Analyzing the stream and computing TF-IDF on this corpus revealed that one month of the 1% sample stream is not enough to extract meaningful TF-IDF scores and information from the blabber, chores, and chatter of tweets. To alleviate the problem, we used 6 months of the 1% sample stream. This remains a very feasible approach, as 6 months of the 1% sample roughly equates to 2–3 weeks of the original Twitter stream, i.e., the firehose API of Twitter.
Table 4 presents examples of TF-IDF scores for ROIs from the Police Precincts division (shown in Fig. 4), computed from 1 month and 6 months of data, for two famous NYC attractions: the Empire State Building and the Brooklyn Bridge. With 1 month of data, ROI 12 has the maximum TF-IDF score (0.019) for the word empirestatebuilding. However, the location of the Empire State Building suggests that ROI 07 should have the maximum score; it ranks second (highlighted in bold in Table 4). Note also that the 1-month TF-IDF scores do not suggest good spatial correlation. In contrast, with 6 months of data, the TF-IDF scores correlate significantly with the true ROI location of the Empire State Building, i.e., ROI 07. Furthermore, the scores of the neighboring ROIs are also consistent, showing strong geospatial correlation.
Another interesting trend appears for the Brooklyn Bridge, whose true spatial location is ROIs 01 and 53 (highlighted in bold in Table 4). With 1 month of data, although the top TF-IDF scores agree with the ground-truth ROI location, they are very close to the scores of ROIs that are not near the Brooklyn Bridge (e.g., ROI 53: 0.041; ROI 39: 0.035). With 6 months of data, by contrast, there is a clear disparity between the TF-IDF scores of the ground-truth ROIs (53 and 01) and those of other ROIs (00, 03, and 04). These examples support our decision to use 6 months of geotagged tweets.
Region Popularity Data. Region popularity data are collected from the New York check-in dataset [49], which contains 227,428 Foursquare check-ins from April 2012 to February 2013. We score the popularity of a region by its number of check-ins.
4.2 TNE validation with POI classification
This experiment evaluates the POI embeddings from our model and the baselines. Its aim is to validate that our model learns POI embeddings as consistently as other state-of-the-art work. In this experiment, we expect all methods to perform equally well.
Each POI in our dataset has a ground-truth category collected from the data source. Table 2 lists all nine top-level categories of our POI dataset.
First, we use (i) k-nearest neighbor classification to evaluate the POI embeddings of all models. Then, we use (ii) t-SNE visualization to inspect the macro- and microstructure of the embeddings.
To speed up learning, we initialized word embeddings with pretrained GloVe [34] embeddings, and we initialized POI embeddings from the GloVe vectors of the words in the POI descriptions. All ROI embeddings, however, are always initialized with random vectors. Our rationale for this initialization is to use all the resources and information at hand rather than spend more iterations learning from random initialization.
4.2.1 k-nearest neighbor classification
We trained our k-nearest neighbor (kNN) classifier on 70% of the embeddings and evaluated it on the rest. The embedding dimension was 100, and k denotes the number of nearest neighbors considered for kNN classification. The results in Table 5 show that GE_poi, TNE_nw, and TNE perform similarly, with 96% accuracy in determining the top category, whereas BiNE achieves more than 95% for kNN with \(k\ge 3\). This verifies that TNE performs on par with the state-of-the-art GE_poi. We do not include CrossMap in Table 5 because CrossMap does not produce POI embeddings.
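A kNN evaluation with a 70/30 split can be sketched as follows. The text does not state the distance metric or the tie-breaking rule, so Euclidean distance and majority voting (ties resolved toward the smallest label) are assumptions, as is the function name; labels must be non-negative integer category ids.

```python
import numpy as np

def knn_accuracy(X, y, k=3, train_frac=0.7, seed=0):
    """Random train/test split of the embeddings, then majority-vote kNN on Euclidean distance."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_train = int(train_frac * len(X))
    tr, te = idx[:n_train], idx[n_train:]
    # Squared Euclidean distances between every test and every training embedding.
    d = ((X[te][:, None, :] - X[tr][None, :, :]) ** 2).sum(axis=-1)
    nn = np.argsort(d, axis=1)[:, :k]          # indices into the training set
    votes = y[tr][nn]                          # (n_test, k) neighbor labels
    pred = np.array([np.bincount(v).argmax() for v in votes])
    return float((pred == y[te]).mean())
```

The full pairwise distance matrix is fine for tens of thousands of points only in chunks; for 38,008 POIs a tree- or batch-based search would be more memory-friendly.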
4.2.2 t-SNE visualization
To reveal the subtleties of the POI embeddings and explore their macro- and microstructure, we apply t-SNE to the high-dimensional POI embeddings, coloring each POI according to its top category in Table 2. Figure 5 shows how the POI embeddings of TNE change between training iterations 10 and 40. In Fig. 5a, points of different categories lie close together and partly overlap. With more iterations (Fig. 5b), the overlap decreases and the separation between clusters of dissimilar categories improves. We also present the t-SNE plot of GE_poi in Fig. 5c.
Although our kNN classification and t-SNE show good performance for top-level categories (the macrostructure), the embeddings did not perform as well on subcategories. In Fig. 6, we present a microscopic analysis of the embeddings using t-SNE visualization based on POI subcategories. For this experiment, we took all POIs with the top category Travel and Transport and applied t-SNE to them. The colors of the POIs in Fig. 6 correspond to subcategories. The subcategories of Travel and Transport, ordered by their color number in the t-SNE visualization, are: 0. Airport; 1. Bike Rental/Bike Share; 2. Boat or Ferry; 3. Bus Station; 4. General Travel; 5. Hotel; 6. Light Rail Station; 7. Metro Station; 8. Moving Target; 9. Pier; 10. Rental Car Location; 11. Rest Area; 12. Road; 13. Taxi; 14. Tourist Information Center; 15. Train Station; 16. Travel Lounge.
Figure 6 shows that the POI embeddings of the subcategories overlap for both GE_poi and TNE. The close association among POIs under the same top-level category might explain this phenomenon in the semantic space. Nevertheless, the features of such intra-category POIs may be worth investigating in future work.
4.3 Geospatial affinity of ROIs
In this section, we evaluate the ROI embeddings based on the geospatial affinity among ROIs. Ideally, ROIs with geospatial affinity, i.e., (a) overlapping regions and (b) neighboring regions, should obtain similar embeddings.
We randomly selected 200 ROIs and compared the 4 nearest neighbors of each ROI in the embedding space against crowdsourced ground truth. Human judgment was used to determine whether the nearest-neighbor ROIs predicted from the embedding have any geospatial affinity with the queried ROI; to facilitate this process, we built a crowdsourcing website with a geographical map. Ideally, most ROIs would have 3–4 geospatially overlapping neighbors among their kNN results in the embedding space with \(k=4\). Note that our dataset has 12 different geographical divisions, so each ROI geospatially overlaps many (at least 10) other ROIs. Figure 7d shows the number of ROI neighbors with geospatial affinity for TNE. The last (black) histogram bar shows that, out of 200 ROIs, more than 80 have 4 neighbors with an overlapping region or a shared boundary under TNE. We performed a similar analysis on GE_poi: the share of ROIs with 4 such neighbors is comparatively low (only \(10\%\)), as shown in Fig. 7a, compared to \(40\%\) with our model in Fig. 7d. The results for CrossMap and BiNE are far worse, with almost 50% and 55% of the ROIs, respectively, having zero geospatially overlapping neighbors, as shown in Fig. 7b, c. These results strongly indicate that our embedding preserves geospatial affinity, which the baseline approaches cannot.
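The neighbor-counting step behind Fig. 7 can be sketched as below, with a boolean affinity matrix standing in for the crowdsourced human judgments (an assumption; the paper uses human raters, not a precomputed matrix). Cosine similarity as the nearest-neighbor measure and the function name are also assumptions. Entry h of the result counts the query ROIs that have h geospatially affine neighbors among their k nearest.

```python
import numpy as np

def affinity_histogram(E, affinity, k=4):
    """E: (n, d) ROI embeddings; affinity: (n, n) boolean, True where two ROIs
    overlap or share a boundary. Returns counts h[0..k]."""
    En = E / np.linalg.norm(E, axis=1, keepdims=True)
    sim = En @ En.T                            # cosine similarity
    np.fill_diagonal(sim, -np.inf)             # never return the query ROI itself
    nn = np.argsort(-sim, axis=1)[:, :k]
    hits = affinity[np.arange(len(E))[:, None], nn].sum(axis=1)
    return np.bincount(hits, minlength=k + 1)
```

Under this reading, the black bar in Fig. 7d corresponds to the last entry of the returned histogram for k = 4.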
Figure 8 provides three examples of geospatial affinity (for query ROIs 06000000, 07000004, and 11000043) obtained by nearest-neighbor search on the ROI embeddings. Figure 8a shows the nearest neighbors of ROI 06000000 in the embedding space (05000000, 03000068, 09000009, 02000005). Similarly, Fig. 8b shows the nearest neighbors of ROI 07000004 (10000027, 12000010, 02000025, 10000004). Interestingly, in Fig. 8c, the nearest neighbors of ROI 11000043 in Staten Island include ROI 05000045, located in Brooklyn. A closer look at both ROIs reveals that they are similarly popular for Arts and Entertainment POIs and outdoor activities, as obtained from the cosine similarities of the embeddings. Table 6 presents the similarity scores of these ROIs for some semantic categories; we discuss how these scores are obtained in Sect. 4.4.
4.4 Semantic category annotation of ROIs
In this section, we analyze ROI embeddings for semantic category annotation. First, Table 7 shows an example of semantic annotation for ROI 09000056, Greenpoint, Brooklyn, NYC, whose location is shown as ROI 56 in the map accompanying Table 4. The category ranking in Table 7 suggests that Greenpoint has considerable shops and services, recreational parks, and residential complexes. To verify this prediction, we compared the ranking with human raters, who used Foursquare [14], the NYC government site [9], ArcGIS [2] maps, Twitter [42], and Wikipedia [46] for ground-truth information. The crowdsourced ground-truth semantic categories of ROIs are graded into three levels: (1) low relevance, (2) moderate relevance, and (3) high relevance. The crowdsourced information for ROI 09000056 indicates many good shops, McCarren Park for outdoor activities, and residential complexes. This information aligns with the top three categories of the semantic category annotation: (a) Shops and Services, (b) Outdoors and Recreation, and (c) Residential.
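Since the similarity scores are described as cosine similarities between embeddings, ranking categories for one ROI can be sketched as follows. How the category vectors are obtained is not specified in this section, so the inputs, the example vectors, and the function name are hypothetical.

```python
import numpy as np

def rank_categories(roi_vec, category_vecs, names):
    """Return (category, cosine similarity) pairs for one ROI, most similar first."""
    r = np.asarray(roi_vec, dtype=float)
    C = np.asarray(category_vecs, dtype=float)
    sims = C @ r / (np.linalg.norm(C, axis=1) * np.linalg.norm(r))
    order = np.argsort(-sims, kind="stable")
    return [(names[k], float(sims[k])) for k in order]

ranking = rank_categories([1.0, 0.0],
                          [[0.0, 1.0], [1.0, 0.1], [1.0, 1.0]],
                          ["Residence", "Shops and Services", "Outdoors and Recreation"])
```

The resulting order is what gets compared against the crowdsourced relevance levels.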
For a comprehensive analysis, we crowdsourced ground-truth categories from human raters for 100 random ROIs, with category levels 3, 2, and 1, and compared them against the semantic category annotations obtained from the embeddings. We cast this comparison as a ranking problem: ideally, all level-3 categories should rank above level-2 categories, with level-1 categories at the bottom. We used normalized discounted cumulative gain (NDCG) [44] to measure the quality of each embedding via its ranking order. Table 8 shows NDCG scores at top-k ranking positions; a higher score signifies a better ranking order. The results in Table 8 (best scores in bold) show that TNE beats all baselines, GE_poi, CrossMap, BiNE, TNE_wcr, and TNE_nw, by a considerable margin. TNE achieved an NDCG@1 score of 0.844 and an average NDCG@k (\(k=1,\ldots ,5\)) of 0.8206 with 9 semantic categories.
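NDCG@k over the graded relevance levels (3, 2, 1) can be computed as below. Several NDCG variants exist; this sketch assumes linear gain, i.e., \(rel_i / \log _2(i+1)\) at position i, which may differ from the variant in [44]. The input is the list of ground-truth levels in the order the model ranked the categories; the function name is hypothetical.

```python
import numpy as np

def ndcg_at_k(levels, k):
    """levels: ground-truth relevance of each category, in the model's ranked order."""
    rel = np.asarray(levels, dtype=float)
    k = min(k, len(rel))
    discounts = 1.0 / np.log2(np.arange(2, k + 2))     # positions 1..k
    dcg = np.sum(rel[:k] * discounts)
    idcg = np.sum(np.sort(rel)[::-1][:k] * discounts)  # best achievable ordering
    return float(dcg / idcg) if idcg > 0 else 0.0
```

For example, if the model ranks a level-1 category first while a level-3 category exists, NDCG@1 is 1/3 under this variant.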
This is an important result of our experiments, as it gives insight into how ROI embeddings can capture the semantic perception that society holds about a region. Table 8 shows that TNE outperformed GE_poi, CrossMap, and BiNE by 0.235, 0.215, and 0.194 NDCG at rank 1, a considerable improvement in selecting the best category candidate for an ROI. The results are similar at other ranking positions. An average NDCG gain of more than 20% over the state-of-the-art baselines (i.e., GE_poi, CrossMap, and BiNE) is a large gain for a ranking problem and shows the efficacy of TNE. Note also that TNE_nw and TNE_wcr performed better than the other baselines but were beaten by TNE by an average of 0.1 (or 12%). This shows the necessity of both the edge weights in \({{\,\mathrm{\mathcal {G}}\,}}\) and the community-aware random walk in our strategy.
4.5 Semantic category difference from ROI Embeddings
In this section, we briefly demonstrate the capability of ROI embedding to find semantic differences between ROIs. Technically, for any pair of ROIs with embedding vectors \(\vec {{\varvec{r_1}}}, \vec {{\varvec{r_2}}}\) and semantic category vectors \(C=\{\vec {{\varvec{c_1}}},\ldots ,\vec {{\varvec{c_9}}}\}\), the top semantic category difference is calculated as follows:
We demonstrate the semantic category differences of 3 pairs of overlapping ROIs from lower east, west, and midtown Manhattan, as shown in Fig. 9. For each pair, we ranked the top three semantic category differences using the formulation above. The results, presented in the table within Fig. 9, reveal discernible facts on close observation. The major semantic category differences between the ROI pair (09000004, 08000024) from lower east Manhattan in Fig. 9a are Arts and Entertainment and Residence, because ROI 09000004 has popular music and theater performance centers and a large residential community known as the East Village, whereas the lower ROI 08000024 (shown in orange) has many restaurants. Similarly, the ROI pair (03000037, 09000005) from west Manhattan in Fig. 9b differs mainly in Travel and Transport and in College and Education, since ROI 03000037 contains the transit hub of Manhattan (Port Authority) and universities such as The City University of New York and State University of New York, while no similar places feature in ROI 09000005. Lastly, the midtown Manhattan pair (09000012, 08000026) in Fig. 9c does not show Recreation as a major category difference, because ROI 08000026 fully covers ROI 09000012, i.e., Central Park (a recreation place); instead, it shows differences in Arts and Entertainment and in Shops and Services, as ROI 08000026 is a cultural hub and a shopping or commercial area.
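The formula for the top semantic category difference is not reproduced in the text. One plausible formulation, assumed here purely for illustration, ranks categories by the absolute difference between the two ROIs' cosine similarities to each category vector \(\vec {{\varvec{c_i}}}\). The function name and the inputs are hypothetical.

```python
import numpy as np

def top_category_differences(r1, r2, category_vecs, names, top=3):
    """Rank categories by |cos(r1, c_i) - cos(r2, c_i)|, largest difference first."""
    C = np.asarray(category_vecs, dtype=float)

    def cos(r):
        r = np.asarray(r, dtype=float)
        return C @ r / (np.linalg.norm(C, axis=1) * np.linalg.norm(r))

    diff = np.abs(cos(r1) - cos(r2))
    order = np.argsort(-diff, kind="stable")[:top]
    return [(names[k], float(diff[k])) for k in order]
```

Under this assumed form, a category both ROIs share (such as Recreation for the Central Park pair) yields a small difference and drops out of the top of the list.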
We performed an in-depth study of semantic category difference annotation with NDCG analysis, similar to the analysis in Sect. 4.4. We chose 30 pairs of ROIs, and human raters annotated every category for each pair at three levels of difference: (1) not significantly, (2) moderately, and (3) critically different. Ideally, the embedding-based analysis should rank critically different categories (level 3) first, followed by moderately (level 2) and then not significantly (level 1) different ones. Table 9 shows the NDCG performance of each model; TNE again performs better than the other baselines (scores highlighted in bold in Table 9).
4.6 Region popularity prediction
To evaluate the effectiveness of ROI embeddings in a real-world application, we performed a region popularity prediction experiment using the openly available check-in dataset of New York City [49]. The only features used for prediction are the ROI embeddings obtained from the baselines and the TNE models. We used two regression models, (a) random forest and (b) XGBoost, to predict the number of check-ins in a region. Table 10 shows the mean absolute error (MAE) and root-mean-squared error (RMSE) of both regression models, with the best results highlighted in bold. TNE performed best in comparison with the baselines in all settings except XGBoost MAE, where TNE_nw performed best; however, the RMSE of TNE_nw is very high for both regression models. We also performed a temporal (day/night) region popularity experiment, shown in Table 11. TNE_day and TNE_night are TNE models trained with the \(G_{rw}\) graph generated from geotagged tweets posted during days and nights, respectively; they achieved the best MAEs for the corresponding time period, as highlighted in bold in Table 11.
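A minimal version of the popularity regression, using scikit-learn's `RandomForestRegressor` with the embeddings as the only features, might look like this. The 70/30 split, the number of trees, the function name, and the synthetic data in the usage check are assumptions; the paper's exact protocol and its XGBoost configuration are not given in this section.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

def evaluate_popularity(E, checkins, seed=0):
    """E: (n_rois, d) ROI embeddings; checkins: (n_rois,) check-in counts.
    Returns (MAE, RMSE) of a random forest on a held-out split."""
    X_tr, X_te, y_tr, y_te = train_test_split(E, checkins, test_size=0.3,
                                              random_state=seed)
    model = RandomForestRegressor(n_estimators=200, random_state=seed)
    model.fit(X_tr, y_tr)
    err = model.predict(X_te) - y_te
    mae = float(np.abs(err).mean())
    rmse = float(np.sqrt((err ** 2).mean()))
    return mae, rmse
```

Reporting both metrics matters here: RMSE penalizes a few large misses much more than MAE, which is how a model can win on MAE yet show a very high RMSE.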
Summary: Each experiment investigates a different aspect of the embedding procedures. The semantic category annotation experiments show that TNE provides a high-quality semantic embedding. The spatial affinity experiment shows that TNE preserves strong geospatial relations. Region popularity prediction with embedding features demonstrates the expressiveness of the features from each model. Taken together, these experiments support the quality of the ROI representations produced by TNE.
5 Related works
To the best of our knowledge, only one very recent work, by Jenkins et al. [22], builds an ROI embedding jointly with rich auxiliary information, in their case POIs, satellite images, and taxi flow data. While our approach also uses POI data, it makes the (in our view, very important) distinction of weighting POIs by popularity, and it equally incorporates semantic information from microblog text. This enables a different and, in our view, richer set of applications, including temporal variation based on the timing of microblog updates. The embedding methods also differ: their approach uses a single autoencoder built on a convolutional network, whereas we show how to build a tripartite network that ensures the three components (ROIs, POIs, and semantic text) are weighted equally. Although that work is only in press and its data are private, we still attempt to compare against it by considering similar baselines, notably TNE_nw, which, like Jenkins et al. [22], does not include popularity weights on POIs in the \(G_{pw}\) graph.
POI Embedding. An extended literature survey suggests that research on point-of-interest (POI) embedding is the closest to our work. However, our ROI embedding differs from the works on POI embedding [18, 39, 43, 47, 48, 50, 54] in several major ways. First, our work treats ROIs as considerably bigger regions encompassing many POIs, and simply aggregating POI embedding vectors to generate ROI embeddings does not yield the desired result, as our comparative experiments show. Second, the relevant POI embedding works focus on POI sequence recommendation for users based on check-in activity [18, 43, 48, 50], whereas our ROI embedding focuses on preserving spatial and semantic relations without involving users, which makes our problem statement different. Third, the POI embedding work by Xie et al. [47] models a bipartite graph network embedding for learning POIs that also includes a POI–Region bipartite graph. Since the concept of region is unclear from their paper, we adopted our ROI definition for a comparative analysis. The major difference is that our work captures the social behavior within regions as well as transitive/implicit relationships in bipartite graphs. Since POI recommendation is extraneous to our problem statement, we cannot directly compare their task/experiment with ours. Fourth, the work of Zhang et al. [54] aims to find correlations among hotspot locations (defined as spatial Gaussian kernel windows), words, and time to search for spatiotemporal events. Our work differs from [54] because hotspot locations are very different from our geographically bounded polygonal ROIs or POIs; these spatial entities play significantly different roles in our model.
Semantic-Visual Embedding. Cross-modal embedding in one-shot supervised learning has recently garnered researchers' attention. From a bird's-eye view, our objective moderately matches semantic-visual embedding on images, where the problem is assigning semantic labels to sub-regions of (partial) images [13, 38]. Our semantic embedding of ROIs likewise uses multimodal features to find the uniqueness of a spatial region. However, the two lines of work differ. The novelty of our work lies in applying semantic features to real-world geospatial regions of interest (ROIs) from the perspective of social engagement and in solving the specific problems this raises. Additionally, semantic-visual embedding focuses on feature-based spatial search on images, whereas our work concentrates on relation-based semantic learning on graph networks. In that respect, our work is entirely original in the geospatial domain.
Graph Network Embedding. Broadly, our work is related to network embedding research. Commonly used methods for network embedding include matrix factorization, random walks, and deep neural networks. Our model is based on random walks, an approach pioneered by DeepWalk [35]. We advance this line of work with structure-preserving tripartite and, more generally, multipartite network embedding, following in the footsteps of groundbreaking contributions such as LINE [40], HINE [7], metapath2vec++ [12], PME [5], and BiNE [16].
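As a reference point for the random-walk family, the following minimal sketch generates DeepWalk-style [35] truncated uniform random walks over an adjacency list. In DeepWalk the walks are then treated as sentences and fed to a skip-gram model [29]; that training step is omitted here. This is not the community-aware random walk of TNE, and the function name, the dictionary graph format, and the parameters `walks_per_node` and `walk_length` are illustrative assumptions.

```python
import random
from typing import Dict, List


def deepwalk_corpus(
    adj: Dict[str, List[str]],
    walks_per_node: int,
    walk_length: int,
    seed: int = 0,
) -> List[List[str]]:
    """Uniform truncated random walks: every vertex starts walks_per_node walks."""
    rng = random.Random(seed)
    nodes = list(adj)
    corpus: List[List[str]] = []
    for _ in range(walks_per_node):
        rng.shuffle(nodes)  # visit start vertices in a fresh random order on each pass
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbors = adj[walk[-1]]
                if not neighbors:
                    break  # dead end: truncate the walk early
                walk.append(rng.choice(neighbors))  # uniform choice among neighbors
            corpus.append(walk)
    return corpus
```

Every vertex starts the same number of walks and every walk has the same length regardless of how central the vertex is, which is the uniformity that BiNE [16] criticizes.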
The first use of a tri-party (three-entity) formulation in random-walk-based graph network embedding is due to Pan et al. [33]. However, it is not a true tripartite graph network but rather an attributed heterogeneous embedding approach that involves text-associated entities by incorporating contextual word embeddings. A more closely related work on tripartite embedding is HGP by Kim et al. [24], which models group\(\rightarrow \)user\(\rightarrow \)item relations; because it does not consider the group–item relationship, it is not a complete tripartite network. HGP propagates relations for each edge type independently and concentrates on attention mechanisms for large-scale adaptation. Overall, the main aim of HGP [24] is to tackle the over-smoothing problem in heterogeneous graphs at large scale, which is very different from our objective of incorporating implicit and explicit relationships in representation learning.
More recent work by Hong et al. [20] takes a different direction, toward attributed network embedding, in which each vertex in the graph network has a fixed set of features used to evaluate similarity. While these works [20, 24, 33] mainly concentrate on feature-attributed network embedding, our work focuses on capturing implicit structural information from transitive relations on multipartite graph networks.
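One simple way to materialize a transitive relation on a multipartite graph is a two-hop aggregation: if A links to B and B links to C, an implicit A–C edge receives the sum over shared B vertices of the product of the two edge weights (a sparse matrix product of the two bipartite weight matrices). The sketch below assumes this rule purely for illustration; it is not necessarily how TNE derives its implicit edges, and the function name, the weighted edge-dictionary format, and the ROI–POI and POI–keyword example are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Edges = Dict[Tuple[str, str], float]  # (source, target) -> edge weight


def transitive_edges(ab: Edges, bc: Edges) -> Edges:
    """Implicit A–C weights: w(a, c) = sum over shared b of w(a, b) * w(b, c)."""
    # Index the B–C edges by their B endpoint so each A–B edge can be extended in one lookup.
    by_b: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for (b, c), w_bc in bc.items():
        by_b[b].append((c, w_bc))
    ac: Dict[Tuple[str, str], float] = defaultdict(float)
    for (a, b), w_ab in ab.items():
        for c, w_bc in by_b.get(b, []):
            ac[(a, c)] += w_ab * w_bc
    return dict(ac)


# Hypothetical example: an ROI linked to two POIs, each POI linked to keywords.
roi_poi = {("midtown", "moma"): 1.0, ("midtown", "rockefeller"): 1.0}
poi_word = {("moma", "art"): 3.0, ("rockefeller", "art"): 1.0, ("rockefeller", "skating"): 2.0}
implicit = transitive_edges(roi_poi, poi_word)
# implicit == {("midtown", "art"): 4.0, ("midtown", "skating"): 2.0}
```

The ROI never links to a keyword directly, yet the two-hop rule connects it to "art" through both POIs, which is the kind of implicit structural signal a purely feature-attributed embedding would not see.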
Furthermore, as described in BiNE [16], the random walk generator used in the works mentioned above (inspired by [35]) is not equipped to mimic the real-world distribution of vertices in a graph. BiNE [16] surpasses them in structure-preserving embedding, as it thoroughly exploits the information carried by edge relationships in the graph network, along with the over-smoothing problem of vertices. Hence, we also include BiNE [16] as a baseline in our experiments. Beyond it, our model proposes a community-aware random walk, transitive-property-preserving graphs, and a heterogeneous negative sampling technique for embedding multiple entity types.
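The contrast with uniform walks can be sketched as follows, loosely in the spirit of BiNE's generator [16]: the number of walks started from a vertex grows with its centrality (floored at a minimum), and each walk stops with a fixed probability at every step, so walk lengths vary. This is a simplified sketch under stated assumptions: normalized degree stands in for the HITS [25] centrality that BiNE computes, the walk runs directly on an adjacency list rather than on the homogeneous graphs BiNE derives for each vertex side, and the names `max_t`, `min_t`, and `stop_prob` are illustrative.

```python
import random
from typing import Dict, List


def centrality_biased_walks(
    adj: Dict[str, List[str]],
    max_t: int,
    min_t: int,
    stop_prob: float,
    seed: int = 0,
) -> List[List[str]]:
    """Walk count per start vertex grows with its centrality; walk length is random."""
    if not 0.0 < stop_prob <= 1.0:
        raise ValueError("stop_prob must be in (0, 1] so every walk terminates")
    rng = random.Random(seed)
    # Assumption: normalized degree replaces HITS centrality; 'or 1' avoids dividing by zero.
    max_deg = max((len(nbrs) for nbrs in adj.values()), default=0) or 1
    walks: List[List[str]] = []
    for start, nbrs in adj.items():
        centrality = len(nbrs) / max_deg  # in [0, 1]
        n_walks = max(int(centrality * max_t), min_t)  # central vertices start more walks
        for _ in range(n_walks):
            walk = [start]
            while True:
                neighbors = adj[walk[-1]]
                # Stop at a dead end, or with probability stop_prob before each step.
                if not neighbors or rng.random() < stop_prob:
                    break
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks
```

On a star graph with `max_t=8` and `min_t=1`, the hub (degree 4) starts 8 walks while each leaf (degree 1) starts 2, so the walk corpus reflects the skewed degree distribution instead of treating all vertices alike.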
We thank the reviewers of this paper for bringing to our attention a very recent work by Chen et al. [4], which explores folded bipartite network embedding using a graph convolutional network (GCN). This work advances bipartite network embedding by introducing higher-order relationships and using a self-attention technique to perform the embedding. Our work concentrates on extending bipartite to multipartite network embedding with random walk modeling and on supporting our use case with a real-world application, which makes [4] partially orthogonal to ours.
We believe our work contributes significantly to structure-preserving network embedding, and that its application to semantic ROI embedding heralds a new direction in elucidating geospatial regions with semantic features.
6 Conclusion
In this paper, we propose TNE, a tripartite network embedding model for learning embeddings of regions of interest (ROIs). Our study focuses on learning ROI embeddings that simultaneously capture semantic and geospatial features. First, we formalize the problem of semantic embedding for ROIs with an information graph that captures social, semantic, and spatial attributes. Then, we use TNE, which induces transitive relational features, to obtain better learning performance while preserving the structure of the information graph. We performed multifaceted experiments on real-world data showing the advantages of ROI embedding with TNE over other baselines. We also demonstrate an interactive map for exploring and discovering the similarities and distinctions between regions.
References
[1] Alzahrani, T., Horadam, K.J., Boztas, S.: Community detection in bipartite networks using random walks. In: Complex Networks V. Springer (2014)
[2] ArcGIS: arcgis.com (2019). https://arcgis.com/. Accessed 15 Mar 2019
[3] Buyukokkten, O., Cho, J., Garcia-Molina, H., Gravano, L., Shivakumar, N.: Exploiting geographical location information of web pages. ilpubs.stanford.edu (1999)
[4] Chen, H., Yin, H., Chen, T., Wang, W., Li, X., Hu, X.: Social boosted recommendation with folded bipartite network embedding. IEEE Trans. Knowl. Data Eng. (2020)
[5] Chen, H., Yin, H., Wang, W., Wang, H., Nguyen, Q.V.H., Li, X.: PME: projected metric embedding on heterogeneous networks for link prediction. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1177–1186 (2018)
[6] Chen, L., Yuan, F., Jose, J.M., Zhang, W.: Improving negative sampling for word representation using self-embedded features. In: Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, pp. 99–107. ACM (2018)
[7] Chen, Y., Wang, C.: HINE: Heterogeneous information network embedding. In: International Conference on Database Systems for Advanced Applications, pp. 180–195. Springer (2017)
[8] Cheng, T.K., Von Behren, J.R.: Location-based searching using a search area that corresponds to a geographical location of a computing device. US Patent 8,386,514 (2013)
[9] CityOfNewYork: cityofnewyork (2019). https://opendata.cityofnewyork.us/. Accessed 15 Mar 2019
[10] Courbariaux, M., Bengio, Y., David, J.P.: BinaryConnect: training deep neural networks with binary weights during propagations. In: Advances in Neural Information Processing Systems, pp. 3123–3131 (2015)
[11] Deng, H., Lyu, M.R., King, I.: A generalized Co-HITS algorithm and its application to bipartite graphs. In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 239–248. ACM (2009)
[12] Dong, Y., Chawla, N.V., Swami, A.: metapath2vec: Scalable representation learning for heterogeneous networks. In: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 135–144. ACM (2017)
[13] Engilberge, M., Chevallier, L., Pérez, P., Cord, M.: Deep semantic-visual embedding with localization (2018)
[14] Foursquare: foursquare.com (2019). http://foursquare.com. Accessed 15 Mar 2019
[15] Fuglede, B., Topsoe, F.: Jensen–Shannon divergence and Hilbert space embedding. In: Information Theory, 2004. ISIT 2004. Proceedings. International Symposium on, p. 31. IEEE (2004)
[16] Gao, M., Chen, L., He, X., Zhou, A.: BiNE: bipartite network embedding. In: The 41st International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 715–724. ACM (2018)
[17] Butler, H., Daly, M., Doyle, A., Gillies, S., Hagen, S., Schaub, T.: The GeoJSON Format. RFC 7946, RFC Editor (2016). https://doi.org/10.17487/RFC7946. http://www.rfc-editor.org/rfc/rfc7946.txt
[18] He, T., Yin, H., Chen, Z., Zhou, X., Sadiq, S., Luo, B.: A spatial-temporal topic model for the semantic annotation of POIs in LBSNs. ACM Trans. Intell. Syst. Technol. (TIST) 8(1), 12 (2016)
[19] Hellinger, E.: Neue Begründung der Theorie quadratischer Formen von unendlichvielen Veränderlichen. Journal für die reine und angewandte Mathematik 136, 210–271 (1909)
[20] Hong, R., He, Y., Wu, L., Ge, Y., Wu, X.: Deep attributed network embedding by preserving structure and attribute information. IEEE Trans. Syst. Man Cybern. Syst. (2019)
[21] Im, D.J., Verma, N., Branson, K.: Stochastic neighbor embedding under f-divergences. arXiv preprint arXiv:1811.01247 (2018)
[22] Jenkins, P., Farag, A., Wang, S., Li, Z.: Unsupervised representation learning of spatial data via multimodal embedding. In: CIKM (2019). https://doi.org/10.1145/3357384.3358001
[23] Jiang, M., Cui, P., Yuan, N.J., Xie, X., Yang, S.: Little is much: bridging cross-platform behaviors through overlapped crowds. In: AAAI, pp. 13–19 (2016)
[24] Kim, K.M., Kwak, D., Kwak, H., Park, Y.J., Sim, S., Cho, J.H., Kim, M., Kwon, J., Sung, N., Ha, J.W.: Tripartite heterogeneous graph propagation for large-scale social recommendation. arXiv preprint arXiv:1908.02569 (2019)
[25] Kleinberg, J.M.: Hubs, authorities, and communities. ACM Comput. Surv. (CSUR) 31(4es), 5 (1999)
[26] Kliman-Silver, C., Hannak, A., Lazer, D., Wilson, C., Mislove, A.: Location, location, location: The impact of geolocation on web search personalization. In: Proceedings of the 2015 Internet Measurement Conference, pp. 121–127. ACM (2015)
[27] Lee, J.A., Renard, E., Bernard, G., Dupont, P., Verleysen, M.: Type 1 and 2 mixtures of Kullback–Leibler divergences as cost functions in dimensionality reduction based on similarity preservation. Neurocomputing 112, 92–108 (2013)
[28] Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013)
[29] Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Advances in Neural Information Processing Systems, pp. 3111–3119 (2013)
[30] NPS: Statue of Liberty, park statistics (2019). https://www.nps.gov/stli/learn/management/parkstatistics.htm. Accessed 11 July 2019
[31] NYCDataBeta: NYC GeoJSON dataset (2019). http://data.beta.nyc/dataset?res_format=GeoJSON. Accessed 15 Mar 2019
[32] nyc.gov: NYC open data (2019). https://opendata.cityofnewyork.us/. Accessed 15 Mar 2019
[33] Pan, S., Wu, J., Zhu, X., Zhang, C., Wang, Y.: Tri-party deep network representation. Network 11(9), 12 (2016)
[34] Pennington, J., Socher, R., Manning, C.: GloVe: Global vectors for word representation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1532–1543 (2014)
[35] Perozzi, B., Al-Rfou, R., Skiena, S.: DeepWalk: Online learning of social representations. In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 701–710. ACM (2014)
[36] Radicchi, F., Castellano, C., Cecconi, F., Loreto, V., Parisi, D.: Defining and identifying communities in networks. Proc. Natl. Acad. Sci. 101(9), 2658–2663 (2004)
[37] Raymond, J.W., Gardiner, E.J., Willett, P.: RASCAL: Calculation of graph similarity using maximum common edge subgraphs. Comput. J. 45(6), 631–644 (2002)
[38] Ren, Z., Jin, H., Lin, Z., Fang, C., Yuille, A.: Multi-instance visual-semantic embedding. arXiv preprint arXiv:1512.06963 (2015)
[39] Shen, J., Cheng, T.: Semantic enrichment of interesting regions with POI data. In: Proceedings of the GISRUK Conference, Manchester, pp. 1–5 (2017)
[40] Tang, J., Qu, M., Wang, M., Zhang, M., Yan, J., Mei, Q.: LINE: Large-scale information network embedding. In: Proceedings of the 24th International Conference on World Wide Web, pp. 1067–1077. International World Wide Web Conferences Steering Committee (2015)
[41] Thearling, K.: An introduction to data mining. Direct Marketing Magazine, pp. 28–31 (1999)
[42] Twitter: twitter.com (2019). https://twitter.com. Accessed 15 June 2019
[43] Wang, Y., Qin, Z., Pang, J., Zhang, Y., Xin, J.: Semantic annotation for places in LBSN through graph embedding. In: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, pp. 2343–2346. ACM (2017)
[44] Wang, Y., Wang, L., Li, Y., He, D., Liu, T.Y.: A theoretical analysis of NDCG type ranking measures. In: Conference on Learning Theory, pp. 25–54 (2013)
[45] Wikipedia: Region of interest, Wikipedia, the free encyclopedia (2019). https://en.wikipedia.org/wiki/Region_of_interest. Accessed 11 July 2019
[46] Wikipedia: Wikipedia, the free encyclopedia (2019). https://www.wikipedia.org/. Accessed 15 June 2019
[47] Xie, M., Yin, H., Wang, H., Xu, F., Chen, W., Wang, S.: Learning graph-based POI embedding for location-based recommendation. In: Proceedings of the 25th ACM International on Conference on Information and Knowledge Management, pp. 15–24. ACM (2016)
[48] Xie, M., Yin, H., Xu, F., Wang, H., Zhou, X.: Graph-based metric embedding for next POI recommendation. In: International Conference on Web Information Systems Engineering, pp. 207–222. Springer (2016)
[49] Yang, D., Zhang, D., Zheng, V.W., Yu, Z.: Modeling user activity preference by leveraging user spatial temporal characteristics in LBSNs. IEEE Trans. Syst. Man Cybern. Syst. 45(1), 129–142 (2015)
[50] Yin, H., Wang, W., Wang, H., Chen, L., Zhou, X.: Spatial-aware hierarchical collaborative deep learning for POI recommendation. IEEE Trans. Knowl. Data Eng. 29(11), 2537–2551 (2017)
[51] Yu, L., Zhang, C., Pei, S., Sun, G., Zhang, X.: WalkRanker: a unified pairwise ranking model with multiple relations for item recommendation. In: AAAI. AAAI (2018)
[52] Yule, G.U.: On sentence-length as a statistical characteristic of style in prose: with application to two cases of disputed authorship. Biometrika 30(3/4), 363–390 (1939)
[53] Zager, L.A., Verghese, G.C.: Graph similarity scoring and matching. Appl. Math. Lett. 21(1), 86–94 (2008)
[54] Zhang, C., Zhang, K., Yuan, Q., Peng, H., Zheng, Y., Hanratty, T., Wang, S., Han, J.: Regions, periods, activities: uncovering urban dynamics via cross-modal representation learning. In: Proceedings of the 26th International Conference on World Wide Web, pp. 361–370. International World Wide Web Conferences Steering Committee (2017)
[55] Zhang, Z., Zweigenbaum, P.: GNEG: Graph-based negative sampling for word2vec. In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), vol. 2, pp. 566–571 (2018)
[56] Zhao, S., Zhao, T., King, I., Lyu, M.R.: Gtseer: geo-temporal sequential embedding rank for point-of-interest recommendation. arXiv preprint arXiv:1606.05859 (2016)
Acknowledgements
We thank our colleagues who helped crowdsource the dataset for our experiments. We are also grateful to Sunipa Dev for early discussions on this topic and for initial help in processing the data.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Paul, D., Li, F. & Phillips, J.M. Semantic embedding for regions of interest. The VLDB Journal 30, 311–331 (2021). https://doi.org/10.1007/s00778-020-00647-0
DOI: https://doi.org/10.1007/s00778-020-00647-0