Introduction

Efficient and effective urban mobility systems are essential for the development of smart and sustainable cities (Gonçalves et al. 2017). To make informed decisions and interventions, it is crucial to understand human mobility patterns and behaviors in both normal and disrupted situations (e.g., congestion, pandemics, and natural disasters). The availability of large-scale and diverse mobility data (for example, GPS trajectory data, smart card data in public transport, and geotagged social media data) enables a detailed understanding of human mobility patterns at a granular level.

Numerous studies have explored travel patterns and the factors influencing them using mobility data (Zhao et al. 2020; Gao et al. 2021; Hu et al. 2020). However, traditional methods face challenges in storing, processing, and analyzing large-scale mobility data. Relational databases and flat file systems may struggle to handle the complexity and volume of mobility data, leading to issues such as redundancy, slow queries, and limited scalability. Traditional data processing methods may also struggle to extract valuable insights from diverse and heterogeneous mobility data due to difficulties in analyzing complex relationships and patterns. Finally, these techniques often focus on static representations, failing to capture the temporal and spatial dynamics of mobility data and limiting our understanding of dynamic human mobility patterns and underlying factors.

Knowledge graphs (KGs) offer several potential solutions to these challenges. The KG, a graph-based knowledge representation method, provides a flexible and scalable framework for organizing, representing, and analyzing complex relationships and dependencies within the data (Yu et al. 2017). By structuring the data as a graph, a KG enables efficient storage, retrieval, and analysis of large-scale mobility data. Moreover, a KG allows for the integration of diverse data sources, facilitating a comprehensive understanding of mobility patterns. The graph-based structure can capture both the spatial and temporal aspects of mobility data, enabling the analysis of dynamic patterns and trends. Furthermore, the ability of a KG to capture semantic relationships between entities enhances its analytical capabilities and supports advanced data mining techniques. KGs have gained attention and been successfully applied in various domains, including medical, web search, journalism, entertainment, network security, and pharmaceuticals (Zhang et al. 2020; Kim 2017; Zhang et al. 2021; Jia et al. 2018; Berven et al. 2020).

However, to the best of our knowledge, no existing approach properly models mobility data based on KG. Most approaches treat mobility data as a static KG, lacking the ability to capture the dynamic nature of mobility behavior (Deng et al. 2020; Chen et al. 2021; Liao et al. 2021; Trivedi et al. 2017; Mezni 2021). A static KG is one that does not incorporate time-related information: the relations between entities are fixed and do not evolve over time (Chen et al. 2020). Despite the extensive research on static KGs in various domains, they are insufficient for capturing the time-varying characteristics of mobility behavior. Given the complexity of mobility patterns, a comprehensive mobility knowledge graph system remains a distant goal.

In light of this, we aim to inspire researchers interested in the mobility knowledge graph (MKG) modeling problem by defining the MKG framework and illustrating the value of MKG in driving an essential application in public transport. In the first part of the paper, we present a comprehensive review of relevant studies covering various aspects of KG, intended as a "one-stop" reading of the knowledge related to MKG, and we discuss the principles of MKG modeling by summarizing the findings from the reviewed studies. In the second part, we propose an initial MKG construction example using AFC data, together with its application. The purpose of this example is not to create the best model for the studied problem (which we believe is still a work in progress), but rather to showcase the construction process based on the principles discussed in the review section. The example is not perfect, as demonstrated in the corresponding case study, but we anticipate it will serve as a valuable source of inspiration for further optimization and advancement in MKG modeling. The objective of this paper is therefore to provide a foundation for effectively representing and expressing mobility data through the KG method, offering insights and guidance for future research in public transport and contributing to the advancement and application of KG in the context of mobility data. The main contributions are summarized as follows:

  • Review and synthesize existing KG studies in different application areas and introduce the concept of MKG.

  • Propose a learning framework to construct MKG from smart card data. It captures the spatiotemporal travel pattern correlations between stations using both rule-based linear decomposition and neural network-based nonlinear decomposition methods.

  • Conduct a case study to validate the MKG construction framework and explore the value of MKG in predicting individual trip destinations using only tap-in records.

The remainder of the paper is organized as follows. Section 2 reviews knowledge graph studies in the literature, summarizes KG definitions in different domain areas, and synthesizes general KG construction and implementation pipelines. Section 3 proposes a general method of constructing MKG from mobility trajectory data using matrix decomposition methods, including rule-based and neural network-based models. The case study in Sect. 4 validates the MKG construction method using smart card data by comparing it with benchmark models. It also illustrates the value of MKG in contributing to a typical problem of predicting individual trip destinations for public transport systems with only tap-in records. Section 5 summarizes the main findings and future research directions.

Review of knowledge graph

To survey the knowledge graph studies in different domains, we systematically searched academic publications in Scopus and Web of Science with publication dates ranging from January 2015 to December 2022. We searched article titles using the keywords ‘knowledge graph’ and ‘knowledge base’. After screening out irrelevant and duplicate papers, we retrieved a total of 6238 articles for analysis. Figure 1 shows the number of articles published each year. The number of KG publications has grown rapidly in recent years, particularly after 2018.

Fig. 1 Number of articles on knowledge graphs published in 2015–2021

KGs have been successfully applied and widely promoted in many fields. We analyzed the research status of KG in typical application domains (e.g., medical, education, financial, and energy) according to the number of published articles. Figure 2 shows the distribution of KG studies (689 papers in total) across application domains from 2015 to 2022. Note that papers focusing on purely methodological development are not considered in this analysis. The most studied area is Medical/Drug, followed by Enterprise/Financial and Education/Course. Compared to other areas, urban transportation is far less studied but is starting to emerge. For example, Tan et al. (2021) developed an urban traffic knowledge graph to integrate multi-source transportation data; it is used for traffic knowledge discovery and for querying the shortest-distance path between two nodes. Hu et al. (2022) proposed an individual travel knowledge graph to identify public transportation commuters based on individual travel chain information from smart card transaction data and travel survey data. Shan and Cao (2017) proposed an urban knowledge graph composed of multiple traffic topics (e.g., forecast of pedestrian volume), entities (e.g., POIs, weather, time slots), and attributes (e.g., temperature, air pressure) extracted from traffic text knowledge. Based on a hybrid reasoning algorithm, it performed better than traditional statistical methods for inferring pedestrian volumes. Wang et al. (2022) proposed a metro traffic flow prediction method based on knowledge graph representation learning and a spatiotemporal graph neural network (KGR–STGNN). The metro knowledge graph extracts entities (e.g., stations, routes, traffic events, and passenger flow), relations (e.g., the adjacency relations between stations, the attribution relations between stations and lines), and attributes of entities (e.g., station name, station ID, latitude and longitude) from static and dynamic traffic data sources (e.g., metro network data, metro card swiping data, points of interest data). It is used to store and represent factors related to metro traffic networks and makes it easier to incorporate the influence of external factors into the prediction model based on the spatiotemporal graph neural network.

Fig. 2 Research statistics on domain knowledge graph

Knowledge graph definition

Table 1 summarizes the domain knowledge graph definitions in different industries. The definitions share the same KG structure regardless of the application area. The KG is a directed labeled graph and its basic unit is a triple <entity, relation, entity>. For example, Elhammadi et al. (2020) developed the financial knowledge graph, where the triple <Steve Jobs, founderOf, Apple> means that Steve Jobs is the founder of Apple. In some research fields, the KG is represented in a quadruple form <entity, relation, entity, property>, in which the ‘property’ is used to describe attributes of entities or relations. For example, the quadruple <Lung cancer, disease_related_symptom, bloody sputum, probability 8.89%> means that the probability of bloody sputum for lung cancer is 8.89% (Li et al. 2020).

Table 1 Selected domain knowledge graphs and their definitions

Conceptually, the entities (e.g., people, places, objects) in KG describe the subjects or objects in the studied domain. The relations (e.g., being married to, being located in) describe the factual/semantic relation between two entities. One common finding of the KG definition is that the KG only allows a single directional relation between an entity pair. Graphically, the KG can have one and only one directional edge between any two nodes. This may constrain its application in the mobility context where, for example, there could be more than one activity (relation) between a location (entity) pair such as work and shopping.
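The triple and quadruple forms, and the single-relation constraint discussed above, can be illustrated with a minimal sketch. The names `Fact` and `add_fact` and the station labels are hypothetical and used only for illustration; overwriting an existing edge is one simple way to mimic the constraint, not a description of any particular KG system.

```python
from typing import NamedTuple, Optional


class Fact(NamedTuple):
    """One KG edge: a triple, or a quadruple when `prop` is set."""
    head: str
    relation: str
    tail: str
    prop: Optional[str] = None  # attribute of the entities or of the relation


# The two examples quoted in the text.
triple = Fact("Steve Jobs", "founderOf", "Apple")
quadruple = Fact("Lung cancer", "disease_related_symptom",
                 "bloody sputum", "probability 8.89%")


def add_fact(edges: dict, fact: Fact) -> None:
    """Store at most one relation per directed (head, tail) pair."""
    edges[(fact.head, fact.tail)] = fact.relation


# Two activities between the same location pair: only one survives.
edges: dict = {}
add_fact(edges, Fact("home station", "work", "city centre"))
add_fact(edges, Fact("home station", "shopping", "city centre"))
```

After the two calls, `edges` holds a single entry for the pair, which is exactly the limitation that matters in the mobility context: the second activity replaces the first instead of coexisting with it.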

KG development pipeline

Studies on KG mainly focus on the construction methodology and on KG-based applications. Based on the synthesis of the literature, Fig. 3 summarizes the general development pipeline of KG. It includes two key components: construction (C) and application (A). The objective of KG construction is to represent knowledge/data in a specific organization structure (i.e., triple or quadruple form). It mainly includes data acquisition and information extraction. The data used to construct KG can be structured (e.g., trip records), semi-structured (e.g., GPS trajectories), or unstructured (e.g., video/text data). The KG construction also depends on the stored information type (Table 2), including time-series data (TS, information with time-tagged attributes and attribute dependencies over time), panel data (P, information with time-tagged attributes), and cross-sectional data (CS, information with attributes).

Fig. 3 General development pipeline of knowledge graph

Table 2 Different types of data

Each data source requires an information extraction process to obtain entities, relations, and attributes used for KG construction. Generally, the information extraction methods can be categorized into four classes: rule-based (R), statistics-based (S), dictionary-based (D), and machine learning-based (ML).

In the application stage, the KG takes inputs from the extracted information and represents them in triple (T) or quadruple (Q) forms. Applications can be divided into basic and advanced applications. Basic applications are driven by the KG itself, such as knowledge visualization (KV), knowledge retrieval (KR), knowledge inference (KI), and question answering (Q&A). Among them, knowledge inference refers to mining or inferring unknown/implicit semantic relations between entities based on existing facts/incomplete relations in the KG.

Based on the constructed KG, entities and relations can be embedded into the continuous vector space through KG embedding methods to assist advanced machine learning applications (e.g., personalized recommendation, auxiliary decision-making, medical diagnosis, etc.). Machine learning applications can be categorized, according to the type of response (decision) variables, into classification (CL, predicting a discrete class label), regression (RE, predicting a continuous quantity), and trajectory planning (TP, a series of observations with time attributes).

Based on the above analysis, Table 3 summarizes selected domain KG studies using the above taxonomy, including research problems, domain, data type, KG form, information extraction, KG embedding, basic applications, and advanced applications. The review shows a wide span of application domains using KGs, but very few in transportation. Almost all studies have used triple forms with a few using quadruple forms, and all studies have allowed only a single directional relation between entities. Most studies construct KGs using cross-sectional data with no time attribute, or panel data with time attributes but no serial dependency. No study has been found on KGs capturing time-series data (i.e., order and dependencies) which is critical in urban mobility areas. As discussed before, the single directional relation triple form may limit KG’s capability to fully model mobility problems, such as probabilistic activities between two stations.

For information extraction, most studies used rule-based or machine learning methods depending on data types. The KG embedding learns a low-dimensional representation of entities and relations, which could be roughly divided into distance-based models (e.g., TransD, TransE), semantic models (e.g., SemaTyp), and neural network-based models (e.g., CNN, GCN). The advanced applications cover both classification and regression problems, but the number of applications in classification is overwhelming, while no study was reported on trajectory planning problems (predicting a series of values).
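To make the idea of a distance-based embedding concrete, the sketch below scores a triple in the TransE manner, where a plausible triple satisfies head + relation ≈ tail and the score is the norm of the residual. The embeddings are toy values chosen by hand, not learned, and the names `transe_score`, `station_a`, `commute`, and so on are hypothetical.

```python
import numpy as np


def transe_score(head, relation, tail, order=2):
    """TransE-style distance ||head + relation - tail||; lower means more plausible."""
    return float(np.linalg.norm(head + relation - tail, ord=order))


# Toy 3-dimensional embeddings (illustrative values, not learned).
station_a = np.array([0.1, 0.4, 0.0])
commute = np.array([0.5, -0.2, 0.3])
station_b = np.array([0.6, 0.2, 0.3])    # close to station_a + commute
station_c = np.array([-0.7, 0.9, 0.0])   # far from station_a + commute

plausible = transe_score(station_a, commute, station_b)
implausible = transe_score(station_a, commute, station_c)
```

In this toy setting the first triple receives a much smaller distance than the second, which is the property an embedding model is trained to produce for observed facts.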

Table 3 Selected knowledge graph studies

Critical analysis of KG in transportation

While our literature review provides a comprehensive survey of existing KG construction and applications, it is imperative to delve deeper into the nuances and intricacies that could inform the development of the MKG. To this end, we critically analyze the literature to extract key elements and lessons that could be transposed to enhance KGs applied in the transportation area.

  1. Form and directionality constraints. Conventional KGs typically employ triple forms and restrict entities to singular, unidirectional relationships. This approach may not suffice for the transportation domain, where entities often have complex, bidirectional interactions. For instance, the same pair of locations could be linked by different types of trips, like commuting in the morning and returning in the evening, each with its own distinct characteristics and implications for transport planning.

  2. Dynamics of mobility. Standard KGs often have static structures that fail to reflect the fluid nature of transportation systems, where the significance of connections between locations can shift according to temporal patterns, such as rush hours, weekends, or seasonal variations, as well as due to individual user preferences and behaviors. Capturing these dynamics is crucial for accurate modeling and forecasting in urban transport systems.

  3. Multidimensional relations. The multifaceted nature of transportation necessitates KGs that can encapsulate the diverse range of human activities occurring between the same origin and destination within specific time frames. For example, a single journey may encompass multiple purposes (commuting, shopping, leisure) that are not mutually exclusive. This complexity is particularly pronounced during peak hours when the confluence of activities is at its highest. A more nuanced, multidimensional approach to KGs could provide a richer, more accurate representation of such intricate human mobility patterns.

  4. Temporal data handling. Existing KGs lack the capability to handle time-series data, a critical feature for accurately reflecting the ever-changing landscape of urban mobility. To truly capture the temporal dynamics of transportation, such as peak travel times, seasonal variations, and event-driven traffic fluctuations, KGs need to evolve to process and analyze data that is inherently sequential and time-dependent.

  5. Data integration. KGs in transportation should not only integrate data from various sources (e.g., traffic sensors, smart cards, GPS traces) but also reconcile the disparate temporal scales and spatial granularities. Integrating high-frequency GPS data with lower-frequency transit records, and matching precise GPS locations with broader transport zones, is crucial for a nuanced view of travel behaviors and advanced transport system planning.

  6. Applications. The literature shows a preponderance of KGs in basic applications like knowledge retrieval and inference. However, advanced applications such as personalized recommendations and decision support systems are less explored in transportation. Moreover, there is a notable gap in KG applications for trajectory planning.

In summary, while the literature lays a foundational understanding of KGs, it also underscores the limitations of traditional KGs and the gaps that must be addressed for their effective application in transportation. Our proposed MKG framework is designed to address these critical elements, thereby enhancing knowledge representation and management within the transportation domain.

Methodology

Problem definition

Following the reviewed KG definitions in Table 1, we define the mobility knowledge graph (MKG) as a time-based knowledge graph, which is composed of sub-graphs for different time intervals. Each sub-graph is a quadruple form of \(\langle location, \ relation, \ location, \ time \ period \rangle\) composed of nodes, edges, and properties, in which nodes represent locations and edges represent the relations (e.g., travel activity) between locations during a certain time period. For example, (location A, work, location B, 07:30-09:30) indicates that trips between locations A and B during 07:30-09:30 are most likely for work activities.

We define an individual mobility trajectory as a set of trips \(Tr\ =\{ tr_1,\ tr_2,\ tr_3,\ \ldots \}\), with each trip tr consisting of \(\{ origin \ location, \ start \ time, \ destination \ location, \ arrival \ time \}\). The MKG construction problem is:

  • Given the individual mobility trajectories Tr, extract the hidden relations \({\textbf{R}}\) between locations \({\textbf{V}}\) capturing spatiotemporal correlations of travels during time period p and construct the mobility knowledge graph \({\textbf{G}}\ ({\textbf{V}},{\textbf{R}}, p)\) in a form of < location, relation, location, time period >.
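As a minimal sketch of the target data structure, the quadruples can be grouped into one sub-graph per time period. The quadruples below are invented for illustration, and the function name `build_mkg` is hypothetical; the relations here are written as readable labels, whereas the relations extracted later in this section are latent vectors.

```python
from collections import defaultdict

# Hypothetical quadruples <location, relation, location, time period>.
quadruples = [
    ("A", "work", "B", "07:30-09:30"),
    ("B", "home", "A", "17:00-19:00"),
    ("A", "shopping", "C", "17:00-19:00"),
]


def build_mkg(quads):
    """Group quadruples into one sub-graph per time period.

    Each sub-graph maps a directed (origin, destination) pair to a single
    relation, mirroring the single directional relation of a KG edge.
    """
    mkg = defaultdict(dict)
    for origin, relation, destination, period in quads:
        mkg[period][(origin, destination)] = relation
    return dict(mkg)


mkg = build_mkg(quadruples)
morning = mkg["07:30-09:30"]   # sub-graph for the morning period
```

Keying the outer dictionary by time period keeps each sub-graph a conventional KG, while the collection of sub-graphs carries the temporal dimension.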

Figure 4 shows the framework of MKG construction from mobility data. The input of the framework is the individual mobility trajectories, and the output is the mobility knowledge graph \({\textbf{G}}\). The construction contains two core parts: stations’ latent representation and station-station relations extraction. To generate these, we construct the temporal station demand matrix \({\textbf{T}}\) and OD (origin–destination) flow matrix \({\textbf{S}}\) from individual mobility trajectories, then extract the temporal and spatial information, respectively. It’s important to emphasize that our study focuses on constructing the MKG under the current stable system state, without considering the automatic updating of the MKG to reflect operational changes or network modifications.

Fig. 4 Framework for MKG construction from mobility data

MKG construction in public transport

For the MKG in public transport, entities could be stations, city grid cells, and points-of-interest, and relations are travel activities such as shopping, working, and school. In practice, the activity (relationship) between an OD pair could vary for different time periods. Therefore, we develop a set of sub-graphs corresponding to different times, such as morning and afternoon peaks. Given the requirement of the single directional relationship between entities in KG, we define the relation as the most likely travel activity between stations. Other relation definitions could also be adopted, resulting in a set of sub-graphs pertaining to different applications, such as the least likely activity between stations.

MKG construction in public transport consists of two key parts: entity extraction and relation extraction. Entity extraction refers to extracting different station entities (including subway stations, train stations, or bus stops) from mobility data. Unlike GPS trajectory data, smart card data usually record passengers’ boarding and alighting stations directly. The relation represents the spatiotemporal travel pattern correlations between two stations within a certain period. Since it cannot be observed directly from the trajectory data, the objective of relation extraction is to extract the implicit relation between the origin and destination stations. Therefore, to obtain the implicit station-station relations, stations are also described by latent representations, and all variables in this study are calculated in the latent space.

Stations’ latent representation

For each station, we construct the station’s latent vector \({\textbf{v}}=[{\textbf{v}}^t;{\textbf{v}}^s]\) capturing spatiotemporal characteristics of station demand patterns, where \({\textbf{v}}^t\) and \({\textbf{v}}^s\) are the station’s temporal and spatial latent vectors, respectively. The travel characteristics of each station are time-varying. For example, the number of visitors at a station may be quite different between peak and off-peak hours. Therefore, it is important to extract the visiting features of stations based on time.

We first select the time period p (e.g., weekdays of a specific week), then construct the temporal station demand matrix \({\textbf{T}}\in {\mathbb {R}}^{n\times a}\), where n is the number of stations in the system and a is the number of hours within time period p. An element \(T_{ij}\) of \({\textbf{T}}\) represents the number of visits to the \(i\)-th station in the \(j\)-th hour. According to Nonnegative Matrix Factorization (NMF) theory (Lee and Seung 2000), a non-negative matrix can be approximated by the product of two non-negative matrices, which we use to extract the visiting features of each station. The matrix \({\textbf{T}}\) is decomposed as:

$$\begin{aligned} {\textbf{T}} \approx {\textbf{V}}^{t}{\textbf{Q}} \end{aligned}$$
(1)

where matrix \({\textbf{V}}^t\in {\mathbb {R}}^{n\times k_t}\) is the stations’ temporal latent representation, \({\textbf{Q}}\in {\mathbb {R}}^{k_t\times a}\) is the coefficient matrix and \(k_t\) is the number of stations’ temporal latent features.
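The construction of \({\textbf{T}}\) and the factorization in Eq. (1) can be sketched as follows. This is a minimal illustration, assuming the standard multiplicative update rules for the Frobenius-norm NMF objective; the function names `demand_matrix` and `nmf`, the toy visit records, and the choice \(k_t=2\) are assumptions made for the example and do not describe the solver used in this paper.

```python
import numpy as np


def demand_matrix(visits, n_stations, n_hours):
    """Count visits per (station, hour); `visits` holds (station, hour) index pairs."""
    T = np.zeros((n_stations, n_hours))
    for station, hour in visits:
        T[station, hour] += 1
    return T


def nmf(T, k, n_iter=500, eps=1e-9, seed=0):
    """Approximate T by V @ Q with V, Q >= 0 using multiplicative updates."""
    rng = np.random.default_rng(seed)
    n, a = T.shape
    V = rng.random((n, k)) + eps
    Q = rng.random((k, a)) + eps
    for _ in range(n_iter):
        Q *= (V.T @ T) / (V.T @ V @ Q + eps)   # update coefficients
        V *= (T @ Q.T) / (V @ Q @ Q.T + eps)   # update station features
    return V, Q


# Toy records: station 0 is busy in the morning, station 1 in the evening.
visits = [(0, 7), (0, 8), (0, 8), (1, 17), (1, 18), (1, 18), (2, 8), (2, 17)]
T = demand_matrix(visits, n_stations=3, n_hours=24)
V_t, Q = nmf(T, k=2)
rel_error = np.linalg.norm(T - V_t @ Q) / np.linalg.norm(T)
```

Each row of `V_t` is the temporal latent vector \({\textbf{v}}^t\) of one station, and each row of `Q` can be read as an hourly demand profile shared across stations.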

To obtain the spatial features between OD pairs, the OD flow matrix \({\textbf{S}}\in {\mathbb {R}}^{n\times n}\) is constructed, in which an element \(S_{od}\) represents the (normalized) number of trips from the origin station o to the destination station d. Through square matrix factorization (Klema and Laub 1980), \({\textbf{S}}\) can be decomposed into:

$$\begin{aligned} {\textbf{S}}\approx {\textbf{V}}^{s} {\textbf{D}} {\textbf{V}}^{s T} \end{aligned}$$
(2)

where matrix \({\textbf{V}}^s\in {\mathbb {R}}^{n\times k_s}\) is the stations’ spatial latent representation; \({\textbf{V}}^{s T}\) is the transpose of \({\textbf{V}}^s\); \({\textbf{D}}\in {\mathbb {R}}^{k_s\times k_s}\) is the diagonal feature matrix; and \(k_s\) is the number of spatial latent features. The OD flow matrix \({\textbf{S}}\) describes the spatial correlation (dependencies) between stations, which can be regarded as the embedding of the system’s topology. Similarly, the matrices \({\textbf{V}}^s\) and \({\textbf{D}}\) are the stations’ embedding and the interactions in the latent space, respectively.
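The shapes involved in Eq. (2) can be checked with a short sketch. The normalization of \({\textbf{S}}\) is not specified above, so dividing by the largest count is an assumption of this example; the function names `od_flow_matrix` and `reconstruct` are hypothetical, and `V_s` and `d` are random placeholders rather than fitted values.

```python
import numpy as np


def od_flow_matrix(trips, n_stations):
    """Count trips per (origin, destination) pair and scale to [0, 1].

    Dividing by the largest count is an assumed normalization.
    """
    S = np.zeros((n_stations, n_stations))
    for origin, destination in trips:
        S[origin, destination] += 1
    peak = S.max()
    return S / peak if peak > 0 else S


def reconstruct(V_s, d):
    """Right-hand side of Eq. (2), with the diagonal of D stored in `d`."""
    return V_s @ np.diag(d) @ V_s.T


trips = [(0, 1), (0, 1), (1, 0), (2, 1), (0, 2)]   # toy (origin, destination) records
S = od_flow_matrix(trips, n_stations=3)

rng = np.random.default_rng(0)
k_s = 2
V_s = rng.random((3, k_s))   # placeholder spatial latent vectors, not fitted
d = rng.random(k_s)          # placeholder diagonal of D
S_hat = reconstruct(V_s, d)
residual = np.linalg.norm(S - S_hat)
```

Fitting \({\textbf{V}}^s\) and \({\textbf{D}}\) then amounts to choosing the values that make `residual` small, which is the first term of the joint objective in Eq. (3).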

Combining Eqs. (1) and (2), the matrices \({\textbf{V}}^t\), \({\textbf{V}}^s\), \({\textbf{Q}}\) and \({\textbf{D}}\) can be learned by solving the following optimization problem:

$$\begin{aligned} Obj_{1}=\min _{{\textbf{V}}^{t}, {\textbf{V}}^{s}, {\textbf{Q}}, {\textbf{D}} \ge 0}\left\| {\textbf{S}}-{\textbf{V}}^{s} {\textbf{D}} {\textbf{V}}^{s T}\right\| ^{2}+\left\| {\textbf{T}}-{\textbf{V}}^{t} {\textbf{Q}}\right\| ^{2}+\lambda \left[ \left\| {\textbf{V}}^{t}\right\| ^{2} +\left\| {\textbf{V}}^{s}\right\| ^{2}+\Vert {\textbf{Q}}\Vert ^{2}+\Vert {\textbf{D}}\Vert ^{2}\right] \end{aligned}$$
(3)

where \(\left\| \cdot \right\|\) denotes the Frobenius norm and \(\lambda\) is the regularization rate. The first and second terms capture the spatial and temporal latent patterns, respectively. The last term is the regularization that penalizes the norms of \({\textbf{V}}^t\), \({\textbf{V}}^s\), \({\textbf{Q}}\) and \({\textbf{D}}\).

By solving Eq. (3), the stations’ temporal and spatial related latent matrices \({\textbf{V}}^t\) and \({\textbf{V}}^s\) are extracted, and the stations’ temporal and spatial latent vectors \({\textbf{v}}^t\) and \({\textbf{v}}^s\) are obtained. After this step, the stations’ temporal and spatial related latent representation \({\textbf{V}}=[{\textbf{V}}^t;{\textbf{V}}^s]\) is constructed.
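One simple way to solve Eq. (3) numerically is projected gradient descent, in which each gradient step is followed by clipping negative entries to zero. The sketch below is an illustration under that assumption and is not necessarily the solver used in this paper; the function names, the step size, the iteration count, and the toy matrices are all hypothetical.

```python
import numpy as np


def obj1(S, T, V_t, V_s, Q, d, lam):
    """Value of Eq. (3); the diagonal matrix D is stored as its diagonal d."""
    fit_s = np.sum((S - V_s @ np.diag(d) @ V_s.T) ** 2)
    fit_t = np.sum((T - V_t @ Q) ** 2)
    reg = lam * sum(np.sum(X ** 2) for X in (V_t, V_s, Q, d))
    return fit_s + fit_t + reg


def fit_obj1(S, T, k_t, k_s, lam=0.01, lr=0.01, n_iter=2000, seed=0):
    """Projected gradient descent on Eq. (3) with non-negativity constraints."""
    rng = np.random.default_rng(seed)
    n, a = T.shape
    V_t, Q = rng.random((n, k_t)), rng.random((k_t, a))
    V_s, d = rng.random((n, k_s)), rng.random(k_s)
    start = obj1(S, T, V_t, V_s, Q, d, lam)
    for _ in range(n_iter):
        E_t = V_t @ Q - T                        # temporal residual
        E_s = V_s @ np.diag(d) @ V_s.T - S       # spatial residual
        g_Vt = 2 * E_t @ Q.T + 2 * lam * V_t
        g_Q = 2 * V_t.T @ E_t + 2 * lam * Q
        g_Vs = 2 * (E_s + E_s.T) @ V_s @ np.diag(d) + 2 * lam * V_s
        g_d = 2 * np.einsum("ik,ij,jk->k", V_s, E_s, V_s) + 2 * lam * d
        # Gradient step, then projection onto the non-negative orthant.
        V_t = np.maximum(V_t - lr * g_Vt, 0.0)
        Q = np.maximum(Q - lr * g_Q, 0.0)
        V_s = np.maximum(V_s - lr * g_Vs, 0.0)
        d = np.maximum(d - lr * g_d, 0.0)
    end = obj1(S, T, V_t, V_s, Q, d, lam)
    return V_t, V_s, Q, d, (start, end)


# Toy inputs: 3 stations, 4 hourly bins.
S = np.array([[0.0, 1.0, 0.5], [0.5, 0.0, 0.0], [0.0, 0.5, 0.0]])
T = np.array([[2.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 2.0], [1.0, 0.0, 1.0, 0.0]])
V_t, V_s, Q, d, (start, end) = fit_obj1(S, T, k_t=2, k_s=2)
V = np.hstack([V_t, V_s])   # row i is the station vector [v_i^t ; v_i^s]
```

The two fit terms do not share variables in Eq. (3), so the temporal and spatial factors could also be fitted separately; they are kept in one loop here only to mirror the joint objective.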

Relation extraction

Relation extraction for KGs is an extensive and profound research topic and most studies focus on supervised learning methods (Smirnova and Cudré-Mauroux 2018; Nayak et al. 2021; Wang et al. 2022; Liu 2020). However, unsupervised learning methods are more suitable for our research problem since the edges in MKG do not have any semantic labels. Here we introduce two approaches to extract ‘hidden’ directional relations between stations.

Rule-based relation extraction The relation between the two stations describes the specific travel pattern of these two stations under a certain time period. Therefore, it can be regarded as a constraint on these two stations. To distinguish the relation between different stations and the relation between the same two stations under different time periods, a simple way is to represent the relation as a concatenation of the temporal and spatial features of the two stations.

Figure 5 shows the rule-based relation extraction framework. In the rule-based model, the directional relationship between two stations is formed by concatenating the temporal feature of the origin station and the spatial feature of the destination station, that is, \({\textbf{r}}_{od}=[{\textbf{v}}_o^t;{\textbf{v}}_d^s]\), where \({\textbf{r}}_{od}\) is the latent vector of relation, \({\textbf{v}}_o^t\) the temporal latent vector of origin station, and \({\textbf{v}}_d^s\) the spatial latent vector of destination station. For the example in Fig. 5, we have the origin stations set and destination stations set, and the latent representation of both is matrix \({\textbf{V}}=[{\textbf{v}}_1,{\textbf{v}}_2,{\textbf{v}}_3,\ldots ,\ {\textbf{v}}_n]\). For station vector \({{\textbf{v}}}_i\), it consists of the temporal latent vector \({\textbf{v}}_i^t\) and spatial latent vector \({\textbf{v}}_i^s\) with a dimension of \(1\times 5\), whose values are colored in orange and green, respectively. For a selected OD pair \({\textbf{v}}_o\) and \({\textbf{v}}_d\), the directional relation \({\textbf{r}}_{od}\) is obtained by concatenating the temporal feature of the origin station (light orange) and the spatial feature of the destination station (dark green).

Fig. 5 Rule-based relation extraction

With the latent vectors \({\textbf{v}}^t\) and \({\textbf{v}}^s\) obtained by solving Eq. (3), the relation vector \({\textbf{r}}_{od}\) of every OD pair follows directly from this concatenation rule.
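The concatenation rule \({\textbf{r}}_{od}=[{\textbf{v}}_o^t;{\textbf{v}}_d^s]\) takes only a few lines. In this sketch the latent matrices are filled with placeholder numbers, five features per vector is an illustrative choice, and the function name `rule_based_relation` is hypothetical.

```python
import numpy as np


def rule_based_relation(V_t, V_s, o, d):
    """r_od = [v_o^t ; v_d^s]: origin temporal vector, then destination spatial vector."""
    return np.concatenate([V_t[o], V_s[d]])


# Placeholder latent matrices for 3 stations with 5 features each.
V_t = np.arange(15, dtype=float).reshape(3, 5)
V_s = -np.arange(15, dtype=float).reshape(3, 5)

r_01 = rule_based_relation(V_t, V_s, 0, 1)   # relation from station 0 to station 1
r_10 = rule_based_relation(V_t, V_s, 1, 0)   # reverse direction
```

Because the origin contributes its temporal part and the destination its spatial part, `r_01` and `r_10` differ, which is what makes the extracted relation directional.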

After learning all relations, a clustering algorithm, such as k-means, may be used to classify all candidate relations. A cluster represents a kind of relation that may be mapped to a semantic space understandable by humans (illustrated in Fig. 8 in the Case Study Section). However, the relations between multiple origins and one destination most likely will be divided into different clusters. That is to say, within a certain time interval, the relations between different origin stations and one destination are different. These may cause inconsistent relations in MKG as these edges share the same activity in the real world. Therefore, this method is more suitable for extracting 1-to-1 relations. The rule-based method has difficulty obtaining relations of multiple origins to one destination (N-to-1 relation), and the relation of one origin to multiple destinations (1-to-N relation).
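The clustering step can be sketched with a plain implementation of Lloyd's k-means algorithm. The relation vectors below are invented so that two groups are clearly separated, and the function name `kmeans` and the choice of two clusters are assumptions for the example.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Distance from every point to every centroid, shape (n_points, k).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


# Toy relation vectors: the first three and the last three form two groups.
relations = np.array([
    [0.9, 0.1, 0.8, 0.2],
    [1.0, 0.0, 0.9, 0.1],
    [0.8, 0.2, 1.0, 0.0],
    [0.1, 0.9, 0.0, 1.0],
    [0.0, 1.0, 0.2, 0.8],
    [0.2, 0.8, 0.1, 0.9],
])
labels, centroids = kmeans(relations, k=2)
```

Each cluster label then stands for one kind of relation. With rule-based vectors, edges into the same destination share only the spatial half of the vector, which is why they can still fall into different clusters.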

Neural network (NN)-based relation extraction

There are also nonlinear methods for matrix decomposition, such as neural networks (Zhuang et al. 2017; Dziugaite and Roy 2015). They can extract latent representations of directional relations while simultaneously capturing the spatiotemporal travel patterns of OD pairs (rather than arbitrarily concatenating spatial and temporal patterns). This paper develops a relation extraction method based on the neural network (NN) model of Zhuang et al. (2017) to simultaneously extract the spatiotemporal latent representations of stations and relations.

In the OD flow matrix \({\textbf{S}}\), each entry \(S_{od}\) records the passenger flow from the origin station to the destination station, and its value is determined by the relation between these two stations. Therefore, the relation can be extracted from the entry \(S_{od}\). Moreover, to avoid the limitations of the previous method, the spatiotemporal latent representations of stations and relations are extracted simultaneously and are defined as \({\textbf{V}}^{ts}\in {\mathbb {R}}^{n\times k_{ts}}\) and \({\textbf{R}}^{ts}\in {\mathbb {R}}^{m\times k_r}\), respectively, where n and m are the numbers of stations and relations in the network; \(k_{ts}\) and \(k_r\) are the numbers of latent features of stations and relations. Therefore, the objective of the relation extraction is to map \(({\textbf{v}}_o,{\textbf{r}}_{od},{\textbf{v}}_d)\) to the nonzero entry \({S}_{od}\).

The relation extraction problem is defined as follows: given a training set \(\{([{\textbf{v}}_1;{\textbf{r}}_{12};{\textbf{v}}_2],S_{12}),([{\textbf{v}}_1;{\textbf{r}}_{13};{\textbf{v}}_3],S_{13}),\cdots ,([{\textbf{v}}_n; {{\textbf{r}}_{n(n-1)}};{\textbf{v}}_{n-1}],S_{n(n-1)})\}\), with each input vector \({\textbf{x}}=\left[ {\textbf{v}}_{o}; {\textbf{r}}_{o d}; {\textbf{v}}_{d}\right] \in {\textbf{X}}\) and corresponding output value \(S_{od}\in {\textbf{S}}\), the goal is to learn a function \(f:{\textbf{X}}\rightarrow {\textbf{S}}\) mapping inputs to outputs. For each input vector \({\textbf{x}}\), the feed-forward NN-based function \(f_{NN}({\textbf{x}})\) is defined as:

$$\begin{aligned} f_{NN}({\textbf{x}})={\textbf{W}}_{2} {\text {Sigmoid}}\left( \left( {\textbf{x}} {\textbf{W}}_{1}\right) ^{T}\right) \end{aligned}$$
(4)

where the matrices \({\textbf{W}}_1\in {\mathbb {R}}^{\left( 2k_{ts}+k_r\right) \times w}\) and \({\textbf{W}}_2\in {\mathbb {R}}^{1\times w}\) are the weights of the first and second layers, w is the number of hidden neurons, and Sigmoid() is the activation function. To specify a relation \({\textbf{r}}_{od}\) in each quadruple \(({\textbf{v}}_o, {\textbf{r}}_{od},{\textbf{v}}_d, p)\), a latent indexing vector \({\textbf{h}}\in {\mathbb {R}}^{1\times m}\) is introduced, and the specific latent relation is \({{\textbf{r}}_{od}}={{\textbf{h}}}{{\textbf{R}}}^{ts}\). Therefore, the input vector \({\textbf{x}}\) can be represented as \({\textbf{x}}=[{\textbf{v}}_o;{{\textbf{h}}}{{\textbf{R}}}^{ts};{\textbf{v}}_d]\in {\mathbb {R}}^{1\times (2k_{ts}+k_r)}\), which is constructed by horizontally concatenating the origin station’s latent vector \({\textbf{v}}_o\), the relation’s latent vector \({\textbf{r}}_{od}\), and the destination station’s latent vector \({{\textbf{v}}}_d\).
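As a minimal sketch of the forward pass in Eq. (4), the following pure-Python snippet builds the input vector by concatenation and evaluates the two-layer network. The function names (`matvec_row`, `f_nn`) and all numerical values are illustrative assumptions, not the authors' implementation; in particular, the indexing vector here is set to pick one relation exactly, whereas in the paper it is a learned latent vector.

```python
import math

def matvec_row(x, W):
    """Row vector x (length p) times matrix W (p x q) -> row vector of length q."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(W[0]))]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def f_nn(v_o, h, R_ts, v_d, W1, W2):
    """Eq. (4): f_NN(x) = W2 Sigmoid((x W1)^T), with x = [v_o; h R_ts; v_d]."""
    r_od = matvec_row(h, R_ts)                        # relation vector r_od = h R^{ts}, length k_r
    x = v_o + r_od + v_d                              # horizontal concatenation, length 2*k_ts + k_r
    hidden = [sigmoid(z) for z in matvec_row(x, W1)]  # w hidden activations
    return sum(w2 * a for w2, a in zip(W2, hidden))   # scalar predicted flow

# Toy sizes (assumed for illustration): k_ts = 2, k_r = 2, m = 3 relations, w = 2 hidden neurons.
v_o, v_d = [0.1, 0.2], [0.3, 0.4]
h = [0.0, 1.0, 0.0]                                   # selects the second row of R_ts
R_ts = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
W1 = [[0.0, 0.0] for _ in range(6)]                   # all-zero first layer: each hidden unit is 0.5
W2 = [1.0, 3.0]
prediction = f_nn(v_o, h, R_ts, v_d, W1, W2)
```

With an all-zero first layer every hidden activation equals Sigmoid(0) = 0.5, so the toy prediction is simply half the sum of the second-layer weights, which makes the shapes easy to check by hand.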

By considering both temporal and spatial features, i.e., Eqs. (1) and (4), the stations’ latent representation \({\textbf{V}}^{ts}\), and relation \({\textbf{R}}^{ts}\) can be learned by minimizing the following objective function:

$$\begin{aligned} \begin{aligned} Obj_{2} =&\min _{{\textbf{V}}^{ts}, {\textbf{R}}^{ts}, {\textbf{h}}, {\textbf{Q}} \ge 0} \sum \limits _{\forall \left( {\textbf{v}}_{o}, {\textbf{r}}_{od}, {\textbf{v}}_{d}\right) \in {\textbf{S}}} \left( S_{od}-f_{NN}({\textbf{x}})\right) ^{2}+ \left\| {\textbf{T}}-{\textbf{V}}^{t s} {\textbf{Q}}\right\| ^{2} \\&+\lambda \left[ \left\| {\textbf{V}}^{t s}\right\| ^{2}+\Vert {\textbf{h}}\Vert ^{2}+\left\| {\textbf{R}}^{t s}\right\| ^{2}+\Vert {\textbf{Q}}\Vert ^{2}\right] \end{aligned} \end{aligned}$$
(5)

where the stations’ latent spatiotemporal representation \({\textbf{V}}^{ts}\) is shared in the first and second terms and constrained by the first and second terms simultaneously. The parameter \(\lambda\) is the regularization rate. The last term is the regularization for penalizing the norm of \({{\textbf{V}}}^{ts}\), \({\textbf{R}}^{ts}\), \({\textbf{Q}}\), and \({\textbf{h}}\).
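To make the three terms of Eq. (5) concrete, the sketch below evaluates the objective for given parameter values. It assumes that \({\textbf{T}}\) is the temporal matrix from Eq. (1) stored as a list of rows, that squared Frobenius norms are intended, and that the penalty on \({\textbf{h}}\) is summed over all indexing vectors; the names `objective`, `predict`, and `h_vectors` are hypothetical, and `predict` stands in for \(f_{NN}\).

```python
def frob_sq(M):
    """Squared Frobenius norm of a matrix stored as a list of rows."""
    return sum(v * v for row in M for v in row)

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def objective(flows, predict, T, V_ts, Q, R_ts, h_vectors, lam):
    """flows: {(o, d): S_od} for nonzero OD entries; predict(o, d) plays the role of f_NN(x)."""
    spatial = sum((s - predict(o, d)) ** 2 for (o, d), s in flows.items())
    VQ = matmul(V_ts, Q)
    temporal = sum((T[i][j] - VQ[i][j]) ** 2
                   for i in range(len(T)) for j in range(len(T[0])))
    reg = frob_sq(V_ts) + frob_sq(h_vectors) + frob_sq(R_ts) + frob_sq(Q)
    return spatial + temporal + lam * reg

# Toy values chosen only so the result can be checked by hand.
value = objective(
    flows={(0, 1): 3.0},
    predict=lambda o, d: 2.0,      # constant stand-in for the network output
    T=[[1.0], [0.0]],
    V_ts=[[1.0], [0.0]],
    Q=[[1.0]],
    R_ts=[[2.0]],
    h_vectors=[[1.0]],
    lam=0.1,
)
```

Here the spatial term is \((3-2)^2=1\), the temporal term is 0 because \({\textbf{V}}^{ts}{\textbf{Q}}\) reproduces \({\textbf{T}}\) exactly, and the penalty is \(0.1\times (1+1+4+1)=0.7\), giving 1.7 in total.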

By solving Eq. (5), the hidden relations \({\textbf{R}}^{ts}\) are extracted. Compared to the rule-based relation extraction, the NN-based relation extraction can simultaneously learn the temporal and spatial features of stations, thus theoretically achieving a better performance. Mathematically, the two optimization problems in Eqs. (3) and (5) are both matrix decomposition problems, though with different objectives and decision variables. We develop an iterative relation extraction algorithm based on the gradient descent algorithm to solve these problems. To illustrate the algorithm, we use Eq. (5) as an example and present the algorithm pseudo-code in Algorithm 1. Each element in the OD flow matrix is iterated over once to extract relationships, resulting in a computational complexity of O(\(n^2\)), where n is the number of stations.

Algorithm 1: Relation extraction algorithm
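Because the pseudo-code is only available as a figure, the following is not a reproduction of Algorithm 1. It is a generic sketch of entry-wise gradient descent for a regularized factorization of the form \(\Vert {\textbf{T}}-{\textbf{V}}{\textbf{Q}}\Vert ^2+\lambda (\Vert {\textbf{V}}\Vert ^2+\Vert {\textbf{Q}}\Vert ^2)\), with a projection onto non-negative values and a stopping rule based on a convergence error. The function name `factorize`, the toy matrix, and the choice of projection are assumptions made for illustration.

```python
import random

def factorize(T, k, alpha=0.01, lam=0.01, eps=1e-4, max_iter=5000, seed=0):
    """Approximate T (n x c) by V (n x k) times Q (k x c) with non-negative entries."""
    rng = random.Random(seed)
    n, c = len(T), len(T[0])
    V = [[rng.random() for _ in range(k)] for _ in range(n)]
    Q = [[rng.random() for _ in range(c)] for _ in range(k)]

    def loss():
        err = sum((T[i][j] - sum(V[i][f] * Q[f][j] for f in range(k))) ** 2
                  for i in range(n) for j in range(c))
        reg = sum(v * v for row in V for v in row) + sum(q * q for row in Q for q in row)
        return err + lam * reg

    history = [loss()]
    for _ in range(max_iter):
        # One pass visits every matrix entry once.
        for i in range(n):
            for j in range(c):
                e = T[i][j] - sum(V[i][f] * Q[f][j] for f in range(k))
                for f in range(k):
                    v, q = V[i][f], Q[f][j]
                    # Gradient step, then projection onto the non-negative orthant.
                    V[i][f] = max(0.0, v + alpha * (2 * e * q - 2 * lam * v))
                    Q[f][j] = max(0.0, q + alpha * (2 * e * v - 2 * lam * q))
        history.append(loss())
        if abs(history[-2] - history[-1]) < eps:   # convergence error reached
            break
    return V, Q, history

# Toy 3 x 3 matrix (illustrative values only).
T = [[1.0, 0.8, 0.0], [0.9, 0.7, 0.1], [0.0, 0.1, 1.0]]
V, Q, history = factorize(T, k=2, alpha=0.05, lam=0.01, eps=1e-6)
```

Each pass touches every entry of the matrix once, which corresponds to the per-pass O(\(n^2\)) cost noted above when the matrix is the \(n\times n\) OD flow matrix.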

Case study

Experiment setting

To validate the model performance and explore the value of the knowledge graph in mobility, we design an experiment to predict trip destinations given information of trip origins, trip times, and historical trip activity distribution (from MKG) of individuals. Smart card data from 90 stations for the urban railway system in Hong Kong from January 1st to March 31st, 2018 are used. The smart card data contains both tap-in and tap-out stations and times. The MKG is trained using data from January 1st to February 28th, 2018. The trained MKG is used to generate individual trip activity distribution and constrain the list of candidate stations in predicting individual trip destinations in March 2018.

The latent variables \({\textbf{V}}^t\), \({\textbf{V}}^s\), \({\textbf{V}}^{ts}\), \({\textbf{R}}^{ts}\), \({\textbf{D}}\), \({\textbf{h}}\), and \({\textbf{Q}}\) in Eqs. (3) and (5) are estimated by Algorithm 1. The hyper-parameters were selected by grid search over the following ranges: the number of latent features \(k_t, k_s, k_{ts}, k_r \in \{5, 10, 15, 20\}\), the number of hidden neurons \(w \in \{10, 20, 30\}\), the learning rate \(\alpha \in \{0.1, 0.01, 0.001, 0.0001, 0.00001\}\), the regularization rate \(\lambda \in \{0.1, 0.01, 0.001\}\), and the convergence error \(\varepsilon \in \{0.1, 0.01, 0.001\}\). The optimal hyper-parameters were determined to be \(k_t = 5\), \(k_s = 5\), \(k_{ts} = 10\), \(k_r = 10\), \(w = 20\), \(\alpha = 10^{-5}\), \(\lambda = 0.01\), and \(\varepsilon = 0.01\). The MKG was trained on a standard laptop equipped with an AMD Ryzen 7 PRO 4750U processor with Radeon Graphics, clocked at 1.70 GHz, and 16 GB of RAM. The model and associated experiments were implemented in Python 2.7 on Microsoft Windows 10, using the PyCharm IDE version 2020.1 for development.
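The grid search over the ranges listed above can be sketched as follows. The source does not state whether all hyper-parameters were searched jointly or separately for Eqs. (3) and (5), so the full Cartesian product enumerated here is an assumption; `iter_configs`, `grid_search`, and the placeholder scoring function are hypothetical names, and a real score would be a validation metric of the trained MKG.

```python
from itertools import product

# Ranges as reported in the text.
grid = {
    "k_t": [5, 10, 15, 20],
    "k_s": [5, 10, 15, 20],
    "k_ts": [5, 10, 15, 20],
    "k_r": [5, 10, 15, 20],
    "w": [10, 20, 30],
    "alpha": [0.1, 0.01, 0.001, 0.0001, 0.00001],
    "lam": [0.1, 0.01, 0.001],
    "eps": [0.1, 0.01, 0.001],
}

def iter_configs(grid):
    """Yield one dict per combination of hyper-parameter values."""
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))

def grid_search(grid, score):
    """Return the configuration with the highest score(config)."""
    return max(iter_configs(grid), key=score)

n_configs = sum(1 for _ in iter_configs(grid))

# Placeholder score for demonstration only; it simply prefers w = 20.
best = grid_search(grid, lambda cfg: -abs(cfg["w"] - 20))
```

Searched jointly, these ranges give \(4^4\times 3\times 5\times 3\times 3=34{,}560\) combinations, which illustrates why the two extraction methods might in practice be tuned separately.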

In the rule-based relation extraction, after extracting each OD pair relation, the k-means clustering algorithm is used to cluster the relation types. The number of clusters in k-means is set as the number of relations in \({\textbf{R}}^{ts}\) extracted by the NN-based relation extraction method.

We use Accuracy@TopM as a performance measure, which is defined as follows:

$$\begin{aligned} \textit{Accuracy@TopM}=\frac{m}{N} \end{aligned}$$
(6)

where m is the number of trips whose real destination appears in the Top \(M\in {\mathbb {N}}\) prediction results and N is the total number of trips. For example, suppose there are 1000 trips in total and that, for each trip, the model returns a list of candidate destination stations \(\{ d_1,\ d_2,\ d_3,\ldots \}\) ranked by the corresponding likelihoods \(\{ l_1,\ l_2,\ l_3,\ldots \}\). For \(M = 1\), m is the number of trips whose top-1 candidate is the ‘real’ destination station. For \(M = 2\), if 800 trips have the ‘real’ destination station among their top 2 candidates, then Accuracy@Top2 \(=800/1000=0.8\).
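The measure in Eq. (6) can be computed directly from ranked candidate lists, as in the short sketch below. The function name `accuracy_at_top_m` and the five toy trips are illustrative assumptions.

```python
def accuracy_at_top_m(ranked_candidates, true_destinations, m):
    """Share of trips whose true destination is among the first m ranked candidates."""
    hits = sum(1 for candidates, truth in zip(ranked_candidates, true_destinations)
               if truth in candidates[:m])
    return hits / len(true_destinations)

# Five toy trips; each list is ordered from most to least likely destination.
ranked = [
    ["A", "B", "C"],
    ["B", "A", "C"],
    ["C", "A", "B"],
    ["A", "C", "B"],
    ["B", "C", "A"],
]
truth = ["A", "A", "B", "C", "D"]   # "D" never appears among the candidates

top1 = accuracy_at_top_m(ranked, truth, 1)
top2 = accuracy_at_top_m(ranked, truth, 2)
top3 = accuracy_at_top_m(ranked, truth, 3)
```

By construction the measure is non-decreasing in M, since the top-M candidate set always contains the top-(M-1) set.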

In the experiment, we test both morning peak and off-peak trip destination predictions by training the MKG and testing trips in the corresponding periods. We develop three destination prediction models for comparison.

  1. Stat_Group: a statistical model based on user groups. It predicts trip destination stations from a ranked list of candidate stations characterized by the OD transition probability of a group. User groups are clustered on travel features including home station, work station, average daily travel frequency, and the number of stations visited.

  2. MKG_Rule: it ranks candidate destination stations by their probability, computed as the probability of an activity multiplied by the conditional probability of the destination station given that activity. The activity distribution is generated from the MKG trained using the rule-based relation extraction method.

  3. MKG_NN: it uses the same destination prediction process as MKG_Rule but with the MKG trained using the NN-based relation extraction.

The objective is to utilize the trained MKG, taking into account the user’s departure time, origin station, and individual historical activities, to infer the user’s likely destination. Initially, based on the departure time, origin, and historical activities, we determine a set of candidate destinations and calculate their conditional probabilities based on these activities. Subsequently, the MKG assesses activity distributions linked to the origin station. This information is then employed to refine and update the probabilities of each candidate destination, effectively narrowing down the options and ranking them based on their likelihood. This process enhances the accuracy of predicting the user’s intended destination.
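The ranking step described above can be sketched as follows. The text specifies the score as the probability of an activity times the conditional probability of a destination given that activity; summing this product over activities is an assumption made here to obtain one score per destination, and the function name `rank_destinations`, the relation labels, and all probabilities are hypothetical.

```python
def rank_destinations(activity_probs, dest_given_activity, top_m=5):
    """Rank candidate destinations by sum over activities of P(activity) * P(destination | activity)."""
    scores = {}
    for activity, p_activity in activity_probs.items():
        for dest, p_dest in dest_given_activity.get(activity, {}).items():
            scores[dest] = scores.get(dest, 0.0) + p_activity * p_dest
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:top_m]

# Toy activity distribution for one origin station and departure time (from the MKG).
activity_probs = {"r1": 0.7, "r2": 0.3}
# Toy conditional destination probabilities from the individual's history.
dest_given_activity = {
    "r1": {"A": 0.6, "B": 0.4},
    "r2": {"B": 0.5, "C": 0.5},
}
ranking = rank_destinations(activity_probs, dest_given_activity)
```

In this toy case station B ranks first even though it is not the most likely destination under either activity alone, because it receives probability mass from both.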

To further compare the MKG model performance, we calculate an upper bound on the prediction performance, which predicts trip destination stations based on OD transition probabilities calibrated from the historical trips of individuals. All models above aim to capture station relations by reconstructing aggregated travel patterns. For individuals, their travel patterns (activities) can be obtained from their historical travel OD pairs. When predicting an individual’s trip destination, the best achievable performance is essentially given by the empirical distribution of that individual’s destinations. Thus, the individual OD transition probability is used as the upper bound on prediction performance.
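A minimal sketch of the individual OD transition probabilities used for the upper bound is given below, assuming they are simple relative frequencies of destinations per origin in one traveler's history. The function name `od_transition_probs` and the toy trips are illustrative.

```python
from collections import Counter, defaultdict

def od_transition_probs(trips):
    """trips: iterable of (origin, destination) pairs from one individual's history."""
    counts = defaultdict(Counter)
    for origin, dest in trips:
        counts[origin][dest] += 1
    return {
        origin: {dest: c / sum(dests.values()) for dest, c in dests.items()}
        for origin, dests in counts.items()
    }

# Toy history of a single traveler.
history_trips = [("A", "B"), ("A", "B"), ("A", "C"), ("B", "A")]
probs = od_transition_probs(history_trips)
```

Ranking the destinations of each origin by these frequencies yields the individual-level prediction against which the MKG-based models are compared.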

Results

Figure 6 shows the model comparisons with the number of relations \(R=30\) during morning peak and off-peak. In general, incorporating MKG significantly improves prediction performance over the Stat_Group model for both peak and off-peak periods. MKG_NN performance is close to the upper bound performance. This has an important implication for trip destination prediction (and also for destination inference) in public transport fare systems that record boarding only, without alighting. In addition, MKG_NN performs better than the MKG_Rule model since it captures the complex spatiotemporal information simultaneously. Compared to the morning peak, the prediction performance in off-peak is worse for all methods, since the off-peak activities are more diverse and challenging to predict. Stat_Group performs the worst because the candidate stations aggregate a large number of irrelevant stations (noisy data) visited by similar users.

Fig. 6 Performance comparison with varying Accuracy@TopM (M=1, 2, 3, 4, 5)

Currently, the primary method for trip destination inference in public transport is the trip chaining model (Liu et al. 2020; Lei et al. 2021; Wang et al. 2011; Jin et al. 2022; Sánchez-Martínez 2017). This approach infers the destination of a smart card transaction using rule-based methods with pre-defined or calibrated threshold values. Since automated fare collection systems record passengers’ boarding information without alighting information, data on the true destinations are unavailable for validation. Therefore, instead of direct validation, most researchers report an inference rate, i.e., the proportion of entry-only trips that meet the rules they established to infer destinations. In other words, they specify several matching rules, and if a candidate destination satisfies the rules, it is regarded as a potential destination. On the other hand, some studies have validated their methods using unique smart card datasets (including travelers’ boarding and alighting details), reporting accuracies between 60 and 75%, which is significantly lower than our prediction performance (Fig. 6a) in the morning peak (Trépanier et al. 2007; Alsger et al. 2016; Jung and Sohn 2017; Assemi et al. 2020). In addition, a few studies have validated their methods using survey data and achieved relatively good results (about 80%) (Barry et al. 2002; Dou et al. 2007; Munizaga et al. 2014). However, the sample sizes of such data sets are small and their results are difficult to generalize. Also, the trip-chaining model requires calibrating or defining a fair number of threshold parameters, which is usually difficult and time-consuming. Our MKG_NN-based approach achieves better performance than the benchmark models and comes close to the theoretical upper performance bound. Importantly, it is purely data-driven, which lends itself to practical generalization and implementation in different cities.

The number of relations impacts the MKG performance in predicting the trip destination. Taking the morning peak and \(M = 1\) as an example, we test different numbers of relations from 10 to 60 with a step of 10. Figure 7 shows the sensitivity analysis with respect to the number of relations in MKG (morning peak, Top1). Accuracy increases with the number of relations and stabilizes once the number of relations reaches 30.

Fig. 7 Performance under different numbers of relations during morning peak

Semantic relations mapping

Translating the latent relations uncovered by the Mobility Knowledge Graph (MKG) into semantic, human-understandable terms is crucial for the practical application of our findings. The inherent challenge lies in the unsupervised nature of MKG construction, which does not inherently provide semantic meaning to the relations it discovers. To tackle this, we have developed a qualitative approach for semantic mapping, as illustrated in Fig. 8.

Our approach begins by considering a set of predefined semantic relations, such as ‘work’, ‘school’, and ‘other’. These categories are informed by typical urban mobility patterns and are chosen for their relevance to the majority of commuter flows. The MKG, constructed using a neural network-based method, identifies numerical relations that are indicative of movement patterns between stations. These numerical relations are initially agnostic of semantic meaning and are denoted as \(r_1\), \(r_2\), and \(r_3\). Each relation is associated with a set of MKG edges, as depicted in Fig. 8(1).

To systematically map these numerical relations to our predefined semantic categories, we leverage auxiliary data sources that provide context about the main activities occurring between certain stations. For instance, household travel surveys can reveal that the predominant movement between stations A and C is for work purposes, while the flow between stations B and F is primarily for school commuting. This contextual information is applied to the corresponding edges within the MKG, as shown in Fig. 8(2).

The next step involves a systematic matching process. By comparing the patterns of the numerical MKG relations from Fig. 8(1) with the activity-based semantic relations from Fig. 8(2), we can deduce a mapping: \(r_1\) corresponds to ‘work’, \(r_2\) to ‘school’, and \(r_3\) to ‘other’. This mapping process is depicted in Fig. 8(3).

Finally, with the mappings established, we update the complete MKG to reflect these semantic relations, resulting in a graph that not only represents the flow of commuters but also the purpose of their journeys. This enriched MKG is presented in Fig. 8(4). This systematic approach to semantic mapping not only enhances the interpretability of the MKG but also its practical utility, as it allows stakeholders to understand and address the specific needs and behaviors associated with different types of commuter flows.
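One simple way to operationalize the matching step is a majority vote: each latent relation takes the trip purpose most often attached to its edges by the auxiliary data. The paper describes the matching qualitatively, so the voting rule, the function name `map_relations`, and the toy edges below are assumptions for illustration only.

```python
from collections import Counter, defaultdict

def map_relations(edge_relation, edge_purpose, default="other"):
    """Assign each latent relation the most frequent purpose among its labeled edges."""
    votes = defaultdict(Counter)
    for edge, relation in edge_relation.items():
        if edge in edge_purpose:                 # only edges covered by auxiliary data can vote
            votes[relation][edge_purpose[edge]] += 1
    mapping = {}
    for relation in sorted(set(edge_relation.values())):
        if votes[relation]:
            mapping[relation] = votes[relation].most_common(1)[0][0]
        else:
            mapping[relation] = default          # no labeled edge: fall back to the default label
    return mapping

# Latent relation attached to each MKG edge (toy values).
edge_relation = {("A", "C"): "r1", ("A", "D"): "r1", ("B", "F"): "r2", ("E", "F"): "r3"}
# Dominant trip purpose per edge from an auxiliary source such as a travel survey (toy values).
edge_purpose = {("A", "C"): "work", ("B", "F"): "school"}

mapping = map_relations(edge_relation, edge_purpose)
```

Once a relation is mapped, every edge carrying it inherits the label, so the unlabeled edge A to D becomes a ‘work’ edge in the enriched graph.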

Additionally, other types of data can be used to enrich the semantic relation mapping, for example the distribution of points of interest (POIs) around origin–destination stations and the popularity of these POIs during particular time periods, which can be sourced from platforms such as Google Maps.

Fig. 8 Numerical and semantical relations mapping

Discussion and conclusion

The knowledge graph is a graph-based knowledge representation and organization method, which is composed of nodes (entities) and edges (relations). It has been widely and successfully applied in many areas. However, the application of knowledge graphs in urban mobility is so far limited given the lack of direct knowledge recorded in the mobility data and time-dependent relationships. The paper introduces the concept of the mobility knowledge graph by reviewing existing KG studies and synthesizing KG definitions and the KG development pipeline. In addition, we explore MKG construction in public transport using smart card data and develop two generic decomposition approaches to extract hidden relations: a rule-based model and a neural network-based model.

The case study using smart card data in urban railways validates the model performance and illustrates the value of MKG in predicting an individual trip destination by comparing it with real observations and benchmark approaches. We find that the neural network-based MKG construction method achieves an accuracy level close to the ‘theoretical’ upper bound performance of the prediction task. This has important implications for public transport trip destination inference in fare systems with only boarding provisions.

In many cities, such as Stockholm, the smart card system in public transport only records the tap-in data without the tap-out data. Predicting individual trip destinations in such cities is challenging since the detailed individual trip trajectory data cannot be accessed to develop the MKG. In such a case, the mobility data from other sources in the same city (e.g., taxi trajectory data, geotagging check-ins data, etc.) can be used to infer the location-location relations for the MKG. The constructed MKG then would provide useful information to infer the individual trip destination in public transport, which needs further testing in future work.

People with the same travel activity and traveling from the same origin station at the same time would be more likely to visit the same destination. Therefore, the relations (activities) in MKG contribute to inferring the real attention (dependency) of users to stations/locations even if the user-station visiting data is not observed (missing or zero values). Unobserved visits are a common problem in public transport (i.e., no visit does not necessarily mean no attention to or interest in a location), and inferring this dependency contributes to a better understanding of human mobility (e.g., travel purpose) and facilitates downstream applications in transport, such as location recommendation. Moreover, the station-station relations in MKG are also significant for other application scenarios. For example, precision marketing can use the main travel activities of stations/locations to locate more target groups and place targeted advertisements.

Translating the numerical relations into semantic ones for practical use is challenging given the unsupervised nature of the approach. We discussed a qualitative approach in that direction. A future research direction is to develop a systematic approach to map the hidden relations to semantic meanings and validate these mappings using additional data sources, e.g., household travel surveys.