Spatiotemporal Data: Trajectories
Let p(l, t) be a spatiotemporal point with location l at time t. A trajectory is defined as $\tau = \langle p_1, p_2, \ldots, p_n \rangle$ where $p_i.t \le p_j.t$ if $i < j$. That is, a trajectory is a sequence of spatiotemporal points ordered by time.
Location l can be represented as a longitude and latitude pair in geographical space or as a road segment ID and distance offset in a road network. A trajectory without temporal information is often called a route or path, and a collection of trajectories of an object is called its trace. A trajectory with a specific origin and destination pair (OD pair) is also called a trip.
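The definitions above can be sketched as a small data structure. This is a minimal illustration, not a standard API: the names `Point`, `is_trajectory`, and `to_route` are hypothetical, and the location is assumed to be a longitude and latitude pair.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Point:
    """A spatiotemporal point p(l, t); field names are illustrative."""
    lon: float  # longitude in degrees
    lat: float  # latitude in degrees
    t: float    # timestamp, e.g. seconds since some epoch


def is_trajectory(points: List[Point]) -> bool:
    """Check the ordering condition: p_i.t <= p_j.t whenever i < j."""
    return all(a.t <= b.t for a, b in zip(points, points[1:]))


def to_route(points: List[Point]) -> List[Tuple[float, float]]:
    """Drop the temporal information, leaving a route (path)."""
    return [(p.lon, p.lat) for p in points]
```

Checking adjacent pairs is sufficient because the ordering relation on timestamps is transitive.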
A trajectory records how an object moved in a space. Such information is easier than ever to acquire thanks to the prevalence of location-capturing devices such as GPS receivers. Therefore, large volumes of trajectory data are being accumulated from various sources every day, for animals, humans, vehicles, and natural phenomena (Zheng 2015). Animal trajectory data can be obtained by attaching tracking devices to animals, for environmental protection and animal behavior studies. Movebank has collected animal movement data from thousands of studies at millions of locations. Human trajectories are collected from travelers, cyclists, and joggers, owing to the recent popularity of electronic fitness tracking devices and mobile devices. Transport-related trajectory data, which are by far the most voluminous, most interesting, and most useful type of trajectory data, are generated by GPS devices and fixed-location data-capturing devices from vehicles, airplanes, and ships. Taxi service providers like Uber and DiDi create terabytes of trajectory data every single day. Natural phenomena trajectory data are also collected for scientific studies. NOAA Air Resources Laboratory (Draxler and Rolph 2003) stores a massive amount of meteorological trajectories that can be used to better understand the causes and impacts of natural disasters and to protect the natural environment.
The works on trajectory data can be categorized into several topics (Zheng and Zhou 2011). The first one is trajectory preprocessing, including noise removal to improve data quality, map matching that aligns points to road segments for road network-constrained moving objects (such as cars), data compression, and trip segmentation that prepares data for further uses like clustering and classification. The second one is related to trajectory data management, which aims at answering retrieval queries efficiently by building indexes and developing query algorithms. The third one is trajectory mining, which involves finding the patterns among the trajectories, classifying them into different categories, detecting outliers, and reducing the uncertainty between two consecutive points. Finally, based on all the previous steps, trajectory data can be used to solve problems ranging from more conventional applications such as traffic condition prediction and route planning to the more recent applications such as fuel and pollution emission minimization in a city.
Key Research Findings
Essentially, a raw trajectory is an array of (l, t) records, which can be noisy and either too dense or too coarse in sampling rate, so it cannot be used directly in many applications. Therefore, as with any other type of raw data, preprocessing is needed before actual use.
Because of the limited accuracy of the capturing devices, the collected data are not always accurate, and some data points clearly drift off course. The simplest approach is to use the mean or median value of a sliding window to filter out a noise point. However, it fails when there are multiple consecutive noise points. More practical approaches apply outlier detection methods, such as computing the travel speed at each point and removing the points whose speed exceeds a threshold (Yuan et al. 2013).
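The speed-based filter can be sketched as follows. This is a minimal illustration under stated assumptions rather than the method of the cited work: samples are assumed to be (latitude, longitude, time in seconds) tuples, the first sample is assumed to be trustworthy, and the names `haversine_m` and `remove_speed_outliers` are hypothetical.

```python
import math
from typing import List, Tuple

Sample = Tuple[float, float, float]  # (lat_deg, lon_deg, t_seconds); assumed layout


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters on a sphere of radius 6,371 km."""
    r = 6371000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def remove_speed_outliers(traj: List[Sample], max_speed_mps: float) -> List[Sample]:
    """Drop samples that imply an implausible speed from the last kept sample."""
    if not traj:
        return []
    kept = [traj[0]]  # assumption: the first sample is not noise
    for lat, lon, t in traj[1:]:
        plat, plon, pt = kept[-1]
        dt = t - pt
        if dt <= 0:
            continue  # skip duplicated or out-of-order timestamps
        if haversine_m(plat, plon, lat, lon) / dt <= max_speed_mps:
            kept.append((lat, lon, t))
    return kept
```

Comparing each sample against the last kept sample, rather than its immediate predecessor, lets the filter skip a run of consecutive noise points, which is the case where a sliding-window mean or median breaks down.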
If there exists an underlying road network that confines object movement (e.g., for cars), it is usually beneficial to attach the GPS points to the corresponding roads. Based on the time when the matching is executed, map matching can be categorized into real-time mode and post-processing mode. Real-time map matching is widely applied in turn-by-turn navigation systems. It requires fast computation and can use only the previous few points (i.e., no future points), and it cannot guarantee the continuity of the matched path during the trip. Post-processing map matching can utilize the entire trajectory, so it is more accurate but more time-consuming.
The techniques used in map-matching methods can be divided into four groups. The first one mainly considers the geometric distance between a GPS point or trajectory segment and the candidate map points or map edges onto which it could be aligned. The second group also considers the topology of a map, such as the connectivity and contiguity of roads. The third group further improves the accuracy by using probability-based methods like the Hidden Markov Model and Kalman Filter. The fourth group combines the existing methods with additional information such as Wi-Fi, Bluetooth, and cellular fingerprints on mobile phones, driver behaviors, and other semantic information about the road network and the moving objects.
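The first, purely geometric group can be illustrated with a point-to-segment projection. This is a sketch only: it assumes planar coordinates (for example, meters in a projected frame), represents each road as one straight segment, and uses hypothetical names (`project_onto_segment`, `match_point`). It ignores topology and probability, so it shares the weaknesses of geometric matching near junctions and parallel roads.

```python
import math
from typing import Dict, Tuple

Point2D = Tuple[float, float]
Segment = Tuple[Point2D, Point2D]


def project_onto_segment(p: Point2D, a: Point2D, b: Point2D) -> Tuple[Point2D, float]:
    """Return the closest point on segment a-b to p, and its distance to p."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    len_sq = dx * dx + dy * dy
    if len_sq == 0.0:
        u = 0.0  # degenerate segment: a and b coincide
    else:
        # Clamp the projection parameter so the result stays on the segment.
        u = max(0.0, min(1.0, ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / len_sq))
    q = (a[0] + u * dx, a[1] + u * dy)
    return q, math.hypot(p[0] - q[0], p[1] - q[1])


def match_point(p: Point2D, roads: Dict[str, Segment]) -> Tuple[str, Point2D, float]:
    """Snap p to the geometrically nearest road segment (linear scan)."""
    best = None
    for road_id, (a, b) in roads.items():
        q, d = project_onto_segment(p, a, b)
        if best is None or d < best[2]:
            best = (road_id, q, d)
    if best is None:
        raise ValueError("road network is empty")
    return best
```

A practical matcher would replace the linear scan with a spatial index and would add the topological or probabilistic reasoning of the later groups.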
The amount of trajectory data grows at an accelerating pace, leading to enormous storage overhead as well as high computation and communication costs. Since few applications need trajectory data at full precision, compression is both acceptable and, in many cases, necessary.
A simple approach to reducing the size is to remove points whose removal does not affect the precision significantly. In this way, the new and more compact trajectory is an approximation of the original one. Another approach takes advantage of the road network if applicable. Data size can be reduced significantly by using consecutive matched road segments to represent points, because normally there are multiple GPS points along the same road segment. Further compression can be achieved with the help of frequent sequential pattern mining and Huffman Coding (Song et al. 2014), or by using other string compression methods like the Burrows-Wheeler Transform, because a series of roads can be viewed as a string with each road representing a character in the alphabet (Koide et al. 2017).
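One well-known way to remove points while bounding the approximation error is the Douglas-Peucker line simplification algorithm. The text does not name a specific algorithm, so this is a swapped-in example. The sketch assumes planar coordinates and ignores timestamps; `epsilon` is the maximum tolerated deviation.

```python
import math
from typing import List, Tuple

P = Tuple[float, float]


def _dist_to_segment(p: P, a: P, b: P) -> float:
    """Distance from p to the segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    len_sq = dx * dx + dy * dy
    if len_sq == 0.0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    u = max(0.0, min(1.0, ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / len_sq))
    return math.hypot(p[0] - (a[0] + u * dx), p[1] - (a[1] + u * dy))


def douglas_peucker(points: List[P], epsilon: float) -> List[P]:
    """Keep the endpoints, plus any point farther than epsilon from the chord."""
    if len(points) < 3:
        return list(points)
    a, b = points[0], points[-1]
    idx, dmax = 0, -1.0
    for i in range(1, len(points) - 1):
        d = _dist_to_segment(points[i], a, b)
        if d > dmax:
            idx, dmax = i, d
    if dmax <= epsilon:
        return [a, b]  # every interior point is close enough to the chord
    left = douglas_peucker(points[: idx + 1], epsilon)
    right = douglas_peucker(points[idx:], epsilon)
    return left[:-1] + right  # the split point appears in both halves; keep it once
```

For example, with `epsilon = 0.1` the point `(1, 0.05)` in `[(0, 0), (1, 0.05), (2, 0), (3, 2), (4, 0)]` is dropped, while the sharp detour through `(3, 2)` is preserved.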
In many high-level applications where trajectory data are used (such as traffic and traveler behavior analytics), shorter trajectory segments make more sense than the original long trajectory. Shorter trajectories not only support similarity-based analysis better but also improve computation efficiency (many trajectory similarity measures are of quadratic complexity). Further, segmentation based on OD pairs can also bring semantic information to trajectories. Trajectory segmentation (or trip segmentation) is the process that breaks a long trajectory into a series of short trajectories.
The segmentation processing has three main categories. Firstly, the trajectory can be segmented based on time interval. It is like resampling the original trajectory on a lower sampling rate. Secondly, it can be segmented based on the shape (Lee et al. 2007), which involves finding the turning points. Finally, semantic meaning of the points (like walk segment, driving segment, and segments between taxi waiting time) can also serve as segmentation points.
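The first category, segmentation by time interval, might look like the sketch below. It is one possible reading of the description: a new segment starts once a fixed window length has elapsed since the current segment began. Samples are assumed to be (x, y, t) tuples sorted by time, and `segment_by_time` is a hypothetical name.

```python
from typing import List, Tuple

Sample = Tuple[float, float, float]  # (x, y, t_seconds); assumed layout


def segment_by_time(traj: List[Sample], window_s: float) -> List[List[Sample]]:
    """Split a time-ordered trajectory into segments spanning less than window_s."""
    if window_s <= 0:
        raise ValueError("window_s must be positive")
    segments: List[List[Sample]] = []
    for sample in traj:
        # Start a new segment when the window since the segment's first sample is used up.
        if not segments or sample[2] - segments[-1][0][2] >= window_s:
            segments.append([sample])
        else:
            segments[-1].append(sample)
    return segments
```

Shape-based and semantic segmentation follow the same outline but replace the elapsed-time test with a turning-point or activity-change test.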
Querying and processing directly on a large volume of trajectories are actually very time-consuming. Therefore, how to organize and index the trajectory data to support trajectory query answering efficiently becomes a research topic, which is called trajectory management (Deng et al. 2011).
Based on the type of query entity (i.e., points, regions, and trajectories), trajectory queries can be classified into three types. P-Query asks for points which satisfy a given spatiotemporal relationship to specified trajectory segment(s) (e.g., top-k nearest neighbors) or vice versa. Similarly, R-Query asks for regions, and T-Query asks for trajectories. An example of a spatiotemporal relationship is “within 500 m of a gas station between 9:00pm and 9:30pm.”
Compared with other general data types, trajectory data have unique characteristics such as a long, continuous time span. Meanwhile, queries on trajectories often ask for information in a continuous time window. Based on these characteristics, three types of indexes have been proposed.
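The example relationship can be written as a predicate and evaluated by a linear scan, which is the unindexed baseline that trajectory indexes aim to beat. This sketch makes several assumptions: coordinates are planar and in meters, times are seconds since midnight, only sampled points are tested (movement between samples is ignored), and `t_query` is a hypothetical name.

```python
import math
from typing import Dict, List, Tuple

Sample = Tuple[float, float, float]  # (x_m, y_m, t_seconds); assumed layout


def t_query(
    trajs: Dict[str, List[Sample]],
    center: Tuple[float, float],
    radius_m: float,
    t_start: float,
    t_end: float,
) -> List[str]:
    """Return IDs of trajectories with a sample within radius_m of center during [t_start, t_end]."""
    cx, cy = center
    return [
        tid
        for tid, samples in trajs.items()
        if any(
            t_start <= t <= t_end and math.hypot(x - cx, y - cy) <= radius_m
            for x, y, t in samples
        )
    ]
```

With a gas station at the origin, the window between 9:00pm and 9:30pm corresponds to `t_start = 75600` and `t_end = 77400`.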
The first type augments the existing multidimensional index with a temporal dimension, like 3D R-Tree. The second type further breaks the temporal dimension down to multi-version structures, such as HR+-Tree and MV3R-Tree (Tao and Papadias 2001). The third type focuses on dividing the spatial dimension into grids and then building a separate temporal index on each grid. This category includes SETI and MTSB-Tree.
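The idea behind the third type can be sketched with a uniform grid whose cells each hold a time-sorted list. This is a deliberately simplified illustration of the design, not an implementation of SETI or MTSB-Tree: the per-cell temporal index here is just a sorted list searched by bisection, coordinates are assumed planar, and `GridTimeIndex` is a hypothetical name.

```python
import math
from bisect import bisect_left
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Entry = Tuple[float, str, float, float]  # (t, trajectory_id, x, y)


class GridTimeIndex:
    """Uniform spatial grid; each cell keeps its entries sorted by time."""

    def __init__(self, cell_size: float) -> None:
        self.cell_size = cell_size
        self._cells: Dict[Tuple[int, int], List[Entry]] = defaultdict(list)
        self._dirty = False

    def _cell(self, x: float, y: float) -> Tuple[int, int]:
        return (math.floor(x / self.cell_size), math.floor(y / self.cell_size))

    def insert(self, traj_id: str, x: float, y: float, t: float) -> None:
        self._cells[self._cell(x, y)].append((t, traj_id, x, y))
        self._dirty = True  # sort lazily before the next query

    def query(self, x_min: float, y_min: float, x_max: float, y_max: float,
              t_start: float, t_end: float) -> Set[str]:
        """IDs of trajectories with a sample in the rectangle during [t_start, t_end]."""
        if self._dirty:
            for entries in self._cells.values():
                entries.sort()
            self._dirty = False
        cx0, cy0 = self._cell(x_min, y_min)
        cx1, cy1 = self._cell(x_max, y_max)
        hits: Set[str] = set()
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                entries = self._cells.get((cx, cy), [])
                # (t_start,) sorts before any entry whose time equals t_start.
                i = bisect_left(entries, (t_start,))
                while i < len(entries) and entries[i][0] <= t_end:
                    _, tid, x, y = entries[i]
                    if x_min <= x <= x_max and y_min <= y <= y_max:
                        hits.add(tid)
                    i += 1
        return hits
```

Only the cells overlapping the query rectangle are visited, and within each cell the bisection skips all entries before the time window.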
As with other data types, a similarity (or distance) measure is needed to compare trajectories.
The simplest scenario is the distance from a point to a trajectory, which is measured by the distance to the nearest point in the trajectory. For the distance between a set of points and a trajectory, the individual point-to-trajectory distances are aggregated by summation so that closely matched pairs receive larger weights, while faraway pairs receive much lower weights, typically decaying exponentially with distance.
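These two measures can be sketched as follows. The exponential form is one common choice consistent with the description, not necessarily the exact formula intended by the text. Coordinates are assumed planar, and the function names are hypothetical.

```python
import math
from typing import List, Tuple

P = Tuple[float, float]


def point_to_trajectory(q: P, traj: List[P]) -> float:
    """Distance from q to the nearest sampled point of the trajectory."""
    return min(math.dist(q, p) for p in traj)


def points_to_trajectory_similarity(queries: List[P], traj: List[P]) -> float:
    """Sum of exp(-d) over query points: near matches count close to 1, far ones close to 0."""
    return sum(math.exp(-point_to_trajectory(q, traj)) for q in queries)
```

Because of the exponential decay, a trajectory that passes very close to a few query points scores higher than one that is moderately far from all of them.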
The similarity between two trajectories is usually measured by some kind of aggregation of distances between trajectory points (Wang et al. 2013). Along this line, several typical similarity functions for different applications include Closest Pair Distance, Sum-of-Pairs Distance, Dynamic Time Warping, Longest Common Subsequence, Edit Distance with Real Penalty, and Edit Distance on Real Sequences. It is worth noting that some of those similarity functions were originally proposed for time series data. But as trajectories can be regarded as a special kind of time series in a multidimensional space, these similarity functions can also be applied to trajectory data.
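As an illustration of one of the listed measures, Dynamic Time Warping can be computed with the standard dynamic program below. The sketch assumes planar points with Euclidean point-to-point cost and ignores timestamps. It runs in time proportional to the product of the two trajectory lengths, which is the quadratic cost typical of these measures.

```python
import math
from typing import List, Tuple

P = Tuple[float, float]


def dtw(a: List[P], b: List[P]) -> float:
    """Dynamic Time Warping distance between two non-empty point sequences."""
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both trajectories must be non-empty")
    inf = float("inf")
    prev = [inf] * (m + 1)
    prev[0] = 0.0  # empty prefix aligned with empty prefix
    for i in range(1, n + 1):
        cur = [inf] * (m + 1)
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])
            # Extend the cheapest of: advance a only, advance b only, advance both.
            cur[j] = cost + min(prev[j], cur[j - 1], prev[j - 1])
        prev = cur
    return prev[m]
```

Because a point may be matched to several points of the other sequence, two trajectories that follow the same path at different sampling rates can have a DTW distance of zero.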
Because trajectory data are always a sample of the object’s actual movement trace, uncertainty exists between any two points in a trajectory, especially when the sampling rate is low (Zheng et al. 2012). On the one hand, some works aim to reduce the uncertainty of the trajectory. On the other hand, other works try to add more uncertainty for privacy protection reasons.
The first group of studies focuses on providing conservative bounds for the positions of uncertain objects between two points, which is achieved by employing geometric cylinders or beads. Independent probability density functions can be used to model the uncertain positions (Cheng et al. 2004).
The other group of approaches aims at providing the top-k most likely routes between sample points with the help of a set of uncertain trajectories, because trajectories that share similar routes can often supplement each other to make themselves more complete (Su et al. 2013).
In contrast to these attempts, other techniques have been developed to prevent leakage of user privacy by blurring the published trajectory while preserving the utility of the data.
Trajectory Pattern Mining
Trajectory pattern mining aims at discovering trajectory groups based on their proximity in either a spatial or a spatiotemporal sense. There are four main categories of patterns that can be discovered from a single trajectory or a group of trajectories.
The first one is the moving together pattern, which discovers a group of objects that move together for a certain time period. A flock is a group of objects that travel together within a disk of some user-specified size for at least k consecutive timestamps. In practice, a fixed disk shape with a fixed size can be too strict and rigid, so the convoy pattern was proposed, which finds groups based on density. In this way, patterns of any shape and size can be discovered. However, both flocks and convoys require the timestamps to be consecutive, so swarm (Li et al. 2010a) was proposed to generalize further to clusters of objects that stay together for at least k timestamps, which need not be consecutive. To cope with stream data, traveling companion uses a data structure (called traveling buddy) to continuously find convoy-/swarm-like patterns from trajectories and can work online. By allowing the membership of a group to evolve gradually, gathering (Zheng et al. 2014) can be used to detect events and incidents.
The second one is trajectory clustering. Unlike general clustering tasks that use feature vectors to represent objects, it is hard to generate a uniform feature vector because different trajectories contain different and complex properties, such as length, shape, sampling rate, number of points, and their orders. A number of works therefore cluster trajectories using trajectory similarity. Although some of them work on the entire trajectory, it is rare for two objects to travel together for the entire journey. So more practical approaches partition trajectories into segments before clustering. If the trajectories are matched to a map, the trajectory clustering task can be done by applying graph clustering algorithms.
The third one is mining sequential patterns from trajectories. A sequential pattern means that a certain number of moving objects travel a common sequence of locations in a similar time interval; the locations of the sequence do not have to be consecutive. A general solution is to apply trajectory clustering first and then rewrite the trajectories as sequences of cluster IDs. In this way, existing sequential pattern mining algorithms like PrefixSpan can be used. If the trajectories can be matched to a map, a Suffix Tree can be built over the resulting sequences of road IDs to find the frequent patterns (Song et al. 2014).
The last one is mining periodical patterns from trajectories. Some object movements have periodical patterns over a long history. For example, people go to work in the morning and go back home at night. Animals migrate from one place to another at different times of the year. A straightforward approach is to use general frequent pattern mining methods. However, real-life periodic behaviors are complicated and involve multiple interleaving periods, partial time spans, and spatiotemporal noise and outliers. Therefore, a more advanced two-stage method has been proposed (Li et al. 2010b). In the first stage, it mines all the frequently visited places with density-based clustering algorithms. The temporal data corresponding to entering and leaving these places can be used to find the period values. In the second stage, larger periodic patterns are created by applying hierarchical clustering algorithms on the partial movement sequences.
Trajectory classification assigns trajectories or their segments to different statuses. For example, a taxi trajectory can be occupied, non-occupied, or parked. A cell phone user can be stationary, walking, or driving, or, at a finer granularity, driving, biking, commuting by bus, or walking. In general, the classification has three steps. First, trajectories are divided into segments in a preprocessing stage. After that, features of each segment are extracted. Finally, existing sequence inference models (such as Dynamic Bayesian Network, Hidden Markov Model, and Conditional Random Field) can be used.
Outliers in trajectory data can be points that differ significantly from others spatially or temporally, or observations that do not follow the expected patterns or constraints. One general approach is to leverage standard clustering or frequent pattern mining methods: if a trajectory does not fall into any cluster, it might be an outlier (Lee et al. 2007).
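For map-matched trajectories, the frequent road sequences can be illustrated with a simple counting sketch. This is not a suffix tree and not PrefixSpan: it counts only contiguous road sequences of one fixed length, which is the kind of pattern a suffix tree would surface more efficiently. The function name and the notion of support used here (number of trajectories containing the sequence) are assumptions for illustration.

```python
from collections import Counter
from typing import Dict, List, Tuple


def frequent_road_sequences(
    trajs: List[List[str]], length: int, min_support: int
) -> Dict[Tuple[str, ...], int]:
    """Contiguous road-ID sequences of the given length found in at least min_support trajectories."""
    support: Counter = Counter()
    for roads in trajs:
        # A set ensures each trajectory contributes at most once per sequence.
        seen = {tuple(roads[i : i + length]) for i in range(len(roads) - length + 1)}
        support.update(seen)
    return {seq: count for seq, count in support.items() if count >= min_support}
```

Handling non-consecutive locations, as the definition above allows, requires a gap-tolerant miner such as PrefixSpan rather than this contiguous count.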
Examples of Application
Trajectory data can be found in many applications; here we just list a few as examples.
Travel Recommendation. It aims to find interesting locations and travel sequences from trajectories generated by many people. Zheng and Xie (2011) identify staying points from users’ trajectories and cluster these points into locations of interest. After that, the top-k most interesting locations and the travel experts in a city can be identified, and recommendations can be made based on their data. Moreover, trajectories themselves can be recommended, because historical traveling experiences also reveal valuable information on how other people usually choose routes between locations.
Traffic Condition Estimation. The trajectories of vehicles on the road can reflect the traffic condition (Yang et al. 2013). Generating a speed profile from trajectories requires a series of processing steps: map matching, speed generation, missing value estimation, and compression. The result not only can be used for finding the fastest path from one place to another at different departure times but also can help find congestion in the road network and provide decision support for urban planning.
Map Inference. Normally, vehicle trajectories can be matched to some roads on a map. However, when a new road has been built and the map has not been updated, or when there is no map at all for the current region, map matching will fail. Map inference works under this scenario to infer new maps or update existing maps based on trajectories.
Diagnosing Traffic Anomalies. Examples of traffic anomalies include a taxi driver taking a malicious detour, an unexpected road change, and people traveling along a wrong path. They can all be discovered by trajectory outlier detection using trajectory clustering.
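Staying points are commonly detected with a distance threshold and a time threshold: a stay is a stretch of the trajectory that remains near an anchor point for long enough. The sketch below follows that common formulation and is not claimed to reproduce the cited work exactly. Coordinates are assumed planar, samples are (x, y, t) tuples sorted by time, and the names are hypothetical.

```python
import math
from typing import List, Tuple

Sample = Tuple[float, float, float]        # (x, y, t_seconds); assumed layout
Stay = Tuple[float, float, float, float]   # (centroid_x, centroid_y, arrive_t, leave_t)


def detect_stay_points(traj: List[Sample], dist_thresh: float, time_thresh: float) -> List[Stay]:
    """Find stretches that stay within dist_thresh of their first sample for at least time_thresh."""
    stays: List[Stay] = []
    i, n = 0, len(traj)
    while i < n:
        j = i + 1
        # Extend while samples remain within dist_thresh of the anchor sample i.
        while j < n and math.hypot(traj[j][0] - traj[i][0], traj[j][1] - traj[i][1]) <= dist_thresh:
            j += 1
        if traj[j - 1][2] - traj[i][2] >= time_thresh:
            group = traj[i:j]
            cx = sum(p[0] for p in group) / len(group)
            cy = sum(p[1] for p in group) / len(group)
            stays.append((cx, cy, traj[i][2], traj[j - 1][2]))
            i = j  # continue after the stay
        else:
            i += 1
    return stays
```

The resulting centroids are the points that would then be clustered into locations of interest.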
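The speed generation step can be sketched as an aggregation over map-matched observations. This is a minimal illustration that assumes map matching has already produced (road ID, time in seconds, speed in m/s) records and that a simple mean per road and time slot is an acceptable profile. Missing value estimation and compression are not covered, and `speed_profile` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, float, float]  # (road_id, t_seconds, speed_mps); assumed layout


def speed_profile(records: Iterable[Record], slot_s: float = 3600.0) -> Dict[Tuple[str, int], float]:
    """Mean observed speed per (road_id, time-slot index)."""
    sums: Dict[Tuple[str, int], List[float]] = defaultdict(lambda: [0.0, 0.0])
    for road_id, t, speed in records:
        key = (road_id, int(t // slot_s))
        sums[key][0] += speed
        sums[key][1] += 1.0
    return {key: total / count for key, (total, count) in sums.items()}
```

Slots with no observations are simply absent from the result, which is where the missing value estimation step would come in.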
Future Movement Prediction. Periodic trajectory pattern mining can be used to predict the next direction or destination of the current moving objects, like a group of animals or a commuter. The prediction can further help compress the trajectory data itself.
User Similarity Estimation. There are many aspects of similarity between users, such as social network connections, points of interest, check-in history, and any other logs that can be obtained. Besides these, trajectory data, which reveal the life patterns of the users through trajectory classification and clustering, can also help improve the accuracy of similarity comparison.
Sport Tactic Analysis. For many team games like soccer, basketball, hockey, and rugby, the players’ movements are essentially trajectories. By analyzing the video data from different cameras, the trajectory of each player can be reconstructed, and such trajectories are now used in tactic analysis by many professional clubs. Some of them even hire a data analyst as part of the coaching staff.
Airspace Monitoring and Aircraft Guiding. A large volume of aircraft trajectory data is managed by air traffic controllers. They use these trajectories to monitor the “health” of the airspace, which is implemented by trajectory clustering and outlier detection.
Scientific Study. Meteorologists use the trajectories of SO₂ and NOₓ in an isentropic and constant level to analyze the contributions of acidic deposition. Zoologists use the trajectory of animals to study their movement patterns. Biologists use proteomic trajectories to study mouse retina development.
Future Directions for Research
As summarized above, there exists a rich body of research on all aspects of trajectories, from data capturing, cleaning, compression, and indexing to processing. We have witnessed an accelerating trend of research activities from both academia and industry on this topic. There are three main drivers in today’s big data landscape, which will also drive the research in trajectory data management, processing, and analytics in the foreseeable future.
First, it is about volume. Not only can the volume of trajectory data now reach the TB level daily for some large navigation or taxi-/car-sharing companies, but the number of queries has also increased dramatically. What is the best way to process map matching for one billion points every day? How can we reduce the processing costs if we have 10,000 shortest path queries in the same region? This scenario already exists: whenever a user opens a map-based app and inputs a location, a shortest path query is issued. It is time to revisit every problem solved before, to see how the solutions can be made scalable using new computing platforms and how better algorithms can be designed to support batch query processing (for a very large number of streaming queries on streaming data).
Second, it is about semantics. After all, trajectory data are low-level data which can be noisy and highly redundant (among consecutive points of one moving object, among the history data of one moving object over a long period, and among objects with similar moving patterns). There is a dilemma between justifying the high costs of storing all available data and the fear of losing data that may prove useful in the future for purposes not yet known. Clearly, a new way of thinking is needed to manage trajectory data based on a semantic hierarchy, from raw data through calibrated data, detected events, summaries of data for a basic set of requirements, and discovered patterns to general statistics. Data can be gradually reduced over time and eventually removed. In such a way, trajectory data can be at the center of data integration and data analytics, as location and time are two ubiquitous dimensions for most information useful to us. Trajectory data will no longer be considered specialized data with limited applications; rather, they are an enabling data asset underpinning the future of a data-rich society.
Third, it is about privacy. Trajectory data can reveal a great deal about a person. With so much location and movement data about a person captured and accessed so easily, security and privacy become an extremely serious issue for trajectory data. Simple facts about a person visiting or not visiting a place can be highly sensitive. This problem can become much more significant when the trajectories are used with other data sources, including social network data. Some research has already started on this topic, but much more is urgently needed.
- Deng K, Xie K, Zheng K, Zhou X (2011) Trajectory indexing and retrieval. In: Computing with spatial trajectories. Springer, New York, pp 35–60
- Draxler RR, Rolph GD (2003) HYSPLIT (hybrid single-particle Lagrangian integrated trajectory). NOAA Air Resources Laboratory, Silver Spring, MD. Model access via NOAA ARL READY website
- Koide S, Tadokoro Y, Xiao C, Ishikawa Y (2017) CiNCT: compression and retrieval for massive vehicular trajectories via relative movement labeling. arXiv preprint arXiv:1706.02885
- Lee J-G, Han J, Whang K-Y (2007) Trajectory clustering: a partition-and-group framework. In: Proceedings of the 2007 ACM SIGMOD international conference on management of data. ACM, pp 593–604
- Li Z, Ding B, Han J, Kays R, Nye P (2010b) Mining periodic behaviors for moving objects. In: Proceedings of the 16th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 1099–1108
- Su H, Zheng K, Wang H, Huang J, Zhou X (2013) Calibrating trajectory data for similarity-based analysis. In: Proceedings of the 2013 ACM SIGMOD international conference on management of data. ACM, pp 833–844
- Tao Y, Papadias D (2001) The MV3R-tree: a spatio-temporal access method for timestamp and interval queries. In: Proceedings of very large data bases conference (VLDB), 11–14 Sept, Rome
- Wang H, Su H, Zheng K, Sadiq S, Zhou X (2013) An effectiveness study on trajectory similarity measures. In: Proceedings of the twenty-fourth Australasian database conference, vol 137. Australian Computer Society, Inc., pp 13–22
- Zheng Y (2015) Trajectory data mining: an overview. ACM Trans Intell Syst Technol (TIST) 6(3):29
- Zheng Y, Xie X (2011) Learning travel recommendations from user-generated GPS traces. ACM Trans Intell Syst Technol (TIST) 2(1):2
- Zheng K, Zheng Y, Xie X, Zhou X (2012) Reducing uncertainty of low-sampling-rate trajectories. In: 2012 IEEE 28th international conference on data engineering (ICDE). IEEE, pp 1144–1155