A range of transformations can be applied to movement data to analyze them in various ways and extract different kinds of information. To begin with, each recorded position is a spatial event specified by a reference to the moving object (id), a time stamp t, and coordinates x (longitude) and y (latitude). An event may also have further attributes, so a position event can be described by the tuple (id, t, x, y, attributes).
The events of moving objects being at specific spatial positions at particular times can be called position events to distinguish them from other kinds of spatial events. Integration of chronologically arranged position events of the same moving object produces a trajectory of this object (Fig. 40.9). Such integration allows computation of derived attributes based on the positions of consecutive points: displacement distance and direction, time difference, speed estimate, etc. These derived attributes can be used for extracting secondary events from trajectories (e.g., stops) and dividing trajectories into smaller subsets (e.g., trips between stops). We applied these transformations when investigating the data properties.
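The integration step and the derived attributes can be illustrated with a short Python sketch. It assumes planar coordinates in metres (longitude/latitude would first have to be projected) and time stamps in seconds. `PositionEvent`, `derived_attributes`, and `split_into_trips` are hypothetical names. The stop rule is a deliberately simple stand-in for proper stop detection: the object stays within `stop_radius` of where the stay began for at least `min_stop` seconds. The 15-minute default mirrors the stop threshold used later in this section.

```python
import math
from dataclasses import dataclass


@dataclass
class PositionEvent:
    obj_id: str
    t: float  # time stamp in seconds (assumed unit)
    x: float  # planar x in metres (assumption: lon/lat already projected)
    y: float  # planar y in metres


def derived_attributes(track):
    """Distance, direction, time difference and speed between consecutive events."""
    out = []
    for a, b in zip(track, track[1:]):
        dx, dy = b.x - a.x, b.y - a.y
        dt = b.t - a.t
        dist = math.hypot(dx, dy)
        out.append({
            "distance": dist,
            # compass-style bearing: 0 = north (+y), 90 = east (+x)
            "direction": math.degrees(math.atan2(dx, dy)) % 360,
            "dt": dt,
            "speed": dist / dt if dt > 0 else float("nan"),
        })
    return out


def split_into_trips(track, min_stop=15 * 60, stop_radius=50.0):
    """Split a chronologically ordered track at stops lasting >= min_stop seconds.

    Simplified stop rule (assumption): a run of events that all stay within
    stop_radius metres of the run's first event for at least min_stop seconds.
    """
    trips, current = [], []
    i, n = 0, len(track)
    while i < n:
        # extend a candidate stay that starts at event i
        j = i
        while j + 1 < n and math.hypot(track[j + 1].x - track[i].x,
                                       track[j + 1].y - track[i].y) <= stop_radius:
            j += 1
        if track[j].t - track[i].t >= min_stop:
            current.append(track[i])  # the trip ends where the stop begins
            if len(current) > 1:
                trips.append(current)
            current = [track[j]]      # the next trip starts where the stop ends
            i = j + 1
        else:
            current.append(track[i])
            i += 1
    if len(current) > 1:
        trips.append(current)
    return trips
```

Splitting at stops in this way yields trip trajectories from which speed-related attributes of the movements can be computed without long stationary periods mixed in.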
Both trajectories and events can be spatially aggregated by a set of places. As a result, the places are characterized by the visits of moving objects (e.g., counts of the objects and the visits, statistics of the duration of object presence in the area) or by the events that occurred in them (e.g., counts of events of different kinds). The aggregation can also be performed by time intervals, producing place-based time series of visits and presence. Additionally, trajectories can be aggregated according to the moves (transitions) between areas. The transitions link the areas, and these links can be characterized by the number and properties of the transitions, such as the number of distinct objects that moved and the statistics of their speeds and durations. Aggregated transitions between places are usually called flows. This aggregation can also be done by time intervals, resulting in link-based time series of flow characteristics.
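Place-based and link-based aggregation by time intervals can be sketched as follows. For illustration only, the places are hypothetical square grid cells (`cell_of`) rather than the Voronoi compartments used later. A transition is counted in the interval in which the object enters the new place. Both choices are assumptions of this sketch.

```python
from collections import defaultdict


def cell_of(x, y, size=1000.0):
    """Hypothetical place assignment: square grid cells of `size` metres."""
    return (int(x // size), int(y // size))


def aggregate(trajectories, interval=3600.0, place_of=cell_of):
    """trajectories: dict obj_id -> list of (t, x, y) sorted by t.

    Returns (presence, flows):
      presence[(place, k)] = set of distinct objects seen in place during interval k
      flows[(from_place, to_place, k)] = number of transitions entering to_place in interval k
    """
    presence = defaultdict(set)
    flows = defaultdict(int)
    for obj, points in trajectories.items():
        prev_place = None
        for t, x, y in points:
            k = int(t // interval)
            place = place_of(x, y)
            presence[(place, k)].add(obj)
            if prev_place is not None and place != prev_place:
                flows[(prev_place, place, k)] += 1
            prev_place = place
    return presence, flows
```

Taking `len()` of each presence set gives the counts of distinct objects, and reading the entries for one place (or one link) across all `k` gives its time series.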
Spatial time series can be viewed in two complementary ways. On the one hand, they consist of sequences of values associated with individual places or links, which can be called local time series. Accordingly, the places or links can be characterized and compared by the temporal variation of their values. On the other hand, for each time step, there is a particular distribution of the values over the set of places or links, which can be called a spatial situation. The whole spatial time series can then be seen as a sequence of such spatial situations, and the temporal variation of these situations can be studied and characterized.
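The two views amount to reading the same places-by-time-steps matrix by rows or by columns. The matrix and the function names below are hypothetical example material.

```python
# Hypothetical visit counts: 3 places (rows) x 4 hourly time steps (columns)
counts = [
    [5, 9, 4, 1],  # place A
    [2, 3, 8, 7],  # place B
    [0, 1, 1, 0],  # place C
]


def local_time_series(matrix, place):
    """Row view: temporal variation of the values at one place."""
    return list(matrix[place])


def spatial_situation(matrix, step):
    """Column view: distribution of the values over all places at one time step."""
    return [row[step] for row in matrix]


def peak_step(series):
    """Index of the time step with the highest value (first one on ties)."""
    return max(range(len(series)), key=series.__getitem__)
```

Places can then be compared through their rows (e.g., place A peaks at step 1, place B at step 2), while time steps can be compared through their columns.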
Further events (e.g., occurrences of extreme values) can be extracted from place- or link-based spatial time series.
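One possible rule for extracting such events, assumed here rather than taken from the chapter, flags every value that exceeds the mean of its local time series by more than `k` population standard deviations. Each flagged value becomes an event with a place, a time step, and a value; `extreme_events` is a hypothetical name.

```python
import statistics


def extreme_events(matrix, k=2.0):
    """Return (place, step, value) for values above mean + k * stdev of their row.

    matrix: one local time series per place (rows) over common time steps.
    """
    events = []
    for place, series in enumerate(matrix):
        mu = statistics.mean(series)
        sd = statistics.pstdev(series)
        for step, v in enumerate(series):
            if sd > 0 and v > mu + k * sd:
                events.append((place, step, v))
    return events
```

Because the threshold is computed per place, a moderate value can count as extreme at a usually quiet place while the same value would be ordinary at a busy one.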
Data transformations support the investigation of different aspects of mobility phenomena. As our goal is to characterize the urban context, we expect these transformations to let us enrich the context with different kinds of relevant information.
40.4.1 Context Acquisition from Movement Data
Traffic and mobility are important parts of the overall urban context. Information concerning movements of vehicles and people in an urban area may be relevant in studying various phenomena, such as air quality, noise, or disease spread, and events, such as traffic accidents, crimes, or disruptions in the work of public transport. Movement-related context information that can be extracted from trajectory data includes place visiting context, flow context, time context, trip context, and personalized semantic context. We consider a selection of the listed aspects in detail in the following sections.
40.4.1.1 Place Visiting Context
For describing the context in terms of place visits, it is necessary to have a suitable set of places. When there are no predefined places suiting the goals of an intended study, the places need to be appropriately defined. One possible way to do this is taking the neighborhoods of some positions of interest, e.g., circles of a chosen radius around the positions of studied events. Places relevant to transportation studies can be defined based on the street segments and intersections. However, the resulting level of detail and amount of data can be excessive for the envisaged spatial scale of the intended study. For studies of human mobility behaviors, places can be defined based on identifying areas of different kinds of human activities.
A set of places can also be derived by partitioning the territory into compartments based on the spatial distribution of some data, such as positions of stationary objects, events, or points from vehicle trajectories. Andrienko and Andrienko (2011) proposed dividing a territory based on the distribution of characteristic points of trajectories, which include the positions of stops and turns as well as trip starts and ends. The points are extracted from the trajectories and grouped according to their spatial locations. A special method for space-bounded point clustering produces spatial clusters whose radii do not exceed a given threshold. The medoids of the clusters (i.e., the points with the smallest mean distances to the other cluster members) are taken as generating seeds for a Voronoi tessellation. When the points are not evenly spread throughout the territory but form dense clusters, the seeds tend to be taken from these clusters, which makes the resulting places meaningful and interpretable. Depending on the chosen maximal radius of a point cluster, the territory is divided into larger or smaller compartments. Hence, an analyst can adjust the partitioning to the spatial scale of the intended analysis and the desired level of detail.
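The partitioning pipeline can be sketched in Python under strong simplifications. First, coordinates are planar. Second, clustering is a greedy single pass in which each point joins the nearest existing cluster whose seed point lies within `max_radius`; the published space-bounded clustering method is more elaborate. Third, medoids are the members with the smallest total distance to the other members. Finally, Voronoi cells are represented implicitly by assigning any location to its nearest seed. All function names are hypothetical.

```python
import math


def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])


def space_bounded_clusters(points, max_radius):
    """Greedy simplification: every member lies within max_radius of its
    cluster's first point (the cluster seed), which stays fixed."""
    seeds, members = [], []
    for p in points:
        best, best_d = None, max_radius
        for i, s in enumerate(seeds):
            d = dist(p, s)
            if d <= best_d:
                best, best_d = i, d
        if best is None:
            seeds.append(p)      # start a new cluster
            members.append([p])
        else:
            members[best].append(p)
    return members


def medoid(cluster):
    """Member with the smallest total (equivalently, mean) distance to the others."""
    return min(cluster, key=lambda p: sum(dist(p, q) for q in cluster))


def voronoi_place(point, seeds):
    """Index of the Voronoi cell containing `point`, i.e. of its nearest seed."""
    return min(range(len(seeds)), key=lambda i: dist(point, seeds[i]))
```

A larger `max_radius` generally lets more points share a cluster and therefore yields fewer, larger compartments, which is the scale control described above.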
An example of territory partitioning based on trajectory data is shown in Fig. 40.10. The characteristic points have been grouped in clusters with a maximal radius of 2.5 km, resulting in 3,535 places (compartments). The geometries and the spatial layout of the places reflect the topology of the major roads. This is an effect of taking the seeds for the tessellation from dense concentrations of trajectory points, which mainly occur along these roads. The places in Fig. 40.10 are colored according to the numbers of distinct cars that visited them. As mentioned earlier, other characteristics of places that can be derived from movement data include time series of place visits and their durations, as well as aggregate characteristics of the objects that visited the places.
In particular, our data allow us to characterize the places by the “population structure” of the cars that visited them. The data set includes the car manufacturer for each anonymized car identifier, so separate car counts can be obtained for different manufacturers. Using this information, we would like to cluster the places by the similarity of their car population structures. However, a straightforward application of clustering to the absolute counts merely separates the areas by total car counts, replicating the major patterns visible in Fig. 40.10. Therefore, the counts need to be normalized by the total number of distinct cars recorded in each compartment, which yields proportions.
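The normalization is simple arithmetic: each manufacturer's count in a compartment is divided by that compartment's total. The sketch assumes that each car has exactly one manufacturer, so the per-manufacturer counts add up to the number of distinct cars. `normalize_counts` and the counts in the check are hypothetical.

```python
def normalize_counts(counts_by_place):
    """counts_by_place: dict place -> dict manufacturer -> number of distinct cars.

    Returns the proportion of each manufacturer per place (values sum to 1).
    """
    result = {}
    for place, counts in counts_by_place.items():
        # equals the number of distinct cars if each car has one manufacturer (assumption)
        total = sum(counts.values())
        result[place] = {m: (c / total if total else 0.0) for m, c in counts.items()}
    return result
```

Two compartments with the same mix but a tenfold difference in traffic end up with identical proportion vectors, so clustering these vectors groups places by structure rather than by volume.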
We have clustered the normalized counts using the partition-based clustering method k-means in combination with a projection of the cluster centroids onto a plane, as suggested by Andrienko and Andrienko (2013b). The results are presented in Fig. 40.11. The positions of the cluster centroids on the projection plane (top left) are used for selecting appropriate clustering parameters and then for assigning colors to clusters reflecting their similarities and differences. The cluster profiles in terms of the proportions of the cars from different manufacturers are shown in a bar chart (top right) and on a map (bottom left).
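The sketch below illustrates the general idea of clustering plus similarity-based color assignment, not the specific tools used in the chapter. It implements plain k-means in standard-library Python. As a stand-in for the centroid projection, it then uses a FastMap-style two-dimensional projection, which keeps the distance between the two pivot centroids along the first axis. The mapping from projected positions to RGB colors is an assumed, simple 2D color scheme; any scheme in which nearby positions receive similar colors would serve the same purpose.

```python
import math
import random


def sq(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iters=100, seed=0):
    """Plain Lloyd's k-means with initial centroids sampled from the data."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq(v, centroids[c])) for v in vectors]
        new = []
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            # keep the old centroid if a cluster becomes empty
            new.append([sum(col) / len(members) for col in zip(*members)] if members else centroids[c])
        if new == centroids:
            break
        centroids = new
    return labels, centroids


def fastmap_2d(points):
    """FastMap-style projection to 2-D, using the exact farthest pair as pivots."""
    n = len(points)
    d2 = [[sq(p, q) for q in points] for p in points]
    coords = [[0.0, 0.0] for _ in range(n)]
    for axis in range(2):
        def r2(i, j):
            # squared distance left after removing the axes already computed
            done = sum((coords[i][ax] - coords[j][ax]) ** 2 for ax in range(axis))
            return max(d2[i][j] - done, 0.0)
        a, b = max(((i, j) for i in range(n) for j in range(n)), key=lambda ij: r2(*ij))
        dab = r2(a, b)
        if dab <= 1e-12:
            break  # nothing left to spread along this axis
        for i in range(n):
            coords[i][axis] = (r2(a, i) + dab - r2(b, i)) / (2 * math.sqrt(dab))
    return coords


def colors(coords):
    """Map 2-D positions to RGB so that centroids close in the plane get similar colors."""
    xs = [c[0] for c in coords]
    ys = [c[1] for c in coords]

    def scale(v, lo, hi):
        return (v - lo) / (hi - lo) if hi > lo else 0.5

    out = []
    for x, y in coords:
        u, w = scale(x, min(xs), max(xs)), scale(y, min(ys), max(ys))
        out.append((round(255 * u), round(255 * w), round(255 * (1 - u))))
    return out
```

In an interactive setting, one would rerun `kmeans` with different `k`, inspect how the projected centroids spread out, and keep a value for which the clusters are well separated, in the spirit of the parameter selection described above.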
The clustering results show that the main motorways are dominated by Vauxhall, Ford, and VW, while central London and Brighton are characterized by a mix of all manufacturers, with some prevalence of Vauxhall and Ford. One can also find compact “villages” in rural areas populated mostly by Fiat, Ford, SEAT, Peugeot, or VW cars.
Places can also be grouped according to the place-based time series of visits or counts of distinct cars, either in absolute or normalized form. We omit such analysis here due to space restrictions. However, we shall consider link-based time series in the next section.
40.4.2 Flow Context
While place-based time series characterize a territory in terms of the spatiotemporal variation of the presence of moving objects or events, link-based time series complement this characterization by describing the volumes and characteristics of movements (flows) between the places. In this section, we present an example of analyzing the flows between the same places as in Figs. 40.10 and 40.11. For the set of 3,535 places, we obtain 13,153 directed links when we use the original trajectories and 12,654 links when we use the trajectories corresponding to trips (obtained by dividing the original trajectories at stops lasting 15 min or more). The divided trajectories are more appropriate for characterizing movement speeds, because the undivided trajectories include long stops, which can distort the speed estimates.
Figure 40.12 presents a map where the links are represented by curved lines colored according to the average speeds during the transitions between the places. Similarly to Fig. 40.10, this map reflects the properties of the road network and the spatial distribution of the urban areas. Each pair of places is connected by two lines representing movements in opposite directions. For the majority of the place pairs, there is no substantial difference between the average speeds in the two directions. However, aggregates that reflect the temporal variation, such as the hourly flow volumes over the two weeks, may reveal asymmetry between the flows in opposite directions.
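Per-link average speeds and the comparison of opposite directions might be computed as below. Two aspects are assumptions. The input format is one record per transition with its distance in metres and duration in seconds. Link speed is taken as the plain mean of per-transition speeds. The function names and place labels are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def mean_link_speeds(transitions):
    """transitions: iterable of (from_place, to_place, distance_m, duration_s).

    Returns {(from_place, to_place): mean speed in km/h} for each directed link.
    """
    speeds = defaultdict(list)
    for a, b, dist_m, dur_s in transitions:
        if dur_s > 0:
            speeds[(a, b)].append(dist_m * 3.6 / dur_s)  # m/s -> km/h
    return {link: mean(v) for link, v in speeds.items()}


def direction_asymmetry(link_speeds):
    """Ratio of the faster to the slower direction for place pairs linked both ways
    (1.0 means equal mean speeds). Each unordered pair is reported once."""
    out = {}
    for (a, b), s in link_speeds.items():
        back = link_speeds.get((b, a))
        if back is not None and a < b and min(s, back) > 0:
            out[(a, b)] = max(s, back) / min(s, back)
    return out
```

Ratios close to 1.0 correspond to the common case noted above, where the two directions of a pair have similar average speeds.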
In Fig. 40.13, we have applied k-means clustering to the flow volumes normalized by each link's mean value, after excluding the links with very low flows (fewer than 50 moves in total during the 2-week period). As in the previous section (Fig. 40.11), the clustering parameters were selected by inspecting the positions of the cluster centroids in the projection space, and the projection was also used for assigning colors to the clusters. Clusters whose centroids are close in the projection space, owing to the similarity of their attribute values, receive similar colors. In the map in Fig. 40.13, cluster affiliation is consistent along chains of links following the major roads; hence, the traffic has common patterns along the major transportation corridors formed by the most important motorways. We can also notice pairs of opposite links that were put in distinct clusters, which means that the temporal patterns of the respective flows differ.
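The filtering and normalization applied before this clustering can be sketched as follows; `prepare_link_series`, the link names, and the counts are hypothetical. Dividing each series by its own mean makes a busy motorway link and a quiet local link with the same daily rhythm look alike, so the clustering groups links by temporal shape rather than by volume.

```python
def prepare_link_series(series_by_link, min_total=50):
    """series_by_link: dict link -> list of move counts per time step.

    Drops links with fewer than min_total moves in total, then divides each
    remaining series by its own mean value.
    """
    prepared = {}
    for link, series in series_by_link.items():
        total = sum(series)
        if total < min_total:
            continue  # very low flow: excluded from clustering
        m = total / len(series)
        prepared[link] = [v / m for v in series]
    return prepared
```

The resulting vectors can then be clustered, e.g., with k-means.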
40.4.3 Time Context
Mobility is essentially a temporal phenomenon; thus, the distribution of people and vehicles over a territory and their movements from place to place vary over time. As human activities are cyclic in general, we can expect temporal cycles to appear in aggregated representations of mobility, and we have observed them in the 2D histograms of the aggregated flows in Fig. 40.13.
As shown in Fig. 40.9, spatial time series can be viewed from two complementary perspectives: as spatially distributed local time series and as temporally varying spatial situations. Figure 40.13 corresponds to the former perspective: we applied cluster analysis to the local time series associated with the links. Now we are going to take the other perspective and apply clustering to the time steps of the time series. We cluster the time steps according to the similarity of the spatial distributions of the car presence (Figs. 40.14 and 40.15) and flow volumes (Figs. 40.16 and 40.17). The aggregates representing the presence have been obtained from the original (undivided) trajectories, to take stationary vehicles into account, and the link-based aggregates have been obtained from the divided trajectories representing the trips.
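Switching perspective only changes which vectors are clustered. Instead of one vector per link or place (its local time series), there is one vector per time step (the spatial situation). The sketch below builds these vectors from hypothetical place-based series and compares them with cosine similarity. Cosine similarity is an assumed choice for illustration; the vectors could equally be fed to k-means with Euclidean distance.

```python
import math


def spatial_situations(series_by_place, places):
    """Turn place-based series into one vector per time step (a spatial situation)."""
    n_steps = len(series_by_place[places[0]])
    return [[series_by_place[p][t] for p in places] for t in range(n_steps)]


def cosine(u, v):
    """Cosine similarity of two spatial situations (1.0 = same relative distribution)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


# Hypothetical presence counts at three places over four time steps
# (night, morning rush, midday, evening rush)
series = {"motorway": [2, 50, 20, 48], "suburb": [5, 30, 10, 28], "centre": [8, 20, 25, 22]}
situations = spatial_situations(series, ["motorway", "suburb", "centre"])
```

Here the two rush-hour situations are far more similar to each other than either is to the night situation, so a clustering of time steps would be expected to group them together.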
The calendar view in Fig. 40.14, left, shows the daily and weekly patterns of the spatial distribution of the car presence: the night hours are similar across the days, the morning and evening rush hours of the weekdays differ considerably from the midday times, and the weekend patterns are distinct from the weekday ones. Friday evenings differ from the other weekdays in that the evening- and night-specific distributions begin later.
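A calendar view of this kind places one cluster label per hourly time step in a grid of days by hours. A plain-text version might look as follows. The labels and the start date are hypothetical, and the weekday abbreviation produced by `strftime("%a")` depends on the active locale.

```python
import datetime


def calendar_grid(labels, hours_per_day=24):
    """Rows = days, columns = hours of the day; one cluster label per cell."""
    if len(labels) % hours_per_day:
        raise ValueError("labels must cover whole days")
    return [labels[d:d + hours_per_day] for d in range(0, len(labels), hours_per_day)]


def render(grid, first_day):
    """Plain-text calendar: weekday and date, then the hourly cluster labels."""
    lines = []
    for i, row in enumerate(grid):
        day = first_day + datetime.timedelta(days=i)
        lines.append(day.strftime("%a %Y-%m-%d") + "  " + " ".join(f"{c:>2}" for c in row))
    return "\n".join(lines)
```

Reading such a grid down a column compares the same hour across days (e.g., Friday evenings versus other evenings), and reading along a row shows how the spatial situation changes within a day.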
The small multiple maps in Fig. 40.15 show the spatial distribution of the mean presence volumes for each cluster. The maps are arranged by the clusters' numeric labels (from 1 to 12) in rows, from left to right and from top to bottom. We can observe very prominent road network patterns, especially during the mass commuting times (e.g., Clusters 6 and 10). These patterns do not appear in late evenings and at night (Clusters 9 and 12).
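Each small-multiple map displays the mean spatial situation of one cluster. Assuming that the situations are available as one vector per time step together with their cluster labels, the means could be computed as below. The function name and example values are hypothetical.

```python
def cluster_mean_situations(situations, labels):
    """Average the spatial situations (one vector per time step) within each cluster.

    Returns {label: mean vector}, ordered by label as the small multiples are.
    """
    sums, counts = {}, {}
    for vec, lab in zip(situations, labels):
        if lab not in sums:
            sums[lab] = [0.0] * len(vec)
            counts[lab] = 0
        sums[lab] = [s + v for s, v in zip(sums[lab], vec)]
        counts[lab] += 1
    return {lab: [s / counts[lab] for s in sums[lab]] for lab in sorted(sums)}
```

Each resulting vector holds one mean value per place, which is what a single map in the small-multiple display would encode.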
Figures 40.16 and 40.17 present the results of applying clustering to the time steps of the link-based time series, where the time steps have been clustered by the similarity of the spatial distributions of the flow volumes. Figure 40.16 is analogous to Fig. 40.14, and Fig. 40.17 corresponds to Fig. 40.15, except that the maps show the spatial distributions of the mean flow volumes of the clusters. The volumes are represented by proportional widths of the flow lines.
The afternoon Clusters 1, 4, and 9 are characterized by intensive traffic on highways, while the morning Clusters 6, 7, and 8 show higher traffic on local roads and in populated areas. Interestingly, the flow distribution patterns in hours 9–14 on weekdays are similar to those at night. Several clusters consist of only a few time steps, or even a single one, with extraordinary traffic distributions. For example, Cluster 5 has very high traffic on the inner ring of London.