In human movement analytics, a primary task is to compute movement parameters such as speed, acceleration, direction, and path tortuosity to characterize human movement in space and time (Schüssler and Axhausen 2008). In this section, we first use several computational experiments to demonstrate how movement parameters can be influenced by varying temporal scales of the GPS tracking data. This is followed by an investigation of the impact of temporal scale on the estimation of individual space usage computed using different approaches (e.g., minimum convex polygon, 95% Kernel Density Estimation home range, radius of gyration, and potential path area). The last focus of this section is to examine the scale impact on the analysis of human interactions between mobile individuals using a time-geographic approach.
Movement parameters
Two basic movement parameters, speed and path tortuosity, are used to demonstrate the impact of varying sampling intervals on computing movement parameters. Given the original sampling interval is 3 s per fix, the raw data are manually down-sampled to 9 s, 15 s, 30 s, 60 s (1 min), 180 s (3 min), 300 s (5 min), 600 s (10 min), 900 s (15 min), and 1800 s (30 min) by skipping multiple trajectory points (see the example in Fig. 3). Granted that it is less likely to use 10 min or above sampling intervals for human movement analysis, these coarse sampling intervals are retained in our experiments for observing more holistically the scale impact on movement analytics. In addition, in many applications, high-resolution GPS tracking data are not available and tracking is done via other stationary sensors at times when moving individuals interact with or pass by these sensors. These forms of commonly used movement data including call detail records, smart card data, geotagged social media check-in data, etc., are obtained in various resolutions, often coarser than 30 min or 1 h (Barbosa et al. 2018). Therefore, it is important to investigate the impact of such coarse temporal scales on movement analytics. Considering that the origin and destination of individual trips are places where activities happened, the original origin and destination locations of each trip are retained in the process of down-sampling (see the example in Fig. 3). It is important to note that when the sampling interval is increased to 1800 s, only origin and destination locations are left for 93.06% of individual trips as the durations of most trips are shorter than 1800 s. As shown in Fig. 4, the number of GPS fixes substantially decreases after down-sampling.
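The down-sampling procedure described above can be sketched as follows. This is a minimal illustration rather than the code used in the experiments: it assumes regularly spaced fixes stored as `(x, y, t)` tuples, and the function name `downsample` and its parameters are hypothetical.

```python
from typing import List, Tuple

Fix = Tuple[float, float, float]  # (x, y, t), with t in seconds


def downsample(track: List[Fix], base_interval: float,
               target_interval: float) -> List[Fix]:
    """Keep every k-th fix (k = target / base) of one trip.

    The origin (first fix) and destination (last fix) are always retained,
    mirroring the rule stated in the text. Assumes regular sampling.
    """
    if len(track) <= 2:
        return list(track)
    step = int(round(target_interval / base_interval))
    if step < 1:
        raise ValueError("target_interval must be >= base_interval")
    kept = track[::step]                 # index 0 (the origin) is always kept
    if (len(track) - 1) % step != 0:     # destination was skipped: re-attach it
        kept.append(track[-1])
    return kept


# A 30 s trip recorded at 3 s per fix (11 fixes in total).
trip = [(float(i), 0.0, 3.0 * i) for i in range(11)]
print(len(downsample(trip, 3, 9)))      # 5 fixes: t = 0, 9, 18, 27, 30
print(len(downsample(trip, 3, 1800)))   # 2 fixes: only origin and destination
```

The second call shows why a trip shorter than the target interval collapses to its origin and destination only.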
An illustration of the trajectory of the same individual at different sampling intervals is presented in Fig. 5. The original trajectory collected at 3 s (shown in dark red) represents the continuous movement of the individual as a smooth trajectory, while those down-sampled to 9 s, 15 s, and beyond exhibit a ‘jagged’ geometry. The finer resolution tracking data can delineate the shape of the road network (also see the trajectory at the scale \(s_{1}\) in Fig. 3). As shown in the two inset figures, the finer scale data capture turns at intersections and movement along a curved road well, while the coarser trajectories might overlook these details. To tackle this issue, a common practice is to snap individuals’ locations to road segments by map matching (Hashemi and Karimi 2014; Quddus et al. 2007). In this way, it becomes possible to compute network distance that incorporates the shape of the road network instead of taking the Euclidean distance between points of the raw trajectory. However, when adding map matching to preprocess trajectories, one should be careful about the temporal scale. As the sampling interval increases, the actual movement path can be distorted (see Figs. 3 and 5), which may cause erroneous results in the map matching process and in the computation of trajectory shape and movement distance. In our experiments, we do not apply map matching as the original data come in a very high resolution and represent the shape of the network with high precision.
Speed
Two types of speed are used here: the point-to-point speed along the trajectory and the average speed of an individual trip. A trajectory of a trip consisting of \(n + 1\) tracking points can be denoted as \(T = \left\{ {\left( {x_{0} ,y_{0} ,t_{0} } \right),{ }\left( {x_{1} ,y_{1} ,t_{1} } \right),{ } \ldots ,{ }\left( {x_{i} ,y_{i} ,t_{i} } \right),{ } \ldots ,\left( {x_{n} ,y_{n} ,t_{n} } \right)} \right\}\), where \((x_{i} ,y_{i} )\) represents the geographic coordinate and \(t_{i}\) denotes the recorded timestamp. The point-to-point speed \(v_{i,i + 1}\) can be calculated as follows using a pair of consecutive fixes \((x_{i} ,y_{i} ,t_{i} )\) and \((x_{i + 1} ,y_{i + 1} ,t_{i + 1} )\).
$$v_{i,i + 1} = \frac{{\sqrt {(x_{i + 1} - x_{i} )^{2} + (y_{i + 1} - y_{i} )^{2} } }}{{t_{i + 1} - t_{i} }} \quad \text{with} \; i \in \left[ {0,n - 1} \right]$$
(1)
The average speed of an individual trip \(\overline{v}\) can be defined as the total distance traveled between the origin \((x_{0} ,y_{0} ,t_{0} )\) and destination \(\left( {x_{n} ,y_{n} ,t_{{\text{n}}} } \right)\) divided by the elapsed time as shown below.
$$\overline{v} = \frac{{\mathop \sum \nolimits_{i = 0}^{n - 1} \sqrt {(x_{i + 1} - x_{i} )^{2} + (y_{i + 1} - y_{i} )^{2} } }}{{t_{n} - t_{0} }}$$
(2)
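Equations (1) and (2) can be illustrated with a short sketch. It assumes projected coordinates in meters and timestamps in seconds; the function names are hypothetical. The toy trip turns a corner, so that dropping the intermediate fix shortens the measured distance while the elapsed time stays the same.

```python
import math
from typing import List, Tuple

Fix = Tuple[float, float, float]  # (x, y, t): meters, meters, seconds


def point_to_point_speeds(track: List[Fix]) -> List[float]:
    """Eq. (1): Euclidean distance over elapsed time for consecutive fixes."""
    return [
        math.hypot(x1 - x0, y1 - y0) / (t1 - t0)
        for (x0, y0, t0), (x1, y1, t1) in zip(track, track[1:])
    ]


def average_trip_speed(track: List[Fix]) -> float:
    """Eq. (2): summed segment lengths divided by total elapsed time."""
    distance = sum(
        math.hypot(x1 - x0, y1 - y0)
        for (x0, y0, _), (x1, y1, _) in zip(track, track[1:])
    )
    return distance / (track[-1][2] - track[0][2])


# An L-shaped trip: 30 m east, then 40 m north, 20 s in total.
fine = [(0.0, 0.0, 0.0), (30.0, 0.0, 10.0), (30.0, 40.0, 20.0)]
coarse = [fine[0], fine[-1]]          # only origin and destination remain

print(average_trip_speed(fine))       # 3.5 m/s (70 m in 20 s)
print(average_trip_speed(coarse))     # 2.5 m/s (50 m in 20 s)
```

The coarse track cuts the corner, which is the mechanism behind the lower speeds at larger sampling intervals.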
The goal of this experiment is twofold: (1) to reveal the impact of sampling intervals on computing both the point-to-point speed and the average speed of individual trip, and (2) to understand how that impact differs for different transport modes.
Point-to-point speed along the path Figure 6 illustrates the impact of varying temporal scales on overall point-to-point speeds, disaggregated by different transport modes. The boxplots present the variation of point-to-point speed in our sample at different sampling intervals. Each boxplot displays a median value, a box enclosing the twenty-fifth to seventy-fifth percentiles, and an upper whisker and a lower whisker representing the maximum and minimum values which are less than a distance of 1.5 times the interquartile range from the upper quartile and the lower quartile, respectively (Tukey 1977). In addition, the mean value is also displayed using a triangle mark. Outliers (values beyond the whiskers) are excluded to improve readability. The results visualized in Fig. 6 suggest that as the sampling interval increases, the median and mean of point-to-point speed tend to decrease, which is consistent with existing research (Laube et al. 2007; Laube and Purves 2011). This is reasonable because the actual point-to-point distance traveled is underestimated as the sampling interval increases. As the example in Fig. 5 shows, when comparing the original trajectory collected at 3 s with the trajectory down-sampled to 30 s or above, it is obvious that while the travel time remains the same between the same pair of tracking points at different sampling intervals, the distance tends to shrink due to the aggregation, cutting network corners, and elimination of network details, which results in reducing point-to-point speed. Except for the walk mode which presents a stable variance in speed across different sampling rates, the variance in point-to-point speeds of other modes decreases as the temporal scale increases. In general, except for the transit bus mode, the mean and median values of point-to-point speed are very close. Presumably, transit buses usually make more stops than other modes which causes more low values of point-to-point speed compared to other modes.
Considering that the distribution of speed values is positively skewed, a nonparametric test, the Kruskal–Wallis test (Kruskal and Wallis 1952), is applied to examine whether there is a significant difference among the mean values of speed at the ten different sampling intervals. The results indicate that there is a significant difference (p < 0.01) among the ten groups of speed values at different sampling intervals, both for the overall sample and for the subsets by the five categories of transport modes. Next, Dunn’s test (Dunn 1964) is applied to determine which specific means are significantly different from the others. The results of Dunn’s test on overall point-to-point speed indicate that all pairs of sampling intervals (i.e., speed of the same mode computed at two different temporal scales) are significantly different from each other (all pairwise p < 0.05). Similarly, the subsets of auto and walk modes present the same relationship (all pairwise p < 0.05). The mean point-to-point speed of auto mode drops significantly from 59.12 to 29.84 km/h when the sampling interval increases from 3 to 1800 s (30 min). For walking, it drops significantly from 5 to 2.88 km/h. For bike mode, only the groups of 600 s and 900 s are found not significantly different (p = 0.07). The average speed of bike trips drops significantly from 17.94 to 10.74 km/h as the sampling rate becomes coarser. That is to say, increasing the sampling interval (even just from 3 to 9 s) significantly underestimates the point-to-point speed when people travel by auto, walk, and bike. However, if we take a closer look at the drops in mean speed of auto mode, for instance, the difference in mean speed between 3 and 15 s is only 1.93 km/h, which should not be considered a large difference in the transportation context even though it is statistically significant.
In terms of transit bus and rail transport modes, the significance test results suggest that the mean speeds of many pairs of sampling intervals are not significantly different. Regarding transit bus, we find no significant difference between 180 and 300 s (p = 0.18), 300 s and 600 s (p = 0.35), 300 s and 900 s (p = 0.09), 600 s and 900 s (p = 0.46), 600 s and 1800 s (p = 0.23), and 900 s and 1800 s (p = 0.62). For rail transport, only the groups of 600 s and 900 s (p = 0.08) and 900 s and 1800 s (p = 0.06) are found not significantly different. The average speed drops from 64.76 to 59 km/h when the sampling interval increases from 3 to 60 s. Although these two mean speed values are significantly different, the absolute difference is only 5.76 km/h, which should not be considered a large difference because rail transport usually travels at a constant speed. We will further discuss these findings in Sect. 5 to elaborate on their practical implications.
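For readers unfamiliar with the Kruskal–Wallis test used above, the following sketch computes its H statistic from ranks, with the standard correction for ties. It is a from-scratch illustration with a hypothetical function name and toy data, not the analysis code of this study; it does not compute p-values or the Dunn post hoc comparisons, for which a statistics library (e.g., `scipy.stats.kruskal` for the omnibus test) would normally be used.

```python
from itertools import groupby
from typing import List, Sequence


def kruskal_wallis_h(groups: Sequence[Sequence[float]]) -> float:
    """Tie-corrected Kruskal-Wallis H statistic for k independent groups."""
    # Pool all observations, remembering the group each one came from.
    pooled = sorted((v, g) for g, values in enumerate(groups) for v in values)
    n_total = len(pooled)
    rank_sums: List[float] = [0.0] * len(groups)
    tie_term = 0.0
    seen = 0  # number of observations ranked so far
    for _, block in groupby(pooled, key=lambda pair: pair[0]):
        members = list(block)
        t = len(members)
        avg_rank = seen + (t + 1) / 2.0   # mean of ranks seen+1 .. seen+t
        for _, g in members:
            rank_sums[g] += avg_rank
        tie_term += t ** 3 - t
        seen += t
    h = (12.0 / (n_total * (n_total + 1))
         * sum(r * r / len(g) for r, g in zip(rank_sums, groups))
         - 3.0 * (n_total + 1))
    correction = 1.0 - tie_term / (n_total ** 3 - n_total)
    if correction == 0.0:
        raise ValueError("all observations are identical")
    return h / correction


# Toy speeds (km/h) at three sampling intervals; values are made up.
speeds = [[7.0, 8.0, 9.0], [4.0, 5.0, 6.0], [1.0, 2.0, 3.0]]
print(round(kruskal_wallis_h(speeds), 3))   # 7.2
```

Under the null hypothesis, H approximately follows a chi-square distribution with k − 1 degrees of freedom, so the toy value of 7.2 exceeds the 0.05 critical value of 5.991 for two degrees of freedom.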
Average trip speed Figure 7 illustrates the impact of varying sampling intervals of GPS tracking data on the average speed of individual trips, both overall and disaggregated by transport mode. Similar to Fig. 6, the results suggest that as the sampling interval increases, the overall median and mean speed values decrease. However, the distribution of the average speed of individual trips across the varying temporal scales is more stable than that of the point-to-point speed shown in Fig. 6. Especially for rail transport mode, the variance is very stable and even the mean and median do not change much as the sampling interval increases from 3 to 1800 s. This is reasonable because rail transport usually travels at a relatively constant speed and runs on a straighter path (designated tracks) compared to the other modes. Hence, the average speed of individual trips by rail transport is not very sensitive to the temporal scale. The results also suggest that at higher sampling intervals, the average speed of individual trips by auto, transit bus, walk, and bike is more underestimated. This is reasonable because as the sampling interval increases, many details of movement paths are neglected, which systematically underestimates the actual trip distance traveled and thus results in a lower average speed of individual trips (see the example of Fig. 5). Similar to point-to-point speed for the transit mode in Fig. 6, the mean values of average trip speed for transit bus are substantially greater than the median values across different sampling rates.
The Kruskal–Wallis test results indicate significant difference among different sampling intervals for the whole sample (p < 0.01) as well as for the five subsets grouped by different transport modes (all p-values are below 0.01). We next apply Dunn’s test to identify which specific means are significantly different from the others at various time scales. Considering the whole data set, the Dunn’s test results indicate that the ten groups of different sampling intervals are significantly different from each other (all p-values are below 0.01). The Dunn’s test applied to the subsets of auto and walk modes suggests that the mean values of speed of individual trips at the ten sampling intervals are significantly different from each other (all p-values are below 0.01). These observations are consistent with the case of point-to-point speed by auto and walk modes. However, for transit bus mode, there is no significant difference (p > 0.05) between the groups of 3 s and 9 s, 9 s and 30 s, 30 s and 60 s, 300 s and 600 s, and 600 s and 1800 s. For rail transport, we find no significant difference (p > 0.05) between the group of 3 s and the other three groups of 9 s, 30 s, and 60 s, respectively. In addition, there is no significant difference (p > 0.05) between the groups of 9 s and 30 s as well as 60 s, and between the groups of 30 s and 60 s, 300 s and 600 s, 600 s and 1800 s. In terms of bike mode, we only find no significant difference between the groups of 3 s and 9 s and the other pairs of groups are significantly different from each other at 0.05 significance level.
Path tortuosity
The second movement parameter of interest is path tortuosity. Various measures have been developed to estimate the tortuosity of the movement path of a moving object using trajectory data. Examples include, but are not limited to, fractal dimension (Bovet and Benhamou 1988; Turchin 1998; Falconer 2004), sinuosity (Bovet and Benhamou 1988; Benhamou 2004), straightness index (Batschelet 1981), and angular variance of turning angles (Estevez and Christman 2006). Considering similarities across these measures, here we only consider the straightness index (SI) to illustrate the influence of varying temporal scales on computing path tortuosity. The SI of a trip (i.e., global path tortuosity) is defined as the ratio of the straight-line distance between origin and destination to the actual total distance traveled along the movement path, as shown formally in Eq. (3).
$$SI = \frac{{\sqrt {(x_{n} - x_{0} )^{2} + (y_{n} - y_{0} )^{2} } }}{{\mathop \sum \nolimits_{i = 0}^{n - 1} \sqrt {(x_{i + 1} - x_{i} )^{2} + (y_{i + 1} - y_{i} )^{2} } }} \quad \text{with} \; SI \in \left[ {0,1} \right]$$
(3)
where \((x_{0} ,y_{0} )\) and \((x_{n} ,y_{n} )\) are the coordinates of the origin (first point) and destination (last point) of a trip, respectively. The SI value is 0 when the moving object returns to the origin location of a trip and is 1 when the movement path of a trip is completely straight.
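A minimal sketch of Eq. (3) is given below, assuming planar coordinates in meters; the function name is hypothetical. The same L-shaped toy path is evaluated before and after removing its intermediate fix to show how coarser sampling pushes the SI toward 1.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def straightness_index(path: Sequence[Point]) -> float:
    """Eq. (3): origin-destination distance over total path length."""
    length = sum(
        math.hypot(x1 - x0, y1 - y0)
        for (x0, y0), (x1, y1) in zip(path, path[1:])
    )
    if length == 0.0:
        raise ValueError("SI is undefined for a path of zero length")
    (x_first, y_first), (x_last, y_last) = path[0], path[-1]
    return math.hypot(x_last - x_first, y_last - y_first) / length


fine = [(0.0, 0.0), (30.0, 0.0), (30.0, 40.0)]   # 30 m east, 40 m north
coarse = [fine[0], fine[-1]]                      # turn no longer observed

print(round(straightness_index(fine), 3))    # 0.714 (50 m / 70 m)
print(straightness_index(coarse))            # 1.0
```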
SI can also be computed over a sliding window \(k\). This is used as the local measure of path tortuosity of a portion of an individual trip as defined below (Dodge et al. 2009).
$$SI^{\prime}\left( {p,k} \right) = \frac{{\sqrt {(x_{p + k} - x_{p - k} )^{2} + (y_{p + k} - y_{p - k} )^{2} } }}{{\mathop \sum \nolimits_{i = p - k}^{p + k - 1} \sqrt {(x_{i + 1} - x_{i} )^{2} + (y_{i + 1} - y_{i} )^{2} } }} \quad \text{with} \; SI^{\prime}\left( {p,k} \right) \in \left[ {0,1} \right]$$
(4)
where \(SI^{\prime}\left( {p,k} \right)\) is the local SI of the \(p\)th point of an individual trajectory given a sliding window with a width \(k\). We consider a sliding window of 5 points in the following experiment, which corresponds to \(k = 2\) in Eq. (4) (the point \(p\) plus two fixes on either side). The narrower the sliding window, the noisier the results. For coarsely sampled data such as 900 s and 1800 s, it is very likely that many individual trips contain fewer than 5 tracking points. In such cases, the local measure of path tortuosity will be invalid.
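The local measure in Eq. (4) can be sketched as follows. The sketch reads the 5-point window as \(k = 2\), which is our interpretation of the equation's index range; the function name is hypothetical, and `None` is returned where the window does not fit inside the trip.

```python
import math
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]


def local_straightness(path: Sequence[Point], p: int, k: int = 2) -> Optional[float]:
    """Eq. (4): SI over the window of fixes p-k .. p+k (2k + 1 points).

    Returns None when the window extends beyond the trip, e.g. for trips
    with fewer than 2k + 1 fixes, or when the window has zero length.
    """
    if p - k < 0 or p + k > len(path) - 1:
        return None
    window = path[p - k:p + k + 1]
    length = sum(
        math.hypot(x1 - x0, y1 - y0)
        for (x0, y0), (x1, y1) in zip(window, window[1:])
    )
    if length == 0.0:
        return None
    chord = math.hypot(window[-1][0] - window[0][0],
                       window[-1][1] - window[0][1])
    return chord / length


turn = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (2.0, 1.0), (2.0, 2.0)]
print(round(local_straightness(turn, p=2), 4))   # 0.7071
print(local_straightness(turn[:3], p=1))         # None: fewer than 5 fixes
```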
Global path tortuosity Figure 8 illustrates the influence of varying temporal scales on the values of global SI. It shows that as the sampling interval increases, the global SI is increasingly overestimated and approaches 1, which represents a completely straight movement path. This is reasonable because increasing the sampling interval neglects the details and complexity of the movement path, which can result in a straighter path compared to the actual path delineated by fine-grained data (see the example of Fig. 5). It is worth noting that when the sampling interval increases to 1800 s, only the origin and destination will be left for the majority of individual trips (93.06%). Therefore, the movement path is completely straight (SI = 1) at the sampling interval of 1800 s for the majority of trips. On the other hand, the mean value is consistently below the median for every mode across different temporal scales. The skewness is negative for every group of the data, indicating a negatively skewed distribution.
For relatively high-speed movement such as auto and transit bus modes, the results suggest that the movement path becomes straighter quickly along with the increasing sampling intervals. This is reasonable because a vehicle moving on road networks usually needs to make turns to reach the destination, and increasing sampling intervals will be very likely to ignore these turns which can result in a straighter movement path as compared to the actual traveled path. The distribution of the global SI values does not vary much as the sampling interval increases from 3 to 60 s for rail transport compared to the other modes. This observation indicates that the path tortuosity of rail transport is less impacted by varying scales, which is expected because railways run on fixed and relatively straight tracks with a constant speed as we observed above. For lower speed movement such as walking and biking, we find the movement path becomes straighter quickly along with the increasing sampling intervals. This is reasonable because the trip length of biking/walking usually is shorter than vehicle-based trips and pedestrians and bicyclists are less constrained by road networks compared to vehicles (i.e., their movement is more flexible in terms of direction). Increasing sampling intervals is very likely to ignore wandering and turning for pedestrians and bicyclists. These findings are important for map matching applications which is the process of matching GPS tracking points to a real road network (segments). At intersections, if turns are missed due to a coarser temporal scale then the map matching can result in more erroneous matches. This can alter both movement speed and travel path.
The Kruskal–Wallis test results show significant differences in the mean values of the global SI at different sampling intervals (all p-values are below 0.01). The Dunn’s test applied to the subsets of auto and railway transport modes suggests that the mean values of the global SI are only not significantly different between the groups of 3 s and 9 s, 9 s and 15 s, 15 s and 30 s, 30 s and 60 s (p > 0.05). For transit bus, only the groups of 9 s and 15 s are found not significant (p > 0.05). For bike mode, no significant difference (p > 0.05) is found between the groups of 3 s and 9 s, 9 s and 15 s. For walk mode, all groups are found significantly different from each other (p < 0.05).
Local path tortuosity Figure 9 illustrates the influence of varying temporal scales on the values of local SI over a sliding window of five tracking points. Individual trips that contain fewer than five tracking points are excluded in the results here. Generally, the values of local SI decrease as the sampling interval increases. This is as expected because when the temporal resolution is high, the movement over the sliding window of five tracking points only lasts for a few seconds (e.g., 12 s at 3 s temporal scale) and thus the path tends to be straight and smooth. However, at higher sampling intervals, the sliding window captures movement over a longer period of time, which can result in less straight movement path. It is important to note that the local SI is very similar to the global SI when the sampling interval reaches 300 s and above since the duration of 86.57% of trips is shorter than 25 min.
Regarding different travel modes, the results of auto and rail transport show that the local SI stays at a very high level close to one (meaning that the path is almost completely straight) at various sampling rates. This observation suggests that the movement by auto and rail transport over a segment of individual trips tends to be straight. This reflects the shape of roadway segments and especially railways that are very straight. Also, for coarser tracks, only the origin and the destination, and in some cases, a few more tracking points, remain and therefore, the local and global SI become the same. However, it is surprising that even when the sampling rate increases to 60 s per fix and above for auto trips, the local path tortuosity is still near 1. Presumably, the majority of drivers follow the shortest path or a straight highway when heading to the destination. In respect of rail transport, the local SI deviates slightly from 1 when sampling interval is increased to 30 s and above. This is because rail transport movement can be straight over a sliding window of five tracking points even when the sampling interval is large. The local path tortuosity of transit bus, walking, and biking is affected by temporal scale substantially. Transit buses move slowly compared to autos and rail vehicles and are more likely to make stops during the movement. Therefore, the local SI can capture less straight movement over a sliding window of five tracking points as the sampling interval increases. Walking and biking are usually in lower speed and follow less constrained paths. The segment trajectory by walking and biking that consists of five tracking points can be less straight with coarser sampling.
The Kruskal–Wallis test results show significant differences in the mean values of the local SI at different sampling intervals (all p-values are below 0.01). The Dunn’s test results suggest that there is a significant difference (p < 0.05) in the mean values of the local SI among the groups of 3 s, 9 s, 15 s, 30 s, 60 s, and 180 s for the overall sample as well as for the subsets of each mode.
Space utilization
As discussed in the introduction, activity space estimation is a fundamental task prior to human interaction analysis. We consider activity space or space utilization as one of the key indicators for travel behavior that could potentially be impacted by different temporal scales. The objective of this experiment is to estimate human space utilization during movement and to assess the impact of the temporal scale on activity space estimation. We first use three measures, namely the minimum convex polygon (MCP) (also called a convex hull), the 95% Kernel Density Estimation (KDE) home range, and the radius of gyration, to measure individual activity space at the daily level (see the review in Sect. 2). These three indicators can capture the spatial range and spatial dispersion of an individual’s daily activities. In addition, we apply a time-geography-based approach to estimate the accessible area at a finer scale along the movement path based on consecutive GPS tracking points. The radius of gyration is formulated as follows.
$$R_{g} = \sqrt {\frac{{\mathop \sum \nolimits_{i = 0}^{n} \left[ {(x_{i} - \overline{x})^{2} + (y_{i} - \overline{y})^{2} } \right]}}{n}}$$
(5)
where \(\left( {\overline{x},\overline{y}} \right)\) represents the center of mass of the trajectory, specifically, \(\overline{x} = \mathop \sum \limits_{i = 0}^{n} x_{i} /n, \overline{y} = \mathop \sum \limits_{i = 0}^{n} y_{i} /n\). The greater the value of \(R_{g}\), the larger the activity space of an individual.
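A minimal sketch of Eq. (5) follows. As a simplifying assumption, it normalizes both the center of mass and the sum of squared deviations by the number of fixes supplied; the function name is hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def radius_of_gyration(points: Sequence[Point]) -> float:
    """Eq. (5): root mean squared distance of fixes from their center of mass.

    Assumption: both averages divide by the number of fixes supplied.
    """
    m = len(points)
    cx = sum(x for x, _ in points) / m
    cy = sum(y for _, y in points) / m
    return math.sqrt(sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points) / m)


corners = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
print(round(radius_of_gyration(corners), 4))                      # 1.4142
# Adding fixes near the center of mass lowers the dispersion.
print(round(radius_of_gyration(corners + [(1.0, 1.0)] * 4), 4))   # 1.0
```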
Figure 10 illustrates the difference between the MCP and the radius of gyration. If only a minimum convex polygon is applied, the activity space of person #1 and person #2 will be exactly the same, and it cannot capture the difference in the spatial distributions of their tracking points. However, person #1 has a larger radius of gyration than person #2, which indicates that person #1 has a more dispersed activity space while person #2’s activities are more concentrated. Here, we use both measures to delineate the individual daily activity space. Figure 11(a) shows the example of the MCP and the radius of gyration for a moving individual using a real track from CHTS. The area of the MCP is 0.042 km² and the radius of gyration is 6.79 km.
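To make the MCP concrete, the sketch below builds a convex hull with Andrew's monotone chain algorithm and measures its area with the shoelace formula. This is one standard way to obtain an MCP and is not necessarily the implementation used in this study; the function names and the toy points are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def _cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of (a - o) x (b - o); positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: Sequence[Point]) -> List[Point]:
    """Hull vertices in counter-clockwise order (Andrew's monotone chain)."""
    pts = sorted(set((p[0], p[1]) for p in points))
    if len(pts) < 3:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and _cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def mcp_area(points: Sequence[Point]) -> Optional[float]:
    """Area of the minimum convex polygon, or None if no polygon exists."""
    hull = convex_hull(points)
    if len(hull) < 3:          # fewer than three non-collinear locations
        return None
    twice_area = sum(
        x0 * y1 - x1 * y0
        for (x0, y0), (x1, y1) in zip(hull, hull[1:] + hull[:1])
    )
    return abs(twice_area) / 2.0


corners = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
dispersed = corners + [(1.0, 1.0), (3.0, 3.0), (1.0, 3.0)]
concentrated = corners + [(2.0, 2.0), (2.1, 2.0), (2.0, 2.1)]
print(mcp_area(dispersed), mcp_area(concentrated))   # 16.0 16.0
print(mcp_area([(0.0, 0.0), (5.0, 5.0)]))            # None
```

Both toy individuals share the same hull and therefore the same MCP area, although their interior fixes are distributed differently, which is the limitation that the radius of gyration is meant to complement.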
To approximate individual space utilization using KDE, the bivariate normal kernel function is applied:
$$K\left( {\text{x}} \right) = \frac{1}{2\pi }{\text{exp}}\left( { - \frac{1}{2}{\text{x}}^{t} {\text{x}}} \right)$$
(6)
where \(K\left( {\text{x}} \right)\) is the bivariate normal kernel function and \({\text{x}}\) is a vector containing the coordinates of a point on the plane. Subsequently, the kernel density estimation of the utilization distribution at a given point \({\text{x}}\) of the plane can be obtained by:
$$\hat{f}\left( {\text{x}} \right) = \frac{1}{{nh^{2} }}\mathop \sum \limits_{i = 1}^{n} K\left\{ {\frac{1}{h}\left( {{\text{x}} - {\mathbf{X}}_{{\mathbf{i}}} } \right)} \right\}$$
(7)
where \(h\) is a smoothing parameter (i.e., the bandwidth), \(n\) is the number of relocations (i.e., data points), and \({\mathbf{X}}_{{\mathbf{i}}}\) is the \(i\)th relocation of the sample. We employ one of the most common approaches, the reference bandwidth (Silverman 1986), as defined below:
$$h = \frac{1}{2}\left( {\sigma_{x} + \sigma_{y} } \right)n^{ - 1/6}$$
(8)
where \(\sigma_{x}\) and \(\sigma_{y}\) are the standard deviations of the x and y coordinates of the relocations, respectively.
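Equations (6) to (8) can be combined into a short sketch that evaluates the utilization distribution at a single location. It assumes sample standard deviations (an assumption, as the normalization is not specified above) and does not extract the 95% contour; the function names are hypothetical.

```python
import math
import statistics
from typing import Sequence, Tuple

Point = Tuple[float, float]


def reference_bandwidth(points: Sequence[Point]) -> float:
    """Eq. (8): h = 0.5 * (sigma_x + sigma_y) * n^(-1/6).

    Assumption: sigma_x and sigma_y are sample standard deviations.
    """
    n = len(points)
    sigma_x = statistics.stdev([p[0] for p in points])
    sigma_y = statistics.stdev([p[1] for p in points])
    return 0.5 * (sigma_x + sigma_y) * n ** (-1.0 / 6.0)


def kde_at(location: Point, points: Sequence[Point], h: float) -> float:
    """Eq. (7) with the bivariate normal kernel of Eq. (6)."""
    n = len(points)
    total = 0.0
    for px, py in points:
        ux = (location[0] - px) / h
        uy = (location[1] - py) / h
        total += math.exp(-0.5 * (ux * ux + uy * uy)) / (2.0 * math.pi)
    return total / (n * h * h)


fixes = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
h = reference_bandwidth(fixes)
print(round(h, 4))
print(kde_at((1.0, 1.0), fixes, h) > kde_at((10.0, 10.0), fixes, h))   # True
```

In practice, the density would be evaluated on a grid, and the home range would be taken as the smallest region containing 95% of the estimated density.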
The result of KDE is a continuous density surface reflecting different probabilities of visiting in each location. The area of KDE can be measured by 100% contour of the density surface (i.e., include all nonzero activity density area) or any other specified level (e.g., 95% contour) in order to exclude areas that may consist of infrequent activities. The derived area is considered as the individual activity space. Figure 11(b) shows the example of the 95% KDE area for the same moving individual in Fig. 11(a). For implementation of KDE, readers may refer to the documentation of the R package “adehabitatHR” developed by Calenge (2006).
To estimate individual daily activity space, the minimum convex polygon, the radius of gyration, and the 95% KDE home range are computed using the individual daily records as the unit of analysis. As mentioned in Sect. 3, the GPS component of CHTS was collected over a three-day period. In total we have 9349 individual daily tracking records. It is important to note that the sample size for the experiment here is lower than the original sample size (9349) due to three reasons: (1) we exclude individuals who travel outside California; (2) the convex hull approach is invalid when an individual has only two tracking points in their daily diary (i.e., at least three points are required to construct a valid convex hull); and (3) the KDE approach requires more than five tracking points for an individual’s daily movement. Hence, it is invalid to use the convex hull approach for people who only made one outward trip on the survey day (e.g., left home and did not come back) when the sampling interval is 1800 s. Similarly, the KDE approach may also be invalid when none to very few tracking points between the origin and destination were left as the sampling rate becomes coarser.
Figure 12 shows the impact of varying temporal scales on computing the minimum convex polygon, the radius of gyration, and the area of the 95% KDE. The results in Fig. 12(a) suggest that the area of the minimum convex polygon remains very stable across different sampling rates, presumably because we retain all origin and destination locations in the process of down-sampling. Given that these are major locations where individual daily activities happened, it is reasonable that the estimated minimum convex polygons for the same person under different sampling intervals remain similar. The Dunn’s test results indicate no significant difference among the groups of 3 s, 9 s, 15 s, 30 s, 60 s, and 180 s (p > 0.05). In terms of the radius of gyration, no significant difference (p > 0.05) among the groups of 3 s, 9 s, 15 s, 30 s, and 60 s is found. In contrast to the stable trend found with the minimum convex polygon, there is a slightly increasing trend of the radius of gyration when the temporal scale becomes coarser (Fig. 12(b)). Specifically, the mean value of the radius of gyration increases from 5.50 km (at 3 s) to 6.26 km (at 1800 s). Also, the median value increases from 3.46 to 3.85 km. This observation indicates that at higher sampling intervals, the spatial dispersion of an individual’s activity space is overestimated. In terms of the area of the 95% KDE, Fig. 12(c) shows that when the temporal scale becomes coarser, the area of individual activity space estimated by KDE increases rapidly. Presumably, an individual’s activity locations become more sparsely distributed in space when the temporal scale becomes coarser, which results in a larger 95% KDE surface. The Dunn’s test results indicate that the only groups without a significant difference are 600 s, 900 s, and 1800 s (p > 0.05). In addition to the above analysis on individual daily activity space, we also experimented with a three-day aggregation of individual activity space. However, the results are similar to the daily aggregation of individual activity space and hence are not presented here.
The previous analysis examined the impact of varying sampling intervals on approximating human activity space at an aggregate daily level (i.e., one activity space per day per individual). Next, we apply a time-geography approach to estimate the activity space (accessible areas) during movement at a finer scale (i.e., between consecutive tracking points per trip). Figure 13 illustrates an example of the potential path areas (PPAs) for a moving individual that was tracked at 60 s per fix. A PPA is shaped by a pair of consecutive tracking points, a time budget (i.e., the sampling interval in this case), and the maximum speed capacity during a specific time interval, which is estimated by a floating average of speed over an exponential kernel. For mathematical definitions and implementation of PPA, the reader can refer to Miller (2005) and Dodge et al. (2021).
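The dependence of the PPA on the time budget can be illustrated with a simplified sketch. It assumes the classical unconstrained planar form of the PPA, an ellipse whose foci are the two consecutive fixes and whose major axis equals the maximum travel distance \(v_{max} \Delta t\); the maximum speed is passed in directly rather than estimated with the exponential kernel mentioned above, and the function name is hypothetical.

```python
import math
from typing import Optional, Tuple

Point = Tuple[float, float]


def ppa_area(start: Point, end: Point, time_budget: float,
             v_max: float) -> Optional[float]:
    """Area of an elliptical PPA between two consecutive fixes.

    Assumption: unconstrained planar movement, so the PPA is an ellipse with
    foci at the two fixes and major axis length v_max * time_budget.
    Returns None if the second fix is unreachable within the budget.
    """
    d = math.hypot(end[0] - start[0], end[1] - start[1])
    reach = v_max * time_budget          # longest path the budget allows
    if reach < d:
        return None
    semi_major = reach / 2.0
    semi_minor = math.sqrt(reach ** 2 - d ** 2) / 2.0
    return math.pi * semi_major * semi_minor


a, b = (0.0, 0.0), (6.0, 0.0)
print(ppa_area(a, b, time_budget=10.0, v_max=1.0))   # 20 * pi, about 62.83
print(ppa_area(a, b, time_budget=20.0, v_max=1.0))   # larger budget, larger PPA
```

With the fixes and speed held constant, doubling the time budget enlarges the ellipse, which is consistent with larger sampling intervals producing larger PPAs.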
Figure 14 illustrates the impact of varying temporal scale on computing the area of PPA. It shows that, overall, data with a coarser sampling rate generate bigger PPAs regardless of the transport mode people use. This is mainly because a larger sampling interval allows a larger time budget (equal to the sampling interval between two consecutive fixes), which is a critical parameter to build a PPA. In addition, the magnitude of the area of PPA varies largely among different modes as the maximum speed capacity for each mode is substantially different. Specifically, people using auto, transit bus, and rail transport modes are more likely to have a larger PPA. The rail transport mode has the largest PPA because it usually runs on designated tracks with a constant speed and is less likely to be impacted by traffic. However, this does not necessarily mean that the individuals traveling by rail have access to all locations captured in the PPAs due to the restricted nature of rail transport, but in general they can cover longer distances given a larger time budget. One might need to refine the PPA approach to account for the rail network or resort to different approaches to approximate the accessible locations of mobile individuals if they travel by rail (e.g., develop station-based and schedule-constrained PPAs). People using walk and bike modes usually generate a smaller PPA because the maximum speed by walk and bike is relatively lower compared to the other modes. The Dunn’s test results of auto indicate that the mean values of the area of PPA are significantly different (p < 0.05) from each other when the sampling interval is within the range of 3 s and 600 s. As the sampling rate becomes coarser (600 s, 900 s, 1800 s), no significant difference is found (p > 0.05).
For transit bus, railway transport, and bike modes, the mean values are significantly different (p < 0.05) from each other when the sampling interval is within the range of 3 s and 180 s. As the sampling interval increases from 180 s, most of the groups are found not significantly different (p > 0.05). In terms of the walk mode, we find each group is significantly different (p < 0.05) from each other when the sampling interval is within the range of 3 s and 600 s. In general, as the temporal scale becomes very high, no significant difference in the mean values of the area of PPA can be identified. This is because many trips would not last longer than 900 s (15 min). Therefore, only the origin and destination will be retained for these short trips in the process of down-sampling. Eventually, the estimated PPAs of these short trips will be the same at very high temporal scales when the sampling interval exceeds the trip duration.
Human interaction analysis
The goal of this experiment is to understand the impact of temporal scale on identifying potential concurrent and delayed interactions between moving individuals. Two different approaches are applied to identify potential interactions: (1) a proximity-based approach, which uses a spatial buffer and a temporal window to identify space–time contacts between mobile individuals, and (2) a time-geographic approach named ORTEGA, proposed in Dodge et al. (2021), which uses the PPA to find potential concurrent and delayed contacts between mobile individuals. It is important to note that the contacts identified by both approaches can only be considered potential interactions, as individuals may or may not socially or physically interact when they come into close contact spatially and temporally. Following Dodge et al. (2021), the proximity-based approach is implemented by intersecting spatial buffers with a 100-m distance threshold centered on synchronous GPS tracking points of two individuals. To relax the requirement of synchronous fixes when identifying concurrent interactions, a time window of 5 min is allowed when determining whether two spatial buffers intersect. For the PPA-based approach, two individuals are considered to have a potential interaction if their PPAs intersect. Even though the PPA-based approach does not require a predefined distance buffer or a set time window, we allow the same 5-min time lag for intersection in this experiment to make it comparable with the proximity-based approach. That is, if the time intervals of two intersecting PPAs overlap or are separated by a time lag of at most 5 min, we identify a potential concurrent interaction. If the PPAs of the two individuals intersect but the time lag is longer than 5 min, we consider this a delayed interaction.
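The decision rules described above can be sketched as follows. This is an illustrative sketch rather than the ORTEGA implementation: all function names are hypothetical, coordinates are assumed to be projected in metres, and the 100-m threshold is read as the radius of each circular buffer, so that two buffers intersect when the fixes are at most 200 m apart. If the threshold is instead meant as the maximum separation between fixes, half of it should be passed as `buffer_m`. The geometric intersection of two PPAs is not shown and is passed in as a boolean.

```python
import math


def buffers_intersect(p, q, buffer_m=100.0):
    """True if circular buffers of radius buffer_m around p and q overlap.

    p and q are (x, y) tuples in a projected coordinate system (metres).
    Assumption: equal-radius circles overlap when their centres are at
    most 2 * buffer_m apart.
    """
    return math.hypot(p[0] - q[0], p[1] - q[1]) <= 2.0 * buffer_m


def time_lag_s(interval_a, interval_b):
    """Gap in seconds between two (start, end) intervals; 0 if they overlap.

    A single GPS fix at time t can be passed as the interval (t, t).
    """
    (a0, a1), (b0, b1) = interval_a, interval_b
    return max(0.0, max(a0, b0) - min(a1, b1))


def classify_contact(spatially_intersect, interval_a, interval_b, window_s=300.0):
    """Label a candidate contact as 'concurrent', 'delayed', or None."""
    if not spatially_intersect:
        return None  # no spatial intersection, so no potential interaction
    lag = time_lag_s(interval_a, interval_b)
    return "concurrent" if lag <= window_s else "delayed"


# Two fixes 150 m apart and recorded 4 min apart (times in seconds).
near = buffers_intersect((0.0, 0.0), (150.0, 0.0))
print(classify_contact(near, (0.0, 0.0), (240.0, 240.0)))
```

For the proximity-based approach only the `'concurrent'` label is used, whereas for the PPA-based approach the `'delayed'` label corresponds to intersecting PPAs whose time lag exceeds the 5-min window.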
The frequency of interactions among a group of people is then quantified by the count of pairs of individuals that have interacted with each other during the survey day for both the proximity-based and PPA-based approaches. The outcomes of these two approaches at different temporal scales are then compared as described below.
Considering that people living far apart are less likely to interact, instead of taking the whole CHTS sample, we use as a case study only the portion of the data collected in Santa Clara County, where we have the largest number of respondents. This portion contains GPS tracks of 850 persons from 380 households, collected from February 3, 2012 to January 31, 2013. Using the household composition information, we separate the detected interactions into those among persons of the same household and those between persons of different households, in order to incorporate social relationship factors into the human interaction analysis. Individuals within the same household tend to travel together and interact more, while individuals from different households might have occasional social interactions and various random contacts.
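The counting and household separation described above can be sketched as follows. The data layout (a list of person-id pairs and a dictionary mapping each person to a household id) and the function name are assumptions made for illustration, not the structure of the CHTS data.

```python
def count_interacting_pairs(contacts, household_of):
    """Count unique interacting pairs, split by household relationship.

    contacts: iterable of (person_a, person_b) ids, possibly repeated
        and in either order.
    household_of: dict mapping person id -> household id (hypothetical).
    """
    # An unordered pair is counted once, however many contacts it has.
    pairs = {tuple(sorted((a, b))) for a, b in contacts if a != b}
    within = sum(1 for a, b in pairs if household_of[a] == household_of[b])
    return {
        "within_household": within,
        "outside_household": len(pairs) - within,
    }


# Hypothetical example with four persons in two households.
household_of = {"p1": "h1", "p2": "h1", "p3": "h2", "p4": "h2"}
contacts = [("p1", "p2"), ("p2", "p1"), ("p1", "p3"), ("p3", "p4")]
print(count_interacting_pairs(contacts, household_of))
```

Storing each pair as a sorted tuple in a set ensures that repeated contacts between the same two persons on the survey day contribute a single interacting pair.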
Figure 15 illustrates the impact of temporal scale on identifying concurrent interactions using the PPA-based approach versus the proximity-based approach, for individuals of the same household and of different households. In general, both approaches are sensitive to the temporal scale. The proximity-based approach identifies more within household contacts at a temporal resolution of 1 min than at coarser sampling intervals. Specifically, the total number of within household interactions identified is 367 at the 1-min interval, whereas this number drops to 361 at the 5-min interval and to 359 at the 10-min interval, and remains unchanged at the 20-min and 30-min intervals. Likewise, the proximity-based approach identifies 73 outside household interactions at the 1-min interval, while substantially fewer are identified as the sampling interval increases (i.e., 6, 2, 2, and 2 outside household interactions at the 5-min, 10-min, 20-min, and 30-min intervals, respectively). In contrast, the PPA-based approach identifies substantially more outside household concurrent interactions when coarse temporal scales are used. This is mainly because larger sampling intervals result in bigger PPAs, which are more likely to intersect with each other. However, the total number of within household interactions remains stable as the sampling interval increases. People living in the same household are more likely to travel together during a day (e.g., parents escorting children to school) (Lee and Goulias 2018), which increases the chance of physical contact with each other. Even when the sampling interval is increased, these people are still very likely to be tracked simultaneously. Hence, the PPA-based and proximity-based approaches identify very similar numbers of within household interactions, and these numbers remain stable as the sampling interval increases.
Unlike people living in the same household, people from different households are less likely to travel together, and their physical interactions are often unintentional (e.g., two strangers passing each other on the street) and asynchronous (e.g., two persons visiting the same grocery store at different times). The proximity-based approach is less effective in this situation because it requires synchronous fixes and a predefined distance buffer threshold set by the user (such as 100 m in this case) to determine whether two moving entities come close to each other. The PPA-based approach has no such constraint, as it is based on the actual locations accessible to people, their time budget, and their speed capacities. All of these parameters can be extracted from the given data without a user-defined threshold. Therefore, we observe more outside household concurrent interactions identified by the PPA-based approach than by the proximity-based approach. As shown in Dodge et al. (2021), the outcomes of the proximity-based approach are also sensitive to the buffer size. Specifically, they find that with a 500-m distance buffer, the proximity-based approach identifies almost the same number of concurrent intrahousehold interactions, but interhousehold interactions are still underestimated. Employing a distance buffer larger than 500 m may not be reasonable, because potential concurrent interactions are defined in this study as close contacts between moving individuals, and individuals who stay more than 500 m apart should not be considered to have a potential concurrent interaction.
We next focus on the impact of temporal scale on identifying delayed interactions using the PPA-based approach. A delayed interaction is context-specific, and a meaningful temporal delay can vary across applications. For example, in the context of COVID-19 transmission, interactions with a delay of up to 30 min may be considered critical or risky contacts, as droplets may stay in the air for a considerable amount of time. As shown in Fig. 16, allowing a longer time lag identifies more delayed interactions both within and outside households. This is reasonable because a longer time lag increases the chance of intersection between two PPAs. However, the increase in the number of identified delayed interactions outside households is substantially higher than that within households. This is mainly because people living in the same household are more likely to interact synchronously, so allowing a longer time lag does little to identify additional delayed interactions within households. In contrast, people from different households are more likely to have asynchronous interactions, resulting in more delayed interactions when longer time lags are allowed. In addition, the sampling interval of the data seems to have no significant impact on identifying delayed interactions between individuals of the same household, whereas more delayed interactions between individuals of different households are identified when a coarser sampling rate is used. This is similar to our previous findings regarding concurrent interactions. Presumably, people living in the same household are more likely to travel together, so the larger PPAs that result from increased sampling intervals do not much affect the chance that their PPAs intersect. On the other hand, for individuals from different households, the larger PPAs that result from a coarser temporal scale increase the probability of intersection.