Introduction

Sensor data has the potential to introduce new depth to travel surveys by reducing response burden, human error, and subjectivity. In transportation planning, health studies, and ecological research, data from Global Navigation Satellite Systems (GNSS) is now well into its second decade of use. Collection of individual traces using respondents’ own mobile devices rose with the ubiquity of sensor-equipped mobile devices. Research shows that this individual-level sensor data has the capacity not only to collect mobility data, but also to provide a basis for behavioral interventions (Batool et al. 2022; Cellina et al. 2019). Despite this potential, most publications describe field tests and introductory apps (Allström et al. 2017; Chambers et al. 2017; Marra et al. 2019; McCool et al. 2021). In fact, a recent SWOT analysis of Smartphone-Based Travel Surveys (SBTS) found that the majority of research teams discontinued their applications following the initial project (Pronello and Kumawat 2020). Only recently have SBTS advanced to second rounds of data collection, or undertaken research beyond their own feasibility (Molloy et al. 2020; Patterson et al. 2019; Axhausen et al. 2020).

Because SBTS aim to collect data both at a high frequency and over a lengthy period, almost all studies encounter problems with missing data, often wholly outside the control of any involved party (Wang et al. 2018; Gadziński 2018; Harding et al. 2020; Xie et al. 2020). While many researchers opt to remove cases or periods of time containing sparse data, this both risks biasing results and functionally reduces sample sizes in a field where participation is already limited (Körner 2012; Wang et al. 2019). Researchers who aggregate the data to remove the spatiotemporal nature discard the potential benefits inherent in this novel method of data collection, and may still introduce bias (Baratchi et al. 2014). Robust methods of handling missing data arise independently from any of six different fields: statistics, machine learning, transportation, engineering, geoscience or computer science (Chen et al. 2016; Harrison et al. 2020; Servizi et al. 2020; Shen et al. 2014). Each field may have its own terminology and set of underlying assumptions, leaving researchers in search of best practices to parse through disparate partial solutions.

Crucial to avoiding the propagation of biases from the raw data is quantifying the impact of the missing data. Quantification can be thought of as a multi-step process. First, key outcome measures should be identified, as these guide the decision-making process. Second, a relationship can be established between these measures and some mechanism available in the incomplete data, such as the amount of missing data or the length of the gap. Third, trip characteristics can be used to develop this relationship further. Fourth, the missing measures of interest are estimated on the basis of the relationships uncovered in the previous steps, given the selected complete data. Finally, steps three and four may be allowed to vary across individual features.

In this paper, we detail steps one through three. The remainder of the introduction discusses causes and existing solutions for missing data, while Sect. 2 outlines methods for describing the extent of missingness in the raw data and pre-processing it for analysis. Section 3 explores the relationship between bias in mobility metrics and missing data by inducing missingness into complete mobility data. Finally, Sect. 4 discusses the results of the simulation, and suggests a methodology for establishing the limits at which simple mechanisms for addressing gaps in the data begin to fail.

Causes of missingness in passively-collected location data vary in both origin and relative impact (Hecker et al. 2010; Shen and Stopher 2014). Mechanisms due to signal loss, such as blocked line of sight, or the "cold-start" problem, often produce small gaps. Others may produce longer gaps, obscuring one or more trips within a day, by causing the device to stop transmission. This may come from battery drain, termination of the app by the device or user, powering off of the device, or entering into hibernation mode. Some of these may take considerable time to resolve, which can lead to multi-day gaps. Lastly, device incompatibility and respondent willingness can lead to missingness at the user-level.

The simplest cause of signal loss is interrupted line of sight. In a study comparing GNSS-generated trajectories with users’ recorded travel diaries, 15.7% of the GNSS trajectories contained at least one instance of blocked line of sight (Stutz 2019). This tends to be related to particular location-related circumstances, such as traveling through an underpass or tunnel. In these cases, there will be a loss of signal transmission, leading to no data being recorded for the length of time during which the GPS satellite and the phone are unable to establish a connection. If the gap is linearly interpolated by connecting the coordinates immediately preceding and following this gap, this assumes a path that is perfectly straight in the intervening time. As tunnels and underpasses are often constructed with the shortest distance in mind, the expected impact on distance or number of discrete travel events is minimal, but any deviation of the true path from this linearity will bias estimates that are based on the underlying behavior. On the other hand, a true straight path may show little to no bias, even if the signal is blocked for a much longer period, as might be the case with a train traveling through a tunnel.

A secondary physical cause is the so-called "urban canyon," which may occur in areas where many tall buildings are situated close together and block the line of sight (Chen et al. 2010). Similarly, urban environments can contain "black holes," or areas in which all entering trajectories may disappear, as documented by Hong et al. (2015). Unlike missing data caused by a short tunnel, travel behavior within urban canyons or black holes is not restricted to a generally straight path. A slight downward bias in distance and radius of gyration would be expected in data that contain missingness due to urban canyons, and stops within the built-up area may be lost entirely. Urban canyons can also lead to noisy data, when reflected satellite signals cause erroneous triangulation.

The cold start problem is another common cause of signal loss, with one study demonstrating that 27.5% of all trajectories contained a cold-start period (Stutz 2019). Because someone’s location is usually determined by Wi-Fi sensors when indoors, and by GPS outdoors, the start of a trip is often missed during the hand off at the boundary. The process of identifying a sufficient number of GPS satellites in order to accurately record a position is not immediate and can take anywhere from 20 seconds to 12.5 min (Langley 2015) to provide an initial position. Someone can range quite far within this time period, and, unlike in the tunnel situation, is unlikely to be following a strictly linear path from the point at which they left the building, to the point at which the signal is regained. A downward bias of estimates of distance traveled and radius of gyration in calculating naive statistics with linear interpolation is expected. When the underlying travel behavior represents a round trip that is sufficiently short, an entire trip may be lost.

While map features and data characteristics may allow for distinguishing the causes of short gaps, the mechanisms that generate long gaps are more difficult to identify. A common pattern of missing data occurs when the device itself ceases acquiring data on behalf of the application. This can be a consequence of user intent: the user has shut off their phone or closed the application. It can also be unintentional, and part of the design of the operating system. Both Android and iOS have introduced measures to limit battery drain when a phone user is not directly engaged with the device, called variously Doze mode or Hibernation (Bähr et al. 2022; McCool et al. 2021). Additionally, some phone manufacturers include versions of the Android operating system that will aggressively kill apps, preventing them from running in the background (Zhou et al. 2020). Consequences vary with respect to the length of the gap and the cause of the gap. A lengthy period of missing data followed by a period of activity may indicate that the device was in hibernation or doze mode. Yoo et al. (2020) report higher levels of sparsity in the nighttime hours, likely due to this OS behavior. In this case, the impact on distance, spread of activity (measured as Radius of Gyration), and discrete trip events is likely to be minimal. However, there is little to distinguish this from the case in which the app has been closed, either by the user or the OS, and subsequently reopened by the user. The latter case has the potential to obscure extensive travel behavior; it may produce large downward biases in distance traveled and radius of gyration, and will often miss trips and stops. The longer the gap, the more influential it is likely to be.

Devices that cease recording data due to battery discharge may demonstrate similar data patterns, but may be identifiable if the app records battery life history. This is more likely to occur during a trip, leading to missing trip ends. As the location updating process is itself a drain on the battery, longer trips both use more battery and prevent charging for longer periods of time. If data collection begins again, this is likely to occur at a known stop, such as home or work.

These situations lead to gaps in a user’s location history that can be quite large, ranging from hours to days or weeks. Lacking a model for predicting and contextualizing the interim period, accurately accounting for the bias becomes impossible.

Previous studies have proposed several ways of addressing missing data within SBTS. Prelipcean et al. (2015) used manually-generated trip diaries to fill gaps in passively-generated data to establish a joint ground truth. Meseck et al. (2016) filled each gap with the median location of the preceding twenty coordinates. Bihrmann and Ersbøll (2015) performed multiple imputation on aggregate measures of interest. Huang et al. (2020) implemented fuzzy c-means imputation on missing taxi GPS data to construct missing segments within the trajectories. Barnett and Onnela (2018) and Zhao et al. (2021) sampled from existing trajectories to fill in gaps. Schuessler and Axhausen (2009), Bierlaire et al. (2013) and Li et al. (2021) used map-matching methods to improve sparse data. Others, such as Nawaz et al. (2020) and Liu and Onnela (2021), used other methods of probabilistically establishing what occurs within gaps. Table 1 provides an overview of some recent methodologies.

Table 1 Existing solutions for missing data

While methods have been proposed for managing these gaps, nothing is currently available as a benchmark to researchers looking to assess the extent and composition of their own missing data, in order to guide the choice of when and how to apply these methods (Yoo et al. 2020; Zhao et al. 2018; Hwang et al. 2018). We fill this research gap by simulating missing data with different characteristics based on real travel survey data. In the simulations, we vary gap length and density. In doing so, it is possible to set lower thresholds for when missing data becomes problematic. This critical first step enables selection of an imputation mechanism on the basis of research goals and data availability.

In this paper, we evaluate the method of linear interpolation for addressing gaps of varied sizes, under the assumption that certain features may define gaps where it is likely that users have followed a mostly linear path. We distinguish this from map-matching methods, which may be applied in a subset of these situations, but which are more complex, and often unavailable for pedestrian or bike routes.
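As an illustration of what linear interpolation across a gap amounts to, the following minimal sketch fills the interval between the last fix before a gap and the first fix after it with evenly spaced points. The function name `interpolate_gap`, the `(t, lon, lat)` tuple layout, and the `step` parameter are assumptions made for this example and are not taken from the study's implementation; interpolating directly in degrees is a simplification that is reasonable only over short distances.

```python
def interpolate_gap(before, after, step):
    """Return evenly spaced (t, lon, lat) points strictly between two fixes.

    before, after: (t, lon, lat) tuples bounding the gap, t in seconds.
    step: spacing in seconds of the interpolated points (hypothetical parameter).
    """
    if step <= 0:
        raise ValueError("step must be positive")
    t0, lon0, lat0 = before
    t1, lon1, lat1 = after
    filled = []
    t = t0 + step
    while t < t1:
        # Fraction of the gap elapsed at time t; position moves at constant speed.
        frac = (t - t0) / (t1 - t0)
        filled.append((t, lon0 + frac * (lon1 - lon0), lat0 + frac * (lat1 - lat0)))
        t += step
    return filled


# A 5-minute gap filled at 1-minute spacing yields four interior points.
filled = interpolate_gap((0, 5.00, 52.0), (300, 5.01, 52.0), step=60)
```

The interpolated points lie on the straight line between the two bounding fixes, so any detour taken during the gap is lost, which is the source of the downward bias in distance discussed above.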

Methods

When location data are collected, they are sampled from an underlying continuous trajectory. Two consecutive sampled points will be separated both by distance and by time. The shorter the time interval between the sampled points, the more accurately the continuous trajectory is approximated. A consequence of this discretization of the continuous trajectory is that all location history data contain missingness due to the nature of sampling. This limits the extent to which GPS traces can be categorized either as wholly complete or wholly missing. Instead we propose a metric to establish the impact of potential missing information between successive points.

Sparsity

We discretize a respondent’s total observation time, \(\mathcal {T}\), into a number of same-length intervals of length \(\tau\), so that \(\tau\) becomes the temporal resolution of our missingness analysis. \(\tau\) must be chosen to reflect the goals of the eventual analysis: the interval should be short enough that no impactful behavioral change can be missed within it, but long enough to encompass the sampling interval.

The discretization of \(\mathcal {T}\) into intervals of length \(\tau\) leaves us with \(\frac{\mathcal {T}}{\tau } = T\) intervals, indexed \(t = 1, \dots , T\).

Each interval in a user’s trajectory can be assigned an indicator representing presence, \(r_t = 1\), or absence, \(r_t = 0\), of at least one record during the time period. The proportion of \(r_t = 0\) relative to T provides a measure of sparsity with respect to \(\tau\), parameterized as q in Equation 1.

$$\begin{aligned} q = \frac{1}{T} \sum _{t = 1}^{T} \mathbb {1}(r_t = 0) \end{aligned}$$
(1)
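The sparsity measure can be sketched in a few lines. The example below is an illustration under stated assumptions rather than the study's code: timestamps are taken to be seconds, the observation window is half-open, and the function name `sparsity` and its arguments are hypothetical.

```python
import math


def sparsity(timestamps, start, end, tau):
    """Proportion of tau-length intervals in [start, end) containing no record."""
    if tau <= 0 or end <= start:
        raise ValueError("need tau > 0 and end > start")
    n_intervals = math.ceil((end - start) / tau)
    observed = set()
    for ts in timestamps:
        if start <= ts < end:
            # Index of the interval this record falls into (r_t = 1 for that t).
            observed.add(int((ts - start) // tau))
    return 1 - len(observed) / n_intervals


# One hour of observation at tau = 5 min, with one record per minute
# during the first 30 minutes only: 6 of 12 intervals are empty.
records = [60 * m for m in range(30)]
q = sparsity(records, start=0, end=3600, tau=300)
```

Because an interval counts as present as soon as it holds a single record, the same data will look sparser as \(\tau\) shrinks toward the sampling interval.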

We can extend this measurement of sparsity across persons, units of time (e.g. days, weeks), and states (e.g. traveling, stationary). We speak of N persons, \(i = \{1, 2, \dots , N\}\). Each person i has data occurring in \(J_i\) units of time, \(j = \{1, \dots , J_i\}\), and \(K_{ij}\) states, \(k = \{1, \dots , K_{ij}\}\). Each state \(k_{ij}\) contains \(T_{ijk}\) intervals, \(t = \{1, \dots , T_{ijk}\}\). Let \(r_{ijkt}\) represent a binary indicator of any record for discrete time period t in state k in unit of time j for person i. This leads to the full equation for sparsity shown in Equation 2.

$$\begin{aligned} q = \frac{1}{N}\sum _{i = 1}^{N}{\frac{1}{J_i}\sum _{j_i=1}^{J_i}{\frac{1}{K_{ij}}\sum _{k_{ij} = 1}^{K_{ij}}{\frac{1}{T_{ijk}}\sum _{t_{ijk} = 1}^{T_{ijk}}{\mathbb {1}(r_{ijkt} = 0)}}}} \end{aligned}$$
(2)
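The nested averaging can be made concrete with a small sketch. It assumes, purely for illustration, that the presence indicators are stored as nested lists indexed by person, unit of time, and state; the name `nested_sparsity` and the toy data are hypothetical.

```python
from statistics import fmean


def nested_sparsity(data):
    """Mean-of-means sparsity; data[i][j][k] is a list of 0/1 indicators r_ijkt."""
    return fmean([
        fmean([
            fmean([
                fmean([1 - r for r in state])  # share of empty intervals in a state
                for state in unit])
            for unit in person])
        for person in data])


# Person A: one day with two states (half empty, fully observed).
# Person B: two days with one state each (fully empty, fully observed).
person_a = [[[1, 1, 0, 0], [1, 1, 1, 1]]]
person_b = [[[0, 0]], [[1, 1]]]
q_all = nested_sparsity([person_a, person_b])
```

Each level is averaged without weighting, so a person or state with few intervals contributes as much as one with many; this follows the form of the equation as written.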

Segmentation

Location data are generated as a time-stamped sequence of points which reflect a geographical position at a moment in time. These coordinates represent someone’s continuous location history with a set of discrete points, which makes calculations on underlying trajectories costly and conceptually difficult. If the data can be reduced in size without sacrificing information, it is possible to reduce both the computational complexity and the number of assumptions that must be made about individual locations. One method of data reduction is to partition time-stamped trajectories \(\{p_0, p_1, ..., p_i, ..., p_n\}\) into straight line segments \(\overline{p_0, p_i}\), \(\overline{p_i, p_n}\) that sufficiently represent the path (Lee and Krumm 2011). These line segments can then be used for calculations on properties of the underlying trajectory such as speed or distance. They can additionally be more easily compared against other trajectories consisting of line segments in order to identify similar paths, and come with the added benefit of reducing measurement error common to GPS navigational systems.

Methods of segmentation vary, but most follow along with the well-known Ramer-Douglas-Peucker algorithm’s method of creating new segments based on the magnitude of discrepancy between the proposed segment and the point that it should represent (Ramer 1972; Douglas and Peucker 1973). Figure 1 illustrates the original Ramer-Douglas-Peucker algorithm in which endpoints are successively introduced within the trajectory at the point with the largest perpendicular euclidean distance. This simulation study implements the Top-Down Time Ratio algorithm outlined in Meratnia and de By (2003). This is an extension of the Ramer-Douglas-Peucker algorithm in which endpoints are selected on the basis of largest spatial euclidean distance. A segment is generated between the first location in a trajectory and the last location in the trajectory. Each recorded location point between the two segment ends is given a proposed new point along the generated segment, with respect to the elapsed time. The distance is calculated between the recorded point and this pseudo-sampled point. The point lying furthest from its segment is selected to form a new segment end, whereupon the process begins again. Algorithm 1 describes the process as implemented in this study.

Fig. 1 Ramer-Douglas-Peucker Algorithm

The algorithm is iterative and, if allowed to run indefinitely, will create \(N - 1\) segments for N points. In order to be useful it must be given a stopping criterion, such as the number of segments or the maximum error distance between the recorded points and the adjusted points that lie along the line segment. The smaller the error, the more information is preserved, and the more segments are created. Too many segments will reduce capacity for later imputation. It is therefore important to choose a stopping criterion that balances these two opposing aims.

figure a
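The time-ratio idea described above can be sketched as follows. This is a minimal recursive illustration, not a reproduction of Algorithm 1: it assumes planar coordinates in meters, strictly increasing timestamps, and a maximum error distance as the only stopping criterion. The function name `td_tr` and the toy trajectory are hypothetical.

```python
import math


def td_tr(points, max_error):
    """Top-Down Time Ratio simplification (sketch).

    points: list of (t, x, y) with strictly increasing t; x, y planar, in meters.
    Returns the retained points, which are the segment endpoints.
    """
    if len(points) <= 2:
        return list(points)
    (t0, x0, y0), (t1, x1, y1) = points[0], points[-1]
    worst_idx, worst_dist = 0, -1.0
    for idx in range(1, len(points) - 1):
        t, x, y = points[idx]
        frac = (t - t0) / (t1 - t0)
        # Position expected at time t when moving at constant speed along the segment.
        px, py = x0 + frac * (x1 - x0), y0 + frac * (y1 - y0)
        dist = math.hypot(x - px, y - py)
        if dist > worst_dist:
            worst_idx, worst_dist = idx, dist
    if worst_dist <= max_error:
        return [points[0], points[-1]]
    # Split at the worst-fitting point and simplify both halves.
    left = td_tr(points[:worst_idx + 1], max_error)
    right = td_tr(points[worst_idx:], max_error)
    return left[:-1] + right


# An L-shaped path: east for 100 m, then north for 100 m.
path = [(0, 0, 0), (10, 50, 1), (20, 100, 0), (30, 100, 50), (40, 100, 100)]
simplified = td_tr(path, max_error=20)
```

With a 20 m tolerance the corner of the path is retained while the two intermediate points, which deviate by at most 1 m from their time-proportional positions, are dropped. For long trajectories an iterative version with an explicit stack would avoid deep recursion.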

Metrics of interest

The relative impact on commonly used mobility metrics was used as the evaluation criterion for outcome comparison. Trajectories were decomposed into stop and move sections using an implementation of a rule-based stop classifier, as described in Montoliu et al. (2013). Algorithm 6 details our implementation, which notably lacks an upper limit on stop length; such a limit was not necessary given the maximum trajectory length of 24 h. Consecutive stops were merged into single stops if their centroids were less than 100 m apart, in order to reduce the number of incorrectly differentiated stops.
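The stop-merging rule can be illustrated with a short sketch. The stop representation `(start, end, lon, lat)`, the duration-weighted centroid of a merged stop, and the function names are assumptions made for this example; the text specifies only the 100 m threshold between centroids.

```python
import math


def haversine(lon1, lat1, lon2, lat2, radius=6_371_000):
    """Great-circle distance in meters (assumed mean Earth radius)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * radius * math.asin(min(1.0, math.sqrt(a)))


def merge_stops(stops, max_dist=100.0):
    """Merge consecutive stops whose centroids are closer than max_dist meters.

    stops: list of (start, end, lon, lat) tuples ordered by start time.
    """
    merged = []
    for start, end, lon, lat in stops:
        if merged:
            m_start, m_end, m_lon, m_lat = merged[-1]
            if haversine(m_lon, m_lat, lon, lat) < max_dist:
                d_prev, d_new = m_end - m_start, end - start
                total = d_prev + d_new
                if total > 0:
                    # Assumption: new centroid is the duration-weighted mean.
                    lon = (m_lon * d_prev + lon * d_new) / total
                    lat = (m_lat * d_prev + lat * d_new) / total
                else:
                    lon, lat = (m_lon + lon) / 2, (m_lat + lat) / 2
                merged[-1] = (m_start, end, lon, lat)
                continue
        merged.append((start, end, lon, lat))
    return merged


# Two stops roughly 34 m apart, followed by one more than 1 km away.
stops = [(0, 600, 5.0, 52.0), (700, 1300, 5.0005, 52.0), (2000, 2600, 5.02, 52.0)]
merged = merge_stops(stops)
```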

Distance metrics were calculated using the Haversine method (Robusto 1957). To arrive at total distance, the distance between all segment endpoints was calculated for the entire trajectory. Moved distance involved summation of distances between all segment endpoints when segmentation was performed on move events only. Similarly, total move time was established by summation of elapsed time for each move event segment.
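For reference, the Haversine calculation and the summation over segment endpoints can be sketched as below. The mean Earth radius of 6,371 km is an assumed constant, and the function names are illustrative only.

```python
import math

EARTH_RADIUS_M = 6_371_000  # assumed mean Earth radius in meters


def haversine(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to guard against rounding slightly above 1 for antipodal points.
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def total_distance(endpoints):
    """Sum of distances between consecutive (lon, lat) segment endpoints."""
    return sum(haversine(*a, *b) for a, b in zip(endpoints, endpoints[1:]))


# One degree of latitude along a meridian is roughly 111.2 km.
one_degree = haversine(0.0, 0.0, 0.0, 1.0)
two_degrees = total_distance([(0.0, 0.0), (0.0, 1.0), (0.0, 2.0)])
```

Moved distance follows from the same summation when only the endpoints of segments belonging to move events are passed in.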

Radius of Gyration (RoG) is calculated as the root-mean-square time-weighted average of all individual locations during an individual’s 24-h period, as shown in Equation 3. It was necessary to weight by time due to the unequal frequency of location collection, since the higher density movement trajectories would otherwise inflate estimates of the metric.

$$\begin{aligned} \sqrt{\frac{\sum _j{w_j \times dist([{\overline{lon}}, {\overline{lat}}], [lon_j, lat_j])^2}}{\sum _j{w_j}}} \end{aligned}$$
(3)

Here, \({\overline{lon}}\) and \({\overline{lat}}\) are defined respectively as \(\frac{ \sum _j w_j lon_j}{\sum _j w_j}\) and \(\frac{ \sum _j w_j lat_j}{\sum _j w_j}\), and \(w_j\) is a weighting element representing half the time interval during which a location was recorded, \(w_j = \frac{t_{j+1} - t_{j - 1}}{2}\).
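A time-weighted radius of gyration along these lines can be sketched as follows. Several choices here are assumptions rather than details given in the text: distances are squared before averaging, following the root-mean-square description; the first and last points receive a one-sided half-interval as weight, since \(t_{j-1}\) and \(t_{j+1}\) are undefined there; and distances use the Haversine formula with an assumed Earth radius.

```python
import math


def haversine(lon1, lat1, lon2, lat2, radius=6_371_000):
    """Great-circle distance in meters (assumed mean Earth radius)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * radius * math.asin(min(1.0, math.sqrt(a)))


def radius_of_gyration(points):
    """Time-weighted RoG in meters; points is a list of (t, lon, lat) sorted by t."""
    n = len(points)
    if n < 2:
        return 0.0
    weights = []
    for j in range(n):
        # Half the time between neighbors; one-sided at the two ends (assumption).
        t_prev = points[max(j - 1, 0)][0]
        t_next = points[min(j + 1, n - 1)][0]
        weights.append((t_next - t_prev) / 2)
    total_w = sum(weights)
    lon_c = sum(w * p[1] for w, p in zip(weights, points)) / total_w
    lat_c = sum(w * p[2] for w, p in zip(weights, points)) / total_w
    sq = sum(w * haversine(lon_c, lat_c, p[1], p[2]) ** 2
             for w, p in zip(weights, points))
    return math.sqrt(sq / total_w)


# Two equally weighted fixes 0.002 degrees of latitude apart:
# each lies about 111 m from the centroid.
rog = radius_of_gyration([(0, 5.0, 52.0), (60, 5.0, 52.002)])
```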

Relating bias and sparsity

Motivating example

As a motivating example, we consider the data collected from a 2018 field test of the Statistics Netherlands travel app. This field test concerned 1902 sample persons aged 16 and older. The sample was evenly divided between a new random sample taken from the Dutch population register and a secondary group of respondents randomly sampled from participants who had participated in the study ODiN in the two months preceding the field test. ODiN is an online-only study of individual mobility in the Dutch population (Centraal Bureau voor de Statistiek 2022).

Both groups of respondents were contacted via post with a request to download the application onto their personal mobile devices, register using the enclosed personal username and password, and record seven days of movement behavior. Full details on app methodology and data structure are available in McCool et al. (2021).

While the app was running on the phone, it captured a participant’s location once per second while the person was determined to be in motion, and once per minute while the person was determined to be stationary. This determination was based upon an algorithm that assessed whether or not the displacement between recorded intervals exceeded thresholds indicating movement behavior.

Collectively, a total of 2087 person days were recorded amongst 576 participants. The mean length of participation was 13.3 days, and the average number of hours with location information in a day was 8.2 h. The large percentage of missing data rendered calculation of the summary statistics of interest, such as number of trips and distance, difficult without careful consideration of the underlying mechanisms leading to the missing data.

Missing data in Statistics Netherlands travel app

Because gap times differ, the choice of \(\tau\) impacts assessment of overall data sparsity. The distribution of sparsity within the data set was evaluated with \(\tau\) set variously to 1 min, 5 min, and 15 min. Because the sampling interval while stationary was set to 1 min, reducing our temporal interval to the width of the sampling interval does not allow for the same level of discrimination between persons. A temporal resolution of 15 min may be too large to preclude non-negligible travel behavior. Figure 2 shows the distribution of \(q_i\) in the full data set under these three temporal resolutions. With a \(\tau\) of 1 min, a very small percentage of our data would achieve a sparsity \(< .5\). However, the difference between temporal resolutions of 5 and 15 min is less pronounced. When selecting for complete data, we do so on the basis of 5 min intervals.

Fig. 2 Sparsity across \(\tau\), \(\mathcal {T}\) 7

Selection of complete data

A subset of the data was selected where \(q_{ij} |\tau _{5}< .05\) for a contiguous 24 h period. Figure 3 provides a graphical breakdown of the exclusion steps leading to this selection. In total, 185 persons representing 584 complete 24-h periods remained. As we intend for the simulation study with induced missingness to be generalizable to those persons with true missingness, we tested whether the persons with complete data were likely to be representative of the group as a whole. For this analysis, we make use of the fact that we have independently recorded data on travel behavior from the ODiN study for many individuals. It was possible to link 354 of the 360 persons from the ODiN sample who had provided at least some data.

Fig. 3 Steps leading to selection of complete data

Of these 354, 114 recorded at least one complete day within the app (group CD), and 240 did not (group NC). Groups were compared on three measures of interest from the proposed simulation study using a 2-sample permutation test and \(10^4\) iterations. Group CD recorded less active travel time (\(\mu\) = 77.0, \(\sigma\) = 58), than group NC (\(\mu\) = 93.9, \(\sigma\) = 77), \(p = .02\). Travel distance was similar between group CD (\(\mu\) = 44.4, \(\sigma\) = 62.2) and group NC (\(\mu\) = 54.7, \(\sigma\) = 71.5), \(p = .17\), as was mean number of trips for group CD (\(\mu\) = 3.5, \(\sigma\) = 2.3) and NC (\(\mu\) = 3.3, \(\sigma\) = 2.0), \(p = .47\). There may be some indication of differential travel behavior between groups CD and NC, with the group with more complete data more likely to have mobility behavior that reflects more time spent with the app actively recording locations once per second.
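A two-sample permutation test of this kind can be sketched as below. The test statistic (absolute difference in group means), the two-sided formulation, and the add-one correction to the p-value are assumptions for illustration, since the text specifies only the number of iterations; the toy samples are made up and do not come from the study.

```python
import random
from statistics import fmean


def permutation_test(a, b, iterations=10_000, seed=0):
    """Two-sided permutation p-value for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(fmean(a) - fmean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(iterations):
        # Reassign group labels at random, keeping the group sizes fixed.
        rng.shuffle(pooled)
        diff = abs(fmean(pooled[:len(a)]) - fmean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (hits + 1) / (iterations + 1)


# Made-up samples: clearly separated groups, then identical groups.
p_separated = permutation_test([1, 2, 3, 4, 5, 6, 7, 8],
                               [101, 102, 103, 104, 105, 106, 107, 108],
                               iterations=2000)
p_identical = permutation_test([1, 2, 3, 4], [1, 2, 3, 4], iterations=2000)
```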

Selection of an error parameter for segmentation

A simulation study was conducted on the subset of complete data, \(q_{ij} < .05\), in order to determine the relationship between the selected error level of the stopping parameter and the distance covered. Baseline comparison was to an unfiltered error parameter of 1 meter. Error conditions ranged between 1 and 150 meters of tolerance. The data were either unfiltered, mean filtered, or median filtered prior to segmentation, as described in Lee and Krumm (2011).

Results from the simulation study demonstrate a relationship between total distance and maximum error that is dependent upon the number of move states that a person has entered. While there is a clear negative relationship between the maximum error and the relative total distance, it is non-linear. Figure 4 demonstrates this complexity. We expect a very small amount of true movement when a person is stationary, so an appropriate error parameter in this case is one that reduces the relative distance to zero. An "elbow" at the error tolerance of 20 meters in the stationary condition indicates a bottoming-out of noise-reduction. Higher error parameters would reduce this number further, but at the cost of perhaps erroneously reducing the distance during true movement behavior. The median relative distance in the true movement cases at an error parameter of 20 meters is approximately 90%, which aligns with previous findings (Palmer 2008; Ranacher et al. 2016).

Fig. 4 Distance comparison for max error tolerances across differing number of true moves

Simulation study design

Data

The set of data \(q_{ij}|\tau _5 < .05\) was divided into individual 24 h periods for improving comparison between users. A user with one four-day contiguous set afterwards had four sets of 24 h, with any remainder discarded. The 24 h period began from the first measurement for which all subsequent measurements in the period had no gaps greater than 5 min. The raw data were subsequently cleaned according to the stop detection protocol implemented in the original mobility app. Data were retained if the estimated accuracy provided by Android or iOS was under 80 meters. It was necessary to select a sufficiently large accuracy in order to incorporate data acquired via Wi-Fi triangulation on iOS as this defaults to 65 meters. Selecting a suitably low accuracy effectively removes cell tower-based locations and locations for which there are an insufficient number of navigational satellites visible to establish a reliable position. Before introduction of missing data, \({\bar{q}} = .001\), with a range of \(0 \text {--} .03\). 187 users were retained, representing 362 contiguous periods, broken into a total of 584 24 h periods. The mean number of periods per user was 3.12, with a range of \(1 \text {--} 25\).

Simulating short gaps

In order to assess the impact of increasing levels of sparsity generated by small gaps, the first study introduced missingness to the data (completely) at random. This represents a situation in which the missingness was not functionally related either to mobility or the user. Each period was divided into 288 five-min time intervals. Sparsity was introduced at ten percent intervals, ranging from \(q = 0\), where no data were removed, to \(q = .9\) in which 90% of the 5 min intervals were excluded. For each period, this process was repeated 20 times at each q to allow for different portions of the data to be removed. This led to 180 versions of each set with varied patterns of missingness. Each version was linearly interpolated across gaps and segmented, followed by calculation of the outcome measures. Algorithm 2 describes the steps in detail.

figure b
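The removal step of the short-gap simulation can be sketched as follows. This illustrates only the induction of missingness, not the subsequent interpolation, segmentation, or metric calculation, and it is not a reproduction of Algorithm 2. The function name `induce_short_gaps`, the `(t, lon, lat)` layout, and the synthetic one-fix-per-minute day are assumptions.

```python
import random


def induce_short_gaps(points, q, period_start=0, tau=300, n_intervals=288, rng=None):
    """Drop every fix that falls in a random subset of tau-length intervals.

    points: list of (t, lon, lat) with t in seconds within one 24 h period.
    q: target sparsity; round(q * n_intervals) intervals are emptied.
    """
    rng = rng or random.Random()
    n_drop = round(q * n_intervals)
    dropped = set(rng.sample(range(n_intervals), n_drop))
    return [p for p in points
            if int((p[0] - period_start) // tau) not in dropped]


# Synthetic day with one fix per minute (1440 fixes, 5 per interval).
day = [(60 * m, 5.0, 52.0) for m in range(1440)]
rng = random.Random(42)

# Nine non-zero sparsity levels, 20 replicates each: 180 versions of the day.
levels = [k / 10 for k in range(1, 10)]
versions = {(q, rep): induce_short_gaps(day, q, rng=rng)
            for q in levels for rep in range(20)}
```

Because the number of emptied intervals is rounded to a whole number, the realized sparsity can differ slightly from the nominal level; at q = .3, for example, 86 of 288 intervals are removed.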

Simulating long gaps

Functionally, short gaps reduce the overall density of trajectories while maintaining overall mobility characteristics. Long gaps at the same overall level of sparsity should induce a different pattern. Long gaps at increasing levels of sparsity are likely to remove whole trips and thereby decrease movement distance and RoG, which should meaningfully distort travel metrics at lower q than short gaps.

A simulation study was designed in order to test this assumption. q was induced at ten percent intervals, as in Sect. 3.2. Instead of removing data in 5-min intervals at random, a starting point was selected in the data, after which locations were removed in 2.4 h intervals, representing one tenth of a twenty-four hour day and thus corresponding to each level of q. In order to investigate the temporal characteristics, this process was carried out 24 times for each data set, selecting a starting point for each of the recorded hours. For iterations \(i > 2\), some q would reach the end of the 24 h period, in which case further removal started from the beginning of the 24 h period. This process is detailed in Algorithm 3.

figure c
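The contiguous removal with wrap-around can be sketched in the same style. This is an illustration of the removal step under assumed names and data layout, not a reproduction of Algorithm 3; the gap length is rounded to whole seconds to avoid floating-point edge cases.

```python
def induce_long_gap(points, q, start_hour, period_start=0, day=86_400):
    """Remove one contiguous block covering a share q of a 24 h period.

    The block starts at start_hour and wraps around to the beginning of the
    period if it would run past the end.
    """
    gap_start = start_hour * 3600
    gap_len = round(q * day)  # whole seconds

    def in_gap(t):
        # Seconds elapsed since the gap began, wrapping past the end of the period.
        return (t - period_start - gap_start) % day < gap_len

    return [p for p in points if not in_gap(p[0])]


# Synthetic day with one fix per minute; a 7.2 h gap (q = .3) starting at 20:00
# removes 20:00-24:00 and wraps to remove 00:00-03:12.
day_points = [(60 * m, 5.0, 52.0) for m in range(1440)]
kept = induce_long_gap(day_points, q=0.3, start_hour=20)

# One version per starting hour, as in the design described above.
by_hour = {h: induce_long_gap(day_points, 0.3, h) for h in range(24)}
```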

Short gap sensitivity analysis

Analysis of the results from the short gap and long gap simulation studies indicated that gap length was important independent of the total sparsity in the data. We hypothesized that it should be possible to use the time length of the gap in order to discriminate between two types of missing data: those that can be ignored and solved with linear interpolation, and those that cannot be, terming them "short" and "long" gaps, respectively.

A third simulation study was conducted in order to provide a more in-depth look at the variation when gap lengths vary from 1 min to 15 min. For each complete data set, gaps were created between 1 and 20 min in length in increments of 1 min, in each of the 24 h in the data set. The range of q calculated after removal fell between .02 in the case where 1 min was removed from the start of each hour and .33, where 20 min were removed from the start of each hour. This is further described in Algorithm 4.

figure d
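The hourly gap pattern used in this sensitivity analysis can be sketched as below, again showing only the removal step under assumed names and a synthetic one-fix-per-minute day rather than Algorithm 4 itself. The share of each hour removed reproduces the range reported above.

```python
def remove_start_of_each_hour(points, gap_minutes, period_start=0):
    """Drop all fixes in the first gap_minutes of every hour of the period."""
    gap_len = gap_minutes * 60
    return [p for p in points if (p[0] - period_start) % 3600 >= gap_len]


day_points = [(60 * m, 5.0, 52.0) for m in range(1440)]

# One version per gap length from 1 to 20 minutes.
versions = {g: remove_start_of_each_hour(day_points, g) for g in range(1, 21)}
removed_fraction = {g: 1 - len(v) / len(day_points) for g, v in versions.items()}
```

A 1-min gap per hour removes 1/60 of the data (about .02) and a 20-min gap removes 1/3 (about .33).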

Results

Simulation study

Across all three simulation scenarios, we investigated the impact of linear interpolation as a method for addressing gaps in the data. Outcome variables of interest were chosen to represent variables commonly used in mobility research and included total distance, total travel distance, number of stop and move instances, radius of gyration, and total move time. Method of calculation for these metrics is described in Sect. 2.3.

Because the absolute metrics may differ by up to an order of magnitude between persons, comparisons were performed by evaluating the percentage difference of the metric in the interpolated data set as compared to the metric in the complete data set. Figures 5, 6, 7, and 8 group these percentages by box plot across the differing levels of sparsity in order to provide a robust summary of features of the distribution. The box spans the upper and lower quartile values of the statistic, and the horizontal lines extend from these values through to 1.5 times the interquartile range from the median. The median is represented by the horizontal line through the box. Values extending beyond the horizontal lines are represented by individual points. Tables and figures evaluating percentage difference in movement behavior excluded participants registering no moves in their complete data, and all figures and tables excluded persons registering less than 200 meters of total movement behavior.
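The comparison can be illustrated with a short sketch. The signed form of the percentage difference and the numeric values are assumptions for illustration only; the values are made up and are not results from the study.

```python
from statistics import median


def percent_difference(interpolated, complete):
    """Signed percentage difference of a metric relative to the complete data."""
    if complete == 0:
        raise ValueError("metric is zero in the complete data")
    return 100.0 * (interpolated - complete) / complete


# Made-up distances (km) for three person-days, before and after inducing gaps.
complete_km = [10.0, 40.0, 25.0]
interpolated_km = [9.0, 38.0, 25.0]
diffs = [percent_difference(i, c) for i, c in zip(interpolated_km, complete_km)]
median_diff = median(diffs)
```

Expressing each person-day relative to its own complete value puts long-distance and short-distance travelers on the same scale, which is what allows the distributions to be pooled in a single box plot per sparsity level.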

Short gaps

Fig. 5 MCAR short gap analysis

The data with induced missingness were compared to the complete data set in order to evaluate the impact on metrics of interest. Table 2 and Fig. 5 show the decrease in moved distance and number of stops with increasing sparsity. At 30% sparsity, the mean distance retained is almost 90%, and the median distance retained is 93%. Only as sparsity levels exceed 60% does the median distance lost reach 20%. Similarly, filling gaps through linear interpolation fails to impact the median number of stops until sparsity reaches 50%. In fact, these short gaps become problematic only when they become long gaps, as two or more adjacent short gaps merge into one.

Figure 5 shows a relationship between distance metrics and sparsity that may be predictable in aggregate. Individual response demonstrates a considerable amount of variance – while it may be possible to predict the percentage of distance lost, number of travel behaviors, or total transit time based on the available data and sparsity, the uncertainty as q exceeds .3 in a naïve prediction would lead to confidence bands extending, in some cases, from a 50% increase to a 99% decrease.

Table 2 MCAR Short Gap: Median (%) absolute differences

Long gaps

The results of the second simulation study, designed to investigate whether or not the same method of linear interpolation worked for gaps of increasing length, can be seen in Fig. 7, and a brief summary of median results can be found in Table 3. An increase in \(q\) to .3 accompanies a downward bias of 12.7% in median travel distance. Median RoG remains relatively stable through \(q = .4\), whereas the number of recorded stops becomes unstable with \(q > .3\).

Table 3 MCAR Long Gap: Median (%) absolute differences

Figure 7 demonstrates wide variability in response from individuals across all metrics. Removal of 50% data contiguously could remove the entirety of one respondent’s travel behavior within a day, while leaving all travel behavior intact for another.

Fig. 6 Percent bias in calculated mobility metrics relative to sparsity

Figure 6 shows the differential relationship between increasing sparsity through induction of multiple smaller gaps versus increasing sparsity through increasing the length of a single gap. In all situations, long gaps demonstrate a more extreme departure from the ground truth than short gaps at the same levels of overall total missingness.
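
The contrast between the two ways of reaching the same sparsity can be sketched as below. This is an illustrative reconstruction under stated assumptions, not the simulation code of the study: records are assumed equally spaced, short gaps are placed on a grid of fixed-length slots, and all function names are hypothetical.

```python
import random


def long_gap_mask(n, q, rng):
    """One contiguous gap covering a fraction q of n records."""
    gap = round(q * n)
    start = rng.randrange(n - gap + 1)
    return [start <= i < start + gap for i in range(n)]


def short_gap_mask(n, q, gap_len, rng):
    """Several gaps of gap_len records covering about a fraction q.

    Gaps are drawn from non-overlapping slots, so they never overlap,
    but neighboring slots can touch and merge into a longer gap.
    """
    n_gaps = round(q * n / gap_len)
    slots = rng.sample(range(n // gap_len), n_gaps)
    mask = [False] * n
    for s in slots:
        for i in range(s * gap_len, (s + 1) * gap_len):
            mask[i] = True
    return mask


def longest_run(mask):
    """Length of the longest run of consecutive missing records."""
    best = cur = 0
    for missing in mask:
        cur = cur + 1 if missing else 0
        best = max(best, cur)
    return best


rng = random.Random(0)
n = 1440  # one day at one record per minute (assumed sampling rate)
long_mask = long_gap_mask(n, 0.3, rng)
short_mask = short_gap_mask(n, 0.3, 5, rng)
print(sum(long_mask) / n, longest_run(long_mask))
print(sum(short_mask) / n, longest_run(short_mask))
```

Both masks remove close to 30% of the day, but the longest uninterrupted gap differs greatly. The merging of neighboring slots in `short_gap_mask` also mirrors the way adjacent short gaps turn into long gaps as sparsity rises.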

Fig. 7 MCAR long gap analysis

Short gap sensitivity analysis

Table 4 shows some results from this sensitivity analysis. Depending on the level of acceptable data loss, it is possible to establish a maximum gap length from this table of results. For example, if loss of under 2% travel distance is desired, gaps under 5 min in length may be acceptably handled by linear interpolation.

Some differences emerge between the three simulation studies. In Sect. 4.1, an increase in total missingness was associated with a decrease in RoG, whereas removal of 1–15 min per hour, equivalent to a range of q from .01 to .25, results in a small increase. This is attributable to the simulation design, in which minutes were removed from the beginning of each 60-min period, starting with the first location entry. As users frequently engaged with the app for the first time while at home, this results in a small upwards bias of RoG, since a slightly higher proportion of home locations were removed. Additionally, gaps occurring at stop-move boundaries inhibit accurate determination of movement initiation, leading to a larger proportion of time associated with movement behavior on average and consequently a larger time-weighted RoG.
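
The removal pattern used in this sensitivity analysis can be illustrated with a short sketch. It assumes one record per minute, indexed from the first location entry; the function names are hypothetical.

```python
def hourly_removal_mask(n_minutes, k):
    """True where a minute is removed.

    Removes the first k minutes of every 60-minute period, counted
    from the first record (index 0).
    """
    return [(i % 60) < k for i in range(n_minutes)]


def sparsity(mask):
    """Fraction q of records removed."""
    return sum(mask) / len(mask)


mask = hourly_removal_mask(180, 15)  # three hours, 15 minutes removed per hour
print(sparsity(mask))                # q = 15 / 60
print(mask[0], mask[15], mask[60])   # removed, kept, removed
```

Because removal always starts at index 0, whatever the respondent was doing when the first record was logged is removed more often than any other state, which is the mechanism behind the home-location effect described above.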

Table 4 Short gap sensitivity: Median (%) absolute differences

Figure 8 shows the relationship between the percentage difference on calculated mobility measures between the complete data and the data with induced missingness. Some metrics, such as number of stops and moves, or the distance traveled, respond well through gap sizes of 10 min. Other metrics, such as Total Move Time, become quickly unreliable, even with very short gaps.

Fig. 8 Sensitivity simulation for small gaps

Importantly, while the point estimates may remain stable across all simulation study methodologies, the variance is considerable, with any one gap being more or less impactful depending on the sampled underlying behavior.

Covariates

Finally, we sought to establish a set of covariates that could impact the relationship between q and bias remaining after interpolation. Differential response across covariates could allow for extending the boundaries of what we consider the maximum acceptable gap time. On the other hand, similar response profiles can assure that no additional bias is introduced when filling short gaps through linear interpolation.

Mobility characteristics

The metrics of interest had a small impact on the percent bias introduced through interpolation. Figure 9 shows the mean bias of total distance, number of moves and RoG following interpolation across gaps at varying levels of sparsity under the long gap simulation condition. All three metrics demonstrate slightly lower bias when calculated in data sets where the true distance and number of moves was lower. We find the inverse relationship with true RoG in total distance and RoG estimation on the interpolated data sets. Across all three metrics, the absolute percent bias at \(q < .1\) is low across varied true trip characteristics.

Fig. 9 Bias with respect to underlying mobility characteristics

Personal characteristics

The Dutch population register contains basic information on all individuals living in the Netherlands and registered with their municipality. This information was linked to the users who had participated in our study. Percent bias in travel metrics established in the long gap study was compared across several individual covariates. A full set of results is available in Appendix B. The data establish minor relationships between the covariates age, education and urbanicity and induced bias, but the overall effect sizes are small. Age, education and urbanicity are associated with differential mobility characteristics, which demonstrate a stronger relationship with percent bias in the travel metrics and potentially drive this relationship.

Time

Investigation of time as a covariate was considered important, as both Android and iOS operating systems employ mechanisms for reducing device activity during times of lesser activity levels, contributing to long gaps during nights that are unlikely to contain travel behavior. The results from the Long Gap simulation study discussed in Sect. 4.1.2 provided a way to investigate the relationship between relative error and the time of day during which the gap occurs.

As shown in Fig. 10, hours between 21:00 and 04:00 produce unbiased estimates of total distance, number of moves and RoG even when gap length exceeds five hours. The low variance during these time periods suggests that overnight missingness may be appropriately resolved through unsophisticated methods that rely upon infrequent nighttime travel behavior.

Interestingly, there appear to be pockets during daytime hours that can bias results up to 25% with relatively short periods of missingness. These correspond with time periods in which people are more likely to engage in travel behavior, such as commuter traffic, with a distinct morning phase and diffuse evening phase. Missing data around noon is more likely to bias recorded move events than distance traveled or RoG, reflecting mobility patterns shorter both in duration and distance that occur during this time.

Fig. 10 Bias with respect to gap duration and time of gap

Conclusion

Understanding the composition of the missing data is integral to making the correct decisions about its content. The composition can involve the length of the component gaps, the overall sparsity of the data, or the time at which the gaps begin or end. Working smartly within these boundaries, researchers can extend the use of data that might otherwise be excluded from analysis for being incomplete. In situations where the gaps are short (between five and ten minutes), even if they occur frequently, very little can perturb aggregate measures such as distance or Radius of Gyration. This makes linear interpolation an acceptable solution for gaps caused by interrupted line-of-sight instances. More intensive methods of addressing these short gaps, such as map-matching or imputation, are unlikely to offer large gains in these situations, and the added complexity may hinder efforts to address the larger gaps.

We are limited to the metrics considered within this paper in our discussion of the consequences. One metric of great importance in mobility research yet not included in these analyses is the employed mode of transportation during move events. It may be possible that interpolation even across very short gaps has a negative impact on the prediction accuracy of mode of transportation. Additionally, although the metrics investigated within these simulations are shared across many fields collecting data on movement behavior, it is unlikely that our results will extend beyond data that is collected within the context of individual mobility.

It is clear that linear interpolation is a poor fit for addressing most long gaps. Radius of gyration and number of stops may see minimal impact if only portions of a trip are lost, as may be the case when a phone’s battery dies en route and is then charged at the destination. However, the same situation would almost certainly result in a downward biasing of distance metrics using the same method. The uncertain impact on any individual’s travel behavior necessitates the incorporation of data beyond the spatiotemporal aspects of the gap itself.

So what are the implications for analysis of time-location data in a travel survey? Our study shows that if the gaps do not extend beyond 10 min in length, and if they are relatively infrequent, say below 15% of the observation period, biases in main travel statistics can be acceptable. Seen from the other side, when setting requirements on the coverage of confidence intervals for travel statistics, our study gives insights into what gap characteristics may be problematic.
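
This rule of thumb could be applied as a screening step before analysis. The sketch below is one possible reading, not a procedure from the study: it assumes sorted timestamps in seconds and a nominal one-minute sampling interval, and it treats any interval longer than the nominal step as a gap. The function names and the way missing time is counted are assumptions; the 10 min and 15% thresholds come from the text.

```python
def gap_profile(timestamps_s, expected_step_s=60):
    """Return (longest_gap_s, missing_fraction) for sorted timestamps."""
    span = timestamps_s[-1] - timestamps_s[0]
    longest = 0
    missing = 0
    for a, b in zip(timestamps_s, timestamps_s[1:]):
        dt = b - a
        if dt > expected_step_s:  # interval longer than nominal: a gap
            longest = max(longest, dt)
            missing += dt - expected_step_s
    return longest, (missing / span if span else 0.0)


def interpolation_acceptable(timestamps_s, max_gap_s=600,
                             max_missing=0.15, expected_step_s=60):
    """True if no gap exceeds 10 min and under 15% of the span is missing."""
    longest, fraction = gap_profile(timestamps_s, expected_step_s)
    return longest <= max_gap_s and fraction < max_missing


# One hour of minute-level data with a 5 min gap, then with a 20 min gap.
short_gap = [t for t in range(0, 3601, 60) if not 600 < t < 900]
long_gap = [t for t in range(0, 3601, 60) if not 600 < t < 1800]
print(gap_profile(short_gap), interpolation_acceptable(short_gap))
print(gap_profile(long_gap), interpolation_acceptable(long_gap))
```

Traces that fail the check are not necessarily unusable; they are candidates for the long gap methods discussed next rather than for linear interpolation.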

Methods of addressing these long gaps are documented in the literature. Imputation with a user’s longitudinal data or with densely overlapping spatial data from other users are both promising methods using larger and longer data sources to account for the information lost. Although users may often contribute sufficient data to contextualize their own gaps, the longer the gap, the less data is otherwise available for this context. And while users in densely populated urban areas may have sufficient overlap in their trajectories to aid in completing other users’ data, this is far from applying to all such collection opportunities, where it may be of interest to distribute a limited budget such that a wider range of geographical areas may be covered. External data sources providing information on the individual, such as frequented addresses or integrated surveys on car and bike ownership, are used to provide context for estimation. Land-use characteristics and transportation infrastructure data feature in other long gap methodologies. Additionally, smartphone sensors like accelerometers and gyroscopes can be used to identify movement, travel activity and mode of transportation, and may be of use in both contextualizing and handling missing data.

A clear path emerges for future research. Researchers need a comprehensive plan for addressing short and long gaps in the most contextually appropriate manner. This paper proposes that linear interpolation is an acceptable method for short gaps, and defines short gaps as less than 10 min in length for the study of human mobility. Existing methods for addressing long gaps universally incorporate larger data and varied information sources. For researchers, however, the decision on which methods are appropriate for their particular situation remains opaque. The integration of multiple approaches using all available resources is a direct if not uncomplicated next step.

As GPS technology improves, researchers can expect some challenges of location data to fall away. As the number of positioning satellites increases, and technology for triangulation improves, the accuracy of individual locations will certainly improve to create trajectories with less noise. New generations of satellite systems and receiving chips purport sensitivities that can travel through walls, reducing some line-of-sight based causes of missing data. But many causes of missing data are likely to remain for years. Battery capacity has improved year-on-year, but battery life has not as the demand for what a mobile phone must do has grown concurrently. Android versions are becoming more rather than less likely to kill an app, and iOS devices remain similarly opaque on when they are allowed to perform operations in the background. While undoubtedly a positive trend for users, a growing focus on user privacy may indirectly impact researchers as users are not aware of the switch to opt-in versus opt-out location options on their device.

Missing data is unlikely to be a solved problem for researchers in the near future and robust methods of addressing the missing data are integral to unlocking the potential of this new technology.