Distributions of Human Exposure to Ozone During Commuting Hours in Connecticut using the Cellular Device Network

Epidemiologic studies have established associations between various air pollutants and adverse health outcomes for adults and children. Due to the high cost of monitoring air pollutant concentrations for subjects enrolled in a study, statisticians predict exposure concentrations from spatial models that are developed using concentrations monitored at a few sites. In the absence of detailed information on when and where subjects move during the study window, researchers typically assume that the subjects spend their entire day at home, school, or work. This assumption can potentially lead to large exposure misclassification. In this study, we aim to determine the distribution of the exposure misclassification for an air pollutant (ozone) when individual mobility is taken into account, in contrast to assuming that subjects are static. To achieve this goal, we use cell-phone mobility data on 388,972 AT&T users in the state of Connecticut during July 2016, in conjunction with an ozone pollution model, and compare individual ozone exposure under static versus mobile scenarios. Our results show that exposure models that do not take mobility into account often provide poor estimates for individuals commuting into and out of urban areas: the difference in average 8-hour maximum exposure between the two estimates can be as large as 80 parts per billion (ppb). However, for most of the population, the difference in exposure assignment between the two models is small, thereby validating many current epidemiologic studies focusing on exposure to ozone.


Background and Motivation
Epidemiologic studies have established associations between various air pollutants and a number of adverse health outcomes for adults and children. Air pollutants, such as ground-level ozone (O3), particulate matter, carbon monoxide, lead, sulfur dioxide, and nitrogen dioxide, have been shown to worsen health outcomes such as heart rate variability, cardiopulmonary mortality, acute myocardial infarction, low birth weight, development and exacerbation of asthma, reduced lung function, and acute respiratory symptoms [Pope III et al., 2002, 2004; Peel et al., 2005; Zanobetti and Schwartz, 2005; Chen et al., 2006; Brauer et al., 2007; Delamater et al., 2012; Pedersen et al., 2013; Sacks et al., 2014].
These associations between exposure to air pollutants and adverse health outcomes have been established using various epidemiologic study designs. Some studies have been conducted on an ecological scale, where the unit of analysis is at an aggregated level (such as counties or census tracts). In these studies, an aggregate measure of a health outcome (such as cause-specific mortality rate over time) is correlated with an aggregate measure of pollutant exposure over the study area [Peel et al., 2005; Chen et al., 2006; Delamater et al., 2012; Sacks et al., 2014]. While these studies are useful in determining associations at the population level, they are subject to the ecological fallacy, so that conducting inference at the individual level is problematic. Other studies have used case-control or cohort designs (prospective and retrospective) and conducted the analysis at the individual scale [Gent et al., 2003; Brauer et al., 2007]. In such studies, health data are available on individuals (often in great detail) and are then correlated with data on individual-level pollutant exposure. However, assigning pollutant concentration exposure to individuals is challenging. Giving each study participant an air pollution monitor to carry all day for the duration of the study is extremely expensive and impractical. Therefore, epidemiologic studies analyzing the effects of air pollutants on adverse health outcomes at the individual level typically estimate pollutant concentrations at subject residences or work places using approaches of varying sophistication [Pope III et al., 2002; Gent et al., 2003; Jerrett et al., 2005; Brauer et al., 2007; Pedersen et al., 2013].
A common approach to estimating pollutant concentration at a location of interest (e.g., subject residence) uses observed or monitored air pollutant concentration data at limited sites and time durations to predict concentrations at unsampled sites and time durations. This prediction may be achieved using very simple approaches, such as assigning the same concentration as the closest monitored site or an average from a few closest sites [Rage et al., 2009], or estimating via techniques such as inverse distance weighting [Neidell, 2004]. More sophisticated approaches include land use regression models [Turner et al., 2016] and spatial/temporal interpolation techniques using geostatistical models of varying complexity, e.g., universal kriging [Jerrett et al., 2005; Rage et al., 2009].
Another popular approach for assigning pollutant concentrations to subject residences uses the output of a deterministic computer model that simulates either the underlying pollutant chemistry (e.g., the Community Multiscale Air Quality or CMAQ model [Byun and Schere, 2006]) or the dispersion process of air pollutants from their source(s) (e.g., CALINE4 [Benson, 1992]). Since these simulations are computationally expensive, such models provide predictions on a grid where each grid cell covers a large spatial area (e.g., CMAQ provides predictions on grid cells of size 12 km x 12 km). Epidemiologic studies utilizing these models determine the cell within which a subject residence or work place is located, and assign the corresponding pollutant exposure to the subject. Some recent studies have combined these two approaches to develop data fusion models. In these models, observed concentrations at limited sites and outputs from deterministic computer models are jointly used to provide predictions at unsampled locations at a fine spatial and temporal resolution [Sacks et al., 2014; Turner et al., 2016].
Regardless of the sophistication of the method used to assign pollutant exposure to subjects using their residential or work location, a fundamental challenge remains unaddressed: subjects do not spend all of their time at home or at work. Humans move around during the day, and the patterns of movement likely depend on a number of factors such as time of day, day of week, season, and employment status. By not taking mobility into account, current exposure models may suffer from significant bias in their estimates.
The goal of this study is to determine the distribution of exposure bias to an air pollutant (O3) when mobility is not taken into account, and to identify groups of individuals for which this bias is large. To achieve this goal, we require information on patterns of human mobility. Mobility is well captured by cell phone data; however, most available cell-phone based mobility studies either require users to install an "app" to capture their location [Bayir et al., 2009] or rely on other, more opportunistic approaches [Calabrese et al., 2011]. These methods do not sample sufficiently at the population level and are likely not robust enough to generalize. The novel cell-phone telemetry data used in this study consist of all AT&T cell devices that are turned on. Very few studies that quantify population-level human mobility at the census-tract level use cell data at the infrastructure level [Becker et al., 2013] and, to our knowledge, this is the first study that integrates pollution information with human mobility for an accurate exposure study at this scale. Our approach uses cell-tower level data to determine patterns of human mobility. We assign O3 concentration exposure to AT&T cell devices, which are used as a proxy for individuals, in the US state of Connecticut (CT) during a week in summer 2016. Based on individuals' mobility behavior and a home-tower location, we are able to assess the exposure bias.
Knowledge of the distribution of O3 exposure bias will be extremely beneficial for environmental epidemiologists studying the effects of O3 exposure on adverse health outcomes. Minimal exposure bias due to a static pollutant concentration assignment would validate the results of past studies, while a significant exposure bias would provide evidence that human mobility needs to be explicitly accounted for in future epidemiologic studies. This paper proceeds as follows. In section 2, we provide details of the cell-phone telemetry data. In section 3, we provide details on the data and model used to provide predictions of O3 concentrations at the cell tower sites. In section 4, we provide details on the distribution of O3 exposure misclassification, with concluding remarks in section 5.

A cellular network is a network capable of routing data traffic between cell devices or to internet-connected devices through a base station. U.S. cell phone networks facilitate the communication of hundreds of millions of devices, transferring vast amounts of data on any given day. This study is restricted to the state of Connecticut. Figure 1 shows the locations of the 9,514 AT&T cell towers in the state, along with the locations of the primary and secondary roads in the state. The data-traffic capacity of an individual tower is limited; to compensate for spatial locations with higher capacity demands, more towers may be erected in those areas. As a result, the spatial distribution of the towers matches the population distribution. This is seen in Figure 1, with most people residing in the southwest corner, the shoreline, and in metropolitan areas including Hartford, Danbury, Middletown, and Waterbury.

Figure 2 shows the distribution of the distance from each tower to its closest neighboring tower. The distribution is roughly exponential. The lower quartile, median, and upper quartile are 2.7, 68.2, and 306 meters, respectively.
Many tower locations are within meters of other towers because multiple towers are sometimes built at a single site. This is especially true in highly populated areas, like southwestern Connecticut, where there are more constraints on where a tower can be placed and where more towers are needed to compensate for higher traffic demands. Eastern Connecticut is generally less populated than the central and southwestern portions of the state, and a tower in this region may be several kilometers away from the next closest one.

Hand-off Trajectories in the Cell-Tower Network
A single cellular device connects to the network through a tower. Devices generally connect to the closest tower; however, this can vary due to geographic features, which can occlude communication, and the amount of traffic being routed through the towers. As a phone moves through the network, it is "handed off" between towers. A hand-off is characterized by a unique (de-identified) device ID, a date and time, and the location of the tower to which the device was handed. Data on these hand-offs may be analyzed to evaluate traffic load, tower placement, and connection integrity. Sequences of hand-offs in time for a single device, which we will refer to as trajectories, can be used to locate the devices, and the humans that carry them, to a resolution on the order of tens to thousands of meters, depending on the local cell-tower density.
This study considers devices with at least one hand-off, where all hand-offs for the device during the period of July 18-24, 2016 were within the state of Connecticut. A total of 45,919,777 hand-offs were recorded from 388,972 unique devices. A histogram of the number of hand-offs per device is shown in Figure 3. A total of 24,871 devices had a single hand-off. Among devices with more than one hand-off (log count greater than zero), the mode of the histogram is centered around 5 on the log scale, which implies that most devices had approximately 150 hand-offs during the period considered.

Mobility Diameter
Hand-offs provide user location to within several tens to hundreds of meters. However, the sum of the distances between consecutive towers in a trajectory is not always a good estimate of the total distance traveled by a device. As mentioned earlier, the geographic coverage area of individual towers varies with a number of factors, including the topographical characteristics of a tower's immediate area. In addition, when a device is between two towers it may "bounce" between the two, even when the device is stationary.
To quantify the total mobility of a device on a given day, we define the mobility diameter as the maximum pairwise distance among all towers a device checks into on that day. The distribution of the mobility diameter (on the natural log scale, with distance in meters) for all devices in the study is shown in Figure 4. The diameter appears to drop off exponentially on the original scale but, on the log scale, three distinct clusters appear. The first corresponds to devices with only one hand-off and is centered at zero. These likely correspond to devices that are not used often or are used by individuals at a stationary location. The second is the smallest cluster and is centered around 2.5 (12 meters). These are likely cell devices that are used in homes or workplaces within a defined area, or may correspond to people with devices staying at home. The third likely corresponds to most individuals carrying cell phones, with diameters ranging from 6 (403 meters) to 12 (approximately 163 kilometers).
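The mobility diameter defined above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes tower locations are given as (latitude, longitude) pairs in degrees and uses the haversine great-circle distance on a spherical Earth; the function names and the example coordinates are hypothetical.

```python
import math
from itertools import combinations

EARTH_RADIUS_M = 6_371_000.0  # assumption: spherical Earth with mean radius


def haversine_m(a, b):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def mobility_diameter_m(towers):
    """Maximum pairwise distance among the distinct towers visited in one day."""
    distinct = set(towers)
    if len(distinct) < 2:
        return 0.0  # a single tower (or none) gives a diameter of zero
    return max(haversine_m(a, b) for a, b in combinations(distinct, 2))


# Hypothetical trajectory: approximate city-center coordinates, not real towers.
hartford = (41.7658, -72.6734)
new_haven = (41.3083, -72.9279)
diameter = mobility_diameter_m([hartford, new_haven, hartford])
print(f"diameter: {diameter:.0f} m, log diameter: {math.log(diameter):.2f}")
```

Because the diameter depends only on the set of towers visited, a stationary device "bouncing" between two adjacent towers yields a small diameter even though its cumulative tower-to-tower distance may be large.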

Description of Pollution Model
In order to determine hourly O 3 concentration exposure of users, we fitted a stationary Gaussian spatio-temporal model [Cressie and Wikle, 2011] to observed hourly O 3 concentrations from July 6 to August 5, 2016 for the state of Connecticut, and used it to predict O 3 concentrations at the 9,514 AT&T cell tower sites during the week of July 18-24, 2016.

Data
We obtained data on observed hourly O3 concentrations (in parts per billion, ppb) from the US Environmental Protection Agency (EPA) Air Quality System (AQS) Data Mart [US Environmental Protection Agency, 2017] from July 6 to August 5, 2016 for the state of Connecticut. Figure 1 shows the locations of the 12 O3 monitoring sites in CT; we randomly selected 10 sites for model fitting and two sites for model validation. At each site, O3 concentration data were available for 744 time points. The mean hourly O3 concentration over the week across all 10 sites was 35.9 ppb with a standard deviation of 19.4 ppb.
In addition to data on the observed concentrations of O3, we collected data on traffic and meteorological factors that could help explain the spatio-temporal variation observed in O3 concentrations. Hourly temperature (in degrees Celsius) and wind speed (in meters per second) data were obtained from the National Oceanic and Atmospheric Administration (Quality Controlled Local Climatological Data) [US Department of Commerce, National Oceanic and Atmospheric Administration, National Environmental Satellite, Data, and Information Service, National Climatic Data Center, 2017]. Data were available from 11 weather stations in Connecticut (see Figure 1). For each O3 monitoring site, we assigned hourly temperature and wind speed based on the measurements recorded at the closest weather station. For hours with missing data, the value from the most recent hour with available data was used. Using data on the primary and secondary road network in CT, available from the US Census Bureau [US Census Bureau, 2017], we calculated the minimum distance to primary and to secondary roads (in meters). These variables served as proxies for vehicular traffic density in the neighborhood of the O3 monitoring sites.

Model
As described by Cressie and Wikle [2011], a hierarchical spatio-temporal Gaussian Process (GP) model can be specified in three stages: the observed data stage, the true underlying process stage, and the parameter stage. We can specify distributions for the data, process, and parameters for each stage. The R package spTimer by Bakar et al. [2015] can be used to efficiently fit GP models. Following the formulation of Bakar et al. [2015], let $Z(s_i, t)$ denote the observed O3 concentration at location $s_i$, $i = 1, \dots, 10$, and time point $t$, $t = 1, \dots, 744$, and let $Y(s_i, t)$ denote the true underlying O3 concentration at location $s_i$ at time $t$. Let $Z_t = (Z(s_1, t), \dots, Z(s_{10}, t))'$ and $Y_t = (Y(s_1, t), \dots, Y(s_{10}, t))'$ be the vectors of observed and true underlying O3 concentrations, respectively, at time $t$. We define the nugget effect, or pure error term, as $\epsilon_t = (\epsilon(s_1, t), \dots, \epsilon(s_{10}, t))'$, assumed to be independent (across space and time) and Normally distributed $N(0, \sigma_\epsilon^2 I_{10})$, where $\sigma_\epsilon^2$ is the unknown pure error variance and $I_{10}$ is the 10 x 10 identity matrix. Similarly, we denote the spatio-temporal random effects as $\eta_t = (\eta(s_1, t), \dots, \eta(s_{10}, t))'$, assumed to be Normally distributed $N(0, \sigma_\eta^2 S_\eta)$ and independent in time (and independent of $\epsilon_t$), where $\sigma_\eta^2$ is the site-invariant spatial variance (also called the sill) and $S_\eta$ is the spatial correlation matrix. We assume that the spatial correlation can be modeled by the exponential function, so that the covariance between two locations $s_i$ and $s_j$ is a function of the Euclidean distance $d_{ij}$ between the sites, i.e., $\mathrm{Cov}(\eta(s_i, t), \eta(s_j, t)) = \sigma_\eta^2 \exp(-\phi d_{ij})$. Further, let $X_t$ be a 10 x 5 matrix of covariates (including a column of 1s for the intercept) and let $\beta = (\beta_0, \dots, \beta_4)'$ denote the 5 x 1 vector of unknown regression coefficients.
We can then specify the hierarchical GP model by:

$$Z_t = Y_t + \epsilon_t, \qquad Y_t = X_t \beta + \eta_t.$$

Details on fitting the model and obtaining parameter estimates using the package spTimer, including the full conditional distributions of the parameters, are given in Bakar et al. [2015]. To fit the simplest possible model that provides relatively accurate predictions, we used the default recommendations from the package spTimer for the initial values and the values of the hyper-parameters for the prior distributions. Specifically, we assigned flat Normal priors centered at 0 with large variances ($10^{10}$) for the regression coefficients, and flat Inverse-Gamma priors for the variance components $\sigma_\epsilon^2$ and $\sigma_\eta^2$. The value of the spatial decay parameter ($\phi$) in the spatial covariance matrix was fixed at $3/d_{\max}$, where $d_{\max}$ is the maximum distance between the ozone monitor sites. Various alternative fixed values were tested, as well as estimating $\phi$ using a Uniform prior distribution. However, the predictive performance was best for the model that used the default fixed value of $\phi$ (0.0186). While the model is specified in the high-level language R, the package spTimer performs calculations in the lower-level language C for much faster computation.

As per default, the Markov chain Monte Carlo (MCMC) algorithm was run for 4,000 further iterations after discarding the first 1,000 as burn-in. MCMC diagnostics were performed using the package coda [Plummer et al., 2006]; all MCMC chains had converged during the burn-in, and auto-correlation plots displayed independence between iterations. The residual plot also did not show any departures from normality.

Estimated parameters from the model are given in Table 1. As expected, higher temperature is associated with increased O3 concentrations, while increased wind speed is associated with lower O3 concentrations. Minimum distance to primary roads and to secondary roads both have positive slopes. All coefficients are statistically significant, since none of the 95% credible intervals contain 0.
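To make the role of the fixed decay parameter concrete, the following sketch builds the exponential covariance matrix $\sigma_\eta^2 \exp(-\phi d_{ij})$ with $\phi = 3/d_{\max}$. It is only an illustration under stated assumptions: the distance matrix and the variance value are hypothetical, the function names are ours, and this is not the spTimer implementation. With this choice of $\phi$, the correlation between the two most distant sites is $\exp(-3)$, roughly 0.05.

```python
import math


def default_phi(dist):
    """Decay parameter fixed at 3 / d_max, with d_max the largest inter-site distance."""
    d_max = max(max(row) for row in dist)
    return 3.0 / d_max


def exponential_covariance(dist, sigma2_eta, phi):
    """Covariance matrix with entries sigma2_eta * exp(-phi * d_ij)."""
    n = len(dist)
    return [[sigma2_eta * math.exp(-phi * dist[i][j]) for j in range(n)]
            for i in range(n)]


# Hypothetical symmetric distance matrix for three sites (units assumed to be km).
dist = [
    [0.0, 50.0, 161.3],
    [50.0, 0.0, 120.0],
    [161.3, 120.0, 0.0],
]
phi = default_phi(dist)
cov = exponential_covariance(dist, sigma2_eta=2.0, phi=phi)
print(f"phi = {phi:.4f}")
print(f"correlation at d_max = {cov[0][2] / cov[0][0]:.4f}")
```

The reported value $\phi = 0.0186$ corresponds to $d_{\max} = 3/0.0186 \approx 161$ in whatever distance units were supplied to the model, which is why a maximum distance of 161.3 is used in this toy example.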

Prediction
Once the GP model has been fitted and posterior distributions for the unknown model parameters have been obtained, spatial prediction at a new location $s_0$ (and temporal prediction at a future time point $t'$) can be obtained using the posterior predictive distribution for $Z(s_0, t')$. The function predict in the package spTimer provides spatio-temporal predictions (see Bakar et al. [2015] for technical details). We predicted hourly O3 concentrations for the 31-day period from July 6 to August 5, 2016 at the two validation EPA O3 monitoring sites, as well as at the 9,514 AT&T cell tower sites. Figure 5 shows a time-series comparison of the observed and predicted hourly O3 concentrations at the two validation sites. At both sites, the predicted concentrations are very similar to the observed concentrations. Table 2 provides validation metrics for the prediction at these sites. The small value for the relative bias suggests a fairly good model fit.

Figure 6 shows a prediction map for hourly O3 concentrations across CT at midnight, 6 am, noon, and 6 pm on seven consecutive days from July 18-24, 2016. It shows that the distribution of O3 concentration is somewhat homogeneous across space, but changes considerably throughout the day. There also appears to be a concentration gradient in the southwest to northeast direction, particularly during the day-time hours, with higher concentrations in the southwestern part of the state. The map also identifies some hot-spots of higher concentrations, mostly around the urban centers of Hartford, New Haven, and Danbury.

Exposure Distributions
As described in section 2, a device hand-off is characterized by a device ID, a date and time, and the location of the tower to which the device was handed. From sequences of hand-offs, sorted by date and time, the duration over which the device was checked into each tower was calculated. In addition, a new feature was calculated per device over the study duration: the tower that a device was checked into for the longest duration between the hours of 8:00 PM and 6:30 AM. We note that this new variable is correlated with the "home" location for working individuals not participating in shift work. However, the two are distinct in that, due to de-identification, we do not know the residence or occupational characteristics of the devices' owners.
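The derivation of dwell durations and the night-time "home" tower described above might look like the following sketch. It is a hedged illustration rather than the authors' pipeline: it assumes each hand-off is a (timestamp, tower ID) pair, takes the dwell time at a tower to be the gap until the next hand-off, and ignores the final hand-off because its dwell time is unknown. All names and the sample trajectory are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, time, timedelta

NIGHT_START = time(20, 0)  # 8:00 PM
NIGHT_END = time(6, 30)    # 6:30 AM on the following day


def night_overlap_seconds(start, end):
    """Seconds of the interval [start, end) that fall between 8:00 PM and 6:30 AM."""
    total = 0.0
    day = start.date() - timedelta(days=1)
    # Walk over every night window that could intersect the interval.
    while datetime.combine(day, NIGHT_START) < end:
        w_start = datetime.combine(day, NIGHT_START)
        w_end = datetime.combine(day + timedelta(days=1), NIGHT_END)
        lo, hi = max(start, w_start), min(end, w_end)
        if hi > lo:
            total += (hi - lo).total_seconds()
        day += timedelta(days=1)
    return total


def home_tower(handoffs):
    """Tower with the longest total night-time dwell, or None if there is none."""
    ordered = sorted(handoffs)
    night = defaultdict(float)
    # Dwell at a tower lasts from its hand-off until the next hand-off.
    for (t0, tower), (t1, _) in zip(ordered, ordered[1:]):
        night[tower] += night_overlap_seconds(t0, t1)
    if not night or max(night.values()) == 0:
        return None
    return max(night, key=night.get)


# Hypothetical one-device trajectory with tower IDs "A" and "B".
handoffs = [
    (datetime(2016, 7, 18, 7, 0), "A"),
    (datetime(2016, 7, 18, 8, 0), "B"),
    (datetime(2016, 7, 18, 18, 0), "A"),
    (datetime(2016, 7, 19, 7, 30), "B"),
]
print(home_tower(handoffs))
```

In this toy trajectory the device spends the working day at tower "B" and the night at tower "A", so "A" is returned as the night-time location.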
For each tower location, the hourly O3 concentration was estimated as described in section 3. The pollution estimates can be joined to the mobility data on date, time, and cell-tower location. The individual-level exposure estimate is then calculated as the pollutant concentration at each location multiplied by the duration spent at that location.
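The join and duration weighting just described can be illustrated with a small sketch. The assumptions here are ours: dwell records are taken to be already split at hour boundaries, the predicted concentrations are held in a dictionary keyed by (hour, tower ID), and the concentration-times-duration products are divided by the total observed duration in each hour to give an average in ppb. The names and numbers are hypothetical.

```python
from collections import defaultdict


def hourly_exposure(segments, o3):
    """Duration-weighted mean O3 concentration (ppb) per hour for one device.

    segments: iterable of (hour, tower_id, minutes) dwell records.
    o3: dict mapping (hour, tower_id) to predicted concentration in ppb.
    """
    weighted = defaultdict(float)
    minutes = defaultdict(float)
    for hour, tower, mins in segments:
        # The lookup below plays the role of the join on time and tower.
        weighted[hour] += o3[(hour, tower)] * mins
        minutes[hour] += mins
    return {h: weighted[h] / minutes[h] for h in weighted if minutes[h] > 0}


# Hypothetical data: 15 minutes at tower "A" and 45 at "B" during hour 8.
segments = [(8, "A", 15), (8, "B", 45), (9, "B", 60)]
o3 = {(8, "A"): 40.0, (8, "B"): 60.0, (9, "B"): 62.0}
print(hourly_exposure(segments, o3))
```

Under the static scenario the same function can be reused by replacing every tower ID in the segments with the device's "home" tower.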
As per the standards set by the EPA, we calculated the average 8-hour maximum exposure for each individual for each of the seven days by calculating the hourly average ozone concentration for each 8-hour window during the day, and selecting the maximum of these hourly averages for each day.
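A minimal sketch of the 8-hour maximum calculation follows. It assumes a list of 24 hourly exposure values for one individual and one day, and considers only the 8-hour windows that lie entirely within that day, as the description above suggests; regulatory implementations may treat windows that cross midnight differently. The function name and sample values are hypothetical.

```python
def max_8h_average(hourly, window=8):
    """Maximum over all moving averages of `window` consecutive hourly values."""
    if len(hourly) < window:
        raise ValueError("need at least one full window of hourly values")
    running = sum(hourly[:window])
    best = running
    # Slide the window one hour at a time, updating the running sum.
    for i in range(window, len(hourly)):
        running += hourly[i] - hourly[i - window]
        best = max(best, running)
    return best / window


# Hypothetical day: low overnight, an 8-hour afternoon plateau, lower evening.
day = [20] * 10 + [60] * 8 + [30] * 6
print(max_8h_average(day))
```

For a 24-hour day there are 17 such windows (starting at hours 0 through 16), and the statistic is the largest of their means.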

Distribution of the Difference in the Average 8-Hour Max Exposure
To assess the distribution of O3 exposure misclassification when mobility is not taken into account, we calculated the average 8-hour max exposure for each device per day under two scenarios: (a) using their trajectories to determine their location as they moved throughout the day (which we will refer to as the dynamic scenario); and (b) assuming that the individuals spent the entire day at their "home" location (which we will refer to as the static scenario), similar to what is typically done in epidemiologic studies (see Figure S.1 for a violin plot of these distributions). The exposure misclassification (or bias) for each device per day was then calculated as the difference between the dynamic and static average 8-hour max exposure assignment for each day. The result is shown in Figure 7. (A similar plot showing the distribution of the average hourly difference between the dynamic and static scenarios based on a 24-hour cumulative exposure, instead of the average 8-hour max exposure, is given in Figure S.2, while Figure S.3 shows the distribution of hourly exposure assignment for the dynamic and static scenarios.) All of the estimated differences were within 80 ppb per hour. Since the means and medians are close to zero, this result validates many of the current studies of ozone exposure for a large cross-section of the population, i.e., the exposure misclassification due to not taking mobility into account is not too large. However, we observe that the distribution of the differences has heavy tails, and that the upper tails are longer than the lower tails. This suggests that the static scenario underestimates the true concentration more frequently than it overestimates it. Additionally, it should be noted that differences are correlated for an individual device. That is, devices with large differences between the models at a given hour tend to have large differences at other hours.
This implies that there are distinct subpopulations for which static scenarios are biased. This is further explored in subsequent sections.

Figure 8 shows the distribution of average hourly ozone exposure for individuals for each hour of the day for weekdays and weekends, taking mobility into account. Due to the size of the data, the plots are based on a random sample of 10,000 devices. The plots show that the minimum O3 exposure is higher on the weekend (for every hour) as compared to the weekday, while the maximum exposure is higher on the weekdays. They also show that the range of average hourly O3 exposure is greater on weekdays as compared to weekends. It should be noted that these results are likely driven primarily by the greater variation in O3 concentrations on weekdays compared to the weekend (as seen in Figure 6), as opposed to device mobility. The plots also reveal a generally unimodal exposure distribution during the late night and early morning hours, but a bimodal exposure distribution during the afternoon and evening hours. This again probably reflects the smaller spatial variation in O3 concentrations during the late night and early morning hours, and the greater spatial variation during the afternoon and evening hours.

Locations of Poorly-Predicted Exposure when Mobility is not Taken into Account
While the bias from not taking mobility into account is close to zero for most individuals, for some it is up to 80 ppb/hour. Even if the effect of this bias on any single day is small, the cumulative exposure difference may result in health outcomes very different from those predicted by models that do not take mobility into account. In particular, individuals with an exposure misclassification of around 80 ppb/hour likely experience this misclassification for 8-10 hours per day on multiple days, resulting in a vastly different cumulative exposure.
To understand which classes of individuals are under-served by existing models, we extracted the device trajectories for which the difference between exposure assessed using the dynamic versus static scenarios was in the highest or lowest 1%, corresponding to individuals with higher and lower exposure relative to the static scenario, and created contour plots showing their "home" (night-time) locations.
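The selection of the two extreme groups can be sketched as follows. This is an illustrative assumption about the mechanics rather than the authors' code: each device is summarized by a single dynamic-minus-static difference, devices are ranked by that difference, and the lowest and highest 1% are returned. The function name and the synthetic differences are hypothetical.

```python
def tail_devices(diff_by_device, fraction=0.01):
    """Return (lowest, highest) device IDs by dynamic-minus-static difference."""
    ranked = sorted(diff_by_device, key=diff_by_device.get)
    k = max(1, int(len(ranked) * fraction))  # at least one device per tail
    return ranked[:k], ranked[-k:]


# Synthetic example: 200 devices with differences from -100 to 99 ppb.
diffs = {f"d{i}": i - 100 for i in range(200)}
lowest, highest = tail_devices(diffs)
print(lowest, highest)
```

The "home" towers of the devices in each returned group would then be the inputs to the contour plots of night-time locations.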
The "home" locations of individuals receiving more exposure when mobility is taken into account are shown in Figure 9a. The difference in exposure assessed based on the dynamic and static scenarios is generally due to individuals' mobility during day-time commuting hours. O3 predictions at different times of the day given in Figure 6 show that during the afternoon hours, O3 concentrations are generally higher in the southwestern part of Connecticut (likely due to higher concentrations in the New York City area), and decrease in a northeasterly direction. Comparing Figure 9 to the O3 predictions shown in Figure 6, we observe that many of the individuals identified as receiving a considerably larger exposure when taking mobility into account are likely suburban residents commuting into the associated urban areas (in a western or southern direction) where ozone concentrations are higher during the day-time hours. In particular, the cluster of individuals residing north and northeast of Hartford likely commute to Hartford during the day, while the cluster identified around Waterbury are likely residents that commute to Danbury during the day.

The "home" locations of individuals receiving less exposure when mobility is taken into account are shown in Figure 9b. There are three distinct clusters: one southeast of Hartford, one north of New Haven, and one north of Bridgeport. Many of these individuals are likely suburban residents commuting into the associated urban areas (in an eastern or northern direction) where ozone concentrations are lower during the day-time hours. In particular, the clusters southeast of Hartford and north of New Haven likely represent individuals that commute to Hartford and Middletown during the day, while the cluster north of Bridgeport perhaps represents individuals commuting northeast to New Haven.
These results suggest that the direction of mobility during day-time hours is an important factor determining the adequacy of the static scenario in accurately assessing individual exposure to O 3 concentration. Movement along the pollutant concentration gradient naturally results in a higher difference between actual exposure and the exposure modeled assuming static behavior.

Conclusions
This paper integrates mobility data from cell-tower hand-offs with a pollution model to accurately and precisely estimate ozone exposure during the period of July 18-24, 2016. These estimates were compared to those of a model where mobility data are not available, which is the setting commonly seen in the literature. We show that the bias introduced by not taking mobility into account is minimal for the majority of individuals in the state of Connecticut, thereby validating many existing epidemiological studies examining the health impacts of exposure to O3 on various outcomes. However, we also show that existing models do a poor job of estimating the exposure of individuals who routinely commute into and out of urban areas, particularly those whose day-time movements follow the pollution concentration gradient, and, in these cases, mobility should be taken into account.
A key strength of our analysis lies in the use of mobility data on the individual level for a very large portion of the state's population, which allows us to accurately capture real mobility patterns. However, there are a few limitations to this analysis as well.
The O 3 exposure model was developed using ten O 3 monitoring sites for the entire state of Connecticut. Given the spatial distribution of these sites, it is difficult to capture the small scale spatial variability in O 3 concentrations. However, this may not be too big of a concern in this particular analysis since O 3 concentrations are known to be fairly homogeneous over short distances.
Data on mobility in this study were captured using hand-offs at cell towers, and not using actual GPS coordinates. Therefore, the mobility trajectories provide an estimate of where cellular devices are at any given point, and not their exact location. However, most cell towers are within 1000 m of each other, and therefore, cell device locations are generally within a 500 m buffer. This approach has the added advantage of protecting individuals' privacy in that their exact location is never known.
Due to practical reasons, we restricted our analysis to only AT&T users who had at least one tower check-in during the one week window of analysis and stayed within the boundaries of CT during that week. This potentially has two issues related to generalizability of our results. First, AT&T users might not represent the general US population. We are not too concerned about this issue since given the competitive telecommunications market, there does not appear to be any strong evidence suggesting that AT&T attracts a particular niche of customers. Additionally, an internal AT&T study found that their subscribers were not different from the general US population on a number of sociodemographic characteristics. Second, individuals that spend the entire week within the state of CT do not include long distance commuters. Long-distance commuters would be expected to have a much larger difference in exposure assignment between the dynamic and static scenarios, which would potentially lead to longer tails in the exposure difference distribution.
To our knowledge, this is the first study that compares exposure to a pollutant concentration based on the assumption of static behavior versus dynamic mobility. While the results of this study strengthen our confidence in the findings of epidemiologic studies looking at the adverse impact of O3 concentration on various health outcomes, our analysis can be extended in two directions: looking at other states, or even the entire US, instead of just CT, and over a longer time duration; and analyzing other air pollutants, such as nitrogen oxides or small particulate matter, which are known to disperse quickly over short distances [Baldwin et al., 2015]. For such pollutants, we expect there to be a significant difference in exposure assignment between the dynamic and static scenarios. However, modeling concentrations for pollutants that vary rapidly over short distances is challenging, as it requires a rather dense network of pollutant monitors, which is generally not available. While data on human mobility are available to us in near real-time, a bottleneck arises in accurate air pollutant modeling at fine spatial and temporal scales. Additionally, extending the analysis to other states or the entire US over a longer time duration presents the challenge of dealing with vast quantities of data for both the air pollution modeling and the human mobility components. However, with rapid advancements in methods for the analysis of Big Data, this may not be a big challenge in the future.

Supplementary Material
The Average 8-Hour Max Exposure Density
Figure S.1 shows the distribution of the daily average 8-hour max O3 exposure for individuals for the week of July 18-24, 2016 for both the dynamic and static scenarios. (The distribution of the difference between these two models is given in Figure 7.) The distributions for each day generally look similar; however, the static scenario appears to have heavier lower tails as compared to the dynamic scenario, particularly on the weekdays.

The Hourly Exposure Difference Between Dynamic and Static Scenarios
Figure S.2 shows histograms (by day) of the differences in the hourly average exposure to O3 assigned to individuals between the dynamic and static scenarios based on 24-hr cumulative exposure. The visualization shows that the difference for each day is centered around zero and most of the absolute differences are less than 5 ppb/hour, indicating that current models, which do not take mobility into account, are fairly accurate.

It is interesting to note that on weekdays, the lower tail remains close to 0 ppb as the day progresses, while on weekends, the lower tail quickly moves away from 0 ppb. The distribution for each hour is more spread out on weekdays as compared to weekends. It can also be seen that many of the individual hour exposures are bimodal, especially during the hours of 1 PM and 10 PM. Since the diffusion of ozone is relatively dispersed, and many of the mobility diameters are relatively small, individuals in higher exposure areas likely stay in those areas (and vice versa), which would result in a large amount of variation of cumulative exposure by day.

Figure S.4). The weekday distribution is more dispersed when compared to that of the weekend. This is likely a result of increased mobility due to commuting.