Measuring and analyzing spatial, multimodal biosensor data may make it possible to model how the built environment influences neurophysiological processes. This article presents the Multimer Data Collection and Analysis System (MDCAS), which records data from several kinds of commonly available, wearable sensors (wearables) including electroencephalogram (EEG), electrocardiogram (ECG), pedometer, accelerometer, and gyroscope modules. Data from these wearables is sent to a custom smartphone application, which also records surveys and associates these with global positioning system (GPS) readings. MDCAS then collects and analyzes data from its smartphone app. MDCAS aims to help space professionals like architects, workplace strategists, and urban planners make better design interventions. As a case study of the MDCAS, this article discusses the analysis results of biometric data (EEG and ECG, in addition to survey reports) collected from a 2017 study focused on pedestrians, cyclists, and drivers (N = 101) in New York City. Signal and spatial validation of the data indicated usability (that is, data that is not randomly distributed) for the biometric data types. Exploratory regressions of the biometric data (dependent variables) on exogenous data (predictors, including environmental and municipal data sets) revealed spatiotemporal relationships that warrant further investigation. Notable relationships include 1) EEG beta and gamma frequencies were more strongly predicted by street features like service capacity (e.g. delivery levels) and speed limit, while EEG delta and theta frequencies were more strongly predicted by amenities like cultural institutions and trees; 2) pedestrians and cyclists were more impacted by street features during weekdays; and 3) a non-oppositional relationship between EEG beta/gamma and delta/theta frequencies.
Many people are affected by the spaces they traverse: areas in which they live, work, shop, and move (Mouratidis 2018; Sarmiento et al. 2010; Renalds et al. 2010). Artificial intelligence (AI) based on human-oriented and biometric data collection may be used to quantitatively model how communities are affected by their spaces. Ever since the concept of AI was established in the 1950s (McCarthy et al. 1955; Turing 1950), the development of AI has progressed on two major fronts: one approach uses a statistical and information-theoretic foundation for machine learning (Bostrom 2014), and the second approach focuses on practical problem-solving and optimizations using methods such as genetic algorithms (Kurzweil 2012). Several recent AI projects have taken a human-centered approach grounded in empirical science, involving hypotheses as well as experimental confirmation; this includes projects involving intelligent agents, which are entities that can perceive their environment through sensors and act upon the environment through effectors (Russell and Norvig 2016). With this type of approach, biometric data may provide affordances to connect human experience with environmental and community well-being.
Few studies combine AI techniques and sensor-based biometric data to study the relationship between communities and their built environment. Jones (2012) uses neighborhood-level stressors, such as socioeconomic stability, to examine correlations with residents’ stress-related health outcomes. Jones’ 2012 study, which evaluated specific types of impact the environment has on people’s health, focused on chronic diseases such as mental illness, obesity and elevated blood pressure. A follow-up study (Jones and Pebley 2014) examined the notion of segregation through analysis of neighborhood activities. While these studies used AI techniques, such as machine learning and algorithmic processing, to develop models with significant statistical power, they were primarily based on existing surveys that lack temporal resolution. The work of determining whether neighborhood level stressors are connected to individual health outcomes could be expanded beyond the use of pre-existing survey data.
Other studies have also used biometrics such as portable EEG devices to measure how participants’ brains respond to their environment. Karandinou and Turner (2017) employed EEG devices to research the impact of physical environments on the human brain, and they concluded that beta brain wave frequencies increased when individuals encountered other people and/or were making decisions. While the Karandinou and Turner study included quantitative data with high temporal resolution, it did not include statistically significant quantitative analysis.
Aspinall et al. (2015) conducted an outdoor study with EEG devices and researched the effect of urban green spaces as a mood-enhancing environment. Similar to Karandinou and Turner’s 2017 study, the study by Aspinall et al. involved small participant sample sizes, EEG data collection during participants’ walks, and preliminary quantitative analysis. Other studies (Burton et al. 2011; Tilley et al. 2017; Cerin et al. 2017) in this field also lacked the large sample sizes, algorithmic processing, and application of machine learning techniques that could yield statistically significant results. In summary, few studies have employed the kind of spatial scale, sample sizes, temporal frequency, algorithmic processing, and machine learning that would be required to leverage AI for community well-being.
Terminology and Study Overview
This article presents the Multimer Data Collection and Analysis System (MDCAS), a series of digital tools to collect, process and analyze biometric data, particularly in public urban space. The MDCAS research team developed a smartphone application to facilitate the collection of EEG and ECG data and surveys submitted by study participants. MDCAS also includes a web platform for examining data analytical products, a web dashboard for viewing real-time results, and various internal tools and processes for analysis.
MDCAS aims to provide a data-driven understanding of how community members cognitively and physically experience their built environment. By measuring multimodal biosensor data, MDCAS uses large-scale, quantitative data collection and machine learning to demonstrate how built environments influence cognitive processes (Fig. 1). Using methods not previously deployed outside of a lab or clinical setting, MDCAS can collect large-scale biometric datasets and construct spatiotemporal models for use in machine learning applications.
MDCAS’s smartphone application can record data from several kinds of commonly available, commercial-grade wearable sensors, including EEG, ECG, pedometer, accelerometer, and gyroscope modules. The smartphone application combines surveys with geolocation data using GPS and beacons. MDCAS’s web dashboard displays collected data in real-time and provides an aggregated overview (Fig. 2). The aim of MDCAS is to iteratively prototype a replicable, scalable model of how the built environment and the movement of traffic influence the neurophysiological state of community members moving through their built environment. The MDCAS model is meant to supplement and support traditional tools, such as surveys and interviews (Jones 2012), for studying community well-being in the built environment. The algorithms developed as part of this prototype are meant to be replicated, scaled, tested, and refined in other communities. The MDCAS falls within the research area of AI interventions to improve community well-being (Musikanski et al. 2020).
The AI deployed as part of the MDCAS validation and modeling process falls into the category of machine learning, a subset of narrow AI as defined by Musikanski et al. (2020). In this definition, narrow AI, otherwise known as weak AI, refers to algorithms designed to solve a specific problem and to make decisions in a limited context. MDCAS includes three concepts laid out by Musikanski et al. (2020): that of big data AI, in that large volumes of cross-media data can be collected (Fig. 3); that of crowd-sourced AI, in that large communities can participate in the collection and annotation of the data; and that of human-machine hybrid augmented intelligence, in that wearable devices are used to collect and analyze data that can enhance built environments, sustainable landscapes, and daily lives.
Methodology for Modeling and Validation
MDCAS was employed for a study in which 101 pedestrians, cyclists, and drivers (the participants) recorded biosensor data and surveys over the course of twelve weeks, between 01 August 2017 and 31 October 2017. Biometric data were collected in Manhattan south of Central Park, New York City, USA (the study area), which is a highly gridded, high-traffic area (Fig. 4). As part of the study, EEG data were recorded by all participants as they moved through the study area (Fig. 5), and ECG heart rate data were recorded consistently by a small subset (N = 12).
During the study, the research team developed a basic system for data collection and analysis. Additional components of the MDCAS are detailed by Ducao et al. (2018), as is the specification of data variants, which is summarized in Fig. 3. For ease of use outside of the lab, EEG data was recorded by a single EEG electrode at position FP1 (prefrontal 1), which is closest to the medial prefrontal cortex, an area of the brain associated with higher-level functions like social cognition and action planning (Amodio and Frith 2006; Luu et al. 2000).
After biosensor, GPS, and survey data are collected by the smartphone application, these data are validated for use in spatial research. In this case, validation is defined as the process of verifying that the data is not randomly distributed across a specified frequency band or geographic area. For each collected data type, validation consists of two stages: signal validation and spatial validation. Median values were used when analyzing all biometric data, in order to counteract the effect of outliers, which are common in biometric data (Leys et al. 2013). If the results of the signal and spatial validation indicated a strong signal-to-noise result, the collected data were used as part of a spatial model that incorporates external data sets. Figure 6 shows the flow of data from the participant end (wearables and smartphone app) to the researchers’ and analysts’ spatial validation, neural validation, and spatial modeling.
During the signal validation process, signal processing and analysis tools were used to compare a continuous, periodic biosensor signal (dependent variable) to other relevant (explanatory) variables, including sentiment indicators (e.g. surveys) and post-processed signals. The post-processed signals included the EEG headset manufacturer’s proprietary composite algorithms for attention and meditation (NeuroSky 2015), as well as various ECG chest band manufacturers’ proprietary algorithms for heart rate, which are derived from commonly used heart rate variability algorithms (Berger et al. 1986). Comparing pre-processed signals with post-processed variables indicated an acceptable signal-to-noise threshold, so this study focused on the use of EEG signals to validate other biometric data. The raw EEG data was transformed into the band power frequencies listed in Table 1, which are commonly used in clinical practice (Tatum 2014). The step-by-step signal validation methodology consists of preparing data, determining signal thresholds, temporally subdividing data (epoching) as necessary, and creating temporal regressions or histograms. These steps were established by Ducao et al. (2018).
During the spatial validation stage, local and global spatial autocorrelation (Ord and Getis 1995) were used to test the spatial distribution of the collected data, which include biosensor signals and sentiment indicators. Local and global spatial autocorrelation measure how far the spatial distribution of a given dataset departs from randomness. The greater the likelihood that the dataset is not distributed randomly, the greater the likelihood that the dataset is useful for spatial modeling and regression, as discussed by Ord and Getis (1995). The spatial validation methodology involves subdividing the study area into manageable partitions, processing spatial weights matrices to normalize distribution for each data type, using the spatial weights calculation to process spatial autocorrelation and lag, processing Local Indicators of Spatial Association (LISA) diagrams (Anselin 1995), and evaluating the results. The spatial validation process is described in detail by Ducao et al. (2018).
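As a rough illustration of the global autocorrelation test described above, the sketch below computes Moran's I on a small grid with binary, laterally adjacent (rook) weights and estimates a p-value by random permutation. Moran's I is used here as a common stand-in, since the text does not name the specific global statistic that was computed. The function names and the toy grid values are hypothetical and are not part of MDCAS.

```python
import random


def rook_neighbors(n_rows, n_cols):
    """Map each grid cell (r, c) to its laterally adjacent cells."""
    nbrs = {}
    for r in range(n_rows):
        for c in range(n_cols):
            candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            nbrs[(r, c)] = [(i, j) for i, j in candidates
                            if 0 <= i < n_rows and 0 <= j < n_cols]
    return nbrs


def morans_i(values, nbrs):
    """Global Moran's I for {cell: value} using binary neighbor weights."""
    n = len(values)
    mean = sum(values.values()) / n
    dev = {cell: v - mean for cell, v in values.items()}
    s0 = sum(len(js) for js in nbrs.values())  # sum of all weights
    num = sum(dev[i] * dev[j] for i in nbrs for j in nbrs[i])
    den = sum(d * d for d in dev.values())
    return (n / s0) * (num / den)


def permutation_p_value(values, nbrs, n_perm=999, seed=0):
    """One-sided pseudo p-value: share of shuffles with I >= observed I."""
    rng = random.Random(seed)
    observed = morans_i(values, nbrs)
    cells = list(values)
    shuffled = list(values.values())
    at_least = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if morans_i(dict(zip(cells, shuffled)), nbrs) >= observed:
            at_least += 1
    return observed, (at_least + 1) / (n_perm + 1)


# Toy 4 x 4 grid (hypothetical values): high on the left, low on the right.
neighbors = rook_neighbors(4, 4)
clustered = {(r, c): (10.0 if c < 2 else 1.0)
             for r in range(4) for c in range(4)}
i_obs, p_val = permutation_p_value(clustered, neighbors)
print(f"Moran's I = {i_obs:.3f}, pseudo p-value = {p_val:.3f}")
```

A clearly positive I with a small pseudo p-value suggests clustering rather than a random spatial arrangement, which is the property the validation stage looks for. A production analysis would more likely rely on a dedicated spatial statistics library than on this hand-rolled version.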
Spatial Modeling (Regression)
Both signal and spatial validation indicate whether a given biosensor dataset is a useful component for training an AI model that includes exogenous data. In this study, examples of exogenous data include motor vehicle collisions, citizen service requests, bike routes, street types, speed limit, street density, subway entrances, and tree counts. Exogenous data for this study was acquired from the City of New York (NYC n.d.) and OpenStreetMap (OpenStreetMap contributors 2015). The 30 exogenous (environmental) data types used for spatial regression are listed in Online Resource 1. The process of spatial modeling followed these steps:
Compiling and cleaning endogenous data collected by the MDCAS smartphone app.
Compiling and cleaning exogenous data, for the same date range as the study (August–October 2017), via external API (application programming interface) calls (Boeing 2017).
Creating a 150-m grid that bounds the study area.
Using the 150-m grid to subdivide both exogenous and endogenous data sets.
Serializing (Python object serialization n.d.) all data sets to facilitate their use during and between interactive analysis sessions.
Spatially joining exogenous data, endogenous data, and the 150-m grid into one dataset.
Cleaning up artifacts such as non-data columns derived from the spatial join.
Limiting the joined data to the confines of the study area.
Running an Ordinary Least Squares (OLS) regression on the joined data. For each OLS regression, one endogenous data type (e.g. EEG low beta) is chosen as the dependent variable, and a shortened list of approximately ten exogenous data types is chosen as the set of independent variables.
Evaluating the results, iterating the regression, and removing independent variables from the model as needed.
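The steps above can be sketched in miniature as follows. This is a simplified illustration under stated assumptions, not the MDCAS or PyMul implementation: coordinates are assumed to be already projected to meters, the endogenous signal is aggregated to a per-cell median, the join is a plain dictionary join on grid cell, and the regression uses a single predictor in closed form, whereas the study used roughly ten predictors per model. All names and data values are hypothetical, and the toy data are constructed to be exactly linear.

```python
from collections import defaultdict
from statistics import median

CELL_SIZE_M = 150.0  # grid cell size stated in the text


def cell_of(x_m, y_m, cell=CELL_SIZE_M):
    """Quantize projected coordinates (meters) to a (col, row) grid index."""
    return (int(x_m // cell), int(y_m // cell))


def median_by_cell(points):
    """Aggregate (x_m, y_m, value) points to {cell: median value}."""
    buckets = defaultdict(list)
    for x, y, value in points:
        buckets[cell_of(x, y)].append(value)
    return {cell: median(vals) for cell, vals in buckets.items()}


def join_on_cell(endog, exog):
    """Inner join of two {cell: value} dicts -> [(cell, y, x), ...]."""
    return [(cell, endog[cell], exog[cell])
            for cell in sorted(endog) if cell in exog]


def ols_simple(xs, ys):
    """Closed-form OLS for y = a + b * x; returns (a, b, r_squared)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return a, b, 1 - ss_res / ss_tot


# Hypothetical endogenous points: (x_m, y_m, normalized band power).
eeg_points = [(10, 20, 0.2), (100, 140, 0.4), (149, 0, 0.9),
              (160, 10, 0.5), (299, 149, 0.7),
              (10, 160, 0.8), (310, 10, 1.0)]
# Hypothetical exogenous data already summarized per cell (e.g. tree count).
tree_count = {(0, 0): 2, (1, 0): 3, (0, 1): 4, (2, 0): 5, (5, 5): 9}

joined = join_on_cell(median_by_cell(eeg_points), tree_count)
intercept, slope, r2 = ols_simple([x for _, _, x in joined],
                                  [y for _, y, _ in joined])
print(joined)
print(f"intercept={intercept:.3f} slope={slope:.3f} R^2={r2:.3f}")
```

The cell outside the study data, (5, 5), drops out of the inner join, which loosely mirrors the step of limiting the joined data to the study area.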
Head et al. (2015) have raised the issue that exploratory regression analyses can lead to controversial practices such as p-hacking (Dahlberg 2018). Even ESRI, the leading developer of commercial GIS (geographic information systems) analysis software, cautions its customers about the pitfalls of using ESRI’s own Exploratory Regression tool (ArcGIS n.d.). By starting with a limited set of independent variables based on previous preliminary studies, and by using the same sensors and data collections, this analysis follows ESRI’s cautionary advice (ArcGIS n.d.) to “always select candidate explanatory regression variables that are supported by theory, guidance from experts, and common sense” (paragraph 10). To attain the study’s goal of a statistical power greater than or equal to 75%, iterative analysis was used to narrow the number of independent variables for each model (see Online Resource 1).
Results of Signal and Spatial Validation
Summary of Signal Validation Results
Signal validation was conducted for a subset of sessions (about ten hours per subject) from a subgroup of participants (N = 13: three pedestrians, two drivers, four bike couriers, and four bike commuters). Participants in this subgroup were selected based on two criteria: whether they recorded at least ten hours of data, and whether they participated in a sub-study conducted to collect heart rate data. Signal processing and analysis tools were used to compare a continuous, periodic biosensor signal (dependent variable) to other relevant (explanatory) variables, processes established by Ducao et al. (2018). The results of the signal validation process show that the band powers associated with repose and relaxation (delta, theta, alpha) tended to predict meditation values (Fig. 7), while band powers associated with high concentration and/or physical activity (beta, gamma, alpha) tended to predict attention values (Fig. 8). Because the terms attention and meditation were widely found to cause confusion during interviews conducted by the research team with study participants and community members (Ducao et al. 2020), for this analysis the more specific term beta/gamma is presented instead of attention, and delta/theta is presented instead of meditation. Alpha band powers are associated with both the attention and meditation metrics, and have been associated with attention, relaxation, and memory formation (Palva and Palva 2007).
The signal validation process also demonstrated an association of high heart rate with high beta and gamma band powers (Fig. 9), and a high percentage of overlap between most pre-processed and post-processed band powers (Fig. 10). Signal-to-noise validation indicated usability for most of the collected biometric types.
Summary of Spatial Validation Results
In order to verify non-random distributions of biometric data in geographic space, the spatial validation process first involved processing spatial weights matrices, which normalizes the dataset to a per capita activity level for each spatial unit (e.g. city block). Spatial autocorrelation and LISA analyses were then calculated for each biometric dataset (Figs. 11 and 12). This process was conducted on a subset of the data collected by all participants (N = 70 contributors of valid biometric data, a subset of 101 trained participants). Non-random distribution was demonstrated for all biometric types (ECG heart rate; EEG delta, theta, alpha, beta, and gamma), with the most distinctive distributions visible in street-based (as opposed to grid-cell based) LISA maps for delta/theta and beta/gamma. In Figs. 11 and 12, bright red (with horizontal line pattern) indicates high-high hotspots (Anselin 1995) where biometric values are high and clustered, while bright blue (with vertical line pattern) indicates low-low coldspots (Anselin 1995) where biometric values are low and diffused. In Fig. 11, many hotspots occurred in the eastern and southern halves of the study area; in Fig. 12, many hotspots occurred on cramped streets and entrances to bridges. As seen in Figs. 11 and 12, the spatial validation process for this study indicated non-random geographic distribution and thus usability for all the collected biometric types.
Spatial Modeling (Regression) Results
Ordinary least squares (linear) regressions were performed on beta/gamma and delta/theta using test subsets consisting of four consecutive weeks of data. The test subsets were compared with validation data sets consisting of four or more consecutive weeks that were mutually exclusive from the test data set. Validations were run three times for beta/gamma and three times for delta/theta.
For beta/gamma, the adjusted R-squared value ranged from 13% to 17% (see Table 1 of Online Resource 1 for numeric results). Of the exogenous independent variables demonstrating significant p-values, beta/gamma was significantly impacted by nearby occurrences of non-fatal vehicular collisions, somewhat impacted by trees and the presence of services (e.g. delivery vehicle traffic), and somewhat negatively impacted by low speed levels.
For delta/theta, adjusted R-squared values ranged from 13% to 23% (see Table 1 of Online Resource 1 for numeric results). As with beta/gamma, delta/theta was somewhat impacted by tree count and significantly impacted by non-fatal vehicular collisions, at an even higher level than with beta/gamma. Unlike beta/gamma, delta/theta was somewhat impacted by street density as well as the presence of libraries and cultural institutions in the area. Delta/theta was also nominally impacted by building height.
Two kinds of temporal periodicity were examined in the beta/gamma and delta/theta data (Fig. 13). First, preliminary regression models were produced to examine weekdays versus weekend days. Within the segmentation of weekends and weekdays, activity types as reported by participants were also examined: walking/running, cycling, and driving. In the smartphone app, 54 participants reported at least one walk/run session, 57 participants reported at least one cycling session, and 17 participants reported at least one driving session. There were 1188 walk/run sessions, 3316 cycling sessions, and 107 driving sessions. Figure 13, which visualizes the predictors that have coefficients of the largest magnitude, indicates the following points:
Pedestrians were less impacted by environmental variables during weekdays. Non-fatal vehicular collisions played a strong role across the days of the week, and across beta/gamma and delta/theta variables. Pedestrian beta/gamma and delta/theta were slightly impacted by the presence of libraries and cultural institutions during weekdays, and more strongly impacted by areas of extreme speed and high street density on the weekends. For numeric results, see Table 2 of Online Resource 1.
Cyclists were also less impacted by environmental variables during weekdays. As with pedestrians, non-fatal vehicular collisions played a strong role across the days of the week and across beta/gamma and delta/theta variables. Cyclists were somewhat impacted by low service areas during the week, and more strongly impacted by low and high service areas during the weekend. The presence of libraries and cultural institutions more strongly impacted cyclists’ delta/theta. For numeric results, see Table 3 of Online Resource 1.
Drivers were impacted by a similar set of environmental variables throughout the entire week, including non-fatal vehicular collisions, wide lanes, and high-speed areas. In addition, drivers were more impacted by the presence of government buildings on weekdays. On weekends, bike routes had more impact on drivers. For numeric results, see Table 4 of Online Resource 1.
Preliminary regression models were also produced to examine times of day in the local time zone, Eastern Daylight Time (EDT), divided into periods of morning (06:01–11:00 EDT, including rush hour), midday (11:01–15:00 EDT), evening (15:01–20:00 EDT, including rush hour), and night (20:01–06:00 EDT). Statistically significant predictors for EEG delta/theta and beta/gamma of all participants (Figs. 14 and 15), which correspond to Table 5 in Online Resource 1, indicate the strong but fluctuating influence of accidents (non-fatal and non-injurious) on beta/gamma and delta/theta, non-oppositional relationships between beta/gamma and delta/theta, and the larger influence of exogenous factors in the morning.
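The time-of-day segmentation described above can be expressed as a small labeling function. The sketch below is illustrative only; the function name is hypothetical, timestamps are assumed to be already in local time (EDT), and seconds are ignored so that the minute-level boundaries given in the text apply directly.

```python
from datetime import datetime


def time_of_day_period(ts):
    """Label a local (EDT) timestamp with one of the four study periods."""
    minutes = ts.hour * 60 + ts.minute  # minutes since midnight
    if 6 * 60 + 1 <= minutes <= 11 * 60:
        return "morning"   # 06:01-11:00
    if 11 * 60 + 1 <= minutes <= 15 * 60:
        return "midday"    # 11:01-15:00
    if 15 * 60 + 1 <= minutes <= 20 * 60:
        return "evening"   # 15:01-20:00
    return "night"         # 20:01-06:00, wrapping past midnight


# Hypothetical session timestamps from within the study date range.
for stamp in (datetime(2017, 8, 1, 8, 30), datetime(2017, 8, 1, 13, 0),
              datetime(2017, 8, 1, 18, 45), datetime(2017, 8, 2, 2, 15)):
    print(stamp.isoformat(), time_of_day_period(stamp))
```

Grouping sessions by this label, and separately by weekday versus weekend, gives the subsets on which the preliminary regression models would then be fitted.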
The results summarized in Figs. 14 and 15 show that participants were impacted by similar environmental variables at similar times of day. For instance, core infrastructure impacts the beta/gamma and delta/theta of all participants in the morning, while the presence of libraries and cultural institutions slightly impacts the beta/gamma and delta/theta of all participants at night. Areas of high and low service, as well as areas of extreme (high and low) speed, impact beta/gamma and delta/theta throughout the day.
For the band powers with significant validation results (Table 6 of Online Resource 1), high speed and the presence of service areas have a moderate impact, with high-speed coefficients ranging from 1.84 to 3.02 across band powers, and area of service (e.g. delivery vehicle route) coefficients ranging from 0.86 to 2.68 across band powers. Street density also has a moderate impact on low alpha, low and high beta, and mid gamma bands. As is the case with beta/gamma and delta/theta, similar environmental variables emerge across all band powers. For the exact numeric results and further discussion, see Table 6 of Online Resource 1.
The research team conducted experimental analysis with Valence Aware Dictionary and Sentiment Reasoner (VADER) (Hutto n.d.), a textual sentiment analysis module that is bundled with Python’s Natural Language Toolkit (Bird et al. 2009). Using VADER, spatial regression models were produced for comments with discernible positive, negative, and neutral affect. The model results are available in Online Resource 1, Table 7.
The goal of spatial regression in this study was to quantitatively describe how the built environment affects the cognition of its community members. Though it was more complex, street-based analysis produced more true-to-reality results than grid-based analysis. Regression models were developed for EEG and heart rate data types, but models were not developed for GPS speed data due to inconsistent speed measurements yielded by GPS units (Ducao et al. 2018). Following guidance from ESRI’s documentation on exploratory regression (ArcGIS n.d.), the research team focused on the following regression outputs: p-values (probability) for each exogenous independent variable, and adjusted R-squared (effect size) values for each regression model. When analyzing social science or behavioral data, Ferguson (2016) recommends an adjusted R-squared of 4% as a minimum effect size, and an R-squared of 25% or greater as a moderate effect size.
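To make the effect-size criterion concrete, the sketch below computes adjusted R-squared from the standard definition and compares a value against the two thresholds attributed to Ferguson (2016) in the text. The function names, the label wording, and the sample sizes in the usage lines are hypothetical and chosen only for illustration.

```python
def adjusted_r_squared(r2, n_obs, n_predictors):
    """Standard adjustment of R-squared for the number of predictors."""
    dof = n_obs - n_predictors - 1
    if dof <= 0:
        raise ValueError("need more observations than predictors + 1")
    return 1 - (1 - r2) * (n_obs - 1) / dof


def effect_size_label(adj_r2, minimum=0.04, moderate=0.25):
    """Compare a value against the 4% and 25% thresholds cited in the text."""
    if adj_r2 >= moderate:
        return "moderate or greater"
    if adj_r2 >= minimum:
        return "at or above minimum"
    return "below minimum"


# Hypothetical model: R-squared of 0.20 from 200 grid cells, 10 predictors.
adj = adjusted_r_squared(0.20, n_obs=200, n_predictors=10)
print(f"adjusted R^2 = {adj:.3f} -> {effect_size_label(adj)}")

# The adjusted R-squared ranges reported in the text (13% to 23%).
for reported in (0.13, 0.17, 0.23):
    print(reported, effect_size_label(reported))
```

Under these thresholds, the reported ranges sit above the recommended minimum but below the moderate level.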
Delta/Theta and Beta/Gamma
Overall, beta/gamma was strongly predicted by street features and delta/theta was strongly predicted by amenities. Pedestrians and cyclists were less impacted by environmental variables during weekdays, and drivers were impacted by a similar set of environmental variables throughout the week. Time-of-day analysis indicated strong but fluctuating influence of accidents (non-fatal and non-injurious) on beta/gamma and delta/theta at all times of day. Adjusted R-squared values of the pre-processed band power regressions were lower than those of the beta/gamma and delta/theta regressions, possibly indicating that compositing band powers into beta/gamma and delta/theta aggregates is an important step in developing a biometric spatial model.
In almost all of the regressions on beta/gamma and delta/theta, the dependent variables were more strongly impacted by non-fatal vehicular collisions than by any other environmental variable. This may be due to the disruptive but slowing effect of accidents on vehicular traffic. There were also similarities in how periodicity (time of day and day of week) changes the environmental variables that affect beta/gamma and delta/theta together.
Regarding the relationship between beta/gamma and delta/theta, the model indicated a complex, independent relationship; in other words, they should not be considered as two ends of the same scale. Mode-of-transport (Fig. 13) and time-of-day (Figs. 14 and 15) analyses showed that beta/gamma and delta/theta were impacted by similar predictors including accidents, street capacity, speed limit, infrastructure, and government and cultural facilities. However, the level and magnitude of impact were not always oppositional. For example, non-fatal accidents were a predictor for both beta/gamma and delta/theta, but at different times of day: they were a stronger predictor of beta/gamma during the midday and night periods, and a stronger predictor of delta/theta during the morning and evening periods. Non-oppositional relationships such as these reflect findings that brain modalities are often non-binary (Brayne et al. 2010).
High speed, presence of service areas, and street density had a moderate impact on band power data. One of the most complex processes for this analysis was normalizing the EEG band power data, which fluctuate rapidly over a wide range of approximately 17 million units. These units have no metric (NeuroSky 2015). Experiments were conducted to normalize the band power data on a per-grid-cell and per-point basis. The results reported here reflect a per-point normalization in the range of 0–1 and a per-grid-cell normalization in the range of 0–1000.
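One plausible way to obtain the two target ranges mentioned above is min-max rescaling, sketched below. The text does not state which normalization method was actually used, so min-max scaling is an assumption here, as are the function name and the toy values.

```python
def min_max_scale(values, upper=1.0):
    """Linearly rescale values so the minimum maps to 0 and the maximum to `upper`."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant series carries no spread to rescale.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) * upper for v in values]


# Hypothetical raw band power readings in unitless device values.
raw_points = [0, 4_250_000, 8_500_000, 17_000_000]
per_point = min_max_scale(raw_points)                # target range 0-1
print(per_point)

# Hypothetical per-cell aggregates (e.g. medians), rescaled to 0-1000.
cell_medians = [2.0, 4.0, 6.0]
per_cell = min_max_scale(cell_medians, upper=1000.0)
print(per_cell)
```

Because min-max scaling is sensitive to extreme values, a robust variant (for example, clipping to percentiles before scaling) may be preferable for rapidly fluctuating signals; that choice is left open here.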
Heart Rate and Surveys
Because data collection of heart rate was a substudy conducted with a subset of participants (N = 12) in the final 5 weeks of the 12-week study, regressions on these data do not have significant statistical power. Since consumers of this work are interested in conducting indoor studies with only heart rate monitors, the sub-study was used to prototype the process of collecting and analyzing heart rate in the context of environmental data and to share the results (Online Resource 1, Table 7).
Survey data, which consisted of survey responses and comments recorded by the smartphone app, were not collected as part of a sub-study; all participants had the option to record survey responses throughout the study. However, as with heart rate data, statistical power could not be claimed. Sixty-seven participants recorded survey data, and of those, 31 submitted textual comments related to a walk, run, ride, or drive. Most survey data were recorded at the end of a session (e.g. a walk or ride), so they do not correspond to a specific time or place.
Limitations and Problems
Limitations in Data Collection
A large number of cyclists responded during the participant recruitment period, and several potential participants had to be turned away. It was difficult to recruit drivers, despite significant resources put toward recruitment. Later in the study, the research team learned that professional drivers (e.g. taxi drivers) are legally forbidden by New York City from wearing anything electronic on their heads. Only a few drivers contributed valid biosensor data. As a result, the data collected was biased toward cyclists and pedestrians.
Limitations in Data Analysis
During the data collection period of this study, more than 5 billion data points were recorded with the smartphone application (Fig. 16). This included EEG brainwave data, ECG heart rate data, surveys, and GPS readings, all of which were timestamped (Fig. 4). With a data volume of this size, one of the biggest analysis issues was to transform the data into manageable chunks so that it could be analyzed with consumer-grade computer processors. To this end, the research team developed PyMul (Python Multimer), an internal Python library to support MDCAS data transmission, transformation, caching, serialization, and preparation for use with other Python neural and spatiotemporal libraries (Fig. 6). PyMul, which also works with pre-transformed data output by MDCAS, helped the research team to rapidly experiment with and analyze the collected biometric data.
It took several experiments to determine how best to subdivide the large volume of collected data and generate spatial weights based on the subdivision. Analysis proceeded with contiguity-based rather than distance-based spatial weights, in part because they can be more quickly generated for large volumes of point data (Arribas and Rey n.d.). A PyMul function called gridify was developed to subdivide and quantize datasets based on a predetermined grid (cell size: 150 m). The spatial validation and modeling processes then generate laterally adjacent weights based on the same grid. An improved understanding of the subdivision techniques that are most effective with this type of data suggests an opportunity to update the underlying subdivision, quantization, and weighting techniques. This in turn has the potential to improve the accuracy and clarity of future analysis.
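The idea of quantizing points to a grid and then deriving laterally adjacent weights from that same grid can be sketched as follows. This is not the PyMul gridify function, whose interface is not described in the text; the names below are hypothetical, coordinates are assumed to be projected to meters, and row-standardization of the weights is an added assumption.

```python
from collections import defaultdict
from statistics import median


def gridify_sketch(points, cell_size=150.0):
    """Group (x_m, y_m, value) points by the grid cell that contains them."""
    cells = defaultdict(list)
    for x, y, value in points:
        cells[(int(x // cell_size), int(y // cell_size))].append(value)
    return dict(cells)


def lateral_weights(cells):
    """Row-standardized weights between occupied, laterally adjacent cells."""
    weights = {}
    for (cx, cy) in cells:
        candidates = ((cx - 1, cy), (cx + 1, cy), (cx, cy - 1), (cx, cy + 1))
        nbrs = [n for n in candidates if n in cells]
        # Cells without occupied neighbors ("islands") get an empty row.
        weights[(cx, cy)] = {n: 1.0 / len(nbrs) for n in nbrs}
    return weights


def spatial_lag(cells, weights):
    """Weighted average of neighboring cell medians for each cell."""
    med = {cell: median(vals) for cell, vals in cells.items()}
    return {cell: sum(w * med[n] for n, w in row.items())
            for cell, row in weights.items()}


# Hypothetical points: (x_m, y_m, normalized biometric value).
pts = [(10, 10, 1.0), (20, 20, 3.0), (160, 10, 5.0),
       (10, 160, 7.0), (900, 900, 9.0)]
grid = gridify_sketch(pts)
w = lateral_weights(grid)
print(w[(0, 0)])
print(spatial_lag(grid, w))
```

Building weights from grid indices avoids pairwise distance computations between points, which is consistent with the speed argument for contiguity-based weights given above.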
Recommendations and Future Research
Future Research Objectives in Data Collection
This study demonstrated that non-EEG data is technically and ergonomically easier to collect. Examples of non-EEG data include heart rate data (which was collected from 12 individuals as part of a sub-study) and smartphone application data. In future studies, the largest number of smartphone apps will be distributed to participants without biosensors, the next largest number will be distributed to participants with their own wearables, and then the smallest number will be distributed to participants who are trained to use the specialized biosensors loaned by the research team. Additional objectives include modifying the smartphone application and training protocols to encourage participants to contribute survey data more uniformly. Training protocols will encourage participants to use surveys as they pertain to their current place and time. The application may display a spatial map of the participant’s data immediately after a session is completed, so that participants may use the map as a memory aid and for additional survey creation or annotation. Currently, the application prompts a participant to contribute a survey only at the end of each session; this application design decision was made to minimize distracting, potentially unsafe notifications while the participant was in transit. In the future, particularly for indoor studies, participants may have the option to increase notifications so that they will be prompted to contribute a survey when there is a large, rapid change in their biometric values.
Smartwatch compatibility was added to the smartphone application after the study, because many organizations that want to use MDCAS for indoor studies prefer to focus on wrist-based wearables (Ducao et al. 2020). Future steps include adding more sensor modalities to the smartphone application (such as ambient light, ambient sound, speech dictation, and temperature) and implementing tutorials and privacy settings in the application.
Future Research Objectives in Data Analysis
The signal validation processes described in the methodology section were conducted on a subgroup (N = 13; N = 6 in the case of the heart rate data) of the overall dataset, so a future objective is to conduct neuro-temporal analyses on all the sessions from all participants who collected valid EEG data (N = 70). This requires automation of analytical processes.
The research team used linear regression, one of the simplest types of statistical modeling, to examine the relationship of endogenous and exogenous data. This approach was especially useful because the full dataset from all participants who collected valid EEG data (N = 70) was used for this analysis, and the data volume was difficult to manage. Much of the spatial regression conducted on beta/gamma and delta/theta data was meant to be a proof-of-concept; pre-processed and normalized data values are feasible to analyze on a single workstation, unlike the larger, higher-frequency EEG band power data or raw signal data. Future research includes exploring more complex types of modeling based on current findings. It would also involve iterative training on subsets of the data to detect temporal and categorical patterns over recurring periods and session types.
While this study indicates that street-based spatial weights show the most promise to produce accurate results, data-driven grids consisting of Voronoi/Thiessen polygons (Fortune 1992) may also be used to create spatial weights. Voronoi polygons have the advantages of distance-based weights, in that clusters of neighboring points are used for weight generation; but, in generating a grid-like mesh based on these clusters, Voronoi polygons also have the advantages of contiguity-based weights, in that a grid can be quickly used to generate spatial weights. At the time of the study discussed in this article, the function for Voronoi weights was not fully implemented in the Python Spatial Analysis Library (Rey and Anselin 2010) used by the research team.
The spatial validation processes described in the methodology section demonstrate clearer results when the endogenous data is aggregated and quantized along streets rather than an arbitrary grid, so a next step is to apply a street-based analysis as part of the quantization process. However, one of the challenges in quantizing data with a street-based approach is that, for any given study area, streets are rarely uniform in length, and in the case of OpenStreetMap (OpenStreetMap contributors 2015), a street line is not always equal to a street length. Consequently, additional development work is needed to subdivide street topologies at a scale appropriate for uniform analysis. In addition, geospatial weights are not relevant to studies of indoor or virtual space, so different approaches will be needed to resolve spatial tracking and analysis issues in those settings.
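The subdivision of street geometry into uniform units could be approached as in the following Python sketch, which places break points at a fixed spacing along a polyline. It is only an illustration of the idea: the function name, the input format (vertices already projected to meters), and the handling of the final, shorter remainder are assumptions, and real street networks would also need junctions and multi-part geometries to be handled.

```python
import math


def split_polyline(vertices, segment_length):
    """Return points spaced `segment_length` apart along a polyline.

    `vertices` is a list of (x, y) tuples in meters (assumed projected).
    The first vertex is always included; a final remainder shorter than
    `segment_length` ends at the last vertex.
    """
    if segment_length <= 0:
        raise ValueError("segment_length must be positive")
    breaks = [vertices[0]]
    carried = 0.0  # distance travelled since the last break point
    for (x0, y0), (x1, y1) in zip(vertices, vertices[1:]):
        edge = math.hypot(x1 - x0, y1 - y0)
        pos = segment_length - carried  # distance along this edge to next break
        while pos <= edge:
            t = pos / edge
            breaks.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
            pos += segment_length
        carried = edge - (pos - segment_length)
    if breaks[-1] != vertices[-1]:
        breaks.append(vertices[-1])
    return breaks
```

Consecutive break points then delimit segments of equal length (apart from the last), which could serve as the quantization units in place of grid cells.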
In all of the spatial regressions for this study, relatively high Jarque-Bera values (≥ 100) indicate that the regression residuals are not normally distributed, which suggests that non-linear regression models should be explored, including regime-based (Arribas-Bel n.d.) and segmented (Muggeo 2003) models. Quantized endogenous data will be explored further through map-based analysis packages, such as GeoDa (n.d.), in order to quickly generate visualizations that may suggest improved regression models.
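For reference, the Jarque-Bera statistic is computed from the sample skewness S and kurtosis K of the regression residuals as JB = (n / 6) * (S^2 + (K - 3)^2 / 4), where n is the number of residuals. The Python sketch below computes it with the standard library only; the function name is an assumption, and this is not the routine used by the research team's regression software.

```python
def jarque_bera(residuals):
    """Compute the Jarque-Bera statistic from a sequence of residuals.

    Uses the biased (divide-by-n) central moments, as in the standard
    definition of the statistic.
    """
    n = len(residuals)
    if n < 2:
        raise ValueError("need at least two residuals")
    mean = sum(residuals) / n
    centered = [r - mean for r in residuals]
    m2 = sum(c ** 2 for c in centered) / n
    m3 = sum(c ** 3 for c in centered) / n
    m4 = sum(c ** 4 for c in centered) / n
    if m2 == 0:
        raise ValueError("residuals have zero variance")
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    return n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
```

Under normally distributed residuals the statistic asymptotically follows a chi-squared distribution with two degrees of freedom, so values on the order of 100 or more are strong evidence against normality.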
Regression results for beta/gamma and delta/theta data, subdivided by time of day, day of week, and transport type, show promise for analyzing the biometric data over temporal cycles and demographic categories. As the research team refines its regression modeling and explores spatial econometrics (Anselin 2013), it may use macroeconomic techniques to examine how periodicity and seasonality affect the relationship between the environment and cognition. The research team is also interested in exploring asynchronicity between exogenous and endogenous data as part of this research.
Beyond the Study: Expanding on Community Well-Being and AI
Since this study’s conclusion, the same techniques have been applied to several pilot studies in collaboration with the United Nations Human Settlements Programme (UN-Habitat). These studies include a 2018 cycling study of Kuala Lumpur’s first separated bike lane and a 2019 cycling study of Nairobi’s proposed Bus Rapid Transit routes (Ducao 2019). The collaboration continued with a 2019–2020 walkability study of schoolchildren’s routes to a primary school in Kibera, Nairobi’s largest slum. This work demonstrates the value of the research team’s approach to community well-being and AI. It has also exposed the research team to cross-sector, interdisciplinary, and transdisciplinary settings in which mobility specialists, geographers, citizen journalists, government workers, urban planners, parents, children, cyclists, and many other kinds of stakeholders contribute to community well-being efforts. These stakeholders have played a role in each study’s structure, execution, and data analysis, through workshops and town hall meetings organized by the research team, UN-Habitat, and their partners in local advocacy.
Issues of privacy and security were raised by UN-Habitat and other stakeholders. In some cases, any personally identifiable data would be overly invasive. In other cases, geolocation or indoor location data is considered too sensitive. The devised solution, which was approved by UN-Habitat, consists of adjustments to both data collection and data analysis methods laid out in this project. In an updated, privacy-first approach, the smartphone application aggregates, quantizes, and categorizes biometric and location data before transmitting anything to a storage server. This adds a pseudo-anonymizing step that protects individual and group privacy in a way that is adjustable to the constraints of the study.
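The on-device aggregation, quantization, and categorization step might look something like the following Python sketch. It is a hypothetical illustration rather than the application's actual code: the function names, the use of coordinate rounding to coarsen location, and the cut points of 60 and 100 are assumptions, and a real deployment would tune the cell size and categories to the constraints of each study.

```python
from collections import Counter


def categorize(value, low, high):
    """Label a reading as 'low', 'mid', or 'high' using illustrative cut points."""
    if value < low:
        return "low"
    if value > high:
        return "high"
    return "mid"


def summarize_for_upload(samples, decimals=3, low=60.0, high=100.0):
    """Reduce (lat, lon, value) samples to counts per coarse cell and category.

    Rounding coordinates to `decimals` places coarsens the location. Only
    the aggregated counts, not the raw samples, would be transmitted.
    """
    counts = Counter()
    for lat, lon, value in samples:
        cell = (round(lat, decimals), round(lon, decimals))
        counts[(cell, categorize(value, low, high))] += 1
    return dict(counts)
```

Lowering `decimals` or widening the categories trades analytical resolution for stronger privacy, which is one way the approach could be adjusted to the constraints of a given study.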
The collaboration with UN-Habitat has helped to expand MDCAS beyond the development of AI techniques towards an action research approach in which a local society-in-the-loop indicator is used to assess the impact of AI on community well-being (Musikanski et al. 2020). United Nations Sustainable Development Goals (UN SDG n.d.) that apply to MDCAS include SDG 3: Good Health and Well-Being, and SDG 11: Sustainable Cities and Communities. These indicators have guided the research team to consider how its work fits into a larger network of SDG-oriented projects. This is particularly important when working with those in developing economies and when working with vulnerable populations discussed in the seminal Belmont Report (National Commission 1978). The research team and its collaborators at UN-Habitat believe that threats to community well-being from AI can only be averted when community members, especially those from vulnerable groups, act as partners in the AI projects that impact their communities.
The objective of this study was to spatiotemporally analyze participants’ neurophysiological state as a way to quantitatively understand how they encounter spatial environments. Several analytical findings from this study may be used for the research area of AI interventions to improve community well-being in the outdoor built environment:
Statistical significance was demonstrated for several environmental variables (street features, amenities, and accidents) as predictors of beta/gamma and delta/theta. Overall, beta/gamma was more strongly predicted by street features like service capacity (e.g. delivery levels) and speed limit, while delta/theta was more strongly predicted by amenities like cultural institutions and trees.
Within transport categories, pedestrians and cyclists were less impacted by environmental variables during weekdays, and drivers were impacted by a similar set of environmental variables throughout the entire week, including non-fatal vehicular collisions, wide lanes, and high-speed areas. On the weekends, pedestrians were impacted in areas of extreme speed and high street density, cyclists were impacted by service areas and cultural amenities, and drivers were impacted by bike routes.
Analysis segmented by the parts of the day indicated a strong but fluctuating influence of accidents (non-fatal and non-injurious) on beta/gamma and delta/theta. All predictors were more influential in the morning. Beta/gamma and delta/theta regressors were shown to be non-oppositional, demonstrating an independent, complex relationship.
These findings, as part of a process that begins with signal and spatial validation, indicate how data from wearable technology can be used to inform urban design and planning that benefits the community. Typically, researchers and designers collect data about people in spaces through observations, interviews, fieldwork, focus groups, journals, and surveys. MDCAS is designed to support spatial designers and community stakeholders with continuous human biosensor data, recorded in high spatiotemporal resolution, and calibrated with periodic survey and sentiment information. By contributing insights from new kinds of mobile, sensor, wearable, and information technologies, this work has the potential to expand the role of spatial design in nurturing the community experience.
Available upon request. Raw data requires permission from participants who contributed data.
Amodio, D. M., & Frith, C. D. (2006). Meeting of minds: The medial frontal cortex and social cognition. Nature Reviews Neuroscience, 7(4), 268–277. https://doi.org/10.1038/nrn1884.
Anselin, L. (1995). Local indicators of spatial association—LISA. Geographical Analysis, 27(2), 93–115. https://doi.org/10.1111/j.1538-4632.1995.tb00338.x.
Anselin, L. (2013). Spatial econometrics: Methods and models (Vol. 4). Heidelberg, Germany: Springer.
ArcGIS. (n.d.). How exploratory regression works. https://desktop.arcgis.com/en/arcmap/10.3/tools/spatial-statistics-toolbox/how-exploratory-regression-works.htm. Accessed Nov 2018.
Arribas, D., & Rey, S. (n.d.). Geographic Data Science with PySAL and the pydata stack. http://darribas.org/gds_scipy16/gds_scipy16.pdf. Accessed Feb 2018.
Arribas-Bel, D. (n.d.). Spatial regression. https://darribas.org/gds_scipy16/ipynb_md/08_spatial_regression.html. Accessed Feb 2019.
Aspinall, P., Mavros, P., Coyne, R., & Roe, J. (2015). The urban brain: Analysing outdoor physical activity with mobile EEG. British Journal of Sports Medicine, 49(4), 272–276. https://doi.org/10.1136/bjsports-2012-091877.
Berger, R. D., Akselrod, S., Gordon, D., & Cohen, R. J. (1986). An efficient algorithm for spectral analysis of heart rate variability. IEEE Transactions on Biomedical Engineering, (9), 900–904. https://doi.org/10.1109/TBME.1986.325789.
Bird, S., Klein, E., & Loper, E. (2009). Natural language processing with Python: Analyzing text with the natural language toolkit.
Boeing, G. (2017). OSMnx: New methods for acquiring, constructing, analyzing, and visualizing complex street networks. Computers, Environment and Urban Systems, 65, 126–139. https://doi.org/10.1016/j.compenvurbsys.2017.05.004.
Bostrom, N. (2014). Superintelligence: Paths, dangers, strategies. Oxford University Press.
Brayne, C., Ince, P. G., Keage, H. A., Mckeith, I. G., Matthews, F. E., Polvikoski, T., & Sulkava, R. (2010). Education, the brain and dementia: Neuroprotection or compensation? Brain, 133(8), 2210–2216. https://doi.org/10.1093/brain/awq185.
Burton, E. J., Mitchell, L., & Stride, C. B. (2011). Good places for ageing in place: Development of objective built environment measures for investigating links with older people’s wellbeing. BMC Public Health, 11(1), 839.
Cerin, E., Nathan, A., Cauwenberg, J. V., Barnett, D. W., & Barnett, A. (2017). The neighbourhood physical environment and active travel in older adults: A systematic review and meta-analysis. International Journal of Behavioral Nutrition and Physical Activity, 14(1), 1–23.
Dahlberg, B. (2018). Cornell food researcher's downfall raises larger questions for science. NPR. http://www.npr.org/sections/thesalt/2018/09/26/651849441/cornell-food-researchers-downfall-raises-larger-questions-for-science. Accessed Feb 2019.
Ducao, A. (2019). Neurophysiological experience of cyclists in Kuala Lumpur and Nairobi. International Journal of Traffic Safety Innovation: Vision Zero Cities. https://www.transalt.org/sites/default/files/2019-1/VZC_Journal_2019_Full_Update-Small.pdf. Accessed Feb 2020.
Ducao, A., Koen, I., & Guo, Z. (2018). Multimer: Validating multimodal, cognitive data in the city: Towards a model of how the urban environment influences streetscape users. In Proceedings of the Workshop on Modeling Cognitive Processes from Multimodal Data (MCPMD ‘18). Association for Computing Machinery, New York, NY, USA, Article 11, 1–8. https://doi.org/10.1145/3279810.3279853.
Ducao, A., Koen, I., van Bergen, T., Berry, Y., Sheu, S., & Mitchell, T. (2020). Multimer: User experience exercises from a bio-spatial study in the urban context. Proceedings of the Distributed, Ambient and Pervasive Interactions 8th International Conference, DAPI 2020, Held as Part of the 22nd HCI International Conference, HCII 2020. (forthcoming).
EU GDPR portal. (n.d.). https://eugdpr.org. Accessed Oct 2019.
Ferguson, C. J. (2016). An effect size primer: A guide for clinicians and researchers. In A. E. Kazdin (Ed.), Methodological issues and strategies in clinical research (pp. 301–310). Professional Psychology: Research and Practice, American Psychological Association. https://doi.org/10.1037/14805-020
Fortune, S. (1992). Voronoi diagrams and Delaunay triangulations. In Computing in Euclidean geometry, 193–233. https://doi.org/10.1142/9789812831699_0007.
GeoDa, an introduction to Spatial Data Analysis. (n.d.). https://geodacenter.github.io. Accessed Feb 2019.
Head, M. L., Holman, L., Lanfear, R., Kahn, A. T., & Jennions, M. D. (2015). The extent and consequences of p-hacking in science. PLoS Biology, 13(3). https://doi.org/10.1371/journal.pbio.1002106.
Hutto, C. J. (n.d.). Valence aware dictionary and sentiment reasoner. https://github.com/cjhutto/vaderSentiment. Accessed Nov 2018.
Jones, F. M. (2012). Accumulating neighborhood stress exposure: Effects on hypertension, obesity, and depression. UCLA. ProQuest ID: Jones_ucla_0031D_10293. Merritt ID: ark:/13030/m57p90cs. Retrieved from https://escholarship.org/uc/item/2n9500gf. Accessed Jan 2020.
Jones, M., & Pebley, A. R. (2014). Redefining neighborhoods using common destinations: Social characteristics of activity spaces and home census tracts compared. Demography, 51(3), 727–752. https://doi.org/10.1007/s13524-014-0283-z.
Karandinou, A., & Turner, L. (2017). Architecture and neuroscience; what can the EEG recording of brain activity reveal about a walk through everyday spaces? International Journal of Parallel, Emergent and Distributed Systems, 32(sup1), S54–S65. https://doi.org/10.1080/17445760.2017.1390089.
Kurzweil, R. (2012). How to create a mind: The secret of human thought revealed. New York: Viking Penguin.
Leys, C., Ley, C., Klein, O., Bernard, P., & Licata, L. (2013). Detecting outliers: Do not use standard deviation around the mean, use absolute deviation around the median. Journal of Experimental Social Psychology, 49(4), 764–766. https://doi.org/10.1016/j.jesp.2013.03.013.
Luu, P., Flaisch, T., & Tucker, D. M. (2000). Medial frontal cortex in action monitoring. Journal of Neuroscience, 20(1), 464–469. https://doi.org/10.1523/JNEUROSCI.20-01-00464.2000.
McCarthy, J., Minsky, M. L., Rochester, N., & Shannon, C. E. (1955). A proposal for the Dartmouth summer research project on artificial intelligence, 31 August 1955. AI Magazine, 27(4) (2006). https://doi.org/10.1609/aimag.v27i4.1904.
Mouratidis, K. (2018). Built environment and social well-being: How does urban form affect social life and personal relationships? Cities, 74, 7–20. https://doi.org/10.1016/j.cities.2017.10.020.
Muggeo, V. M. R. (2003). Estimating regression models with unknown break-points. Statistics in Medicine, 22(19), 3055–3071. https://doi.org/10.1002/sim.1545.
Musikanski, L., Rakova, B., Bradbury, J., Phillips, R., & Manson, M. (2020). Artificial Intelligence and community well-being: A proposal for an emerging area of research. International Journal of Community Well-Being, 1–17. https://doi.org/10.1007/s42413-019-00054-6.
National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research, Department of Health, Education and Welfare. (1978). The Belmont Report. Washington, DC: United States Government Printing Office.
NeuroSky: Mindset Communications Protocol. (2015). http://developer.neurosky.com/docs/lib/exe/fetch.php?media=mindset_communications_protocol.pdf. Accessed Feb 2019.
NYC Open Data. (n.d.). https://opendata.cityofnewyork.us. Accessed Feb 2019.
OpenStreetMap contributors. (2015). Planet dump [Data file from 1 May 2018]. Retrieved from https://planet.openstreetmap.org.
Ord, J. K., & Getis, A. (1995). Local spatial autocorrelation statistics: distributional issues and an application. Geographical Analysis, 27(4), 286–306. https://doi.org/10.1111/j.1538-4632.1995.tb00912.x.
Palva, S., & Palva, J. M. (2007). New vistas for α-frequency band oscillations. Trends in Neurosciences, 30(4), 150–158. https://doi.org/10.1016/j.tins.2007.02.001.
Python object serialization. (n.d.). https://docs.python.org/3/library/pickle.html. Accessed Feb 2019.
Renalds, A., Smith, T. H., & Hale, P. J. (2010). A systematic review of built environment and health. Family & Community Health, 33(1), 68–78. https://doi.org/10.1097/fch.0b013e3181c4e2e5.
Rey, S. J., & Anselin, L. (2010). PySAL: A Python library of spatial analytical methods. In Handbook of applied spatial analysis (pp. 175–193). Heidelberg: Springer.
Russell, S. J., & Norvig, P. (2016). Artificial intelligence: A modern approach. London: Pearson.
Sarmiento, O. L., Schmid, T. L., Parra, D. C., Díaz-Del-Castillo, A., Gómez, L. F., Pratt, M., et al. (2010). Quality of life, physical activity, and built environment characteristics among Colombian adults. Journal of Physical Activity and Health, 7(S2). https://doi.org/10.1123/jpah.7.s2.s181.
Tatum, W. O. (2014). Ellen R. Grass lecture: extraordinary EEG. The Neurodiagnostic Journal, 54(1), 3–21.
Tilley, S., Neale, C., Patuano, A., & Cinderby, S. (2017). Older people’s experiences of mobility and mood in an urban environment: A mixed methods approach using electroencephalography (EEG) and interviews. International Journal of Environmental Research and Public Health, 14(2), 151. https://doi.org/10.3390/ijerph14020151.
Turing, A. M. (1950). I-computing machinery and intelligence. Mind, LIX(236), 433–460. https://doi.org/10.1093/mind/LIX.236.433.
UN SDG. United Nations: Sustainable development goals. (n.d.). https://unstats.un.org/sdgs/indicators/indicators-list. Accessed Dec 2019.
National Science Foundation, Award #1721679.
Conflicts of Interest/Competing Interests
Matter Ventures and NYU Leslie E-Lab contributed meeting space, office space, and office supplies to this study. A preliminary phase of this study, conducted in 2016, was funded in part by SOSV and BMW.
Research Involving Human Participants and/or Animals
This study involved human participants. Its protocol was approved by the Biomedical Research Alliance of New York (BRANY), Protocol “NYC-003,” BRANY file number 17–08–184-455.
All participants in this study signed informed consent forms, which were approved by the Biomedical Research Alliance of New York (BRANY), Protocol “NYC-003,” BRANY file number 17–08–184-455.
Available upon request, pending permission from code authors.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Ducao, A., Koen, I., Guo, Z. et al. Multimer: Modeling Neurophysiological Experience in Public Urban Space. Int. Journal of Com. WB 3, 465–490 (2020). https://doi.org/10.1007/s42413-020-00082-7
- Built environment
- Collective impact
- Community well-being
- Quantitative methods
- Technology and well-being
- Urban affairs
- Machine learning
- Spatial statistics
- Quantitative analysis