Mapping Land Use Dynamics Using the Collective Power of the Crowd

Traditional land use and land cover (LULC) mapping has long relied strongly on input from Earth Observation (EO) data sources at various resolutions and scale levels. With high performance and cloud computing on the rise, rapid processing of large volumes of very high resolution (VHR) satellite imagery—big EO data—is becoming less problematic. Consequently, scientific challenges in that topical domain move on to the next level.

increased awareness and efforts to tackle the space-time resolution dichotomy of traditional space-based Earth Observation and related analytics. Highly frequent revisit times and acquisition intervals associated with such constellations guarantee the availability of data at unprecedented spatio-temporal resolution, opening up a wealth of new possibilities for dynamic mapping of environmental changes (Aubrecht et al. 2017a). Complementing space-based data, unmanned aerial vehicles (UAVs), also commonly referred to as drones, provide similar rapid on-demand high-quality data acquisition capability in the airborne domain. Benefits include below-cloud operation, and thus decreased weather dependency, as well as low cost and VHR imagery (up to 5 cm). With fast-paced technological developments in recent years, drone data is no longer exclusive to military applications and can now rather easily be created by, and made accessible to, the general public (Tiwari and Dixit 2015). Drone mapping particularly addresses the on-demand application market, with continuous long-term change monitoring certainly remaining in the satellite EO domain.
Building upon this new applied scientific and more widespread appreciation of spatio-temporal analytics, traditional "static" methods of LULC mapping seem to be in urgent need of a conceptual revisit. Both timeliness (i.e. long processing and release cycles, irregular updating) and inherent temporal granularity (i.e. land use being commonly represented as simplified static attributive data) are clearly restricted in traditional approaches. Technological advancements and innovations in remotely sensed EO and data acquisition can provide improved input for highly frequent analysis of changing physical features of the Earth (i.e. land cover), thus addressing the timeliness factor well. Understanding land use and its dynamic variations, however, requires additional contextual information. This is particularly relevant when analyzing urban areas that feature highly variable patterns of human activities, where remote sensing is restricted to identifying built-up areas and basic physical characteristics, with limited thematic context and little information on variation in human usage.
Major innovation in LULC assessment therefore now needs to happen through information integration rather than via incremental improvements at the technical level. Volunteered Geographic Information (VGI), i.e. data created and shared by anybody anywhere at any time, has the potential to cover the full dynamic range of human activities spatio-temporally linked to their respective location patterns. This paper takes the discussion on advancing LULC mapping one step further from the traditional schematic remote sensing-based approach by introducing VGI as an ancillary data source for integrative dynamic analytics. The presented concepts illustrate the way forward with regard to advanced spatio-temporal decision support for a variety of applications stretching across the traditional LULC domain including improved population dynamics modeling, fine-scale human-adjusted "smart" urban development, and real-time disaster risk reduction.

The Collective Power of the Crowd
LULC mapping is one of the oldest and best established fields in the wider scope of EO and remote sensing research. It draws upon several decades of scientific studies and developments, with the "modern" era of land cover mapping starting in the early 1970s, in line with the rise of civil space-based remote sensing (Loveland 2012). The basic understanding of what is considered adequate output of a LULC model, however, has not changed much since then. It still mainly refers to characterizing land use and/or land cover classes at one defined point in time, i.e. a static snapshot and abstract simplification of reality.
Particularly in urban settings, changes of physical features commonly occur at a faster pace than the usual map update cycles. Advanced remote sensing and processing technology now has the power to catch up with this speed and provide near real-time coverage of the physical environment. Even more dynamic, however, are human activity patterns that form the backbone for a progressive understanding of space and place, closer to reality, which is essential in making customized decisions in the increasingly relevant smart city context. The emerging research domain of integrating big data, urban informatics, and associated geo-analytics (Thakuriah et al. 2017) aims at addressing these issues and fostering innovation through joint use of human and technical sensors (in situ and remote) to reveal previously unidentifiable information in near real time.
VGI has made a substantial impact on how data is understood, addressed, and analyzed in the geospatial domain since it was first conceptually highlighted about a decade ago (Goodchild et al. 2017). It is no longer an exotic prospect but has become a widely recognized additional stream of geospatial information opening up new research fields and application areas. While the scientific use of VGI and crowdsourcing is certainly still increasing, it is far from being mainstream in most traditional geographic application domains. The focus is mostly on individual case studies, and VGI is often seen as an add-on to potentially enhance established processes. Conceptual innovation and disruptive thinking are still rare.
Under the umbrella of the EU FRESHER project we evaluated how volunteered geo-dynamic information (labeled VGDI) can contribute to exposure assessment for health risk analyses. The dynamic component was particularly highlighted as an innovative input factor for spatio-temporal population modeling as well as time-dependent urban zone characterization (Aubrecht and Steinnocher 2016). The Greater Lisbon area (Portugal) served as one of the test study areas. We used data from the location-based social network platform Foursquare/Swarm to illustrate the collective power of the crowd inherent in VGDI. In this article, we now focus on underlying aspects of data access and integration and conclude with an outlook on where innovation is driving future LULC mapping concepts.

VGDI Access and Integration: The Foursquare Use Case
There are various ways of categorizing VGI. One approach is to assess whether the data is implicitly or explicitly collective (Aubrecht et al. 2017b). Implicitly collective in that regard would signify that contributing users provide certain pieces of information for personal purposes (such as connecting with friends) but without the main intention of creating a "bigger picture". In contrast to the motivation of its individual users, the underlying data infrastructure and curation practice are precisely designed for that purpose, i.e. for understanding interrelations and deriving insight from collective patterns. Implicitly collective information is therefore usually considered the most valuable asset of service providers like Google, Facebook, or Foursquare. For this reason, public access to such databases in full detail is commonly technically limited as well as legally restricted. For effective integration in LULC mapping the collective input is required, which implies the need to identify efficient ways to access and compile this type of data.
Foursquare maintains an API for app developers which allows gathering information about specific venues (locations of facilities where activities take place) and, if the user grants access, user interaction. Our case study of analyzing human activity dynamics and use patterns requires (1) a georeferenced list of all venues in the area of interest and (2) temporal information about the number of users checked in at each venue. The Foursquare API does not support bulk data download for a defined geographic area. It does, however, enable caching some data in restricted form for certain applications. Using a latitude/longitude coordinate feed for the API call, venues surrounding these coordinate pairs are returned together with categorical metadata and time-specific current user counts (i.e. users checked in at that venue at the time of the request). The number of venues returned per request is limited to a maximum of 30. Furthermore, the number of requests per developer is restricted to 5000 per hour in order to preclude large-scale scraping.
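The hourly request limit described above can be respected on the client side with a simple sliding-window counter. The following Python sketch is illustrative only: the class name `HourlyBudget` and its methods are hypothetical, it makes no actual API calls, and the limit of 5000 requests per hour is the only figure taken from the text.

```python
import collections
import time


class HourlyBudget:
    """Sliding-window request counter (hypothetical helper, not part of any API)."""

    def __init__(self, max_requests=5000, window_s=3600.0, clock=time.monotonic):
        self.max_requests = max_requests  # 5000 per hour, as stated in the text
        self.window_s = window_s
        self.clock = clock                # injectable so the logic can be tested
        self._stamps = collections.deque()

    def wait_time(self):
        """Seconds to wait before the next request fits into the window."""
        now = self.clock()
        # Forget requests that have dropped out of the window.
        while self._stamps and now - self._stamps[0] >= self.window_s:
            self._stamps.popleft()
        if len(self._stamps) < self.max_requests:
            return 0.0
        return self.window_s - (now - self._stamps[0])

    def record(self):
        """Register that one request has just been sent."""
        self._stamps.append(self.clock())
```

A harvesting loop would call `wait_time()` before each request, sleep for the returned duration if it is positive, and call `record()` after sending the request.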
To comprehensively extract all venue data for the area of interest numerous requests need to be performed in parallel, each returning at most 30 entries. When implementing evenly distributed square request point grids, dense activity areas are insufficiently covered (i.e. more venues exist than can be extracted) while output in more remote areas shows largely redundant information (i.e. neighbouring request points returning identical venues). With the maximum number of requests per hour restricted to 5000 (preventing simple request grid densification) the spatial distribution of request points needs to be optimized, thus minimizing return duplicates while maximizing spatial reach. We developed a novel extraction method using hexagonal request grids to optimize spatial coverage. Implementing a system similar to the R-tree method (Guttman 1984), grid density is adapted in iterative steps to account for the spatially uneven distribution of venues. Additional request points are thereby placed in dense activity areas by creating smaller hexagons around every relevant point from the prior iteration (using half the distance and a 30° rotation). If all extracted venues (up to 30) for a certain request point were already covered by prior requests, hexagon densification around that point is stopped for the next iteration. For further optimization, low-profile venues with fewer than five total check-ins are omitted as insignificant throughout the process.
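The iterative densification can be illustrated with a small simulation. The Python sketch below is a simplified reading of the description above and not the implementation used in the study: `mock_query` is a hypothetical stand-in for an API call that returns the nearest venues, the venue data are synthetic, and all function names are assumptions. Only the limits of 30 venues per request and 5000 requests per hour, the threshold of five check-ins, the half distance, and the 30° rotation are taken from the text.

```python
import math
import random

MAX_RETURN = 30       # venues returned per request (from the text)
MAX_REQUESTS = 5000   # hourly request budget (from the text)
MIN_CHECKINS = 5      # venues below this total are omitted (from the text)


def mock_query(point, venues):
    """Hypothetical stand-in for one API call: indices of the nearest venues."""
    px, py = point
    ranked = sorted(
        range(len(venues)),
        key=lambda i: (venues[i][0] - px) ** 2 + (venues[i][1] - py) ** 2,
    )
    return ranked[:MAX_RETURN]


def hexagon_around(point, radius, rotation_deg):
    """Six vertices of a hexagon centred on point, rotated by rotation_deg."""
    px, py = point
    angles = [math.radians(rotation_deg + 60 * k) for k in range(6)]
    return [(px + radius * math.cos(a), py + radius * math.sin(a)) for a in angles]


def harvest(seed_points, spacing, venues, max_iterations=4):
    """Adaptive densification; venues are (x, y, total_checkins) tuples."""
    seen = set()
    requests = 0
    frontier = [(p, spacing, 0.0) for p in seed_points]
    for _ in range(max_iterations):
        next_frontier = []
        for point, dist, rot in frontier:
            if requests >= MAX_REQUESTS:
                return seen, requests
            returned = mock_query(point, venues)
            requests += 1
            kept = {i for i in returned if venues[i][2] >= MIN_CHECKINS}
            new = kept - seen
            seen |= new
            # Densify only where the request still revealed unseen venues.
            if new:
                for vertex in hexagon_around(point, dist / 2, rot + 30):
                    next_frontier.append((vertex, dist / 2, rot + 30))
        frontier = next_frontier
    return seen, requests


# Synthetic test area: one dense activity cluster plus sparse background venues.
rng = random.Random(0)
cluster = [(rng.gauss(0.5, 0.05), rng.gauss(0.5, 0.05), rng.randint(0, 50))
           for _ in range(400)]
background = [(rng.random(), rng.random(), rng.randint(0, 50)) for _ in range(200)]
venues = cluster + background
seeds = [(x / 2, y / 2) for x in range(3) for y in range(3)]
found, used = harvest(seeds, 0.5, venues)
print(len(found), "venues from", used, "requests")
```

In this sketch, request points in sparse areas stop spawning children as soon as they return only known venues, so the request budget is spent mainly in the dense cluster. Coincident vertices of neighbouring hexagons are not deduplicated here, which a production version would likely want to add.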
For the Lisbon study area we extracted approximately 40,000 venues from about 4000 request points using the described method. During testing, however, the effective number of successful requests per hour proved to actually be lower, likely due to performance restrictions imposed by the API. We therefore conducted the data harvesting in two parallel processes, each using its own developer account. Outputs of the two processes were then combined for the final compilation.
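Combining the outputs of parallel harvesting processes amounts to a merge keyed on venue identifier. The Python sketch below assumes a hypothetical record layout (a dictionary per venue with static `meta` and a `counts` mapping from request time to the number of users checked in); the actual data structures used in the study are not described in the text.

```python
def merge_harvests(*harvests):
    """Merge per-process outputs keyed by venue id (hypothetical record layout)."""
    merged = {}
    for harvest in harvests:
        for venue_id, record in harvest.items():
            # The first process to report a venue supplies its metadata.
            entry = merged.setdefault(
                venue_id, {"meta": record["meta"], "counts": {}}
            )
            # Check-in observations from all processes are pooled by timestamp.
            entry["counts"].update(record["counts"])
    return merged


process_a = {"v1": {"meta": {"category": "bar"}, "counts": {"20:00": 4}}}
process_b = {
    "v1": {"meta": {"category": "bar"}, "counts": {"21:00": 7}},
    "v2": {"meta": {"category": "cafe"}, "counts": {"20:00": 1}},
}
combined = merge_harvests(process_a, process_b)
```

Venues reported by both processes appear once in the result, with their time-stamped counts pooled.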
To integrate the extracted human activity information into a dynamic land use model we created spatially interpolated thematic time-dependent activity surfaces (see Fig. 1 for aggregated day and night activity in a generalized "events" category including restaurants, bars, and nightlife spots). These activity surfaces can subsequently be integrated with EO-based land cover maps, thus refining human use categories beyond the visible physical parameters.
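As an illustration of how point-based check-in counts can be turned into a continuous activity surface, the following Python sketch sums Gaussian-weighted counts on a regular grid. The interpolation method used for the study is not specified in the text, so the Gaussian kernel, the function name `activity_surface`, and all parameters are assumptions made for this example.

```python
import math


def activity_surface(venues, nx, ny, cell, bandwidth):
    """Kernel-weighted activity grid; venues are (x, y, count) tuples.

    The Gaussian kernel is an assumption made for this sketch only.
    """
    grid = [[0.0] * nx for _ in range(ny)]
    for row in range(ny):
        cy = (row + 0.5) * cell          # y coordinate of the cell centre
        for col in range(nx):
            cx = (col + 0.5) * cell      # x coordinate of the cell centre
            total = 0.0
            for x, y, count in venues:
                d2 = (x - cx) ** 2 + (y - cy) ** 2
                total += count * math.exp(-d2 / (2 * bandwidth ** 2))
            grid[row][col] = total
    return grid


# One surface per time slice, e.g. separate day and night check-in counts.
day_venues = [(1.5, 1.5, 10)]
night_venues = [(1.5, 1.5, 40), (0.5, 2.5, 15)]
day = activity_surface(day_venues, nx=3, ny=3, cell=1.0, bandwidth=1.0)
night = activity_surface(night_venues, nx=3, ny=3, cell=1.0, bandwidth=1.0)
```

Computing one grid per thematic category and time slice yields a stack of surfaces that can be overlaid on a land cover raster of the same cell size.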

What Is Next and Where Are We Headed?
Integrating VGI-based dynamic human activity data into land use mapping constitutes a major refinement over traditional static approaches (Jiang et al. 2015). While this chapter focuses on the human activity component, near real-time land use identification would in parallel require dynamic monitoring of the physical environment. The much reduced revisit and data acquisition times of novel satellite constellations, together with continuous data flow, allow quick access, while cloud processing environments such as Amazon Web Services, Google Cloud, or Microsoft Azure enable on-the-fly identification of rapid changes of the environment.
A major constraint of location-based social network (LBSN) data and VGI in general is their varying representativeness and partly low and uneven penetration rate across different sectors of society and regions. In Foursquare, while the completeness of the venue database is not considered to be affected, this is particularly relevant for venue-specific temporal considerations. Spatio-temporal integration of per se unrelated social media feeds like Facebook, Twitter and Instagram has potential to increase the reach in terms of social profiling and reduce bias effects. Service providers, however, are very hesitant in allowing such operations. Foursquare, for example, explicitly prohibits "combin[ing] or aggregate[ing] Foursquare location information with location data from other sources" (Foursquare 2017).
A glimpse at the future in terms of contextual real-time location tracking is provided by the recently activated live service of Google Maps "Popular Times". Google uses aggregated and anonymized data from users who have opted in to Google Location History to determine temporal usage profiles of specific locations. The new service feature integrates live visit data thus indicating relative current popularity compared to the usual trend (Aspinall 2016).
Combining live human activity patterns not impaired by LBSN penetration constraints with real-time mapping of land cover dynamics would imply the transition from static snapshots via time-dependent models to observation of current and ongoing activity flows and eventually predictive analytics of future patterns.
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.