Resolving the Problem of Density Information as an Overlying Layer
Compared to the map with point clouds in Fig. 1, the maps in Figs. 5, 6, 7, and 8 all provide a density overview of points with less display clutter and therefore much higher readability. Still, there is another major difference between the heatmap in Fig. 5 and the following three maps: on the heatmap, the density estimation surfaces noticeably overlay the basemap and urban geometry, making it rather inconvenient to relate spatial patterns of point density to the exact local infrastructure. A possible way to solve this problem is to lower the opacity of the density estimation layer.
While this allows establishing a relation between the density pattern and the underlying layers, significantly lowering the opacity also lowers the contrast between the point density estimation surfaces and the underlying layers, which in turn decreases the readability of the density values. The maps in Figs. 6, 7, and 8 address this problem by relating the point density information directly to the urban geometry features. As a result, these features can be clearly and quickly identified because they directly carry the information about the (estimated) number of points, i.e., georeferenced images with bicycle detections, related to each urban geometry object. Visualizing a layer with point density information above the urban geometry layer and visually relating the information between the layers is, in this case, no longer needed.
Furthermore, heatmaps still require visual exploration, while our methods also allow other ways of handling the data because the density-related information becomes an attribute appended to the geometry features dataset. This information can then be filtered, searched, further processed, or presented in additional ways (e.g., as a tabular list of subgeometry objects ordered by density values). Additionally, storing the density-related information in this manner allows the visualizations to be recreated more easily, which saves time given that not all heatmap production methods in GIS software produce output that can be stored as a permanent layer. In other words, the heatmap-creating tool would usually need to be re-run to re-create a visualization, which takes more time and effort.
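To illustrate how density stored as an attribute can be handled without any rendering step, the following minimal sketch filters and ranks subgeometry records. The record layout, the field names (`id`, `type`, `density`), and all values are assumptions made for illustration and do not reflect the actual schema of our dataset.

```python
# Hypothetical subgeometry records; field names and values are illustrative only.
subgeometries = [
    {"id": "road_12_a", "type": "road", "density": 4},
    {"id": "square_3_b", "type": "square", "density": 17},
    {"id": "road_12_b", "type": "road", "density": 9},
    {"id": "square_3_a", "type": "square", "density": 2},
]


def top_by_density(records, min_density=0, geom_type=None):
    """Keep records of the given type with at least `min_density`,
    then sort them by density in descending order."""
    selected = [
        r for r in records
        if r["density"] >= min_density
        and (geom_type is None or r["type"] == geom_type)
    ]
    return sorted(selected, key=lambda r: r["density"], reverse=True)


# Tabular list of subgeometry objects ordered by density value.
ranked = top_by_density(subgeometries, min_density=3)
for record in ranked:
    print(f'{record["id"]:<12} {record["density"]}')
```

The same attribute could equally be queried in a GIS attribute table or a spatial database; the point of the sketch is only that no heatmap tool has to be re-run to obtain the ranking.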
Resolving the Problem of Point Density Identification
The approach presented in Sect. 3.2 has a major advantage: it maintains the original geometries. Technically, the numbers of geometry objects and object edges stay the same as in the input dataset. This can, however, be limiting in the exploration of spatial patterns in some cases because the approach does not offer the possibility to separately detect important pattern-related micro-locations within objects. For small or short objects, this is not necessarily a significant issue, but it can be limiting for larger or longer objects whose parts may be classified differently according to the number of corresponding point features. As a result, no meaningful spatial patterns can be identified on those objects other than the general conclusion: the closer to the city center (or the more popular the place), the higher the number of points. This can be the case even for a dataset like the road network used in this paper, which consists of road segments split at road intersection points. Even with segmented roads, the critical segments to visualize are those within the city center, where there is a large fluctuation of people and, thus, multiple micro-locations exist where important point clusters (e.g., georeferenced posts) could occur. Not even normalizing the number of point detections over the object's area or length yields more helpful information: it only shows how frequented the urban geometry object is in general and does not allow a precise detection of all the important micro-locations.
In contrast, the approaches presented in Sect. 3.3 subdivide these geometry features, leading to a higher number of map objects and respective edges. Through this subdivision, they generally provide a higher level of detail regarding the identification of spatial patterns within geometries, which is especially meaningful for investigating large or elongated geometry objects. Using the models described in Sect. 3.3, we were able to differentiate density hotspots on all geometry objects. The major advantage of these approaches is that the point density patterns are clearly visible on polygonal objects like squares and pedestrian zones, an effect that can be observed in Fig. 9. The grid-based approach appears to be useful for the subdivision and classification of roads and other linear objects, providing a comprehensive subgeometry classification. It also allows visualizations of micro-locations for areas like squares or pedestrian zones (Fig. 9b). The advantage of the approach based on contour lines is that the point density patterns are very clearly visible on polygonal objects. The patterns take a smoother shape on larger areas, which may be more intuitive to interpret than the grid cells because of their high similarity to the zones of a heatmap (Fig. 9c). A drawback of the contour-line-based model in comparison to the grid-based one is that, on shorter linear objects, it tends to provide fewer details, meaning it results in a smaller number of subgeometry units.
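The difference between a single normalized value per object and a subdivided object can be illustrated with a small sketch. It assumes a hypothetical 1000 m road segment and made-up positions of matched points along it, and it subdivides the segment into fixed-length bins, which is a simplification and not the grid-based or contour-line-based model from Sect. 3.3.

```python
# Hypothetical data: positions (metres along the segment) of points matched
# to one 1000 m road segment. All values are illustrative only.
segment_length_m = 1000.0
point_positions_m = [12.0, 480.0, 485.0, 490.0, 495.0, 500.0, 505.0, 510.0, 950.0]

# Normalized count: a single value describing the whole object.
points_per_100m = len(point_positions_m) / segment_length_m * 100.0


def counts_per_bin(positions, length, bin_size):
    """Count points in consecutive bins of `bin_size` along the segment."""
    n_bins = int(-(-length // bin_size))  # ceiling division
    counts = [0] * n_bins
    for pos in positions:
        # Clamp so a point exactly at the end falls into the last bin.
        index = min(int(pos // bin_size), n_bins - 1)
        counts[index] += 1
    return counts


bins = counts_per_bin(point_positions_m, segment_length_m, bin_size=100.0)
print(f"points per 100 m (whole object): {points_per_100m:.2f}")
print(f"points per 100 m bin:            {bins}")
```

In this constructed example, the normalized value stays below one point per 100 m, while the binned counts reveal a cluster around the middle of the segment that the single value cannot locate.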
Influence of Input Data and Parameter Values on Visualizations
As a matter of course, the level of resulting detail for each of the visualization approaches is influenced by the input data and the respective parameter values. Using a dataset where the urban geometry is modeled differently than in the dataset used in this paper could have a major impact on the spatial precision and granularity of the visualized patterns: e.g., using road data from OSM, where driving lanes of opposite directions on wider roads are modeled separately, would be especially meaningful when there is a larger distance between road lanes, like a green area or a tram line between them. In such cases, increasing the granularity of the input data (i.e., having more objects and a higher cumulative object area) will lower the density values for some of the urban geometries when using the nearest neighbor relation. By contrast, an advantage of both approaches presented in Sect. 3.3 is that the local density is calculated independently of the urban geometry, and therefore the visualized density pattern is not affected when the input data change.
For all but the nearest neighbor approach, the parameter value selection has a crucial effect on the resulting map and has to be a task-related decision. When using the grid-based approach, the grid cell size, as well as the classification by which the cells are aggregated, influences the final result. The smaller the grid cell is, the more detailed the spatial pattern within the urban geometries will be. However, if the cell size is too small, a significant number of the grid cells might contain no points at all, which could lead to the point spread pattern being underrepresented on the urban geometry features because intersections of empty cells with urban geometry units will be more common. The result of the second step in Fig. 3 would then be comparable to a heatmap created using the uniform kernel function with a very small bandwidth. As described in Sect. 3.1, the bandwidth selection is crucial for KDE. For the approach that uses contour lines, the equidistance between the contour lines is a second important parameter. The larger the distance between the contour lines is, the less visible the spatial pattern will be in the resulting visualization.
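The effect of the cell size on the share of empty cells can be sketched as follows. The point coordinates, the extent, and the helper name `empty_cell_share` are hypothetical and chosen only for illustration; the sketch uses a plain square grid and does not reproduce the aggregation and intersection steps of our workflow.

```python
import math

# Hypothetical point coordinates inside a 100 x 100 extent (illustrative only).
points = [(5, 5), (15, 5), (55, 55), (60, 58), (62, 61), (95, 90), (30, 70), (72, 20)]
extent = (0, 0, 100, 100)  # min_x, min_y, max_x, max_y


def empty_cell_share(points, extent, cell_size):
    """Return the share of square grid cells in `extent` containing no point."""
    min_x, min_y, max_x, max_y = extent
    n_cols = max(1, math.ceil((max_x - min_x) / cell_size))
    n_rows = max(1, math.ceil((max_y - min_y) / cell_size))
    occupied = set()
    for x, y in points:
        # Clamp so points on the upper border fall into the last row/column.
        col = min(int((x - min_x) // cell_size), n_cols - 1)
        row = min(int((y - min_y) // cell_size), n_rows - 1)
        occupied.add((col, row))
    return 1 - len(occupied) / (n_cols * n_rows)


for size in (50, 25, 10):
    print(f"cell size {size:>2}: {empty_cell_share(points, extent, size):.2%} empty cells")
```

For this toy dataset, the share of empty cells rises sharply as the cell size shrinks, which is the situation in which intersections of empty cells with urban geometry units become more common.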
Introduction of Uncertainties
In this paper, we aggregated point data to improve the usability of the map, but this generalization step also introduced uncertainty into our approaches. For the grid-based approach, we discussed the influence of the grid cell size on the result in the previous section. Besides the size of the cells, their initial positioning could also have a major impact on the spatial patterns displayed on the map. For example, it could be interesting to research the effects of shifting the cells by 50% of their size in different directions. Our assumption is that, in some locations, the resulting density pattern could change significantly. Another approach could be to design an algorithm that automatically places the cells in a fashion that takes into account the constellation of the densest point clusters. The cells would then not be laid out strictly from border to border, starting from one side of the bounding box that surrounds the dataset, but would be positioned independently of a perfect fit within the bounding box, in a way that best covers the most populated locations.
In general, placing grid cells without taking into account the positions of dense point clusters potentially presents density borders as much more certain than they actually are, especially because edges are an important factor in the cognitive map reading process. When the cells are intersected with urban geometry objects, these uncertainties could increase even more. To reduce this edge-related bias, possible solutions (other than the one mentioned in the previous paragraph) would be to rotate the grid cells according to the major orientation of the road network in a city, or to use other shapes for grid cells, such as diamonds or hexagons, to see whether they better suit the urban geometry. Another solution to the edge-related bias, softer border shapes, is already built into the approach based on the contour lines of a heatmap because the kernel shapes prevent hard edges through the distance decay effect.
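The assumed sensitivity to the grid origin can be made concrete with a toy example. The coordinates below are hypothetical and deliberately constructed so that a small cluster straddles a cell border; the sketch is not an evaluation on our dataset, and the helper name `grid_counts` is introduced here for illustration only.

```python
import math
from collections import Counter

# Hypothetical points: a small cluster straddling the cell border at x = 10,
# plus one distant point. Values are illustrative only.
points = [(9.0, 7.0), (9.5, 7.0), (10.5, 7.0), (11.0, 7.0), (30.0, 7.0)]
cell_size = 10.0


def grid_counts(points, cell_size, offset=(0.0, 0.0)):
    """Count points per square grid cell for a grid whose origin is `offset`."""
    off_x, off_y = offset
    counts = Counter()
    for x, y in points:
        col = math.floor((x - off_x) / cell_size)
        row = math.floor((y - off_y) / cell_size)
        counts[(col, row)] += 1
    return counts


unshifted = grid_counts(points, cell_size)
# Shift the grid by 50% of the cell size in both directions.
shifted = grid_counts(points, cell_size, offset=(cell_size / 2, cell_size / 2))

print("max count per cell, original grid:", max(unshifted.values()))
print("max count per cell, shifted grid: ", max(shifted.values()))
```

With the original origin, the cluster is split across two cells; after the shift, the same points fall into a single cell, so the highest cell count doubles although the data are unchanged.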
Influence of Scale and Data Privacy
When searching for the best parameter values, the targeted map scale and its impact on the visualization also have to be considered. First, as we use point density to divide urban geometries, the size of the input geometries must be suitable for the targeted map scale, even after being divided into several smaller units. At this point, legibility constraints from map generalization could help to decide whether the input geometries are suitable (Stigmar and Harrie 2011). Second, the cell size of the grid-based approach, as well as the equidistance between the contour lines in the approach based on heatmap zones, has to be selected according to the map scale and depending on the task.
In this paper, we used an input dataset where each point represents the location of a social media post, which could possibly compromise its contributor's privacy due to georeferencing. Using privacy-sensitive data like social media data, and VGI in general, requires guaranteeing contributors' privacy even when the data are not directly linked to the original dataset that contains information on the contributors. In this regard, the resulting dataset of our approaches provides data privacy because the information on single points is lost when the points are aggregated into density surfaces that are assigned to the geometry objects. We also foresee the possibility of adapting the models of our approaches to process the privacy-aware data format proposed by Dunkel et al. (2020), as long as the spatial precision is applicable to the geometry used.