On Cycling Risk and Discomfort: Urban Safety Mapping and Bike Route Recommendations

Bike usage in Smart Cities becomes paramount for sustainable urban development. Cycling provides tremendous opportunities for a more healthy lifestyle, lower energy consumption and carbon emissions as well as reduction of traffic jams. While the number of cyclists increase along with the expansion of bike sharing initiatives and infrastructures, the number of bike accidents rises drastically threatening to jeopardize the bike urban movement. This paper studies cycling risk and discomfort using a diverse spectrum of data sources about geolocated bike accidents and their severity. Empirical continuous spatial risk estimations are calculated via kernel density contours that map safety in a case study of Zurich city. The role of weather, time, accident type and severity are illustrated. Given the predominance of self-caused accidents, an open-source software artifact for personalized route recommendations is introduced. The software is also used to collect open baseline route data that are compared with alternative ones that minimize risk or discomfort. These contributions can provide invaluable insights for urban planners to improve infrastructure. They can also improve the risk awareness of existing cyclists' as well as support new cyclists, such as tourists, to safely explore a new urban environment by bike.


Introduction
The use of bikes is transforming urban environments to Smart Cities [2,12] with the capacity to meet challenging sustainable development goals: cycling supports a more healthy lifestyle, it can decrease energy consumption and carbon emissions in urban centers, reduce traffic jams, limit the need to extensive car parking infrastructures and instead unfold opportunities for building parks, greenery and recreation areas. Bikes also introduce a new experience for tourists to explore a city. Bike sharing infrastructures massively expanding in urban centers show a climax of this transformation.
However, there is evidence that bike accidents are rising as well: While cycling traffic has increased by 35% since 2013 in Zürich, reported bike accidents have increased 1 by 60%. Cyclists' accidents are 5 to 6 times higher per travelled kilometer than that of car occupants [9] in Norway. It is predicted that the accident rate of cyclists is almsot 20 times higher than that of car occupants, when unrecorded bike accidents are considered [9]. Almost half million cyclists die every year in traffic accidents [11]. New insights about why bike accidents happen and how to decrease them are imperative for the adoption of bike as a predominant transport mean in sustainable Smart Cities.
This paper introduces a data-driven estimation of cycling risk and discomfort, which is used to map safety in the city of Zürich. The proposed estimation model is based on kernel density contours that can provide a continuous estimation of risk on the traffic network. The severity of the accidents, their causes, the role of the weather/seasonality as well as the day and time that accidents occur are expensively studied using a diverse spectrum of data sources from public authorities, health insurance policies, OpenStreetMap traces as well as typical routes collected from Zürich cyclists. The predominance of self-caused accidents indicate the potential to improve bike safety via personalized route recommendations generated by an open-source software artifact with which users can balance cycling safety and comfort.
Earlier work on bike safety studies environmental and demographic factors related to cycling safety, e.g. age, gender, daylight conditions and use of helmet [14,13]. Findings focus on the limited protection by helmets or the necessity that a child is developmentally ready for cycling. Data are mainly analyzed at an aggregate national level in the United States and location-specific risks are not taken into account.
Other work focuses on the design of risk metrics to measure safety such as the concept of exposure that accounts for the cycling distance and time spent before an accidents occurs [10]. The measure of exposure requires the choice of specific areas and times for modeling, in contrast to the approach introduced in this paper that can generalize the risk estimate to a continuous geographic spectrum. Moreover, the concept of exposure conveys the potential of an accident, while the risk estimate of this paper is explicitly based on actual accident data reported to official authorities. The relation between risk and exposure as well as other methodologies to model and measure cycling safety are reviewed extensively in a recent report by U.S. Department of Transportation [17], without though providing any quantitative data analysis as performed in this paper.
Other related work focuses on assessing the cycling routes safety in Berlin by counting the number of hot spots and dangerous intersections that a route contains [7]. No continues risk estimation is provided and all such route features are treated and counted equally. Spatio-temporal analysis frameworks of bike accidents are earlier introduced based on network-based kernel density estimation [3,9]. For instance, applicability in the city center of Vienna, Austria, reveals that bike accident hot spots vary in space according to season, light, and precipitation conditions, while these hot spots cluster by intersections and bus/tram/subway/bike stations [9]. Although the scope of this work is the closest to this paper, it neither covers route discomfort nor route recommendations.
The contributions of this paper are outlined as follows: (i) A continuous spatial risk and route discomfort estimation model for mapping cyclists' urban safety. (ii) A personalized route recommendation system that balances cycling risk and discomfort. (iii) The design of a novel data analytics pipeline to map cycling risk that combines processes receiving as input a diverse spectrum of data sources. (iv) Findings about the number of accidents, their severity and their causes in the area of Zürich, the influence of weather/seasonality as well as daily/weekly accident patterns. (v) An open-source software artifact 2 for the interactive collection of bike route data as well as the computation of personalized route recommendations. (vi) An open dataset [15] of cyclists' bike routes in the area of Zurich that can be used as baselines for multiobjective bike route optimization.
This paper is organized as follows: Section 2 and 3 introduce the spatial risk and route discomfort estimation models respectively. Section 4 introduces the concept of personalized route recommendations that balance safety and comfort. Section 5 illustrates the experimental methodology for the evaluation of the risk estimation model as well as a software artifact for data collection and bike route recommendations. Section 6 illustrates the findings of the performed data analysis. Finally, Section 7 concludes this paper and outlines future work.

Spatial Risk Estimation Model
This section introduces a general-purpose data-driven model for the spatial estimation of transport risk using geolocated traffic and accident data. Table 1 illustrates the main mathematical symbols and their data applicability. Risk is the conditional probability of involvement in a traffic accident A given the use of a particular transit method T . It is represented by a conditional probability density A|T as follows: where f A (x) and f T (x) are the respective marginal densities and f A,T (x) is the joint density. f A,T (x) can be viewed as a normalization or regularization of f A (x), which accounts for the traffic level at a certain location. In practice, the estimation of the conditional density, f A|T (x), is feasible with geolocated accident and transit data for the transit method T , representing samples from the joint distribution f A,T (x) and the marginal density f T (x) respectively. Given that the individual accident coordinates {x i } i=1,...,n are discrete points, a continuous and non-parametric density estimate can be calculated using kernel density estimation 3 (KDE) [6].
x T x 2h x T Transpose operation f Estimator of f -As s th partition of the accident data Accidents with severity s S Number of partitions 3 Given n points in d dimensions and a local density function, or kernel, the equation for estimating the density at a point x using the set {x i } i=1,...,n for a given kernel K parameterized locally by h is given as follows: where K h is the kernel function normalized such that integration over its local support is one, e.g. the Gaussian density: An isotropic zero-centered Gaussian kernel ensures that the local density estimation does not preferentially estimate densities in any transit direction. This is applicable if the orientation of all streets within a region is not known a priori. However, transit points can be spuriously related using such a kernel, e.g. accidents that occur on parallel but disconnected streets. The kernel density estimation assumes a homogeneous, i.e. linear, influence of the geolocated accident data that estimate the risk. However, additional meta-information can be employed to polarize the risk at certain accident locations, for instance, and assign a higher weight to accidents that result in a death than those resulting only in minor injuries. Assume that {x i } i=1,...,n can be partitioned into S meaningful subsets indexed by s. Classifying the geolocated accident data by severity gives subsets A s of size n s = |A s | labeled by meta-information about accident severity. The kernel density is reestimated in this case according to Corollary 1.

Corollary 1
The kernel density estimation of geolocated accident data classified in S subsets each with size n s is calculated as follows: wheref As is the kernel density estimated with the data of subset A s .
Proof The sum over all n elements can be expanded as the sum over all subsets and subset elements x sj as follows: The summands can be multiplied with 1 = ns ns . However, n s is constant within the second summand over A s and as such the numerator is moved outside the inner sum. Similarly, 1 n can be moved inside the first summand. Based on Equation 3 the kernel density estimation of partition s is derived: Partitions can be reweighted to reflect the consequence of an accident type, for instance insurance compensations for accidents of the S partitions, or the relative importance of the partitions S to a policy directive. The following equation: is such a generalization, where ns n corresponds to weights a s such that S s=1 a s = 1.

Route Discomfort Estimation Model
Along with the risk estimation, discomfort estimation can be used to assign a respective weight to each edge of the street network. The discomfort of a bike route is defined by the level of physical effort required to traverse the route by bike in terms of length and grade, i.e. slope. Distinguishing the contributions of the two variables with a closed expression is not straightforward. A vast amount of data and domain experience is required for designing a realistic model. The IBP index 4 is introduced in the context of bike route discomfort. It is generated by an algorithm that analyzes the difficulty of a mountain or bike route and it is backed by years of iteration between measurement and adjustment. This index is currently used by many associations and guides, for instance, the French Federation of Hiking. Although the IBP index is not based on a single closed expression given a series of corrections and fits accounting for multiple other parameters, it can be used to calibrate one such expression.
To make use of the the IBP index, different synthetic tracks of constant grade and specific length are generated and formatted in GPX files required for the input to the IBP index application. A linear dependency with length and an exponential one with grade is inferred after inspecting the outputs and the fitting curves. The discomfort expression is given in the following form: where d is the length of a street or route and x is the average grade: The expression is set constant below a -2.5% grade, as grade values lower than this threshold make no difference in effort for a cyclist. This limit is in line with the results from the IBP index calculator. Note that at zero grade, f (d, 0) ∝ d.

Personalized Route Reccomendation Based on Risk and Discomfort
Personalized route recommendations between a departure and destination point are calculated using Breadth First Search (BFS) over the street network with assigned weights on the streets. These weights are a cyclist's determined combination of the risk and discomfort estimates as follows: where w r , w d are the risk and discomfort weights respectively. An α = 0 prioritizes routes with minimal discomfort, while α = 1 prioritizes routes with minimal risk.

Data Science and Experimental Methodology
This section introduces a realization of the proposed risk estimation model. It also introduces a software artifact for bike route recommendations. Figure 1 outlines the bike riding risk assessment in Zürich city. A data science pipeline is designed that consists of the following stages: Stage 1: Extraction and categorization of bike accident data from the Swiss GeoAdmin 5 API 6 to obtain {x i } i=1,...,n = {A s } s=1,...,S . Stage 2: Extraction and processing of GPS trip traces from the OpenStreetMaps (OSM) API 7 . Stage 3: Extraction and processing of the Zürich street network from the Swiss GeoAdmin 5 API 6 . Stage 4: The kernel density estimation on traces and labeled accidents to calculate f T (x) and f As,T (x) ∀s. Stage 5: Calculation off As|T (x)∀s according to Equation 1 by taking the ratio off T (x) andf As,T (x)∀s.

Ratio of Densities
7) Figure 1: An overview of the data science pipeline for bike riding risk assessment.

Stage 1: accident data extraction
The Swiss Confederation 5 web portal has an interactive map of Switzerland with several spatial layers of publicly available data. One of these layers compiles and displays accidents involving bikes between 2011 and 2017. The data are collected by the Swiss Federal Roads Office from electronic police reports 9 . Together with the localization of the accidents, this layer provides an accident specification that includes the date and time, severity, cause, and street type. These features are visualized on the map, which also serves as the basis of an API service for batch data extraction 6 .
The API service 'identify' is used for data extraction. It generates and returns a list of at most 200 elements from a layer satisfying a geographic geometry specified using ESRI syntax 10 . For simplicity and scalability, the geometry used in this investigation is a bounding box specified by its horizontal and vertical extents as shown in Figure 2a. It is delimited 11  The 'severity' field from the extracted data is used to categorize the different accidents into light, severe and death injuries.

Stage 2: trip data extraction
Transit data from users' GPS traces are downloaded in XML format from Open-StreetMaps 7 and treated as a sample of f T (x). A 5% of the traces are labeled by means of transport, from which all non-bike traces are removed to improve the quality of estimation of f T (x). For unlabeled traces, travel homogeneity between methods of transit is assumed as streets included in the selected data window are primarily multi-use, and so a high volume of general usage implies high bike usage as well.
This assumption is also motivated by the very limited available data sources that can be used in the scope of this paper. For instance, Open Data Zürich 13 has more precise bike traffic data, but they lack resolution and cannot be used for kernel density estimation. Although the introduced assumption imposes limitations on the generalization of the performed analysis, a comparison betweenf A,T (x) andf A|T (x) in Figure 4 shows smooth adjustments rather than drastic changes to the general trends, suggesting that the data pattern is not lost due to the imposed imprecision. 12 To ensure all data points are obtained including those on the boundaries, the final extraction windows applied to each subdivision are selected with deliberate overlap at the boundaries. This process created duplicate records that are removed using the unique ID associated with each element. 13 Available at https://data.stadt-zuerich.ch/ (last access: May 2019).

Stage 3: street network data extraction
The Zürich street network required for f A|T (x) is extracted from the Swiss Confederation web portal 14 as shown in Stage 1. It is stored by the coordinates (latitude, longitude, altitude) of points along the city streets and the street segments linking these points. While the street points are not equidistant, they are always present at intersections. Moreover, the large number of points defining the street network makes computations, such as graph search used for route recommendations, very expensive. The computational load is reduced by keeping the intersection points only, which is roughly 10% of the total street data elements. After the data extraction and pre-processing, the extracted data are modeled as a graph structure. Each street network point is assigned to a node and the points are connected by the edges representing street segments. In terms of the accuracy in this process, the source data are already grouped into street segments and the coordinates match at the intersections, with some tolerance in the last decimal digits.

Stage 4-7: risk estimation
The kernel density estimation is performed using the kde2d function in the stats package 15    The density f As,T (x) of bike accidents is calculated for each severity level s. Similarly, f T (x) is estimated using the OpenStreetMap traces to finally calculate Equation 4. As the studied area is not perfectly square, a grid with 560 horizontal and 440 vertical divisions is imposed and estimations are made at the intersections of the grid lines. This asymmetry ensures that the evaluation positions are equispaced in terms of WGS84 coordinates of latitude and longitude.
To generate f R (x), a relative weighting of 1:6:6 for light injuries, severe injuries, and death respectively is used to recombine the partition densities. These weights are based on the following: (i) Insurance compensation policy data of 5'000:30'000:100'000 CHF for the respective accident severity levels 8 . (ii) The relative factor of 20 for death is scaled down to 6 to prevent over-polarized contours with extreme peaks.
Low traffic areas, as measured by the volume of OpenStreetMap traffic, may result in contour peaks despite a similar accident rate per trip for these areas, i.e. the ratio off As,T (x)/f T (x) from Equation 1. Non-normalized and normalized density contours are compared in Figure 4a and 4b. The contour peaks of the latter are less extreme than those in Figure 4a, while the dominant peaks remain the same and distinguishable in both versions, suggesting that normalization has a desirable effect.  Moreover, note that the estimation window in Figure 3 extends beyond the specified studied region. Density estimation has highly variable boundary behavior due to the abruptly exclusion of points at the window edges. This boundary effect, further exacerbated by using the ratio of the densities estimated over the window, results in spuriously peaked boundary estimates of f As|T (x). An extended window is introduced to estimate the densities restricted back and normalized to the studied region.
At the final stage, f R (x) is mapped to the street network using simple linear interpolation. The resulting normalized risk is plotted on a map of Zürich using the ggmap [8] and ggplot2 [18] packages in R. The interpolated risks on the street network are displayed in Figure 4c.
Immediately apparent is the relatively high risk in two vibrantly orange areas near Hardbrücke 16 and Langstrasse 17 . These areas are, by a wide margin, the most dangerous in Zürich and the magnitude of their risk makes visual risk inspection throughout the rest of Zürich challenging. A Box-Cox power transformation [5] with an exponent of 1 2 is applied to the data as shown in Figure 4d. The variation in risk is more visually apparent and so it is easier to distinguish higher and lower risk areas.
The risk estimation method illustrated in this paper relies on the quality of the reported accident data. However, it is likely that accidents are under-reported to police, especially those that do not result in injuries or property damage. The following reasoning is made about these unreported accidents: (i) As unreported accidents are expected to be of light severity, they are not expected to significantly increase the estimated risk values. Moreover cyclists are more likely to trust data that are directly linked to reported accidents and convey a consequence or threat. (ii) Under the assumption that unreported accidents are homogeneously distributed in the studied area, their influence on the estimated density contours is negligible.

A software artifact for personalized bike route recommendations
A software artifact is introduced to calculate route recommendations on a weighted graph extracted from the street network. The departure and destination points are matched to nodes of the graph. The matching is performed by the distance minimization between the points and nodes. The graph is implemented as a Python object and the software executes the BFT algorithm 18 as illustrated in Section 4.
A user interface with an interactive map is designed and implemented in the Python tkinter library as shown in Figure 8a. It is used to acquire the required user input to generate a personalized recommended route, i.e. the α weight, the departure and destination point. The user can click on the map to determine these points, as well as intermediate route points, whose latitude and longitude coordinates are computed in the background. The recommended route for a given α is displayed on the map. The user input and the calculated values, i.e. total risk, discomfort and the points that form the route, can be exported in a .txt file.
Two evaluation methods of the software artifact and the route recommendations are introduced: (i) 24 typical bike routes by 8 individuals, who cycle regularly in the studied area, are collected via the software artifact. Data collection is performed via the software artifact by clicking the interactive map several times to form a route that can be then exported as illustrated earlier. These routes are referred to as baseline routes and they are compared to the route recommendations for the same departure and destination points and different α values. (ii) A number of 2000 random departure and destination points is generated on the map. For each pair, three route recommendations are generated for α = 0, α = 0.5 and α = 0.75. Based on these generated routes, the utilization of the street network can be compared for each of these three cases. In this way, the overall risk and discomfort estimation that stems from the route recommendations is mapped on the studied region.

Experimental Evaluation
This section illustrates a data analysis of the accidents and evaluates the bike route recommendations.

Accident analysis
The total number of reported accidents in the specified region and time frame is 1305: 1023 light injuries, 277 severe injuries, and 5 deaths. Because of the low number of death events, they are included in the group of severe injuries. It is likely that circumstances separating a severe injury from a death may have little to do with accident features and more with the constitution of the victim. Figure 5a illustrates the yearly evolution of accidents. From 2013 to 2017, an increase of approximately a 50% is observed. Figure 5b shows the probability of accidents resulting in severe injuries. The values remain within the expected variation 19 .  The total number of accidents per month across all years vs. the average temperature per month 20 is shown in Figure 5c. That fewer accidents occur in colder months reflects the temporal transit patters of citizens in Zürich: choosing public transport or driving by car during the colder months of the year to avoid the discomfort of cycling in the cold. On the other hand, the severity of the accidents in relative number is higher during the winter months, as shown in Figure 5d. Given the highly diverse grade, i.e. slope, of the studied region, snow and frozen street surfaces in winter are likely to explain this observation. Precipitation also seems important. Summer shows on average up to 70% higher precipitation 20 than the months of March and November during which the lowest fractions of severe injuries are observed. Figure 6a illustrates the relation between the accidents and their causes. Selfcaused accidents are predominant, a 40% of of all accidents are self-caused. Additionally, they show the highest severity, 28%, as shown in Figure 6b. Head-on collisions and accidents on crossing lanes follow in severity, suggesting that intersections entail higher risk overall. Given the predominance of self-caused accidents, the potential to improve bike riding safety via warnings and route recommendations is apparent. Such risk communication can improve cyclist awareness and simultaneously contribute to greater cycling confidence in tourists and new cyclists.   Figure 6c illustrates the time of accident occurrences during weekdays, which show significantly more accidents than weekends. Weekday accidents usually happen early morning and late afternoon, suggesting that accidents happen during commuting times, i.e. home-work and vice versa. During weekends accidents mainly appear during the following times: (i) Saturday afternoon, probably corresponding to shopping/outings. (ii) Early morning hours on Saturday and Sunday, suggesting accidents related to poor visibility conditions, fatigue, and alcohol consumption. Figure 6d shows that the latter are the most severe ones.

Bike route recommendations: safety vs. discomfort
Compared to the bike routes of Google Maps, the recommended routes of the designed software artifact are similar in overall for α = 0 in Equation 8. Nevertheless, they are in general slightly shorter in distance and longer in time than those of Google Maps. This difference can be attributed to additional information potentially utilized by Google Maps, for instance information about the street infrastructure, drivers' and cyclists' route preference as well as traffic lights. Figure 7 shows the relative improvement of risk and discomfort between the 24 baseline routes and the recommended routes for different α values. A clear trade-off between safety and comfort is observed. Optimal values of α lie between 0.2 and 0.4. Figure 8b and 8c illustrate the changes in street utilization by increasing from α = 0 to α = 0.5 and α = 0.75 respectively, i.e. higher priority on safety is given to the recommended routes by BFS. The 2000 randomly generated departure and destination points are used for the mapping of street utilizations.
The colored maps show that areas such as Langstrasse 17 and Bahnhofstrasse 21 are avoided already for α = 0.5, while two cross-city routes become dominant. Relative improvement discomfort risk Figure 7: Average relative improvement of risk and discomfort between baseline and recommended routes. The light purple line is the mean improvement between the two estimates. It can be used to assess α values with a good balance between the two.

Conclusion and Future Work
This paper concludes that a data-driven approach for the estimation and mapping of cycling risk in complex evolving urban environments can provide invaluable empirical insights about safety. This is shown for the city center of Zürich, in which continuous risk contours are calculated based on historical geolocated accident data and information about their severity, linked to compensation policies of health insurances. Findings shows that bike accidents increase at a higher rate than bike use, while weather, seasonality, day of the week and time play a role on the likelihood of an accident and its severity. The predominance of self-caused accidents suggests the requirement for a higher awareness of risks and safe routing information. This requirement is met by personalized route recommendations that balance safety and comfort. The findings of this paper have an impact on the following: (i) The cyclists' risk awareness and safety improvement. (ii) Policy-making for improving transport infrastructure and encourage further the use of environmentally friendly transport means such as bikes by existing city habitants, new cyclists as well as tourists.
Future work includes the expansion of the risk and discomfort estimation with exposure and vibration measures [17,4], the study and comparison of several other cities [9], the influence of other traffic in cycling safety as well as the design of traffic simulation models for the participatory multi-objective optimization of traffic flows [16,1,12].