Introduction

Car crashes have long been recognised as having tremendous effects on economic development and human well-being across societies. Road traffic crashes frequently injure or kill the people involved: the number of crash deaths worldwide peaked at 1.35 million in 2016, with more than half of the fatalities aged between 15 and 44 (World Health Organization (WHO), 2018). London faces the same challenge, with around 80,000 reported collisions during 2017–2019, before the pandemic in 2020; people aged between 25 and 59 accounted for the most casualties, whilst those over 60 had a higher fatality rate. The issue was predicted to worsen alongside rapid population growth, as the Greater London Authority (GLA) projected a population of around 11 million in 2050, compared with the 8.17 million residents recorded in the 2011 census (GLA, 2020). On the other hand, the COVID-19 pandemic significantly influenced road traffic activity: the UK Office for National Statistics (ONS, 2020) reported that up to 40% of workers worked from home during the pandemic, which dramatically reduced traffic demand and is expected to have a long-term effect on people’s daily travel behaviour (Katrakazas et al., 2020) and transport mode preference (Batty, 2020). For example, Almansouri & Cetin (2021) found that commuters’ frequency of visits to shopping malls, and their preferred travel distances to them, changed significantly during the pandemic. Yasin et al. (2021) and Amberber et al. (2021) further showed that traffic collisions fell significantly during the pandemic due to changes in traffic volumes, driver behaviours, and travel behaviours. To investigate such changes in depth, spatial analysis is recognised as one of the most efficient methods (Kaya et al., 2019).
Therefore, studying road safety through the varied daily patterns during the target period is considered to offer a better basis for formulating car crash prevention strategies.

Empirical studies on crash distributions have identified spatial and temporal variations: crashes tend to occur more around workplaces than residential areas (Levine et al., 1995), and are thought to be affected by multiple factors, e.g., daily human activities and built environment factors (Ziakopoulos & Yannis, 2020). Land use types play an important role in estimating car crashes (Pulugurtha et al., 2013) and teen crash frequency (Mathew et al., 2022) at aggregated levels, which can provide valuable insights for designing intervention measures that aim to reduce the risk of crashes among vulnerable populations. However, land use characteristics are less useful for analysing car crashes in highly urbanised and cosmopolitan cities or regions, where a dense mosaic of land uses and strong spatial heterogeneity are expected. To address this issue at a finer spatial scale in urbanised regions, points of interest (POIs) have been used as a proxy for land use types in order to analyse the relationships between urban form and collision patterns (Worachairungreung et al., 2021). Besides, POIs could be more powerful in explaining how the harm from collisions varies by age group, because collision-vulnerable populations are normally defined by age group, and age groups differ in their concentration in proximity to POIs, their mobility patterns, and their travel behaviour and mode preferences (Pour et al., 2017). However, few studies have examined the interactive effect between multiple POIs and traffic crashes by age group, making the topic worthy of further exploration.

Motivated by the need to improve road safety intelligently and support traffic recovery, this study aims to identify spatiotemporal patterns of age-categorised traffic crashes, and further to investigate the factors influencing more severe crashes before and during the pandemic. To gain richer knowledge of how traffic crashes vary among age groups in London before and during the pandemic, two main objectives are addressed:

  • The spatiotemporal pattern and varied recognition of car crashes in two different representative years, and the underlying influences from POI factors.

  • The spatiotemporal casualty patterns by age group, as well as the effects from people’s varied daily activities pre- and during the pandemic.

To realise these objectives, this study will (1) analyse the spatiotemporal patterns of car crashes considering injury severity; (2) quantify their relationships with human activities, represented by POI categories, through the geographical detector; (3) extract the spatial and temporal crash patterns of victims by age group; and (4) measure the relationship between their daily activities and the corresponding patterns of car crashes, based on the road safety data in 2019 and 2020, respectively.

Literature Review

The COVID-19 pandemic had a significant impact on road safety and vehicle collision patterns, including changes in their frequency and severity. Specifically, there was a notable decline in the number of vehicle collisions and associated injuries due to reduced traffic volume and remote replacements for social activities (Cappellari & Weber, 2022; Sutherland et al., 2020). In contrast, the severity of the injuries and deaths resulting from those collisions has relatively increased (Yasin et al., 2021). However, the nexus among different factors, especially those related to casualties, needs to be further analysed to fully understand these changes. This understanding is crucial for policymakers and experts to develop effective strategies for road safety.

These pattern changes and their factors can be investigated through various methods. Two potential approaches are spatial analysis and factor analysis. Spatial analysis techniques examine the first-order effect of car crashes, i.e., the detection of crash hot spots, and the second-order effect, i.e., the detection of spatial clusters, with Kernel Density Estimation (KDE) as a widely adopted approach. In order to map traffic crashes, Okabe et al. proposed network-based KDE (NetKDE), which estimates crash density along the roadway, and developed the SANET toolbox; an increasing number of studies on car crashes have since used the NetKDE approach (Xie & Yan, 2008; Zahran et al., 2021). Besides, Koloushani et al. (2022) found, using hot spots of crashes involving young drivers, that NetKDE performs better than planar KDE when the focus is localised. The second-order effect has traditionally been measured by the widely recognised Moran’s I index, which estimates the strength of spatial autocorrelation and reveals statistically significant risk clusters. For example, Iyanda (2019) explores the distribution of accident severity, and Shabanikiya et al. (2020) identify high-risk areas of pedestrian crashes involving children.

To investigate the influence of environmental factors on traffic crashes, factor analysis has been the mainstream approach, with car crashes treated as point events and count data models as the basic methods that accommodate spatial heterogeneity. Examples include the spatial zero-inflated negative binomial model (Champahom et al., 2020) and the fixed-bandwidth geographically and temporally weighted ordered logistic regression model (Chen et al., 2022). However, these models fail to capture the dynamic interactions between selected factors and traffic crashes. To fill this gap, the geographical detector proposed by Wang et al. (2016) has been considered, as it excels at measuring the influence of factors on geographical phenomena based on spatial stratified heterogeneity analysis. It has been applied extensively in varied research fields, such as housing prices (Wang et al., 2017) and air pollution (Ding et al., 2019), as well as in car crash analysis (Zhang et al., 2020), which assessed the influence of injury factors on casualties considering their mutual interaction. Nevertheless, the interactions of factors for road safety remain less explored.

In parallel, spatiotemporal data mining has been regarded as a more efficient way of exploring patterns in spatiotemporal data (Han et al., 2012) than traditional spatial analysis methods. Within spatiotemporal data mining and urban computing (Zheng et al., 2014a), the most popular approach is tensor decomposition (Kolda & Bader, 2009). Tensor decomposition has the advantages of modelling multidimensional data and handling data sparsity, with example applications in analysing human mobility from origin-destination (OD) matrix data and decomposing the spatiotemporal patterns of human mobility by measuring the variability of multidimensional mobility patterns (Sun & Axhausen, 2016; Yao et al., 2015). Zheng et al. (2014b) also applied it to the analysis of sparse noise data in New York City, and tried to recover the noise level by combining it with POI, road network, and social media data. However, very limited literature has applied it to traffic crash research.

In a nutshell, the limitations of empirical studies (summarised in Table 1) give this study the opportunity to improve on them by combining spatiotemporal data mining with spatial analysis and factor analysis techniques, and applying them to age-categorised traffic crashes along three dimensions (time, space, and casualty severity by age group), so as to highlight the age-specific effects of traffic crashes.

Table 1 Limitations of current studies

Data and Methodology

London, as the capital of the United Kingdom, has been chosen as the study area with its 4,835 Lower Super Output Area (LSOA) units (GLA, 2014), where each has an average population of 1500 (Green et al., 2011).

Data Acquisition

Road safety data for London were collected from GOV.UK in point format at LSOA scale, covering 01 January to 30 December 2019 and 01 January to 29 December 2020, with detailed information on 29,023 recorded car crashes in 2019 and 23,551 in 2020 (Table 2). Among the records, 84.47% in 2019 and 85.3% in 2020 were slight, whereas 0.57% in 2019 and 0.53% in 2020 were fatal. With reference to the Department for Transport (DfT)’s 11 age-group categories for casualties, this study regrouped them into seven age groups for research convenience, as listed in Tables 2 and 3.

Table 2 Car crashes and casualties by age group (2020 vs. 2019)
Table 3 Statistical summary of POI categories in London

It can be seen from Table 2 that car crashes and casualties in 2020, during the pandemic, decreased from their 2019 levels but kept a similar distribution across age groups; the frequency of crashes, overall and by severity, decreased proportionally across age groups relative to 2019. Since this study aims to explore the patterns and mechanisms of safer traffic recovery from pandemics, 2019 and 2020, as the representative normal and abnormal years respectively, are sufficient to reveal the patterns and hence shed light on strategies for traffic recovery from 2020 back to normal. The following analysis therefore focuses on 2019 and 2020 to explore the patterns in detail.

POI datasets in this study were collected from the Ordnance Survey and consist of geospatial information on all London businesses, entertainment venues, educational institutions, etc. in 2019, in ten categories: accommodation, eating and drinking, attractions, commercial services, sport and entertainment, education and health, public infrastructure, transport, manufacturing and production, and retail, as summarised and represented by relevant icons in Table 3.

Severity-weighted Index

The severity-weighted index (SWI) was calculated to indicate the seriousness of crashes rather than purely their volume, based on the combined 5-3-1 weighting system (Geurts et al., 2004). The system was developed and deployed by the Flemish government in Belgium to identify dangerous accident locations. Based on sensitivity analysis, Geurts et al. (2004) showed that it is a moderate approach that highlights the importance of deadly accidents whilst capturing the overall severity of injuries resulting from collisions, with each fatal injury weighted at 5, each serious injury at 3, and each slight injury at 1, as in Eq. (1):

$$\mathrm{ SWI}= 5\times {\mathrm{x}}_{\mathrm{fatal}}+3\times {\mathrm{x}}_{\mathrm{serious}}+{\mathrm{x}}_{\mathrm{slight}}$$
(1)

where \({\mathrm{x}}_{\mathrm{fatal}}\), \({\mathrm{x}}_{\mathrm{serious}}\) and \({\mathrm{x}}_{\mathrm{slight}}\) denote the total numbers of fatal, serious, and slight traffic crashes, respectively.
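
As a small illustration of Eq. (1), the following Python sketch computes the SWI for a single spatial unit. The function name and the counts are hypothetical and serve only to show the arithmetic; they are not taken from the study's data.

```python
def severity_weighted_index(n_fatal, n_serious, n_slight):
    """Eq. (1): SWI = 5 * fatal + 3 * serious + 1 * slight."""
    return 5 * n_fatal + 3 * n_serious + n_slight


# Hypothetical counts for one LSOA (illustrative only).
swi_example = severity_weighted_index(n_fatal=1, n_serious=2, n_slight=10)
print(swi_example)  # 5*1 + 3*2 + 10 = 21
```

Under this weighting, one fatal crash contributes as much to the index as five slight crashes, which is why the SWI can rank locations differently from raw crash counts.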

Spatial Analysis

Spatial analysis of crash distributions is conducted on the basis of road networks and planar maps at LSOA level, to discover the first-order and second-order effects through NetKDE and Moran’s I statistics, respectively.

Hot spots Detection Based on the NetKDE

NetKDE, which targets point events in a network space, is an extension of planar KDE with two main distinguishing features (Xie & Yan, 2008): (1) the network distance \({\mathrm{d}}_{\mathrm{is}}\) from crash point \(\mathrm{i}\) to location \(\mathrm{s}\) is calculated as the shortest-path distance along the road network; and (2) the density estimator is computed per linear unit, as defined in Eq. (2):

$$\uplambda \left(\mathrm{s}\right)=\frac{1}{\mathrm{r}}\sum\nolimits_{\mathrm{i}=1}^{\mathrm{n}}{\mathrm{w}}_{\mathrm{i}}\cdot \mathrm{K}(\frac{{\mathrm{d}}_{\mathrm{is}}}{\mathrm{r}})$$
(2)

where \(\uplambda \left(\mathrm{s}\right)\) is the density estimator at location \(\mathrm{s}\); \(\mathrm{r}\) is the bandwidth of the KDE; and \({\mathrm{w}}_{\mathrm{i}}\) is the weight of crash point \(\mathrm{i}\). \(\mathrm{K}\) is the kernel function of the distance \({\mathrm{d}}_{\mathrm{is}}\). The method assumes that crashes occur along roadways. Crash records therefore need to be snapped onto the network within a 10-m road network distance, because crashes may not be recorded exactly on the road network owing to issues such as the low precision of the measurement system or a car moving right after a crash.

Xie & Yan (2008) also pointed out that the network needs to be divided into equal-length linear units, the length of which may affect the local variation details. Considering London’s vast road network, this study selects a 200-m road segment as the basic unit, to reduce the number of linear units and achieve a good balance between computational efficiency and accuracy. The choice of bandwidth also affects the performance of network KDE. Okabe & Sugihara (2012) suggested an optimal bandwidth of 100–300 m, so this study adopts a bandwidth of 200 m with a Gaussian kernel function. It is hypothesised that network KDE can mitigate the drawback of planar KDE (Xie & Yan, 2008) of over-detecting clustered point events, hence enhancing the accuracy of density estimation at intersections.
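
The estimator in Eq. (2) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the SANET implementation: the road network is a hypothetical toy graph stored as an adjacency dictionary, crashes are assumed to be already snapped to network nodes, the kernel is an untruncated standard Gaussian, and no special treatment of intersections is applied.

```python
import heapq
import math


def shortest_path_lengths(graph, source):
    """Dijkstra over an adjacency dict {node: {neighbour: length_in_metres}}."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, math.inf):
            continue  # stale heap entry
        for nbr, length in graph[node].items():
            nd = d + length
            if nd < dist.get(nbr, math.inf):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return dist


def gaussian_kernel(u):
    """Standard Gaussian kernel K(u)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def network_kde(graph, s, crashes, r=200.0):
    """Eq. (2): lambda(s) = (1/r) * sum_i w_i * K(d_is / r).

    crashes is a list of (node, weight) pairs; d_is is the network
    (shortest-path) distance from crash node i to location s.
    """
    dist = shortest_path_lengths(graph, s)
    total = 0.0
    for node, weight in crashes:
        if node in dist:  # skip crashes on disconnected parts of the network
            total += weight * gaussian_kernel(dist[node] / r)
    return total / r


# Hypothetical toy network (edge lengths in metres) and weighted crashes.
toy_graph = {
    "a": {"b": 100.0},
    "b": {"a": 100.0, "c": 150.0},
    "c": {"b": 150.0, "d": 400.0},
    "d": {"c": 400.0},
}
toy_crashes = [("a", 1), ("c", 3), ("d", 5)]
density_at_b = network_kde(toy_graph, "b", toy_crashes, r=200.0)
```

Using the crash weight \({\mathrm{w}}_{\mathrm{i}}\) to carry a severity score is one possible choice; the sketch treats the weights as given.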

Spatial Clustering Identification using Moran’s I Statistics

Spatial autocorrelation can reflect the strength of the spatial dependence between factors, so Moran’s I statistics can be used to identify clusters with similar SWIs or the dispersion of those with dissimilar SWIs. The global Moran’s I statistic, denoted as \(\mathrm{I}\), can be computed as in Eq. (3):

$$\mathrm{ I}=\frac{\mathrm{n}\sum_{\mathrm{i}=1}^{\mathrm{n}}\sum_{\mathrm{j}=1}^{\mathrm{n}}{\mathrm{w}}_{\mathrm{ij}}{\mathrm{d}}_{\mathrm{i}}{\mathrm{d}}_{\mathrm{j}}}{\sum_{\mathrm{i}=1}^{\mathrm{n}}\sum_{\mathrm{j}=1}^{\mathrm{n}}{\mathrm{w}}_{\mathrm{ij}}\sum_{\mathrm{i}=1}^{\mathrm{n}}{{\mathrm{d}}_{\mathrm{i}}}^{2}}$$
(3)

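where \(\mathrm{n}\) is the total number of LSOA units; \({\mathrm{d}}_{\mathrm{i}}\) and \({\mathrm{d}}_{\mathrm{j}}\) are the deviations of the \(\mathrm{i}\)-th and \(\mathrm{j}\)-th LSOA’s SWI from the mean, \({\mathrm{d}}_{\mathrm{i}}={\mathrm{SWI}}_{\mathrm{i}}-\overline{\mathrm{SWI}}\) and \({\mathrm{d}}_{\mathrm{j}}={\mathrm{SWI}}_{\mathrm{j}}-\overline{\mathrm{SWI}}\); and \({\mathrm{w}}_{\mathrm{ij}}\) is the spatial weight between units \(\mathrm{i}\) and \(\mathrm{j}\), which represents the relationship between research units under the Queen criterion. At the local level, the local indicator of spatial association (LISA), which measures the degree of local spatial autocorrelation of SWIs, can be expressed as in Eq. (4):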

$${\mathrm{LISA}}_{\mathrm{i}}={\mathrm{z}}_{\mathrm{i}}\sum\nolimits_{\mathrm{j}}{\mathrm{w}}_{\mathrm{ij}}{\mathrm{z}}_{\mathrm{j}},\quad {\mathrm{z}}_{\mathrm{i}}=\frac{{\mathrm{SWI}}_{\mathrm{i}}-\overline{\mathrm{SWI}}}{\sqrt{\frac{1}{\mathrm{n}}\sum\nolimits_{\mathrm{k}=1}^{\mathrm{n}}{\left({\mathrm{SWI}}_{\mathrm{k}}-\overline{\mathrm{SWI}}\right)}^{2}}}$$
(4)

where the High-High (HH) cluster indicates high-valued units clustered with high-valued neighbours; the Low-Low (LL) cluster indicates low values clustered with each other; and the High-Low (HL) and Low-High (LH) clusters are high values surrounded by low values and vice versa.
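
The two statistics in Eqs. (3) and (4) can be illustrated with a short Python sketch. It assumes binary contiguity weights supplied as a hypothetical neighbour list (in practice derived from Queen contiguity of the LSOA polygons), uses made-up SWI values for four units arranged in a row, and omits the permutation test that would normally be used to assess significance.

```python
import math


def global_morans_i(values, neighbours):
    """Eq. (3) with binary weights: w_ij = 1 if j is listed as a neighbour of i."""
    n = len(values)
    mean = sum(values) / n
    d = [v - mean for v in values]
    weight_sum = sum(len(nbrs) for nbrs in neighbours)
    cross = sum(d[i] * d[j] for i, nbrs in enumerate(neighbours) for j in nbrs)
    return n * cross / (weight_sum * sum(di * di for di in d))


def local_morans_i(values, neighbours):
    """Eq. (4): LISA_i = z_i * sum_j w_ij z_j, plus the quadrant label.

    The first letter of the label is the unit itself (H or L) and the
    second letter is its spatial lag, giving HH, LL, HL or LH.
    """
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    z = [(v - mean) / sd for v in values]
    results = []
    for i, nbrs in enumerate(neighbours):
        lag = sum(z[j] for j in nbrs)
        quadrant = ("H" if z[i] >= 0 else "L") + ("H" if lag >= 0 else "L")
        results.append((z[i] * lag, quadrant))
    return results


# Hypothetical SWI values for four units in a row: 0 - 1 - 2 - 3.
toy_swi = [10, 8, 2, 1]
toy_neighbours = [[1], [0, 2], [1, 3], [2]]
toy_global_i = global_morans_i(toy_swi, toy_neighbours)
toy_lisa = local_morans_i(toy_swi, toy_neighbours)
```

In this toy case the high values sit next to each other and so do the low values, so the global statistic is positive and the local labels split into an HH pair and an LL pair.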

Geographical Detector for Factor Analysis

Having explored the spatial clustering of crashes, it is necessary to quantify, through geographical detectors, the influence on traffic crashes of POI factors, which reflect daily human activities, for in-depth interpretation. In this study, it is assumed that if road traffic crashes are affected by POI categories, the two may follow a similar spatial distribution pattern. Among the four geographical detectors, namely the factor detector, interaction detector, ecological detector, and risk detector, the former two are deployed in this study to measure the nonlinear or linear relationship between factors. The factor detector is used to find the dominant factor of the geographical phenomenon, while the interaction detector is applied to measure the interconnected effects of pairs of factors on crash distributions.

The factor detector computes the value of the power determinant (PD) through the q statistic, which denotes the proportion of the target variable explained by another variable and is analogous to the R-squared statistic of regression models, as in Eq. (5):

$$\mathrm{ PD}=1-\frac{1}{\mathrm{N}{\upsigma }^{2}}\sum \nolimits_{\mathrm{h}=1}^{\mathrm{l}}{\mathrm{N}}_{\mathrm{h}}{\upsigma }_{\mathrm{h}}^{2}$$
(5)

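where \(\mathrm{N}\) denotes the number of LSOA units; \({\mathrm{N}}_{\mathrm{h}}\) is the number of units in stratum \(\mathrm{h}\) (\(\mathrm{h}=1,\cdots ,\mathrm{l}\)); and \({\upsigma }^{2}\) and \({\upsigma }_{\mathrm{h}}^{2}\) represent the variance of the dependent variable over the whole study area and within stratum \(\mathrm{h}\), respectively. The PD value indicates the strength of the spatially stratified heterogeneity of crash distributions and the contribution of the POIs to their spatial pattern.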

The interaction detector estimates the interaction between two individual factors. If the PD value of the interaction is greater than the sum of the PD values of the individual effects, the interaction significantly enhances their influence. This study builds the model with the number of POIs of each category in each LSOA as the explanatory variables. Because the detectors require numeric variables to be stratified, Jenks’s natural breaks (JNB) method is applied to determine the optimal classification of each POI category, minimising the variance within classes while maximising it between classes (Chen et al., 2013).
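
To make Eq. (5) and the interaction rule concrete, the sketch below computes PD for a single stratified factor and for the overlay of two factors. It is a minimal illustration: the strata labels are assumed to be given already (the Jenks classification step is not reproduced), the function names are hypothetical, and the eight data values are made up so that neither factor alone explains the outcome while their combination does.

```python
def population_variance(xs):
    """Variance with denominator N, matching sigma^2 in Eq. (5)."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)


def pd_value(y, strata):
    """Eq. (5): PD = 1 - sum_h(N_h * sigma_h^2) / (N * sigma^2)."""
    groups = {}
    for value, label in zip(y, strata):
        groups.setdefault(label, []).append(value)
    within = sum(len(g) * population_variance(g) for g in groups.values())
    return 1.0 - within / (len(y) * population_variance(y))


def interaction_pd(y, strata_a, strata_b):
    """PD of the overlay of two stratifications (each distinct pair is a stratum)."""
    return pd_value(y, list(zip(strata_a, strata_b)))


# Hypothetical SWI values for eight units and two stratified POI factors.
toy_y = [1, 2, 10, 11, 10, 11, 1, 2]
toy_a = [0, 0, 0, 0, 1, 1, 1, 1]
toy_b = [0, 0, 1, 1, 0, 0, 1, 1]

pd_a = pd_value(toy_y, toy_a)
pd_b = pd_value(toy_y, toy_b)
pd_ab = interaction_pd(toy_y, toy_a, toy_b)
# Enhancement in the sense described above: joint PD exceeds the sum.
enhanced = pd_ab > pd_a + pd_b
```

The toy data are deliberately extreme; with real POI counts the individual PD values are rarely zero, and the comparison of the joint PD against the sum is what distinguishes an enhancing interaction.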

Spatiotemporal Pattern Detection using Tensor Decomposition

Tensor decomposition excels at analysing multidimensional road safety data for patterns in time and space, and can therefore identify the crash scenarios to which the geographical detector methods are then applied. For example, if a crash pattern involving minors during rush hours is identified together with a high PD value for education POIs, this indicates a scenario involving students and rush hours.

Tensor Construction and Decomposition

An Nth-order tensor is a multidimensional array over N vector spaces: a vector is a first-order tensor (N = 1) and a matrix is a second-order tensor (N = 2), while three- or higher-dimensional data are represented by tensors of higher order. For example, three-dimensional data can be written as a tensor \(\mathrm{A}\in {\mathrm{R}}^{{\mathrm{I}}_{1}\times {\mathrm{I}}_{2}\times {\mathrm{I}}_{3}}\) whose three dimensions have sizes \({\mathrm{I}}_{1}\), \({\mathrm{I}}_{2}\), and \({\mathrm{I}}_{3}\). To extract and discover latent patterns, approaches such as singular value decomposition (SVD), latent factor modelling (LFM), and principal component analysis (PCA) can be used. However, such methods may cause information loss due to corrupted data, which makes tensor decomposition stand out with much better performance (Kolda & Bader, 2009). The most efficient tensor decomposition method is Tucker decomposition (Tucker, 1966), which decomposes the tensor \(\mathrm{A}\in {\mathrm{R}}^{{\mathrm{I}}_{1}\times {\mathrm{I}}_{2}\times \cdots \times {\mathrm{I}}_{\mathrm{N}}}\) into a core tensor \({\mathrm{G}\in \mathrm{R}}^{{\mathrm{J}}_{1}\times {\mathrm{J}}_{2}\times \cdots \times {\mathrm{J}}_{\mathrm{N}}}\) and a set of factor matrices \({{\mathrm{A}}^{(1)}\in \mathrm{R}}^{{\mathrm{I}}_{1}\times {\mathrm{J}}_{1}}\), \({{\mathrm{A}}^{(2)}\in \mathrm{R}}^{{\mathrm{I}}_{2}\times {\mathrm{J}}_{2}}\), \(\cdots\), and \({{\mathrm{A}}^{(\mathrm{n})}\in \mathrm{R}}^{{\mathrm{I}}_{\mathrm{n}}\times {\mathrm{J}}_{\mathrm{n}}}\). It should be noted that the size of the core tensor must be defined manually, which may affect the performance of the model in Eq. (6).

$$\mathrm{ A}\cong \left[\mathrm{G};{\mathrm{A}}^{(1)}, {\mathrm{A}}^{(2)},\cdots , {\mathrm{A}}^{(\mathrm{n})}\right]=\mathrm{G}{\times }_{1}{\mathrm{A}}^{(1)}{\times }_{2}{\mathrm{A}}^{(2)}\cdots {\times }_{\mathrm{n}}{\mathrm{A}}^{(\mathrm{n})}$$
(6)

where \({\times }_{\mathrm{n}}\) denotes the n-mode product of a tensor \(\mathrm{G}\) with a matrix \(\mathrm{A}\), i.e., multiplying the tensor by the matrix along its n-th mode, as defined in Eq. (7):

$${\left[\mathrm{G}{\times }_{\mathrm{n}}\mathrm{A}\right]}_{{\mathrm{j}}_{1},\cdots ,{\mathrm{j}}_{\mathrm{n}-1},\mathrm{i},{\mathrm{j}}_{\mathrm{n}+1},\cdots ,{\mathrm{j}}_{\mathrm{N}}}=\sum\nolimits_{{\mathrm{j}}_{\mathrm{n}}=1}^{{\mathrm{J}}_{\mathrm{n}}}{\mathrm{G}}_{{\mathrm{j}}_{1},\cdots ,{\mathrm{j}}_{\mathrm{n}-1},{\mathrm{j}}_{\mathrm{n}},{\mathrm{j}}_{\mathrm{n}+1},\cdots ,{\mathrm{j}}_{\mathrm{N}}}{\mathrm{A}}_{\mathrm{i},{\mathrm{j}}_{\mathrm{n}}},\quad \mathrm{G}{\times }_{\mathrm{n}}\mathrm{A}\in {\mathrm{R}}^{{\mathrm{J}}_{1}\times \cdots \times {\mathrm{J}}_{\mathrm{n}-1}\times \mathrm{I}\times {\mathrm{J}}_{\mathrm{n}+1}\times \cdots \times {\mathrm{J}}_{\mathrm{N}}}$$
(7)
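
The n-mode product of Eq. (7) and the reconstruction in Eq. (6) can be sketched for the third-order case in plain Python. This is an illustrative sketch only: tensors are stored as nested lists, modes are indexed from 0 rather than 1, and the core tensor and factor matrices are hypothetical toy values rather than fitted ones.

```python
def mode_n_product(tensor, matrix, mode):
    """n-mode product of a 3rd-order tensor with a matrix (Eq. 7).

    tensor: nested lists with shape (J1, J2, J3)
    matrix: nested lists with shape (I, J_mode)
    mode:   0, 1 or 2 (zero-based index of the mode being multiplied)
    """
    dims = [len(tensor), len(tensor[0]), len(tensor[0][0])]
    out_dims = list(dims)
    out_dims[mode] = len(matrix)
    out = [[[0.0] * out_dims[2] for _ in range(out_dims[1])]
           for _ in range(out_dims[0])]
    for a in range(out_dims[0]):
        for b in range(out_dims[1]):
            for c in range(out_dims[2]):
                idx = [a, b, c]
                i = idx[mode]  # row of the matrix for this output entry
                total = 0.0
                for j in range(dims[mode]):  # sum over j_n
                    idx[mode] = j
                    total += tensor[idx[0]][idx[1]][idx[2]] * matrix[i][j]
                out[a][b][c] = total
    return out


def tucker_reconstruct(core, factors):
    """Eq. (6): apply each factor matrix along its own mode of the core."""
    result = core
    for mode, factor in enumerate(factors):
        result = mode_n_product(result, factor, mode)
    return result


# Hypothetical 1 x 1 x 1 core and factor matrices of shapes 2x1, 2x1, 3x1.
toy_core = [[[2.0]]]
toy_factors = [
    [[1.0], [2.0]],
    [[1.0], [3.0]],
    [[1.0], [0.0], [1.0]],
]
toy_reconstruction = tucker_reconstruct(toy_core, toy_factors)  # shape 2 x 2 x 3
```

With a single-entry core, every reconstructed value is simply the core value times one entry from each factor matrix, which makes the result easy to check by hand.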

Nonnegative Tucker Decomposition

Tucker decomposition also has the drawback that its results may contain negative elements, which conflicts with the fact that the SWI of crash data is never negative. This calls for nonnegative Tucker decomposition (NTD) (Shashua & Hazan, 2005), which finds a nonnegative core tensor \(\mathrm{G}\) and nonnegative matrices \({\mathrm{A}}^{(1)}\), \({\mathrm{A}}^{(2)}\), \(\cdots\), \({\mathrm{A}}^{(\mathrm{n})}\), and is formulated as the optimisation problem in Eq. (8):

$$\mathrm{ minimize }\frac{1}{2}{\Vert \mathrm{G}{\times }_{1}{\mathrm{A}}^{(1)}{\times }_{2}{\mathrm{A}}^{(2)}\cdots {\times }_{\mathrm{n}}{\mathrm{A}}^{\left(\mathrm{n}\right)}-\mathrm{A}\Vert }_{\mathrm{F}}^{2}$$
(8)

subject to \({\mathrm{G}\in \mathrm{R}}^{{\mathrm{J}}_{1}\times {\mathrm{J}}_{2}\times \cdots \times {\mathrm{J}}_{\mathrm{N}}}\ge 0\) and \({{\mathrm{A}}^{(\mathrm{n})}\in \mathrm{R}}^{{\mathrm{I}}_{\mathrm{n}}\times {\mathrm{J}}_{\mathrm{n}}}\ge 0\) for all \(\mathrm{n}=1,2,\cdots ,\mathrm{N}\), where \({\Vert \bullet \Vert }_{\mathrm{F}}\) denotes the Frobenius norm, which for a tensor \(\upchi \in {\mathrm{R}}^{{\mathrm{I}}_{1}\times {\mathrm{I}}_{2}\times \cdots \times {\mathrm{I}}_{\mathrm{N}}}\) is defined as:

$${\Vert \upchi \Vert }_{\mathrm{F}}=\sqrt{\sum\nolimits_{{\mathrm{i}}_{1}=1}^{{\mathrm{I}}_{1}}\cdots \sum\nolimits_{{\mathrm{i}}_{\mathrm{N}}=1}^{{\mathrm{I}}_{\mathrm{N}}}{\mathrm{x}}_{{\mathrm{i}}_{1},{\mathrm{i}}_{2},\cdots {\mathrm{i}}_{\mathrm{N}}}^{2}}$$
(9)

Since the size of the core tensor influences the capability of the NTD model, it is crucial to find an appropriate size on the basis of the Kullback-Leibler (KL) divergence, which is also known as the relative entropy and is widely used to measure the similarity between two discrete density distributions \(\mathrm{M}\) and \(\mathrm{N}\) (Hershey & Olsen, 2007), as in Eq. (10):

$${\mathrm{D}}_{\mathrm{KL}}\left(\mathrm{M}\Vert \mathrm{N}\right)=-\sum \mathrm{m}\left(\mathrm{x}\right)\log \mathrm{n}\left(\mathrm{x}\right)+\sum \mathrm{m}\left(\mathrm{x}\right)\log \mathrm{m}\left(\mathrm{x}\right)=\mathrm{H}\left(\mathrm{M},\mathrm{N}\right)-\mathrm{H}(\mathrm{M})$$
(10)

where the entropy of distribution \(\mathrm{M}\) is denoted \(\mathrm{H}\left(\mathrm{M}\right)\), and the cross-entropy of distributions \(\mathrm{M}\) and \(\mathrm{N}\) is denoted \(\mathrm{H}\left(\mathrm{M},\mathrm{N}\right)\). The KL divergence is then used to compare the original tensor \(\mathrm{A}\in {\mathrm{R}}^{\mathrm{i}\times \mathrm{j}\times \mathrm{k}}\) with the decomposed tensor \(\widehat{\mathrm{A}}\in {\mathrm{R}}^{\mathrm{i}\times \mathrm{j}\times \mathrm{k}}\): the smaller the value, the closer the two distributions, and the optimal core size is indicated by the point at which the value converges to its minimum.
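
The comparison between an original tensor and its decomposed approximation can be sketched as follows. The text does not specify how tensor entries are turned into distributions, so this sketch makes two loudly stated assumptions: each tensor is flattened and normalised so that its entries sum to one, and a small constant guards against taking the logarithm of zero in the approximation. The function names and toy tensors are hypothetical.

```python
import math


def flatten(tensor):
    """Flatten nested lists of any depth into a flat list of floats."""
    if isinstance(tensor, (int, float)):
        return [float(tensor)]
    flat = []
    for part in tensor:
        flat.extend(flatten(part))
    return flat


def kl_divergence(original, approx, eps=1e-12):
    """Eq. (10): D_KL(M || N) = sum_x m(x) * log(m(x) / n(x)).

    Assumption: entries are nonnegative and are normalised to sum to one
    so that each tensor can be read as a discrete distribution.
    """
    m = flatten(original)
    n = flatten(approx)
    m_sum = sum(m)
    n_sum = sum(n)
    total = 0.0
    for mi, ni in zip(m, n):
        p = mi / m_sum
        q = max(ni / n_sum, eps)  # assumed guard against log(0)
        if p > 0:
            total += p * math.log(p / q)
    return total


# Hypothetical 1 x 2 x 2 tensors.
toy_original = [[[1, 2], [3, 4]]]
toy_same = [[[1, 2], [3, 4]]]
toy_other = [[[4, 3], [2, 1]]]

kl_same = kl_divergence(toy_original, toy_same)
kl_other = kl_divergence(toy_original, toy_other)
```

In a core-size search, a value such as `kl_other` would be computed once per candidate size, and the size at which the curve flattens near its minimum would be retained.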

Results

Temporal Pattern of Road Traffic Crashes

The cumulative SWI of traffic crashes and its temporal patterns in 2019 and 2020 are summarised in Fig. 1(a, b, c), which show similar distribution patterns in the two years. The patterns by time-of-day and day-of-week are presented in Fig. 1(a), where the y-axis indicates day-of-week, the x-axis indicates time-of-day in two-hour intervals, and the legend on the right gives the SWI scale. Crashes clearly occur more frequently on weekdays than on weekends; the weekday peak hours are routinely 8 am to 9 am and 5 pm to 7 pm, whereas weekends have a higher risk of traffic crashes at midnight in both years. An exceptional minor peak on Wednesday afternoon, from 3 pm to 4 pm, might be owing to Wednesday routinely being the sports day after 2 pm. However, one small difference is that the SWI maxima during the pandemic were lower than those in 2019. This is also illustrated in Fig. 1(b) and (c), which break crashes down by the seven age groups on weekdays and weekends, respectively. Besides, Fig. 1(b) clearly shows higher casualties among 26–35-year-olds during weekday peak hours, followed by relatively high casualties among children and young people (0–18 and 19–25) after school. On weekends, car crashes rarely happen at midnight, as reflected in Fig. 1(c), but tend to accumulate among young and middle-aged groups (19–25, 26–35 and 36–45) from the afternoon rush hours through to midnight.

Fig. 1

Temporal patterns of car crashes. (a) SWI of traffic crashes by time of the week in 2019 & 2020; (b) SWI of casualties over time on weekdays in 2019 & 2020; (c) SWI of casualties over time on weekends in 2019 & 2020

Spatial Analysis of Road Traffic Crashes

High-risk locations of traffic crashes in London before and during the pandemic, as detected by NetKDE, are shown in red in Fig. 2. Crashes are mainly distributed in central London and extend in all directions along the road network, with most located at road intersections and complex road structures such as roundabouts. Besides, there are fewer high-risk crash locations in 2020 than in 2019, showing a hollow distribution.

Fig. 2

Network KDE of crashes in 2019 & 2020

To further explore the riskiest roads, the densest road segments falling in the same density interval were selected: four in 2019 and one in 2020, located in North London or nearer to central London (Fig. 3a and b). In 2019, the first road segment is on Seven Sisters Road in the borough of Haringey in north London, in a busy multicultural neighbourhood near Finsbury Park and close to a major transport hub, Finsbury Park Station, with busy local amenities. The second is part of Regent Street in the borough of Westminster, which hosts the most famous shopping centre and many flagship retail stores, shops, and restaurants. The third is located at the intersection of Denmark Hill and Camberwell Green in Southwark, on the access road to one of the busiest stations in South London, which suffers from congestion and overcrowding. The last is Brixton Road in the borough of Lambeth, which is crowded but narrow with a pair of bus lanes, and hosts several markets as well as Brixton Tube Station. In 2020, there is only one riskiest road, Clapham High Street, which is near Brixton Road and has similar characteristics. The hotspot analysis makes clear that the pandemic and its prevention policies, such as lockdown, had a great influence on crash hotspots.

Fig. 3

The riskiest roads of Network KDE in 2019 & 2020

It is obvious that all these roads are bustling with POIs, i.e., shops, transport stations, and restaurants, evidencing the necessity to explore traffic crashes by varied POI categories.

The spatial clustering of traffic crashes in the two years was measured by Moran’s I statistics, yielding significant global Moran’s I values of 0.25 and 0.18 (p-value of 0.001), respectively, which indicate positive spatial autocorrelation and hence a clustering tendency of severity-weighted crashes at LSOA level. To locate the clusters, LISA cluster maps were drawn using local Moran’s I (Fig. 4a and b). The four most significant high-high (HH) clusters are in northern London, the city centre (high density of POIs), and on the west (the Heathrow Airport location) and east wings of the city. In addition, Fig. 4c and d highlight similarities and differences before and during the pandemic, where areas in black circles indicate unchanged clusters, and areas in green or red circles are clusters that expanded or shrank, respectively. For example, there are similar clusters at Heathrow Airport, the borough of Ealing (a major metropolitan centre), and the City of London in 2019 and 2020, indicating the modest effect of the pandemic on such transport junctions and the city centre. The majority of workplace clusters in eastern London along the River Thames shrank in 2020, owing to the work-from-home policy. In contrast, clusters around parks and entertainment venues, such as Richmond Park, Hampton Court Park, and Walthamstow Wetlands, circled in green, expanded during the pandemic, reflecting citizens’ rising leisure activities.

Fig. 4

LISA cluster map for crashes in 2019 & 2020

Factor Analysis of Road Traffic Crashes

A geographical detector measures the individual and interactive effects of POIs on the crash distribution, with the POI counts categorised through the JNB method. The factor detector further indicated that the categorised factors were statistically significant both before and during the pandemic, as shown in Table 4. In general, PD values dropped sharply between the two years while the rankings remained unchanged, except that SHOP rose from 5th to 3rd in 2020. Interpreting the PD values, the impact of the POI distribution on car crashes remained relatively stable over time. For instance, the PD values for STATION were the highest, at 0.432 and 0.296 in 2019 and 2020, respectively, followed by those for WORK at 0.397 and 0.229. The locations of these two factors could therefore indicate the spatial distribution of car crashes.

Table 4 PD values of individual and interactive effects of POIs in 2019 & 2020

An interaction detector was used to further quantify the combined effect of paired factors on crashes in 2019 and 2020. In theory, nine factors yield 36 pairwise interactions, but only the top five are listed in Table 4 for information, each of which includes STATION. The PD values of the consistent top four interaction pairs also declined; among them, the interaction of SHOP with STATION rose from 3rd pre-pandemic to 1st during the pandemic, replacing the interaction between RESTAURANT and STATION. Such results imply a possibly high risk of car crashes in areas with a higher density of the top five POIs near transport stations. With restrictions on eating indoors and the rise of online shopping, people may have visited shops near transport stations more often than restaurants, leading to an increased risk of car crashes in these areas. Another difference is that the 5th-ranked interaction changed from STATION with ENTERTAINMENT to STATION with EDUCATION. Londoners may have used public transport more often to travel to educational institutions than to visit entertainment venues, which were shut down or less accessible owing to pandemic restrictions.
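As a rough sketch of the interaction step, the PD value of a pair can be computed on the strata formed by overlaying the two factors' classes, which is the usual geographical-detector construction and is assumed here. Seven factor names are taken from the text; FACTOR_8 and FACTOR_9 are placeholders for the two factors not named in this section, and the toy data are hypothetical.

```python
from itertools import combinations

import numpy as np


def q_statistic(y, strata):
    """PD (q) value of y for a given stratification (y must not be constant)."""
    y = np.asarray(y, dtype=float)
    strata = np.asarray(strata)
    within = sum(np.sum(strata == h) * y[strata == h].var()
                 for h in np.unique(strata))
    return 1.0 - within / (y.size * y.var())


def interaction_q(y, strata_a, strata_b):
    """PD value of the overlay of two factors (one joint class per class pair)."""
    joint = np.array([f"{a}|{b}" for a, b in zip(strata_a, strata_b)])
    return q_statistic(y, joint)


# Nine factors give C(9, 2) = 36 unordered pairs; 8 of them involve STATION.
factors = ["STATION", "WORK", "SHOP", "RESTAURANT", "ENTERTAINMENT",
           "EDUCATION", "PARKING", "FACTOR_8", "FACTOR_9"]  # last two are placeholders
pairs = list(combinations(factors, 2))
station_pairs = [p for p in pairs if "STATION" in p]

# Toy data: neither factor explains y alone, but their overlay explains it fully.
y = [0.0, 10.0, 10.0, 0.0]
factor_a = [0, 0, 1, 1]
factor_b = [0, 1, 0, 1]
q_a = q_statistic(y, factor_a)
q_b = q_statistic(y, factor_b)
q_ab = interaction_q(y, factor_a, factor_b)
```

In the toy case the joint PD value exceeds the sum of the two individual PD values, the situation that geographical-detector studies usually describe as a nonlinear enhancement.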

Spatiotemporal Data Mining of Road Traffic Crashes

To locate the areas and periods with higher crash rates by age group, this section models the crashes in each LSOA using two tensors, \({\mathrm{A}}_{\mathrm{weekday}},{\mathrm{A}}_{\mathrm{weekend}}\in {\mathrm{R}}^{\mathrm{I}\times \mathrm{J}\times \mathrm{K}}\), for weekdays and weekends, respectively, in each of 2019 and 2020. The three dimensions of the tensors are \(\mathrm{I}\) for regions, \(\mathrm{J}\) for age groups, and \(\mathrm{K}\) for time slots in a single day. Since there are 4,835 LSOAs in London, the regions are represented by \(\mathrm{r}=[{\mathrm{r}}_{1},{\mathrm{r}}_{2},\cdots ,{\mathrm{r}}_{4835}]\), the age groups by \(\mathrm{a}=[{\mathrm{a}}_{1},{\mathrm{a}}_{2},\cdots ,{\mathrm{a}}_{7}]\), and the hourly time slots by \(\mathrm{t}=[{\mathrm{t}}_{1},{\mathrm{t}}_{2},\cdots ,{\mathrm{t}}_{24}]\). Entry \((i,j,k)\) therefore stores the cumulative value of the SWI index of age group \({\mathrm{a}}_{\mathrm{j}}\) in LSOA \({\mathrm{r}}_{\mathrm{i}}\) and time slot \({\mathrm{t}}_{\mathrm{k}}\) throughout the study period.
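The tensor construction described above can be sketched as follows. The record layout (one region index, age-group index, hour, and SWI value per casualty record) and the function name are assumptions for illustration; the three records are made-up values.

```python
import numpy as np

N_REGIONS, N_AGE_GROUPS, N_HOURS = 4835, 7, 24  # I, J, K from the text


def build_tensor(region_idx, age_idx, hour_idx, swi,
                 shape=(N_REGIONS, N_AGE_GROUPS, N_HOURS)):
    """Accumulate SWI values into entry (i, j, k) of a regions x ages x hours tensor."""
    tensor = np.zeros(shape)
    # np.add.at sums repeated (i, j, k) indices instead of overwriting them.
    np.add.at(tensor, (region_idx, age_idx, hour_idx), swi)
    return tensor


# Three hypothetical records; the first two fall in the same cell and are summed.
regions = np.array([0, 0, 2])
ages = np.array([1, 1, 3])
hours = np.array([8, 8, 17])
swi_values = np.array([1.0, 2.5, 4.0])
toy_tensor = build_tensor(regions, ages, hours, swi_values)
```

One such tensor would be built for each combination of year and weekday or weekend, after filtering the records accordingly.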

Tensor Size Selection

The sizes of the core tensors were calculated separately for the 2019-weekday, 2020-weekday, 2019-weekend, and 2020-weekend scenarios, with three decomposed time patterns for weekdays (morning peak hours, afternoon peak hours, and other times) and two distinctive patterns for weekends (evening peak hours and other times). After multiple attempts and the integration of three similar age groups, the KL divergence suggests a core tensor size of \([13,5,3]\) for weekdays and \([8,4,2]\) for weekends, both pre- and during the pandemic (Fig. 5). In Fig. 5a and c, the x-axis indicates the number of spatial patterns and the y-axis the corresponding KL divergence; starting from initial values of \(\mathrm{KL}=3.1\) and \(3.2\), the divergence converges to approximately 2.4 and 2.5 at \(\mathrm{x}=13\), which is taken as the most appropriate size. The 2019-weekend and 2020-weekend selections in Fig. 5b and d exhibit four main patterns by age group, with eight optimal spatial patterns suggested by the KL divergence.

Fig. 5 Size selections of four core tensors
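To make the size-selection idea concrete, the sketch below computes a generalised KL divergence between a nonnegative tensor and its reconstruction, and then picks the smallest size beyond which the divergence stops improving noticeably. Both the exact divergence form and the stopping rule (`pick_size` with a tolerance) are assumptions for illustration; the text does not state which variant or rule was used, and the numbers are toy values.

```python
import numpy as np


def generalised_kl(x, x_hat, eps=1e-12):
    """Generalised KL divergence sum(x * log(x / x_hat) - x + x_hat) for x, x_hat >= 0."""
    x = np.asarray(x, dtype=float)
    x_hat = np.asarray(x_hat, dtype=float)
    # eps keeps the logarithm finite where entries are zero.
    return float(np.sum(x * np.log((x + eps) / (x_hat + eps)) - x + x_hat))


def pick_size(kl_by_size, tol):
    """Hypothetical elbow rule: smallest size whose next step improves KL by < tol."""
    sizes = sorted(kl_by_size)
    for current, nxt in zip(sizes, sizes[1:]):
        if kl_by_size[current] - kl_by_size[nxt] < tol:
            return current
    return sizes[-1]


# Toy divergence curve (made-up numbers): it flattens after size 3.
toy_kl = {1: 5.0, 2: 3.0, 3: 2.0, 4: 1.95, 5: 1.94}
chosen = pick_size(toy_kl, tol=0.1)
```

In practice the divergence for each candidate size would come from fitting the decomposition at that size and comparing the reconstruction with the original tensor.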

Tensor Decomposition

The two tensors for car crashes in 2019 and 2020, \({\mathrm{A}}_{\mathrm{weekday}}\) and \({\mathrm{A}}_{\mathrm{weekend}}\), are each decomposed into a core tensor and three factor matrices. The core tensors are \({\mathrm{G}}_{\mathrm{weekday}}\in {\mathrm{R}}^{13\times 5\times 3}\) and \({\mathrm{G}}_{\mathrm{weekend}}\in {\mathrm{R}}^{8\times 4\times 2}\). Fig. 6 shows that the spatial, age, and temporal patterns are identifiable as the matrices (a) \({\mathrm{A}}_{\mathrm{weekday}}\in {\mathrm{R}}^{4835\times 13}\), \({\mathrm{B}}_{\mathrm{weekday}}\in {\mathrm{A}}^{7\times 5}\), \({\mathrm{C}}_{\mathrm{weekday}}\in {\mathrm{T}}^{24\times 3}\) on weekdays, and (b) \({\mathrm{A}}_{\mathrm{weekend}}\in {\mathrm{R}}^{4835\times 8}\), \({\mathrm{B}}_{\mathrm{weekend}}\in {\mathrm{A}}^{7\times 4}\), \({\mathrm{C}}_{\mathrm{weekend}}\in {\mathrm{T}}^{24\times 2}\) on weekends. It should be noted that the SWI value stored in each tensor entry is normalised to \([0,1]\) for nonnegative Tucker decomposition. Each entry of a factor matrix is referred to as the “degree of importance”, where a high value means that the target object plays a vital role in its pattern, and vice versa.

Fig. 6 Tensor decomposition of the weekday and weekend tensors
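The shapes involved in the Tucker model can be checked with a small sketch. It only shows the min-max normalisation and how a core tensor and three factor matrices recombine into a regions x ages x hours tensor; it does not fit a nonnegative Tucker decomposition, and the random factors are placeholders rather than fitted results.

```python
import numpy as np


def normalise_01(tensor):
    """Min-max scale a tensor to [0, 1]; a constant tensor maps to all zeros."""
    t = np.asarray(tensor, dtype=float)
    lo, hi = t.min(), t.max()
    return (t - lo) / (hi - lo) if hi > lo else np.zeros_like(t)


def tucker_reconstruct(core, a, b, c):
    """Rebuild X[i,j,k] = sum over p,q,r of G[p,q,r] * A[i,p] * B[j,q] * C[k,r]."""
    return np.einsum("pqr,ip,jq,kr->ijk", core, a, b, c, optimize=True)


# Random nonnegative placeholders with the weekday sizes given in the text.
rng = np.random.default_rng(0)
core = rng.random((13, 5, 3))        # core tensor G
spatial = rng.random((4835, 13))     # regions x spatial patterns
age = rng.random((7, 5))             # age groups x age patterns
temporal = rng.random((24, 3))       # hours x temporal patterns
approx = tucker_reconstruct(core, spatial, age, temporal)
```

Because every factor is nonnegative, each reconstructed entry is a sum of nonnegative contributions, which is what allows the columns of the factor matrices to be read as additive patterns.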

Pattern Analysis of Road Traffic Crashes During Weekdays

After tensor decomposition, Fig. 7 depicts the daily patterns, age patterns, and a core tensor of traffic crashes on weekdays in 2019 and 2020, respectively. Fig. 7a shows a clear pre-pandemic daily pattern (green line) peaking at 8:00 am, corresponding to car crashes in the morning peak hours; in 2020 this morning peak is not evident, whereas a dinner peak at 5:00 pm is. The orange line peaks at 5:00–6:00 pm in 2019 and at 4:00 pm in 2020, approximately the afternoon rush hours, implying that the afternoon peak of car crashes appeared earlier in 2020 than in 2019. The blue line illustrates the pattern during off-peak hours.

Fig. 7 Decomposed patterns of 2019-weekday & 2020-weekday tensor. (a) and (b) Temporal pattern; (c) and (d) Age pattern; (e) and (f) Examples of matrices of Core tensor. The matrices of the core tensor only illustrate our workflow and are not intended for comparative analysis, so one example matrix is shown for each of the two years
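Reading a peak hour off a temporal pattern amounts to finding the hour with the largest degree of importance in each column of the temporal factor matrix. The sketch below uses a made-up 24 x 3 matrix with bumps at 08:00 and 17:00 and a flat background; the values and the function name are hypothetical and are not the fitted patterns of this study.

```python
import numpy as np


def peak_hours(temporal_factor):
    """Return, for each temporal pattern (column), the hour with the largest weight."""
    c = np.asarray(temporal_factor, dtype=float)
    return c.argmax(axis=0)


# Toy 24 x 3 temporal factor matrix (hypothetical values).
hours = np.arange(24)
toy_c = np.column_stack([
    np.exp(-0.5 * ((hours - 8) / 1.5) ** 2),    # bump around 08:00
    np.exp(-0.5 * ((hours - 17) / 1.5) ** 2),   # bump around 17:00
    np.full(24, 0.2),                           # flat off-peak background
])
toy_peaks = peak_hours(toy_c)
```

A flat column has no meaningful peak, so a pattern of that kind would be labelled as off-peak rather than assigned to a particular hour.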

Regarding the age groups presented in Fig. 7c and d, the purple cells with higher proportions indicate the main subject of each age-group pattern, namely the patterns for 26–35-year-olds, 36–45-year-olds, 46–55-year-olds, 18–25-year-olds, and 0–18-year-olds. Other age-group patterns were difficult to detect, possibly because of their high similarity. In total, 13 spatial patterns were detected, carrying complex information about car crashes in the two years and feeding into the later geographical detector. To show how the core tensor links these patterns across the three dimensions, four further matrices sliced from the five age dimensions are drawn in Fig. 7e and f, where light purple indicates a high degree of closeness. For example, the first matrix reflects the first age pattern (the 26–35 age group), which is associated with the 1st spatial pattern and the off-peak daily pattern, the 3rd spatial pattern and the afternoon peak-hour daily pattern, and the 2nd spatial pattern and the morning rush-hour daily pattern. It should be noted that the third and fourth age patterns each have a major and a minor pattern, suggesting potential similarity with the patterns of other age groups.

Pattern Analysis of Road Traffic Crashes During Weekends

In a similar design, the decomposed tensor results for weekends in the two years are presented in Fig. 8. As described in Sect. 4.1, most car crashes happened during weekend evenings. In Fig. 8a, the orange line shows that age-group-involved crashes occurring at 20:00 were distinctive and were therefore recognised as the evening peak-hour pattern; the blue line indicates the distribution of car crashes at other times on weekends.

Fig. 8 Decomposed patterns of weekend tensor. (a) Temporal pattern; (b) Age pattern; (c) Matrices of Core tensor

Figure 8b and d clearly exhibit four age patterns, for people aged 26–35, 18–25, 36–45, and 0–18 & 46–55. The patterns for people over 55 do not emerge because of their similarity to other patterns, as on weekdays. It can be observed that people aged 0–18 and 46–55 share the same crash pattern. There are eight spatial patterns and a core tensor for weekends, sliced by age dimension but in identified metrics, with the last matrix representing the age pattern of the 0–18 and 46–55 groups.

Spatiotemporal Pattern Analysis for each Age Group

On the basis of these spatial and temporal patterns in 2019 and 2020 by age group, both the dominant POI factors and their interactive effects on severity-weighted crashes can be explored through the geographical detector, with results presented in Tables 5 and 6.

Table 5 Dominant influence and interaction of factors for age groups during weekdays in 2019 & 2020
Table 6 Dominant influence and interaction of factors for age groups during weekends in 2019 & 2020

According to Table 5, the STATION and WORK factors have similarly significant effects on car crashes involving every age group on weekdays pre- and during the pandemic, except for victims aged above 55. Nevertheless, every age group clearly has different dominant POI factors. For the 0–18 age group specifically, POI factors do not have significant PD values in 2020, except for EDUCATION and its interaction with ENTERTAINMENT during the dinner peak hour. This suggests that residents changed their routine behaviour throughout the pandemic, i.e., spending more time at home rather than visiting POIs as frequently as they would have pre-pandemic, and that areas with a high density of schools should be prioritised to protect this group from crashes. Adults (age groups 19–25 and 46–55) showed similar changes in dominant factor, with SHOP exerting a greater influence on their patterns in 2020 than in 2019. With more people working from home and avoiding crowded places, the number of people visiting shops near transport stations increased. Besides, people may have taken up shopping as a replacement for entertainment owing to the shutdown of entertainment venues. These behaviour changes were likely among younger adults, who were more inclined to engage in leisure and shopping activities in commercial areas, as well as among middle-aged adults, who had more disposable income and a greater possibility of working from home or flexibility in adjusting their working hours. This points to more car crashes near areas with a high density of shops, especially shops near attractions, entertainment venues, and workplaces, during the afternoon and dinner peak hours.

For the 26–35 and 36–45 age groups, RESTAURANT and WORK, respectively, together with their interactions with STATION, became the dominant factors in 2020. This suggests that these groups may have driven more frequently to restaurants that had reopened or were offering take-aways, and it identifies areas with a high density of workplaces and restaurants around transport stations as risky.

Compared with weekdays, WORK has a weaker effect on crash distribution on weekends, giving way to the dominant effects of ENTERTAINMENT and RESTAURANT, since people tend to have more leisure time and are therefore able to engage in these recreational activities. There are also differences in the distinctive factors and interactions on weekends by age group pre- and during the pandemic, especially during the evening peak hours. For example, RESTAURANT becomes the dominant factor for age groups 0–18 and 46–55 during the evening peak hours, although STATION and its interactions with EDUCATION and ENTERTAINMENT have the dominant effect in 2020, because restaurants tend to be located in areas that are popular for entertainment and may also be located near schools. Besides, during the evening peak hours, the individual and interactive factors for the 19–25 and 26–35 age groups changed from ENTERTAINMENT, RESTAURANT, and their interactions with STATION to PARKING, STATION, WORK, and SHOP interacting with ENTERTAINMENT during the pandemic. This could be because people were more likely to engage in outdoor activities and visit places with more parking, such as parks or shopping centres.

In summary, in the observed years, the distributions of transport stations, workplaces, and restaurants significantly affected the casualty patterns of all age groups, and transport stations close to restaurants and workplaces had the largest impact on crash patterns. Such areas call for extra attention in deploying countermeasures against risky traffic behaviours and in providing technical support to consolidate motorway safety engineering. Additionally, these results can be applied to identify the riskier areas that require better traffic management and control strategies, tailored to specific age groups and their characteristic activities pre- and during the pandemic.

Conclusion

This paper provides an in-depth analysis of the changes in factors influencing car crashes during the COVID-19 pandemic and of how these changes differ by age group, day of the week, and time of day, and it suggests potential interventions to reduce the risk of crashes in specific areas. The findings can be used by policymakers, urban planners, and transport engineers to improve road safety after the pandemic and during future pandemics.

By recognising the spatial and temporal variations of traffic crashes in London across two representative years on the way to recovery from the pandemic, this study found that weekdays and weekends exhibit distinct patterns, with daily trends that vary by time of day. Specifically, the riskiest roads and the main high-risk locations were concentrated in central London but also spread out along the roads, with crashes more likely to occur at busier places and highly correlated with those in neighbouring areas. In particular, areas with many transport stations and surrounding shops require close attention in local transportation planning and traffic management. Such findings could be used to target specific areas with collision-alleviation measures, such as traffic calming during rush hours or improved pedestrian crossings; they could also inform urban planners and policymakers in making evidence-based decisions on the future design of POIs that take local safety into consideration. The study thereby contributes to the literature on age-group-involved crash casualties and daily human activity analysis by using the geographical detector to uncover the varied latent mechanisms behind spatiotemporal patterns with defined POI categories. In addition, it highlights the importance of a data-driven approach to age-group-involved collision patterns and their relationship to the built environment. Policymakers could develop evidence-based, targeted intervention measures on the basis of such findings, and further evaluate their effectiveness in improving road safety for different age groups over time.

Despite its contribution, the study has some limitations that could be addressed in future extended work. For example, the fall in traffic demand for public transport was treated as a temporary phenomenon, with a speedy recovery after a short-term decline. Its long-term changes should be analysed further to support traffic management decision-making in the long run. Besides, the spatiotemporal patterns of casualties aged over 55 remain unclear, which may be because their relatively low proportion of casualties hindered a representative pattern from emerging. More POI categories, such as hospitals and petrol stations, also deserve to be added for a more comprehensive and in-depth study in the future.

In the future, retrospective research using up-to-date data will be conducted once official datasets are in place, to test the methodologies proposed in this study on real data and to capture the full picture of London’s road safety upon recovery from the pandemic.