1 Introduction

According to the latest report of the National Statistical Institute, more than 150,000 road accidents occurred during 2021 in the Italian territory, of which 2875 were deaths within 30 days. Early estimates for the first semester in 2022 (January–June) highlight an increasing—though not alarming—number of road accidents compared to the pandemic period. The major causes of such events can be addressed as distracted driving, failure to give way, and speeding. Although the long-term trend of road accidents in Italy has been decreasing in the last two decades, the social costs of such events are still huge. The monitoring of road accident occurrences is then essential for the identification of crash hot spots and the timely implementation of prevention policies. Recent technological developments are finally enabling straightforward and cheap ways of recording the exact space-time location of the vehicles at the moment of the accident, favouring the implementation of advanced statistical techniques to model the dynamics of the accident occurrences. For example, since 2014, road authorities of the city of Rome have been assigned the task of recording the time and the location of every car accident on which their intervention is required. These geo-referenced records are collected monthly and published for public use at (https://dati.comune.roma.it/catalog/dataset?tags=Incidenti &groups=sicurezza-urbana). This huge gold mine of public data has already drawn some attention (Comi et al. 2018; Alaimo Di Loro et al. 2019), as many other similar sources focused on road accidents (Kalair et al. 2021; Borgoni et al. 2021; Gilardi et al. 2022).

The most natural way to model an observed point pattern is represented by spatio-temporal point processes (Daley and Vere-Jones 2003; Diggle 2013). A very interesting challenge in modelling road accidents is understanding whether the occurrence of an event increases the risk of other events in its proximity and, if so, quantifying the number of subsequent crashes triggered by the first occurrence (Acker and Yuan 2019). This sort of cascading effect might be due to the direct effect of the original crash or its indirect consequences: increased traffic congestion, lane reduction, reduced visibility, rubbernecking, etc. This particular dynamic in point processes is known as self-excitation and it is the defining property of the Hawkes process (Hawkes 1971; Reinhart 2018). It has been largely used in the literature to characterize the clustered point patterns arising from earthquakes (Ogata 1988; Zhuang et al. 2004) or financial shocks (Bacry et al. 2015; Hawkes 2018). Its application has been recently extended to many more contexts, e.g. crimes (Mohler et al. 2011; Zhuang and Mateu 2019), infectious diseases (Chiang et al. 2022; Briz-Redón et al. 2023)) and road accidents (Li et al. 2018; Kalair et al. 2021). Kalair et al. (2021) is the first contribution to propose such modelling in a spatio-temporal setting, where the road accidents occurring on a single (uni-dimensional) one-way road are disentangled into primary and secondary accidents. We expand on this work to model the event occurrences on a whole road network, road segment by road segment. For estimation purposes, we take the lead from the EM-based stochastic-reconstruction algorithm proposed by Zhuang and Mateu (2019) and extend it (i) to account for a buffering area, (ii) to propose an alternative smoothing of the spatial excitation function that enforces isotropy (as for the histogram estimator by Fox et al. (2016)), (iii) to account for the effect of covariates in the excitation component, (iv) to use the Expected Complete-Data Log-likelihood (ECDL) for the branching structure of the Hawkes process (Hawkes and Oakes 1974). (i) It is especially useful to mitigate the edge effects that bias the kernel smoothing and the Hawkes process estimation near the boundaries of the domain (Zhuang et al. 2004; Silverman 2018; ii) increases the model parsimony limiting its ability to over-fit on the available data; (iii) can be useful to determine what characteristics of an event make it more prone to trigger others; (iv) the use of ECDL eases the numerical maximization of the likelihood (Veen and Schoenberg 2008). The estimation algorithm presents several computational bottlenecks for increasing data size and domain resolution. Therefore, we implemented several strategies to expedite the estimation process.

The remainder of the paper is organized as follows: The available data are described in Sect. 2; the model is introduced in Sect. 3 and its estimation in Sect. 4; the application is shown in Sect. 5; finally, Sect. 6 provides the conclusions and some discussion.

2 Available Data

We use data that can be freely downloaded from the website of Roma Capitale (https://dati.comune.roma.it/catalog/dataset?tags=incidenti). This source contains all the road accidents that occurred on the streets under the jurisdiction of the city of Rome in which a patrol of the local police has intervened. In the present work, we consider all records referred to the latest three years fully available, i.e. from 2019 to 2021. The data contain one separate record for each subject involved in each road accident, with records referring to the same accident identifiable through a common protocol identification number (PID). A common issue of open data is that, in principle, they are not collected for statistical but for monitoring purposes only. Therefore, they are often keen to measurement errors and/or under-reporting. In our case, a major problem is represented by mistakes in the data entry of the accident’s location, which presented altered records in \(\approx 20\%\) of all records. Therefore, the raw data have undergone a pre-processing procedure consisting of three main steps: (i) grouping and combination by PID, (ii) correction and filtering of altered records, (iii) elimination of all accidents whose location is outside any of the Roman municipalitiesFootnote 1 or of those for which correction was not possible. This process reduces the original car accident count from 79,456 to 76,305. Further details of this data cleaning procedure are provided in Section A of the Appendix.

2.1 Data Description

Every record of the dataset includes ancillary information on the road accident—in addition to the exact time of the event and the location (lon, lat)—such as the number of injured–dead–unharmed people involved in the accident and the type of accident. Table 1 contains basic summary statistics at the yearly level for the road accidents that occurred in this area.

Table 1 Basic summary statistics of car accidents occurred in the Roma Capitale territory from 2019 to 2021: total number of accidents (N), total number of accidents with at least one injured or dead (\(N_{DI}\)), the rate of accidents with at least one injured or dead, the total number of deaths (D), the total number of injured (I), and the death and injury rates

Between 2019 and 2021, a total of 76,305 road accidents led to 34,730 injured and 309 dead people. The grand totals correspond to averages of 25,435 road accidents, 11,577 injured, and 103 dead per year, respectively. Notice that these averages are downward biased by the exceptionality of 2020, during which COVID-19 lockdown and restriction policies were in place. These numbers define the huge economic and social cost that car accidents have on the Roman community. The dataset includes also information about the type of accident, which is, however, categorized very roughly. Looking at the accident description, we were able to find 4 major categories that well-describe the main dynamic of the event. These new categories are: (i) type 1: observed for \(\approx 8\%\) of all events, describes a situation where a car runs over a person or an animal; (ii) type 2: observed for \(\approx 27\%\) of all events, describes a situation where a car hit a fixed obstacle; (iii) type 3: observed for \(\approx 16\%\) of all events, describes the case of a rear-end collision between vehicles; type 4: the most frequent (\(\approx 49\%\) of cases) and describes other crashes between two or more vehicles.

Fig. 1
figure 1

Temporal trends: monthly (a), weekly (b) and hourly (c)

Figure 1a compares the monthly trend of the three years. The effect of the COVID-19 lockdown is clear-cut visible from the sudden reduction in accident rates since March 2020. The rate started increasing again in June 2020 but reached the pre-pandemic levels only in July 2021 (end of the smart-working regulation in Italy). Figure 1b shows the average number of events during the weekdays and highlights how these present higher rates than weekends, with the minimum touched during Sundays. Figure 1c shows the hourly rates. The behaviour reflects the typical commuting patterns in the city of Rome, presenting very low levels during night-time and larger values during day-time, with peaks around 7:00–8:00, 10:00–11:00, and 17:00–18:00. It is interesting to notice how the morning peak around 7:00–8:00 is flattened out in both 2020 and 2021, a likely effect of the smart-working policy in place for all or part of the year. Figure 2 shows a preliminary (naive) intensity estimation of the car-accident occurrence in the analysed area.

Fig. 2
figure 2

Spatial trend: simple kernel estimate of the intensity of the point process, separately for each year. Dark colours correspond to larger intensities (and vice versa) (color figure online)

It is clear how the pattern is not homogeneous in space but has significant variations, with larger intensities towards the centre of the urban area (with a greater road density) and around the main roads. Figure 3 reports the joint distribution of spatial and temporal lags between pairs of car accidents.

Fig. 3
figure 3

Spatio-temporal triggering density (events per unit-time per unit-area) of the lags between subsequent events (cut to 0.2 km and 0.1 days). The brighter the colour, the higher the density (Color figure online)

The evident peak occurring for lags lower than 0.1 km (100 m) in space and than 0.025 days (\(\approx 40\) mins) in time highlights the clustered pattern of the car accident under analysis.

All these aspects motivate the need for a flexible spatio-temporal point-process model that can account not only for spatial and temporal heterogeneity but also for the daily and weekly cyclical behaviour, and the potential of a primary car accident to trigger secondary ones.

3 The Periodic Spatio-Temporal Hawkes Process

We seek to model the occurrence of traffic collisions over a spatio-temporal domain \(\mathcal{Q}=\mathcal{D}\times \mathcal{T}\), where \(\mathcal{D}\subseteq \mathbb {R}^2\) denotes the spatial dimension and \(\mathcal{T}=[0, T]\) the temporal dimension. We assume that the number of car-crashes \(N(B\times [t_1,t_2])\), where \(B\subset \mathcal{D}\) and \([t_1,t_2]\subset \mathcal{T}\), is the result of a simple and (locally) finite spatio-temporal point process. It can be defined through the suitable specification of the conditional intensity function:

$$\begin{aligned} \lambda _c(\varvec{s}, t)=\lim _{\textrm{d} \varvec{s}, \textrm{d} t\rightarrow 0}\frac{\mathbb {E}\left[ N(\textrm{d} \varvec{s}, \textrm{d} t) \left| \mathcal{H}_t\right. \right] }{\textrm{d} \varvec{s}\cdot \textrm{d} t}, \end{aligned}$$
(1)

where \(\mathcal{H}_t=\left\{ (\varvec{u}, \tau )\right\} _{\tau <t}\) is the history of the process up to time t (t excluded). As suggested in Sect. 1, point patterns arising from road accidents present an inhomogeneous and clustered pattern: (i) events are not spread uniformly along the observed region \(\mathcal{Q}\), mainly because the risk of collision is affected by space-time-varying factors; (ii) events may exhibit self-excitation behaviour due to sudden and unexpected slowdowns or other consequential traffic events. These two components can be jointly captured by expressing Eq. (1) as one of a spatio-temporal Hawkes process \(\lambda _c(\varvec{s}, t)=\mu (\varvec{s}, t) + \int _0^t\int _\mathcal{D}g(\varvec{s}-\varvec{\sigma }, t-\tau )\textrm{d} N(\textrm{d}\varvec{\sigma }\times \textrm{d}\tau )\). \(\mu (\cdot ,\cdot )\) is the background intensity, describing the space-time-varying general risk of a car accident, and \(g(\cdot ,\cdot )\) is a space-time excitation function, describing the space-time propagation of the excitation of each event. In particular, we express these two components as in the semi-parametric, periodic, and space-time separable version by Zhuang and Mateu (2019):

$$\begin{aligned} \begin{aligned} \lambda _c(\varvec{s}, t)&=\mu _0\cdot \mu _{s}(\varvec{s})\cdot \mu _t\left( t\right) + A\cdot \int _0^t\int _\mathcal{D}g_s(\varvec{s}-\varvec{u})\cdot g_t(t-\tau ) N\left( \textrm{d} \varvec{u}\times \textrm{d} \tau \right) , \end{aligned} \end{aligned}$$
(2)

where \(\mu _s(\cdot ), \mu _t(\cdot )\) are the spatial and temporal background intensities such that their average value over \(\mathcal{D}\) and \(\mathcal{T}\) is 1, \(g_s(\cdot ), g_t(\cdot )\) are the spatial and temporal excitation functions such that their integral over \(\mathcal{D}\) and \(\mathcal{T}\) is 1, and \(\mu _0,\, A>0\) are two real-valued parameters that regulate the overall level of the background and the excitation. The purely spatial term \(\mu _s(\cdot )\) accounts for proportional spatial variations in the risk of collision (due to e.g. generally higher traffic levels, low visibility, rough roads, dangerous intersections, etc.) and shall vary freely on the whole spatial domain \(\mathcal{D}\). The purely temporal term \(\mu _t(t)\) must capture the proportional variations in the long-term trend, and the daily and weekly periodicity of collisions. This need can be formalized by expressing \(\mu _t(t)\) as the product of three components: \( \mu _t(t)=\mu _d\left( \textrm{d}(t)\right) \cdot \mu _w\left( w(t)\right) \cdot \mu _{tr}\left( t\right) , \) where \(\mu _d(\cdot ), \mu _w(\cdot )\) model the daily and weekly periodicity, \(\mu _{tr}(\cdot )\) represents the long-term trend, and \(d(\cdot ),\, w(\cdot )\) match each time \(t\in \mathcal{T}\) with the corresponding day-hour and week-day. Finally, the spatial and temporal excitation functions \(g_s(\cdot ),\, g_t(\cdot )\) are two decreasing functions that describe the increase in risk occurring in the spatial and temporal proximity of a previous accident at different lags, respectively.

Isotropic excitation. The original parametrization by Zhuang and Mateu (2019) considers a potentially anisotropic excitation in space, with the excitation magnitude of event \(\varvec{s}=(x, y)\) on location \(\varvec{s}'=(x', y')\) depending on the whole separation vector \(\varvec{s}'-\varvec{s}=(x'-x, y'-y)\). However, the typical expression of the spatio-temporal Hawkes process when adopted in a parametric framework considers an isotropic spatial excitation that is a function of the Euclidean distance between the primary event and nearby locations:

$$\begin{aligned} g_s\left( \varvec{s}'-\varvec{s}\right) =g_s\left( ||\varvec{s}'-\varvec{s}||\right) =g_s\left( \sqrt{(x'-x)^2+(y'-y)^2}\right) , \end{aligned}$$
(3)

so that \(g_s(\cdot ):\,\mathbb {R}^+\rightarrow [0,+\infty )\). This imposes additional structure on the excitation decay, which cannot behave freely across different directions, and induces a more parsimonious model specification. On a more general note, the freedom to decay differently across different directions is redundant in most applications and the unconstrained smoothing often results in an (approximately) isotropic excitation in space. Exploiting the isotropy as a preliminary assumption can be beneficial from multiple standpoints: (i) computational convenience, both in terms of computing runtime and storage memory; (ii) more robust estimation of the excitation at a certain distance, as the smoothing is performed on a lower-dimensional space; (iii) reduced liability to overfitting, as the resulting model is less flexible and less free in adapting to the current data configuration. More details are provided in Section C of the Appendix, where all statements are supported by a simulation study. Nevertheless, achieving a nonparametric kernel estimate of \(g_s(\cdot )\) under the isotropic setting has some implications on the smoothing process, which is discussed in Sect. 4.

Generalized linear model on the excitation. The spatial and spatio-temporal Hawkes process is widely used in the seismological literature in the guise of the Epidemic Type Aftershock Sequence (ETAS) model (Ogata 1988, 1999; Zhuang et al. 2002, 2004; Veen and Schoenberg 2008). The parametric functional form given to the excitation function is based on scientific knowledge of the seismological process, also supported by decades of studies on earthquake aftershock sequences (Omori’s law). In particular, the excitation effect is usually assumed to be (positively) associated with the magnitude of the primary event.

In other contexts, what characteristics of the event and in what directions they should act are not as straightforward to determine. Nevertheless, we can express the effect of the available characteristics on the excitation coefficient A through a linear prediction and a suitable link function. Let \(\varvec{x}_{i}\) be the \((k\times 1)\) vector of covariates available on each event, we can express:

$$\begin{aligned} \log \left( A_i\right) = \varvec{x}_i^\top \cdot \varvec{\beta },\quad i=1,\dots , n, \end{aligned}$$
(4)

and the right-hand expression of Equation (2) becomes:

$$\begin{aligned} \int _0^t\int _\mathcal{D}A(\varvec{u})\cdot g_s(\varvec{s}-\varvec{u})\cdot g_t(t-\tau ) N\left( d \varvec{u}\times d \tau \right) =\sum _{t_i<t} A_i\cdot g_s(\varvec{s}-\varvec{s}_i)\cdot g_t(t-t_i). \end{aligned}$$

In the same way, given a set of \(\ell \) covariates available at each space-time locations \(\varvec{z}(\cdot ,\cdot )\), the background coefficient can be expressed as \(\log \left( \mu _0(\varvec{s}, t)\right) = \varvec{z}(\varvec{s}, t)^\top \cdot \varvec{\alpha }\), where \(\varvec{\alpha }\) is a \(\ell +1\) vector of coefficients. While straightforward to formulate, the inclusion of this term for space-time varying covariate can easily become unfeasible for large spatio-temporal domains and, in the sequel, we refer to the constant specification \(\mu _0(\varvec{s}, t)=\mu _0\).

4 Semi-parametric Estimation

Let \(\mathcal{O}=\left\{ (\varvec{s}_i, t_i)\right\} _{i=1}^n\subset \mathcal{Q}\) be the realization spatio-temporal self-exciting point process. We want to achieve a nonparametric estimation of all components building up the conditional intensity function \(\lambda _c(\cdot )\). Whilst the nonparametric estimation is a standard solution for the background rate in the point-process literature, the nonparametric estimation of the excitation function is a quite novel aspect originally introduced in Marsan and Lengline (2008) and further extended in Mohler et al. (2011); Zhuang and Mateu (2019); Kalair et al. (2021). As a matter of fact, in many real-world applications, there is no physical theory (or previous literature) to support any specific functional form of \(g_s(\cdot )\) and \(g_t(\cdot )\). Hence, the nonparametric estimation provides a fully data-driven approach to their estimation. Model-based inference for the conditional intensity function \(\lambda _c(\cdot )\) and its components can be obtained by maximizing the log-likelihood function for a general point process (GPP):

$$\begin{aligned} \ell _{\lambda _c}(\mathcal{O})=\sum _{i=1}^n\log \left( \lambda _c(\varvec{s}_i, t_i)\right) -\int _0^T\int _\mathcal{D}\lambda _c(\varvec{\sigma }, \tau ) \textrm{d} \varvec{\sigma }\textrm{d}\tau \end{aligned}$$
(5)

However, the conditional intensity function of a self-exciting process presents various discontinuities arising at each observation because of the resulting sudden excitation effect. This hinders the possibility of directly getting a nonparametric estimate of the spatio-temporal variations of \(\lambda _c(\cdot )\) through kernel smoothing or other methods. Furthermore, the log-likelihood happens to be ill-behaved and nearly flat in large regions of the parameter space, troubling numerical maximization algorithms and making convergence extremely slow (or fail altogether).

Fig. 4
figure 4

a Clustering representation of a Hawkes process: solid circles are background events, and open circles and squares are excited events (Reinhart 2018); b example of background/excitation contributions for events from a self-exciting process

The cluster-process representation of a Hawkes process by Hawkes and Oakes (1974) provides a possible solution to this issue. Events are split in background events and triggered events (effect of the excitation of previous events, see Fig. 4a). If labels denoting which events are background and which events are triggered are available, then the complete log-likelihood becomes:

$$\begin{aligned} \begin{aligned} \ell ^C_{\lambda _c}(\mathcal{O})&=\sum _{i=1}^n\mathbb {I}(u_i=0)\cdot \log \mu (\varvec{s}_i, t_i)+\sum _{i=1}^n\sum _{j:t_j<t_i}^n\mathbb {I}(u_i=j)\cdot \log g \left( \varvec{s}_i-\varvec{s}_j, t_i-t_j \right) +\\&\quad -\int _0^T\int _\mathcal{D}\lambda _c(\varvec{\sigma }, \tau ) \textrm{d} \varvec{\sigma }\textrm{d} \tau , \end{aligned} \end{aligned}$$
(6)

where \(u_i=0\) if event i is a background event, \(u_i=j\) if event i is triggered by event j, and \(\mathbb {I}(\cdot )\) is the indicator function (Veen and Schoenberg 2008). The maximization of Eq. (7) is much easier as background events only contribute to the background part of the likelihood and vice versa. This cluster representation becomes extremely useful if we want to perform the non-parametric estimation of the background and excitation functions. In particular, all background events \(\left\{ i\,:\,u_i=0\right\} \) can be used to estimate the background components; while all triggered pairs \(\left\{ i, j\,:\,u_i=j\right\} \) can be used to estimate the excitation components. Given that these labels are not available, we need a way to quantify how likely each event is a triggered or a background event. The corresponding probabilities are:

$$\begin{aligned} \begin{aligned}&\text {Pr}(u_i=j)=\rho _{ij}= {\left\{ \begin{array}{ll} \frac{A_j\cdot g_s(||\varvec{s}_i-\varvec{s}_j||)\cdot g_t(t_i-t_j)}{\lambda _c(\varvec{s}', t')}\; &{} \quad t_i>t_j\\ 0\; &{} \quad \text {otherwise} \end{array}\right. }\\&\text {Pr}(u_i=0)=\phi _i=1-\sum _{j}\rho _{ij}=\mu _0\cdot \frac{\mu _s(\varvec{s}_i)\cdot \mu _t(t_i)}{\lambda _c(\varvec{s}_i,t_i)}, \end{aligned} \end{aligned}$$
(7)

which can be used to evaluate the Expected Complete-Data Log-Likelihood (ECDL):

$$\begin{aligned} \begin{aligned} \ell ^E_{\lambda _c}(\mathcal{O})&=\sum _{i=1}^n\phi _i\cdot \log \mu (\varvec{s}_i, t_i)+\sum _{i=1}^n\sum _{j:t_j<t_i}^n\rho _{ij}\cdot \log g(\varvec{s}_i-\varvec{s}_j, t_i-t_j)+\\&\quad -\int _0^T\int _\mathcal{D}\lambda _c(\varvec{\sigma }, \tau ) \hbox {d}\varvec{\sigma }\hbox {d}\tau . \end{aligned} \end{aligned}$$
(8)

It is now self-evident how \(\lambda _c(\cdot ,\cdot )\) and all its components are necessary to evaluate these probabilities, and these probabilities are necessary to allocate each event in the background or triggered set. This circularity can be solved using an EM strategy where the update of Equation (7) is alternated with the maximization of the log-likelihood (Marsan and Lengline 2008; Marsan and Lengliné 2010). Mohler et al. (2011) considers the Stochastic Declustering (Zhuang et al. 2002), where events are randomly allocated to one of the two sets at each iteration according to the current values of (7). We here consider the stochastic reconstruction (SR) algorithm originally introduced (Zhuang et al. 2004) and later used in Zhuang and Mateu (2019) and Kalair et al. (2021).

4.1 The Stochastic Reconstruction

In the stochastic reconstruction, all components are estimated nonparametrically and all events contribute to the estimation of all components proportionally to their probability of being a background or triggered event (see Fig. 4b). The nonparametric estimation of the functions can then be achieved through weighted kernel smoothing. A major issue with kernel smoothing on limited domains is the bias occurring near the domain boundaries. This is especially true when the target intensity does not naturally decrease to 0 near the edges. Several strategies (mostly based on prior knowledge of the target intensity shape) have been proposed in the literature to mitigate this problem (Chiu 2000; Silverman 2018). Here, we suggest considering a buffering area, say \(\mathcal{B}_b\times [-b, T+b]=\mathcal{Q}_b\supset \mathcal{Q}\), for smoothing purposes. All data points in \(\left\{ (\varvec{s}_i, t_i)\right\} _{i=1}^{n_b}\subset \mathcal{Q}_b\), with \(n_b>n\), are used to smooth the functional forms of the background and excitation components. Instead, the conditional intensity function and the likelihood are evaluated only on points in the main area \(\left\{ (\varvec{s}_i, t_i)\right\} _{i=1}^{n}\subset \mathcal{Q}\). Not only does this overcome the kernel boundary bias in the spatial and long-term component of the background, but it also allows accounting for the excitation of points in \(\mathcal{Q}_b/\mathcal{Q}\) on points in \(\mathcal{Q}\). As noted in (Zhuang et al. 2004), neglecting this effect would yield bias near the edge of the notwithstanding the way the functions are estimated.

Smoothing the Background Components The spatial background function \(\mu _s(\cdot )\) is smoothed by averaging every single event \((\varvec{s}_i, t_i),\, i=1,\dots ,n_b\) with weights proportional to the corresponding background weights \(\phi _i\):

$$\begin{aligned} \hat{\mu }_s(\varvec{s}) \propto \sum _{i=1}^{n_b} \phi _i \cdot \frac{k^2_{\omega _s}(\varvec{s}- \varvec{s}_i)}{\int _{\mathcal{D}} k^2_{\omega _s}(\varvec{s}- \varvec{\sigma }) \hbox {d}\varvec{\sigma }}, \end{aligned}$$

where \(k^2_{\omega _s}(\cdot )\) is a two-dimensional kernel function with bandwidth \(\omega _s\) and the denominator is needed for the domain normalization. The temporal background components can be re-constructed very similarly but rely on a slightly different set of weights:

$$\begin{aligned} w^a_i = \frac{\mu _a(a(t_i))\cdot \mu _s(\varvec{s}_i)}{\lambda _c(\varvec{s}_i,t_i)},\quad \hat{\mu }_a(t) \propto \sum _{i=1}^{n_b} w^a_{i}\frac{k_{\omega _a}(a(t) - a(t_i))}{\int _{\mathcal{T}_a} k_{\omega _a}(a(t) - \tau ) \hbox {d}\tau }, \end{aligned}$$

where \(a(\cdot )\) maps the raw time \(t_i\) to the corresponding temporal scale (e.g. raw, daily, or weekly) and \(k_{\omega _a}(\cdot )\) is a one-dimensional kernel function with bandwidth \(\omega _a\) (Zhuang 2006). The long-term trend does not require any specific boundary correction as the problem is already addressed by the buffering area. The daily and weekly periodic components, instead, can benefit from the adoption of a periodic correction (Silverman 2018; Kalair et al. 2021).

All smoothed background components are then rescaled so that their average on \(\mathcal{Q}\) is equal to 1.

Smoothing a spatially isotropic excitation function The reconstruction of the excitation components can be obtained by weighted kernel smoothing of all the pairwise space lags \(\delta _{ij}=||\varvec{s}_i-\varvec{s}_j||\) and time lags \(\tau _{ij}=t_i-t_j\) such that \(t_i>t_j\). The number of possible pairs ij scales quadratically with the size of the data \(n_b\) but most pairs lie far apart. In practice, the excitation must have a finite range, and only close pairs (in space or time) actually interact. One could decide on conservative cut-off points \(t^*, d^*\) and set the excitation functions to 0 for larger values while dropping all pairs such that \(\tau _{ij}>\tau ^*\) or \(\delta _{ij}>\delta ^*\). Thus, the two functions can be reconstructed on \((0, \tau ^*)\) and \((0, \delta ^*)\) by averaging each pair with weights proportional to the relative importance of the inter-event excitation at location \((\varvec{s}_i, t_i)\). The expressions are:

$$\begin{aligned} \hat{g}_t(t) \propto \frac{\sum _{i, j=1}^{n_b} \rho _{ij}\frac{k_{h_t}\left( t - \tau _{ij}\right) }{\int _0^{\tau ^*} k_{h_t}\left( \tau - \tau _{ij} \right) d\tau }}{\sum _{j=1}^{n_b}\mathbb {I}(t_j+t<T+b)},\quad \hat{g}_s(d) \propto \frac{\sum _{i, j=1}^{n_b} \rho _{ij}\frac{k_{h_s}\left( d - \delta _{ij}\right) }{\int _0^{\delta ^*} k_{h_s}\left( \delta - \delta _{ij} \right) d\delta }}{\sum _{j=1}^{n_b}|\mathcal{C}(\varvec{s}_j, d)\cap \mathcal{D}_b|}, \end{aligned}$$
(9)

where \(k_{h_s}(\cdot ), k_{h_t}(\cdot )\) are one-dimensional kernel function, the outer denominators are the repetition correction, and \(\mathcal{C}(\varvec{s}_j, d)\) is the circle with centre \(\varvec{s}_j\) and radius d. In the case of the background smoothing, all \(n_b\) points contribute to the estimated value at all locations. However, in the case of lags between pairs of points, not all available \(n_b\) points can be leveraged to estimate the occurrence of a target lag \(t\in (0, t^*)\) or \(d\in (0, d^*)\). The repetition correction term is needed to adjust for this disequilibrium and plays a key role in the proper smoothing of the isotropic spatial excitation. When averaging the time-lag t only the points \(t_j\) such that \(t_j+t<T+b\) can actually contribute to the smoothing with a potential lag (that might or might not be observed). Points closer to the upper limit have no chance of contributing as that would occur in \(t_j+t>T+b\), outside the observed window. For the spatial excitation, it is not only a matter of how many observed points have the potential of having another point at lag d. It is also a matter of the differing potential that each event j has to have another point at lag d within the domain \(\mathcal{D}_b\). Indeed, we must consider that the larger the distance the wider the circle on which two points at the same distance may lie and the smoothing must account for the spreading of the mass across circles of different sizes. Therefore, the potential of each point must be proportional to \(|\mathcal{C}(\varvec{s}_j, d)|\) properly intersected with the buffer spatial domain \(\mathcal{D}_b\). Notice that if the spatial domain were the unlimited \(\mathbb {R}^2\), then \(\sum _{j=1}^{n_b}|\mathcal{C}(\delta _j, d)\cap \mathbb {R}^2|=\sum _{j=1}^{n_b}|\mathcal{C}(\varvec{s}_j, d)|\propto 2\cdot \pi \cdot d\), which corresponds to the scaling factor used for the pair-correlation function (Diggle 2013) and adopted by Fox et al. (2016) for their histogram estimator.

Let us point out that both excitation functions have a natural border in 0 and are expected to have a monotone-decreasing behaviour. For practical purposes, in the absence of a robust theoretical model, it would be desirable for the excitation functions to meet this left border with a flat form (\(g_{t}^{'}(0^{+})=g_{s}^{'}(0^{+})=0\)). We can achieve that by applying the reflection correction (Silverman 2018) to the smoothing of the temporal excitation function while such adjustment is not necessary in the case of the isotropic spatial excitation function. Indeed, the density of points at decreasing spatial lags naturally decreases to 0 (as an effect of the reducing circumference), and this decrease shall balance out with the above-mentioned repetition correction. After smoothing, the temporal and spatial excitation functions are rescaled so that \( \int _0^{\tau ^*}\hat{g}_t(\tau )\, \textrm{d} \tau =\int _0^{\delta ^*}2\cdot \pi \cdot \delta \cdot \hat{g}_s(\delta )\, \textrm{d} \delta =1\).

4.2 Estimation Algorithm and Computational Details

Given the set of smoothed functions, the optimal values of \(\mu _0\) and A (or of the corresponding covariates’ effects \(\varvec{\beta }\)), can be obtained by optimizing a target log-likelihood. All this procedure can be unified in an EM-type algorithm that alternates between the weights update, the function smoothing, and the log-likelihood optimization. We here modify the original versions in three major directions: (i) we consider all \(n_b\) events occurring in a buffering area \(\mathcal{Q}_b\supset \mathcal{Q}\) for smoothing purposes but evaluate the log-likelihood only on the n points within the main area \(\mathcal{D}\); (ii) we optimize the ECDL (8) in place of the GPP log-likelihood (5); (iii) we introduce a weight update between the smoothing and the ECDL optimization. A sketch of the pseudo-code is provided in Algorithm 1 of the Appendix. For the sake of brevity, we write it for constant coefficients \(\mu _0, A\) and the set of background and excitation functions is denoted as \(\left\{ \mu (\cdot ), g(\cdot )\right\} \).

It is well known that the EM algorithm may converge to local maxima of the log-likelihood function or singularities at the edge of the parameter space, where the log-likelihood is unbounded. To avoid such a problem, we pursue a multi-start strategy and run the EM algorithm from multiple random initializations keeping only the best solution. We stop the optimization when the increase of two successive log-likelihoods falls below \(10^{-4}\). Another drawback of the stochastic-reconstruction algorithm lies in its computational complexity. This is a very common problem in the point-process literature and is exacerbated by the second-order characteristics of the Hawkes process. At each iteration, all functions must be evaluated on a grid of given space-time locations \(\mathcal{G}\) (or space-time lags \(\mathcal{H}\)) and integrated over (possibly irregular) domains. The finer the grid, the better the approximation, but these operations scale with \(n\times |\mathcal{G}|\) in the background components and with \(n^2\times |\mathcal{H}|\) in the excitation components. The proposed excitation tapering (with cut-offs \(\tau ^*, d^*\)) can reduce the burden of the latter, but it is rarely sufficient to get a timely estimation for n large and fine grids (i.e. \(|\mathcal{G}|, |\mathcal{H}|\) large). However, it is worth noting that many quantities (the kernels, the repetition correction, and the sizes of the integration domain) remain unchanged through all iterations and can be computed only once at the beginning. This can noticeably expedite the estimation process but requires the storage of large matrices, which might be unbearable in terms of needed storage.

In our implementation, we exploit the sparse allocation of such matrices to reduce the memory burden while retaining most of the computational advantage of prior computation. At the same time, we exploit parallel computing any time possible and code the more evident computational bottlenecks in C++. All these strategies greatly enhance the computational performances, which is a key aspect of our application in Sect. 5 developed on a quite large dataset and a fine grid. Codes with a snippet example are available at https://github.com/PAlaimo/STHawkesSRIsoRoadNet.

5 Application

The model presented in Sect. 3 strongly relies on the quality and coherency of the input data and on the stationarity of presumed dynamics (periodicity and excitation). The territory of the city of Rome is vast and remarkably heterogeneous in terms of landscape, road typology and condition, population density, etc. Such characteristics are a major source of heterogeneity that might hinder a proper estimation of the model. Therefore, we consider the three years separately and focus on the road accidents occurred within the Great Ring Road (GRA) surrounding Rome. This major road outlines the urban area of the city, which is still huge: 346 km\(^2\) for a total of \(\approx 2985\) km of streets.

Note that all computations for the results of this section have been executed on the HPC TeraStat2, a high-performance computing infrastructure developed by the Department of Statistics of Sapienza University of Rome. The computing environment is equipped with 24 modern computational nodes, having up to 256Gb of RAM, and counting a total of 1920 cores (https://www.dss.uniroma1.it/en/HPCTeraStat2/Specifiche).

5.1 Model Fitting and Comparison

For the sake of brevity, we here report details on the model fitting and model comparison for the 2019 but the procedure follows the same rules and yields analogous conclusions on 2020 and 2021.

Fig. 5
figure 5

a Raster approximation (10 m \(\times \) 10 m) of the street network; b study area and observed point pattern (2019)

Figure 5a provides a snapshot of the gridded approximation of the road network and its spatial buffer. The latter is highlighted in red and is a 2-km buffering area in space surrounding the GRA. The 10 m \(\times \) 10 m grid approximation is able to well capture all road segments. Figure 5b reports all events of 2019 colour-coded for buffer or no buffer. The red dots inside the GRA are those that occurred in the two additional months in time, one preceding the beginning of the year (December 2018) and the other one following the end of the same year (January 2020). The n points reported in black directly contribute to the likelihood, while the red dots fall in the spatio-temporal buffer (the one inside the GRA are from the previous or following month) and are used for the smoothing process only. The smoothing is performed using Gaussian kernels with different bandwidths for each component. The bandwidth choice determines the smoothness of the resulting function, i.e. the resolution at which the function varies. The long-trend, weekly periodic, and daily periodic components have been smoothed with bandwidths of 7 (1 week), 1 (day), and 0.05 (1.1 h). For the spatial background, an isotropic Gaussian kernel with a varying-bandwidth approach has been chosen, with a minimum level set to 0.1 (i.e. 100 m), and individual bandwidths \(b_s(i)\) such that at least 10 other events were within \(b_s(i)\) distance from event i.

Finally, the temporal and spatial excitation bandwidths have been set to 0.03 (\(\approx 45\) min) and 0.05 (50 m) as they must capture smaller scale variations than the background components.

We consider the possibility that different types of accidents have different excitation abilities. Indeed, different accident dynamics might have varying degrees of severity (on average) or correspond to different consequences (e.g. modify the road conditions, the passing drivers’ behaviour, etc. ). We consider the \(k=1,\dots ,K\) categories introduced in Sect. 2 and use Eq. (4) to express the excitation coefficient as a function of the type of accident. This results in \(K=4\) different excitation effects for the different accident types. We fit the proposed full model (Full) and other several competitors with various degrees of complexity on this same data. We consider a full model including only one common A (Full—One A), a model without the periodic component of the background (NP), a model without the excitation effect (NE), and a model with neither the periodic nor the excitation components (NENP). We compare the model adaptations in terms of the GPP log-likelihood of Eq. (5). Table 2 shows that when the periodicity is taken into account, the model including the excitation effect provides a sensible improvement in terms of in-sample adaptation. The same data have been used to fit the anisotropic spatial excitation version of the full model. The results do not differ significantly and are reported in Section D of the Appendix.

This proves the explanatory role of the excitation component in describing the realized point pattern. Hence, all the following analysis and discussion is focused on the full model.

5.2 Model Diagnostic: Residual Analysis

A large value of the log-likelihood does not guarantee that the model fits the data adequately. We perform additional diagnostics to verify if the estimated conditional intensity rate \(\hat{\lambda }_c(\cdot )\) produces a residual analysis that complies with the model assumptions. For instance, it is established that for a given realization \(\mathcal{O}=\left\{ (\varvec{s}_i, t_i)\right\} _{i=1}^n\subset \mathcal{Q}\) of a spatio-temporal point process N with conditional intensity rate \(\lambda _c(\cdot , \cdot )\), the transformation of the observed time-sequence \(\tau _i=\Lambda _c(t_i)=\int _0^{t_i}\int _\mathcal{D}\lambda _c(\varvec{\sigma }, \tau )\textrm{d}\varvec{\sigma }\textrm{d}\tau \) yields a stationary Poisson process with unit rate in time (Daley and Vere-Jones 2007). From a qualitative point of view, this transformation is expanding or compressing arrival times according to the cumulative intensity \(\Lambda _c(\cdot )\). We consider a diagnostic based on this property (Ogata 1988), where the estimated conditional intensity rate is plugged in to get:

$$\begin{aligned} \hat{\tau }_i=\hat{\Lambda }_c(t_i)=\int _0^{t_i}\int _\mathcal{D}\hat{\lambda }_c(\varvec{\sigma }, \tau ) \textrm{d}\varvec{\sigma }\textrm{d}\tau . \end{aligned}$$
(10)

If the model is well-specified, the scaled transformed time sequence \(\hat{\tau }_i\) must be distributed as \(i\cdot Z\) where \(Z\sim Beta(i+1, n-i+1)\) for each positive integer \(i=1,\dots , n\). This provides a tool to build up a confidence interval and check for deviation from the model assumptions. Figure 6a shows the counting measure of the original times compared to the bisector (unit-rate Poisson process) and 6b shows the estimated transformed time sequence with the corresponding confidence bounds.

Table 2 Log-likelihood values at the parameter estimates of the different models for 2019
Fig. 6
figure 6

Residual diagnostic on the original and transformed times for 2019

Since it is poorly visible from Fig. 6b, we here report that \(95.3\%\) of the \(\hat{\tau _i}\) are included within the nominal 0.95 confidence bounds.

5.3 Estimated Parameters Across the Years

The MLE of the coefficients for all years is reported in Table 3. Given the large sample size under analysis, the uncertainty has been evaluated relying on the asymptotic normality of the MLE by evaluating \(\Sigma _{\hat{\mu },\hat{\varvec{\beta }}}=-\mathcal{H}\left( \ell _{\lambda _c}(\hat{\mu }, \hat{\varvec{\beta }})\right) ^{-1}\) (Rathbun 1996; Wang et al. 2010).

Background components The estimate \(\hat{\mu }_0\) represents the average intensity of road accidents across space and time and can be interpreted as the average number of primary road accidents occurring in each unit of space (10 m \(\times \) 10 m) and unit of time (1 day) in the Rome urban road network. The yearly estimates confirm and corroborate the preliminary discussion in Sect. 2 as the coefficient for 2020 is lower than for 2021 and 2019. The functional form of the estimated background components is reported in Figs. 7a–c and 8a.

Fig. 7
figure 7

a Cyclical components of the background: hour (left), day of the week (centre) and long term (right); b Excitation: temporal (left) and spatial (right)

The estimated long-term trend component of 2019 (Fig. 7a) highlights the typical seasonal variations, with the major peaks in January, March and October, and a tight and deep valley in August. The latter is easily explained by the greatly reduced presence of people (and traffic) during the Italian summer holidays. The long-term trend of 2020 clearly captures the abrupt reduction during the strict lockdown period that lasted from March to May and the second lockdown of October 2020. Finally, the long-term trend of 2021 shows a clear increasing trend from April 2021 onwards. Figure 7c shows the estimated weekly cyclical pattern that confirms the original impressions from Sect. 2. The high values on Thursday and Friday, at the edge of the weekend, may be explained by the combination of the standard commuting patterns and the night traffic associated with leisure activities. It is worth noticing that the peak is slightly shifted backwards in 2020 (Thursday instead of Friday), reflecting once again the restrictions (e.g. lockdown, green pass, etc.) imposed by the government for a large part of the year to reduce the spread of COVID-19. Figure 7b shows the estimated daily periodic components in the three years that do not vary sensibly and do not add much to what has already been said in Sect. 2. Figure 8a shows the estimated spatial background for 2019 only, for the sake of brevity. Indeed, this is once again almost identical across the three years, suggesting that the spatial dynamic of road accident occurrence within the GRA did not change significantly. It highlights some major hot spots and we report in the following just three examples, zooming in on these areas of major interest for the Roman community. The first area (see Fig. 8b) belongs to the VIII municipality, where the bridge (Ponte Marconi) separates the northwest side of the city from the southern part; the major road in the centre of the figure (Viale G. Marconi) collects all people that commute from the north and either go to Roma Tre University (one of the largest universities in Rome) or go to work in the southern area (EUR neighbourhood) that hosts central offices of large companies. Figure 8c instead shows the risk close to the second largest train and bus station of Rome (stazione Roma Tiburtina); the risk is larger on the major road in the centre of the figure (Via Tiburtina) that collects the commuting pattern of people living in the northeastern part of the city instead. Finally, Fig. 8d pinpoints a larger area where the risk of collision is high and includes a couple of major roads placed in the northern part of the city (behind Vatican City) that are usually very busy during the whole day.

Fig. 8
figure 8

a Spatial background in 2019; Zoom: b Ponte Marconi, c Ponte Tiburtino, d Baldo degli Ubaldi

Excitation components The estimates of the \(\beta \) coefficients of the excitation parameter A in Table 3 must be interpreted with respect to the baseline category 1 and reported on the exponential scale. They correspond to \(\hat{A}_1=0.058, 0.097, 0.028\), \(\hat{A_2}=0.197, 0.168, 0.218\), \(\hat{A_3}=0.074, 0.038, 0.076\), and \(\hat{A}_4=0.036, 0.035, 0.023\), for 2019, 2020 and 2021, respectively. The accidents of type 2—“vehicle hit fixed obstacle”—are those with the larger excitation effect.Footnote 2 They include eye-catching dynamics such as the vehicle overturn, which might capture the attention of the passing driver and make him lose focus on driving. If we want to get an estimate of the average number of observed road accidents that were actually due to the excitation effect, we can evaluate \(\sum _{ij}\hat{\rho }_{ij}\approx 328, 190, 293\), corresponding to percentages of \(1.5\%, 1.2\%, 1.4\%\) triggered secondary accidents. Figure 7d, e shows the estimated temporal and spatial excitation in the left and right panels, respectively. These do not differ significantly across the years. In particular, the excitation decreases steeply getting away in space or time from the original event. The effect is limited to \(\approx 200m\) radius in space and \(\approx 2\) h (0.1 of a day) in time. This can be a very useful indication for traffic authorities to delimit for how long and how far the area surrounding a primary accident should be kept under control.

Table 3 Estimated background and triggering coefficients from 2019 to 2021 (standard error). The triggering coefficients represent deviations from the baseline category (1)

6 Conclusion and Discussion

This research paper introduces the analysis of the spatio-temporal distribution of road accidents in the city of Rome. We consider all data from the year 2019 to 2021, as they are the most recent data available, with a particular focus on 2019. The final objective of the analysis is to verify if there is any excitation effect between road accidents occurring on Rome urban road network. Section 2 provides a deep exploratory analysis of the temporal and spatial features of the analysed point patterns. It motivates the need to include in the model a temporal trend with (weekly and daily) cyclical components, a non-homogeneous spatial intensity, and an excitation effect to capture the clustered point pattern. Therefore, we consider a semi-parametric and periodic spatio-temporal Hawkes process and extend the previous proposal in various directions. We restrict the domain to an approximation of the actual road network, where each road segment is represented by a 10 m \(\times \) 10 m, so that the process intensity is evaluated only on the portion of space where the accident can actually happen. We introduce a buffer area and extend the original stochastic reconstruction algorithm to take into account the shape of the domain and to smooth an isotropic spatial excitation. This is a particularly delicate aspect as in irregular domains such as the road network the specific road configuration might affect the estimation of the excitation in some directions. The simulation study in Appendix C substantiates the practical advantages of considering a constrained isotropic spatial excitation function. Finally, we generalize the excitation effect to depend on an arbitrary set of covariates. Previous preliminary attempts to analyse the same set of data using a similar model were hindered by the overwhelming computational cost to evaluate all components for such a large set of points (Alaimo Di Loro 2021). We here attain improved performances by implementing the core of the algorithm in C++ and adopting a number of computational strategies that go from multi-core parallelization to sparse-matrix representation.

The application of our model on this original set of data allowed identifying road accidents hot spots (in space and time) and verifying the presence (and strength) of the triggering effect between road accidents in an urban road network. The results suggest that the excitation has a sensible impact on the accident dynamics, with \(> 1\%\) of triggered events (\(> 200\) per year), and decays smoothly in space and time. In particular, it appears that some types of road accidents are more likely to cause others and trigger a cascading effect. All this gives policy-makers and local authorities precious indications for preventing and managing road accident events and reducing the chance of both primary and secondary collision.

There is still room for improvement in the proposed model and methods. Indeed, even if the domain on which the intensity is evaluated is an approximation of the road network, the excitation effect is still a function of the Euclidean distance between pairs of points/locations. Further developments will focus on respecting the topology of the linear road network in all the model aspects. (D’Angelo et al. 2022, 2022b) consider the linear network in a similar model specification, and we are currently working to combine their and our ideas using efficient tools to make the computations feasible. By then, the full application could be adapted to the linear road network of Rome to provide more reliable inference. We are also working on the possibility of including auxiliary covariates to explain differences in the background components and developing feasible cross-validation strategies for model and bandwidth selection, such as the forward predictive likelihood by Adelfio and Chiodi (2015a, 2015b). All this could yield a deeper understanding of the statistical properties of the road accident process and provide more accurate guidance in traffic management to local authorities.