Introduction

Crime forecasting plays a vital role in law enforcement agencies’ efforts to prevent and address criminal activities. Accurate predictions of the spatial and temporal patterns of crime can assist in resource deployment, proactive intervention, and effective crime prevention strategies [1]. It is therefore essential to identify possible crime hotspots within spatially narrow regions, since coarse predictions over larger areas, such as a city or district, do not support the design and implementation of effective crime-fighting strategies [2]. A substantial body of previous research has applied machine learning (ML) to crime prediction [3]. In this chapter, we present a deep learning (DL) attention-based approach to geo-temporal crime forecasting.

Our research focuses on developing a transformer-based model specifically designed for crime forecasting. The model consists of an encoder that takes as input the crimes that occurred during a context window of n days and a decoder that generates forecasts for the next m days based on the encoder's output. By leveraging transformers and attention mechanisms, the model captures the intricate spatial and temporal correlations of crime occurrences. Our experimental results highlight the superior performance of our model and its potential to contribute significantly to crime prevention and law enforcement efforts.

Related Work

Numerous studies have tackled the challenging task of geo-temporal crime forecasting, aiming to provide accurate predictions and assist law enforcement agencies (LEAs) in combating crime. In this section, we discuss relevant works that have explored different approaches and techniques in this domain.

Traditional statistical models, such as linear regression [4] and random forest [5], have been widely used for crime prediction and to identify possible crime hotspots. These models often rely on historical crime patterns and spatial information to identify correlations and forecast future crime occurrences. However, their limitations in capturing complex spatial and temporal relationships restrict their predictive capabilities.

Other ML approaches have also been employed for crime forecasting. Clustering algorithms, for instance, have been utilized to identify crime hotspots and spatial patterns [6]. These methods leverage spatial analysis to detect areas with high crime rates and predict future criminal activities. However, the absence of temporal dynamics may hinder their forecasting accuracy.

In recent years, transformer-based models have gained attention across various domains, including natural language processing and computer vision [7]. Their ability to capture long-range dependencies and model interactions across different input elements also makes them suitable for crime forecasting tasks. By incorporating attention mechanisms, transformers can effectively consider both spatial and temporal contexts, improving predictive performance for geo-temporal crime forecasting.

Our work focuses on developing a novel attention-based model tailored specifically for the task of geo-temporal crime forecasting, aiming to overcome the limitations of previous approaches and achieve enhanced accuracy in predicting crime occurrences within a given area.

Model and Data

In this section, we outline the methodology adopted and the data used for the development of our geo-temporal crime forecasting model.

Model Architecture

We developed a transformer-based model (see Fig. 26.1) that adopts an encoder–decoder architecture [7]: multiple layers of self-attention and feedforward networks allow the model to capture long-term dependencies in the sequential data.

Fig. 26.1 Depiction of the encoder–decoder architecture adopted. Each input token of the model represents the daily distribution of crimes

The encoder receives as input a context window, containing the crime occurrences from the previous n days, while the decoder generates the forecasts for the next m days based on the input provided by the encoder. The attention mechanism within the model facilitates the consideration of both spatial and temporal correlations, enabling effective crime prediction.

Data Source

To test the model, we utilized the public “Boston Incident Crime Report” dataset published by the Boston Police Department1. This comprehensive dataset covers incidents such as larceny, burglary, and robbery that occurred from August 2015 to December 2022. A total of 468,208 crimes are reported in the dataset, with an average of 5202 crimes per month. Each crime is geo-localized (with latitude and longitude coordinates) and time-stamped.

Spatial Grid and Input Data Representation

To perform crime forecasting on a fine-grained spatial level, we adopted a grid-based approach. The grid consists of cells with dimensions of 1 km by 1 km. By dividing the area of interest into these cells, we can effectively capture localized crime patterns and predict crime occurrences at a granular level. The grid-based approach enables us to assess crime trends and forecast crime hotspots within each cell. The resulting grid is composed of 122 cells (see Fig. 26.2).
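The mapping from geo-coordinates to grid cells can be sketched as follows. The chapter does not give the exact bounding box of the area of interest, so the origin coordinates and grid width below are illustrative assumptions:

```python
import math

# Hypothetical south-west corner of the area of interest (illustrative values,
# not taken from the chapter).
LAT_MIN, LON_MIN = 42.23, -71.19
CELL_KM = 1.0           # 1 km x 1 km cells, as described above
KM_PER_DEG_LAT = 111.0  # approximate km per degree of latitude

def cell_index(lat, lon):
    """Map a geo-coordinate to a (row, col) cell of the 1 km x 1 km grid."""
    km_per_deg_lon = KM_PER_DEG_LAT * math.cos(math.radians(lat))
    row = int((lat - LAT_MIN) * KM_PER_DEG_LAT // CELL_KM)
    col = int((lon - LON_MIN) * km_per_deg_lon // CELL_KM)
    return row, col

print(cell_index(LAT_MIN, LON_MIN))  # (0, 0)
```

Incidents falling in the same (row, col) pair are aggregated into the same cell count.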

Fig. 26.2 Depiction of the crime distribution for a given day and the corresponding feature embedding vector obtained by flattening the 2D-grid representation

Starting from the grid division, we constructed the input tokens of the model for each day. The distribution of crimes over the grid was flattened, resulting in a feature vector whose dimension equals the number of cells in the grid (i.e., 122). Each element of the feature vector represents the total number of crimes within a specific cell on a given day. Consequently, the input tokens capture the spatial distribution of crimes for each day, facilitating the learning of spatial correlations by the model.
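A minimal sketch of this flattening step, assuming incidents have already been mapped to flattened cell indices in [0, 122):

```python
from collections import defaultdict
import datetime as dt

N_CELLS = 122  # number of cells in the grid

def daily_feature_vectors(incidents):
    """Turn (date, cell_id) incident records into one count vector per day.

    `incidents` is an iterable of (date, cell_id) pairs, where cell_id is a
    flattened grid index. Returns a date-ordered dict of count vectors, one
    per day, matching the input tokens described above.
    """
    by_day = defaultdict(lambda: [0] * N_CELLS)
    for day, cell in incidents:
        by_day[day][cell] += 1
    return dict(sorted(by_day.items()))

# Toy usage: two incidents in cell 5 on Jan 1, one in cell 0 on Jan 2.
recs = [(dt.date(2022, 1, 1), 5), (dt.date(2022, 1, 1), 5), (dt.date(2022, 1, 2), 0)]
vecs = daily_feature_vectors(recs)
print(vecs[dt.date(2022, 1, 1)][5])  # 2
```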

Training and Testing

For model training, we used the data from 2015 up to the end of 2021, encompassing several years’ worth of crime incidents. To evaluate the model’s performance and assess its generalization capability, we tested it on the data of 2022.

We considered a context window composed of the crimes that happened during the previous 30 days and a forecast window of the following 7 days.
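The construction of (context, forecast) training pairs from the chronological sequence of daily vectors can be sketched as a simple sliding window (the function name and list-based representation are illustrative):

```python
def make_windows(daily_vectors, context_len=30, forecast_len=7):
    """Slice a chronological list of daily count vectors into
    (context, forecast) pairs: 30 days of context, 7 days to predict."""
    samples = []
    for i in range(len(daily_vectors) - context_len - forecast_len + 1):
        context = daily_vectors[i : i + context_len]
        forecast = daily_vectors[i + context_len : i + context_len + forecast_len]
        samples.append((context, forecast))
    return samples

# Toy usage: a 1-cell series of 40 days yields 40 - 30 - 7 + 1 = 4 pairs.
days = [[d] for d in range(40)]
pairs = make_windows(days)
print(len(pairs))  # 4
```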

We implemented this work on Google Colaboratory Pro+ with Python 3.10.11, using PyTorch 2.0 for the implementation of the transformer model (i.e., nn.TransformerEncoder and nn.TransformerDecoder) and scikit-learn for the baseline models (i.e., RandomForestRegressor and LinearRegression). We configured the transformer with a hidden size of 32, 2 layers, a dropout of 0.1, and a learning rate of 1e-4, while for the random forest model, we used 100 trees and a maximum depth of 4.
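A minimal PyTorch sketch of such an encoder–decoder, using the stated hidden size, layer count, and dropout; the number of attention heads, the projection layers, and the absence of positional encodings and decoder masking are simplifying assumptions of this sketch, not the chapter's exact implementation:

```python
import torch
import torch.nn as nn

class CrimeForecaster(nn.Module):
    def __init__(self, n_cells=122, d_model=32, n_layers=2, n_heads=4, dropout=0.1):
        super().__init__()
        # Project a daily 122-dim crime-count vector into a d_model-dim token.
        self.input_proj = nn.Linear(n_cells, d_model)
        enc_layer = nn.TransformerEncoderLayer(d_model, n_heads, dropout=dropout,
                                               batch_first=True)
        dec_layer = nn.TransformerDecoderLayer(d_model, n_heads, dropout=dropout,
                                               batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, n_layers)
        self.decoder = nn.TransformerDecoder(dec_layer, n_layers)
        # Project decoder tokens back to per-cell crime counts.
        self.output_proj = nn.Linear(d_model, n_cells)

    def forward(self, context, target):
        # context: (batch, 30, n_cells); target: (batch, 7, n_cells)
        memory = self.encoder(self.input_proj(context))
        out = self.decoder(self.input_proj(target), memory)
        return self.output_proj(out)

model = CrimeForecaster().eval()
with torch.no_grad():
    pred = model(torch.zeros(1, 30, 122), torch.zeros(1, 7, 122))
print(pred.shape)  # torch.Size([1, 7, 122])
```

In a full training setup the decoder input would be the shifted target sequence with a causal mask; the sketch only illustrates the tensor shapes flowing through the encoder and decoder.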

Results

We evaluated the model’s performance by measuring the mean absolute error (MAE) and mean squared error (MSE) of each cell’s predicted daily number of crimes. The dataset was split, considering as a training set all the crimes that happened before January 1, 2022, and as a test set all the remaining ones. Our experimental results show that the proposed model outperforms traditional machine learning models, such as linear regression [8] and random forest [9], for crime forecasting. As shown in Table 26.1, the proposed transformer model provides a substantial improvement over standard machine learning models. In particular, the model obtains a score of 1.674 in MSE, a reduction of 68% and about 18% compared to the linear regression and random forest models, respectively.

Table 26.1 The obtained MAE and MSE for different models on the test set
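The evaluation metrics above can be computed as follows; this is a plain-Python sketch of per-(day, cell) MAE and MSE, equivalent to scikit-learn's `mean_absolute_error` and `mean_squared_error`:

```python
def mae_mse(y_true, y_pred):
    """Mean absolute error and mean squared error over all (day, cell)
    predictions, where each row is one day's vector of per-cell counts."""
    n = 0
    abs_sum = sq_sum = 0.0
    for row_t, row_p in zip(y_true, y_pred):
        for t, p in zip(row_t, row_p):
            err = p - t
            abs_sum += abs(err)
            sq_sum += err * err
            n += 1
    return abs_sum / n, sq_sum / n

# Toy usage: errors are 0, 1, -2, 0 across four (day, cell) pairs.
mae, mse = mae_mse([[1, 2], [3, 0]], [[1, 3], [1, 0]])
print(mae, mse)  # 0.75 1.25
```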

Conclusions

Accurate crime predictions can assist law enforcement agencies in allocating resources to effectively address crime in specific areas, thereby improving public safety. In this chapter, we proposed a deep learning model based on an encoder–decoder transformer architecture for geo-temporal crime forecasting. The model demonstrated its ability to capture the spatial and temporal dependencies of crime incidents and forecast localized crime patterns, improving prediction accuracy over baseline models proposed in previous studies. In future work, we plan to extend our model by incorporating additional features (e.g., weather forecasts, points of interest, and land use) to make the model spatially agnostic and scalable to different cities.