Abstract
Crime prediction is a crucial problem in law enforcement: forecasting where and when crimes are likely to occur helps police departments allocate their resources effectively and prevent crimes. In this chapter, we propose a geo-temporal crime forecasting model based on a transformer architecture. We use a public dataset from the Boston Police Department and forecast crimes in each cell of a 1 km × 1 km grid. An encoder–decoder structure captures the spatiotemporal patterns of the crimes: the encoder processes the crimes that occurred in each cell during the previous n days, and the decoder generates predictions of future crimes in each cell for the next m days. The model considers both spatial and temporal correlations, which is challenging for traditional models. We evaluate it on the Boston crime dataset and compare it with linear regression and random forest baselines. Our experiments show that the model achieves a lower prediction error than these baselines. Overall, the proposed geo-temporal crime forecasting model is a promising approach for predicting crime in a given area.
Introduction
Crime forecasting plays a vital role in law enforcement agencies’ efforts to prevent and address criminal activities. Accurate predictions of the spatial and temporal patterns of crime can assist in resource deployment, proactive intervention, and effective crime prevention strategies [1]. It is therefore essential to identify possible crime hotspots within spatially narrow regions, because general predictions over larger areas, such as the city or district level, do not allow effective strategies against crime to be designed and implemented [2]. A substantial amount of previous research has applied machine learning (ML) to the task of crime prediction [3]. In this chapter, we present a deep learning (DL) attention-based approach to geo-temporal crime forecasting.
Our research focuses on developing a transformer-based model specifically designed for crime forecasting. The model consists of an encoder that takes as input the crimes that occurred during a given context window of n days and a decoder that generates the forecasts for the next m days based on the output of the encoder. By relying on transformers and attention mechanisms, we aim to capture the relationships between crime occurrences over time and their spatial context. Our experimental results show that the model outperforms the baseline models considered and indicate its potential to contribute to crime prevention and law enforcement efforts.
Related Work
Numerous studies have tackled the challenging task of geo-temporal crime forecasting, aiming to provide accurate predictions and assist law enforcement agencies (LEAs) in combating crime. In this section, we discuss relevant works that have explored different approaches and techniques in this domain.
Traditional statistical models, such as linear regression [4] and random forest [5], have been widely used for crime prediction and to identify possible crime hotspots. These models often rely on historical crime patterns and spatial information to identify correlations and forecast future crime occurrences. However, their limitations in capturing complex spatial and temporal relationships restrict their predictive capabilities.
Other ML approaches have also been employed for crime forecasting. Clustering algorithms, for instance, have been used to identify crime hotspots and spatial patterns [6]. These methods rely on spatial analysis to detect areas with high crime rates and predict future criminal activities. However, because they do not model temporal dynamics, their forecasting accuracy may suffer.
In recent years, transformer-based models have gained attention across various domains, including natural language processing and computer vision [7]. Their ability to capture long-range dependencies and model interactions across different input elements also makes them suitable for crime forecasting tasks. By incorporating attention mechanisms, transformers can consider both spatial and temporal contexts, which can improve the predictive performance for geo-temporal crime forecasting.
Our work focuses on developing a novel attention-based model tailored specifically for the task of geo-temporal crime forecasting, aiming to overcome the limitations of previous approaches and achieve enhanced accuracy in predicting crime occurrences within a given area.
Model and Data
In this section, we outline the methodology adopted and the data used for the development of our geo-temporal crime forecasting model.
Model Architecture
We developed a transformer-based model (see Fig. 26.1) adopting an encoder–decoder architecture [7] that consists of multiple layers of self-attention and feedforward networks, which allows the model to capture long-term dependencies in the sequential data.
The encoder receives as input a context window, containing the crime occurrences from the previous n days, while the decoder generates the forecasts for the next m days based on the input provided by the encoder. The attention mechanism within the model facilitates the consideration of both spatial and temporal correlations, enabling effective crime prediction.
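The encoder–decoder structure described above can be sketched in PyTorch as follows. This is a minimal illustration, not the authors' implementation: the class name `CrimeTransformer`, the number of attention heads, the feedforward width, the learned positional parameters, and in particular the use of learned query vectors as decoder input (one per forecast day) are all assumptions, since the chapter does not specify how the decoder is fed. The hidden size, number of layers, dropout, grid size, and window lengths are the values reported elsewhere in the chapter.

```python
import torch
from torch import nn


class CrimeTransformer(nn.Module):
    """Sketch of an encoder-decoder forecaster over daily grid tokens."""

    def __init__(self, n_cells=122, context=30, horizon=7,
                 d_model=32, n_layers=2, n_heads=4, dropout=0.1):
        super().__init__()
        # Project each day's per-cell count vector to the hidden size.
        self.in_proj = nn.Linear(n_cells, d_model)
        # Assumption: learned positional embeddings for the context days.
        self.src_pos = nn.Parameter(0.02 * torch.randn(context, d_model))
        # Assumption: one learned query per forecast day feeds the decoder.
        self.queries = nn.Parameter(0.02 * torch.randn(horizon, d_model))
        enc_layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model,
            dropout=dropout, batch_first=True)
        dec_layer = nn.TransformerDecoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model,
            dropout=dropout, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=n_layers)
        self.decoder = nn.TransformerDecoder(dec_layer, num_layers=n_layers)
        # Map hidden states back to one predicted count per cell.
        self.out_proj = nn.Linear(d_model, n_cells)

    def forward(self, src):
        # src: (batch, context, n_cells) daily crime counts per cell.
        memory = self.encoder(self.in_proj(src) + self.src_pos)
        tgt = self.queries.unsqueeze(0).expand(src.size(0), -1, -1)
        # Output: (batch, horizon, n_cells) forecast counts.
        return self.out_proj(self.decoder(tgt, memory))
```

Calling the model on a batch of shape (batch, 30, 122) returns a tensor of shape (batch, 7, 122). Training could then use, for example, `torch.optim.Adam(model.parameters(), lr=1e-4)` with a regression loss; the choice of optimizer and loss here is an assumption.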
Data Source
To test the model, we used the public dataset “Boston Incident Crime Report” published by the Boston Police Department. This comprehensive dataset covers incidents such as larceny, burglary, and robbery that occurred from August 2015 to December 2022. A total of 468,208 crimes are reported in the dataset, with an average of 5202 crimes per month. Each crime is geo-localized (with latitude and longitude coordinates) and time-stamped.
Spatial Grid and Input Data Representation
To perform crime forecasting on a fine-grained spatial level, we adopted a grid-based approach. The grid consists of cells with dimensions of 1 km by 1 km. By dividing the area of interest into these cells, we can effectively capture localized crime patterns and predict crime occurrences at a granular level. The grid-based approach enables us to assess crime trends and forecast crime hotspots within each cell. The resulting grid is composed of 122 cells (see Fig. 26.2).
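As an illustration of the grid assignment, the sketch below maps a latitude/longitude pair to the row and column of a 1 km cell. It assumes an equirectangular approximation, which is reasonable at city scale, and a hypothetical south-west grid corner `(lat0, lon0)`; the chapter does not state which projection or origin was used, nor how the 122 cells were selected.

```python
import math

KM_PER_DEG_LAT = 111.32  # approximate length of one degree of latitude


def cell_of(lat, lon, lat0, lon0, cell_km=1.0):
    """Return (row, col) of the grid cell containing (lat, lon).

    (lat0, lon0) is the south-west corner of the grid (hypothetical).
    Rows grow northwards and columns grow eastwards.
    """
    # A degree of longitude shrinks with the cosine of the latitude.
    km_per_deg_lon = KM_PER_DEG_LAT * math.cos(math.radians(lat0))
    north_km = (lat - lat0) * KM_PER_DEG_LAT
    east_km = (lon - lon0) * km_per_deg_lon
    return math.floor(north_km / cell_km), math.floor(east_km / cell_km)
```

Points south or west of the origin yield negative indices, so a caller would need to filter incidents that fall outside the area of interest.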
Starting from the grid division, we constructed the input tokens of the model, one for each day. The distribution of crimes over the grid was flattened, resulting in a feature vector of dimension 1 × N, where N is the number of cells in the grid. Each element of the feature vector represents the total number of crimes within a specific cell on a given day. Consequently, the input tokens capture the spatial distribution of crimes for each day, facilitating the learning of spatial correlations by the model.
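The token construction can be sketched as a simple counting step. The function name `daily_count_tokens` and the input format, a list of `(day_index, cell_index)` pairs with cells already numbered from 0 in flattened order, are assumptions made for illustration.

```python
def daily_count_tokens(incidents, n_days, n_cells):
    """Build one count vector (token) per day.

    incidents: iterable of (day_index, cell_index) pairs, one per crime.
    Returns a list of n_days lists, each holding n_cells crime counts.
    """
    tokens = [[0] * n_cells for _ in range(n_days)]
    for day, cell in incidents:
        tokens[day][cell] += 1
    return tokens
```

For example, three incidents `[(0, 2), (0, 2), (1, 0)]` over two days and three cells give the tokens `[[0, 0, 2], [1, 0, 0]]`.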
Training and Testing
For model training, we used the data from 2015 up to the end of 2021, encompassing several years’ worth of crime incidents. To evaluate the model’s performance and assess its generalization capability, we tested it on the data of 2022.
We considered a context window composed of the crimes that happened during the previous 30 days and a forecast window of the following 7 days.
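The pairing of a 30-day context with a 7-day forecast window can be sketched as a sliding window over the sequence of daily tokens. The function name `make_windows` and the stride of one day between consecutive samples are assumptions; the chapter does not state how training samples were drawn.

```python
def make_windows(tokens, context=30, horizon=7):
    """Split a sequence of daily tokens into (context, forecast) pairs.

    Each sample holds `context` consecutive days as input and the
    following `horizon` days as target. Assumes a stride of one day.
    """
    samples = []
    for start in range(len(tokens) - context - horizon + 1):
        src = tokens[start:start + context]
        tgt = tokens[start + context:start + context + horizon]
        samples.append((src, tgt))
    return samples
```

With 40 days of data this yields four samples, the first covering days 0 to 29 as input and days 30 to 36 as target.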
We implemented this work on Google Colaboratory Pro+ with Python 3.10.11, using PyTorch 2.0 for the transformer model (i.e., nn.TransformerEncoder and nn.TransformerDecoder) and scikit-learn for the baseline models (i.e., RandomForestRegressor and LinearRegression). We configured the transformer model with a hidden size of 32, 2 layers, a dropout of 0.1, and a learning rate of 1e-4, while for the random forest model, we used 100 trees and a maximum depth of 4.
Results
We evaluated the model’s performance by measuring the mean absolute error (MAE) and mean squared error (MSE) of each cell’s predicted daily number of crimes. The dataset was split, considering as a training set all the crimes that happened before January 1, 2022, and as a test set all the remaining ones. Our experimental results show that the proposed model outperforms traditional machine learning models, such as the linear regression model [8] and random forest [9], for crime forecasting. As Table 26.1 shows, the proposed transformer model provides a substantial improvement with respect to standard machine learning models. In particular, the model obtains an MSE of 1.674, a reduction of 68% and about 18% compared to the linear regression and random forest models, respectively.
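For reference, the two error metrics and the relative reduction used to compare models can be written as below. The function names are illustrative, the inputs are assumed to be flat sequences of per-cell daily counts, and the numbers in the usage note are toy values, not results from the chapter.

```python
def mae(y_true, y_pred):
    """Mean absolute error between observed and predicted counts."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error between observed and predicted counts."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def relative_reduction(baseline_error, model_error):
    """Fraction by which model_error improves on baseline_error."""
    return (baseline_error - model_error) / baseline_error
```

With toy values, `mae([1, 2, 3], [1, 3, 5])` is 1.0, and a model error of 1.0 against a baseline error of 4.0 corresponds to a relative reduction of 0.75, i.e., 75%.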
Conclusions
Accurate crime predictions can assist law enforcement agencies in allocating resources to effectively address crime in specific areas, thereby improving public safety. In this chapter, we proposed a deep learning model based on an encoder–decoder transformer architecture for geo-temporal crime forecasting. The model demonstrated its ability to capture the spatial and temporal dependencies of crime incidents and to forecast localized crime patterns, improving prediction accuracy over baseline models used in previous studies. In future work, we plan to extend our model by incorporating additional features (e.g., weather forecasts, points of interest, and land use) to make the model spatially agnostic and scalable to different cities.
References
Benbouzid, B. (2019). To predict and to manage. Predictive policing in the United States. Big Data & Society, 6(1). https://doi.org/10.1177/2053951719861703
Weisburd, D., Bernasco, W., & Bruinsma, G. J. N. (2009). Putting crime in its place: Units of analysis in geographic criminology. Springer.
Jenga, K., Catal, C., & Kar, G. (2023). Machine learning in crime prediction. Journal of Ambient Intelligence and Humanized Computing, 14, 2887–2913. https://doi.org/10.1007/s12652-023-04530-y
Cavadas, B., Branco, P., & Pereira, S. (2015). Crime prediction using regression and resources optimization. In F. Pereira, P. Machado, E. Costa, & A. Cardoso (Eds.), Progress in artificial intelligence. EPIA 2015. Lecture notes in computer science (Vol. 9273). Springer. https://doi.org/10.1007/978-3-319-23485-4_51
Yao, S., et al. (2020). Prediction of crime hotspots based on spatial factors of random forest. In 2020 15th international conference on computer science & education (ICCSE), Delft, Netherlands (pp. 811–815). https://doi.org/10.1109/ICCSE49874.2020.9201899
Cesario, E., Lindia, P., & Vinci, A. (2023). Detecting multi-density urban hotspots in a smart city: Approaches, challenges and applications. Big Data and Cognitive Computing, 7(1), 29. https://doi.org/10.3390/bdcc7010029
Vaswani, A., et al. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30, 5998.
Nelder, J. A., & Wedderburn, R. W. M. (1972). Generalized linear models. Journal of the Royal Statistical Society. Series A (General), 135(3), 370–384.
Breiman, L. (2001). Random forests. Machine Learning, 45, 5–32.
Acknowledgments
This project received EU funding through the STARLIGHT project (grant agreement no. 101021797), the APPRAISE project (grant agreement no. 101021981), and the LAGO project (grant agreement no. 101073951).
Rights and permissions
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Copyright information
© 2025 The Author(s)
About this chapter
Cite this chapter
Caffaro, F., Bongiovanni, L., Rossi, C. (2025). Geo-temporal Crime Forecasting Using a Deep Learning Attention-Based Model. In: Gkotsis, I., Kavallieros, D., Stoianov, N., Vrochidis, S., Diagourtas, D., Akhgar, B. (eds) Paradigms on Technology Development for Security Practitioners. Security Informatics and Law Enforcement. Springer, Cham. https://doi.org/10.1007/978-3-031-62083-6_26
DOI: https://doi.org/10.1007/978-3-031-62083-6_26
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-62082-9
Online ISBN: 978-3-031-62083-6
eBook Packages: Physics and Astronomy, Physics and Astronomy (R0)