Abstract
Crime statistics in Ecuador show that the number of reported cases has increased in recent years across several types of crime. Although state entities hold criminal data, analyses aimed at predicting new cases are not always carried out. This work proposes an analysis based on machine learning algorithms that extracts knowledge about the relationships among the variables that influence criminal acts. The results can serve the country's authorities and organizations as tools to better control and prevent crime. Crime counts by province can be predicted with multiple regression and other machine learning techniques. Using monthly counts of different types of crime over several years, three regression algorithms are implemented: Multiple Linear Regression (MLR), Decision Tree Regression (DTR), and Random Forest Regression (RFR). The models are trained and tested to predict new crimes, in particular rapes, home burglaries, and personal thefts. R-squared, adjusted R-squared, and root mean square error (RMSE) are used to evaluate and compare the proposed regression models. The results show that the RFR model fits the data best for home burglaries, with an adjusted R-squared of 0.965746, and for thefts, with 0.974088. This model also yields the lowest RMSE for all three types of crime. For rapes, the best adjusted R-squared, 0.929960, was obtained with the MLR model. In absolute counts, the most affected provinces are Guayas and Pichincha, where crime remains at alarming levels.
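The three evaluation metrics named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function name `regression_metrics`, the parameter `n_features`, and the sample counts are hypothetical, and the adjusted R-squared uses the standard textbook formula, which is assumed to match the one used in the chapter.

```python
import math


def regression_metrics(y_true, y_pred, n_features):
    """Return (r2, adjusted_r2, rmse) for one set of regression predictions.

    n_features is the number of predictors in the model (an assumption
    of this sketch; the abstract does not state how many were used).
    """
    n = len(y_true)
    if n != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if n - n_features - 1 <= 0:
        raise ValueError("need more samples than n_features + 1")

    mean_y = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # total sum of squares
    if ss_tot == 0:
        raise ValueError("R-squared is undefined when y_true is constant")

    r2 = 1.0 - ss_res / ss_tot
    # Adjusted R-squared penalizes R-squared for the number of predictors.
    adjusted_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - n_features - 1)
    rmse = math.sqrt(ss_res / n)
    return r2, adjusted_r2, rmse


# Hypothetical monthly crime counts for one province (illustrative only).
observed = [120, 135, 128, 150, 162, 158]
predicted = [118, 137, 130, 147, 160, 161]

r2, adjusted_r2, rmse = regression_metrics(observed, predicted, n_features=2)
print(f"R2={r2:.4f}  adjusted R2={adjusted_r2:.4f}  RMSE={rmse:.4f}")
```

Applying the same function to the predictions of each model (MLR, DTR, RFR) on a shared test set would give directly comparable values. A higher adjusted R-squared and a lower RMSE both indicate a better fit.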
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Kumar, V. (2023). Crime Data Analysis Using Machine Learning Models. In: Botto-Tobar, M., Zambrano Vizuete, M., Montes León, S., Torres-Carrión, P., Durakovic, B. (eds) Applied Technologies. ICAT 2022. Communications in Computer and Information Science, vol 1755. Springer, Cham. https://doi.org/10.1007/978-3-031-24985-3_22
DOI: https://doi.org/10.1007/978-3-031-24985-3_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-24984-6
Online ISBN: 978-3-031-24985-3
eBook Packages: Computer Science, Computer Science (R0)