Abstract
In this paper, the authors propose a data-driven approach for extracting actionable knowledge from Indian crime data. The approach can help police and other law enforcement bodies in India control and prevent crime region by region. After pre-processing the data with MySQL Workbench and R, the authors build regression models using five algorithms: random forest regression (RFR), decision tree regression (DTR), multiple linear regression (MLR), simple linear regression (SLR), and support vector regression (SVR). Given the required inputs, these models predict counts for 28 different types of Indian Penal Code (IPC) cognizable crime, as well as the total IPC cognizable crime count, region-wise, state-wise, and year-wise (for the whole country). Data visualization techniques, namely chord diagrams and map plots, are used to visualize the pre-processed data (for the years 2014 to 2020) and the data predicted for the year 2022 by the relatively best regression model. For the chosen data, the RFR model predicting total IPC cognizable crime fits relatively the best, with an adjusted R-squared value of 0.96 and a mean absolute percentage error (MAPE) of 0.2. Among the models predicting region-wise theft counts, the RFR-based model also fits relatively the best, with an adjusted R-squared value of 0.96 and a MAPE of 0.166. These regression models predict that the state of Andhra Pradesh will have the highest crime counts, with Adilabad district at the top with a predicted crime count of 31,933.
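The two fit measures quoted in the abstract can be illustrated with a minimal sketch. It assumes the standard textbook definitions of MAPE and adjusted R-squared; the abstract does not give the authors' exact formulas or say whether MAPE is reported as a fraction or a percentage. The function names and the sample numbers are hypothetical and are not taken from the paper's data.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, returned as a fraction (0.2 means 20%)."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    # Assumes no actual value is zero, since each error is divided by it.
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(actual, predicted, n_predictors):
    """Adjusted R-squared, penalising the number of predictors in the model."""
    n = len(actual)
    if n - n_predictors - 1 <= 0:
        raise ValueError("need more observations than predictors + 1")
    r2 = r_squared(actual, predicted)
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)


# Illustrative crime counts only (hypothetical, not from the paper).
actual_counts = [100, 200, 300, 400]
predicted_counts = [110, 190, 330, 380]

print(round(mape(actual_counts, predicted_counts), 3))
print(round(adjusted_r_squared(actual_counts, predicted_counts, n_predictors=1), 3))
```

Under these definitions, a lower MAPE and an adjusted R-squared closer to 1 both indicate a better fit, which is the sense in which the abstract ranks the regression models.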
Data Availability
All data used are benchmark data and are freely available in repositories.
Code Availability
All code used is freely available online.
Funding
The authors declare that no funds, grants, or other support were received during the preparation of this manuscript.
Author information
Contributions
Material preparation, data collection, and data analysis were performed by Prajwal Sharma and Aftab Hussain. Manuscript writing and all other work were performed by Dr. Rabia Musheer Aziz.
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
Ethical statements
This material is the authors' own original work, which has not been previously published elsewhere. The paper is not currently being considered for publication elsewhere. The paper reflects the authors' own research and analysis in a truthful and complete manner.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Aziz, R.M., Sharma, P. & Hussain, A. Machine Learning Algorithms for Crime Prediction under Indian Penal Code. Ann. Data. Sci. 11, 379–410 (2024). https://doi.org/10.1007/s40745-022-00424-6
DOI: https://doi.org/10.1007/s40745-022-00424-6