Machine Learning Algorithms for Crime Prediction under Indian Penal Code

Annals of Data Science

Abstract

In this paper, the authors propose a data-driven approach for extracting insights from Indian crime data. The approach can help police and other law enforcement bodies in India control and prevent crime region-wise. After the data are pre-processed using MySQL Workbench and R, regression models are built with five algorithms: random forest regression (RFR), decision tree regression (DTR), multiple linear regression (MLR), simple linear regression (SLR), and support vector regression (SVR). Given the required inputs, these models predict the counts of 28 different types of Indian Penal Code (IPC) cognizable crime, as well as the total IPC cognizable crime count, region-wise, state-wise, and year-wise (for the whole country). Data visualization techniques, namely chord diagrams and map plots, are used to visualize the pre-processed data (for the years 2014 to 2020) and the data predicted for the year 2022 by the relatively best regression model. For the chosen data, the RFR model predicting total IPC cognizable crime fits relatively best, with an adjusted R-squared value of 0.96 and a mean absolute percentage error (MAPE) of 0.2. Among the models predicting region-wise theft crime counts, the RFR-based model also fits relatively best, with an adjusted R-squared value of 0.96 and a MAPE of 0.166. These regression models predict that Andhra Pradesh state will have the highest crime counts, with Adilabad district at the top at 31,933 predicted crimes.
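The two fit measures quoted in the abstract, adjusted R-squared and MAPE, can be illustrated with a minimal sketch. This is not the authors' code (the abstract states that they worked in R); it uses the standard textbook definitions, and the function names and sample crime counts below are hypothetical. MAPE is returned here as a fraction; the abstract does not say whether its reported values are fractions or percentages.

```python
from typing import Sequence


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(
    y_true: Sequence[float], y_pred: Sequence[float], n_features: int
) -> float:
    """R-squared penalised for the number of predictors in the model."""
    n = len(y_true)
    if n - n_features - 1 <= 0:
        raise ValueError("need more observations than predictors plus one")
    r2 = r_squared(y_true, y_pred)
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_features - 1)


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error as a fraction; actual values must be nonzero."""
    return sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Hypothetical crime counts, for illustration only (not from the paper).
    actual = [100, 200, 300, 400, 500]
    predicted = [110, 190, 310, 390, 500]
    print("adjusted R-squared:", adjusted_r_squared(actual, predicted, n_features=2))
    print("MAPE (fraction):", mape(actual, predicted))
```

A model with a higher adjusted R-squared and a lower MAPE on the same data would be the "relatively best" fit in the sense used above.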

Data Availability

All data used are benchmark datasets and are freely available in repositories.

Code Availability

All code used is freely available online.

Funding

The authors declare that no funds, grants, or other support were received during the preparation of this manuscript.

Author information

Contributions

Material preparation, data collection, and data analysis were performed by Prajwal Sharma and Aftab Hussain. Manuscript writing and all other work were performed by Dr. Rabia Musheer Aziz.

Corresponding author

Correspondence to Rabia Musheer Aziz.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Ethical statements

This material is the authors' own original work, which has not been previously published elsewhere. The paper is not currently being considered for publication elsewhere. The paper reflects the authors' own research and analysis in a truthful and complete manner.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Aziz, R.M., Sharma, P. & Hussain, A. Machine Learning Algorithms for Crime Prediction under Indian Penal Code. Ann. Data. Sci. 11, 379–410 (2024). https://doi.org/10.1007/s40745-022-00424-6

  • DOI: https://doi.org/10.1007/s40745-022-00424-6
