Predicting Metropolitan Crime Rates Using Machine Learning Techniques

  • Saba Moeinizade
  • Guiping Hu
Conference paper
Part of the Springer Proceedings in Business and Economics book series (SPBE)


The concept of the smart city has been gaining public interest, driven by considerations of socioeconomic development and quality of life. Smart initiatives have been proposed in multiple domains, such as health, energy, and public safety. One of the key factors affecting quality of life is the crime rate in a metropolitan area. Predicting crime patterns is an important step toward more efficient strategies, either to prevent crimes or to improve investigation efforts. In this research, we use machine learning techniques to solve a multinomial classification problem in which the goal is to predict crime categories from spatiotemporal data. As a case study, we use San Francisco crime data from the San Francisco Police Department (SFPD). Several classification methods, including multinomial logistic regression, random forests, LightGBM, and XGBoost, were used to predict the category of crime. Feature engineering was employed to boost model performance. The results demonstrate that our proposed classifier outperforms other published models.
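The abstract does not list the engineered features, so the following is only a minimal sketch of what spatiotemporal feature engineering for such a classifier might look like. The function name `engineer_features`, the timestamp format, the feature names, and the 0.01-degree grid size are all assumptions made for illustration, not the authors' method.

```python
from datetime import datetime
import math


def engineer_features(timestamp: str, lon: float, lat: float,
                      cell_deg: float = 0.01) -> dict:
    """Turn one raw incident record into simple spatiotemporal features.

    All feature names and the grid size are hypothetical choices.
    """
    # Assumed timestamp format: "YYYY-MM-DD HH:MM:SS"
    ts = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")
    return {
        # Temporal features
        "year": ts.year,
        "month": ts.month,
        "day_of_week": ts.weekday(),  # Monday = 0, Sunday = 6
        "hour": ts.hour,
        "is_weekend": int(ts.weekday() >= 5),
        # Cyclical encoding so that 23:00 and 00:00 end up close together
        "hour_sin": math.sin(2 * math.pi * ts.hour / 24),
        "hour_cos": math.cos(2 * math.pi * ts.hour / 24),
        # Spatial features: index of a coarse longitude/latitude grid cell
        "grid_x": math.floor(lon / cell_deg),
        "grid_y": math.floor(lat / cell_deg),
    }


# Example record with made-up coordinates inside San Francisco
features = engineer_features("2015-05-13 23:53:00", -122.4258, 37.7745)
```

A dictionary like `features` could then be assembled into a feature matrix and passed to any of the classifiers named in the abstract.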


Machine learning, Multinomial classification, Crime prediction



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Industrial and Manufacturing Systems Engineering, Iowa State University, Ames, USA
