Analysis of street crime predictors in web open data


Crime predictors have long been sought by governments and citizens alike as a means of preventing or avoiding crime. In this paper, we thoroughly analyze crime predictors from three Web open data sources: Google Street View (GSV), Twitter, and Foursquare, which provide visual, textual, and human behavioral data, respectively. In contrast to existing works that attempt crime prediction at zip-code level or coarser granularity, we focus on street-level crime prediction. We transform the data assigned to street segments, then extract and identify strong predictors that are correlated with crime. In particular, we are the first to discover visual cues in street appearance that are predictive of crime. We focus on the city of San Francisco, and our extensive experiments show the effectiveness of the predictors in a range of tests. We show that by analyzing and selecting strong predictors in Web open data, one can achieve significantly better crime prediction accuracy than with traditional prediction based on demographic data.
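The street-segment idea summarized above can be illustrated with a small sketch. It is not the authors' pipeline: the planar coordinates, the toy segments and points, the nearest-segment assignment rule, and the helper names (point_segment_distance, assign_to_segments, pearson) are all assumptions made for illustration. It only shows the general pattern of mapping geotagged observations to street segments and checking how strongly a per-segment count correlates with per-segment crime counts.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the line segment a-b (planar coordinates)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        # Degenerate segment: both endpoints coincide.
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the projection to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def assign_to_segments(points, segments):
    """Count points per segment id, assigning each point to its nearest segment."""
    counts = {seg_id: 0 for seg_id in segments}
    for p in points:
        nearest = min(segments, key=lambda s: point_segment_distance(p, *segments[s]))
        counts[nearest] += 1
    return counts


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0.0 or sy == 0.0:
        return 0.0
    return cov / (sx * sy)


# Hypothetical toy data: three parallel street segments and a few geotagged points.
segments = {
    "s1": ((0.0, 0.0), (10.0, 0.0)),
    "s2": ((0.0, 5.0), (10.0, 5.0)),
    "s3": ((0.0, 10.0), (10.0, 10.0)),
}
tweets = [(1, 1), (2, 0.5), (3, 4), (5, 9), (6, 11), (7, 10)]
crimes = [(1, 0), (4, 6), (5, 10), (8, 9.5)]

tweet_counts = assign_to_segments(tweets, segments)
crime_counts = assign_to_segments(crimes, segments)

# Align the two per-segment series by segment id before correlating them.
seg_ids = sorted(segments)
r = pearson([tweet_counts[s] for s in seg_ids], [crime_counts[s] for s in seg_ids])
print(tweet_counts, crime_counts, round(r, 3))
```

In practice, real coordinates would need a projection to a planar system before distances are computed, and a spatial index would replace the linear scan over segments. The count used here is only a stand-in for the visual, textual, and behavioral features the paper describes.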




The entire data used for the analysis is available upon request.




Author information



Corresponding author

Correspondence to Yihong Zhang.


Cite this article

Zhang, Y., Siriaraya, P., Kawai, Y. et al. Analysis of street crime predictors in web open data. J Intell Inf Syst 55, 535–559 (2020).



Keywords

  • Crime prediction
  • Web open data
  • Image and text analysis