A supervised machine learning framework with combined blocking for detecting serial crimes

Applied Intelligence

Abstract

Serial crime detection aims to identify offenders who have committed multiple crimes. Classification techniques are often used for this task, but comparing crimes pairwise has quadratic complexity, and nonserial case pairs far outnumber serial case pairs. Blocking can reduce the number of pairwise comparisons and eliminate nonserial case pairs. However, most previous studies select blocks using a single criterion, which makes a good blocking result difficult to guarantee. Some studies integrate multiple criteria into one comprehensive index, but its performance is easily affected by the weighting method. In this paper, we propose a combined blocking (CB) approach. Each criminal behaviour is defined as a behaviour key (BHK) and used to form a block. CB learns several weak blocking schemes using different blocking criteria and then combines them to form the final blocking scheme, which consists of several BHKs. Because rare behaviours identify crime series better, each BHK is assigned a score according to its rarity. The BHKs and their scores are used to determine whether a case pair needs to be compared. In a comparison with multiple blocking methods, CB effectively preserves serial case pairs while greatly reducing unnecessary nonserial case pairs. CB is embedded in a supervised machine learning framework. Experiments on real-world robbery cases demonstrate that it can effectively reduce pairwise comparisons, alleviate the class imbalance problem and improve detection performance.
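
To make the blocking idea concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not give the rarity formula, the pair-selection rule, or the procedure for learning and combining weak blocking schemes, so several things here are assumptions: rarity is scored as the negative log of a BHK's relative frequency, a pair is kept when the summed scores of its shared BHKs reach a hypothetical `threshold`, and the final set of BHKs (`selected_bhks`) is taken as given. The function names and toy robbery behaviours are illustrative only.

```python
import math
from collections import defaultdict
from itertools import combinations


def rarity_scores(cases):
    """Assumed rarity score: -log(fraction of cases exhibiting the BHK)."""
    n = len(cases)
    counts = defaultdict(int)
    for behaviours in cases.values():
        for bhk in behaviours:
            counts[bhk] += 1
    return {bhk: -math.log(c / n) for bhk, c in counts.items()}


def candidate_pairs(cases, selected_bhks, threshold):
    """Keep case pairs whose shared selected BHKs reach the score threshold.

    `selected_bhks` stands in for the final blocking scheme; how CB learns
    it from weak schemes is not reproduced here.
    """
    scores = rarity_scores(cases)

    # Each selected BHK forms one block containing the cases that exhibit it.
    blocks = defaultdict(set)
    for case_id, behaviours in cases.items():
        for bhk in behaviours & selected_bhks:
            blocks[bhk].add(case_id)

    # Only pairs that co-occur in at least one block are ever touched,
    # which avoids enumerating all n * (n - 1) / 2 pairs.
    pair_score = defaultdict(float)
    for bhk, members in blocks.items():
        for a, b in combinations(sorted(members), 2):
            pair_score[(a, b)] += scores[bhk]

    return {pair for pair, s in pair_score.items() if s >= threshold}


# Toy data: case id -> set of observed behaviours (hypothetical values).
cases = {
    "c1": {"knife", "night", "mask"},
    "c2": {"knife", "night"},
    "c3": {"night"},
    "c4": {"mask", "knife"},
}
selected = {"knife", "mask", "night"}
pairs = candidate_pairs(cases, selected, threshold=0.5)
total_pairs = len(list(combinations(cases, 2)))
print(sorted(pairs), "out of", total_pairs, "possible pairs")
```

In this toy example the rarest behaviour, `mask`, contributes the largest score, so pairs that share it are more likely to pass the threshold than pairs that share only a common behaviour such as `night`. Only the surviving pairs would then be passed to the pairwise classifier.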

Author information

Corresponding author

Correspondence to Xueyan Shao.

About this article

Cite this article

Li, Y., Shao, X. A supervised machine learning framework with combined blocking for detecting serial crimes. Appl Intell 52, 11517–11538 (2022). https://doi.org/10.1007/s10489-021-02942-x

  • DOI: https://doi.org/10.1007/s10489-021-02942-x
