Abstract
Serial crime detection aims to identify criminals who have committed multiple crimes. Classification techniques are often used for serial crime detection, but comparing every pair of crimes is of quadratic complexity (n cases yield n(n-1)/2 pairs), and nonserial case pairs far outnumber serial case pairs. Blocking can reduce pairwise computation and eliminate nonserial case pairs. However, most previous studies select blocks using a single criterion, which makes it difficult to guarantee a good blocking result. Some studies integrate multiple criteria into one composite index, but their performance is sensitive to the weighting method. In this paper, we propose a combined blocking (CB) approach. Each criminal behaviour is defined as a behaviour key (BHK) and used to form a block. CB learns several weak blocking schemes using different blocking criteria and then combines them into the final blocking scheme, which consists of several BHKs. Because rare behaviours are better at identifying crime series, each BHK is assigned a score according to its rarity. The BHKs and their scores determine whether a case pair needs to be compared. Compared with multiple blocking methods, CB effectively preserves serial case pairs while greatly reducing unnecessary nonserial case pairs. CB is embedded in a supervised machine learning framework. Experiments on real-world robbery cases demonstrate that it can effectively reduce pairwise comparison, alleviate the class imbalance problem and improve detection performance.
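The combined-blocking idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the blocking criteria, the rule for combining weak schemes, or the rarity formula, so the three criteria (`coverage`, `quality`, `compactness`), the majority-vote combination, the rarity score log(N / n_k), the threshold, and the toy cases below are all hypothetical assumptions chosen for illustration.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical toy data: case id -> (series label or None, set of behaviours).
# Each behaviour acts as a behaviour key (BHK) that defines one block.
CASES = {
    "c1": ("S1", {"knife", "night", "mask"}),
    "c2": ("S1", {"knife", "night", "mask", "street"}),
    "c3": ("S2", {"gun", "car", "street"}),
    "c4": ("S2", {"gun", "car", "night"}),
    "c5": (None, {"street", "night"}),
    "c6": (None, {"street", "knife"}),
}

# Hypothetical criteria; each scores a block from (total pairs, serial pairs).
CRITERIA = {
    "coverage": lambda total, serial: serial,                            # serial pairs kept
    "quality": lambda total, serial: serial / total if total else 0.0,   # share of serial pairs
    "compactness": lambda total, serial: -total,                         # fewer pairs to compare
}


def build_blocks(cases):
    """Map each BHK to the set of case ids exhibiting that behaviour."""
    blocks = {}
    for cid, (_, behaviours) in cases.items():
        for b in behaviours:
            blocks.setdefault(b, set()).add(cid)
    return blocks


def is_serial(cases, a, b):
    """A pair is serial when both cases carry the same non-None series label."""
    la, lb = cases[a][0], cases[b][0]
    return la is not None and la == lb


def block_stats(cases, members):
    """Return (total pairs, serial pairs) generated inside one block."""
    pairs = list(combinations(sorted(members), 2))
    return len(pairs), sum(is_serial(cases, a, b) for a, b in pairs)


def weak_schemes(cases, blocks, top_m):
    """One weak blocking scheme per criterion: the top_m BHKs under that criterion."""
    stats = {k: block_stats(cases, m) for k, m in blocks.items()}
    schemes = []
    for score in CRITERIA.values():
        ranked = sorted(stats, key=lambda k: (-score(*stats[k]), k))  # ties broken by name
        schemes.append(set(ranked[:top_m]))
    return schemes


def combine(schemes, min_votes):
    """Assumed combination rule: keep a BHK selected by at least min_votes schemes."""
    votes = Counter(k for s in schemes for k in s)
    return {k for k, v in votes.items() if v >= min_votes}


def rarity_scores(cases, blocks, keys):
    """Assumed rarity score log(N / n_k): rarer behaviours get higher scores."""
    n = len(cases)
    return {k: math.log(n / len(blocks[k])) for k in keys}


def candidate_pairs(cases, scores, threshold):
    """Compare a pair only if its shared selected BHKs score at least threshold in total."""
    out = set()
    for a, b in combinations(sorted(cases), 2):
        shared = cases[a][1] & cases[b][1] & set(scores)
        if sum(scores[k] for k in shared) >= threshold:
            out.add((a, b))
    return out


if __name__ == "__main__":
    blocks = build_blocks(CASES)
    final_keys = combine(weak_schemes(CASES, blocks, top_m=3), min_votes=2)
    scores = rarity_scores(CASES, blocks, final_keys)
    pairs = candidate_pairs(CASES, scores, threshold=1.0)
    n = len(CASES)
    print(sorted(final_keys))  # ['car', 'gun', 'mask']
    print(sorted(pairs))       # [('c1', 'c2'), ('c3', 'c4')]
    print(len(pairs), "of", n * (n - 1) // 2, "pairs kept")  # 2 of 15 pairs kept
```

On this toy data, each criterion alone picks a different set of BHKs, and the vote keeps only those that several criteria agree on. The rarity scores then decide which case pairs reach the downstream classifier, shrinking 15 possible pairs to 2 while keeping both serial pairs. Real data would of course need tuned values for `top_m`, `min_votes`, and `threshold`.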
About this article
Cite this article
Li, Y., Shao, X. A supervised machine learning framework with combined blocking for detecting serial crimes. Appl Intell 52, 11517–11538 (2022). https://doi.org/10.1007/s10489-021-02942-x