1 Introduction

A crime is a violent or otherwise illegal act committed by a perpetrator against another person that can cause harm or property damage and is punishable under the law of the state in which it was carried out. Law enforcement authorities apply crime-solving techniques to take preventive measures, but in many cases they cannot deliver effective results (Dakalbab et al. 2022). Over the years, crimes have continued to increase within countries. Another study states that the three countries with the highest crime rates as of 2021 were South Africa, Venezuela, and Papua New Guinea (Matereke et al. 2021). As the population of a country increases, crime rates within that country increase (Walczak 2021), which also affects the accurate prediction of crime (Pratibha et al. 2020). The safety and security of a country are an important part of its growth and economic development, and crime prediction can decrease economic loss and increase public safety (ToppiReddy et al. 2018; Rumi et al. 2018). Governments are responsible for ensuring the safety of their citizens, which includes controlling crime incidents and threats and maintaining the data collected about them, to better the lives of their citizens (Butt et al. 2020).

Data mining and ML are both versatile fields that combine computing and mathematics, with systems programmed to perform specific tasks, and both are important parts of crime prevention and detection (Bandekar and Vijayalakshmi 2020). Data mining can be considered the process of discovering new patterns in large data sets using methods from statistics, AI, and database management.

Different techniques have been developed for crime prediction, and in recent years more researchers have taken an interest in the topic and published papers on it, as seen in the upward trend of publications between 2021 and 2022 in Fig. 2. Crime is a growing problem that contributes to social and economic issues, which in turn negatively affect the development of a country and its communities (Matereke et al. 2021). Both social and economic elements (a.k.a. features) contribute to a perpetrator committing a crime or a victim being the subject of a crime. These features can be used to predict the future occurrence of crimes and help decrease crime rates across many communities (Hajela et al. 2020).

The enormous volume of data made available by certain governments has motivated researchers to pursue further research in the field of crime. The availability of historical data has attracted research attention, and many researchers have proposed models for predicting the future occurrence of crimes (Pratibha et al. 2020). In some areas, however, law enforcement authorities restrict access to their data and may not make it available to researchers, which hinders research.

Governments and police have access to a large amount of data that could be used to help reduce the crime rate (Yuki et al. 2019). Crime pattern theory suggests that offenders prefer not to venture into unknown territories and would rather commit opportunistic and violent crimes in familiar areas they have previously encountered (Jalil et al. 2017). Crimes are neither evenly distributed nor random within an area or city. A hotspot is defined as an area or region where crimes commonly take place. Based on this knowledge, mapping crime hotspots can help researchers understand the reasons behind frequent crimes in a specific area (Kadar and Pletikosa 2018; Kadar et al. 2019).

Machine learning (ML) is a subfield of Artificial Intelligence (AI) that is used across many fields today to predict the future occurrence of events and to support better decision making. ML can be understood as the study of computer algorithms that improve automatically through experience and the use of data. Deep Learning (DL) is a subset of machine learning inspired by how the brain functions. It uses artificial neural networks with many layers of different types (e.g., pooling, convolution, fully-connected, and dropout layers). There are four types of learning: supervised, semi-supervised, unsupervised, and reinforcement learning. AI combines computing and mathematics (i.e., statistics), with systems programmed to perform actions commonly associated with humans (He and Zheng 2021).

To develop a highly accurate crime prediction model, it is important to understand the nature of a crime (Elluri et al. 2019). The nature of a crime could include features such as the age, gender, location, education status, and income of the offender(s); the number of offenders; the weapon used; the age, gender, location, economic status, and education status of the victim(s); and the time, date, day of the week, month, and year of the crime, to name a few.

In Fig. 1, we present the objectives of the articles covered in our paper. The articles have been categorized into six main objectives: social media crime prediction, novel crime prediction, suspect or offender prediction, spatial–temporal crime hotspot prediction, feature selection, and crime patterns and mapping. We need to understand the objectives of the articles to draw accurate findings from our research, to understand the direction, gaps, and challenges faced by other researchers in the study of crime prediction, and to motivate the research undertaken.

Fig. 1 Distribution of objectives

Researchers have made numerous contributions to crime investigation and prediction. Unlike most industries (health care, transportation, agriculture, finance, retail, and customer services), crime prediction lacks comprehensive and systematic literature reviews, which can help to organize and summarize the existing literature, the evidence, and the challenges encountered. This SLR focuses on the prediction of future occurrences of crime using machine learning techniques in studies published between 2010 and 2022, to answer the identified research questions and to guide further research on the gaps recognized in the field. We seek to broaden the opportunity for further research on crime investigation and machine learning. Our motivation for this work was not only to report on previous and current studies on crime prediction, but also to:

  • Demonstrate the need for the availability of crime data to researchers

  • Help to decrease the crime rate in our communities by synthesizing the existing knowledge

  • Identify the gaps in current findings

  • Help to improve existing models

The purpose of this paper is to provide a better understanding of the ML algorithms used in crime prediction and analysis. Contributions to crime prediction are ongoing. This SLR covers literature published between January 2010 and August 2022 and is guided by prior work (Catal 2012; Ligthart et al. 2021; Kitchenham and Charters 2007). These guidelines are widely used in the literature for conducting SLRs.

The rest of the paper is organized into the following sections: Sect. 2 presents the related work. Section 3 discusses the adopted research methodology followed throughout the paper. Section 4 presents the results found in the SLR for the research questions we have identified. Section 5 discusses our findings and results in detail and finally, in Sect. 6 we present our conclusion and future work.

2 Related work

This study seeks to explore different machine learning algorithms and techniques used in crime prediction. Our findings are reported together with the challenges identified by the researchers to help gain a better understanding of the current state-of-the-art that is available to conduct further research and build better performing and more accurate models to fight crime.

A review (Falade et al. 2019) focused on crime prediction using data mining and concluded that it is a hot research topic due to the impact of crime on the socio-economic development of a nation (Kounadi et al. 2020). The authors stated that the data used could be statistical reports of crimes in an area or region. They also noted that crime prediction research will benefit society by helping law enforcement agencies and governments understand the multiple layers that contribute to the cause of a crime. This research may also help governments make better decisions about the security and safety of their citizens and take a more proactive approach to improving communities and decreasing crime.

In 2020, a review (Butt et al. 2020) was published that discusses approaches to spatio-temporal crime hotspot detection and prediction. The researchers state that the large amounts of data being collected and made available to the public have made it easier to pursue further research in the field of crime and crime investigation. The availability of historical data allows for the forecasting of future crimes, and there is growing interest in developing machine learning models that help discover the features related to crime prediction (Butt et al. 2020). As future work, the researchers proposed to explore the enhancement of clustering approaches for crime hotspot detection (DBSCAN), the enhancement of time series analysis for crime forecasting (ARIMA), transfer learning in crime prediction, and the use of LSTM with exponential smoothing for crime prediction. They concluded that datasets should be produced to improve novel spatial–temporal models, and that regions or countries should make datasets available or collect data to help scientists develop better crime prediction methods that decrease crime and help the country grow.

Another paper addresses spatial crime forecasting (Kounadi et al. 2020), investigating crime in both space and time. The researchers followed the “PRISMA” (Liberati et al. 2009) reporting process in their review to report their findings and to analyze the state-of-the-art techniques and methods in current studies of crime prediction. The limitations they outlined include inconsistent terminology, which could be due to the different academic backgrounds of the stakeholders (e.g., criminology, data analysis, computer science, geoscience, and public policy), and the fact that significant details of some experiments are only partly reported or not reported at all.

The last related study (Ippolito and Lozano 2020) focuses on tax crime prediction using machine learning. It is notable because tax crime is a white-collar crime, whereas most papers in our study focused on violent crimes. The researchers sought to improve decision-making in fiscal audit plans relating to service taxes in the municipality of São Paulo. They applied Neural Networks, Naïve Bayes, Decision trees, Ensemble learning, Random forests, and Logistic regression. They also collected data on fiscal audits and plans from 2016 to 2019 manually and face-to-face. The results showed that Random Forest provided the highest accuracy, 66.2%. Machine learning can help governments make better decisions and plans around tax audits, and tax crime prediction allows governments to prepare for these crimes before they are committed. The researchers conclude that machine learning helps predict crimes against law systems, and that these predictions can guide better and more assertive decision-making and planning of fiscal audits.

Governments and law enforcement agencies are the gatekeepers of crime datasets (historical and present) and are responsible for their maintenance (Falade et al. 2019). Accurate predictions of future crimes can have a positive impact on society and the economy (Butt et al. 2020). Ippolito and Lozano show that a crime detection system can alert users and make them cautious about the crime they are about to commit and the likelihood of being caught; implementing such a system in society could have a similar impact on violent crime (Ippolito and Lozano 2020). Safer and more secure conditions lead to economic growth and sustainable development. The literature presents many different approaches to crime prediction using machine learning. To our knowledge, compared to other review studies, this study includes more recent papers and applies different research questions. As such, its observations and suggestions differ from those of previous studies (Table 1).

Table 1 Related review studies

3 Research methodology

Systematic literature reviews (SLRs) have been used in many fields of academia, from engineering to medicine, to gather and summarize data on a research topic. We followed established guidelines (Kitchenham et al. 2010) to take a structured approach in our study. By using an SLR, we can also identify the challenges and possible solutions that can be employed. The guidelines define the following three (3) main phases:

  1. Planning the review: Gathering data and research work related to our research topic (i.e., the use of machine learning in crime prediction), and defining the research questions and the systematic search protocol, which includes the selection of keyword strings used to find related papers and how the selection criteria will be applied to the papers. The planning phase includes the following steps:

    a. Identification of the need for a review

    b. Commissioning a review

    c. Specifying the research question(s)

    d. Developing a review protocol

    e. Evaluating the review protocol

  2. Conducting the review: Putting a classification schema in place, which details how the data and papers will be separated for analysis, with features grouped by similar or common attributes. The papers are subject to inclusion and exclusion criteria. The papers that pass the criteria should meet the minimum quality assessment threshold, which is mostly selected as the mean value. Critical data from these papers are extracted and synthesized to present a general overview of how machine learning can be used for crime prediction and to identify possible gaps and opportunities in the field. The conducting phase includes the following steps:

    a. Identification of research

    b. Selection of primary studies

    c. Study quality assessment

    d. Data extraction and monitoring

    e. Data synthesis

  3. Reporting the review: The final phase discusses and presents our findings. The research questions identified in phase 1 are addressed and, if needed, presented visually with graphs, figures, and tables. This phase may include:

    a. Specifying dissemination mechanisms

    b. Formatting the main report

    c. Evaluating the report

3.1 Research questions

The research questions that are investigated are presented as follows:

  1. What are the objectives of the paper? To answer this question, we need to understand what the researcher aims to achieve. What is the desired output variable? (e.g., suspect detection, crime hotspot detection, etc.).

  2. What data source types have been used to collect data?

  3. What independent variables (a.k.a. features) are used in the publications?

  4. Which ML algorithm(s) performed best in the study?

  5. Which public datasets have been used?

  6. Which evaluation metrics have been applied?

  7. What are the top 5 ML categories?

  8. What challenges have the authors identified?

3.2 Databases

We used five reliable and trustworthy scientific databases to find relevant papers for our study. We did not select Google Scholar because it also indexes non-peer-reviewed papers and, sometimes, non-reputable journals. To search the databases, we applied the search strings defined in Sect. 3.4.

As shown in Table 4, IEEE Xplore provided the largest number of papers both before and after the exclusion criteria were applied, and the ACM Digital Library provided the fewest. Forward and backward snowballing was also applied in this SLR to find more relevant papers. The papers collected in this way were subjected to the study selection criteria, and those that passed were added to the list of selected publications (Fig. 2).

Fig. 2 Yearly trend of publications

3.3 Search strategy

We applied search parameters to the databases to collect the relevant papers for the study. An advanced search was applied to the selected databases, which included the following terms: “crime”, “crime prediction”, and “machine learning”. To help broaden our search string parameters, we read the abstract, introduction, and conclusion sections of the recent papers to find synonyms that could be included in the search string. Some of the identified words included “Neural Networks”, “Artificial Intelligence”, “Data mining” and/or “Crime patterns”. The search string that was used within each database is presented in the following subsection.

3.4 Search strings

We adjusted the search string for each database to retrieve the papers most relevant to the SLR.

The search strings shown in Table 2 were applied to the title, abstract, and keywords fields of the databases. In Science Direct, we searched for “Crime Prediction” AND “Machine Learning” with the year between 2010 and 2022 to ensure that we retrieved the papers published in the review period. IEEE Xplore, one of the largest online scientific research databases, returned the largest number of papers that we could use for our study; we applied “Crime Prediction” to all metadata, and “Crime Prediction” AND “Machine Learning” to the document title. Finally, in Springer Link we applied the search string “Crime Prediction” AND “Machine learning” to the title with a year range from 2010 to before 2023. Figure 2 shows the yearly trend of the selected papers.
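
As a rough illustration of how such a search string might be assembled, the sketch below joins quoted terms with boolean operators and appends a year range. The function `build_query`, its parameters, the OR-grouping of synonyms, and the output syntax are all assumptions made for illustration; each database has its own advanced-search syntax, so this is not the exact string submitted to any of them.

```python
def build_query(required_terms, optional_terms=(), year_from=None, year_to=None):
    """Join quoted terms into a generic boolean search string (illustrative only)."""
    # Every required term must match, so these are combined with AND.
    parts = [" AND ".join(f'"{term}"' for term in required_terms)]
    # Synonyms widen the search, so they are grouped with OR (an assumption).
    if optional_terms:
        parts.append("(" + " OR ".join(f'"{term}"' for term in optional_terms) + ")")
    query = " AND ".join(parts)
    # The year filter is shown as plain text; real databases use their own fields.
    if year_from is not None and year_to is not None:
        query += f" [years {year_from}-{year_to}]"
    return query


query = build_query(["Crime Prediction", "Machine Learning"], year_from=2010, year_to=2022)
print(query)

widened = build_query(["crime"], ["Neural Networks", "Data mining"])
print(widened)
```

Keeping the required terms and the synonyms in separate arguments makes it easy to tighten or widen the string per database without retyping it.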

Table 2 Databases and the search strings used for the study

Fig. 3 Search methodology applied to identify relevant papers

3.5 Selection criteria

We define the selection criteria in the first phase of the guidelines to reduce the possibility of bias in the selection of publications for the study. The criteria also ensure that we process only relevant studies, which in our case are those that can help us answer the research questions. Establishing the criteria before the study begins reduces the chance that they are biased (Kitchenham et al. 2007). Papers should respond false to all the exclusion criteria and true to all the inclusion criteria.

The following criteria have been defined as the exclusion criteria:

  1. Only the abstract is available

  2. The paper is not in English

  3. The publication is a review/survey paper

  4. The publication is a duplicate already retrieved from another database

  5. The paper does not explain in detail how Machine Learning was applied
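
The exclusion step can be sketched as a simple filter in which a paper is kept only if every exclusion criterion is false. The `Paper` fields, the function name, and the title-based duplicate check below are hypothetical and chosen for illustration; the review does not describe how the criteria were recorded or how duplicates were detected.

```python
from dataclasses import dataclass


@dataclass
class Paper:
    # Hypothetical fields recording the facts the exclusion criteria ask about.
    title: str
    full_text_available: bool
    language: str
    is_review: bool
    explains_ml_in_detail: bool


def apply_exclusion_criteria(papers):
    """Keep only the papers for which every exclusion criterion is false."""
    kept = []
    seen_titles = set()
    for paper in papers:
        key = paper.title.strip().lower()  # naive duplicate key (an assumption)
        excluded = (
            not paper.full_text_available       # 1. only the abstract is available
            or paper.language != "English"      # 2. not in English
            or paper.is_review                  # 3. review/survey paper
            or key in seen_titles               # 4. duplicate from another database
            or not paper.explains_ml_in_detail  # 5. ML application not explained
        )
        seen_titles.add(key)
        if not excluded:
            kept.append(paper)
    return kept


# Made-up records, not papers from this review.
papers = [
    Paper("Crime hotspots with random forests", True, "English", False, True),
    Paper("crime hotspots with random forests ", True, "English", False, True),
    Paper("A survey of crime prediction", True, "English", True, True),
    Paper("Abstract-only study", False, "English", False, True),
]
selected = apply_exclusion_criteria(papers)
print([paper.title for paper in selected])
```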

3.6 Collecting and filtering

After collecting publications by applying the defined search strings to each database and by snowballing, we gathered the results in an Excel spreadsheet to analyze and synthesize the data. All the papers that passed the selection criteria were subject to a quality assessment, which enabled us to narrow down the publications further so that only high-quality papers were included as primary studies. We applied eight (8) quality assessment questions and scored each paper with 1 (yes/compliant), 0.5 (somewhat), or 0 (no/non-compliant) against each question. A paper that scored lower than 4.5 was excluded, and a paper that scored above 4.5 was included. The quality assessment questions were defined as follows:

  1. Are the aims of the study clearly defined?

  2. Are the scope and experimental design of the study defined?

  3. Is the technology assessed or used clearly defined?

  4. Is the research process clearly defined?

  5. Are all the study questions answered?

  6. Are the challenges, limitations, and negative findings clearly defined?

  7. Are the main findings on the credibility, validity, and reliability stated?

  8. Does the conclusion relate to the aim of the study?
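
The scoring rule can be illustrated with a minimal sketch that sums eight per-question scores and compares the total with the 4.5 threshold. It assumes that a higher score means the paper better satisfies the question. The text does not say how a paper scoring exactly 4.5 is treated, so counting it as included is an assumption of this sketch, as are the function names.

```python
THRESHOLD = 4.5
ALLOWED_SCORES = {0, 0.5, 1}  # no, somewhat, yes


def quality_score(answers):
    """Sum the per-question scores for one paper (eight questions expected)."""
    if len(answers) != 8:
        raise ValueError("expected one score per quality assessment question")
    if any(score not in ALLOWED_SCORES for score in answers):
        raise ValueError("each score must be 0, 0.5, or 1")
    return sum(answers)


def passes_quality_assessment(answers, threshold=THRESHOLD):
    # Assumption: a paper scoring exactly the threshold is included.
    return quality_score(answers) >= threshold


# Made-up score sheets, not taken from the review.
borderline = [1, 1, 1, 0.5, 0.5, 0.5, 0, 0]  # total 4.5
weak = [1, 1, 0.5, 0.5, 0.5, 0, 0, 0]        # total 3.5

print(quality_score(borderline), passes_quality_assessment(borderline))
print(quality_score(weak), passes_quality_assessment(weak))
```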

3.7 Data extraction, synthesis, and reporting

To answer the research questions with the most reliable data, all publications needed to pass the quality assessment so that only reliable and trustworthy publications were included in this phase. We read the primary studies in full to collect all the relevant data. In the extraction process, we assigned each publication to a row and each answer to a column. For some of the research questions, we grouped the answers and features into fewer categories to make the data easier to handle. For research question 1, we identified the following categories: novel crime prediction model, suspect/offender prediction, spatio-temporal crime hotspot prediction, feature selection, crime patterns and mapping, and social media crime prediction. In research question 2, we found that some answers were synonyms and combined them (e.g., meteorological and weather are both weather data features; social media, websites, and newspapers are also categorized into one column). In research question 3, we broke down the independent variables into crime parameters, date parameters, time parameters, offender parameters, victim parameters, and location parameters. The algorithms in research question 4 were categorized into decision-tree algorithms, regression algorithms, neural network algorithms, and others (e.g., Naive Bayes and Support Vector Machines). Lastly, the final step in the SLR process is to report our findings and answer all of the research questions.
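
A minimal sketch of the synonym-merging step for research question 2 is shown below. The synonym map, the category labels, and the paper identifiers are hypothetical; only the idea of folding synonymous answers into one category before counting comes from the text.

```python
from collections import Counter

# Hypothetical synonym map: raw extracted answer -> merged category.
SYNONYMS = {
    "meteorological": "weather",
    "weather": "weather",
    "social media": "text",
    "websites": "text",
    "newspapers": "text",
}


def normalize(answer):
    """Map a raw answer to its merged category; unknown answers pass through."""
    key = answer.strip().lower()
    return SYNONYMS.get(key, key)


# One entry per publication (made-up identifiers and answers).
extracted = {
    "P1": ["Meteorological", "Social media"],
    "P2": ["weather"],
    "P3": ["Newspapers", "websites"],
}

counts = Counter()
for answers in extracted.values():
    # A set ensures each category is counted at most once per publication.
    counts.update({normalize(answer) for answer in answers})

print(dict(counts))
```

Counting each category once per publication keeps a paper that lists two synonyms from inflating the total for that category.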

4 Results

In this section, we present our findings using graphs and tables. In total, 353 publications were found across all databases. After we applied the study selection criteria and the quality assessment, 68 publications remained as relevant publications for our study. Table 3 lists the 68 primary studies. Table 4 shows the number of publications retrieved before and after the exclusion criteria per database. The research questions are addressed one by one in the following subsections. Figure 3 follows the PRISMA flow process, a technique used for reporting systematic reviews and meta-analyses (Kounadi et al. 2020).

Table 3 Primary studies
Table 4 Number of publications retrieved before and after exclusion criteria per database


4.1 Research Question 1—what are the objectives of the paper?

We first need to understand what the researchers are trying to achieve in each study: what is the desired output variable, and what do they intend to predict (e.g., suspect detection, developing a novel crime prediction model)? Researchers had different objectives, so to manage these data efficiently we grouped our findings into narrower categories. After reading all the publications, we identified six categories into which the publications could be grouped according to the common objective of their target variable. The novel crime prediction model category included all papers that sought to develop a model to detect the future occurrence of crimes without focusing on a single or common crime type; crime analysis uses historical data to predict the time and place where a crime could take place (Yu et al. 2011). Suspect/offender prediction papers used location data to predict the future occurrence of an offender committing a crime, based on the observation that offenders tend not to commit crimes in new locations and thus repeat crimes in familiar areas. Only three papers focused on offender prediction. Spatio-temporal crime hotspot prediction is the study of crime using space and time: researchers used location parameters (space) and date parameters (time) to build models that predict the likely location of a potential crime (Zhang et al. 2020). Hotspot and cold spot data are imbalanced because cold spots are much more prevalent than hotspots. The feature selection category rests on the point that knowing the right features and variables to use is a vital part of the machine learning process and can aid the development of an accurate predictive model. The crime patterns and mapping category focused on understanding different crime types and how certain crimes occurred in various areas or locations to aid in the prediction of future crimes. Social media crime prediction uses Natural Language Processing (NLP) methods and textual data from social platforms, newspapers, etc. to aid in predicting crimes. Figure 1 shows the distribution of these objectives.

4.2 Research Question 2—what data source types have been used to collect data?

Due to the sensitivity of the data and the information held within the datasets, five authors did not report the data source type they used to collect the data, resulting in five Not Applicable (NA) values. We divided the rest of our findings into five categories: actual crime records (datasets collected and maintained by governments and law enforcement agencies), traffic data (data from traffic segments or taxi flow data), location data (data from points of interest locations), visual data (data collected by using CCTV surveillance systems), and text data (data from social media platforms, newspapers, and other text data sources). All categories are presented in Fig. 4.

Fig. 4 Data source types used

4.3 Research Question 3—what independent variables (a.k.a. features) are used in the publications?

A total of 44 independent variables were found, 5 of which were identified as common variables: Crime ID, date, crime type, latitude, and longitude. 23 of the publications did not clarify the independent variables they used to achieve their objectives. The remaining independent variables were grouped into the following categories to make them easier to understand and separate across the study: (1) Crime parameters, which contain information regarding the crime that occurred, (2) Date parameters, which contain information regarding the date, such as the day of the week, week, month, and year, (3) Time parameters, which contain information on the time the crime took place, (4) Offender parameters, which contain information regarding the offender, such as age, gender, address, and income, (5) Victim parameters, which contain information regarding the victim, such as age, gender, address, and income, and (6) Location parameters, which contain information regarding the location of the crime. Table 5 shows the distribution of these categories. Based on this table, the most-used category is location, and the second most-used is crime parameters. The "% of papers" column gives the percentage of papers in which a category appeared; these percentages sum to more than 100% because researchers often use more than one category of variables in a study. For example, crime parameters appeared in 75% of the papers but accounted for 24% of all category appearances (Fig. 5).
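
The difference between the two percentages can be seen in a small worked example. The four papers and their categories below are made-up numbers for illustration only and are not taken from Table 5.

```python
from collections import Counter

# Made-up data: the variable categories used by each of four papers.
papers = {
    "P1": {"location", "crime", "time"},
    "P2": {"location", "crime"},
    "P3": {"location", "crime", "time"},
    "P4": {"location"},
}

# Number of papers in which each category appears.
appearances = Counter(category for cats in papers.values() for category in cats)
total_appearances = sum(appearances.values())

# Percentage of papers using a category; these can sum to more than 100%.
pct_of_papers = {c: 100 * n / len(papers) for c, n in appearances.items()}

# Share of all category appearances; these always sum to 100%.
share = {c: 100 * n / total_appearances for c, n in appearances.items()}

for category in sorted(appearances):
    print(f"{category}: {pct_of_papers[category]:.1f}% of papers, "
          f"{share[category]:.1f}% of appearances")
```

In this toy data, the crime category appears in three of four papers (75% of papers) but makes up three of nine category appearances (about 33%), which is the same kind of gap the paragraph above describes.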

Table 5 Independent variables by category and their distribution

Fig. 5 Features by categories

4.4 Research Question 4—which ML algorithm(s) performed best in the study?

The study identified 14 different machine learning algorithms. Each publication reported the algorithms it used and the evaluation metrics applied to determine the best-performing algorithm. If a publication applied only one algorithm, that algorithm was noted as the best-performing algorithm of the study. We grouped the best-performing algorithms into decision tree-based algorithms, artificial neural networks, regression algorithms, and others. The most commonly used algorithms were Artificial Neural Networks, Random Forest, and KNN. Table 6 shows the algorithms by category and the number of times they appeared in papers (Fig. 6).

Table 6 Best performing algorithms by category

Fig. 6 Algorithms grouped by categories

4.5 Research Question 5—which public datasets have been used?

The majority of the datasets used in the primary studies were actual crime records collected and maintained by governments or law enforcement agencies. These were made available to the general public either via the agencies' own online platforms, where web scraping can be performed, or via third-party platforms that host public datasets, such as kaggle.com or UCI. The prevalence of these datasets suggests that governments and law enforcement agencies are taking an interest in collecting and maintaining them. Figure 7 shows the distribution of public and private datasets used. It indicates that most of the studies preferred public datasets, which are very useful for the repeatability, verifiability, and even refutability of experiments. We have also grouped the commonly used datasets, with links, in Table 7.

Fig. 7 Distribution of Public & Private datasets used

Table 7 Used datasets in the study and their distribution

4.6 Research Question 6—which evaluation metrics have been applied?

Fourteen different evaluation metrics were identified that researchers applied to evaluate model performance. Some researchers used more than one evaluation metric, so the percentages sum to more than 100%. We found that 21 papers did not state their evaluation metrics in detail: some did not clearly state the evaluation method applied, and others did not state it at all. In most cases, these were papers in which only one machine learning algorithm was applied. The following evaluation metrics were used more than 5 times (Joshi 2016): Accuracy, Root Mean Square Error (RMSE), Mean Square Error (MSE), Mean Absolute Error (MAE), Precision, Recall, Area Under the ROC curve (AUC), F1 score, Hit rate, and R2 score. The rest of the metrics were applied 4 times or less (Fig. 8).
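
For readers less familiar with these metrics, the sketch below computes several of them from their standard definitions using only the Python standard library. The function names and the toy labels are illustrative assumptions; the primary studies may have used library implementations and different averaging conventions.

```python
import math


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def regression_errors(y_true, y_pred):
    """MAE, MSE, and RMSE for numeric predictions (e.g., crime counts)."""
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / len(errors)
    mse = sum(e * e for e in errors) / len(errors)
    return {"MAE": mae, "MSE": mse, "RMSE": math.sqrt(mse)}


# Toy labels: 1 = hotspot, 0 = cold spot (made-up values).
cls = classification_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0])
# Toy crime counts per area (made-up values).
reg = regression_errors([3, 5, 2, 8], [2, 5, 4, 8])
print(cls)
print(reg)
```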

Fig. 8 Distribution of evaluation metrics applied

4.7 Research Question 7—what are the top 5 ML categories?

In research question 4, we identified the machine learning algorithms used in the study. We analyzed this information further to identify the machine learning categories they belong to. As data are fed into the algorithms, the machines learn from the data and improve their performance over time. Several papers used multiple machine learning algorithms. The most applied category was supervised machine learning, in which the algorithms learn from labeled data. In semi-supervised learning, a mixture of labeled and unlabeled data is fed into the algorithms. Unsupervised algorithms learn from unlabeled data, and reinforcement learning algorithms are rewarded for desired behaviors or target outcomes and punished for undesired outcomes. Reinforcement learning algorithms learn through trial and error (Fig. 9).

Fig. 9 Distribution of machine learning categories
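The difference between learning from labeled data (supervised) and from unlabeled data (unsupervised) can be made concrete with a small Python sketch. It assumes one-dimensional data and uses two deliberately simple stand-in techniques, a nearest-neighbour classifier and a two-cluster k-means; the function names and the sample values are hypothetical and do not represent the methods of any primary study.

```python
def nearest_neighbour_predict(train_points, train_labels, query):
    """Supervised: return the label of the closest labeled training point."""
    distances = [abs(p - query) for p in train_points]
    return train_labels[distances.index(min(distances))]


def two_means(points, iterations=10):
    """Unsupervised: split 1-D points into two clusters without using labels."""
    lo, hi = min(points), max(points)            # initial centroids
    cluster_lo, cluster_hi = list(points), []
    for _ in range(iterations):
        new_lo = [p for p in points if abs(p - lo) <= abs(p - hi)]
        new_hi = [p for p in points if abs(p - lo) > abs(p - hi)]
        if not new_lo or not new_hi:             # degenerate split, stop early
            break
        cluster_lo, cluster_hi = new_lo, new_hi
        lo = sum(cluster_lo) / len(cluster_lo)   # update centroids
        hi = sum(cluster_hi) / len(cluster_hi)
    return cluster_lo, cluster_hi


# Hypothetical monthly incident counts per area (illustrative values only).
counts = [2, 3, 4, 20, 22, 25]
labels = ["low", "low", "low", "high", "high", "high"]

predicted_low = nearest_neighbour_predict(counts, labels, 5)    # uses labels
predicted_high = nearest_neighbour_predict(counts, labels, 18)  # uses labels
clusters = two_means(counts)                                    # ignores labels
```

The supervised function needs `labels` to make a prediction, while the clustering function recovers the same two groups from the counts alone. A semi-supervised approach would combine the two ideas by using a few labeled points together with many unlabeled ones.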

4.8 Research Question 8—what are the challenges / limitations and possible solutions?

Researchers need to pay careful attention to the challenges they face when implementing machine learning algorithms or developing a robust crime prediction model, because of the sensitivity of the outcome variable and of the data involved. We identified one primary and common challenge in crime prediction: the accuracy and trustworthiness of the available data. Just over 50% of the papers did not describe the challenges they had faced; we labeled these as “Challenge NA”. The same applied to the possible solutions: more than 50% of the papers did not identify any, and we labeled these as “Solution NA”. Several internal and external factors can contribute to the limitations of crime data and prediction. One of the major challenges across the world is the underreporting of crimes within communities. Underreporting occurs when people do not come forward with information on a crime, or do not report a crime committed against them, out of fear for their lives or because of other threatening factors; as a result, these crimes are not included in official crime statistics. Butt et al. (2020) state that a survey by Malaysian and British police concluded that about 50% failed to report a crime. The challenges reported by the researchers have been narrowed down to 4 commonly faced challenges in machine learning: (1) data collection, (2) data storage and security, (3) data pre-processing, and (4) performance issues. We did not find any issue related to visualization (Table 8).

Table 8 Challenges and Solutions proposed by researchers in primary studies

5 Discussion

In this section, we discuss our findings for each research question (Sects. 5.1 to 5.9) and the potential threats to validity (Sect. 5.11).

5.1 General discussion

First, we identified more than 350 publications in total in our general search for crime prediction publications. After applying our exclusion criteria and quality assessment, the total narrowed down to 68 primary studies. From these, we identified different machine learning approaches, challenges, and possible solutions in current and past crime prediction trends. Crime prediction is a broad term used in different academic fields such as criminology and social development. We discuss our findings for each research question in the following subsections.

5.2 Research Question 1—what are the objectives of the paper?

The results showed that the objective of 68 of the publications was to produce a novel crime prediction model. The number of objectives exceeds the number of publications because some publications had more than one objective (e.g., some included both feature selection and a novel crime prediction model). We conclude that a promising direction would be to narrow a crime prediction model down to the common crimes in a specific area and to develop a novel crime prediction model from this data.

5.3 Research Question 2—what data source types have been used to collect data?

The data source used is an important part of each study: it helps to understand the researchers' approach and methodology and to learn which types of data sources are currently used in crime prediction. We noted several different data source types, such as actual crime records, Twitter tweets, Facebook posts, census data, weather data, cellphone tower data, and point-of-interest data. After narrowing down our results, we found that most publications made use of actual crime records. The use of actual crime data could improve crime prediction models, decrease crime rates within communities, and help researchers gain a better understanding of crime patterns (Yuki et al. 2019).

5.4 Research Question 3—what independent variables are (a.k.a features) used in the publications?

Crime is affected by both social and economic factors (Matereke et al. 2021); features of both offenders and victims can contribute to the occurrence of a crime. Many researchers framed crime as a time-and-space problem and pursued spatio-temporal crime prediction models. Accordingly, researchers used a lot of location data. It is also suspected that offenders repeat crimes in places they are most familiar with (Bandekar and Vijayalakshmi 2020).

5.5 Research Question 4—which ML algorithm(s) performed best in the study?

Artificial Neural Networks emerged as the most used machine learning algorithm for crime prediction. More complex models can be built by combining neural networks with ensemble algorithms, and adding boosting parameters can help improve the performance and accuracy of models for real-time predictions.

5.6 Research Question 5—Which public datasets have been used?

80% of the retrieved publications used public datasets, which are openly accessible; these are listed in Table 7. The data indicate that a large share of crime prediction research is being done in India, and one paper presented a study in Cape Town, South Africa (Matereke et al. 2021) where the Chicago data portal was used to obtain the data. Some publications noted that, because the data are sensitive and could be misused in the wrong hands, they could not disclose who owned the data source or where the dataset was collected.

5.7 Research Question 6—which evaluation metrics have been applied?

It was difficult to gather some information from some of the publications because the researchers did not give full details of their study. More than 20 publications did not disclose the evaluation metrics they used and only described their results and findings, reporting an accuracy percentage instead. Accuracy, area under the ROC curve, and precision were the top three evaluation metrics identified. Equation 1 shows how to calculate precision, Eq. 2 shows how to calculate recall, and Eq. 3 shows how to calculate accuracy. The area under the ROC curve (AUC) can be understood as the ability of a classifier to differentiate between classes, and accuracy refers to the percentage of predictions that are correct.

$$\mathbf{Precision}=\frac{\text{True Positive}}{\text{Predicted Positive}}=\frac{\text{True Positive}}{\text{True Positive}+\text{False Positive}}$$
(1)
$$\mathbf{Recall}=\frac{\text{True Positive}}{\text{Actual Positive}}=\frac{\text{True Positive}}{\text{True Positive}+\text{False Negative}}$$
(2)
$$\mathbf{Accuracy}=\frac{\text{True Positive}+\text{True Negative}}{\text{Total}}$$
(3)

Precision is the ratio of True Positives (TP) to the sum of True Positives (TP) and False Positives (FP). Recall is the ratio of True Positives (TP) to the sum of True Positives (TP) and False Negatives (FN) (Rumi et al. 2018).
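The definitions of precision, recall, and accuracy given here, together with the interpretation of AUC as the ability to separate classes, can be sketched in Python as follows. This is a minimal illustration under stated assumptions: labels are binary (1 for the positive class), the AUC is computed with the pairwise ranking formulation (the share of positive/negative pairs in which the positive example receives the higher score), and all function names and sample values are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, FN and TN for binary labels (1 = positive, 0 = negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def precision(tp, fp):
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    return tp / (tp + fn) if (tp + fn) else 0.0


def accuracy(tp, fp, fn, tn):
    total = tp + fp + fn + tn
    return (tp + tn) / total if total else 0.0


def auc_pairwise(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    # A tie between a positive and a negative score counts as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels and classifier scores (illustrative values only).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.6]
y_pred = [1 if s >= 0.5 else 0 for s in scores]   # assumed threshold of 0.5

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
prec = precision(tp, fp)
rec = recall(tp, fn)
acc = accuracy(tp, fp, fn, tn)
auc = auc_pairwise(y_true, scores)
```

Precision, recall, and accuracy depend on the chosen decision threshold, whereas the AUC is computed from the raw scores and therefore summarizes performance across all thresholds.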

5.8 Research Question 7—what are the top 5 machine learning categories?

In the primary studies, supervised machine learning was the most common machine learning category (Sharma et al. 2021). This means that more research can be performed on the other categories, particularly semi-supervised and unsupervised learning: because labeled data points are sometimes scarce or do not exist at all, models are needed that can be used in these cases.

5.9 Research Question 8—what are the challenges/limitations and possible solutions?

We grouped our results into the following categories: data collection, data pre-processing, data storage and security, and performance issues. During our study we did not find any publication that reported data visualization issues. This does not mean that the challenge does not exist within the domain; it is something that could be investigated in the future. Data collection remains a general issue in this domain, with data owners, governments, or law enforcement agencies not making data available to the public, not properly maintaining it, or not keeping it in reliable and safe storage locations. As we can see from Fig. 2, interest in this domain as a research topic has risen, primarily due to the increased availability of data from governments in some parts of the world.

5.10 Available commercial tools

At present, ML is being used by law enforcement and other government agencies to predict crime. Known predictive policing software includes Crime Anticipation System, PreCobs, PredPol, and HunchLab. These systems also allow law enforcement agencies to allocate resources more efficiently and provide crime prevention strategies (Carvalho and Pedrosa 2021). It is also worth noting that these systems are fairly new and may come with limitations, which makes evaluating their impact on crime rates more difficult.

5.11 Potential threats to validity

To ensure that quality research in software engineering is conducted, it is important to analyze threats to validity. Similar to other secondary studies (Dinter et al. 2021), our research has some limitations, which are discussed as follows. Regarding construct validity, we performed an automated search instead of manually reading titles in electronic journals, which means that we might have missed some relevant papers. Query phrasing is also an important construct validity threat, and each electronic database has different options for executing the corresponding query. To minimize the potential risks, the query design was discussed among the authors before it was executed in the databases. Another potential construct validity threat is related to the data extraction forms: we might have missed some useful data fields, although we updated these fields several times during our review process. Since we carefully specified the research questions, we reached our objectives adequately and we consider that there is no internal validity threat. Regarding conclusion validity, we followed a well-defined SLR protocol, and the process was discussed among the authors before the research was performed. Conclusions were derived from the collected data, and personal or subjective opinions were not included.

5.12 Limitations

Even though some of these methods may present many benefits, they also have limitations. Some of these are technical limitations of ML models that accompany their benefits in crime prediction:

  • ML techniques do not produce accurate results or predictions immediately, because they need to learn from previous data.

  • The relationship between urban metrics and population size is not linear (Alves et al. 2018)

  • Data availability and a limited amount of resources

  • System performance issues (technical)

  • Data storage (technical)

  • Data sparsity (pre-processing of data)

5.13 Future research outlooks

We identified the following research directions to pave the way for further research:

  1. The development of Explainable Artificial Intelligence (XAI) models / interpretable machine learning models

  2. The use of new deep learning models such as transformers to improve the performance of models

  3. The development of unsupervised learning-based crime prediction models

  4. The design and implementation of a benchmarking tool that can evaluate the performance of different machine learning models on public datasets

  5. The development of semi-supervised learning-based crime prediction models

  6. The development of new publicly available datasets for researchers

  7. The use of new features to build crime prediction models

  8. The analysis of cross-country crime prediction models to analyze the commonalities between models

  9. The development of open source tools for crime prediction

6 Conclusion and future work

This SLR provided an overview of crime prediction analysis and the various machine learning algorithms used in the field. Based on the selected primary studies, we found that crime is affected by many internal and external factors. We systematically reviewed the critical aspects of crime prediction by following the guidelines of Kitchenham et al. (2007). Researchers have made numerous contributions to the study of crime investigation and prediction. Unlike most industries, such as health care, transportation, agriculture, finance, retail, and customer services, crime prediction lacks comprehensive and systematic literature reviews that organize and summarize the existing literature, evidence, and potential challenges. To our knowledge, this is the most up-to-date review on the use of machine learning in crime prediction between 2010 and 2022. First, the study showed that researchers are interested in crime prediction and that interest in the field is growing. Second, we noted that most of the publications used crime data records from police stations or law enforcement agencies. These datasets contained a variety of variables, including crime type, crime case, victim, perpetrator, date, time, weather, and many more. This SLR concludes that crime prediction can help law enforcement, governments, and police departments across the world to improve their communities and economies. Novel approaches to crime prediction could further contribute to safer communities and economies.

This SLR is limited to journal articles and reviews published between 2010 and 2022 on ML and AI in crime prediction. A large number of irrelevant articles were omitted in the exclusion criteria phase of the search, which allowed us to consider only the papers that met the criteria for the study. We believe that the quality of this study has been further improved and strengthened by the addition of more sources and articles.