We divide the related work into two subsections. The first reviews the crime prediction problem from several perspectives; the second reviews existing research on transfer learning approaches.
Existing scientific and criminological studies examine various factors, such as spatiotemporal, historical, demographic, and human behavioral factors, that are linked to criminal activity and crime prediction tasks. We detail the work on each of these factors in the following subsections.
Temporal and historical aspect
Understanding the temporal and historical aspects of crime and criminal activity is of great importance for predictive policing. Several studies [17,18,19,20] have analyzed temporal trends in crime. Bromley et al.  explored the temporal characteristics of alcohol-related crime in Worcester, revealing a strong association between alcohol-related crime and night-time leisure zones and night-time revelers. Similarly, Cusimano et al.  found a relationship between ambulance dispatches and bar closing time, concentrated between 12 am and 4 am. Recently, temporal properties such as the month, year, weekday, and seasonal patterns of crime occurrence have been included in [10, 22, 23], although those authors focus mainly on demographic features and mobile network activity. Several works also use historical information and temporal knowledge to predict future crime incidents [23,24,25]. A proactive decision-making environment that employs seasonal-trend decomposition with historical knowledge is proposed in .
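As an illustration of the temporal properties mentioned above (year, month, weekday, and season), the following minimal sketch derives them from an incident timestamp. The function name, the feature names, and the month-to-season mapping (Northern Hemisphere meteorological seasons) are assumptions made for this example and are not taken from the cited studies.

```python
from datetime import datetime

# Assumed mapping: Northern Hemisphere meteorological seasons.
SEASON_BY_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}


def temporal_features(timestamp):
    """Return simple temporal features for one incident timestamp.

    `weekday` follows datetime.weekday(): Monday is 0 and Sunday is 6.
    """
    return {
        "year": timestamp.year,
        "month": timestamp.month,
        "weekday": timestamp.weekday(),
        "season": SEASON_BY_MONTH[timestamp.month],
    }


# Hypothetical incident recorded late on Monday, 4 January 2021.
features = temporal_features(datetime(2021, 1, 4, 23, 30))
```

Features of this kind would typically be aggregated per region and time window before being passed to a classifier.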
Spatial aspect
According to the crime pattern theory of criminology, space is an influential factor that can significantly affect criminal occurrences . Crimes are not randomly distributed in space, and the geographic area of a crime may vary from one place to another. Spatial pattern analysis aims to discover the spatial distribution and aggregation of crime. Several studies analyzed spatial patterns in conjunction with other patterns when predicting crime occurrence [11, 27,28,29]. For example, Yu et al.  proposed a global spatio-temporal pattern for crime forecasting using the Cluster-Confidence-Rate-Boosting (CCRBoost) algorithm, which iteratively picks local patterns with minimum classification error. Later, in , the authors introduced Points-Of-Interest (POI) data to extract fine-grained knowledge about a geographic region and observed a positive correlation between geographical influence features and crime rate. In another study , a density-based clustering algorithm, HDBSCAN (Hierarchical Density-Based Spatial Clustering of Applications with Noise), is applied to extract hotspots. The shortest distance between each crime point and the hotspots is then calculated to obtain the spatial feature. Recently, a study  investigated spatial signatures by utilizing home security technologies and observed a significant change in Vancouver’s spatial distribution of residential burglary crime.
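The distance-based spatial feature described above can be sketched as follows. This is a minimal illustration, assuming that hotspot centres have already been extracted (for example by a density-based clustering step) and that distance is measured as great-circle distance; the cited study may use a different distance measure. The function names and coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_hotspot_distance(crime_point, hotspots):
    """Spatial feature: distance from a crime point to its closest hotspot."""
    return min(haversine_km(crime_point, h) for h in hotspots)


# Hypothetical hotspot centres and one crime location, as (lat, lon).
hotspots = [(0.0, 1.0), (0.0, 2.0)]
feature = nearest_hotspot_distance((0.0, 0.0), hotspots)
```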
Demographic and socioeconomic aspect
Demographic and socioeconomic features have been widely used for crime prediction [10, 22]. Previous studies have applied various demographic factors, such as population, number of vacant houses, owner-occupied houses, number of people who are married or separated , population density, poverty, residential stability [31, 35, 36], type of premises , education, age, income level [10, 11, 27], and property values . All these works identified a significant correlation between these factors and criminal activity. Recently, Fatehkia et al.  proposed leveraging Facebook ‘interests’ data from the Facebook Advertising API together with demographic data for crime rate prediction. Interests are analyzed in four groups (movie, game, music, and relationship-related interests) for specific age groups and genders. The study found that integrating Facebook interests data with demographic census data improves the models’ predictive power.
Human behavioral aspect
Recently, there has been increasing interest in exploring human mobility and behavioral patterns in crime research. For example, Bogomolov et al.  proposed a data-driven approach to the crime prediction problem that integrates human behavioral data with human demographics. They derived the behavioral data from mobile network activity, referred to as the Smartsteps data. In particular, the study estimated the number of people visiting each smallest geographic unit, or cell, by summing the number of unique phone calls in each hour. Later, in another study, the authors investigated human dynamics features derived from the same mobile network infrastructure for crime hotspot classification . Wang et al.  introduced taxi flow data to understand city dynamics by connecting neighborhoods and non-adjacent locations. The study hypothesized that two non-adjacent communities might be strongly correlated and that the social interaction between them might propagate the crime rate. In , the authors extended this work  by combining the taxi flow graph and the spatial graph to learn region representations. In particular, the authors proposed a graph embedding method to uncover the relationship between urban dynamics and crime rate prediction. Similarly, several studies [10, 25, 40] incorporated dynamic city knowledge from transportation and human mobility data for crime prediction. Their data are mainly obtained from OpenStreetMap (OSM), transport departments, and a location-based social network (Foursquare). A more recent data-driven approach to crime occurrence prediction has been proposed in . Instead of focusing on mega-cities, the study worked mainly on smaller cities and analyzed the impact of human behavioral data combined with human demographics.
Most current crime research focuses on large urban communities with dense and diverse characteristics. However, urban planning paradigms and societal variables differ by the locale and size of the community, and existing research has not approached the problem from a cross-domain learning perspective. Aiming to learn a uniform model for all cities despite their different data distributions, we present a cross-domain transfer learning approach for crime occurrence prediction.
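The footfall estimate described above (unique phone calls per cell and hour) can be sketched as a simple aggregation. The record layout `(cell_id, hour, call_id)` and the function name are hypothetical; the actual format of the Smartsteps data is not described here.

```python
from collections import defaultdict


def hourly_footfall(records):
    """Count unique call identifiers per (cell, hour).

    `records` is an iterable of (cell_id, hour, call_id) tuples; this layout
    is an assumption made for the example.
    """
    seen = defaultdict(set)
    for cell_id, hour, call_id in records:
        seen[(cell_id, hour)].add(call_id)  # a set ignores repeated call ids
    return {key: len(call_ids) for key, call_ids in seen.items()}


# Hypothetical records: call "c1" appears twice in cell "A" at hour 9.
records = [("A", 9, "c1"), ("A", 9, "c1"), ("A", 9, "c2"), ("B", 9, "c3")]
footfall = hourly_footfall(records)
```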
In this section, we discuss three transfer learning approaches (instance-transfer, feature-representation-transfer, and model-transfer), distinguished by the type of knowledge transferred across domains . In general, the instance-based transfer learning setting uses instance re-weighting and resampling techniques to obtain the relevant source instances, which can then be used together with the labeled target data. In recent years, many extended boosting-based ensemble learning methods have been proposed for this setting. TrAdaBoost is a widely used boosting-based transfer learning algorithm that addresses the instance transfer learning problem . Its main goal is to train a classifier using both the old (source) and new (target) domain data and to transfer knowledge between instance spaces with different distributions. In this approach, old data that are incorrectly classified, and are therefore considered significantly dissimilar from the new data, receive reduced weights. In contrast, misclassified new (target) data receive higher weights to intensify their impact. In 2010, Yao et al.  proposed an extension of TrAdaBoost, called MultiSourceTrAdaBoost, which leverages multiple sources of data for knowledge transfer. The authors state that using a single source domain for knowledge transfer may lead to negative transfer and performance degradation because of the weak relationship between source and target. MultiSourceTrAdaBoost follows the same strategy as TrAdaBoost by applying weights to the source and target training data, except in the weak classifier selection: in each iteration, a weak classifier is chosen based on how closely the source and target training data are related. Later, Liu et al.  designed a weighted resampling-based transfer learning framework (TrResampling) to improve on the classification accuracy of TrAdaBoost. The algorithm resamples the higher-weight data in the source domain and adds them to the labeled target domain data. Then, the TrAdaBoost algorithm is applied for model building by adjusting source and target weights. Besides the resampling strategy, the study also assembled bagging-based  and MultiBoosting-based  transfer learning algorithms.
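The re-weighting behaviour of TrAdaBoost described above can be sketched as a single weight-update step. This is a simplified illustration of the update rule for binary classification, not a full implementation: training the weak learner, normalizing the weights, and combining the final ensemble are omitted. The function name, the argument names, and the clipping of the error are assumptions made for this example.

```python
import math


def tradaboost_weight_update(w_src, w_tgt, miss_src, miss_tgt, n_rounds):
    """One TrAdaBoost-style weight update.

    w_src, w_tgt: current weights of the source and target instances.
    miss_src, miss_tgt: 0/1 flags, 1 if the current weak learner
        misclassified the instance.
    n_rounds: total number of boosting iterations.
    """
    n_src = len(w_src)
    # Fixed source multiplier (at most 1): misclassified source instances
    # lose weight because they appear dissimilar to the target data.
    beta_src = 1.0 / (1.0 + math.sqrt(2.0 * math.log(n_src) / n_rounds))
    # The weighted error is measured on the target instances only.
    eps = sum(w * m for w, m in zip(w_tgt, miss_tgt)) / sum(w_tgt)
    eps = min(max(eps, 1e-10), 0.499)  # assumption: clip so beta_tgt is in (0, 1)
    beta_tgt = eps / (1.0 - eps)
    new_src = [w * beta_src ** m for w, m in zip(w_src, miss_src)]
    # A negative exponent increases the weight of misclassified target instances.
    new_tgt = [w * beta_tgt ** (-m) for w, m in zip(w_tgt, miss_tgt)]
    return new_src, new_tgt


# Toy example: one misclassified source instance and one misclassified
# target instance.
new_src, new_tgt = tradaboost_weight_update(
    w_src=[0.25, 0.25], w_tgt=[0.25, 0.25, 0.25, 0.25],
    miss_src=[1, 0], miss_tgt=[1, 0, 0, 0], n_rounds=10,
)
```

In the toy example, the weight of the misclassified source instance shrinks, the weight of the misclassified target instance grows, and correctly classified instances keep their weights.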
In addition to the boosting-based methods, various techniques exist to utilize the instances from source data. Tianyang et al.  proposed an instance-based deep transfer learning approach for image classification problems. The authors mainly pre-trained a model using source domain data and then applied it to labeled target training data. This strategy helps find the optimized target training set by estimating and removing the less influential target training data. Later, this optimized target data is used for building a new model or fine-tuning the previous pre-trained model. In 2016, Shuang et al.  proposed a source subset selection method by estimating the close relationships between source and target instances. The study employed an extension of Vovk’s conformity test for this purpose.
The feature-transfer learning setting assumes that there might be an inclusive relationship between source and target domains, and it tries to learn a new feature representation for the target domain. Pan et al.  studied a cross-domain sentiment classification problem through a feature alignment approach. The authors first identify the domain-independent and mutually dependent features and then build a spectral feature alignment (SFA) algorithm to reduce the difference between domain-specific features. In another work, Xia et al.  presented a feature ensemble method for sentiment classification in which domain-independent features get higher weights and domain-specific features get lower weights. Oquab et al.  employed a convolutional neural network (CNN) architecture to transfer image representations trained on large-scale labeled source data to target tasks with a limited amount of data. As the image distributions differ across domains, the study added a new adaptation layer to the CNN architecture of the target task. The key point in feature-representation transfer learning is finding a good feature representation across domains with different distributions. Pan et al.  proposed such a learning method, named Transfer Component Analysis (TCA), for cross-domain WiFi localization and text classification.
Model transfer learning is also referred to as parameter-transfer learning. This approach identifies parameters of the model that are shared between related source and target domains. Parameter-transfer methods are mainly effective for multi-task learning, where the adapted model is employed for the target tasks. TaskTrAdaBoost  is an extension of the TrAdaBoost algorithm for parameter-transfer settings. The model identifies the parameters shared between the different sources and the target training data and reuses them to learn the target classifier. Another parameter-transfer method was proposed by Chattopadhyay et al.  for detecting muscle fatigue in various stages. The proposed framework, named Conditional Probability-based Multi-Source Domain Adaptation (CP-MDA), relies on the differences between the conditional probability distributions of multi-source data. In contrast, Segev et al.  proposed two model transfer learning algorithms, structure expansion/reduction (SER) and structure transfer (STRUT), based on a local transformation of a decision tree structure.
In our study, we focus on instance-based knowledge transfer. This approach is mainly motivated by importance sampling, in which relevant source domain data are re-weighted and/or a subset of the target training data is selected before training the model. To the best of our knowledge, our study is the first of its kind to utilize such a knowledge transfer approach in crime prediction.
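For reference, the importance-sampling idea behind instance re-weighting can be written as the standard identity below. The symbols are introduced here for illustration only: $P_S$ and $P_T$ denote the source and target distributions, $f$ a classifier, and $\ell$ a loss function. The identity holds when $P_S(x, y) > 0$ wherever $P_T(x, y) > 0$, and it is not necessarily the exact formulation used in this study.

```latex
\mathbb{E}_{(x,y)\sim P_T}\bigl[\ell(f(x), y)\bigr]
  = \mathbb{E}_{(x,y)\sim P_S}\!\left[\frac{P_T(x,y)}{P_S(x,y)}\,\ell(f(x), y)\right]
```

In practice the density ratio is unknown and has to be estimated, which is why source instances are re-weighted or selected according to their estimated relevance to the target domain.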