Introduction

Crimes tend to form spatial concentrations. Studies suggest that around half of all crimes in a city are recorded in a small number of high-risk locations which amount to about 5% of the entire city (Weisburd and Amram 2014; Weisburd, 2015). These places are called crime hotspots and have been studied extensively from criminological, geographical and practical policing standpoints (Sherman et al., 1989; Sherman & Weisburd, 1995; Braga 2001; Weisburd et al. 2004; Braga et al., 2019; Shiode & Shiode, 2020a, b).

In the criminological literature, the presence of crime hotspots is often explained in relation to routine activity theory (Cohen & Felson, 1979), which suggests that crimes tend to occur when a favourable combination of crime opportunities emerges, usually in the form of a motivated offender, an attractive target and the absence of capable guardianship converging in a confined space and time. More generally, the crime opportunity theory (Felson and Clarke 1998) attributes the high volume of crimes in hotspots to the specific conditions and situational factors found in those places that enable offenders to meet targets and make decisions to commit a crime. The idea of linking crime with specific conditions of places has been explored at least as early as Shaw and McKay’s social disorganisation theory (1942), which considers the relationship between criminal behaviour and the physical, cultural and social environments in which offenders live. They argued that neighbourhoods characterised by residential instability, low income and ethnic diversity were subject to higher rates of crime and disorder.

Many studies in the spatial analysis literature refer to crime hotspots in relation to either (1) one or a selected group of crime(s) at a time (e.g. all narcotic-related arrests from manufacturing to possession); or (2) the aggregate hotspots derived across all crimes regardless of their nature. The focus on a particular group of crimes is usually prompted by the need to understand the specific nature of those crimes. However, studying a single crime type or a small set of crimes means there is a limit to how widely we understand the distribution of and the association between different crime types. The aggregate approach, on the other hand, allows us to account for the total crime counts in the area and resolves issues such as low crime counts and confidentiality that may otherwise restrict access to the data. At the same time, aggregating all crime counts indiscriminately across different types of crime raises some concerns. Andresen and Linning (2012) argue that aggregating across different crime types is largely inappropriate and could lead to misleading findings. The key concern is that different types of crime may have different conditions of crime opportunities and, by extension, the spatial arrangement of the resulting hotspots for the respective crimes may also differ from one another. Yet, past research has often treated crime hotspots as a general place with high risks that may attract a range of different crime types.

There are some exceptions and these include, for instance, Andresen and Malleson’s work on the spatial point pattern test that measures the similarity in crime patterns between two crime types in the form of an index (S-index) (Andresen & Malleson, 2011). Also, De Melo et al. (2015) extend and apply this method to measure the spatial homogeneity between each pair of different crime types in Brazil and identify the extent of similarity at three different scales of aggregate areas. Kikuchi (2015) investigated the space–time linkage between a pair of crime-related incidents by checking the spatio-temporal proximities between reports of suspicious persons as a crime precursor, and sex crimes that occurred in the area afterwards. These studies clarify how similar the patterns of spatial or space–time concentrations can be between two types of crime-related events. While the focus on pair-wise analysis can yield important insights into their association, the investigation covers only two crime types at a time. Instead, this study will focus on investigating similarities and differences in the spatial patterns of hotspots across all different crime types.

Another strand of research focuses on the spatial Conjunctive Analysis of Case Configurations (CACC) (Hart & Miethe, 2015; He et al., 2020; Summers & Caballero, 2017). They investigate whether certain types of crime occur near a specific type of urban facilities or land use with the aim to establish the association between a specific crime and the associated urban and environmental factors. This approach is particularly useful for examining what urban features might induce crime opportunities by their presence. However, it is not designed for analysing a large number of elements such as multiple types of crime, mainly because it is computationally intensive, especially during the process where all possible combinations of associated factors are computed to identify unusual concentration among their distribution.

The crime leading indicator model is another strand of research that addresses the relationship between different types of crime but also involves temporal transition of crime occurrences. It uses past crime incidents to predict the occurrence of other types of crime in the future (Cohen et al., 2007; Gorr & Olligschlaeger, 2002). The model assumes that certain types of crimes (usually less substantial crimes) can be utilised as an indicator for other (often more serious) crimes. In fact, it has been thought that investigating the linkage between the minor offences and the serious, major crimes would help reduce serious crimes by eliminating their precursors. In reality, the crime leading indicator model has an inherent limitation in that the leading indicator variables must be decided a priori, rather than being derived through the model (Gorr & Olligschlaeger, 2002). This may prove to be an obstacle for predicting serious crimes, as there is little theory on the relevant indicators for serious crimes (Cohen et al., 2007). For this reason, Cohen et al. (2007) adopt expert judgment (or empirical knowledge) for selecting 14 property-or-violent crimes as leading indicators to predict the occurrence of more serious Part 1 crimes in the FBI’s Uniform Crime-Reporting program. In this sense, identifying the proximal relationship between different crimes—including that between minor and serious crimes—without relying on subjective judgement is a much-needed step in the theoretical and the substantive levels of criminological approaches.

The leading indicator model and, more broadly, studies on the relationship between different crime types link to another established criminological theory, the Broken Windows theory (Kelling & Coles, 1996; Kelling & Wilson, 1982). The theory suggests that the presence of soft crime indicates deterioration of the local community (i.e. a broken window neighbourhood) and that these crimes lead to the emergence of and increase in more serious crimes in the same area. Despite this widely accepted theory, there is a lack of understanding of the specific combination of crimes that are attracted to such areas.

Against this background, this study will investigate the extent of affinity between the hotspots of different types of crimes. Specifically, we aim to identify a group of crimes that share the same areas of concentrations (i.e. crimes that co-exist in the same area) which we will define as crime colocation in this study. The strength of each colocation will be measured by the frequency of a distinct set of crime types forming their respective hotspots in the same areas. While our study does not investigate the transition of crime types over time, it will help address some of the understudied aspects of the leading indicator model and the Broken Windows theory in that we will systematically investigate the linkage between different types of crimes by applying an exploratory data mining approach to the entire set of crime data available, so that the selection of the membership of each colocation is free of subjective judgement. Finding the formation of colocation could also help us gain insights into the neighbourhood profiles. On the theoretical level, the knowledge gained on crime colocation will inform the crime leading indicator model and the Broken Windows theory on the combination of crimes that tend to concentrate in the same area. On the practical level, identifying colocations of crimes will help inform policing strategies on the set of crimes that can be targeted together as well as the risk of other types of crimes occurring in the area when attending to one type of crime.

Methodology

Methods for detecting clusters and hotspots are many. They range from the scan statistics and other search-window-type techniques developed in epidemiology (Kulldorff & Nagarwalla, 1995) to k-function and quadrat analysis approaches used in the wider spatial analytical contexts including criminological geography. The scan statistic and its variants have been used for crime hotspot detection in the past (Shiode, 2011; Shiode & Shiode, 2020a), but they are intended mainly for detecting high concentrations of points, or an aggregate of high crime rate areas within a confined search area.

This study uses areal statistics to discover the patterns of colocation between different types of crimes. It adopts a combination of (1) a cluster detection technique and (2) a pattern mining method to identify colocations of crime hotspots. The analysis will be carried out using crime data aggregated at two different levels of spatial granularity, namely the wider community area level and the more refined census tract level. When conducting cluster detection, the decision to measure the concentration of crime using aggregate areal units results in highlighting the entire extent of an area as a crime hotspot, and the results are subject to the modifiable areal unit problem (MAUP) (Buzzelli, 2020; Openshaw, 1984). At the same time, extracting the exact hotspots using individual point data (which may be prone to some measurement error) for each crime type and identifying colocation using these hotspots would create a challenge in itself; namely, measuring the extent of their overlaps accurately and deciding whether statistically significant colocations exist between them. To confirm the overlaps of the hotspots between different crime types without any ambiguity, this study adopts aggregate areas as units of investigation, but also measures crime colocations at two different areal units to examine the impact of the MAUP.

Cluster detection

In order to examine whether each areal unit shows a high concentration of a specific set of crime types, the study requires an approach that tests the significance of the crime rate for each crime type by aggregating the point count using the same areal unit across all crime types. We will detect crime hotspots through hypothesis testing to identify whether the observed crime counts or crime rates in each area are statistically significant against a theoretically derived distribution. The cluster detection method is designed as follows. Suppose that the number of cases for each crime type is aggregated to areal units (subregions comprising the study area). We assume that the crimes in each area can be considered as point events, and that the clusters of each crime type are detected using these point count data. Suppose G denotes the entire extent of the study area of interest, consisting of a series of subregions or smaller area units. Also, suppose that

\(Z\) is one of the subregions in \(G\)

\(Z^{C}\) is the complement region of \(Z\) in \(G\)

\(N\) is the number of a specific type of crime recorded in \(G\)

\(n_{Z}\) is the number of a specific type of crime recorded in \(Z\)

\(n_{Z^{C}}\) is the number of a specific type of crime recorded in \(Z^{C}\)

\(a_{Z}\) is the population size of \(Z\), and

\(a_{Z^{C}}\) is the population size of \(Z^{C}\).

Let us assume that the counts of points in \(Z\) and \(Z^{C}\) conform to Poisson distributions; i.e. the count of points in each region is expected to be proportional to the size of the population in that region. Then,

$$n_{Z} \sim \mathrm{Poisson}(a_{Z} \lambda_{Z}), \quad n_{Z^{C}} \sim \mathrm{Poisson}(a_{Z^{C}} \lambda_{Z^{C}})$$
(1)

where \(\lambda_{Z}\) and \(\lambda_{Z^{C}}\) are the per-capita rate parameters of the Poisson distributions in \(Z\) and \(Z^{C}\), respectively. Then, the alternative hypothesis, which considers the points to be clustered in \(Z\), can be denoted as

$$H_{1}: \lambda_{Z} > \lambda_{Z^{C}}$$
(2)

and its null hypothesis is

$$H_{0}: \lambda_{Z} = \lambda_{Z^{C}}$$
(3)

Then, conditional on the total count \(N\), \(n_{Z}\) conforms to the following binomial distribution:

$$n_{Z} \sim B_{i} \left( {N, \frac{{a_{Z} \lambda_{Z} }}{{a_{Z} \lambda_{Z} + a_{{Z^{C} }} \lambda_{{Z^{C} }} }}} \right)$$
(4)

which reduces to

$$n_{Z} \sim B_{i} \left( {N, \frac{{a_{Z} }}{{a_{Z} + a_{{Z^{C} }} }}} \right)$$
(5)

if the null hypothesis is true. Then, the p-value of the null hypothesis for \(Z\), denoted as \(p_{Z}\), is

$$p_{Z} = \sum\limits_{i = n_{Z}}^{N} \binom{N}{i} \left( \frac{a_{Z}}{a_{Z} + a_{Z^{C}}} \right)^{i} \left( \frac{a_{Z^{C}}}{a_{Z} + a_{Z^{C}}} \right)^{N - i}$$
(6)
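
The upper-tail sum in Eq. (6) can be evaluated numerically along the following lines. This is a minimal sketch rather than the implementation used in this study: the function names `binomial_upper_tail` and `hotspot_p_value` are hypothetical, and the terms are summed in log space (an assumption made here for numerical stability when N is large).

```python
import math


def binomial_upper_tail(n_z: int, total: int, q: float) -> float:
    """Return P(X >= n_z) for X ~ Binomial(total, q), as in Eq. (6)."""
    if n_z <= 0:
        return 1.0
    if n_z > total or q <= 0.0:
        return 0.0
    if q >= 1.0:
        return 1.0
    log_q, log_not_q = math.log(q), math.log1p(-q)
    # log of C(total, i) * q^i * (1 - q)^(total - i) for i = n_z, ..., total
    log_terms = [
        math.lgamma(total + 1) - math.lgamma(i + 1) - math.lgamma(total - i + 1)
        + i * log_q + (total - i) * log_not_q
        for i in range(n_z, total + 1)
    ]
    # log-sum-exp keeps very small terms from underflowing prematurely
    peak = max(log_terms)
    return min(1.0, math.exp(peak) * sum(math.exp(t - peak) for t in log_terms))


def hotspot_p_value(n_z: int, total: int, pop_z: float, pop_total: float) -> float:
    """p-value of subregion Z: q is the population share a_Z / (a_Z + a_Z^C)."""
    return binomial_upper_tail(n_z, total, pop_z / pop_total)
```

For example, a subregion holding 10% of the population would yield a much smaller p-value for 30 out of 100 cases than for 10 out of 100 cases, since the latter matches the count expected under the null hypothesis.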

False Discovery Rate (FDR) controlling procedure

The null hypothesis stated above will be used for determining the statistical significance of the crime rate of each crime type in an area; i.e. whether the rate is unusually high to the extent that we identify the area as a hotspot for that particular crime type (the null hypothesis is that no significant concentration exists for a specified crime type in a specified area, and the alternative hypothesis is that the area forms a cluster of the specified crime type). To test the hypothesis, this study assumes homogeneity of the distribution of crime incidents within each respective area as the null model. The homogeneous Poisson process defined above converges to the binomial process when the number of crime incidents is sufficiently large.

The hypothesis testing is subject to the multiple testing problem in that the chance of obtaining false positives increases as more hypothesis tests are performed. To control this multiplicity, we will use the False Discovery Rate (FDR) controlling procedure (Benjamini and Hochberg 1995). It gives control over the expected proportion of false positives among all significant hypotheses, instead of the type I error rate, regardless of the number of hypothesis tests; thus providing a reasonable criterion for large-scale inference. Use of the FDR procedure in a geographical context is not new and dates back to at least Caldas de Castro and Singer (2006), with Brunsdon and Charlton (2011) applying it to a spatial cluster detection problem, and Shiode et al. (2015) applying it to detect spatial–temporal crime hotspots at micro-scale. FDR-based spatial cluster detection is a simple but effective statistical method for detecting multiple clusters whilst avoiding the multiple testing problem.

The FDR-controlling procedure can be summarised as follows. Suppose that we are testing m hypotheses, of which R null hypotheses are rejected (Table 1). Multiple testing increases the number of type I errors (V) that occur by chance. Benjamini and Hochberg (1995) defined the FDR as an index of false discoveries

$$FDR = E\left( \frac{V}{R} \right), \left( {FDR = 0, {\text{ if }}R = 0} \right)$$
(7)
Table 1 m number of hypotheses tests

and proposed an FDR-controlling procedure that keeps the FDR at or below a given level α. In this study, we will adopt the analytical process by Brunsdon and Charlton (2011) and Shiode et al. (2015) to determine whether each area is detected as a hotspot of a specific crime type. The FDR level is set at 0.01 to limit the proportion of false positives. Applying the FDR-control procedure to each crime type and extracting their hotspots separately would allow us to assess the colocation between all combinations of crimes.
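
The step-up procedure of Benjamini and Hochberg (1995) can be sketched as follows. This is an illustration of the standard procedure only, and it is an assumption that it matches the analytical process adopted here in every detail; the function name `benjamini_hochberg` and the example p-values are hypothetical. The p-values are sorted in ascending order, the largest rank k with p(k) ≤ kα/m is located, and the k smallest p-values are declared significant.

```python
def benjamini_hochberg(p_values, alpha=0.01):
    """Return a list of booleans: True where the null hypothesis is rejected."""
    m = len(p_values)
    # Indices of the p-values in ascending order of p
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k (1-based) whose p-value is at or below k * alpha / m
    k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            k = rank
    rejected = [False] * m
    # Reject every hypothesis ranked at or below k, even if an individual
    # p-value among them exceeds its own threshold (step-up behaviour)
    for idx in order[:k]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values for five areas and one crime type
example = benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.6], alpha=0.05)
```

In this setting, one p-value per area would be passed in for a given crime type, and the areas flagged `True` would be treated as hotspots of that crime type.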

Extraction of colocation patterns

The idea of co-agglomeration between different types of entities such as industries, or the colocation of a set of urban facilities, has been well studied in the spatial economics literature. Also, the commercial sector holds a constant demand for consumer market research to understand the specific combination of products purchased by customers, as this would help suppliers improve the shelf arrangement of the goods in their store, or nominate push contents on their website. In this sense, the notions of colocation and co-agglomeration have been developed mainly for identifying the association between a combination of industries that exist close to each other (Ellison & Glaeser, 1997; Leslie et al., 2012). This study will apply these notions to assess the colocation between the hotspots of crimes.

The patterns of colocation are usually detected through the application of a frequent pattern mining algorithm which identifies the same combination of industries that co-exist in the same region. For instance, the Ellison and Glaeser (1997) metric and the Duranton and Overman (2005) metric are widely used in this strand of literature for measuring the degrees of colocation and co-agglomeration between industries, with the latter offering the capacity to continuously search beyond boundaries for co-agglomeration between two types of industries through k-function-type cumulative search. However, they are designed to detect the extent of co-agglomeration at the global scale and do not provide insights into the spatial distribution; i.e. they would not suit the purpose of identifying the location of joint-clusters of crime hotspots. Also, while the Ellison and Glaeser (1997) metric has the capacity to measure the colocation between more than two industries, the combination of industries needs to be specified a priori, and this limits its applicability to an exploratory study that mines possible combinations of crime types among the numerous possibilities.

Given these limitations, this study adopts the frequent pattern growth algorithm (FP-growth algorithm) (Han et al., 2000). It uses the locations of clusters for each crime type to extract the combinations of crimes that share hotspots in the same areas. For instance, if the hotspots of crime types A, B and C are found together in a sufficient number of areas, then the combination of crime hotspots {A, B, C} can be considered as a pattern of colocation.

The frequent pattern mining algorithm was first proposed by Agrawal and Srikant (1994). It is often used for analysing consumer purchase behaviour to understand which combinations of items are frequently bought together. Frequent pattern mining identifies frequent patterns by their support, that is, the count (or proportion) of records in which a combination of items occurs. A predetermined threshold called the minimum support will be used for evaluating the frequency of the occurrence of each combination. If its frequency reaches or surpasses the minimum support, that combination will be recorded as a frequent pattern.

The same principle can be adopted for detecting a frequent occurrence of shared hotspots among different crime types. Suppose that crime types A to D are recorded across areas I to V, and the hotspots were detected as shown in Table 2. The support of crime A is 60%, as its hotspots are detected in three areas (I, III and V) out of five possible areas. Similarly, the support of crime set {A, B, C} is 40%, as it is found in both areas I and V. If the minimum support is set at 40%, nine patterns {A}, {B}, {C}, {D}, {A, B}, {A, C}, {B, C}, {A, D} and {A, B, C} are extracted from the spatial distribution of these crimes.

Table 2 An illustrative example of crime hotspots distribution
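
The support calculation described above can be illustrated with a brute-force sketch. The contents of Table 2 are not reproduced in this text, so the `hotspots` dictionary below is a hypothetical arrangement chosen only to be consistent with the figures quoted above (60% support for {A}, 40% for {A, B, C}, and nine frequent patterns at a 40% minimum support); the function names are likewise illustrative.

```python
from itertools import combinations

# Hypothetical stand-in for Table 2: area -> crime types with a hotspot there
hotspots = {
    "I": {"A", "B", "C"},
    "II": {"B"},
    "III": {"A", "D"},
    "IV": {"C"},
    "V": {"A", "B", "C", "D"},
}


def support(itemset, table):
    """Share of areas in which every crime type of the itemset has a hotspot."""
    itemset = set(itemset)
    return sum(itemset <= crimes for crimes in table.values()) / len(table)


def frequent_patterns(table, min_support):
    """Enumerate every combination of crime types and keep the frequent ones."""
    items = sorted(set().union(*table.values()))
    found = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            share = support(combo, table)
            if share >= min_support:
                found[combo] = share
    return found


patterns = frequent_patterns(hotspots, min_support=0.4)
```

Exhaustive enumeration of this kind is only practical for a handful of crime types, because the number of candidate combinations doubles with every crime type added.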

As the numbers of crime types and areas increase, the search for frequent patterns becomes computationally intensive. The FP-growth algorithm, one of the most efficient pattern-mining algorithms, helps overcome this problem (Han et al., 2000). It improves on the original algorithm by compressing large data into a highly condensed, compact structure, adopting a tree-based mining approach to eliminate a large number of redundant searches for candidate sets, and decomposing the mining task through a divide-and-conquer approach to reduce the search space. Initial performance tests showed that the FP-growth method is efficient and scalable for mining both long and short frequent patterns and is roughly an order of magnitude faster than the regular Apriori algorithm.
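
The three ingredients named above (a compressed tree, the avoidance of candidate generation, and divide-and-conquer over conditional pattern bases) can be seen in the following compact sketch of FP-growth. It is a simplified teaching version, not the implementation used in this study: the names `FPNode`, `build_fp_tree` and `fp_growth` are hypothetical, and the input is the same hypothetical five-area arrangement used to illustrate Table 2, with a minimum count of 2 areas (40%).

```python
from collections import defaultdict


class FPNode:
    """One node of the FP-tree: an item, its count, and a link to its parent."""

    def __init__(self, item, parent):
        self.item = item
        self.parent = parent
        self.count = 0
        self.children = {}


def build_fp_tree(transactions, min_count):
    """Build an FP-tree from (items, count) pairs; return (header, item counts)."""
    counts = defaultdict(int)
    for items, count in transactions:
        for item in set(items):
            counts[item] += count
    frequent = {item: c for item, c in counts.items() if c >= min_count}

    root = FPNode(None, None)
    header = defaultdict(list)  # item -> all tree nodes carrying that item
    for items, count in transactions:
        # Keep frequent items only, ordered by descending count (ties by name)
        ordered = sorted((i for i in set(items) if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += count
            node = child
    return header, frequent


def fp_growth(transactions, min_count, suffix=frozenset()):
    """Return {frozenset(itemset): count} for itemsets with count >= min_count."""
    header, frequent = build_fp_tree(transactions, min_count)
    patterns = {}
    for item, nodes in header.items():
        pattern = suffix | {item}
        patterns[pattern] = frequent[item]
        # Conditional pattern base: the prefix path above each occurrence of item
        conditional = []
        for node in nodes:
            path, parent = [], node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                conditional.append((path, node.count))
        # Divide and conquer: mine the smaller conditional database recursively
        patterns.update(fp_growth(conditional, min_count, pattern))
    return patterns


# Hypothetical hotspot sets for five areas (one transaction per area)
areas = [{"A", "B", "C"}, {"B"}, {"A", "D"}, {"C"}, {"A", "B", "C", "D"}]
colocations = fp_growth([(crimes, 1) for crimes in areas], min_count=2)
```

Each area is treated as one transaction whose items are the crime types with a hotspot there, so the counts returned are numbers of areas in which a given set of crime types colocates.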

Data

This study explores the 259,150 cases of crime that were recorded across the City of Chicago, IL, United States between 1 January and 31 December 2019. The boundary data from Chicago allowed us to aggregate them across 77 community areas (Fig. 1) and 866 census tracts from the 2010 census within the City of Chicago. The community areas were originally formed as units of neighbourhoods in the 1920s, but their current population sizes vary considerably, from Burnside (2527 people) to Near North Side (105,481 people). For this reason, we will use the crime rates, adjusted by the 2020 residential population (Chicago Metropolitan Agency for Planning, 2021), to extract communities that saw a significant concentration of each crime type. While certain types of crime may not be directly proportional to the residential population (or, in some cases, may be affected more strongly by the daytime population), the residential population is still the best predictor of the crime volume, and the best indicator for predicting the variation of the crime volumes across different areas. On the other hand, crime counts recorded at the census tract level will not be weighted, as they are bound by the size of the census tracts which, by definition, are determined to have a similar number of residents.

Fig. 1 77 Chicago Community Areas (with Community Area IDs)

Each case contains information on the time and location of the crime, type of crime by the Uniform Crime Reporting (UCR) codes (e.g. 01: murder, 02: sexual assault), description (e.g. by fire, aggravated, attempt), location description (e.g. apartment, bar, street), whether an arrest was made, and if it was domestic/residential. There are 32 primary classifications, ranging from crimes that occur in large volume including theft (62,383 cases) and battery (49,486 cases) to those with fewer cases such as human trafficking (11 cases) and non-criminal concealed carry license revocation (4 cases). Of these, 10 crime types were omitted from the analysis (Table 3). They were eliminated because of (1) the scarcity of cases which makes it unlikely to form hotspots in multiple areas; (2) their aspatial nature (e.g. deceptive practice and intimidation are often carried out over the phone, through letters or online communication and the spatial reference usually points to the victim’s registered address); or (3) the diversity of crimes covered under a single category which makes it challenging to make meaningful observations of their clusters (e.g. the category of other offence covers too diverse a range of crimes).

Table 3 List of crime types by UCR that were omitted from the analysis

Of the 223,678 cases among the remaining 22 primary classifications, 4 cases (3 thefts and 1 arson) were missing spatial information and were also omitted, leaving 223,674 cases for the study (Table 4). Some of these classifications consisted of crimes committed in different types of space (e.g. residential, commercial, open space) or were different in nature (e.g. possession of drugs vs. manufacturing of drugs). To separate these variants within the same primary classifications and to detect colocation of their hotspots, the 22 primary classifications were reclassified into 35 categories (Table 5). For instance, assault and battery were divided into

Table 4 List of crime types by UCR included in the analysis
Table 5 List of crimes recorded in Chicago IL in 2019 by subcategories and the number of cases
(1) Residential space: apartment, residence, residence garage, and other private premises;

(2) Public space: commercial and public buildings, including banks, schools, churches and civic offices; and

(3) Open space: streets, alleys, parks and open space,

as the nature of domestic violence (assault and battery) would differ from those that take place in public and open space. Burglary, robbery and theft were also each divided into similar categories, although burglary cases were split into two groups (residential vs commercial/public) as they are confined to off-street places; and theft cases were split into four groups; namely, residential, commercial/public, street/open space, and vehicles/trains. In addition, weapons violation cases were split between (1) reckless or unlawful use, and (2) unlawful possession or sales; and narcotics cases were divided into (1) possession or use of a substance, and (2) manufacturing or delivery of a substance; as these categories of crime may form hotspots in different areas. The distribution of the number of crime cases for these crime types is shown in Fig. 2.
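
The split by type of space can be thought of as a lookup from the location description of each case to one of the three groups. The sketch below is illustrative only: the keys are taken from the examples given above, the exact strings used in the source data may differ, and the names `SPACE_BY_LOCATION`, `space_category` and `subcategory` are hypothetical.

```python
# Hypothetical lookup from a normalised location description to a type of space.
# The keys follow the examples listed in the text, not the actual data values.
SPACE_BY_LOCATION = {
    "apartment": "residential",
    "residence": "residential",
    "residence garage": "residential",
    "bank": "public",
    "school": "public",
    "church": "public",
    "street": "open",
    "alley": "open",
    "park": "open",
}


def space_category(location_description):
    """Return 'residential', 'public' or 'open', or None if the place is unlisted."""
    key = " ".join(location_description.lower().split())
    return SPACE_BY_LOCATION.get(key)


def subcategory(primary_type, location_description):
    """Combine a primary classification with its type of space, if known."""
    space = space_category(location_description)
    return None if space is None else f"{primary_type} ({space} space)"
```

Cases whose location description is not in the lookup return `None`, so they can be reviewed separately rather than being assigned to a group by default.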

Fig. 2 The number of crimes recorded in Chicago IL in 2019 by the 35 subcategories

Results

As mentioned above, the 35 subcategories of crime types were aggregated at two areal unit levels: 77 community areas which correspond to the respective local neighbourhoods and 866 census tracts within Chicago. Within each area, each crime type was tested with the FDR procedure to examine its statistical significance, with the FDR level set conservatively at 0.01.

Colocation at the community area level

Results of hotspot detection for the community areas are summarised in Table 6. It lists all community areas that have at least one type of crime hotspot, and the type(s) of crime that formed hotspots in that area. The number on the left-hand side of the table corresponds to the community area ID. In total, 30 of the community areas had no crime hotspots and were omitted from Table 6, and 10 areas had a hotspot of a single crime type only (as indicated by the value of 1 in the rightmost column of Table 6); these are theft (residential space, public space, or open space), public peace violation, battery in public space, weapons violation (misuse), or prostitution. The rest of the areas had some form of joint clusters of hotspots between different crime types, ranging from 2 to 26 crime types forming hotspots in the same area. This shows a distinct variation among the community areas, ranging from those comprising a handful of crime types to those that host two dozen or more hotspots from different crime types. The larger joint clusters consist of similar sets of crime types which can be considered as crime colocations.

Table 6 List of all community areas with hotspots and the respective crime type

To make these colocations more visible, the frequent pattern growth algorithm was applied to count all possible colocations from which Table 7 was extracted. It shows the list of most frequently appearing combinations of crime types by community areas for each respective size of colocation, ranging from the smallest possible combination between a pair of crime types to the largest combination of 26 crime types. Highlighted crime types denote their first appearance in the table. The purpose of producing this table is to identify the most comprehensive set of colocating hotspots by eliminating the less frequent duplicates of colocations from the entire set of 46,978 colocations detected. Of these colocations, battery in residential space (Bttry1), battery in open space (Bttry3), assault in residential space (Asslt1), and criminal damage to property (CrDmg1) came up as the crime types that occur together most frequently (either as a complete set of the four crime types, or a subset thereof) at the community area level. The smallest unit comprises a pair between battery in residential space (Bttry1) and battery in open space (Bttry3); i.e. {Bttry1, Bttry3}, followed by a trio {Bttry1, Bttry3, CrDmg1} and the quartet {Bttry1, Bttry3, Asslt1, CrDmg1}, where a larger colocation set contains the smaller sets completely. Given their frequent and persistent nature as core colocation sets, this study will hereafter call these combinations of crimes the primary colocation.

Table 7 The most representative colocations by community areas. Highlight denotes the first appearance for that crime type on the list
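
Selecting the most frequent colocation for each size, and checking whether each larger set contains the smaller ones, can be sketched as follows. The pattern counts below are hypothetical and use abstract labels A to D rather than the actual results; the function names `most_representative` and `is_nested`, and the alphabetical tie-breaking rule, are assumptions made for this illustration.

```python
def most_representative(patterns):
    """For each colocation size >= 2, keep the itemset found in the most areas."""
    best = {}
    for itemset, count in patterns.items():
        size = len(itemset)
        if size < 2:
            continue
        # Higher count wins; ties are broken alphabetically for determinism
        key = (-count, sorted(itemset))
        if size not in best or key < best[size]:
            best[size] = key
    return {size: (frozenset(items), -negative_count)
            for size, (negative_count, items) in sorted(best.items())}


def is_nested(representatives):
    """True if every representative set contains the next smaller one."""
    sets = [itemset for _, (itemset, _) in sorted(representatives.items())]
    return all(small <= large for small, large in zip(sets, sets[1:]))


# Hypothetical counts of areas in which each set of crime types colocates
toy_patterns = {
    frozenset("A"): 9,
    frozenset("AB"): 6,
    frozenset("AC"): 4,
    frozenset("BC"): 4,
    frozenset("ABC"): 4,
    frozenset("ABD"): 2,
    frozenset("ABCD"): 2,
}
representatives = most_representative(toy_patterns)
```

In this toy input the representative sets are {A, B}, {A, B, C} and {A, B, C, D}, which form a nested sequence of the kind described for the primary colocation.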

The primary colocation patterns may have been themselves part of a larger colocation set in the respective area but were persistent as a core set of crimes. In such cases, the remaining crime types in the larger colocation were one or more of the following 8 crime types (in the order of frequency): assault in open space (Asslt3), motor vehicle theft (Motor), weapons violation possession (Weapn2), theft from residence (Theft1), criminal damage to vehicle (CrDmg2), robbery in open space (Robry3), criminal trespass (Tresps), and weapons violation misuse (Weapn1). These crime types appear across the most representative colocations of 5 to 11 crime types and more in Table 7, and they seem to form the second tier of colocating crime types, which we will call secondary colocation.

In addition to this, there are even larger colocations containing more hotspots from different crime types. They seem to reflect more crime-ridden areas where a wide range of crime types colocate. The listing in Table 7 shows a distinct multi-layered pattern, where the secondary colocation contains the primary colocation, and the tertiary group contains both the primary and the secondary colocations. The tertiary colocations were joined by the following crime types which did not appear in the primary and secondary colocations (Table 7), in the order of frequency: interference with public officer (Inter), homicide (Hmcid), narcotics possession (NARC1), assault in public space (Asslt2), public peace violation (PceVio), offense involving children (Child), narcotics manufacturing (NARC2), criminal sexual assault (SexAslt), robbery in public space (Robry2), battery in public space (Bttry2), theft in open space (Theft3), and burglary in residential place (Bglry1). Colocations that contain these crime types will be hereafter called the tertiary colocations. Figure 3 summarises the multi-layered pattern between the primary, the secondary and the tertiary colocations.

Fig. 3 An illustrative diagram showing the multi-layered structure of crime colocations

A group of crimes is largely missing from Table 7; namely, stalking, arson, prostitution, sex offence, kidnapping, liquor violation and gambling. These crimes can be regarded as having little or no association with the key crime sets and are, therefore, listed in Fig. 3 as the non-colocation group.

Also, as an observation, among the same type of crime recorded in different places (e.g. Bttry1, Bttry2, Bttry3), crimes that took place “in residential place” or “in open space” tend to appear more central to this multi-layered pattern, whereas those carried out “in public space” tend to appear only as part of larger colocations (i.e. in more dangerous areas with multiple crime hotspots). Crimes committed in public space are also lower in their crime counts, compared to those recorded in residential or open space, which suggests that the nature of public space and the presence of witnesses may be acting as an effective barrier against crime.

Figure 3 captures the multi-layered structure of the crime types represented by the colocation pattern in the community area data. The next section explores whether this diagram also applies to the census tract level data.

Colocation at the census tract level

Colocation of crime hotspots was also examined at the census tract level. In general, the frequent patterns of crime hotspots found at the census tract level were consistent with the overall tendency found at the community area level. Owing to the small size of census tracts, the colocations are inevitably smaller than those detected at the community area level. Therefore, the most representative colocations at the census tract level (Table 8) include between 2 and 12 crime types only. Crime types that comprise the primary colocations of 2 to 4 key crime types are identical to those captured at the community level; namely, battery in residential space (Bttry1), battery in open space (Bttry3), assault in residential space (Asslt1) and criminal damage to property (CrDmg1). Larger colocations comprising up to 12 crime types also include the same types of crime as those in the secondary tier colocations extracted at the community level. In this sense, the diagram shown in Fig. 3 represents the outcomes from both levels of granularity. The only difference is that the small size of census tracts prevented the third tier (the tertiary-level) colocations from being detected at the census-tract level, and the membership of crime types in the tertiary colocation derived at the community-area level could not be cross-examined.

Table 8 The most representative colocations by census tracts. Highlight denotes the first appearance for that crime type on the list

Discussion

Results from the frequent pattern mining suggest that the hotspots of crimes form a clear colocation pattern that is also multi-layered (Fig. 3). At the centre of this colocation pattern are the four key crimes that comprise the primary colocation: namely, Assault in residential space, Battery in residential space, Battery in open space and Criminal damage to property. While their frequencies vary slightly and the four crimes appear in several permutations, they indicate robust, consistent ties between them. These robust associations suggest that the four crimes share very similar conditions (e.g. the surrounding environment, the demographic profile) that trigger the respective crime opportunities. This interpretation is further supported by the fact that near-identical patterns of multi-layered colocations were detected in both the community-area-level and the census-tract-level analyses.

Figure 3 also shows that hotspots of more serious crimes tend to enter only at a later stage, once larger colocations are detected; that is, serious crimes tend to occur in crime-ridden areas. For instance, homicide appears in Table 7 only after the size of the representative colocation reaches 13 crime types, which is also where other tertiary-level crimes start to join the colocations. Whether this tendency holds for other areas and other sets of crime data needs further investigation.

As explained earlier, colocation analysis was developed largely in the context of spatial economics. The proximity between colocated industries is considered to be a product of either some form of interaction (cooperative or competitive) among them, or a shared customer base. These associations can be translated to the context of crime. In particular, the association between the crimes in the primary colocation and those in the secondary colocation (as well as that between crimes in the secondary colocation and those in the tertiary colocation) may also involve some form of interaction between the two groups of crimes. Specifically, it remains to be seen whether crimes in the primary colocation induce those in the secondary colocation, and so on, in a manner akin to the crime leading indicator model and the broken windows theory. In general, the broken windows theory suggests that deterioration of the surrounding environment caused by lighter, less serious crime (e.g. anti-social behaviour, property damage) may nurture a culture and environment that increases opportunities for more serious crime (e.g. murder, robbery). While this study did not conclusively confirm such a progression of crime, it did find associations between property damage and many violent and property crimes (assault, battery, burglary, robbery and theft). Whether the latter were induced by the former, or whether the innate nature of the area attracted all crimes alike, requires further investigation. In future studies, the impact of lighter crimes on serious crimes could be measured by observing lagged colocation between hotspots from different time points.
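One possible way to operationalise the lagged colocation suggested above is sketched below. This is a minimal illustration and not the method used in this study: the function name `lagged_colocation`, the area identifiers, the crime label "Robbery" and all hotspot data are hypothetical. It assumes that hotspots have already been detected for two time periods and asks how often areas with an earlier hotspot of one crime type contain a later hotspot of another, relative to the share of all areas with that later hotspot.

```python
from typing import Dict, Set

# Maps an area id to the set of crime types that have a hotspot there.
Hotspots = Dict[str, Set[str]]


def lagged_colocation(earlier: Hotspots, later: Hotspots,
                      lead: str, lag: str) -> Dict[str, float]:
    """Compare later `lag` hotspots in areas with an earlier `lead` hotspot
    against the share of all areas that have a later `lag` hotspot."""
    # Only areas observed in both periods can be compared.
    areas = sorted(set(earlier) & set(later))
    lead_areas = [a for a in areas if lead in earlier[a]]
    followed = [a for a in lead_areas if lag in later[a]]
    lag_areas = [a for a in areas if lag in later[a]]

    nan = float("nan")
    confidence = len(followed) / len(lead_areas) if lead_areas else nan
    baseline = len(lag_areas) / len(areas) if areas else nan
    # A lift above 1 means `lag` hotspots are over-represented after `lead`.
    lift = confidence / baseline if baseline else nan
    return {"confidence": confidence, "baseline": baseline, "lift": lift}


# Hypothetical hotspot memberships for four areas in two periods.
earlier = {"A": {"CrDmg1", "Bttry1"}, "B": {"CrDmg1"}, "C": {"Bttry1"}, "D": set()}
later = {"A": {"CrDmg1", "Robbery"}, "B": {"Robbery"}, "C": set(), "D": {"Robbery"}}

result = lagged_colocation(earlier, later, lead="CrDmg1", lag="Robbery")
print(result)
```

A lift above 1 would indicate only a temporal association. It would not by itself separate induction by the earlier crime from an area attracting all crimes alike, which is the distinction raised above.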

The nature of the crimes in the primary group (battery, assault and criminal damage) also seems to match the description of a broken windows neighbourhood; i.e. the beginning of a neighbourhood deterioration that may attract more criminal activity in future. While this point would benefit from further investigation, if the multi-layered structure in Fig. 3 or Table 7 reflects developmental stages of crime or of areal deterioration, it may be possible to use the colocated crimes detected in this study to predict the future crime environment of the same area.

It is also worth noting that crimes in the primary colocation are not always the most voluminous of all crimes; some higher-volume crimes (e.g. theft) are included in the secondary colocation. Similarly, not all crimes in the tertiary colocation are lower in volume (e.g. narcotics are relatively voluminous but are in the tertiary colocation). In other words, the strength of colocation (or the strength of the relationship with other crimes) does not reflect the volume of that crime.
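The contrast between colocation tier and volume can be illustrated with a small sketch. It assumes, purely for illustration, that the tier of a crime type is proxied by the size of the smallest representative colocation in which it first appears (as highlighted in the tables). The colocation list, the labels "Theft" and "Narcotics" and all volumes are hypothetical placeholders and not results from this study.

```python
from typing import Dict, List, Set


def first_appearance(colocations: List[Set[str]]) -> Dict[str, int]:
    """Map each crime type to the size of the smallest colocation containing it."""
    entry: Dict[str, int] = {}
    for members in sorted(colocations, key=len):
        for crime in members:
            # Keep the first (smallest) colocation size seen for each crime.
            entry.setdefault(crime, len(members))
    return entry


# Hypothetical representative colocations of increasing size.
colocations = [
    {"Bttry1", "Bttry3"},
    {"Bttry1", "Bttry3", "Asslt1", "CrDmg1"},
    {"Bttry1", "Bttry3", "Asslt1", "CrDmg1", "Theft"},
    {"Bttry1", "Bttry3", "Asslt1", "CrDmg1", "Theft", "Narcotics"},
]

# Hypothetical annual crime counts (placeholders, not observed values).
volume = {"Theft": 9000, "Narcotics": 7000, "Bttry1": 5000,
          "Bttry3": 4000, "CrDmg1": 3000, "Asslt1": 2000}

entry = first_appearance(colocations)
by_entry = sorted(entry, key=lambda c: (entry[c], c))     # earliest colocation first
by_volume = sorted(volume, key=volume.get, reverse=True)  # most voluminous first

print("Order of entry into colocations:", by_entry)
print("Order by volume:                ", by_volume)
```

In this toy setting the two orderings differ: the most voluminous crime enters the colocations late, while lower-volume crimes sit in the smallest colocations.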

As this study is still exploratory in nature, the association between the primary, the secondary and the tertiary colocations, as well as crimes outside these categories, requires further investigation. Establishing these associations would benefit from colocation analysis of crime data from multiple years and from different cities and nations. Nevertheless, the clear pattern of colocation found in this study suggests that some crimes form hotspots together, while others vary in their hotspot locations. This supports the assertion by Andresen and Linning (2012) that conducting hotspot analysis across all crime types is inappropriate as an approach and could lead to misleading conclusions.