Simulation studies are computer experiments in which data is created via pseudo-random sampling in order to evaluate the bias and variance of estimators, compare estimators, investigate the impact of sample sizes on estimators’ performance, and select optimal sample sizes, among others (Moretti 2020). Brantingham and Brantingham (2004) recommend the use of computer simulations to understand crime patterns and provide policy guidance for crime control (see also Groff and Mazerolle 2008; Townsley and Birks 2008). In this study, we generate a synthetic dataset of crimes known and unknown to police in Manchester, UK, and aggregate crimes at different spatial scales. This permits an investigation into whether aggregates of crimes known to police at the micro-scale level suffer from a higher risk of bias compared with those at larger aggregations, such as neighborhoods and wards.
Based on parameters obtained from the UK Census 2011 and Index of Multiple Deprivation (IMD) 2010, we simulate a synthetic individual-level population consistent with the characteristics of Manchester. The simulated population reflects the real distributions and parameters of variables related to individuals residing in each area of the city (i.e., mean, proportion, and variance of the citizens’ age, sex, employment status, education level, ethnicity, marriage status, and country of birth). The measure of multiple deprivation captures the overall level of poverty in each area. Then, based on parameters derived from the Crime Survey for England and Wales (CSEW) 2011/2012, we simulate the victimization of these individuals across social groups and areas and predict the likelihood of these crimes being known to the police. This allows us to compare the relative difference between all crimes and police-recorded incidents at the different spatial scales.
The main motivation for using a simulation study with synthetic data, instead of simply using crime records, is that the absolute number of crimes in places is an unknown figure, regardless of which source of data we use (see sect. “Geographic crime analysis and measurement error”). Police records are affected by a diverse array of sources of error which vary between areas, and the CSEW sample is only designed to allow the production of reliable estimates at the level of police force areas (smaller areas are unplanned domains with very small sample sizes for which analyses based on direct estimates lead to unreliable outputs; Buil-Gil et al. 2021). Nevertheless, the analytical steps followed in this article are designed to provide an answer to our research question (namely, whether micro-level aggregates of police-recorded crime are affected by a larger risk of bias compared with larger scales), rather than producing unbiased estimates of crime in places. Future research will explore if the method used here is also a good way to produce accurate estimates of crime in places and compare these estimates with model-based estimates of crime indicators obtained from more traditional methods in small area estimation (Buil-Gil et al. 2021). Indeed, unbiased estimates of crime in places are needed to guide evidence-based policing and research.
In this section, we describe the data and methods used to generate the synthetic population of crimes known and unknown to police and evaluate differences between spatial scales. Section “Generating the population and simulation steps” outlines the data-generating mechanism and the steps of our simulation study, and in sect. “Empirical evaluation of simulated dataset of crimes,” we provide an empirical evaluation of the simulated dataset. We discuss methods to assess the results in sect. “Assessing the results.”
Generating the population and simulation steps
The simulation of our synthetic population involves three steps which are described in detail below. All analyses have been programmed in R (R Core Team 2020), and all data and code used for this simulation study are available from a public GitHub repository (see https://github.com/davidbuilgil/crime_simulation2).
Step 1. Simulating a synthetic population from census data
The first step is to generate a synthetic population consistent with the social, demographic, and spatial characteristics of Manchester. We download aggregated data about residents at the output area (OA) level from the Nomis website (https://www.nomisweb.co.uk/census/2011), which publishes data recorded by the UK Census 2011. For consistency, we conduct all our analyses using information collected in 2011. From Nomis, we obtain census parameters of various variables in each OA in Manchester. OAs are the smallest geographic units for which census data are openly published in the UK. The minimum population size per OA is 40 households and 100 residents, but the average size is 125 households. We also use other units of geography in further steps: lower layer super output areas (LSOAs), which generally contain between four and six OAs with an average population size of 1500; and middle layer super output areas (MSOAs), which have an average population size of 7200. The largest units used are wards. In Manchester local authority, there are 1530 OAs, 282 LSOAs, 57 MSOAs, and 32 wards.
Although UK census data achieve nearly complete coverage of the population, and measurement error arising from using these data is likely to be very small, census data are not problem-free. For instance, census non-response rates vary between age, sex, and ethnic groups (e.g., while more than 97% of females above 55 responded to the census, the response rate for males aged 25 to 29 was 86%), and between questionnaire items (e.g., non-response rates were 0.4% and 0.6% for sex and age questions, respectively, and 3%, 4%, and 5.7% for ethnicity, employment status, and qualifications questions). In Manchester, the census response rate was 89%. In order to adjust for non-response in census data, the Office for National Statistics used an edit and imputation system and a coverage assessment and adjustment process before publishing data in Nomis (Compton et al. 2017; Office for National Statistics 2015). Census data are widely used as empirical values of demographic domains in areas for academic research and policy (Gale et al. 2017). From the census, we obtain the number of citizens living in each OA (i.e., resident population size), the mean and standard deviation of age by OA, and the proportion of citizens in each area with the following characteristics defined by binary variables (in parentheses, we detail the reference category): sex (male), ethnicity (white), employment status (population without any income), education (higher education or more), marriage status (married), and country of birth (born in the UK). We use this information to simulate our synthetic individual-level population and their corresponding socio-demographic characteristics within each OA. Moreover, we attach the known IMD 2010 decile in each OA. This ensures that we account for both individual and area-level measures in our simulation.
The IMD is a measure of multiple deprivation calculated by the UK Government from indicators of income, employment, health, education, barriers to housing and services, and crime and living environment at the small area level (McLennan et al. 2011). Generating these values allows us, in subsequent steps, to simulate crimes experienced by citizens, as well as the likelihood of each crime being known to the police, based on parameters obtained from survey data. We use these specific variables since these are known to be associated with crime victimization and crime reporting rates (see sect. “Geographic crime analysis and measurement error”). Thus, the selection of census parameters is driven by the literature review and the availability of data recorded by the census and IMD.
The variables are generated for \( d = 1, \dots, D \) OAs and \( i = 1, \dots, N_d \) individual citizens according to the distributions detailed below, where \( N_d \) denotes the population size of the dth OA:
- \( \mathrm{Age}_{di} \sim N\left(\mu_d^{\mathrm{Age}}, \sigma_d^{2,\mathrm{Age}}\right) \), where \( \mu_d^{\mathrm{Age}} \) and \( \sigma_d^{2,\mathrm{Age}} \) denote the mean and variance of age for the dth OA.
- \( \mathrm{Sex}_{di} \sim \mathrm{Bernoulli}\left(\pi_d^{\mathrm{Male}}\right) \), where \( \pi_d^{\mathrm{Male}} \) denotes the proportion of males in the dth OA.
- \( \mathrm{NoInc}_{di} \sim \mathrm{Bernoulli}\left(\pi_d^{\mathrm{NoInc}}\right) \), where \( \pi_d^{\mathrm{NoInc}} \) denotes the proportion of citizens without any income in the dth OA.
- \( \mathrm{HE}_{di} \sim \mathrm{Bernoulli}\left(\pi_d^{\mathrm{HE}}\right) \), where \( \pi_d^{\mathrm{HE}} \) denotes the proportion of citizens with higher education (holding a university degree) in the dth OA.
- \( \mathrm{White}_{di} \sim \mathrm{Bernoulli}\left(\pi_d^{\mathrm{White}}\right) \), where \( \pi_d^{\mathrm{White}} \) denotes the proportion of white citizens in the dth OA.
- \( \mathrm{Married}_{di} \sim \mathrm{Bernoulli}\left(\pi_d^{\mathrm{Married}}\right) \), where \( \pi_d^{\mathrm{Married}} \) denotes the proportion of married citizens in the dth OA.
- \( \mathrm{BornUK}_{di} \sim \mathrm{Bernoulli}\left(\pi_d^{\mathrm{BornUK}}\right) \), where \( \pi_d^{\mathrm{BornUK}} \) denotes the proportion of citizens born in the UK in the dth OA.
Thus, we generate N = 503,127 units with their individual and contextual characteristics across D = 1,530 OAs in Manchester. Given that we simulate all individual information based on population parameters obtained from the census using small spatial units of analysis (i.e., OAs), our synthetic population is very similar (in terms of distributions and ranking) to the empirical population of each OA. The Spearman’s rank correlation coefficient of the mean of age, sex, income, higher education, ethnicity, marriage status, and country of birth across areas in census data and our simulated dataset is almost perfect (i.e., larger than 0.99 for all variables).
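The generating mechanism of step 1 can be illustrated with a short sketch. The authors' analysis was programmed in R; the Python code below is only an illustration, and the function name `simulate_oa_population` and all parameter values are hypothetical rather than real census figures. It assumes that each variable is drawn independently of the others within an OA and that age is left untruncated, neither of which is stated explicitly in the text.

```python
import random


def simulate_oa_population(n_residents, age_mean, age_sd, proportions, imd_decile, rng):
    """Simulate the residents of one output area (OA) from area-level parameters."""
    residents = []
    for _ in range(n_residents):
        # Age is normal; random.gauss expects the standard deviation, not the variance.
        person = {"age": rng.gauss(age_mean, age_sd), "imd": imd_decile}
        # Each binary trait is a Bernoulli draw using the OA-level proportion
        # (assumption: traits are drawn independently of one another).
        for trait, proportion in proportions.items():
            person[trait] = 1 if rng.random() < proportion else 0
        residents.append(person)
    return residents


# Hypothetical parameters for a single OA (illustrative, not real census values).
rng = random.Random(2011)
oa_residents = simulate_oa_population(
    n_residents=300,
    age_mean=36.0,
    age_sd=18.0,
    proportions={
        "male": 0.50,
        "white": 0.60,
        "no_income": 0.30,
        "he": 0.35,
        "married": 0.30,
        "born_uk": 0.75,
    },
    imd_decile=2,
    rng=rng,
)
share_male = sum(p["male"] for p in oa_residents) / len(oa_residents)
```

Repeating this for every OA and stacking the results would give an individual-level population whose area-level means and proportions track the inputs, which is the property the Spearman comparison above checks.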
Step 2. Simulating crime victimization from CSEW data
We use parameters obtained from the CSEW 2011/2012 to generate the crimes experienced by each individual citizen. The CSEW is an annual victimization survey conducted in England and Wales. Its sampling design consists of a multistage stratified random sample by which a randomly selected adult (aged 16 or more) from a randomly selected household is asked about victimization experienced in the last 12 months (Office for National Statistics 2013). The survey also includes questions about crime reporting to the police and whether each crime took place in the local area, among others. The main part of the survey is completed face-to-face in respondents’ households, although some questions (about drugs and alcohol use, and domestic abuse) are administered via computer-assisted self-interviewing. The CSEW sample size in 2011/2012 was 46,031 respondents.
In order to simulate the number of crimes faced by each individual unit within our synthetic population of Manchester residents, we first estimate negative binomial regression models of crime victimization from CSEW data and then use the model parameter estimates to predict crime incidence within our simulated population. Given that different crime types are known to be associated with different social and contextual variables (Andresen and Linning 2012; Quick et al. 2018), and the variables associated with crime reporting to the police also vary according to crime type (Baumer 2002; Hart and Rennison 2003; Tarling and Morris 2010), we estimate one negative binomial regression model for each of four groups of crime types:
- Vehicle crimes: includes the number of (a) thefts of motor vehicles, (b) things stolen off vehicles, and (c) vehicles tampered with or damaged, all during the last 12 months.
- Residence crimes: number of times (a) someone entered a residence without permission to steal, (b) someone entered a residence without permission to cause damage, (c) someone tried to enter a residence without permission to steal or cause damage, (d) anything was stolen from a residence, (e) anything was stolen from outside a residence (garden, doorstep, garage), and (f) anything was damaged outside a residence. These refer to events happening both at the current and previous households during the last 12 months.
- Theft and property crimes (excluding burglary): number of times (a) something was stolen out of hands, pockets, bags, or cases; (b) someone tried to steal something out of hands, pockets, bags, or cases; (c) something was stolen from a cloakroom, office, car, or anywhere else; and (d) a bicycle was stolen, all during the last 12 months.
- Violent crimes: number of times (a) someone deliberately hit the person with fists or a weapon or used force or violence in any way, (b) someone threatened to damage or use violence on the person or things belonging to the person, (c) someone sexually assaulted or attacked the person, and (d) some member of the household hit the person, used a weapon, kicked, or used force in any way on the person, all during the last 12 months.
Thus, this approach assumes that distributions and slopes observed in the CSEW at a national level apply to crimes that take place in Manchester local authority. The CSEW sample for Manchester is not large enough to estimate accurate regression models, and thus, we use models estimated at a national level to estimate parameters used to generate crimes at a local level. The implications of taking this approach are further discussed in sect. “Empirical evaluation of simulated dataset of crimes”. To alleviate the concern about this potential limitation, we show in Appendix Table 7 that the negative binomial regression model of crime victimization estimated from respondents residing in urban and metropolitan areas (excluding London) shows very similar results to model results estimated from all respondents in England and Wales.
The negative binomial regression model is widely adopted in this context and has been shown to accommodate the skewness of crime count variables well (Britt et al. 2018; Chaiken and Rolph 1981). To estimate the negative binomial regression models, we use the same independent variables described in step 1 (i.e., age, sex, employment status, education level, ethnic group, marriage status, country of birth, IMD decile). However, in this step, these are taken from the CSEW. This allows us to obtain the regression model coefficient estimates and dispersion parameter estimates (Table 1), denoted by \( \hat{\alpha}_p \) for a generic independent variable p and \( \hat{\theta} \), respectively, that will be used to generate the crime counts per person in the synthetic population. Thus, regression models consider individual and area-level variables typically associated with crime victimization risk and crime reporting, but these do not account for other area-level contextual attributes associated with crime and crime reporting, such as the presence of crime generators and attractors in the area (Brantingham and Brantingham 1995). Since this is a new methodological approach, we include only a small number of variables recorded in the census and IMD to keep the model parsimonious, avoid multicollinearity, and improve the model accuracy. Models do not consider other important factors, such as individuals’ routine activities and alcohol consumption, because these are not recorded in the census.
Table 1 Negative binomial generalized linear models of crime victimization estimated from CSEW 2011/2012 data
Table 1 shows the negative binomial regression models used to estimate crime victimization from CSEW 2011/2012 data. Measures of pseudo-\(R^2\) and normalized root mean squared error (NRMSE) indicate a good fit and accuracy of our models. We use the estimated regression coefficients to generate our synthetic population of crimes, but these also provide some information about which individual characteristics are associated with a higher or lower risk of victimization by crime type. For example, age is negatively associated with crime victimization in all crime types. Being male is a good predictor of suffering vehicle and property crimes, but not residence or violent crimes. With regard to income levels, those with some type of income have a higher risk of victimization by vehicle and violent crimes, whereas respondents without any income have a higher risk of suffering residence crimes. Citizens with a higher education degree generally suffer more property and vehicle crimes than residents without university qualifications, whereas those without higher education certificates are at a higher risk of suffering violent crimes. Married citizens tend to suffer more vehicle crimes, while non-married citizens suffer more property and violent crimes. Citizens born in the UK experience more residence and vehicle crimes than immigrants. Finally, areas with high values of deprivation concentrate more vehicle, residence, and property crimes.
Crime victimization counts for each unit in the simulated population are generated following a negative binomial regression model using the regression coefficient and dispersion parameter estimates obtained from the CSEW (Table 1) and the independent variables simulated in step 1. For example, we predict the number of vehicle crimes (\( \mathrm{Vehi}_i \)) suffered by a given individual i as follows:
$$ \mathrm{Vehi}_i \sim \mathrm{NB}\left(\hat{\tau}_i^{\mathrm{Vehi}}, \hat{\theta}^{\mathrm{Vehi}}\right), $$
(1)
where NB denotes the negative binomial distribution, and:
$$ \hat{\tau}_i^{\mathrm{Vehi}} = \hat{\alpha}_0^{\mathrm{Vehi}} + \hat{\alpha}_1^{\mathrm{Vehi}}\mathrm{Age}_i + \hat{\alpha}_2^{\mathrm{Vehi}}\mathrm{Sex}_i + \hat{\alpha}_3^{\mathrm{Vehi}}\mathrm{White}_i + \hat{\alpha}_4^{\mathrm{Vehi}}\mathrm{NoInc}_i + \hat{\alpha}_5^{\mathrm{Vehi}}\mathrm{HE}_i + \hat{\alpha}_6^{\mathrm{Vehi}}\mathrm{Married}_i + \hat{\alpha}_7^{\mathrm{Vehi}}\mathrm{BornUK}_i + \hat{\alpha}_8^{\mathrm{Vehi}}\mathrm{IMD}_i, \quad i = 1, \dots, N. $$
(2)
We repeat this procedure for all four crime types. Thus, the variability and relationships between variables observed in the CSEW are reproduced in our simulated population, and we assume that these values represent the true extent of crime victimization in the population of Manchester. We evaluate the quality of the synthetic population of crimes in sect. “Empirical evaluation of simulated dataset of crimes.”
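As an illustration of Eqs. 1 and 2, the sketch below draws negative binomial counts for one synthetic individual. It is written in Python rather than the R used by the authors, and the coefficient values, the dispersion value, and the example person are hypothetical, not the Table 1 estimates. It assumes the usual log link for negative binomial regression, so that the mean count is the exponential of the linear predictor, and it draws the count as a gamma-Poisson mixture with mean \( \mu \) and variance \( \mu + \mu^2/\theta \).

```python
import math
import random


def draw_poisson(lam, rng):
    """Draw a Poisson count with Knuth's multiplication method (fine for small means)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def draw_negative_binomial(mu, theta, rng):
    """Gamma-Poisson mixture: mean mu, variance mu + mu**2 / theta."""
    lam = rng.gammavariate(theta, mu / theta)  # shape theta, scale mu / theta
    return draw_poisson(lam, rng)


def linear_predictor(coefs, person):
    """Intercept plus the sum of coefficient times covariate, as in Eq. 2."""
    return coefs["intercept"] + sum(coefs[name] * value for name, value in person.items())


# Hypothetical coefficients and dispersion (NOT the Table 1 estimates).
coefs = {"intercept": -0.5, "age": -0.02, "male": 0.1, "white": 0.05, "no_income": -0.1,
         "he": 0.1, "married": 0.05, "born_uk": 0.1, "imd": -0.03}
theta = 0.5
person = {"age": 30, "male": 1, "white": 1, "no_income": 0,
          "he": 1, "married": 0, "born_uk": 1, "imd": 2}

tau = linear_predictor(coefs, person)
mu = math.exp(tau)  # assumption: log link

rng = random.Random(42)
draws = [draw_negative_binomial(mu, theta, rng) for _ in range(20000)]
sample_mean = sum(draws) / len(draws)
```

In the simulation itself a single draw per person and crime type would be taken; the repeated draws here only serve to check that the sampler reproduces the intended mean.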
Step 3. Simulating crimes known to police from CSEW data
The third step consists of estimating whether each simulated crime is known to the police or not. This allows us to analyze the difference between all crimes (generated in step 2), and those crimes known to the police (to be estimated in step 3) for each area in Manchester. First, we create a new dataset in which every crime generated in step 2 becomes the observational unit. Here, our units of analysis are crimes in places, instead of individual citizens; therefore, some residents may be represented more than once (i.e., those who suffered multiple forms of victimization).
In order to estimate the likelihood of each crime being known to the police, we follow a similar procedure as in step 2, but in this case, we use logistic regression models, since the outcome (whether or not a crime is known to the police) is binary and is therefore better described by a Bernoulli distribution. First, we estimate a logistic regression model of whether crimes are known to police or not. We use the CSEW dataset of crimes (n = 14,758) and fit the model using the same independent variables as in step 2 (see the results of logistic regression models in Table 2). We estimate one regression model per crime type to account for the fact that the crime type and incident seriousness are strongly linked to crime reporting (Baumer 2002; Xie and Baumer 2019b). For each crime, the CSEW asks the victim, “Did the police come to know about the matter?” We use this measure to estimate our regression models. Thus, here, we estimate whether the police know about each crime, which is not always due to reporting by the victim (estimates from the CSEW 2011/2012 indicate that 32.2% of crimes known to the police were reported by another person, 2.3% were witnessed by the police, and 2.2% were discovered by the police in another way).
Table 2 Logistic models of crimes known to police estimated from CSEW 2011/2012 data
Second, we estimate whether each crime in our simulated dataset is known to the police, following a Bernoulli distribution from the regression coefficient estimates shown in Table 2 and the independent variables simulated in step 1. As in the previous case, we repeat this procedure for each crime type, since some variables may affect some crime types in a different way than others (Xie and Baumer 2019a). For example, to estimate whether each vehicle crime j, suffered by an individual i, is known to police (\( \mathrm{KVehi}_{ji} \)), we calculate:
$$ \mathrm{KVehi}_{ji} \sim \mathrm{Bernoulli}\left(\frac{\exp\left(\hat{p}_{ji}^{\mathrm{KVehi}}\right)}{1 + \exp\left(\hat{p}_{ji}^{\mathrm{KVehi}}\right)}\right), $$
(3)
where:
$$ \hat{p}_{ji}^{\mathrm{KVehi}} = \hat{\gamma}_0^{\mathrm{KVehi}} + \hat{\gamma}_1^{\mathrm{KVehi}}\mathrm{Age}_{ji} + \hat{\gamma}_2^{\mathrm{KVehi}}\mathrm{Sex}_{ji} + \hat{\gamma}_3^{\mathrm{KVehi}}\mathrm{White}_{ji} + \hat{\gamma}_4^{\mathrm{KVehi}}\mathrm{NoInc}_{ji} + \hat{\gamma}_5^{\mathrm{KVehi}}\mathrm{HE}_{ji} + \hat{\gamma}_6^{\mathrm{KVehi}}\mathrm{Married}_{ji} + \hat{\gamma}_7^{\mathrm{KVehi}}\mathrm{BornUK}_{ji} + \hat{\gamma}_8^{\mathrm{KVehi}}\mathrm{IMD}_{ji}, \quad j = 1, \dots, J. $$
(4)
\( \hat{\gamma}_p \) denotes the regression model coefficient estimate for a generic independent variable p, and J denotes the total number of simulated crimes. Measures of pseudo-\(R^2\) show a good fit of models.
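Step 3 can be sketched in the same spirit: person-level counts are first expanded into one record per crime, and each record then receives a Bernoulli draw whose probability is the inverse logit of the linear predictor (Eqs. 3 and 4). This is a Python illustration rather than the authors' R code; the three victims, the reduced covariate set, and the coefficients in `gamma` are hypothetical and are not the Table 2 estimates.

```python
import math
import random


def inverse_logit(p_hat):
    """exp(p) / (1 + exp(p)), written in the equivalent form 1 / (1 + exp(-p))."""
    return 1.0 / (1.0 + math.exp(-p_hat))


def draw_known(p_hat, rng):
    """Bernoulli draw: 1 if the crime is known to police, 0 otherwise."""
    return 1 if rng.random() < inverse_logit(p_hat) else 0


# Hypothetical output of step 2: vehicle crime counts per synthetic person.
victims = [
    {"id": 1, "age": 25, "male": 1, "vehicle_crimes": 2},
    {"id": 2, "age": 60, "male": 0, "vehicle_crimes": 0},
    {"id": 3, "age": 41, "male": 0, "vehicle_crimes": 1},
]
# Hypothetical coefficients (NOT the Table 2 estimates), with only two covariates.
gamma = {"intercept": -0.4, "age": 0.01, "male": -0.1}

rng = random.Random(3)
crimes = []
for victim in victims:
    # One record per crime, so repeat victims appear more than once.
    for _ in range(victim["vehicle_crimes"]):
        p_hat = (gamma["intercept"]
                 + gamma["age"] * victim["age"]
                 + gamma["male"] * victim["male"])
        crimes.append({"victim": victim["id"], "known": draw_known(p_hat, rng)})
```

Because each crime inherits the covariates of its victim, two crimes suffered by the same person share a reporting probability but receive separate draws.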
One important constraint of crime estimates produced from the CSEW is that these provide information about area victimization rates (i.e., number of crimes suffered by citizens living in one area, regardless of where crimes took place), instead of area offence rates (i.e., number of crimes taking place in each area). This may complicate efforts to compare and combine survey-based estimates with police records. Given that our simulated dataset of crimes is based on CSEW parameters and census data about residential population characteristics, our synthetic dataset of crimes is also likely to be affected by this limitation. In order to mitigate the impact of this shortcoming on any results drawn from our study, we follow similar steps as in step 3 in order to estimate whether each crime took place in the residents’ local area or somewhere else and remove from the study all those crimes that do not take place within 15-min walking distance from the citizens’ household (see Appendix 2). Our final sample size is 452,604 crimes distributed across 1530 OAs in Manchester. This facilitates efforts to compare our simulated dataset of crimes with police-recorded incidents, but we note that our synthetic dataset does not account for those crimes that take place in an area but are suffered by persons living in any other place. According to estimates drawn from the CSEW 2011/2012, this represents 26.0% of all crimes, which are likely to be overrepresented in commercial areas and business districts in the city center, where the difference between the workday population and the number of residents is generally very large (e.g., 490.2% in Manchester city center; Manchester City Council 2011). We return to this point in the discussion section to discuss ways in which this shortcoming may be further addressed in future research.
Empirical evaluation of simulated dataset of crimes
Once all synthetic data are generated, we use victimization data recorded by the CSEW and data about crimes known to Greater Manchester Police (GMP) to empirically evaluate whether our simulated dataset of crimes matches the empirical values of crime. This is used to evaluate the quality of our synthetically generated dataset of crimes.
First, Table 3 compares the average number of crimes suffered by individuals across socio-demographic groups as recorded by the CSEW 2011/2012 and our simulated dataset. The distribution of the synthetic dataset of crimes is very similar to that of the CSEW, but values appear to be slightly larger in the synthetic population than in the survey data. For instance, citizens younger than 35 suffer the most crimes in both datasets, and males suffer more vehicle, residence, and property crimes. Crime victimization differences by ethnicity, employment status, education level, marriage status, country of birth, and IMD decile shown in the CSEW are also observed in the simulated dataset of crimes. In the case of residence crimes, incidences in our simulated population appear to be slightly larger than those observed in the CSEW. We note that our simulated dataset refers to crimes taking place in Manchester local authority, whereas the CSEW reports data for all England and Wales. In 2011/2012, the overall rate of crimes known to police per 1000 citizens was notably larger in Manchester than in the rest of England and Wales (Office for National Statistics 2019), and the Crime Severity Score for 2011/2012 (an index that ranks the severity of crimes in each local authority) was 104.6% larger in Manchester than the average of England and Wales (Office for National Statistics 2020). Therefore, the differences observed between CSEW and our synthetic population of crimes are likely to reflect true variations between the crime levels in Manchester and England and Wales as a whole.
Table 3 Average number of crimes suffered by individuals aged 16 or more by social and demographic characteristics in CSEW (weighted) and our simulated data
Second, Table 4 presents the proportion of crimes that are known to the police grouped by the socio-demographic and contextual characteristics of victims in CSEW and our simulated data. The proportions observed in the CSEW are very similar to those obtained from the simulated data. This shows that the modeling results are consistent and preserve the relationships between variables.
Table 4 Proportion of crimes known to police by social and demographic characteristics of victims in CSEW (weighted) and our simulated data
Third, we download crime data recorded by GMP (https://data.police.uk/) and compare area-level aggregates of crimes known to GMP with our synthetic dataset of crimes known to the police. To do this, we only consider those simulated crimes that were estimated as being known to police and taking place in the local area. Spearman’s rank correlation and Global Moran’s I coefficients between the area-level aggregates of our synthetic dataset of crimes and crimes known to GMP are reported in Table 5. Tiefelsdorf’s (2000) exact approximation of the Global Moran’s I test is used as a measure of spatial dependency between the two measures, to analyze if the number of crimes in our simulated dataset is explained by the value of crimes known to GMP in surrounding areas (Bivand et al. 2009).
Table 5 Measures of correlation between simulated dataset of crimes known to police and incidents recorded by Greater Manchester Police in 2011/2012
We aggregate all crimes known to police to each spatial unit using the “sf” package in R (Pebesma 2018). Out of the 87,457 crimes known to GMP, 642 could not be geocoded. We note that we obtained slightly different results using two different analytical approaches to aggregating crimes in areas (i.e., counting crimes in OAs and then aggregating from OAs to LSOAs, MSOAs, and wards using a lookup table, versus counting crimes in OAs, LSOAs, MSOAs, and wards, respectively), which may be due to errors arising from the aggregation process or inconsistencies in the lookup table. We chose the second approach (i.e., counting points in polygons at the different scales), since, on average, a larger number of offences were registered in each area using this method. Tompson et al. (2015) demonstrate that open crime data published in England and Wales are spatially precise at the levels of LSOA and MSOA, but that the spatial noise added to these data for the purposes of anonymity means that OA-level maps often have inadequate precision. Thus, we only present and discuss the results obtained at LSOA and larger spatial levels.
Table 5 shows positive and statistically significant coefficients of Spearman’s rank correlation for all crime types at the LSOA level. The index of Global Moran’s I is also statistically significant and positive in all cases. At the MSOA and ward levels, the coefficients of Spearman’s correlation for vehicle crimes are not statistically significant. This is likely to be explained by the small number of MSOAs and wards under study (56 and 32, respectively). Generally speaking, our simulated dataset of synthetic crimes is a good indicator of crimes known to police, although the two datasets are not perfectly aligned. Our synthetic dataset of crimes may underestimate crimes known to police in areas with a large difference between workday and residential populations, but it appears to be a precise indicator of crimes known to police in residential areas. In the discussion section, we present some thoughts about how to address this in future research.
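For readers unfamiliar with the rank correlation used in Table 5, a minimal sketch follows. It computes Spearman's coefficient as the Pearson correlation of average ranks, which handles ties; it is a Python illustration, not the authors' R code, and the two lists of area-level counts are hypothetical. Global Moran's I is not covered here because it additionally requires a spatial weights matrix.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman's rank correlation as the Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical area-level counts for five areas (illustrative only).
simulated_known = [12, 30, 7, 45, 30]
recorded_by_police = [10, 25, 9, 60, 20]
rho = spearman(simulated_known, recorded_by_police)
```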
Assessing the results
In order to assess the extent to which the number of simulated crimes known to police varies from all simulated crimes at the different spatial scales, we calculate the absolute percentage relative difference (RD) and the percentage relative bias (RB) between these two values for each crime type in each area at four spatial scales.
First, RD is calculated for every area d in the specified level of geography (i.e., Geo = {OA, LSOA, MSOA, wards}), as follows:
$$ \mathrm{RD}_d^{\mathrm{Geo}} = \left| \frac{K_d - E_d}{E_d} \right| \times 100, $$
(5)
where \( E_d \) denotes the count of all crimes in area d, and \( K_d \) is the count of crimes known to police in the same area.
Second, RB is computed as follows:
$$ {\mathrm{RB}}_d^{\mathrm{Geo}}=\left(\frac{K_d}{E_d}-1\right)\times 100. $$
(6)
We evaluate the average RD and RB at the different spatial scales, but also their spread, to establish if the measures of dispersion across areas become larger when the geographic scale becomes smaller. This permits a demonstration not just of the mean differences between all crimes and crimes known to police at different spatial scales but also the variability in these differences, to help shed light on whether there is higher variability at fine-grained spatial scales. This is investigated via the standard deviation (SD), minimum, maximum, and mean of the RD and RB at the different scales. In addition, boxplots and maps are shown to visualize outputs.
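Equations 5 and 6 and the summary measures can be sketched as follows. The code is a Python illustration rather than the authors' R implementation, and the three areas and their counts are hypothetical. It assumes that areas with no simulated crimes (\( E_d = 0 \)), for which both measures are undefined, are excluded beforehand, and it uses the sample standard deviation, which the text does not specify.

```python
import statistics


def relative_difference(known, total):
    """Absolute percentage relative difference (Eq. 5)."""
    return abs((known - total) / total) * 100


def relative_bias(known, total):
    """Percentage relative bias (Eq. 6)."""
    return (known / total - 1) * 100


def summarise(values):
    """Mean, sample SD, minimum, and maximum across areas."""
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),  # assumption: sample (n - 1) SD
        "min": min(values),
        "max": max(values),
    }


# Hypothetical areas: (crimes known to police K_d, all crimes E_d).
areas = {"A": (40, 100), "B": (15, 20), "C": (9, 30)}

rd_values = [relative_difference(k, e) for k, e in areas.values()]
rb_values = [relative_bias(k, e) for k, e in areas.values()]
rd_summary = summarise(rd_values)
rb_summary = summarise(rb_values)
```

Since crimes known to police are a subset of all simulated crimes, \( K_d \le E_d \) in every area, so RB is never positive and RD equals its absolute value; the two measures therefore share the same spread across areas and differ only in sign.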