Crime, deterrence and punishment revisited

Despite an abundance of empirical evidence on crime spanning over 40 years, there exists no consensus on the impact of the criminal justice system on crime activity. We construct a new panel data set that contains all relevant variables prescribed by economic theory. Our identification strategy allows for a feedback relationship between crime and deterrence variables, and it controls for omitted variables and measurement error. We deviate from the majority of the literature in that we specify a dynamic model, which captures the essential feature of habit formation and persistence in aggregate behaviour. Our results show that the criminal justice system exerts a large influence on crime activity. Increasing the risk of apprehension and conviction is more influential in reducing crime than raising the expected severity of punishment.


Introduction
Crime, originating from the root of Latin cernō ('I decide, I give judgment'), is the behaviour judged by the State to be in violation of the prevailing norms that underpin the moral code of society. Where informal social controls are not sufficient to deter such behaviour, the State may intervene to punish or reform those responsible through the criminal justice system. The precise sanctions imposed depend on the type of crime and the prevailing cultural norms of the society. For offences deemed to be serious, criminal justice systems have historically imprisoned those responsible, in the hope that a combination of deterrence and incapacitation may lower the crime rate. According to an estimate, about 10 million people in the world are institutionalized for punishment, almost half of which are held in America, China and the UK (Walmsley 2009). Over the past 30 years, the American prison population has more than quadrupled. Raphael and Stoll (2009) have shown that the increase in the US prison population between 1984 and 2002 may be explained almost entirely by an increase in punitiveness, rather than an increase in crime. This has led some to label the extraordinary growth in the US prison population as one of the largest scale policy experiments of the century (Spelman 2000). Others have recently argued that significant reductions in the size of prison populations are possible nowadays without endangering public safety (Sundt et al. 2016).
The USA is not the only country to have experienced upward trends in its imprisonment rate over the last 30 years of course; nor is it the only country where the increase in imprisonment rates was driven by more punitive law and order policies rather than by an increase in crime. The Australian imprisonment rate, for example, has risen by 88% over the last 30 years. In the period covered by this article, it rose by more than 30%. Since 2000, this rise in imprisonment rates in Australia has occurred against a backdrop of falling crime (Weatherburn 2016).
How effective is the criminal justice system in deterring crime? To what extent do changes in the expected punishment influence the motivation of individuals to engage in illegal pursuits? How much wrongdoing does each additional prisoner avert? In order to address these questions in a constructive way, it is important to recognize that changes in the aggregate crime rate stem from individual behaviour. Policies such as increased sentence lengths may lower the crime rate through two possible channels: deterrence and incapacitation. It is well accepted in the literature that for a particular policy to be effective it cannot operate on incapacitation effects alone (Durlauf and Nagin 2011). In turn, for a policy to deter criminal behaviour it must be designed with an understanding of what causes individuals to engage in criminal activity.
During the early part of the twentieth century, most theories of crime tended to attribute criminal behaviour to defects in the individual or in society (Vold et al. 2002). The seminal papers by Becker (1968) and Ehrlich (1975) denied the existence of any qualitative difference between offenders and non-offenders and asserted that individuals engage in criminal activity whenever the expected benefit of doing so exceeds the expected cost. Therefore, criminals do not differ from the rest of society in their basic motivation but in their appraisal of benefits and costs. On this view, a rational criminal behaves in a calculated manner, considering the benefit of the illegal act together with the risk of apprehension and conviction as well as the likelihood and severity of potential punishment, which are a function of three separate stages of processing through the criminal justice system pertaining to the roles of police, courts and prison system, respectively. The idea of a rational criminal forges an important link with the deterrence hypothesis that underpins the criminal justice system-the notion that the crime rate can be reduced by raising the expected cost of criminal activity.
Since the seminal work of Becker (1968) and Ehrlich (1975), a large empirical literature has developed, seeking to inform public policy by collecting data on various populations and building econometric models that describe the criminal behaviour of individuals. The public concern about crime is well justified given the pernicious effects that it has on economic activity, as well as on the quality of one's life in terms of a reduced sense of personal and proprietary security. However, despite the rich history of econometric modelling spanning over 40 years, there is arguably no consensus on whether there is a strong deterrent effect of law enforcement policies on crime activity. Empirical studies provide mixed evidence that is insufficient to draw clear conclusions. 1
The present paper revisits the economics of crime and punishment and provides a case study for New South Wales (NSW), Australia. We focus on four individual crime categories, namely theft, robbery, assault and homicide incidents. These offences broadly span the classification of criminal activity often employed in the literature. In addition, we consider two broader crime categories, property and violent crime. In order to alleviate heterogeneity bias, which can potentially arise due to differences across individual crime types in terms of occurrence and level of seriousness (see e.g. Cherry and List 2002), we construct the broader categories as weighted sums of the aforementioned four individual crime categories.
Our empirical strategy relies on GMM estimation of dynamic panel data models, and it takes into account various important methodological issues arising in the empirical analysis of criminal behaviour. In particular, our identification strategy allows for endogeneity or weak exogeneity between crime and the deterrence variables. Due to the panel structure of our modelling framework, instruments naturally arise with respect to sufficiently lagged values of the regressors. The validity of the instruments is examined empirically using tests for weak identification and overidentifying restrictions. In addition, the dynamic specification of our model captures the essential feature of habit formation and costs of adjustment in aggregate behaviour. This is important because it permits distinguishing between the effect of law enforcement policies in the short and the long run, and deriving equilibrium conditions as well as other meaningful dynamic quantities such as mean lag length of the effects. 2
The results of our analysis show that criminal activity is highly responsive to the prospect of arrest and conviction, but much less responsive to the prospect or severity of imprisonment, if at all. This provides support to the idea that the consequences of being arrested and found guilty of a criminal offence include indirect sanctions imposed by society and not just the punishment meted out by the criminal justice system. In particular, a convicted individual may no longer enjoy the same opportunities in the labour market and so the cost of social stigmatization can already be substantial in the event of conviction.
1 See Table 1 for a highly selective overview of crime studies.
2 For a recent overview of the dynamic panel data literature, see Bun (2015).
The sensitivity of our results is analysed extensively. First, we examine different moment conditions, depending on whether the probability of arrest is treated as endogenous or (weakly) exogenous, and we test for the validity of each specification. Second, we apply the methodology of Griliches and Hausman (1986) in order to test for measurement error in the data. Third, we estimate the crime model using a range of estimators other than GMM. Finally, we examine the effect of omitted variables in our model. The conclusions of our analysis appear to be fairly robust.
The remainder of this paper is as follows. Section 2 reviews various methodological issues arising in the empirical analysis of criminal behaviour. Section 3 presents the econometric specification and the identification strategy employed. Section 4 discusses the data and reports the results. A final section concludes.

Methodological issues
Unfortunately, empirical analysis of the effect of law enforcement policies on criminal activity is inherently problematic due to the nature of crime data available. In particular, data collected from individuals are self-reported and are doubtlessly affected by significant measurement error (Freeman 1999). Moreover, the time and cost involved in surveying a representative population can be prohibitively large. As a result, empirical studies of crime typically use some form of aggregate data, which describe crime in locales (e.g. local areas, states or countries) and are based on official records rather than self-reported information.
However, aggregate data are also not without problems. This has led some to suggest that the use of individual and aggregate data may be regarded as two complementary approaches (Trumbull 1989).
To begin with, since the economic model of crime purports to describe the illegal behaviour of individual agents rather than an empirical aggregate, summing crime offences over individuals might inherently introduce some form of so-called aggregation bias.
Furthermore, the use of aggregate data introduces a problem of lack of exogeneity for some regressors, making the causal effect of law enforcement policies on crime more difficult to identify. For example, an exogenous upward shift in crime rate may eventually overwhelm police resources, given that police resources are fixed in the short term, causing the probability of arrest to decrease. This property is known in the econometrics jargon as reverse causality or simultaneity.
Even if no feedback relationship is present in the data, the empirical probability of arrest (when defined as the number of arrests divided by the number of crime offences) suffers from the fact that the numerator of the dependent variable (number of crime offences) is the denominator in the probability of arrest. This artificially induces a negative correlation between the two variables (Nagin 1978), a phenomenon that is known as ratio bias (see e.g. Dills et al. 2008).
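The mechanism can be illustrated with a small simulation. This is only a sketch under assumptions of our own, not the paper's data or method: the true arrest probability is drawn independently of true crime (so there is no deterrence effect by construction), and recorded offences equal true offences multiplied by a noise term. The function name and all parameter values are hypothetical.

```python
import math
import random


def simulate_ratio_bias(n=2000, seed=0):
    """Correlation between ln(crime rate) and ln(arrest probability).

    Assumed data-generating process (illustrative only):
      * true offences and the true arrest probability are independent;
      * recorded offences = true offences * exp(noise).
    """
    rng = random.Random(seed)
    ln_pop = math.log(50_000.0)  # hypothetical constant population
    x, y = [], []
    for _ in range(n):
        ln_true_crime = rng.gauss(5.0, 0.5)
        ln_true_prob = rng.gauss(-1.0, 0.2)
        noise = rng.gauss(0.0, 0.3)
        ln_recorded_crime = ln_true_crime + noise
        ln_arrests = ln_true_prob + ln_true_crime
        # Recorded crime enters the crime rate with a plus sign ...
        x.append(ln_recorded_crime - ln_pop)
        # ... and the arrest probability with a minus sign.
        y.append(ln_arrests - ln_recorded_crime)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    print(round(simulate_ratio_bias(), 3))
```

Under these assumptions the correlation is clearly negative (roughly -0.4 in population) even though the arrest probability has no causal effect on crime, because the same noisy count appears on both sides of the regression.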
An additional issue arising with aggregate data might be measurement error. Measurement error can manifest itself in at least two ways. Firstly, crime data typically record reported crime offences, rather than actual ones. For instance, it has sometimes been argued that it does not always serve the victim's best interest to report an offence (Myers 1980). This type of measurement error may particularly affect the crime rate and probability of arrest variables (Levitt 1998), as they are both constructed based on the number of crime offences. Secondly, because of timing issues, judiciary variables (courts and prisons) might follow a different timing than other variables affecting the (individual) propensity to commit a crime. As a result, (say) the empirical probability of imprisonment, which is the ratio of the number of imprisonments over convictions, could reflect crime offences that occurred at previous points in time. 3 To put this differently, if offenders have high discount rates (as it has been argued in, for example, Wilson and Hernstein 1985), the speed with which offenders are apprehended and punished is important. This issue can be particularly pronounced with relatively high-frequency data (e.g. monthly observations), but less so for yearly data. 4 Finally, there is the potential for omitted variable bias in the estimated parameters. Omitted variables imply a bias if they are correlated with included regressors. In particular, it is hardly ever the case that a complete model is specified that includes all deterrence variables prescribed by economic theory. This is likely to be due to lack of data or the fact that certain experimental designs intended to combat endogeneity preclude the possibility of examining all deterrence variables of interest. 
Whatever the appropriate explanation is, the evidence on crime deterrence has come to conform broadly to several distinct strands of research, in which the effects of the probability of arrest, the probability of conviction, the probability of imprisonment and the length of average sentence are rarely examined together.
The aforementioned issues, namely aggregation bias, reverse causality, ratio bias, measurement error and omitted variable bias, all render the deterrence explanatory variables endogenous, that is, correlated with the error term of the model. It is well known that in the presence of endogeneity, least-squares-based estimates of the economic model of crime are biased and inconsistent. Despite that, a majority of crime studies do not control for endogeneity, which casts doubt on their results (Blumstein et al. 1978). Dills et al. (2008) use aggregate data to demonstrate that raw correlations between crime rates and deterrence variables are frequently weak or even perverse due to the problem of reverse causality and note that any identification strategy would need to be powerful enough to partial out the effect of deterrence on the crime rate and provide a result consistent with economic theory. Table 1 summarizes the empirical results for some widely cited contributions to the crime deterrence literature using aggregate data. 5 For each of the studies noted, the table reports the sampling population, the unit of observation, the structure of the data followed by the sample size 6, the method used to estimate the model, the type of crime analysed and finally the actual results. Where an author provided multiple estimates, the median is reported. Clearly, there is a paucity of studies that estimate a fully specified economic model of crime, with notable exceptions being the papers by Wolpin (1978, 1980), Pyle (1984), Trumbull (1989), Cornwell and Trumbull (1994) and Spengler (2008, 2015). In most of these studies, least-squares-based methods are used to obtain estimates of the parameters; hence, all deterrence variables are treated as exogenous. Trumbull (1989) justifies this choice claiming that endogeneity is not a salient feature of the existing data set, based on the results of a Wu-Hausman specification test.
A few studies also apply IV regression and treat the probability of arrest as endogenous, but all remaining variables as exogenous (Cornwell and Trumbull 1994; Spengler 2008, 2015). The authors fail to find a statistically significant relationship between the deterrence variables and crime using a 2SLS procedure. 7 The remaining studies restrict their attention to a particular variable of interest. Failing to include all deterrence variables fosters a disconnect between economic theory and empirical analysis. In order for a criminal to be punished, the person must be arrested and found guilty first; omitting the probability of arrest and conviction clearly ignores a fundamental aspect of the criminal decision. For example, Mustard (2003) shows that arrest rates are likely to be negatively correlated with the probability of conviction and sentence length since arrest rates are often substitutes for conviction rates and sentences. As a result, Mustard (2003) concludes that previous estimates of the marginal effect of the probability of arrest may understate the true effect of the arrest rate by as much as 50%. Furthermore, omitted variables may invalidate estimation based on instrumental variables. Candidate instrumental variables may not be orthogonal to the deterrence variables omitted from the regression; hence, they do not constitute valid instruments.

Model
The dependent variable is the rate of crime, which is defined as the ratio of the number of crime offences committed in a given local government area (LGA) i at time t (labelled $crm_{it}$) over population ($pop_{it}$). The rate of crime is not the same as the binary 'crime-no crime' decision an individual faces, but it is arguably the closest substitute one can observe at the aggregate level.
The economic model of crime postulates that criminals are rational individuals who assess the risk of apprehension and conviction as well as the likelihood of punishment prior to committing an offence, and ultimately evaluate the expected benefit and cost associated with an illegal activity. Therefore, the crime rate is modelled as a function of the empirical probability of arrest, the empirical probability of conviction given arrest and the empirical probability of imprisonment given conviction. This leads to the following specification:

$$\ln\left(\frac{crm_{it}}{pop_{it}}\right) = \alpha \ln\left(\frac{crm_{it-1}}{pop_{it-1}}\right) + \beta_1 \ln\left(\frac{arr_{it}}{crm_{it}}\right) + \beta_2 \ln\left(\frac{conv_{it}}{arr_{it}}\right) + \beta_3 \ln\left(\frac{impr_{it}}{conv_{it}}\right) + \beta_4 \ln avsen_{it} + \beta_5 \ln income_{it} + \beta_6 \ln unemp_{it} + \eta_i + \lambda_t + \varepsilon_{it} \tag{1}$$

for t = 1, ..., T time periods and i = 1, ..., N regions, where ($crm_{it}$, $arr_{it}$, $conv_{it}$, $impr_{it}$) denote the number of crime offences, arrests, convictions and imprisonments, respectively. The inclusion of sentence length ($avsen_{it}$), income ($income_{it}$) and unemployment ($unemp_{it}$) in the above equation captures the expected cost/gains from the illegal and legal sectors. Precise definitions of all variables used in our regression analysis are provided in Table 2. Using short-hand notation, the model can be rewritten as:

$$\ln crmr_{it} = \alpha \ln crmr_{it-1} + \beta_1 \ln prbarr_{it} + \beta_2 \ln prbconv_{it} + \beta_3 \ln prbimpr_{it} + \beta_4 \ln avsen_{it} + \beta_5 \ln income_{it} + \beta_6 \ln unemp_{it} + \eta_i + \lambda_t + \varepsilon_{it} \tag{2}$$

The error term in (2) allows for regional-specific effects ($\eta_i$), which may be correlated with the regressors, as well as time effects ($\lambda_t$) that capture common variations in crime across regions; $\varepsilon_{it}$ is the idiosyncratic error. The coefficient of the lagged value of the dependent variable, $\alpha$, measures the combined effect of short-run dynamics and time-varying omitted regressors hidden in lagged crime rates. We also considered the possibility that criminals may form expectations about conviction rates in an adaptive manner, implying that lags of these variables should also be included on the right-hand side. We tested specifications including such lagged effects but they were largely insignificant.
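As a small illustration of how the ratio variables in the model are built from raw counts, the following sketch maps the counts for one LGA-year to the log variables. The function name, dictionary keys and example counts are hypothetical; only the ratio definitions come from the text.

```python
import math


def log_regressors(crm, arr, conv, impr, pop):
    """Map raw counts for one LGA-year to log crime rate and log probabilities.

    crm, arr, conv, impr: offences, arrests, convictions, imprisonments.
    pop: population. All counts must be strictly positive, since the
    logarithm of zero is undefined.
    """
    return {
        "ln_crmr": math.log(crm / pop),       # crime rate
        "ln_prbarr": math.log(arr / crm),     # P(arrest)
        "ln_prbconv": math.log(conv / arr),   # P(conviction | arrest)
        "ln_prbimpr": math.log(impr / conv),  # P(imprisonment | conviction)
    }


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    v = log_regressors(crm=1000, arr=200, conv=150, impr=30, pop=50_000)
    for name, value in v.items():
        print(name, round(value, 4))
```

Because each ratio's denominator is the numerator of the preceding one, the four log variables sum to the log of imprisonments per capita. This chained structure is also why noise in the offence count shows up with opposite signs in the crime rate and in the probability of arrest.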
Finally, it is worth mentioning that many of the models used in the literature (see e.g. Table 1) are restricted versions of (2). For example, many studies do include the probability of arrest, but exclude the probabilities of conviction and imprisonment and sentence lengths.

Identification strategy
We estimate model (2) by the generalized method of moments (GMM) developed originally by Hansen (1982) and adapted for estimation of dynamic panel data models by Arellano and Bond (1991), Arellano and Bover (1995) and Blundell and Bond (1998). The GMM approach has the advantage that, compared to maximum likelihood, it requires much weaker assumptions about the initial conditions of the data generating process and avoids full specification of the serial correlation and heteroskedasticity properties of the error, or indeed any other distributional assumptions. Moreover, GMM is a natural choice when multiple explanatory variables are endogenous. For the reasons discussed in Sect. 2, we treat the probability of arrest as an endogenous regressor. The lagged crime rate, which models the short-run dynamics, is an additional endogenous regressor.
To remove the region-specific effects, first differences are taken from the original model in levels (2), resulting in:

$$\Delta \ln crmr_{it} = \alpha \Delta \ln crmr_{it-1} + \beta_1 \Delta \ln prbarr_{it} + \beta_2 \Delta \ln prbconv_{it} + \beta_3 \Delta \ln prbimpr_{it} + \beta_4 \Delta \ln avsen_{it} + \beta_5 \Delta \ln income_{it} + \beta_6 \Delta \ln unemp_{it} + \Delta \lambda_t + \Delta \varepsilon_{it} \tag{3}$$

GMM estimation of the first-differenced model (3) has been developed by Arellano and Bond (1991). Since dynamic panels are often largely overidentified, an important practical issue is how many moment conditions to use. It is well documented that numerous instruments can overfit endogenous variables in finite samples, resulting in a trade-off between bias and efficiency. There is substantial theoretical work on the overfitting bias of GMM coefficient estimators in panel data models (Ziliak 1997; Alvarez and Arellano 2003; Bun and Kiviet 2006). Furthermore, with many moment conditions the power of (mis)specification tests deteriorates rapidly (Bowsher 2002). Roodman (2009) compares two popular approaches for limiting the number of instruments: (i) the use of (up to) certain lags instead of all available lags and (ii) combining instruments into smaller sets. Using asymptotic expansion techniques, Bun and Kiviet (2006) show that the order of magnitude of bias is reduced when going from all moment conditions to using only nearest lags as instruments. Furthermore, their simulation results show that when the number of time periods is in double digits, as with T = 13 in the current study, the GMM estimator using nearest lags as instruments has much less finite-sample bias than the GMM estimator using all moment conditions. Therefore, we follow the recommendation of using a limited number of moment conditions and only employ the three nearest lagged instruments. Furthermore, we collapse them, resulting in the following six moment conditions for the model in first differences:

$$E\left[\sum_{t} \ln crmr_{it-2-j}\, \Delta \varepsilon_{it}\right] = 0, \qquad E\left[\sum_{t} \ln prbarr_{it-2-j}\, \Delta \varepsilon_{it}\right] = 0$$

for j = 0, 1, 2.
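To make the idea of collapsed, lag-limited instruments concrete, the sketch below builds the instrument block for one region and one variable, and sums instrument times differenced residual over time. It is an illustration only, not the estimation code used in the paper: the function names are hypothetical, unavailable lags are filled with zeros as an assumed convention, and the first differenced equation is assumed to be the one dated t = 2 (zero-based indexing).

```python
def collapsed_instruments(levels, first_lag=2, n_lags=3, first_period=2):
    """Collapsed instrument block for one cross-sectional unit.

    levels: values of the instrumenting variable in levels, t = 0..T-1.
    Row t holds levels[t - first_lag - j] for j = 0..n_lags-1, one column
    per lag distance (rather than one column per period and lag).
    Lags that fall before the start of the sample are set to 0.0.
    """
    rows = []
    for t in range(first_period, len(levels)):
        row = []
        for j in range(n_lags):
            s = t - first_lag - j
            row.append(levels[s] if s >= 0 else 0.0)
        rows.append(row)
    return rows


def moment_contribution(instruments, diff_resid):
    """Sum over t of instrument times differenced residual, per column."""
    n_cols = len(instruments[0])
    return [
        sum(row[j] * e for row, e in zip(instruments, diff_resid))
        for j in range(n_cols)
    ]


if __name__ == "__main__":
    z = collapsed_instruments([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
    print(z)
    print(moment_contribution(z, [1.0, 1.0, 1.0, 1.0]))
```

Setting `first_lag=1` gives the analogous block for a regressor treated as weakly exogenous rather than endogenous. Averaging `moment_contribution` across regions yields the sample counterpart of one set of three moment conditions.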
We note that another advantage of collapsed instruments is that the underlying time-specific moment conditions do not need to hold exactly for each time period, but only in sum. Regarding the probability of arrest, we will also estimate a specification under weak exogeneity, in which case we make use of the analogous moment conditions based on values of $\ln prbarr_{it}$ lagged one or more periods.
Conviction and imprisonment rates, as well as the severity of punishment, are determined by the judiciary system. In practice, it is hard to believe that the judiciary system is strictly exogenous to crime rates. Similarly, it is natural to think that recent changes in economic conditions may exert some impact on current crime rates. As a result, we allow in estimation for a feedback relationship between past values of crime and current values of the remaining deterrence regressors (prbconv, prbimpr, avsen), as well as of economic conditions (income, unemp).
As briefly mentioned in Sect. 2, since the number of crime offences enters into both the numerator of $\ln crmr_{it}$ and the denominator of $\ln prbarr_{it}$ in Eq. (3), the model is subject to ratio bias. However, so long as the idiosyncratic error term of the model, $\varepsilon_{it}$, is serially uncorrelated, values of $\ln prbarr_{it}$ lagged two or more periods may serve as valid instruments to identify $\beta_1$. 8
It is well known that identification can be weak when the panel data are persistent or, more generally, when the correlation between endogenous regressors and instruments is close to zero. We therefore check the identification strength of the exploited moment conditions in various ways. First, we estimate pure autoregressive models for the endogenous regressors and check whether autoregressive dynamics are reasonably far away from the unit root. Second, we use the Kleibergen and Paap (2006) rank statistic to test for underidentification in the first-differenced and levels IV models.
To check the validity of the estimated specification, we report the p value of Hansen's (1982) J test of overidentifying restrictions and the p value of Arellano and Bond's (1991) test of serial correlation of the disturbances up to second order. The former is used to determine empirically the validity of the overidentifying restrictions in the GMM model. The latter is useful because the use of lagged values of the endogenous variables as instruments (in levels) requires that serial correlation in the idiosyncratic error term is only up to a certain order. 9 Long-run estimates are computed by dividing the short-run slope coefficients by 1 minus the estimated autoregressive parameter. Robust standard errors are reported in parentheses, which are valid under arbitrary forms of heteroskedasticity and serial correlation. Furthermore, we perform the correction proposed by Windmeijer (2005) for the finite-sample bias of the standard errors of the two-step GMM estimator. 10 The standard errors of the long-run estimated parameters are subsequently obtained using the Delta method.
8 That is, $\ln prbarr_{it-2}$ (and further lags) satisfies in this case the two conditions required for validity of an instrument, namely, it remains correlated with $\Delta \ln prbarr_{it}$ in (3) but it is uncorrelated with $\Delta \varepsilon_{it}$.
9 We note that instruments are more likely to be valid in panel data settings compared to time series or cross-sectional regressions because the multi-dimensionality of panel data allows one to capture richer sources of unobserved heterogeneity relative to time series and cross-sectional data alone.
10 All GMM results have been obtained using David Roodman's xtabond2 algorithm in Stata 15.
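The long-run calculation and its Delta-method standard error can be sketched as follows. The function name and the numerical inputs are hypothetical and are not estimates from this paper; the sketch assumes that the variances of the short-run slope and the autoregressive parameter, and their covariance, are available from the estimated covariance matrix.

```python
import math


def long_run_effect(beta, alpha, var_beta, var_alpha, cov_beta_alpha=0.0):
    """Long-run coefficient beta / (1 - alpha) with a Delta-method s.e.

    The gradient of g(beta, alpha) = beta / (1 - alpha) is
    (1 / (1 - alpha), beta / (1 - alpha)^2); the variance is the
    quadratic form of this gradient in the 2 x 2 covariance matrix.
    """
    if alpha == 1.0:
        raise ValueError("long-run effect is undefined when alpha equals 1")
    denom = 1.0 - alpha
    theta = beta / denom
    d_beta = 1.0 / denom
    d_alpha = beta / denom ** 2
    variance = (
        d_beta ** 2 * var_beta
        + d_alpha ** 2 * var_alpha
        + 2.0 * d_beta * d_alpha * cov_beta_alpha
    )
    return theta, math.sqrt(variance)


if __name__ == "__main__":
    # Hypothetical short-run elasticity and autoregressive parameter.
    theta, se = long_run_effect(beta=-0.3, alpha=0.4,
                                var_beta=0.01, var_alpha=0.0025)
    print(round(theta, 3), round(se, 3))
```

With these illustrative inputs, a short-run elasticity of -0.3 and persistence of 0.4 imply a long-run elasticity of -0.5. The closer the autoregressive parameter is to one, the larger both the long-run effect and its standard error become.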

Data
We construct a new data set containing information on criminal activity and deterrence for all N = 153 local government areas in New South Wales, each one observed over a period of T = 13 years from 1995/1996 to 2007/2008. The Australian Standard Geographic Classification (ASGC) defines the LGA as the lowest level of aggregation following the census Collection District (CD) and Statistical Local Area (SLA). 11 Thus, the LGA represents a low level of aggregation compared to standard practice in the literature, where regressions using city-, state- and country-level data are common.
LGAs in NSW range in size from over 350,000 people (3261.9 persons per km²) to a little over 2,000 people (i.e. less than one person per km²). The average LGA has a population of 58,438 persons (sd. = 77,585 persons) (Australian Bureau of Statistics 2017). Although LGAs include both urban and rural areas, most of the population of NSW can be found in urban rather than rural areas. The three cities of Sydney, Newcastle and Wollongong account for three quarters of the NSW population (NSW Department of Industry 2017).
Our data on crime are drawn from COPS, the NSW Police Operational Policing System, to which the NSW Bureau of Crime Statistics and Research has online access. This system records each crime incident reported to or discovered by police. A crime incident is defined as an activity detected by or reported to police which:
• Involved the same offender(s);
• Involved the same victim(s);
• Occurred at one location;
• Occurred during one uninterrupted period of time;
• Falls into one offence category;
• Falls into one incident type (e.g. 'actual', 'attempted', 'conspiracy').
The data are categorized according to the date of reporting to or detection by police, not by the date of occurrence of the offence. The deterrence variables (probabilities of arrest, conviction and imprisonment, as well as average non-parole period length) are drawn from a separate database on court appearances and outcomes maintained by the NSW Bureau of Crime Statistics and Research. This database contains information, inter alia, on the charge(s) laid against each offender, which charge(s) resulted in a conviction, whether a prison sentence was imposed, the length of the aggregate (total) sentence and the length of the non-parole period. The aggregate sentence is the maximum time the offender can be held in custody for the offence(s) he/she has committed. The non-parole period defines the minimum period the offender must spend in custody before being released on parole. Where the aggregate sentence is less than 6 months, no non-parole period can be specified. In this instance, the aggregate sentence defines the minimum period the offender must spend in custody. For the purposes of this analysis, we define the sentence length as the length of the non-parole period where one is specified and the length of the aggregate sentence where the aggregate sentence is less than 6 months. 12
It is worth noting that there are three levels to the NSW court system, namely the Supreme, District and Local Courts. However, more than 90% of criminal cases are finalized in a Local Court, where guilt or innocence is determined by a magistrate rather than by a judge or jury. In 2016, more than 90% of Local Court cases resulted in a conviction on at least one charge, in most cases because the defendant pleaded guilty. 13 The average time taken to finalize a Local Court matter in 2016 was 64 days. The only offences examined here that are exclusively dealt with by a higher court (i.e. the District or Supreme Court) are those involving robbery, homicide and some serious sexual offences. The majority of these cases also involve a guilty plea. The average time to finalize a guilty plea in the higher courts is around 14 months but the distribution is highly skewed, with the majority of guilty pleas being finalized in less than 12 months. In the vast majority of cases, then, only a short period elapses between arrest and sentence and most of those arrested will end up convicted.
Income and population data have been obtained from the Australian Bureau of Statistics (ABS) website, while the unemployment data have been purchased from the Small Area Labour Markets division of the Department of Education, Employment and Workplace Relations (DEEWR). The raw data for income and population are not readily comparable with the crime data because they are based on different ASGC standards, i.e. LGA boundaries are defined slightly differently by the NSW Bureau and the ABS. To this end, we mapped the data to a common ASGC standard (2006) using a series of concordance tables, in order to achieve consistency. Similarly, the unemployment data were first mapped to the same ASGC standard (2006) to account for name and boundary changes that occurred in the LGAs over the sample period. The resulting SLA data were then aggregated to the LGA level to be directly comparable to the other data.
We distinguish between two broad crime categories, i.e. property and violent crime. Property crime is defined as any incident of robbery without a weapon, robbery with a firearm, robbery with a weapon not a firearm, stealing property in a dwelling house, motor vehicle theft, stealing from motor vehicle, stealing from retail store, stealing from dwelling, stealing from person, stock theft, other theft and fraud. Violent crime is defined as any incident of homicide, non-domestic violence-related assault, domestic violence-related assault, robbery without a weapon, robbery with a firearm, robbery with a weapon not a firearm, sexual assault, indecent assault or act of indecency, or other sexual offence. 14 Since both property and violent crime comprise individual crime categories that differ considerably in occurrence and level of seriousness, adding them all up together using simple (unweighted) summation can potentially invalidate inferences. 15 For instance, a homicide incident occurs less often and can be far more severe compared to a robbery incident. Therefore, in what follows we analyse weighted sums of property and violent crime. The weights are computed in two different ways. Firstly, we compute variance weights based on the sample within standard deviation of the individual crime types. This set of weights reflects the fact that more severe types of crime occur less often. Secondly, we compute weights based on the 'average seriousness score per incident' reported in Heller and McEwen (1973) and Blumstein (1974). 16 The seriousness scores for each of the aforementioned crime categories are as follows: theft, 2.29; robbery, 6.43; assault, 9.74; homicide, 33.29. These scores have been adjusted such that in each of the two broad crime categories, they add up to unity. 17
In order to deal effectively with heterogeneity across different individual crime types, we also analyse at length each of the aforementioned four crime types separately, namely theft, robbery, assault and homicide. These four categories broadly span the classification of crime offences often used in the literature, see e.g. Table 1 in the present paper, as well as Table 1 in Cherry and List (2002).
The various deterrence variables (probabilities of arrest, conviction and imprisonment, as well as average prison length) are computed specifically for each type of crime analysed. This accommodates the expectation that, apart from having different values across crime types, these variables may potentially have a different deterrence effect across crime types.
Tables 3 and 4 report descriptive statistics for the various categories of crime considered in our analysis. 18 As expected, the mean value of the rate of violent crime, as well as that of its individual crime components, is smaller than that of property crime and it exhibits a much smaller dispersion as well. This indicates that violent crime occurs less frequently and is more localized. The empirical probability of arrest is higher on average for violent crime than for property crime, which reflects the fact that in many cases violent crime involves face-to-face contact, increasing the probability of apprehension.
14 It is apparent that robbery is both a property and a violent crime because it involves violence (or the threat of it) to unlawfully obtain property.
15 We are grateful to two anonymous referees for alerting us to this issue in the first place.
16 On the computation of these scores, the interested reader may refer to Tables 6 and 2 of the aforementioned papers, respectively, as well as the associated discussion.
17 In terms of property crime, weights based on the within standard deviation of individual crime types are theft = 0.950; robbery = 0.050, whereas weights based on the seriousness score are theft = 0.528; robbery = 0.472. In terms of violent crime, weights based on the within standard deviation of individual crime types are robbery = 0.272; assault = 0.718; homicide = 0.010, whereas weights based on the seriousness score are robbery = 0.099; assault = 0.387; homicide = 0.514.
18 We note that the results of the GMM estimated aggregate crime models are very similar irrespective of the method used to compute weights; see Tables 5 and 6. Therefore, unless otherwise stated, in what follows aggregate crime results correspond to variance weights computed based on the within standard deviation of individual crime types.
In regard to the category of homicides, the mean value of average sentence length is much larger than the value in the 90th percentile, which indicates that there are a relatively small number of very long sentences in the sample. Figures 1 and 2 show the development over time of property and violent crime as well as their corresponding arrest rates. Cross-sectional averages have been plotted; hence, the line graphs depict average development across LGAs. Figure 1 shows that property crime increases gradually in the 1990s, peaks around 2000 and then falls sharply afterwards. The property crime arrest rate gradually decreases until 2003 and then stabilizes. By contrast, as shown in Fig. 2, average violent crime increases steadily until 2002 and then exhibits a small downward trend. At the same time, the arrest rate declines until 2002 and follows an upward trend after that date. Figure 3 shows that economic conditions, i.e. per capita income and unemployment, improved steadily over the whole sample period.
To get an idea of the time-series persistence in our data, we estimate pure autoregressive models for the crime rate and the probability of arrest, i.e. the endogenous regressors in our empirical analysis. System GMM estimates of the autoregressive coefficients are in the range 0.2-0.6, which indicates only moderate persistence. Not surprisingly, panel unit root tests (Harris and Tzavalis 1999; Im et al. 2003) reject the null hypothesis of a unit root for both the crime rate and the probability of arrest, and for both property and violent crime.

Baseline results
We analyse the weighted sums of property crime and violent crime, based on the econometric model presented in the previous section. First-differenced GMM estimates allowing for endogeneity of lagged crime and the probability of arrest are reported in Tables 5 and 6. We treat all remaining explanatory variables as weakly exogenous. In Table 5, aggregate crime series have been constructed using the within standard deviation of individual crime types, whereas in Table 6 aggregate crime series have been constructed using weights computed based on the average seriousness score per incident. The results reported in Tables 5 and 6 are remarkably similar even if the corresponding weights take rather different values. Therefore, we only discuss the results in Table 5 based on variance weights. To begin with, the p values from the reported overidentifying restrictions test show no evidence of lack of validity of the estimated specification. Furthermore, the Kleibergen and Paap (2006) rank test indicates no underidentification in either equation as the null hypothesis is soundly rejected at the 5% level of significance.
The GMM estimates of the deterrence effects are of the expected sign with the exception of the coefficient of average sentence, which is largely insignificant. For property crime, a 1% increase in the probability of arrest appears to decrease the expected value of the crime rate by 0.283% in the short run and 0.365% in the long run, ceteris paribus. Likewise, the elasticity of the probability of conviction is about −0.148 and −0.191 in the short and long run, respectively. The fact that the estimated elasticities are larger in the long run is to be expected, since typically one needs time to adjust fully to changes in law enforcement policies, due to habitual behaviour, imperfect knowledge and uncertainty. In particular, the value of the autoregressive parameter indicates that it takes about 2 years for 90% of the total impact of either one of the explanatory variables on crime to be realized, all else being constant. The estimated coefficients of the probability of imprisonment are much smaller than the coefficients of the probability of arrest and the probability of conviction, whereas the effect of average sentence is insignificant as mentioned above. This indicates that imprisoning more criminals, or imprisoning them for longer, is not as effective as increasing the risk of apprehension or conviction once arrested. In other words, criminal activity seems to be highly responsive to the prospect of arrest and conviction, but less responsive to the prospect or severity of imprisonment. This provides support to the idea that the consequences of being arrested and found guilty of a criminal offence include the indirect sanctions imposed by society and not just the punishment meted out by the criminal justice system.
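The relation between the short-run and long-run elasticities quoted above can be checked with the standard partial-adjustment formula, long run = short run / (1 − ρ), where ρ is the coefficient on lagged crime. The sketch below is illustrative only: the value of ρ is backed out from the reported elasticities rather than taken from the tables, and the function names are hypothetical.

```python
def implied_persistence(short_run, long_run):
    """Back out rho from long_run = short_run / (1 - rho).

    Assumption: a single lagged dependent variable with coefficient rho.
    """
    return 1.0 - short_run / long_run


def long_run_effect(short_run, rho):
    """Long-run elasticity implied by a short-run elasticity and rho."""
    return short_run / (1.0 - rho)


def periods_to_reach(share, rho):
    """Smallest k such that the share 1 - rho**k of the long-run effect is realized.

    k = 1 is the impact period. Requires 0 <= rho < 1 and 0 < share < 1.
    """
    if not (0.0 <= rho < 1.0 and 0.0 < share < 1.0):
        raise ValueError("need 0 <= rho < 1 and 0 < share < 1")
    k = 1
    while 1.0 - rho ** k < share:
        k += 1
    return k


# Elasticities for property crime as quoted in the text.
rho_arrest = implied_persistence(-0.283, -0.365)
rho_conviction = implied_persistence(-0.148, -0.191)
years_to_90_percent = periods_to_reach(0.90, rho_arrest)

print(round(rho_arrest, 3), round(rho_conviction, 3), years_to_90_percent)
```

Both pairs of elasticities imply nearly the same persistence parameter, which is what one would expect if they come from the same equation, and the implied adjustment speed is in line with the statement that about 2 years are needed for 90% of the total effect to materialize.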
A convicted individual may no longer enjoy the same opportunities in the labour market or the same treatment by their peers, and so the opportunity cost of lost income and the cost to the individual of social stigmatization are implied in the event of conviction. Zimring and Hawkins (1973, p. 174) argue: 'Official actions can set off societal reactions that may provide potential offenders with more reason to avoid conviction than the officially imposed unpleasantness of punishment.'
The results suggest that the lost social standing resulting from a conviction may well outweigh the effects of a prison sentence, let alone a fine or community service order. Table 7 presents results for the four disaggregated crime categories. The conclusions are qualitatively similar to those of the weighted aggregates. In particular, the estimated deterrence effects are of the expected sign, with the exception of average sentence, the effect of which is largely statistically insignificant. Moreover, the coefficients of the risk of apprehension and conviction remain much larger than those of the probability of imprisonment and average sentence. Hence, increasing the risk of apprehension or conviction once arrested appears to be much more effective than the practice of imprisoning more criminals, or imprisoning them for longer. There are two notable differences in the results obtained between the disaggregated and aggregated crime offences. Firstly, the standard errors of the estimated coefficients obtained from the disaggregated models are generally larger. This reflects the fact that the sample size for individual crime categories is smaller because there are considerably more zero-crime occurrences in this case. 19 Secondly, arguably for the same reason, the p value of the rank test statistics indicates a potential weak instruments problem for the robbery and homicide equations, and less so for theft. This shows that analysing weighted aggregates of individual crime categories is quite appealing in this respect.

Sensitivity analysis
As discussed previously, the arrest probability is often seen as an endogenous regressor in the empirical crime literature. Hence, we have estimated our baseline crime model allowing for endogeneity of the probability of arrest. However, efficiency gains in the coefficient estimates may arise by imposing weak exogeneity on this main deterrence regressor. As such, we have re-estimated the model treating the probability of arrest as weakly exogenous. In this case, we add (5) into the set of moment conditions employed by the GMM estimator. The results are reported in Table 8. There are two main differences between Table 5 and Table 8. Firstly, in the latter case the standard errors of the estimated coefficients are much smaller, often less than half as large. That is, imposing weak exogeneity substantially improves the efficiency of the estimated coefficients. Secondly, the p value of the overidentification test statistic is now smaller in general, although it remains larger than 0.05 in most cases. Furthermore, compared to Table 7 the p value of the rank test falls dramatically, such that this time a weak instruments problem is potentially an issue only for the homicide equation.
To further test the exogeneity of the probability of arrest and conviction, we apply the empirical test of Griliches and Hausman (1986) to detect the presence of measurement error. The idea is that long differences 20, as opposed to first differences, are less vulnerable to measurement error. Therefore, in the absence of measurement error the OLS estimator of the arrest/conviction rate elasticity in the differenced crime model should not show any systematic pattern across the different difference lengths. Levitt (1998) applied this test to investigate the extent of measurement error and ratio bias in the relationship between crime and arrest rates and found no significant measurement error. Table 9 reports findings for the Australian crime data. To save space, we only report results [...] The results corroborate the findings of Levitt (1998) in that there is little evidence of measurement error and ratio bias in the probability of arrest and the probability of conviction. We took second, third and fourth long differences, and the arrest/conviction rate elasticity changes little across specifications.
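The logic of the long-difference comparison can be illustrated on simulated data. The sketch below is a simplified static version, not the specification estimated in the paper: it assumes a persistent regressor (a random walk), classical measurement error that is independent over time, and a true slope of −0.3, and it ignores ratio bias. All names and parameter values are hypothetical.

```python
import numpy as np


def long_difference_slopes(y, x, lengths=(1, 2, 3, 4)):
    """OLS slope of the k-period difference of y on that of x, for each k.

    y and x are arrays of shape (n_units, n_periods). Differencing removes
    any time-invariant unit effect.
    """
    slopes = {}
    for k in lengths:
        dy = (y[:, k:] - y[:, :-k]).ravel()
        dx = (x[:, k:] - x[:, :-k]).ravel()
        dy = dy - dy.mean()
        dx = dx - dx.mean()
        slopes[k] = float(dx @ dy / (dx @ dx))
    return slopes


def simulate(n_units, n_periods, beta, error_sd, seed):
    """Simulate y and a regressor observed with measurement error of size error_sd."""
    rng = np.random.default_rng(seed)
    x_true = np.cumsum(rng.normal(size=(n_units, n_periods)), axis=1)
    unit_effect = rng.normal(size=(n_units, 1))
    y = unit_effect + beta * x_true + 0.1 * rng.normal(size=(n_units, n_periods))
    x_observed = x_true + error_sd * rng.normal(size=(n_units, n_periods))
    return y, x_observed


beta = -0.3
y_clean, x_clean = simulate(500, 13, beta, error_sd=0.0, seed=0)
y_noisy, x_noisy = simulate(500, 13, beta, error_sd=1.0, seed=0)

clean = long_difference_slopes(y_clean, x_clean)
noisy = long_difference_slopes(y_noisy, x_noisy)

for k in clean:
    print(k, round(clean[k], 3), round(noisy[k], 3))
```

Without measurement error the slope is stable across difference lengths, whereas with measurement error the first-difference slope is attenuated the most and the estimates move towards the true value as the difference length grows. A flat pattern, as reported for the arrest and conviction rate elasticities, is therefore what one would expect when measurement error is negligible.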
Next, we analyse the sensitivity of our results to omitted deterrence variables. Mustard (2003) shows how excluding conviction rates and sentence length from the model leads to omitted variables bias. In particular, due to the negative correlation between these regressors and the probability of arrest, the true effect of arrest rates on crime may be underestimated. Table 10 reports results from specifications including only the probability of arrest as a deterrence variable, which is treated as endogenous.
The pattern of the estimates corroborates the findings of Mustard (2003), i.e. omitting other relevant deterrence variables lowers the arrest rate elasticity considerably in all cases. As expected, the overidentification test statistic suggests that the moment conditions used in GMM estimation are invalidated. 21 The omitted deterrence variables are serially correlated and also correlated with the regressors (see Table 4).
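The direction of this bias follows from the textbook omitted-variable formula. The following is a stylized static sketch that abstracts from the dynamics and the instrumenting used in the estimation; the symbols $c$ (crime), $p_A$ (probability of arrest) and $z$ (an omitted deterrence variable) are introduced here purely for illustration.

```latex
\begin{align*}
\text{Full model:}\quad
  & c = \beta_1\, p_A + \beta_2\, z + u,
  \qquad \beta_1 < 0,\ \beta_2 < 0, \\
\text{Auxiliary relation:}\quad
  & z = \delta\, p_A + v,
  \qquad \delta = \frac{\operatorname{Cov}(p_A, z)}{\operatorname{Var}(p_A)} < 0, \\
\text{Regression of } c \text{ on } p_A \text{ alone:}\quad
  & \operatorname{plim}\, \hat{b}_1 = \beta_1 + \beta_2\, \delta .
\end{align*}
```

Under these sign assumptions $\beta_2 \delta > 0$, so $\hat{b}_1$ lies closer to zero than $\beta_1$ and the arrest rate elasticity is understated in absolute value when the other deterrence variables are left out.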
Imposing (weak) exogeneity of the arrest probability implies that one can also use alternative inference methods. In particular, the standard least squares dummy variables (LSDV) estimator becomes a meaningful choice when T is large enough, and so does the mean group (MG) estimator proposed by Pesaran and Smith (1995), which allows for slope parameter heterogeneity across different LGAs. Here, T = 13 is in double digits; hence, we may apply such methods with some confidence. In addition to the aforementioned estimators, imposing weak exogeneity of the arrest probability implies that a seemingly unrelated regressions (SUR) estimator could also be a viable alternative, especially in situations where T is thought to be large enough. In the present case, SUR presents an alternative approach for estimating aggregated crime models since it effectively weighs observations according to their (co-)variance (see e.g. Cherry and List 2002). Tables 11, 12, 13 and 14 report results for the pooled OLS (POLS), LSDV, SUR and MG estimators, respectively. It is evident that not accounting for region-specific effects (Table 11) leads to severe underestimation of the effect of the judicial system on crime, whereas the autoregressive coefficient is biased upwards, as expected. The estimated coefficients obtained from LSDV and MG (Tables 12 and 14) are quite similar to each other and are, overall, plausible in sign and magnitude. Since the pattern of the MG-based estimates is mostly in line with the LSDV estimates, this suggests that the assumption of common parameters across regions is not so restrictive in our sample. The estimated coefficients obtained from SUR for the aggregated crime categories are statistically similar to those obtained from GMM (see Tables 5 and 6), with the main exception perhaps being the coefficient of the lagged dependent variable for violent crime, which appears to be somewhat smaller. The standard errors for all three estimators suggest much higher precision relative to GMM.
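To make the contrast between pooled OLS, LSDV and MG concrete, the following is a minimal sketch on simulated data. It uses a static model with a single exogenous regressor, so it abstracts from the lagged dependent variable and the other regressors in the estimated crime equations. The data-generating process (region effects positively correlated with the regressor, mildly heterogeneous slopes centred on −0.3) and all function names are assumptions made for illustration.

```python
import numpy as np


def pooled_ols_slope(y, x):
    """Pooled OLS with a common intercept, ignoring region-specific effects."""
    yc = y - y.mean()
    xc = x - x.mean()
    return float((xc * yc).sum() / (xc * xc).sum())


def lsdv_slope(y, x):
    """LSDV (within) estimator: demean by region, then fit a common slope."""
    yd = y - y.mean(axis=1, keepdims=True)
    xd = x - x.mean(axis=1, keepdims=True)
    return float((xd * yd).sum() / (xd * xd).sum())


def mean_group_slope(y, x):
    """Mean group estimator: region-by-region OLS slopes, then a simple average.

    Returns the average slope and its standard error across regions.
    """
    yd = y - y.mean(axis=1, keepdims=True)
    xd = x - x.mean(axis=1, keepdims=True)
    unit_slopes = (xd * yd).sum(axis=1) / (xd * xd).sum(axis=1)
    std_error = unit_slopes.std(ddof=1) / np.sqrt(len(unit_slopes))
    return float(unit_slopes.mean()), float(std_error)


def simulate_panel(n_units=150, n_periods=13, mean_slope=-0.3, seed=1):
    """Panel with region effects correlated with x and heterogeneous slopes."""
    rng = np.random.default_rng(seed)
    unit_level = rng.normal(size=(n_units, 1))
    x = unit_level + rng.normal(size=(n_units, n_periods))
    alpha = 0.5 * unit_level  # region effect correlated with the regressor
    slopes = mean_slope + 0.05 * rng.normal(size=(n_units, 1))
    y = alpha + slopes * x + 0.2 * rng.normal(size=(n_units, n_periods))
    return y, x


y, x = simulate_panel()
pooled = pooled_ols_slope(y, x)
lsdv = lsdv_slope(y, x)
mg, mg_se = mean_group_slope(y, x)

print(round(pooled, 3), round(lsdv, 3), round(mg, 3), round(mg_se, 4))
```

In this setup pooled OLS is pulled towards zero because the region effects are positively correlated with the regressor, while LSDV and MG both recover a slope close to the average of −0.3. Similar LSDV and MG estimates, as in Tables 12 and 14, are what one would expect when slope heterogeneity across regions is modest.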

Concluding remarks
We estimate an econometric model for crime using a new panel data set containing information on illegal activity and deterrence variables for local government areas in New South Wales, Australia. We take into account various endogeneity concerns expressed previously in the literature. Our findings suggest that the criminal justice system can potentially exert a large impact on crime.
Our results show that increasing the risk of apprehension and conviction has a much larger effect in reducing crime than raising the expected severity of punishment. This may have significant policy implications. For example, if it were estimated that the cost of keeping a prisoner incarcerated for a year was roughly equivalent to the cost of making a single additional arrest, then one could justify a redirection of resources from prisons to policing. This implies that imprisoning more criminals, or imprisoning them for longer, is not optimal from a policy perspective, assuming that the costs of these activities are of similar magnitude.
In our analysis, we address the impact of feedback between crime and deterrence, and we control for measurement error and omitted variables. The resulting dynamic panel data model of crime is estimated by GMM. We show that the detrimental effects of measurement error and ratio bias are largely absent in our data. In general, we do not find overwhelming evidence for endogeneity of arrest rates. Furthermore, we show the necessity of including all relevant deterrence variables from the judiciary system to avoid underestimation of the effect of law enforcement policies.
Our conclusion that the deterrent effect of prison is rather limited will be regarded by some as controversial, but it is entirely consistent with recent research on prison downsizing strategies that have been implemented over the last few years in the USA. For example, in the first year following the passage of California's Public Safety Realignment Act in 2011 (an Act designed to reduce the California prison population), the State's prison population fell by approximately 27,400, a roughly 17% decline relative to the population in 2010. With the exception of a slight and transient increase in auto theft, studies have found little evidence that the reduction in prisoner numbers produced an increase in crime (Sundt et al. 2016; Lofstrom and Raphael 2016). The extensive literature on the specific deterrent effect of prison on re-offending also reveals largely negative findings (Nagin et al. 2009).
There are, nonetheless, several issues that remain to be explored. As Nagin (2013) points out, the conclusion that risk of apprehension exerts a more significant effect than punishment severity is largely derived from research examining the risk of apprehension by police. Little has been done to examine whether the risk of conviction, given arrest, also has a deterrent effect, or to study punishment celerity, that is, the effect that the speed of sanctions has on the risk of re-offending (Chalfin and McCrary 2017). We know comparatively little about the kinds of police activities (and other forms of intervention) that exert the strongest influence on the perceived risk of apprehension. Finally, much work remains to be done in assessing the costs and benefits of various criminal justice options for reducing crime.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.