Introduction

In criminology, the impact of interventions and sanctions by the formal social control agencies, police and criminal courts, is assumed to have two directions: They may prevent offenders from further offending, or they may reinforce subsequent delinquency.

The idea that penal sanctions should have a crime-preventive impact is mainly a heritage of the Age of Enlightenment. In order to restrict the excessive retributive punishments of feudal regimes, two concepts evolved. First, the concept of treatment: if offenders are treated through a strict working regime, in lieu of corporal or capital punishment, they will become honest and will be rehabilitated (Howard, 1777).Footnote 1 Second, the concept of deterrence: philosophers like Beccaria (1986 [1764]) or Bentham (1988 [1776]) proposed that a humane penal harm (which excluded corporal or capital punishment) should be set just high enough to deter the offender from further offending, through a sanction proportionate to the offense-induced harm (see Bruinsma, 2018).

It took some time until these preventive ideas arrived in the law books and in penal practice. In England and the USA, preventive programs had already become relevant in the nineteenth century, while in Germany (with the exception of juvenile penal law),Footnote 2 they became influential only in the late 1960s, two decades after the Nazi regime had been defeated.

Today, rehabilitative treatment and deterrence form, next to retribution (understood as offense-proportionate and thereby restricted and just punishment), the basic legitimation of modern criminal law. The preventive turn also resulted in a further innovation in terms of modern rationality: from then on, the effectiveness of the criminal justice system became an object of empirical investigation. Next to the black letter lawyer, the social and behavioral scientist entered the stage of penal sciences.

However, against the backdrop of empirical observations, penal sanctions appeared to be much less promising in preventing further offending than the rather optimistic modern reformers had expected (for a review, see Sherman et al., 1998). This was apparently one reason for the broad attention given to the alternative theoretical perspective of labeling, which assumes that penal interventions do not prevent but reinforce or even initiate delinquent behavior.

With the methodological progress of panel studies in developmental and life-course criminology, scholars received the appropriate tools to analyze the causal impact of penal sanctions using quasi-experimental designs. Nevertheless, these sophisticated studies did not produce clear support for unidimensional preventive or promoting effects of penal sanctions either. Rather, many estimated effects were statistically insignificant (Barrick, 2014; Huizinga & Henry, 2008; Kleck & Sever, 2017).

Today, however, the different theories on sanctioning effects mainly assume a mediated causal process (see Bernburg, 2019; Paternoster, 2018; Krohn et al., 2014): Penal sanctions may lead to an at most moderate increase in subjectively perceived detection and sanctioning risks (deterrence); or may support programs which combine the (re-)construction of social bonds with the promotion of cognitive agency (rehabilitative treatment); or may disturb prosocial structural resources and support a delinquent self-concept (labeling). Empirical results appear to support each of these competing assumptions about a mediated impact to some degree, lending somewhat more evidence to delinquency-promoting than to delinquency-preventing mechanisms (Bernburg, 2019; Huizinga & Henry, 2008; Paternoster, 2018). Due to limited space, we will not report on the impact of formal controls on mediating factors, which were analyzed separately in Kaiser (2022; see also discussion section).

Furthermore, most studies on the impact of formal controls were limited by using official court and police data as proxy for subsequent delinquency (Barrick, 2014; Kleck & Sever, 2017). However, official crime data result from a mixture of delinquent behavior and the reactions of criminal justice agents and thus do not present a pure measure of juvenile behavioral change. Indeed, some US studies show that the exclusive reliance on official data may be problematic. According to their results, formal controls may increase the risk of further formal controls, independent of changes in delinquent behavior (called “secondary sanctioning” by Liberman et al., 2014).

The current study uses adolescent data from two panel studies that have been conducted in England and Germany from 2002 onwards: the Peterborough Adolescent and Young Adult Development Study (PADS+) and the study Crime in the modern City (CrimoC), carried out in Duisburg. The goal of this analysis is to investigate whether the differential impact on official and self-reported data found in a few US studies can also be seen in European countries. To this end, the current study explores the impact of formal control interventions on both (i) subsequent delinquent behavior and (ii) the risk of subsequent formal controls in a quasi-experimental design (propensity score matching).

After discussing the theoretical framework and reporting on the data and the analytical method, we analyze the impact of formal control interventions during adolescence.

Theoretical Framework and Previous Research

In this study, we will investigate whether a criminal justice intervention is associated with changes in young people’s future offending. We will also explore whether a criminal justice intervention amplifies the risk for a future criminal justice intervention—independent of changes in delinquent behavior (Fig. 1).

Fig. 1 Potential impact of criminal justice interventions on future offending and future interventions

There are two major theories of why criminal justice interventions may affect people’s future offending: deterrence theory and labeling theory. In the literature studying the association between criminal justice interventions and future offending, it is common to assume (and sometimes conclude) that increases in future offending are indicative of a labeling process and that decreases in future offending are indicative of a deterrence process.Footnote 3 However, establishing whether young people’s future offending is amplified or reduced (or unaffected) by a criminal justice intervention does not in itself answer the question why this happens: “neither increases nor decreases in levels of delinquent involvement following the imposition of sanctions provides unequivocal evidence for either the labeling or deterrence paradigms” (Thomas & Bishop, 1984, p. 1229).Footnote 4 However, one could argue that if there are no strong changes (increases or decreases) in future offending, there is no evidence of (no room for) strong unidirectional deterrence or labeling influences.

In line with this, a review of research shows that the most frequent finding is the absence of a statistically significant association between a criminal justice intervention and future offending, although among the statistically significant results, increased offending is found somewhat more often than decreased offending (Barrick, 2014; Kleck & Sever, 2017).Footnote 5 This result does not lend much support to the existence of a strong universal unidirectional labeling or deterrent effect (especially since statistical significance in these studies typically does not equal strong effects; ibid.; Huizinga & Henry, 2008; Pratt & Turanovic, 2018).

Different from the impact on offending behavior is an institutional effect: formal control interventions appear to increase subsequent control interventions independent of the level of delinquent behavior (Fig. 1, outcome b). This impact of formal control interventions on further interventions has—although not often investigated—been found to be quite strong.

More precisely, having been arrested (compared to not having been arrested) tripled the risk of a re-arrest in adolescence or adulthood, and this effect was markedly stronger than the impact on subsequent self-reported delinquency (Beardslee et al., 2019; Klein, 1986; Liberman et al., 2014; Lopes et al., 2012). Surprisingly, this effect was by and large only slightly stronger for a more formal intervention (e.g., court petition) than for less intrusive informal case processing (e.g., police diversion; Beardslee et al., 2019; Klein, 1986). That this is an institutional effect and not an individual reaction to the application of a formal label (secondary deviance) was already noticed by Klein (1986): “[L]abelers are somehow responding to their own prior decisions” (p. 63). Liberman et al. (2014), explicitly deliberating on this phenomenon, call it more precisely a “secondary sanctioning” effect, which should be considered in its own right, separately from the “secondary deviance” effects outlined in the early labeling literature (see Lemert, 1951; Becker, 1963): “The effects of secondary deviance and secondary sanctioning are essentially independent” (p. 363).

Hypotheses

According to the current state of research, first, findings on the impact of formal controls on subsequent delinquent behavior are quite mixed, and the impact is overall not strong (see Barrick, 2014; Huizinga & Henry, 2008; Motz et al., 2020). There is somewhat more support for delinquency-promoting than for delinquency-preventing effects, while many findings are insignificant. Second, regarding an institutional impact, formal controls may increase the risk of subsequent controls (see Liberman et al., 2014).

Our study explores whether the finding that formal controls have a different impact on official re-contact and on subsequent self-reported delinquency can also be observed in European countries. So far, only studies with US data analyzed both official and self-reported outcome data simultaneously (Beardslee et al., 2019; Klein, 1986; Liberman et al., 2014). Their results indicate that the institutional effect of formal controls is (much) larger than the effect on (self-reported) delinquent behavior. These studies should serve as a warning not to use official police or court data as proxies of delinquent behavior, as was done in most previous sanctioning research (see Barrick, 2014; Kleck & Sever, 2017). Their findings imply that formal controls seem to trigger processes that change the risk of re-contact with the criminal justice system beyond changes in delinquent behavior. However, it is unclear whether these processes also operate in other—less punitive—jurisdictions than the USA. To explore whether this might be the case, the current study is, to our knowledge, the first to analyze behavioral and institutional effects simultaneously in a non-US setting. It does so by testing the following hypotheses using data from two European countries:

Hypothesis 1: Formal controls are more likely to increase rather than decrease later delinquency.

Hypothesis 2: Formal controls increase the risk of subsequent formal controls.

Formal Control Effects in Peterborough and Duisburg

Samples

Our analyses are based on data from the Peterborough Adolescent and Young Adult Development Study (PADS+; Wikström et al., 2012, 2023) and the Crime in the modern City study (CrimoC; Boers et al., 2010). Both are panel studies that started data collection with 13-year-old school students in Peterborough and Duisburg. Participants were asked to complete standardized questionnaires. In addition, researchers collected the participants’ police and court records.

The target population of PADS+ covered all 11-year-old school students who lived in Peterborough and entered year 7 in 2002. After sampling randomly, 710 juveniles (approximately one-third of the population) finally participated in the first wave in 2004. In the follow-up waves—that were conducted annually until age 17 and in 2- and 3-year intervals thereafter—PADS+ achieved retention rates of more than 95% (707 in wave 2, 703 in waves 3 and 4, and 693 in wave 5). Police National Computer (PNC) records were collected for 700 students.Footnote 6

In CrimoC, researchers tried to survey all 7th-graders in Duisburg in 2002. Out of 56 schools, 40 (71%) agreed to participate, resulting in 3411 completed questionnaires at wave one (approximately two-thirds of all 7th-graders). The follow-up waves were conducted annually until age 20 and then biennially until age 30. The difference in design resulted in somewhat more unit-non-responses in CrimoC compared to PADS+, although participation was also high (3392 in wave 2, 3339 in wave 3, 3405 in wave 4, and 4548 in wave 5).Footnote 7 Official records were available for 2964 respondents (87%).Footnote 8

To establish proper time order for causal inference, three time periods were defined (see Liberman et al., 2014; Wiley & Esbensen, 2016): pretreatment (T1; covariates), treatment (T2, i.e., official contact), and post-treatment (T3; outcomes: self-reported delinquency and official contact). Table 1 shows how the PADS+ and CrimoC data were aligned with these three periods.

Table 1 Time order

In order to be included in the final analyses, participants from both studies had to meet two conditions: (1) participation in waves 3, 4, and 5, and (2) availability of their official records. All in all, 690 juveniles in PADS+ (97% of 710) and 2117 in CrimoC (62% of 3405)Footnote 9 met these criteria. In CrimoC, the resulting sample contains somewhat fewer “high-risk youth” than the original sample.Footnote 10

Measures

Our measurement descriptions follow the division into the three (quasi-)experimental time-periods: treatment, outcomes, and covariates (see Table 2 and Supp. material S2a, S2b for descriptive statistics).

Table 2 Descriptive statistics

The treatment variable is official control, a binary variable distinguishing between those with “no official contact” (= 0) and those with “official contact” (= 1). In PADS+, 37 of 690 (5.4%) and in CrimoC 88 of 2117 (4.2%) juveniles had been officially registered for an offense within T2. This low share of juveniles with a system contact is in line with previous literature on the risk of police contact (Kaiser et al., 2022a; Lochner, 2007; Wikström et al., 2012). In both samples, official intervention was generally not very intensive. Usually, juveniles were diverted out of the system or received some form of educational measure (see Table 3).

Table 3 Reactions of the juvenile justice system in PADS+ (English sample), CrimoC (German sample)

Outcome variables are self-reported delinquency (SRD) indices and official contact measures (PADS+: PNC; CrimoC: BZR, ER). The pool of SRD indicators consists of nine (PADS+) or 13 (CrimoC) offenses, respectively, committed in the last year (PADS+) or since the start of the last year (CrimoC). Although the number of offenses varies between PADS+ and CrimoC, they cover the same categories of delinquent behavior. On the one hand, SRD indicators were used to calculate prevalence rates of general, violent, and property offenses as well as vandalism.Footnote 11 On the other hand, in order to measure offending intensity, versatility scores were computed (with a maximum of 9 or 13 different offenses in PADS+ or CrimoC, respectively).Footnote 12 In addition, official control (0 = no contact; 1 = contact) within T3 was also considered as an outcome variable in order to analyze effects of “secondary sanctioning” (Liberman et al., 2014).
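The two kinds of SRD outcomes can be illustrated with a minimal Python sketch. The item names and the three respondents below are hypothetical and only serve the illustration; the actual instruments comprise nine (PADS+) or 13 (CrimoC) offenses, and details of the index construction given in the footnotes are not reproduced here.

```python
from typing import Dict, List


def versatility_score(offenses: Dict[str, bool]) -> int:
    """Number of different offense types reported at least once."""
    return sum(1 for committed in offenses.values() if committed)


def prevalence(sample: List[Dict[str, bool]], items: List[str]) -> float:
    """Share of respondents reporting at least one of the listed offense types."""
    if not sample:
        raise ValueError("sample must not be empty")
    offenders = sum(1 for person in sample if any(person[i] for i in items))
    return offenders / len(sample)


# Hypothetical three-item instrument and three respondents (illustration only).
respondents = [
    {"shoplifting": True, "assault": False, "graffiti": True},
    {"shoplifting": False, "assault": False, "graffiti": False},
    {"shoplifting": False, "assault": True, "graffiti": False},
]

scores = [versatility_score(r) for r in respondents]
general = prevalence(respondents, ["shoplifting", "assault", "graffiti"])
violent = prevalence(respondents, ["assault"])
```

In this toy sample, the versatility scores are 2, 0, and 1, the general prevalence is two-thirds, and the violence prevalence is one-third.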

Covariates

For each study site, the selection of more than 50 covariates was guided by theoretical considerations and empirical evidence. Consequently, they cover a wide range of variables known to be related to offending or an official contact: deviant and delinquent behavior, previous formal controls, individual characteristics, peer, family and school bonding, parental education, neighborhood, and demographics. Including multiple indicators is regarded as a promising strategy to tackle selection bias threats effectively (Steiner et al., 2010). SRD and official control measures in T1 are also included as covariates because matching on them assures that the treatment and the control group are balanced on the focal variables of the current study at baseline.Footnote 13

Analytical Procedure

Methodologically, the crucial point in analyzing formal control interventions is to avoid selection bias: to make sure that post-intervention differences between an intervention and a control group are due to the intervention only, the two groups should not differ on other characteristics (so-called covariates), ideally following the ceteris paribus rule. This can best be achieved by an experimental research design based on random assignment to the two groups. However, for legal reasons, police, prosecutors, or judges are not allowed to decide randomly whether or not to intervene in delinquent behavior. Therefore, formal control interventions can typically only be investigated within a quasi-experimental setting. Here, one tries to minimize selection bias by controlling statistically for confounding covariates (see Morgan & Winship, 2015; Shadish et al., 2002). It used to be common practice to rely on multiple regression analysis to address threats of selection bias (see Nagin et al., 2009). After it turned out, however, that multiple regression does not control sufficiently for confounding effects of covariates (Smith & Paternoster, 1990), propensity score matching (PSM) has been applied as a more appropriate tool for accounting for selection effects (McAra & McVie, 2007; Morris & Piquero, 2013; Ward et al., 2014; Wiley & Esbensen, 2016; Wiley et al., 2013).

To explore how a contact with the juvenile justice system affects the risk of reoffending and further official contact, we use PSM to estimate the average treatment effect on the treated (ATT) as our causal estimate of interest. Derived from the potential outcome model (see Morgan & Winship, 2015; Rubin, 1974), the ATT is computed in the following way:

$$\mathrm{ATT}=\mathrm{E}[\updelta \left|\:\mathrm{Tr}=1\right.]=\mathrm{E}[{\mathrm{Y}}_{\mathrm{i}}^{1}-{\mathrm{Y}}_{\mathrm{i}}^{0} |\mathrm{Tr}=1]$$

The ATT refers to officially treated individuals only ($\mathrm{Tr}=1$). It is defined as the average ($\mathrm{E}[\,]$)Footnote 14 difference ($\mathrm{Y}_{\mathrm{i}}^{1}-\mathrm{Y}_{\mathrm{i}}^{0}=\updelta$) between their observed reoffending ($\mathrm{Y}_{\mathrm{i}}^{1}$) and “their” hypothetical reoffending, i.e., under the assumption that they would not have been treated ($\mathrm{Y}_{\mathrm{i}}^{0}$). In reality, a treated individual experienced only the treatment condition (official contact) and not the control condition (no contact). Hence, only one ($\mathrm{Y}_{\mathrm{i}}^{1}$) of the two potential outcomes ($\mathrm{Y}_{\mathrm{i}}^{1}$, $\mathrm{Y}_{\mathrm{i}}^{0}$) can be observed. Consequently, causal effects cannot be computed from the observed values of the treated individuals alone. This missingness of the counterfactual outcome value ($\mathrm{Y}_{\mathrm{i}}^{0}$ as the value not realized in reality) is termed the “fundamental problem of causal inference” (Holland, 1986).
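The definition can be made concrete with a small Python sketch. It uses invented toy data in which both potential outcomes of every person are known, which is never the case in real data; all values and names are assumptions made purely for illustration.

```python
from statistics import mean

# Hypothetical potential outcomes for six people (1 = reoffends, 0 = does not).
# y1: outcome with official contact, y0: outcome without, tr: condition received.
people = [
    {"tr": 1, "y1": 1, "y0": 1},
    {"tr": 1, "y1": 1, "y0": 0},
    {"tr": 1, "y1": 0, "y0": 0},
    {"tr": 0, "y1": 0, "y0": 0},
    {"tr": 0, "y1": 1, "y0": 0},
    {"tr": 0, "y1": 0, "y0": 0},
]


def true_att(rows):
    """E[Y1 - Y0 | Tr = 1]; computable only because the toy data reveal both outcomes."""
    return mean(r["y1"] - r["y0"] for r in rows if r["tr"] == 1)


def observed(row):
    """What real data contain: the potential outcome of the received condition only."""
    return row["y1"] if row["tr"] == 1 else row["y0"]


def naive_difference(rows):
    """Observed treated mean minus observed control mean, without any adjustment."""
    treated = mean(observed(r) for r in rows if r["tr"] == 1)
    control = mean(observed(r) for r in rows if r["tr"] == 0)
    return treated - control


att = true_att(people)
naive = naive_difference(people)
```

In this toy example the true ATT is one-third, while the unadjusted group comparison yields two-thirds, because the treated would have reoffended more often than the controls even without a contact. This gap is the selection bias that the matching procedure is meant to reduce.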

To overcome this problem and estimate ATTs, we applied PSM. Matching (including weighting) procedures generally mimic a randomized experiment by balancing the treatment and control group on an array of covariates selected for matching (Morgan & Winship, 2015; Stuart, 2010). They do so by finding and matching control units that are equal (exact matching) or at least most similar to treated units on all selected pretreatment measures. Individuals from the control group who are too dissimilar to the treated individuals are excluded from the analyses. The included individuals from the control group are finally used to infer the counterfactual outcome value, allowing for an ATT estimation. Unlike a randomized experiment, however, matching does not automatically balance unobserved characteristics of treated and untreated individuals. Furthermore, classical matching procedures were based on exact matching, i.e., finding individuals for the control and treatment group with exactly the same covariate values. However, the higher the number of covariates, the less likely it is that such a requirement can be met (“curse of dimensionality,” Apel & Sweeten, 2010, p. 544). To address this problem, Rosenbaum and Rubin (1983) introduced the so-called propensity score. It refers to the probability that an individual receives the treatment, given his or her observed pretreatment covariates. For this study, the propensity score describes the probability that a juvenile was officially recorded for an offense in T2. A great advantage of this single score is that matching on it (i.e., finding individuals with most similar propensity scores among treated and untreated respondents) may be sufficient to balance the treatment and control group on all pretreatment covariates (Kainz et al., 2017).
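The matching logic can be sketched in a few lines of Python. The sketch assumes that propensity scores have already been estimated and uses greedy nearest-neighbor matching with a caliper, which is a deliberately simple stand-in and not the optimal matching or weighting procedures reported for the two samples. The function names, the caliper of 0.1, and all data are hypothetical.

```python
from typing import Dict, List


def greedy_match(treated: Dict[str, float], controls: Dict[str, float],
                 ratio: int = 1, caliper: float = 0.1) -> Dict[str, List[str]]:
    """Greedy 1:ratio nearest-neighbor matching without replacement.

    Both arguments map unit ids to propensity scores. A control whose score
    differs from a treated unit's score by more than `caliper` is never
    matched to that unit.
    """
    available = dict(controls)
    matches: Dict[str, List[str]] = {}
    # Start with the highest treated scores, which are usually hardest to match.
    for t_id, t_score in sorted(treated.items(), key=lambda kv: -kv[1]):
        matches[t_id] = []
        for _ in range(ratio):
            candidates = [(abs(c_score - t_score), c_id)
                          for c_id, c_score in available.items()
                          if abs(c_score - t_score) <= caliper]
            if not candidates:
                break
            _, best = min(candidates)
            matches[t_id].append(best)
            del available[best]  # without replacement
    return matches


def att_from_matches(matches: Dict[str, List[str]],
                     outcomes: Dict[str, float]) -> float:
    """Mean over treated units of own outcome minus mean outcome of matched controls."""
    diffs = [outcomes[t] - sum(outcomes[c] for c in cs) / len(cs)
             for t, cs in matches.items() if cs]
    if not diffs:
        raise ValueError("no treated unit could be matched")
    return sum(diffs) / len(diffs)


# Hypothetical scores and binary outcomes (illustration only).
treated = {"t1": 0.62, "t2": 0.35}
controls = {"c1": 0.60, "c2": 0.33, "c3": 0.05, "c4": 0.40}
outcomes = {"t1": 1, "t2": 0, "c1": 0, "c2": 0, "c3": 0, "c4": 1}

pairs = greedy_match(treated, controls, ratio=1, caliper=0.1)
estimate = att_from_matches(pairs, outcomes)
```

Here `t1` is paired with `c1` and `t2` with `c2`, whereas `c3` is too dissimilar to any treated unit and drops out of the analysis, mirroring the exclusion of dissimilar control cases described above.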

Our matching procedure followed four steps (Stuart, 2010): First, we estimated propensity scores for each PADS+ and CrimoC sample member with the help of three different estimation procedures: Bayesian logistic regression (BLR; McElreath, 2016), Bayesian Additive Regression Trees (BART; Chipman et al., 2010), and the covariate balancing propensity score (CBPS; Imai & Ratkovic, 2014).Footnote 15 Second, these three propensity scores were applied in 12 different matching (or weighting) algorithms to find the combination that leads to the best distributional balance of all covariates between the treatment and control groups.Footnote 16 The application of different combinations of propensity score and matching algorithms is recommended to ensure that selection threats induced by pretreatment differences in observed covariates are minimized (e.g., Kainz et al., 2017; Morgan & Winship, 2015). Third, we selected the best PSM procedure for each sample by assessing the covariate balance achieved by the different method combinations using recommended balance statistics (Kainz et al., 2017; see section Covariate Balance Assessment). Fourth, we applied regression models (R’s Zelig package, Imai et al., 2008) to the best-matched samples to estimate ATTs and simulate their uncertainty. While binary SRD prevalence and official contact outcomes were modeled by logistic regression, SRD versatility indices were predicted by Poisson models.Footnote 17

Because the investigated samples suffered from item-non-response, all analytical steps were applied to multiply imputed data sets. Multiple imputation accounts for the estimation uncertainty that arises from missing information in the data set (van Buuren, 2018). It generates multiple data sets by making multiple predictions for the missing values using observed information from other variables. As recommended by Penning de Vries and Groenwold (2017), we conducted matching, the generation of weights, and also the outcome analyses for each imputed data set. The imputed information was finally combined by pooling the ATT simulations of all imputed data sets.Footnote 18
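The final combination step can be sketched as follows in Python. The sketch assumes that each imputed data set has already produced a vector of simulated ATT values; the draws below are generated artificially, and the numbers of imputations and simulations, the helper names, and the 89% level (taken from the intervals reported in the results) are illustrative assumptions rather than a description of the authors' implementation.

```python
import random


def percentile_interval(draws, level=0.89):
    """Equal-tailed interval from simulation draws (nearest-rank percentiles)."""
    ordered = sorted(draws)
    n = len(ordered)
    tail = (1.0 - level) / 2.0
    lower = ordered[round(tail * (n - 1))]
    upper = ordered[round((1.0 - tail) * (n - 1))]
    return lower, upper


def pool_simulations(draws_per_imputation, level=0.89):
    """Stack the ATT draws of all imputed data sets and summarize the pooled sample."""
    pooled = [d for draws in draws_per_imputation for d in draws]
    point = sum(pooled) / len(pooled)
    return point, percentile_interval(pooled, level)


# Artificial draws: 5 imputed data sets with 1000 simulated ATTs each.
rng = random.Random(1)
draws_per_imputation = [
    [rng.gauss(center, 0.04) for _ in range(1000)]
    for center in (0.01, 0.03, 0.02, 0.04, 0.02)
]

point, (low, high) = pool_simulations(draws_per_imputation)
```

Because the pooled sample mixes draws from all imputed data sets, the resulting interval reflects both the simulation uncertainty within each data set and the variation between imputations.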

In addition, we also conducted robustness analyses to check how sensitive the ATTs were in relation to different missing data, propensity score estimation, matching, and outcome modeling procedures (Young & Holsteen, 2016). We restricted our sensitivity checks to those propensity score and matching procedure combinations that were relatively successful in establishing covariate balance between treated and untreated individuals.

Results

In this section, we first report how the best-working matching methods balanced the covariate distributions before presenting the ATT estimates and robustness checks.

Covariate Balance Assessment

In the following, we assess the covariate balance of the best-balancing matching procedures using standardized bias (SB) and variance ratio (VR) statistics (Kainz et al., 2017). SB is the difference in covariate means between the treated and untreated group divided by the standard deviation of the treated group. VRs, in contrast, inform about the variance differences in continuous covariates across the treated and untreated groups. SB values larger than 0.1 and VRs larger than 2 or smaller than 0.5 indicate imbalance (Harder et al., 2010; Kainz et al., 2017).Footnote 19
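Both balance statistics are easy to compute, as the following Python sketch shows. The direction of the variance ratio (treated over untreated) and the use of sample variances are assumptions of this sketch, and the covariate values are invented.

```python
from statistics import mean, stdev, variance


def standardized_bias(treated, control):
    """Difference in means divided by the standard deviation of the treated group."""
    return (mean(treated) - mean(control)) / stdev(treated)


def variance_ratio(treated, control):
    """Sample variance of the treated group over that of the untreated group."""
    return variance(treated) / variance(control)


def is_imbalanced(sb, vr=None, sb_max=0.1, vr_low=0.5, vr_high=2.0):
    """Apply the thresholds named in the text; pass vr=None for non-continuous covariates."""
    if abs(sb) > sb_max:
        return True
    return vr is not None and (vr > vr_high or vr < vr_low)


# Invented values of one continuous covariate (illustration only).
treated = [2.0, 4.0, 6.0, 8.0]
control = [1.0, 2.0, 3.0, 4.0, 5.0]

sb = standardized_bias(treated, control)   # about 0.77
vr = variance_ratio(treated, control)      # about 2.67
flag = is_imbalanced(sb, vr)               # True on both criteria
```

For matched or weighted samples, the same statistics would be computed with the matched controls or with weighted means and variances, which this sketch does not cover.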

PADS+ 

In PADS+, treated individuals differed from untreated ones on an array of pretreatment characteristics. The majority of covariates (44 of 57) were imbalanced before matching, showing SB statistics larger than 0.1; for 34 covariates, the bias was larger than 0.2 (see Table 4 for a selection of focal covariates).Footnote 20 Across all covariates, the average absolute SB difference was 0.18 (median: 0.12) and the maximum bias was 1.02. In addition, the average of the VRs of the 19 continuous covariates was 1.75 (median: 1.49). Only 3 of the 20 continuous covariates exceeded the VR threshold of 2, including the SRD versatility index (2.98).

Table 4 Covariate balance statistics for PADS+ 

For PADS+, optimal matchingFootnote 21 with a ratio of 1:3 without replacement on the linear propensity score estimated via BART resulted in the best covariate balance. This procedure led to adjusted groups of 37 treated and 111 control cases. For this adjusted sample, mean differences and VRs of the covariates declined strongly. The mean and median of the SB statistics decreased to 0.05, and only 16 covariates exceeded the threshold of 0.1 (only one variable had a bias larger than 0.2). VRs declined to 1.47 on average (median: 1.28), and three covariates had a ratio larger than 2. Judged by the most stringent thresholds, the remaining imbalances indicate that treated individuals in the adjusted sample still showed a slightly different delinquency pattern, were slightly more involved with the legal system (antisocial behavior order, ASBO; youth offending teams, YOT), perceived the risk of consequences when caught as somewhat lower, and reported fewer deviant peers, a less supportive family environment, and more informal social control in their neighborhood (see Supp. material S3). Overall, however, the matching procedure decreased the likelihood that differences in pretreatment characteristics confound the ATT estimates.

CrimoC

In CrimoC’s unadjusted sample, covariate differences between treated and untreated individuals were much less pronounced, though still notable. The mean of the SB statistics across covariates was already quite low (0.07; median: 0.04), and only 28 of the 57 covariates had a bias greater than 0.1 (only 8 covariates exceeded a threshold of 0.2); the maximum standardized mean difference was 0.35 (see Table 5 for a selection of focal covariates).Footnote 22 With few exceptions, VRs were within the acceptable range.

Table 5 Covariate balance statistics for CrimoC

Weighting by the oddsFootnote 23 on the covariate balancing propensity score (CBPS) resulted in the best-balanced distribution of covariates across the treatment and control group. After weighting, the CrimoC sample included an adjusted number of 205.8 control and 88 treated units. For this adjusted sample, imbalances in covariates disappeared completely. SB statistics of all variables were below 0.1. Mean and median bias were essentially zero (< 0.01). Additionally, VRs of the 23 continuous variables were all below a value of 2, and their mean (1.19; median: 1.10) was low, too. For CrimoC, we can therefore assume that our weighting procedure very likely prevents selection bias due to observed covariates.
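Weighting by the odds can be sketched in Python as follows. In the standard formulation, treated units keep a weight of 1 and untreated units receive the odds e / (1 − e) of their propensity score e, so that the weighted control group resembles the treated group. The effective sample size shown at the end uses Kish's approximation; whether the adjusted number of control units reported above was computed in this way is an assumption of this sketch, and all data are invented.

```python
def odds_weights(treatment, propensity):
    """ATT weights: 1 for treated units, e / (1 - e) for untreated units."""
    weights = []
    for tr, e in zip(treatment, propensity):
        if not 0.0 < e < 1.0:
            raise ValueError("propensity scores must lie strictly between 0 and 1")
        weights.append(1.0 if tr == 1 else e / (1.0 - e))
    return weights


def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def weighted_att(treatment, outcome, weights):
    """Treated mean minus odds-weighted mean of the untreated."""
    t = [(y, w) for tr, y, w in zip(treatment, outcome, weights) if tr == 1]
    c = [(y, w) for tr, y, w in zip(treatment, outcome, weights) if tr == 0]
    return (weighted_mean(*zip(*t)) - weighted_mean(*zip(*c)))


def effective_sample_size(weights):
    """Kish's approximation: (sum of weights)^2 / sum of squared weights."""
    return sum(weights) ** 2 / sum(w * w for w in weights)


# Invented data: two treated and three untreated juveniles.
treatment = [1, 1, 0, 0, 0]
propensity = [0.5, 0.8, 0.5, 0.2, 0.2]
outcome = [1, 1, 1, 0, 0]

w = odds_weights(treatment, propensity)
att = weighted_att(treatment, outcome, w)
ess_controls = effective_sample_size(
    [wi for wi, tr in zip(w, treatment) if tr == 0]
)
```

In this example, the untreated juvenile with a propensity score of 0.5 counts fully, while the two with a score of 0.2 count only a quarter each, which is why a weighted control group can have a non-integer size.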

Average Treatment Effects on the Treated

ATT estimates for the Peterborough and Duisburg samples tell a quite similar story. Most estimates are statistically insignificant, suggesting that a contact with the juvenile justice system had at best weak effects on the prevalence and versatility of reoffending (for prevalence ATT estimates, see black points and lines and gray shaded area in Fig. 2). According to the ATT point estimates, the prevalence of reoffending would typically have changed by less than 5 percentage points (pp.) if offenders with a system contact had instead had no contact (see section Analytical Procedure for a definition of the ATT).

Fig. 2 Average treatment effects on the treated (ATTs), prevalence rates, Peterborough (PADS+), and Duisburg (CrimoC)

For example, among PADS+ juveniles, an official contact decreased the prevalence of committing vandalism in T3 on average by about 2 pp. (ATT = −1.8 pp. [89%-CIFootnote 24: −9.3 pp., 4.8 pp.]), whereas the reduction was estimated to be about 3 pp. (ATT = −2.9 pp. [−11.6 pp., 3.4 pp.]) among the Duisburg youths. The probability of property offending decreased slightly but insignificantly in the German sample (ATT = −6.7 pp. [−16.9 pp., 1.9 pp.]), whereas the effect of a system contact on property offending was estimated to be close to null in the English sample (ATT = 0.6 pp. [−6.7 pp., 7.0 pp.]). The likelihood of violent and general offending was somewhat, but again insignificantly, increased due to a system contact in both samples (PADS+, violence: ATT = 4.1 pp. [−3.7 pp., 11.3 pp.]; PADS+, general: ATT = 3.7 pp. [−4.1 pp., 12.1 pp.]; CrimoC, violence: ATT = 4.9 pp. [−3.1 pp., 9.4 pp.]; CrimoC, general: ATT = 1.7 pp. [−9.5 pp., 10.9 pp.]). The versatility of offending, finally, was barely affected by an intervention of the juvenile justice system. The insignificant ATT estimates indicate that an official contact probably had negligible or only relatively weak effects on the offending variety of adolescents in Peterborough (ATT = 0.04 [−0.24, 0.28]) and Duisburg (ATT = −0.09 [−0.38, 0.08]).

Despite these at best rather weak control effects on subsequent delinquency, the ATT estimates suggest that an official contact substantially increased the prevalence of a renewed system contact in the follow-up year. While in PADS+, the prevalence of a repeated contact rose by some 23 pp. (ATT = 22.7 pp. [16.4 pp., 27.6 pp.]) due to a prior official contact, the increase was still about 15 pp. in CrimoC (ATT = 15.2 pp. [8.6 pp., 18.9 pp.]).

Sensitivity of ATT Estimates to Modeling Approach

To compute the ATTs, we applied not only the reported methods (that best balanced the covariates) but tried several different method combinations (varying in the imputation, propensity score, matching, and/or regression procedure). Among these combinations, only those were selected for ATT robustness checks that balanced the covariate distributions well. For each outcome and each of these 36 (PADS+) or 60 (CrimoC) “candidate” method combinations, we computed ATT point estimates. The distribution of all point estimates was then plotted in density plots (see dotted lines in Fig. 2). Overall, the density plots suggest that the ATT estimates are relatively robust to changes in the analytical procedure. However, ATT estimates are somewhat more model sensitive in the English than the German sample, probably because of PADS+’s smaller sample size and stronger imbalance before matching. This is especially true for the general and violent offending prevalences as well as for the SRD versatility index. For these three outcome variables, most alternative method combinations produced ATT estimates that indicated somewhat more substantial (and in some cases significant) system contact effects than those reported above.Footnote 25

Discussion and Conclusion

Do criminal justice interventions promote or prevent young offenders’ future offending? And do criminal justice interventions promote future formal interventions? These are the main questions addressed in this research. Although it is commonly assumed that increases in young people’s offending after criminal justice contacts are evidence of some form of labeling and that decreases in their offending after such contacts are evidence of deterrent effects, the interpretation of these relationships is clearly not as simple as that (see Theoretical Framework and Previous Research).

What is studied here are the short-term effects of (previous year) criminal justice interventions (mainly diversion measures like cautions or community work; some convictions) on future (next year) offending and criminal justice interventions, controlling through propensity score matching for selected key background factors (including the previous frequency of delinquent behavior). Most initial criminal justice contacts are first-time criminal justice interventions. The study does not explore (and therefore does not exclude) whether repeated official criminal justice contacts (or the extent of such contacts) tend to gradually promote or prevent an offender’s future offending. In sum, the study led to three key results:

  1. The findings do not support any stronger effect of criminal justice contacts on future (next year) offending and, hence, do not support any consistent unidirectional labeling (amplification) or preventive (deterrent or treatment) effect by criminal justice contacts on the future level of young people’s offending.

  2. The findings support an increased likelihood of future police contacts for those who already have had a (past) police contact.

  3. The findings are remarkably similar in the studied UK and German cities of Peterborough and Duisburg.

The fact that there is no consistent unidirectional association between a criminal justice contact and future offending (finding 1) does not exclude the possibility that this finding may mask the existence of deterrent, treatment, and labeling effects canceling each other out (i.e., for some people, criminal justice contacts may promote, and for others, reduce their future offending). What the findings indicate, though, is that there is no evidence of (or room for) any strong consistent unidirectional impact of at least deterrence or labeling on the participants’ future offending. If there are any effects of criminal justice interventions on future offending among our study populations, they must be differential and, if so, may depend on things such as individual differences in how people react to a specific intervention, for example, based on their personality, their experience of previous criminal justice contacts, or the content of the intervention itself and its social context. Exploring any potential duality of effects (i.e., the existence of both labeling and deterrent effects) and, if so, what determines which effect appears for whom in what context (see Sherman, 1993; Cullen & Jonson, 2014) should be a priority for future studies into the effects of criminal justice interventions.

Fortunately, two previous studies with the data at hand already provide some insights into these questions. In the first, Kaiser (2022) examined whether official contact affected some mediating factors proposed by deterrence and labeling theory (personal morals, deviant peer associations, risk perceptions). Overall, no (in the German study) or at best weak (in the English study) effects on these (intermediate) factors could be found, indicating that criminal justice interventions may trigger only weak crime-relevant processes and providing little support for stronger labeling and deterrent effects canceling each other out. In the second study, Kaiser et al. (2022b) found that the impact of formal controls differs depending on offenders’ personal morals, suggesting that the effects of criminal justice interventions may indeed be differential (for self-control as moderating factor, see also Schulz, 2014; Thomas et al., 2013).

The fact that a criminal justice contact has no impact on self-reported offending but increases the risk of a future criminal justice contact (finding 2) is highly interesting. This may also be consequential for the interpretation of research findings in this area of study. It is similar to the finding of Liberman et al., who found a “considerably larger effect on arrest than on SRO [self-reported offending]” (2014, p. 363; see also Beardslee et al., 2019). One possible explanation is that those already known to the police are more likely to be apprehended for future crimes because they are on the radar of the police (see Beardslee et al., 2019). Liberman et al. call the process that leads to an increased probability of being arrested after having been arrested in the past “secondary sanctioning.” They speculate that this may be due to “increased scrutiny of the individual’s future behavior, by police as well as other actors such as teachers and school staff, as well as from reduced tolerance by police and other actors of an arrestee’s future transgressions” (2014, p. 363). “Being on the radar” of the police could also explain why the “secondary sanctioning” effects in the current study seem to be somewhat larger in the English than in the German sample. In the study period, the English police were mandated to search actively for juvenile offending, while the German police mainly reacted to reported crimes (Bateman, 2015; Eisenberg & Kölbel, 2017; Morgan & Newburn, 2012).

Based on the presumption that control interventions are usually initiated by a specific delinquent behavior, one can conclude from our findings that such secondary control effects are auto-dynamic: The posterior event, the second control intervention, is generated by an essentially identical anterior event, the first control intervention. Such an institutional-decision-on-institutional-decision impact is different from a causal institutional-decision-on-individual-behavior effect as originally stated in labeling theory. The latter is a causal (and not auto-dynamic) effect because here the posterior event (the individual’s delinquent behavior) is generated by an essentially different (i.e., extrinsic; see footnote 26) anterior event (the control intervention). Overall, such an auto-dynamic effect might best be understood in the light of an assumption of self-reference as suggested in systems theory (see Luhmann, 1995): A social control system reproduces itself by referring to its own prior control decisions, filed in the institutional memory of police and court registers.

However, since most juvenile crimes are of a less grave nature and unlikely to engage investigative resources of the police, an increased detection-by-investigation risk may be a less plausible reason (with the exception of drug-related and traffic crimes, for example, police activity is rarely a main source of detection of crime and identification of offenders). Another possible explanation is that there is some unmeasured qualitative difference in the general seriousness of the crimes committed between those who already have an official contact and other offenders (i.e., those apprehended and processed by the police may generally commit more serious crimes). Our data do not differentiate crimes of the same kind by their seriousness. For example, some assaults could involve quite minor harms, while others could involve more severe injuries and, therefore, be taken more seriously by victims, bystanders (witnesses), and the police, increasing the risk that the crime is reported and that identified offenders are formally processed. Crimes that become known to the police are overwhelmingly reported by the general public, as is the identification of possible suspects.

The fact that the findings are almost identical in the studied English and German cities (finding 3), and that they tally well with other research in Western countries, indicates that the results may reflect a more general phenomenon: criminal justice interventions appear to have a smaller effect on future offending than on future criminal justice contacts (Beardslee et al., 2019; Klein, 1986; Liberman et al., 2014; Lopes et al., 2012).

A limitation of our analysis is that we cannot formally test the cross-national differences in the effects of justice contacts. This is because the measures of self-reported delinquency were not initially developed for comparison and thus differ too much between the English and German samples to construct a joint data set and analyze them within a single analysis (see Kaiser et al., 2018). Consequently, our results are affected by two sources of unobserved heterogeneity: first, differences in the measures of delinquent behavior and, second, differences in the experiences of English and German offenders due to (unmodeled or unobserved) differences in the juvenile justice systems (such as being treated differently by police). Against the background of this heterogeneity, the similarity of results across both samples may be seen as even more remarkable and may imply that our findings are quite robust.

Another shortcoming of our study, as is true for all research that cannot rely on random assignment within an experimental design, is that it may be biased by selection effects. Individual differences (e.g., in criminal propensity) may explain both the official contact and subsequent offending (or re-contact). Not accounting for such confounding factors may bias treatment effect estimates. Applying propensity score matching, we tried to counteract confounding by balancing groups of treated and untreated individuals in terms of previously observed characteristics. Although it mimics a randomized experiment, this technique cannot balance unobserved factors. Furthermore, it only prevents bias from observed confounders if these are successfully balanced across groups with and without official contact. Although our matching methods seemed to be quite successful in this respect, some observed covariates remained imbalanced in the English sample. To prevent bias due to these imbalanced factors, we used the matched samples within lagged dependent variable regressions (see footnote 17), which exploit the panel nature of the data to produce (even) more robust causal estimates (Morgan & Winship, 2015). By applying these panel models to groups that were successfully balanced on many potential confounding (observed) factors and conducting various sensitivity analyses (e.g., using different sets of controls in the regression models), we think that our results are quite immune to selection bias.
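Why conditioning on the lagged outcome helps against this kind of selection can be shown with a small Python sketch. It is an illustration under stated assumptions, not the models estimated in this study: the data are simulated with a true contact effect of zero, selection into contact depends only on prior offending, the matching step is omitted, and all names and parameter values are hypothetical.

```python
import random


def simulate_panel(n=4000, seed=11):
    """Simulated two-wave panel (hypothetical): y1 = prior offending,
    t = official contact, y2 = later offending. True effect of t is zero."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        y1 = rng.gauss(0, 1)
        t = 1 if rng.random() < (0.6 if y1 > 0 else 0.2) else 0  # selection on y1
        y2 = 0.6 * y1 + rng.gauss(0, 1)                          # no treatment term
        data.append((y1, t, y2))
    return data


def solve(a, b):
    """Solve a small linear system a z = b by Gaussian elimination
    with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (m[i][n] - sum(m[i][j] * z[j] for j in range(i + 1, n))) / m[i][i]
    return z


def ols(xs, ys):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    k = len(xs[0])
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * y for x, y in zip(xs, ys)) for i in range(k)]
    return solve(xtx, xty)


def naive_difference(data):
    """Raw difference in mean later offending between contact and no-contact groups."""
    treated = [y2 for _, t, y2 in data if t == 1]
    control = [y2 for _, t, y2 in data if t == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


def ldv_effect(data):
    """Coefficient on t in the regression y2 = a + b*t + c*y1 + e."""
    xs = [(1.0, float(t), y1) for y1, t, _ in data]
    ys = [y2 for _, _, y2 in data]
    return ols(xs, ys)[1]


if __name__ == "__main__":
    panel = simulate_panel()
    print(f"naive difference:      {naive_difference(panel):.3f}")
    print(f"lagged-DV coefficient: {ldv_effect(panel):.3f}")
```

In this simulation the raw group difference is inflated by selection on prior offending, whereas the coefficient on contact in the lagged dependent variable regression stays close to the true value of zero. The sketch says nothing about unobserved confounders, which neither matching nor this regression can address.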

It is furthermore important to note that, while official contact at T2 (age 15) was the first criminal justice contact for most of our detected juveniles (English sample: 75.7%; German sample: 81.8%), we do not restrict our study to first-time contacts. Because we are interested in the average impact of official intervention, a contact could be either a novel experience or a repeated encounter with the criminal justice system. Despite this focus on an overall effect, it is reasonable to assume that the impact of criminal justice intervention depends on an individual’s prior history with the formal control system. Liberman et al. (2014), for example, who restricted their analysis to first-time arrests, emphasized that a novel experience should have a larger impact than a repeated formal control experience according to both deterrence and labeling theory (see also Anwar & Loughran, 2011; Bernburg, 2019). Unfortunately, due to the relatively low number of participants with official contact at T2, we do not have the statistical power to properly study whether the effect of formal controls indeed depends on juveniles’ sanctioning history. Some preliminary regression analyses (including a product term of official contacts at T1 and T2 as predictor) did not provide consistent patterns of whether the effects depend on the sanctioning history (see Supp. material S6), which highlights the need for future research with larger samples.

Given these limitations, we would caution against making firm general policy recommendations as a result of our findings. There are no strong directional and clear-cut findings as to potential labeling or deterrent effects from criminal justice interventions on future delinquent behavior. Our results rather suggest that if there are such effects, they may operate in different directions (i.e., both promote and prevent future offending), potentially depending on the people involved, their life circumstances, the stage of their criminal career, and the kind of intervention and its execution. Regarding the risk of secondary control interventions for already registered offenders, it may be important for law enforcement to consider the possibility that increased interventions for those already under formal control may enlarge structural and personal obstacles to a non-delinquent development compared to offenders of a similar delinquent potential who have not been registered.