Do Sports Programs Prevent Crime and Reduce Reoffending? A Systematic Review and Meta-Analysis on the Effectiveness of Sports Programs

Sports programs are widely implemented as measures of crime prevention. In contrast to their popularity, there is little systematic knowledge about their effectiveness. This systematic review and meta-analysis was carried out to fill this gap. In a systematic review, we gathered data on evaluated prevention programs specifically designed to prevent crime and delinquency. We then conducted a meta-analytic integration with studies using at least roughly equivalent control groups for the program evaluation. To retrieve relevant literature, we conducted a comprehensive international literature search until June 2021 drawing on scientific databases. We also applied snow-balling searches and contacted practitioners in the field. Studies were eligible if they evaluated sports programs designed to prevent delinquency on primary, secondary, and/or tertiary level. We focused on crime-related outcomes and potentially underlying psycho-social factors. We made no restrictions regarding characteristics of the participants or other aspects such as duration of the program. Twenty-four studies were eligible for our systematic review, of which only thirteen were included in our meta-analytic integration. We found a moderate effect of participation in sports programs on crime-related outcomes (d = 0.36, p < .001). Participants showed a significant decrease in outcomes such as aggressiveness or anti-social behavior. We also analyzed psychological outcomes such as self-esteem or mental well-being, which also significantly improved when participating in sports programs (d = 0.87, p < .05). Sports programs seem to be an effective measure of crime prevention. However, future research needs more sound evaluation designs and moderator analyses to better understand the functioning and improve the implementation of sports programs.


Introduction
"Sport is (…) an important enabler of sustainable development. We recognize the growing contribution of sport to the realization of development and peace in its promotion of tolerance and respect (…)" (UN General Assembly resolution 2015, p. 10). By promoting sports programs as an agent of change, the 2030 Agenda for Sustainable Development adopted by the United Nations continues what has been advertised in policies since the 1960s: Sports are believed to strengthen positive behavior and, among other positive outcomes, to prevent criminal activity (Smith and Waddington 2004). Therefore, sports programs are frequently implemented to reduce and prevent crime, delinquency, and violent behavior (Hartmann 2001;Public Safety Canada 2017). They enjoy high popularity in Western societies reflected for example in policies such as the Enlarged Partial Agreement on Sport in Europe (Ekholm 2013a; Meek 2018) but sports programs have also spread globally (Neeten 2020;Public Safety Canada 2017). In primary prevention aimed at the general population sports are often used to promote positive development in children and adolescents (Fraser-Thomas et al. 2005;Lösel 2012). They are also implemented in secondary prevention for at-risk persons and in tertiary prevention for people who have already committed crimes (Public Safety Canada 2017). Within the latter context sports programs are implemented in prisons as well as in the community (Ekholm 2013b;Meek 2014;Nichols 2007). Most commonly, sports programs are implemented for promoting prosocial behavior and a generally desirable development in young people (Ekholm and Holmlid 2020) as well as in correctional settings (Neeten 2020). Sports programs as a measure against delinquency and reoffending are often taken at face value, but theoretically plausible hypotheses for their effectiveness have also been proposed.

Theoretical Framework
A number of hypotheses at multiple levels suggest that sports can contribute to the reduction of criminal behavior and strengthen resiliency (Spruit et al. 2016a). Multiple explanations are common in criminology, where many risk and protective factors affect criminal development (Lösel and Bender 2003, 2017). However, for both risk and protective factors there are dose-response relationships of accumulated influences that cannot simply be subsumed under one single theoretical approach (Lösel and Farrington 2012).
Sports can be directly protective but may also influence other variables and processes which have a protective effect (Faulkner et al. 2007; Morris et al. 2003). Complex interactions have been found between aggression and biological factors (Carré et al. 2017;Glenn and Raine 2014;Çetin et al. 2017). Hormones and neurotransmitters such as low levels of serotonin or elevated levels of testosterone are associated with higher levels of aggression and criminal behavior (Platje et al. 2015;van der Gronde et al. 2014). As regular physical activity has been found to positively influence the production of hormones, which in turn decrease and inhibit violent behavior, it can thus be assumed that sports participation may reduce aggressiveness and violence (Çetin et al. 2017;Gligoroska and Manchevska 2012;van der Gronde et al. 2014).
In the tradition of programs designed to divert offenders from crime (Osgood and Weichselbaum 1984; Wong et al. 2016), common theories on why sports protect against crime assume that participation is a diversion from crime. During sports activities, individuals are physically diverted from delinquent behavior as they are otherwise engaged (Ekholm 2013b; Kelly 2013; McMahon and Belur 2013). This corresponds with routine activity hypotheses to explain crime and desistance (Cohen and Felson 1979; Spruit et al. 2016a): When people engage strongly in sports, they simply have less time to carry out offenses, particularly when sports take place at times and in contexts that trigger delinquency. Ambrose and Rosky (2013) also conclude that in prison settings engaged individuals have less time to break rules. Physical diversion from crime, especially during high crime hours, is not the only diverting mechanism that may prevent crime. Sports provide fun and excitement that can satisfy sensation seeking tendencies in juvenile delinquents and, therefore, divert from seeking these experiences in criminal activities (Ekholm 2013b; Segrave 1983). Apart from diversionary explanations, other aspects are also considered to explain the use of sports as a means of prevention.
Sports are believed to be related to psycho-social factors which may help to abstain or desist from crime (Lösel and Bender 2003;Nichols 2007). There are several theories regarding processes within the individual that are initiated by sports participation. Sports activities with others may lead to opportunities of social learning (e.g., Bandura 1973; Gubbels et al. 2016). Especially coaches and non-deviant companions in sports could act as role models for at-risk youths (Lösel and Bender 2003) by the transmission of prosocial norms and negative attitudes towards aggression (Nichols 2007;Sherry 2010). As social learning is facilitated through close bonds among the individuals (Akers 1998), especially positive and close interactions should promote learning effects. Sports should allow participants to experience reinforcement and self-efficacy from non-delinquent activities and, therefore, contribute to the development of a positive identity (Nichols 2007;Reverdito et al. 2017), which was found to be important for processes of change in desistance from crime (Paternoster and Bushway 2009;Sampson and Laub 2005). These hypotheses correspond with the assumption that 'sport builds character' (e.g., Begg et al. 1996;Spruit et al. 2016a). When participants experience positive values and prosocial norms in the sports setting, they are believed to have the opportunity to adopt these norms and develop alternatives to violent coping strategies (Davis and Menard 2013;Endresen and Olweus 2005;Nichols 2007).
Sports participation is also frequently linked to psychological correlates such as mental health which is considered to be relevant for promoting desistance (Morris et al. 2003). As sports interventions are thought to improve overall mental well-being and are linked to psychological aspects (D'Andrea et al. 2013;Nichols 2007), sports participation might promote desistance via strengthening psychological well-being. Furthermore, sports settings are expected to offer opportunities for positive youth development (Holt 2007) that contribute to self-esteem by offering individuals success that is often not achieved in educational or work-related contexts (Nichols 2007). Nichols (2007) also suggests that participants in sports start to experience themselves as agents of their own actions giving them a sense of control over their lives. Having an internal locus of control rather than perceiving developments and incidents as something that is not within someone's control, has been detected to protect against engagement in crime and delinquency (Ahlin 2014;Zemel et al. 2021).
Another wide-spread theory that reflects some of the above-mentioned explanations is Hirschi's (1969) theory of social bonds and informal social control (Spruit et al. 2016a;Veliz and Shakib 2012). Veliz and Shakib (2012) argue based on Hirschi's theory (1969) that sports can help to form attachment to significant others, e.g., with coaches, associated staff members, and teammates. These social bonds would be endangered when an individual gets involved in crime and, therefore, protect from becoming delinquent. Belief, as another dimension of Hirschi's model, refers to the positive, socially accepted norms a person upholds. If these are conveyed through sports participation, criminal behavior could be deterred. This ties back to theories of social learning as described by Bandura (1973). Gaining self-esteem through sports fits to Hirschi's dimension of commitment, because engaging in criminal activity could lead to a loss of important opportunities that provide the individual with a positive self-identity as well as self-esteem (Veliz and Shakib 2012). Similar to the above-mentioned concepts of routine activity and diversion, involvement, as a fourth aspect of Hirschi's theory, reduces the time to get engaged in criminal activities because time is being spent in the structured setting of sports programs (Veliz and Shakib 2012). Because sports offer change in social environments providing individuals, for example, with role models, they are often perceived as a hook for engagement, reaching out to at-risk youths (Ekholm 2013b;Haudenhuyse et al. 2013).
Due to these expected benefits, specific sports programs focusing on positive youth development and the prevention of crime are frequently implemented (Public Safety Canada 2017). They are especially used in the context of programs against juvenile delinquency as they are an important part of young people's lives and attract many of them (Haudenhuyse et al. 2013). Sports programs are also often cheap and easy to implement and therefore widely accessible (Hartmann and Depro 2006). Furthermore, studies on criminal careers have shown that there is an increase in delinquent and antisocial behavior during adolescence, underlining the importance of prevention during this life period (Moffitt 1993; Spruit et al. 2016a). Sports programs are also widely offered in prisons, especially because they are a right granted to the inmates and, therefore, an available tool for reducing the risk of reoffending after release (Neeten 2020). Within this setting, efforts are necessary to tie into the time after release, for example by connecting participants in the sports programs with organizations after release (Digennaro 2010; Meek 2014). That way, positive relationships can be upheld and commitment to desirable activities fostered (Meek 2014). As a result, mechanisms of social control can be maintained even after release.

Empirical Findings
While there is a broad theoretical background explaining why sports participation may reduce delinquency and prevent reoffending, the empirical findings on the relations between sports activities and offending are somewhat ambiguous (Davis and Menard 2013). For example, engaging in routine activities such as participating in sports or youth clubs was associated with less drug use but with increased assault (Miller 2013). Begg et al. (1996) showed that youths with the highest involvement in sports at the age of 15 were more likely to exhibit delinquent behavior three years later than those who were not involved in sports at all. Endresen and Olweus (2005) examined youths participating in power sports such as boxing or martial arts and compared them to peers who did not participate, detecting more antisocial behavior among the former. Gubbels et al. (2016), however, did not find a relationship between martial arts and antisocial behavior in a meta-analysis of twelve studies. Martial arts athletes showed higher levels of externalizing behavior only when compared to athletes in individual sports but not compared to non-athletes or team sports participants. Intensity of training was a moderator as well, with more training per week being associated with higher levels of externalizing behavior (Gubbels et al. 2016). A more detailed review on the characteristics of sports was presented by Davis and Menard (2013) who noted that noncontact sports such as tennis correlated with long-term general offending and drug abuse but not with violent crime. In contrast, contact sports such as football (US: soccer) were not associated with later delinquent behavior and could even reduce offending. In a meta-analysis of correlational studies, Spruit et al. (2016a) found no general association between sports participation and juvenile delinquency. Besides gender differences, the setting significantly moderated the relationship. Participants in out-of-school sports activities showed more delinquency than those in school settings. Furthermore, individual sports participants reported more delinquent behavior than non-athletes. Overall, Spruit et al. (2016a) concluded that sports participation does not have a generic impact on delinquency, but rather has positive as well as negative effects that countervail each other.
Regarding psychological correlates of crime, research has identified certain benefits of sports participation. A few reviews discovered desirable effects on emotional problems, social skills as well as stress in adolescents and adults both in and outside of the correctional setting (Eime et al. 2013; Spruit et al. 2016b; Woods et al. 2017). Research on self-esteem, however, has generated ambiguous evidence where some studies show positive and others negative effects (Ekeland et al. 2005; Sallis 2000). Findings by Bowker et al. (2003), for example, emphasize the importance of sport types, since non-competitive sports improved self-esteem in contrast to competitive sports. The drawback in these studies often is the lack of randomization regarding the sample as well as an overrepresentation of individuals that participate voluntarily in sports activities. Athletes often join to experience fun and excitement and are very engaged in the programs (Theokas et al. 2007). This self-selection bias might therefore influence the relationship between sports participation and delinquency in either way (Spruit et al. 2016a) and athletes might differ fundamentally from non-athletes. Spruit et al. (2016c) found, for example, that athletes reported fewer delinquent friends than non-athletes. Apart from self-selection, other groups such as marginalized individuals as well as girls are often less involved in such programs (Bailey 2007). A similar problem occurs in the correctional setting. Pantelis (2014) notes that older prisoners are less interested in participation and need more incentives to engage.

Objectives of the Present Study
While many plausible theories have been proposed as to why sports should prevent crime, the effects of sports on delinquency and psychological variables are not as clear cut. Further research is necessary to determine what works best under which circumstances and which theoretical approaches explain these effects. However, there is not yet much well controlled research that addresses these issues as well as the impact of sports programs specifically developed and implemented to prevent criminal behavior over the life-course. And if there is, studies often do not apply rigorous evaluation designs (e.g., McMahon and Belur 2013; Public Safety Canada 2017; Woods et al. 2017). This scarcity of sound program evaluations is surprising insofar as sports programs are widely used in community and institutional policies to prevent delinquency. Against this background, we carried out a systematic review on available research. Additionally, we conducted a meta-analysis on the potentially causal effects of sports programs on criminal behavior and reoffending based on evaluations applying at least roughly equivalent control group designs. Our focus lies on specifically designed sports programs to prevent crime in contrast to mere sports participation in local clubs or school settings.

Criteria of Eligibility
Our criteria of inclusion are set very broadly because we expected a low number of sound studies from past research (e.g., McMahon and Belur 2013; Public Safety Canada 2017). Primary studies had to fulfil the following criteria to be included in this meta-analysis:
1. Type of intervention: We included studies evaluating specifically designed sports programs but not sports participation in general. Programs on all three levels of crime prevention were included. Primary prevention of offending addressed participants that were not explicitly mentioned to be at risk, while secondary prevention aimed at samples identified as being at risk. Tertiary preventive programs included individuals who had committed crime, often identified through the setting such as prisons or other forms of confinement. We excluded studies on programs tackling truancy and those focusing on outdoor and adventure activities or martial arts since these interventions have already been addressed in meta-analyses (Gubbels et al. 2016; Maynard et al. 2012; Vertongen and Theeboom 2010; Wilson and Lipsey 2000). However, keeping in mind the expected scarcity of studies, we considered studies eligible when martial arts were identified as part of a program in combination with other sports.
2. Study design: Expecting a small body of evaluations, we made no initial restrictions regarding the design and included all quantitative empirical research on our topic for the systematic review. However, we rated the quality of designs on an adapted version of the Scientific Methods Scale (SMS; Farrington et al. 2002; Schmucker and Lösel 2015) and only integrated studies rated Level 3 and above into the meta-analysis to decrease the risk of bias. Additionally, we assessed the risk of bias in greater detail using the 'risk of bias' tool (RoB 2) developed by the Cochrane Collaboration (Sterne et al. 2019). Generally, control group evaluations diminish alternative explanations of effects and offer more robust conclusions (Farrington et al. 2002; Ross et al. 2011). But even within control group designs there are risks of bias. To reflect these challenges, our adaptation of the SMS is more differentiated on the upper levels than the original scale. Studies at Level 1 apply no control group, Level 2 studies only non-equivalent groups such as national figures. Studies were rated Level 3 when efforts were made to obtain roughly equivalent control groups (e.g., by statistically controlling for potential differences). Level 4 was rated for studies applying matching procedures to establish equivalence between the control and intervention group (Shadish et al. 2002) as well as possible in the absence of randomization. While both Level 3 and Level 4 describe experimental designs, Level 4 is considered to be of higher quality due to sound matching procedures increasing comparability between studies and decreasing potential biases. Finally, Level 5 studies had to use randomized control trials (RCT; see Schmucker and Lösel 2015).
3. Sample: Programs could target people at risk of reoffending, e.g., prisoners and released offenders, as well as people at risk of offending or the general population in a primary preventive way. There were no restrictions regarding age, gender, and country of origin.
4. Outcome measures: We considered self-reported or official measures for outcomes such as delinquency, aggressive, criminal or violent behavior or attitudes as well as offending and reoffending. Based on the assumption that sports influence psycho-social factors such as mental well-being or self-esteem that may promote desistance from crime, we included studies measuring these outcomes as well if they were considered to be relevant for criminal behavior in the located papers.
5. Time frame: We made no restrictions regarding the duration of the program, the time until follow-ups, or the year the study was published.
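The design rating described under the study design criterion can be sketched as a simple decision rule. The following Python snippet is only an illustration of the adapted SMS levels as characterized above; the function names and the boolean coding variables are hypothetical and do not reproduce the actual coding sheet.

```python
def sms_level(has_control, roughly_equivalent=False, matched=False, randomized=False):
    """Return an SMS level (1-5) from hypothetical boolean design codes."""
    if not has_control:
        return 1  # no control group
    if randomized:
        return 5  # randomized controlled trial
    if matched:
        return 4  # matching procedures to establish equivalence
    if roughly_equivalent:
        return 3  # e.g., statistical control for group differences
    return 2      # only a non-equivalent comparison, e.g., national figures


def eligible_for_meta_analysis(level):
    """Studies rated Level 3 and above enter the meta-analytic integration."""
    return level >= 3
```

Under this sketch, a study with a statistically adjusted comparison group would be rated Level 3 and retained, whereas a study comparing participants only with national figures would be rated Level 2 and remain in the systematic review only.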

Literature Search
We searched relevant scientific databases (Office of Justice Program, PsycINFO, Cochrane Library, Campbell Collaboration, ScienceDirect, Google Scholar, National Criminal Justice Reference Service, iDiscover, Scopus, ProQuest: Sociological Abstracts, Journal of Sports Therapy, Sport und Recherche im Fokus, Web of Science™) up until June 2021.
Search strings included terms such as sport program*, sport-based interventions, physical activity sport *, deterrence, evalu*, effect*, impact*, effic*, prevent*, reoffend*, offend*, recidivism, crim*, antisocial behaviour, and delinquen*. These search terms were combined in different ways both to retrieve precise results close to our topic of interest and to generate a high recall to avoid missing potentially relevant articles. Schmucker and Lösel (2011) suggest narrowing down search results. For example, in one case we used the search string "sport program* AND evalu* AND prevent* AND delinquen*" and added "AND quant*" in a second step as the first term led to an unmanageable number of studies. As databases are limited sources to retrieve relevant studies (Schmucker and Lösel 2011), we also applied a snow-balling method searching the references of previous studies and contacted authors and project managers. The search process depicted in Fig. 1 resulted in 24 evaluations of 23 different projects that fit our criteria of inclusion for the systematic review. Many studies were excluded at the first step as they did not look at sports programs as preventive measures against crime and delinquency. In many cases, sports participation rather than sports programs was presented. Later, studies often lacked adequate quantitative evaluation. Consequently, only fourteen studies were rated Level 3 and above on the SMS and were thus eligible for the meta-analytic integration, although one of these studies did not provide enough data for effect size calculation.

Study Coding
We sought data following a coding scheme. Two researchers rated the studies independently to avoid potential coding errors and discussed diverging results to reach consensus. Categories included program, sample, as well as study characteristics such as the type of prevention, the goals, and the setting as well as the country in which the program took place. We coded the type of sports, sample size, and the duration of the program. The study design and internal validity were rated according to our slight adaptation of the SMS (Farrington et al. 2002; Schmucker and Lösel 2015) and formed the basis to judge which studies would be included in our meta-analysis. To obtain an overview of where the strengths and weaknesses of studies included in the meta-analysis lay, we also applied the risk of bias tool (RoB 2). We categorized the outcomes into crime-related and psychological outcomes. The first included all outcomes that could clearly be considered as criminal acts, e.g., reconviction rates, but also other aspects that might potentially be harmful to others such as impulsivity, drug use, or attitudes towards aggression. We distinguished these outcomes from those that might also be related to criminal activity but primarily indicate an individual's internal well-being. These variables such as self-esteem, stress, or depression were categorized into psychological outcomes. These two categories are broad and reflect heterogeneous outcomes. For example, reconviction rates as official data differ from self-reported crime and aggressive attitudes. Nevertheless, these outcomes are used to measure the effects of participation in sports programs and are often related in empirical studies (e.g., Farrington and Ttofi 2014; Lösel and Bliesener 2003). They may indicate various levels of criminal intensity in different phases of development.
Regarding psychological outcomes, a variety of factors need to come together for committing crime or showing aggressive behavior (e.g., Lösel and Bender 2017). We only included variables that have shown some relation to crime and delinquency in research. Again, a practical issue relates to the small body of research which warrants not too narrow outcome categories.

Effect Size Coding
For our meta-analytic integration, we extracted relevant statistics to code the effect sizes. The effect size calculator provided by the Campbell Collaboration was applied for effect size computation (Wilson n.d.). Odds Ratios (OR) were computed according to Borenstein et al. (2009) for binary data such as reconviction rates. If significant, OR > 1 indicates that the program was effective in decreasing the likelihood of criminal outcomes, whereas a significant OR < 1 indicates that the odds of reconviction were higher for participants in the sports program than for the comparison group. ORs were transformed via the log transformation of OR into the standardized mean difference, Cohen's d, using the formulas suggested by Borenstein et al. (2009). We computed Cohen's d for continuous data using the pooled standard deviation via the online calculator. To calculate the standard error and afterwards the confidence interval of the effect size, we computed the square root of the variance of the effect size (see Borenstein et al. 2009). In the case of pre-post-measurements, we computed Cohen's d for each time point and subtracted the pre-test effect from the post-test effect to get the net effect size that accounts for potential pre-treatment differences. Usually, pre-post designs require knowledge of correlations for correcting the standard deviation, which is often not available (Cuijpers et al. 2017). Lipsey and Wilson (2001) suggest that if not known, any correlation could be suitable. We decided to use r = 0.50 for all pre-post measurements to avoid being too conservative or liberal in our estimations.
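The computations described above can be illustrated with a short Python sketch. It is not the Campbell Collaboration calculator; it only restates the standard formulas (pooled-standard-deviation d, and the logit conversion d = ln(OR) × √3 / π with variance V_d = V_lnOR × 3 / π² given by Borenstein et al. 2009). All function and argument names are hypothetical, and the cell counts are assumed to be coded so that OR > 1 favors the program.

```python
import math


def cohens_d(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Standardized mean difference based on the pooled standard deviation."""
    pooled_var = ((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / (n_t + n_c - 2)
    return (mean_t - mean_c) / math.sqrt(pooled_var)


def odds_ratio_to_d(t_ok, t_bad, c_ok, c_bad):
    """Convert a 2x2 table into d and its variance via the log odds ratio.

    t_ok / t_bad: program participants without / with the criminal outcome.
    c_ok / c_bad: comparison group without / with the criminal outcome.
    """
    log_or = math.log((t_ok / t_bad) / (c_ok / c_bad))
    var_log_or = 1 / t_ok + 1 / t_bad + 1 / c_ok + 1 / c_bad
    d = log_or * math.sqrt(3) / math.pi
    var_d = var_log_or * 3 / math.pi ** 2
    return d, var_d


def confidence_interval(d, var_d, z=1.96):
    """95% confidence interval from the square root of the variance."""
    se = math.sqrt(var_d)
    return d - z * se, d + z * se


def net_pre_post_effect(d_pre, d_post):
    """Post-test effect minus pre-test effect (net effect size)."""
    return d_post - d_pre
```

For outcomes where lower scores are desirable (e.g., aggressiveness), the sign of d would have to be reversed before integration so that positive values consistently indicate a successful intervention. The sketch leaves out the variance of the net pre-post effect, which depends on the assumed pre-post correlation.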
Even when studies already provided effect sizes, we computed them according to our formulas. In some cases, several outcomes were reported in the same study. We calculated an overall effect size as well as the associated variance following the suggestion of Borenstein et al. (2009). Following Cohen (1988), the program was evaluated as having a small effect if d ≤ 0.20, a moderate effect if d ≈ 0.50 and a large effect if d ≥ 0.80. Positive effect sizes indicate a successful intervention. We also report Cohen's U3 obtained in the overall meta-analysis calculated via the tool by Magnusson (2020). This percentage indicates the share of the experimental group that scores above the mean of the control group (Valentine and Cooper 2003). This measure was found to be more informative than reporting Cohen's d only (Hanel and Mehler 2019).
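Two of the steps above lend themselves to a brief illustration. The Python sketch below computes Cohen's U3 as the standard normal cumulative distribution evaluated at d, and combines several outcomes from one study into a mean effect with the variance formula for correlated outcomes given by Borenstein et al. (2009). The correlation r between outcomes is not reported in the text and is an assumption that must be supplied by the user; the function names are hypothetical.

```python
import math
from statistics import NormalDist


def cohens_u3(d):
    """Share of the treated group above the control group mean (normal model)."""
    return NormalDist().cdf(d)


def composite_effect(effects, variances, r):
    """Mean of several outcome effects from one study and its variance.

    r is an assumed common correlation between the outcomes.
    """
    m = len(effects)
    mean_d = sum(effects) / m
    total = sum(variances)
    for i in range(m):
        for j in range(m):
            if i != j:
                total += r * math.sqrt(variances[i] * variances[j])
    return mean_d, total / m ** 2
```

Applied to the crime-related effect reported in the abstract, cohens_u3(0.36) yields roughly 0.64, i.e., about 64% of participants would be expected to score better than the average member of the control group under normality. Setting r = 1 in composite_effect gives the most conservative (largest) variance, whereas r = 0 treats the outcomes as independent.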

Statistical Procedures
The meta-analysis was conducted by combining the weighted mean effect size of each study as well as the corresponding standard errors and confidence intervals (see Hedges and Olkin 1985). We tested for homogeneity via the Q-statistic, but since this test statistic has certain limitations, especially when the number of studies is small, we additionally computed I² (Higgins et al. 2003). Homogeneity implies that the studies' effect sizes represent an overall mean, while heterogeneity indicates that the effect sizes represent different population parameters (Schmucker and Lösel 2011). The former entails the application of a fixed effects model, while the latter results in integrating the data with a random effects model. Moderator analyses were carried out using mixed effects models that account for systematic between-study differences but also for random unobserved variables (Lipsey and Wilson 2001). We also considered a potential publication bias, which suggests that studies yielding a large positive effect are more likely to be published than those with non-significant effects. This could obscure systematic differences between published and unpublished literature (Borenstein et al. 2009). Duval and Tweedie's (2000) trim and fill method was applied if funnel plot asymmetry indicated missing studies.
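The pooling and heterogeneity statistics described above can be sketched as follows. The snippet uses inverse-variance weights, the Q-statistic, and I² as defined by Higgins et al. (2003). For the random effects model it uses the DerSimonian-Laird estimator of the between-study variance, which is an assumption made for illustration, since the text does not name the estimator. The function name and the returned keys are hypothetical.

```python
import math


def pool_effects(effects, variances):
    """Fixed and random effects pooling with Q, I-squared, and tau-squared."""
    w = [1 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * d for wi, d in zip(w, effects)) / sum_w

    # Q: weighted squared deviations from the fixed effects mean
    q = sum(wi * (d - fixed) ** 2 for wi, d in zip(w, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    # DerSimonian-Laird between-study variance (assumed estimator)
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    w_re = [1 / (v + tau2) for v in variances]
    random = sum(wi * d for wi, d in zip(w_re, effects)) / sum(w_re)
    return {
        "fixed": fixed,
        "se_fixed": math.sqrt(1 / sum_w),
        "random": random,
        "se_random": math.sqrt(1 / sum(w_re)),
        "q": q,
        "df": df,
        "i2": i2,
        "tau2": tau2,
    }
```

When the effect sizes are homogeneous, tau2 is zero and both models return the same estimate. With heterogeneous effects the random effects weights become more similar across studies and the standard error increases.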
In the following, we present the findings of our systematic review based on the retrieved 24 studies. Afterwards, we report our meta-analytic results for crime-related as well as psychological outcomes.

Systematic Review
Appendix A contains a detailed overview of the sports programs that fit our criteria of inclusion. A short narrative description of the outcomes of the studies is also presented. Two studies seemed to report an evaluation of the same program (see Hess et al. 2015;Scheithauer et al. 2020) but since they differed for example regarding the sample size, we reported the studies separately in our review.

Study Characteristics
Nine out of 24 evaluations were conducted in the UK, three in the USA, two in Israel, and two in Australia. The other programs were implemented in Germany, Uruguay, France, Italy, The Netherlands, Israel, and Brazil. Nine studies used either non-equivalent or no control groups. Two studies were rated twice on the SMS because different procedures had been applied for psychological and crime-related outcomes (Meek 2014; Nichols and Taylor 1996, as cited in Nichols 2007). But since the assessment of the psychological outcomes in both studies was rated at Level 1 and could not be meta-analytically integrated, we included the rating for crime-related outcomes in Fig. 2, which presents the frequencies of the internal validity ratings from Level 1 to Level 5.

Program Characteristics
As can be seen in Fig. 3, most programs were applied in tertiary prevention. These findings are to a certain extent reflected in the program settings. They were set in communities (n = 4), sports clubs (n = 5), and mostly prisons (n = 9), which entails all types of confinement such as prison or other kinds of custody. Two programs were implemented in a school; four programs could not be adequately sorted into one of these categories because no clear information was available regarding the setting. Due to the many kinds of sports used in the programs, we subsumed them into three categories: Individual sports, team sports, and programs combining team and individual sports. Eight programs used individual sports such as fitness exercises, needs-based activities (e.g., swimming), boxing combined with martial arts, or yoga. Team sports in eleven programs included for example football, rugby, or basketball. The five projects combining individual and team sports used for example boxing alongside team exercises and athletics as well as football and table tennis. One of these studies evaluated several programs offering different sport types. Youths participated in one of five sports programs offering for example horseback riding, tennis, martial arts, football, or basketball.

Sample Characteristics
The samples comprised male and female participants, although no program was conceptualized for women only. Of the programs that allowed conclusions on the participants' gender, eleven targeted men only and eleven were designed for mixed-gender groups. Information on age ranges could be retrieved from twelve studies, showing that the participants were between 7 and 59 years old, with a focus on young people and adolescents. Mean age was mostly reported in studies examining programs in prison settings; the computable mean showed an average of M = 25.02 years for participants. For further details of age and sample sizes, see Appendix A.

Risk of Bias in Studies Included in the Meta-Analysis
By including only evaluations rated Level 3 and above on the SMS, we aimed to decrease the risk of bias in our meta-analysis. Five studies were RCTs and five used matching procedures; three studies were rated at Level 3. A more nuanced assessment of the studies, as reported in Table 1, provides insight into specific biases present in the included studies.
Of course, with the RoB 2 designed for randomized trials, quasi-experimental designs without randomization will almost always be at high risk of bias. Some evaluations, however, chose retrospective designs decreasing potential confounders. As the SMS indicates, five studies also tried to deal with the lack of randomization possibilities by conducting matching procedures. Regarding missing data, a high bias was detected in five studies due to drop-outs and the treatment of these attrition effects. In contrast, four studies showed some concerns and four studies low bias, either because no data were missing or because missing data were appropriately reported and/or statistically handled. Missing data was not a specific problem of "weaker" quasi-experimental designs but also occurred in randomized controlled trials.
Risk is also introduced if information on missing data is not reported sufficiently. The same holds for other bias domains, e.g., the selection of reported outcomes. While most studies did not seem to have serious risks of selectively reporting outcomes, the lack of pre-specified plans (e.g., data analysis plans) introduced some level of bias. Measures of psychological constructs carry the bias that arises from self-reported data. Studies using official data (e.g., reconviction rates) are less susceptible to bias; however, the complex phenomenon of crime and delinquency includes different aspects that need to be measured with a variety of tools.
Overall, studies in this meta-analysis showed a certain degree of bias that needs to be considered when interpreting the following analysis.

Meta-Analytic Integration
In total, fourteen studies fulfilled our eligibility criteria of Level 3 and above on the SMS for meta-analytic integration, but effect sizes could not be calculated for one study according to our procedures. We separately analyzed those studies providing information on crime-related outcomes (k = 10) and studies examining psychological outcomes (k = 8). We tested the influence of moderators such as the type of sports, the study design, the type of prevention, the setting, the year and country of publication, and the participants' age and gender. Table 2 presents the outcome measures and effect sizes of the studies assessing crime-related outcomes integrated in the meta-analysis. These outcomes ranged from official measures such as reconviction or reoffense rates to self-reported behavior and attitudes connected to crime. As mentioned, we used the overall study effect size for our integration when more than one outcome was reported in the study. Figure 4 summarizes the effect sizes of included studies as well as the overall effect size in a forest plot. The effects did not differ due to sampling error alone but were also influenced by random or systematic sources (Q (df = 9) = 57.60, p < 0.001; I² = 84.38%). As a result, a random effects model was applied and revealed a positive effect of sports programs on crime-related outcomes, d = 0.36 (SE = 0.10), p < 0.001, 95% CI [0.17, 0.56]. The trim-fill analysis suggested one missing study to the left of the mean, and the imputed effect size changed to d = 0.33, 95% CI [0.15, 0.52]. Therefore, sports programs had a moderate effect, with 64% of the participants reporting more positive outcomes than the control group on average. In total, 14% more of the participants in sports programs showed less reconviction and more positive outcomes than in the control group, while only 36% showed improvement as low as the control group average.
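The heterogeneity and interval figures for the crime-related analysis can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes the common definition I² = (Q − df) / Q and a normal-approximation (Wald) interval with z = 1.96, neither of which is stated in the text as the procedure actually used, and the function names are hypothetical. Because it works from the rounded values reported above, the results agree only approximately with the published ones.

```python
def i_squared(q: float, df: int) -> float:
    """Share of total variation attributed to heterogeneity, in percent.

    Assumes the common definition I^2 = (Q - df) / Q, truncated at zero.
    """
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


def wald_ci(estimate: float, se: float, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation confidence interval (z = 1.96 for 95%)."""
    return (estimate - z * se, estimate + z * se)


# Values reported for the crime-related outcomes (k = 10).
i2 = i_squared(q=57.60, df=9)
low, high = wald_ci(estimate=0.36, se=0.10)

print(f"I^2 is roughly {i2:.1f}%")
print(f"95% CI is roughly [{low:.2f}, {high:.2f}]")
```

Under these assumptions, I² comes out at about 84.4%, in line with the reported 84.38%, and the interval is close to the reported [0.17, 0.56]; the small difference at the lower bound is what one would expect from working with rounded inputs.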

Crime-Related Outcomes
The meta-analysis including moderator analysis is presented in Table 3. No moderating effects of program, sample, or study characteristics were found. Neither participants' age nor gender influenced the outcomes, nor did program characteristics such as the type of sports, type of prevention, or program setting. Effects did not differ significantly by study design as rated on the SMS either. Due to the small number of studies, we collapsed moderators to generate broader categories; for example, as reported in Table 3, studies were divided according to whether the programs took place in or outside of Europe.

Psychological Outcomes
Eight studies reported psychological outcomes. Their effect sizes and the assessed variables are reported in Table 4. The outcomes ranged from self-esteem to emotions, burdens related to stress and pressure from others, as well as depressive symptoms. As in the analysis on criminal outcomes, the overall study effect sizes were used for the meta-analytic integration. Figure 5 displays the forest plot for included studies and the total effect size. Our tests revealed high heterogeneity between the studies, Q (df = 7) = 265.26, p < 0.001, I² = 96.61%, resulting in the application of a random effects model. A large significant mean effect was detected for sports programs on these outcomes, d = 0.87 (SE = 0.36), p < 0.05, 95% CI [0.16, 1.58]. Cohen's U3 suggests that 81% of participants in sports programs were above the mean of the control group and, therefore, 31% more people reported improved psychological well-being than in the control group. We found no indication of publication bias in the trim-fill analysis. Regarding moderators (see Table 5), the sample characteristic of age showed a significant effect. Overall, studies with older participants showed larger effects than those with younger participants. No other significant moderators emerged. However, when grouping together countries according to whether they belonged to Europe (k = 5) or not (k = 3), programs implemented in European countries tended to show larger effect sizes (p = 0.07).
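The percentages attached to the two mean effects follow from Cohen's U3, which is the standard normal cumulative distribution evaluated at d. The short sketch below recomputes them from the rounded effect sizes reported in the text; it assumes normally distributed outcomes with equal variances in both groups, and the function name is hypothetical.

```python
import math


def cohens_u3(d: float) -> float:
    """Proportion of the program group scoring above the control group mean.

    Equals the standard normal CDF at d, assuming normal distributions
    with equal variances in both groups.
    """
    return 0.5 * (1.0 + math.erf(d / math.sqrt(2.0)))


# Mean effects reported for the two meta-analyses.
for label, d in (("crime-related", 0.36), ("psychological", 0.87)):
    u3 = cohens_u3(d)
    # Under no effect, 50% of participants would lie above the control mean.
    print(f"{label}: U3 = {u3:.0%}, i.e. {u3 - 0.5:.0%} above the 50% baseline")
```

With d = 0.36 this gives about 64% (14 percentage points above the 50% baseline), and with d = 0.87 about 81% (31 percentage points above the baseline), matching the figures in the text.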

Discussion
Sports programs are frequently implemented to prevent crime and delinquency (Spruit et al. 2018; United Nations Office on Drugs and Crime 2018). However, surprisingly little well-controlled empirical research has been carried out to evaluate their effectiveness as a measure of crime prevention. In our systematic review we gathered studies measuring the success of sports programs specifically designed for prevention. Additionally, we conducted a meta-analysis to determine the direction and strength of effects. We computed two separate analyses, one on crime-related outcomes in general and one on psychological factors related to criminal behavior. Fifty percent of the programs included in the review addressed tertiary prevention and three quarters of them were implemented in prison settings. Applied sports ranged from individual sports (e.g., fitness exercises, swimming, boxing) to team sports (e.g., rugby, football) as well as programs with mixed forms. We found all-male as well as mixed programs, but no program that was designed for females only. This finding is not surprising, given that more men than women are incarcerated (Walmsley 2015) and sports seem to attract boys especially (Sandford et al. 2006). Against this background, more male-dominated sports programs inside and outside of prison are to be expected, although this limits the generalizability to girls and women.
Our meta-analytic results suggest that sports programs significantly protect against criminal behavior and related attitudes. Participants in the sports programs showed more favorable outcomes than those in the control groups. Sports programs had a moderate effect on various outcomes including recidivism, drug use, anger control, attitudes towards offending, and aggression as well as impulsivity as an indicator for controlling violent behavior. According to Lösel and Farrington (2012), direct protective factors reduce antisocial behavior, which is supported by this analysis and provides empirical support to the idea that sports can have a protective effect and work to reduce violence and delinquency. Regarding psychological outcomes associated with the development of criminal behavior such as psychological well-being, our analysis also revealed a large significant effect on these indicators. However, overall, our results need to be considered under the limitations regarding the small body of research we could examine in the meta-analysis (ten studies on crime-related outcomes and eight studies on psychological outcomes) and the risk of bias present in some of the included studies. It must also be
To understand why and how sports programs are successful for crime prevention, moderator analysis can provide insights into the contexts that facilitate success. When looking at crime-related outcomes, none of our moderators significantly influenced the programs' effects. Although we presented various plausible theoretical concepts on the preventive impact of sports programs above, the intervention studies did not really open the "black box" on these mechanisms.
We found no significant differences between programs regarding their level of prevention (i.e., primary, secondary, or tertiary) or regarding the setting (e.g., program implemented in prisons, schools, sports clubs, or communities). Theoretical considerations propose mechanisms by which sports programs effectively prevent crime. For example, one aspect refers to routine activity and diversion from crime, which keep individuals from engaging in crime because they are otherwise engaged (Ekholm 2013b; Spruit et al. 2016a; Veliz and Shakib 2012). While this might be true especially in primary or secondary preventive settings where on-going participation is possible, this mechanism might be less applicable to the tertiary setting. Programs are often set in prison and aimed to reduce reoffending at a time when individuals are not involved in these sports programs anymore. Thus, mechanisms of sports programs in different settings might partially overlap but also differ in various aspects. For instance, social bonds built during program participation can protect against crime and delinquency because individuals do not want to damage these positive relationships (Hirschi 1969; Nichols 2007; Veliz and Shakib 2012). However, former prisoners might lose contact with coaches and peers of the program due to release and, therefore, valuable bonds developed during participation could dissolve. In contrast, aspects of Hirschi's (1969) dimension of belief could be relevant for all settings if acceptance of pro-social norms and values persists after participation. Through mechanisms of social learning, anticriminal attitudes and norms can be facilitated through role modelling (Bandura 1973; Gubbels et al. 2016). A coach can convey and promote prosocial values and norms as well as teach non-violent conflict solution strategies by modelling his own behavior accordingly (Nichols 2007). Spruit et al. (2018), for example, reported a greater decrease of aggression in youths when the coach showed motivating behavior towards them. Furthermore, if a coach manages to build positive and close relationships with the participants, role modelling can be more effective (Nichols 2007). As some studies included in our review report, participants seemed to be able to build positive relationships with their coaches during program participation (Farrell et al. 1996; Poole 2009). Developing positive relationships seems to extend to the peers as well (e.g., Poole 2009; 3PillarsProject 2016), who are also a source for learning prosocial norms and values (e.g., Veliz and Shakib 2012). Since we did not find effects of level of prevention or setting, following the theoretical considerations, different mechanisms could be of relevance in specific settings. For example, routine activities in prison might show less influence on post-release crime behavior in comparison to positive experiences with peers and coaches that might have more lasting effects. In contrast, routine activities might be more important for participants in primary or secondary settings.
A similar explanation could be applied to the type of sports used in prevention programs. Different types of sports could offer different learning opportunities and different competencies for the engaged individuals (Ekholm 2013a, b; Holt and Jones 2007). Team sports seem to be more successful in improving a specific set of skills, e.g., peer relationships and self-efficacy, and could provide more opportunities to learn conflict resolution skills and prosocial behavior patterns (D'Andrea et al. 2013; Nichols 2007; Nicholson and Higgins 2017). However, as Holt and Jones (2007) point out, there is a lack of research examining differences between individual and team sports. A study by Laborde et al. (2016) can shed some light onto the issue. They found more positive effects in individual sports on personality-related aspects such as resilience, self-esteem, and self-efficacy in comparison to team sports. This could be due to the increased responsibility given to and expected from the athlete performing in individual sports. While this finding could not be detected in our analysis of psychological variables, it still warrants further examination. Furthermore, the small number of evaluations in our meta-analysis is likely to have influenced the results and, therefore, limits the ability to draw sound conclusions on what works best in terms of crime reduction.
It is assumed that sports programs prevent delinquency by positively influencing aspects of psychological well-being (Morris et al. 2003). Therefore, we also looked at these potentially underlying risk factors of crime and delinquency and conducted a second meta-analysis on psychological outcomes. Our analysis revealed a substantial effect of participation in sports programs on outcomes such as self-esteem, depression-related aspects, as well as stress and perceived pressure. There are multiple social and psychological risk and protective factors associated with crime and delinquency Bender 2003, 2017). Therefore, our analysis combining different psychological outcomes allowed for an overall as well as differentiated assessment of the effects of sports program participation. However, relations between these constructs and criminal outcome behavior are not always as clear as expected and must be considered with great care.
The same also applies to the relationship between sports and psychological correlates due to the complex interactions. Nichols (1997), for example, argues that an increase in self-esteem mainly takes place among people who are frequently winning competitions and therefore high achievers in their discipline. On the other hand, to those who do not frequently win or perform successfully, an increase in self-esteem might be harder to obtain and even increase the risk of negative impacts on self-esteem (Andrews and Andrews 2003;Nichols 1997). As participants who are not as athletic as their group members are more likely to experience less favourable self-perceptions, it could be advisable to focus on the individual success and less on winning in general (Andrews and Andrews 2003). Fraser-Thomas et al. (2005) also suggest that less emphasis should be put on winning and competition in preventive sports programs to contribute to a positive environment within the group. Competitive sports could decrease collaboration with other participants and especially team sports might create frequent opportunities for competition in the team (Boone and Leadbeater 2006;Holt and Sehn 2007). However, our moderator analysis found no effect of type of sports on psychological outcomes. As Holt and Sehn (2007) suggest, competition might be beneficial under certain circumstances. Competition could, for example, encourage participants to develop skills of self-evaluation that could contribute to the success of the whole group (Holt and Sehn 2007). This emphasizes that the aims of the sports must be thoroughly defined and clearly communicated when implementing such programs.
This also applies to other aspects of sports programs. In the context of martial arts, careful implementation of sports is very important. While both negative and positive effects have been found (Endresen and Olweus 2005; Gubbels et al. 2016), trainings accompanied by a philosophy of defence instead of attacking were related to more positive outcomes (Davis and Menard 2013; Endresen and Olweus 2005). This emphasizes the significant role of coaches in sports programs and corresponds to findings of Spruit et al. (2016a): Athletes in out-of-school settings showed more delinquency than those in schools. They argued that trainers in out-of-school sports activities are less equipped with pedagogical skills than those in schools (2016a). However, in our analyses we did not find a moderating influence of the setting on relevant outcomes. It should be noted, though, that only two studies integrated in the analysis of psychological outcomes took place in schools; the others took place in prison settings and one in a sports club. Unfortunately, most primary studies did not provide detailed data on the specific influence of coaches. For future research, it is important to examine the program context and the individual relationships between the coach and the participants.
Research on "what works" in programs for juvenile delinquents and offender rehabilitation suggests that such implementation conditions are at least as important as the content of a specific program (Lipsey 2018; Lösel 2018). Therefore, programs need to be meticulously planned and implemented with great care to avoid adverse effects that could lead to an increase instead of a decrease of delinquent behavior. As the findings of the Cambridge-Somerville Youth Study have shown, good intentions are no guarantee for desirable effects (McCord 2003). More research is needed on the role and expertise of the coaches as well as sport types used in the programs.
The only significant moderator in our analysis on psychological outcomes was age. Studies with overall older participants showed larger effects compared to studies with younger target groups. As both groups included prisoners and programs set inside prisons and neither moderator significantly explained the variances in effect sizes, other explanations need to be considered. Potentially, if sports provide individuals with opportunities for positive experiences (e.g., Nichols 2007; Reverdito et al. 2017; Scully et al. 1998), older participants could be more sensitive to these positive reinforcements if they have not been able to make such experiences in a long while. Furthermore, they might have been incarcerated for a longer time and might, therefore, have had fewer opportunities to increase their psychological well-being in other ways. However, too little is known on the effects in middle-aged individuals as many studies on sports participation focus on children and adolescents and how sports might predict outcomes in later life (see for example Doré et al. 2019; Spruit et al. 2016b). It should also be noted that our small sample cannot rule out biases within the studies. Nevertheless, it is worth examining groups with different demographics to understand specific effects.

Limitations
Despite the widespread worldwide implementation of sports programs for crime prevention and our broad set of inclusion criteria, we were only able to retrieve a rather small number of relevant studies for the systematic review. Therefore, the results of our meta-analytic integration could have been strongly influenced by specific studies. In addition, our review revealed that studies often lacked adequate control groups. Therefore, the number of eligible studies was further reduced in the meta-analysis on methodologically sound study designs. Since studies with lower quality often report higher effects, which might lead to an overestimation of the true effects (Dreier 2013; Sharpe 1997; Weisburd et al. 2001), we included only studies using at least roughly equivalent comparison groups and were thereby also able to diminish the risk of alternative explanations for the results (Farrington et al. 2002; Lösel 2007). Nevertheless, even studies with more rigorous designs are at risk of bias as shown in our analysis. Biases are for example introduced due to the measurement of the outcomes or deviations in adherence to programs from some participants. Furthermore, many studies had to deal at least with some degree of missing data. Our review indicates the need for more methodologically well-controlled evaluation designs as well as improved reporting of potential biases in the studies. Of course, well-controlled RCTs would be preferable for evaluating program effects. However, sometimes external circumstances in the implementation of programs pose serious difficulties to achieving internal validity (e.g., Lösel 2007). For example, Verdot et al. (2010) reported that randomization was not allowed due to ethical reasons. In other cases, approval from administrators can also hinder the use of RCTs in prisons (Marshall and Marshall 2007). However, there are also examples that show randomized studies in the present field are possible even in prisons (e.g., Bilderbeck et al. 2013; Castleton and Cid 2012). If randomization is not possible, researchers should base their evaluations on the best possible quasi-experimental designs (e.g., Campbell 1969; Farrington et al. 2019; Lösel 2007; Weisburd et al. 2001).
Our broad inclusion criteria resulted in a great variety of outcome measures of crime and related outcomes. In meta-analysis, this poses the threat that studies are too heterogeneous and outcomes reflect different constructs (Sharpe 1997). Our results also showed much heterogeneity between outcomes. These are inherent problems of meta-analyses on topics with rather small homogeneous data bases. The broadness of inclusion needs to be assessed for every research question and varies accordingly (Schmucker and Lösel 2011). Crime appears in many different forms that are not sufficiently differentiated in the primary studies of this review. More data on specific outcomes would have enabled more details on what works best in certain contexts and for specific outcomes. Our analysis showed that especially studies on tertiary prevention offered relatively sound information for meta-analytic integration, which is a positive trend. More sound studies on other levels of prevention should increase knowledge for different groups and contexts.

Conclusion
Despite limitations, which are mainly due to too few sound primary studies and associated risks of bias, our findings suggest that sports programs can prevent crime and delinquency and reduce reoffending. The programs also showed significant effects on indirect or mediating psychological factors underlying delinquency such as psychological well-being. More well-controlled studies are necessary to determine which context factors influence underlying mechanisms and how these can be promoted. Future research should address the influence of peers and coaches on the programs, the type of sports, their implementation, and potential gender differences. Additionally, evaluations need to draw on better scientific designs to strengthen the validity of the results.