Introduction

It is well-established in the empirical literature that persons who are criminally involved are, on average, lower educated (De Li 1999; Bernburg and Krohn 2003; Morgan and Kett 2003; Harlow 2003). Interpretation of this result is not straightforward, because many underlying factors, such as socio-economic background and individual-specific characteristics, may affect criminal involvement and educational outcomes simultaneously. In this regard, causal studies that disentangle the true effects from the influence of underlying factors are of great importance. Whereas evidence on the negative causal effect of education on youth crime is growing (Machin et al. 2012; Landersø et al. 2013; Anderson 2014; Aoki 2014), much less is known about whether early criminal behavior causally affects educational outcomes. This knowledge gap is mainly due to the fact that exogenous variation in criminal involvement is difficult, if not impossible, to identify, which limits causal inference. Furthermore, the existing literature has rarely examined factors that underlie the relationship between criminal behavior and education. Such evidence is of great importance, because it can be helpful to address criminal behavior and lower educational outcomes in a more effective and efficient way.

In this study, we analyze how criminal involvement during adolescence, measured by registered contacts with the police, is related to school dropout. We use administrative individual-level data from the Netherlands, which contain information on criminal involvement, on educational careers and on an extensive set of background characteristics of young people. These data allow us to take into account the timing of criminal involvement and school dropout, as well as address the issue of reverse causality. We exploit information on all students in the Netherlands who were enrolled in the first three grades of secondary school in the academic year 2005/2006, and whom we follow until the academic year 2011/2012.

In addition to examining the relationship between early criminal involvement and school dropout, we identify the factors that underlie this relationship. In particular, we examine to what extent the unconditional correlation between criminal involvement and school dropout can be explained by observable factors, and by school-, class-, family- and individual specific characteristics, which are unobservable. For this purpose, we first estimate the unconditional correlation between early criminal involvement and school dropout using a basic ordinary least squares model. We acknowledge that this association is likely to be driven by observed and unobserved heterogeneity. Therefore, we subsequently estimate this association by including an extensive set of observable family and individual characteristics. We then proceed to models that additionally account for the influence of unobservable underlying factors by estimating school, class, sibling and twin fixed effects. We acknowledge that these fixed effects can be part of the extensive set of observable characteristics that we already control for or can be correlated with them. Therefore, we also provide an alternative analysis, conditional only on age, gender and ethnicity.

The data do not allow us to identify whether twins are identical or fraternal, but it is possible to estimate the same-gender twin fixed effect model. Since different-gender twins are never identical, the proportion of identical twins in the population, and hence the average genetic overlap, will increase when we restrict the sample to only same-gender twins.

We find that even after controlling for same-gender twin fixed effects, the estimated relationship between criminal involvement and school dropout remains positive and statistically significant. On the one hand, this result can be driven by selection on individual characteristics that we cannot rule out after taking into account same-gender twin fixed effects. Although same-gender twins have on average a larger genetic overlap, share many common features and are subjected to similar environmental influences, they can differ, for example, in cognitive and non-cognitive development, may have different peers, or can be treated differently by their parents. On the other hand, these estimates can reflect a true treatment effect, either of spending time on criminal rather than educational activities or of interactions with the juvenile justice system. Even though the exact mechanisms behind the estimated effects remain unknown, we conclude from the heterogeneity analysis that serious criminal involvement, which usually results in more intensive interactions with the justice system, is likely to causally affect school dropout.

The previous literature has distinguished several potential mechanisms underlying the effect of criminal involvement on education. First of all, criminal involvement and interactions with the justice system can disrupt the educational (and learning) process of individuals, which can eventually lead to lower educational performance, and, in turn, to school dropout (Sweeten 2006; Hjalmarsson 2008; Lochner 2011). Secondly, accumulation of ‘criminal capital’ can replace the need to invest in education and can reduce the effort put into the learning process (Grogger 1998; Lochner 2004; Ward and Williams 2015). Furthermore, stigma as a result of criminal involvement can negatively affect educational outcomes. In particular, teachers and parents might spend less time and effort on children who were criminally involved. Sweeten (2006), for example, shows that court appearance is likely to have a stronger impact on high school graduation than arrest. He explains that, from the perspective of formal labeling theory, court appearance might limit students’ educational opportunities more severely, because schools might impose a ‘zero-tolerance policy’ towards such students and can even expel them from school. Furthermore, a number of studies show that having a criminal record provides a negative signal to employers (e.g. Pager 2003; Apel and Sweeten 2010; Uggen et al. 2014). Anticipating this, individuals who were arrested can become less motivated to attain higher levels of education. Finally, interactions with the criminal justice system can potentially provide shocks to non-cognitive skills of young people, which in turn can negatively affect educational outcomes, for instance through motivation or aspirations (Behncke 2012). Although these mechanisms are extensively discussed in the literature, there is no convincing evidence regarding which mechanisms are essential in the relationship between criminal involvement and educational outcomes. The only available evidence that provides some suggestions about the mechanisms is the finding that arrest has a stronger negative effect on education than delinquency that does not result in arrest (Ward et al. 2015), and that detention of juvenile offenders causally reduces the chances of school completion, compared to probation of juvenile offenders who committed similar offences (Aizer and Doyle 2015).

The remainder of this study is organized as follows. The “Previous Empirical Literature” section discusses previous empirical research and the contributions of our study to the literature. The “Data” section describes the data used in this study. The “Empirical Strategy” section explains the empirical strategy. The “Results” section discusses the main results. The “Heterogeneity in the Relationship Between Early Criminal Involvement and School Dropout” section presents several heterogeneity analyses. Sensitivity checks are provided in the “Sensitivity Analysis” section. The “Conclusions” section concludes the study.

Previous Empirical Literature

Sweeten (2006) argues that evidence from previous studies on the relationship between criminal involvement and educational outcomes is limited because research samples used in these studies are often not representative and selection bias is not adequately addressed. His study contributes to the literature by using data from the National Longitudinal Survey of Youth 1997 (NLSY97), a nationally representative sample of young people, and by controlling for an extensive set of variables in the relationship between early criminal involvement (measured by arrest and court appearance) and educational attainment. The results suggest that first-time arrest and court appearance, when separately estimated and conditional on earlier delinquency and other demographic controls, positively affect the odds of high school dropout. Once court appearance and arrest are estimated together in the same model, the effect of arrest disappears, while the effect of court appearance remains a statistically significant predictor of school dropout. The author acknowledges that despite controlling for earlier delinquency, educational performance in middle school, school suspension and several demographic characteristics, his study does not completely eliminate the possibility of selection bias.

More recent studies have used the same data but have applied different econometric techniques to extensively address endogeneity between early criminal involvement and educational attainment (Hjalmarsson 2008; Ward and Williams 2015; Ward et al. 2015). Hjalmarsson (2008) uses an extensive set of controls to account for observed heterogeneity in the relationship between early arrest and incarceration and high school graduation, and applies techniques proposed by Altonji et al. (2000) to assess how sensitive this relationship is to selection on unobservables. In this sensitivity analysis, selection on unobservables is deduced from selection on observables. Under certain assumptions, one can estimate what share of selection on unobservables has to be present, compared to selection on observables, to conclude that the estimated effect is entirely driven by unobserved heterogeneity and thus does not represent a causal effect. Hjalmarsson (2008) shows that being incarcerated before age 16 lowers the probability of graduating from high school by about 26 percentage points. From the sensitivity exercise, she concludes that this relationship appears to partly represent a causal effect. Furthermore, Hjalmarsson (2008) finds that early arrest reduces the probability of high school graduation by 11 percentage points. She concludes that this relationship, however, does not appear to be causal, given the results from the sensitivity analysis.

Ward and Williams (2015) add to the literature by studying the relationship between self-reported delinquency by age 16 and graduation from high school using the NLSY97 data, thereby going beyond investigating the effect of criminal justice interactions, i.e. arrest and incarceration. They also explore the mechanisms involved in this relationship. Similarly to Hjalmarsson (2008), they apply the same test to assess the sensitivity of the results to the effect of unobservables. The study provides plausible evidence that delinquency by age 16 reduces the likelihood of graduating from high school by an estimated 7–10 percentage points, and graduating from college by an estimated 7–13 percentage points.

Using the same data, Ward et al. (2015) estimate whether being arrested, and separately being delinquent, leads to school dropout. To account for the effect of unobserved confounders, Ward et al. (2015) apply a multivariate mixed proportional hazard model, which takes into account the timing of first arrest relative to school leaving. The authors conclude that the effect of arrest on the probability that an individual leaves school is approximately twice the magnitude of the effect of delinquency.

Kirk and Sampson (2013) use data from the Project on Human Development in Chicago Neighborhoods Longitudinal Cohort Study, merged with administrative data on arrests from the police department. The study examines the relationship between juvenile arrest and high school graduation as well as college enrollment in a sample of students from Chicago public schools. To address selection bias, Kirk and Sampson (2013) apply propensity score matching techniques. The results suggest that arrested adolescents drop out of high school 22 % more often than non-arrested adolescents with similar background characteristics, whereas enrollment rates in four-year colleges are 16 % lower for arrested individuals relative to non-arrested individuals. Using Rosenbaum’s (2002) bounding approach, the authors show that these results are robust to unobserved heterogeneity.

Aizer and Doyle (2015) also use data from Chicago. These are administrative individual-level data on juvenile offenders who were arrested and referred to a juvenile court in Chicago, over a period of 10 years. The majority of these individuals are serious offenders and recidivists, since juvenile offenders who commit minor crimes are usually dealt with by the police only. Juvenile offenders referred to court are randomly assigned to different judges. Based on the decision of the judge, a juvenile offender either receives a sentence of incarceration followed by probation or probation only. Detention takes place in the Cook County Juvenile Temporary Detention Center, Illinois. These sentences typically last 1–2 months, including pretrial detention. Juveniles in detention cannot visit their regular school. In contrast, juvenile offenders on probation attend school and do not have further contact with the judge. The authors exploit random assignment of juvenile offenders to different judges to examine how sentencing affects high school completion. They link data from juvenile courts to administrative data on education for the same state. The study concludes that incarceration of juveniles decreases high school completion by around 13 percentage points. The strongest results are found for juveniles aged 15 and 16. The estimated effects from Aizer and Doyle (2015) represent the effects at the margin of being detained versus being only referred to the juvenile court but not detained because of facing a favorable judge.

Our study is most closely related to the study of Webbink et al. (2013), which estimates a twin fixed effects model using data on twin pairs from Australia. If the family and regional environment is similar for fraternal twins, the twin fixed effect model controls adequately for this unobserved heterogeneity. Identical twins additionally share the same genes. Another advantage of the twin fixed effect approach is that twins are prevalent across all educational levels, which contrasts with instrumental variable approaches or regression discontinuity designs that estimate effects at a very specific margin. For the sample of fraternal twins, Webbink et al. (2013) find that arrest before the age of 18 reduces educational attainment by up to 0.99 education years and lowers the probability of completing senior high school by up to 24 percentage points. These results are statistically significant, but relatively imprecise, because there are only 28 fraternal twin sets with variation in arrest status in the sample. The sample of identical twins includes 14 twin sets with variation in arrest, and as such the precision of these estimates is even lower, and the estimates are not statistically significant [the effect on educational attainment is −0.03 (standard error of 0.583) and the effect on the probability of high school graduation is −0.12 (standard error of 0.115)].

In addition to addressing the effect of observed and unobserved heterogeneity in the relationship between youth crime and education, this study contributes to the literature by a step-by-step analysis of the different factors that underlie this relationship. Our study informs on the relative explanatory power of different sets of factors in the relation between crime and school dropout, in particular, factors common to the school, to the class, to the family, and to sets of twins. This analysis allows for a comparison of the relative importance of these factors in explaining the relationship between crime and dropout. The current study therefore adds to the literature on the determinants of criminal involvement and school dropout, whereas the approach that we use can be applied in a similar manner to examine the role of different factors (e.g. school, family and peers) in other relationships that are likely to be selective on such factors. Moreover, it is particularly interesting to compare the estimates for twins with other fixed effects analyses, as one could potentially also think of arguments why twin fixed effects analysis leads to an overestimation of the identified effect, for example because twins might be more inclined to differ from one another in their behavior (e.g. Bound and Solon 1999; Sacerdote 2010). Although twin characteristics are commonly used in criminological research and considered as a valid instrument to control for genetic and environmental influences (e.g. Beaver 2008; Mocan and Tekin 2005; Nedelec et al. 2012; Barnes et al. 2014 and references therein), another concern regarding twin studies is to what extent a sample of twins is representative of the whole population and whether results are generalizable. Given the critiques on the use of twin variation, the comparison with other fixed effects analyses is interesting.

The second contribution of this study is the use of administrative longitudinal data that contain information on characteristics of all students in the first grades of secondary schools in the Netherlands. We can track these students in the crime register and in educational data over several years. Moreover, we can identify sets of siblings and sets of twins in these data. The substantial number of siblings and twins in our data ensures that there is sufficient statistical power for the analysis of the fixed effects models and that the effects can be precisely estimated. Studies that use a sampled population of twins can suffer from a lack of statistical power (e.g. Webbink et al. 2013).

The majority of previous studies have used self-reported data to examine the relationship between criminal involvement, typically measured by arrest and incarceration, and educational outcomes.Footnote 1 An advantage of using self-reported data is that they might contain criminal involvement for which young people were not caught by the police. At the same time, administrative data should be more reliable in collecting information on arrests and incarcerations. In surveys, individuals are asked to provide retrospective information on their involvement in criminal acts. The danger of using such information is that respondents might refuse to report some information, or they can purposely misreport their criminal past, or they can forget the exact timing of such involvement. The latter is especially a concern for studies that want to explore the effect of the timing of criminal involvement using self-reported data.

The strength of our data is also that they allow us to ensure that criminal involvement predates school dropout, thereby reducing concerns about reverse causality. Previous studies have rarely addressed, or even acknowledged, the problem of reverse causality in the relationship between criminal involvement and education. There are some positive exceptions. Ward et al. (2015) address the issue of reverse causality by allowing school leaving to affect the transition into delinquency and arrest. Webbink et al. (2013) partly account for reverse causality by controlling for school performance, namely the three-point measure of grades in primary school, the three-point measure of grades in secondary school, grade repetition, and teacher’s view on under-achievement. Although controlling for primary school performance is a valuable addition, it should be noted that controlling for school performance in secondary school implies that any effect that criminal involvement has on educational attainment and that operates through secondary school performance is taken away.

Our data enable us to explore the relationship between early criminal involvement and school dropout with respect to the timing of the first criminal involvement, the severity of criminal offences and heterogeneity across the population of juvenile offenders. Finally, by examining data from a European country, the Netherlands, this study contributes to the external validity of existing evidence that is predominantly established using data from the U.S.

Data

We use longitudinal administrative data on students in the first grades in (upper) secondary education in the Netherlands. We can follow their educational careers from the academic year 2005/2006 until the academic year 2011/2012. Using unique personal identifiers, these data are linked to the yearly based crime register for the period between 2005 and 2010, and to administrative data that contain information on individual and family background characteristics for the period from 1999 to 2010.Footnote 2 Around 0.5 % of students from the dataset on education are registered in secondary school in the Netherlands but live abroad, and they cannot be linked to other administrative datasets. Therefore, we excluded them from the sample. This results in 534,432 students who are enrolled in the first, second or third grade in secondary school, registered in October 2005. We follow these students during six academic years. For convenience, we refer to these students as the first, second and third cohort students, respectively.

Students in the Netherlands normally start their secondary education at age 12.Footnote 3 From the beginning of secondary education, they are tracked into different curriculum levels. This tracking is largely based on their results on a cognitive test, taken at the end of primary education (Cito score) and on the advice of the teacher in the last grade of primary school (College voor Toetsen en Examens 2015). There are three main tracks in secondary education: pre-vocational education (4 years), upper secondary general education (5 years) and pre-university education (6 years). Students who finish pre-vocational education continue their secondary education in upper secondary vocational education. It is common (mainly in upper secondary or pre-university education) that students are not immediately tracked but remain in a mixed track in the first and sometimes also the second year of secondary education.

This study focuses on school dropout as an outcome of educational attainment. School dropout is measured using the definition of the Dutch Ministry of Education, which states that students are considered as school dropouts unless they are registered in secondary or upper secondary education, or unless they finished upper secondary general education, pre-university education, or level 2 of upper secondary vocational education with a diploma. Students usually finish their secondary education when they are 18 or older.
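The dropout definition above can be read as a simple decision rule. The following is a minimal sketch of that rule, not the coding used in the study; the class `StudentStatus`, its field names and the track labels are hypothetical, and treating vocational diplomas at level 2 or higher as qualifying is an assumption made here.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical track labels; the real register codes are not given in the text.
QUALIFYING_GENERAL_TRACKS = {"upper_secondary_general", "pre_university"}


@dataclass
class StudentStatus:
    enrolled: bool                          # registered in (upper) secondary education
    diploma_track: Optional[str] = None     # track of the diploma obtained, if any
    vocational_level: Optional[int] = None  # level of a vocational diploma, if any


def is_dropout(status: StudentStatus) -> bool:
    """Return True if the student counts as a school dropout under the rule above."""
    if status.enrolled:
        return False
    if status.diploma_track in QUALIFYING_GENERAL_TRACKS:
        return False
    # Assumption: a vocational diploma at level 2 or above counts as qualifying.
    if (status.diploma_track == "upper_secondary_vocational"
            and status.vocational_level is not None
            and status.vocational_level >= 2):
        return False
    return True


print(is_dropout(StudentStatus(enrolled=False)))
print(is_dropout(StudentStatus(enrolled=False, diploma_track="pre_university")))
```

Under this reading, a student who has left school with only a pre-vocational diploma, or with a level 1 vocational diploma, is still classified as a dropout.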

Criminal involvement, the explanatory variable, reflects offending behavior of juveniles for which they were caught by the police. Juvenile offenders apprehended for minor offences are sent to the Halt bureau and registered in the yearly updated Halt data.Footnote 4 The juvenile justice system in the Netherlands is based on the restorative model of criminal justice. Halt is a restorative justice program implemented into the national system of juvenile justice. Juvenile offenders, provided they committed Halt-worthy offences, do not receive a criminal record, on the condition that they participate in the Halt program. All major offences for which juveniles can be sent to Halt are presented in Appendix. The Halt punishment includes different components, among them community service work, learning assignments, offering apologies to the victim, paying a fine, and participating in a training or a behavioral therapy program. Halt assignments last for a maximum of 20 h and are scheduled during the after-school time.

Juveniles who committed more serious offences, as well as recidivists, are registered in the annually updated Suspect Identification System (HKS, an acronym for Dutch ‘Het Herkenningsdienstsysteem’). These juvenile offenders are dealt with by the juvenile court. If they are found guilty, a task penalty, a training program, a psychological treatment, a fine, or detention can be imposed (Tak 2003; Van der Laan and Blom 2011). Juvenile offenders are obliged to follow education even when they are sent to a detention center. The duration of detention varies from 1 day to a maximum of 12 months for juvenile offenders under 16 years old, and to a maximum of 24 months for those aged 16 and above (Tak 2003).

Both the Halt and HKS data contain information on the criminal involvement of juveniles aged 12 and older, information on criminal behavior in the past and information on the severity of the criminal activity.Footnote 5 Juveniles aged between 12 and 16 are subject to juvenile justice law. Juveniles aged 17 and 18 are also accountable to juvenile justice law and, in particular cases of serious crime, to adult justice law.Footnote 6 To sum up, the measure of crime in our study reflects two types of registered offending behavior: (1) minor delinquent acts (e.g. vandalism, theft, trespassing, reckless behavior in public places) for which juvenile offenders are apprehended by the police and referred to the restorative juvenile justice program Halt; and (2) offences registered by the police in the HKS register for which juvenile offenders are referred to juvenile court.

The status of criminal involvement is measured on a yearly basis from 2005 to 2010, at least 1 year before the establishment of school dropout status. This is to make sure that school dropout does not precede criminal involvement. We label students as ‘criminally involved’ if they are observed in either the HKS or Halt data before the event of either school dropout or school completion, thereby addressing potential reverse causality. If criminal involvement takes place after the school dropout status is defined, the explanatory variable is coded as 0.
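The timing rule described above can be sketched as follows. This is an illustrative reading rather than the coding used in the study: the function name, its arguments and the use of whole calendar years are assumptions, and "at least 1 year before" is interpreted here as a register year strictly earlier than the year in which dropout or completion is established.

```python
from typing import Iterable, Optional

FIRST_REGISTER_YEAR = 2005
LAST_REGISTER_YEAR = 2010


def criminally_involved(register_years: Iterable[int],
                        exit_year: Optional[int]) -> int:
    """Code the explanatory variable as 1 or 0.

    register_years: years in which the student appears in the Halt or HKS data.
    exit_year: year in which dropout or completion is established,
               or None if neither event is observed (hypothetical convention).
    """
    for year in register_years:
        if not FIRST_REGISTER_YEAR <= year <= LAST_REGISTER_YEAR:
            continue  # outside the observed register window
        # Assumption: involvement must precede the exit year by at least one year.
        if exit_year is None or year < exit_year:
            return 1
    return 0


# Involvement in 2010 after dropout was established in 2008 is coded as 0.
print(criminally_involved([2010], exit_year=2008))
```

Coding later involvement as 0 keeps the ordering of events fixed, so that the indicator can only pick up offending that precedes the educational outcome.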

Table 1 shows the descriptive statistics of the student sample. Panel A presents information on criminal involvement and school dropout. 15.4 % of the students were criminally involved between 2005 and 2010. The average registered juvenile crime rate per year in our data is around 3.5 %. Around one third of all registered criminal involvement comes from the Halt data, and two thirds are from the HKS data. Around 0.7 % of the juveniles in our data were involved in serious criminal activities.Footnote 7 The data also show that 6 % of juvenile offenders have been criminally involved in the past.Footnote 8 Figure 1 presents the distribution of first criminal involvement by the age of the students in the sample. It shows that students most often are involved in crime when they are aged 15. This is similar to the peak of criminal involvement of individuals in other countries (e.g. Farrington 1986; Piquero et al. 2007). Figure 2 shows the distribution of school dropout by the age of students. It is clear from the figure that most students who are school dropouts leave school without obtaining a starter qualification when they are 17.

Table 1 Descriptive statistics

Fig. 1 Distribution of first criminal involvement by age, among criminally involved individuals

Fig. 2 Distribution of school dropout by age, among early school leavers

Panel A also shows that 20.3 % of juvenile offenders become school dropouts, whereas this percentage is 9.8 % for non-offenders. The average school dropout rate is 11.2 %. Panel B in Table 1 presents descriptive statistics of individual and household background characteristics, namely, grade cohort, age, gender, the level of urbanization of the area where the student lives, educational level of students, retention status in previous years, working status of (both) parents, single-parent household, household income (gross), education level of the mother, house ownership and the size of the household. Students in our data are on average aged 13.5 in 2005, most of them were born in the Netherlands (95 %) and around half of the students are girls (49 %). A large share of students in 2005 is still in a mixed track (38 %). Around 37 % of students are in pre-vocational education (9.9 % in the theoretical subtrack and 26.8 % in other subtracks of pre-vocational education). Around 11 % of students are in the secondary general education track and around 15 % of students are in the pre-university track.

We label students as grade repeaters if they are older than they are ‘supposed’ to be in a certain grade in 2005, i.e. they turn 13, 14 or 15 in the first, second and third grade of secondary school, respectively, before the 1st of October, which is the cut-off date for school entry in the Netherlands.Footnote 9 From Panel B in Table 1, it follows that 26.5 % of students repeated a grade before 2005. This is similar to the statistics reported in Ikeda and García (2014). As grade retention status is informative about students’ academic performance before 2005, we control for retention status in our analysis, in addition to the covariates presented above.
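The age rule for grade repeaters can be written as a short check. The sketch below is only an illustration: the function and constant names are hypothetical, and it assumes that the relevant cut-off is 1 October 2005, the registration date of the sample.

```python
from datetime import date

# Assumption: the cut-off is applied in the registration year of the sample.
CUTOFF = date(2005, 10, 1)

# Age that flags a repeater if reached before the cut-off, by grade in 2005.
REPEATER_AGE_BY_GRADE = {1: 13, 2: 14, 3: 15}


def is_grade_repeater(birth_date: date, grade: int) -> bool:
    """Return True if the student is older than expected for the grade."""
    flag_age = REPEATER_AGE_BY_GRADE[grade]
    # Turning flag_age strictly before the cut-off is equivalent to being
    # born strictly before the cut-off date shifted back by flag_age years.
    latest_birth_date = date(CUTOFF.year - flag_age, CUTOFF.month, CUTOFF.day)
    return birth_date < latest_birth_date


# A first-grade student born on 30 September 1992 turned 13 before the cut-off.
print(is_grade_repeater(date(1992, 9, 30), grade=1))
```

Note that this rule flags any student who is older than expected, so it cannot distinguish grade repetition in secondary school from late entry or repetition in primary school.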

Using information on date of birth and the personal identifiers of parents, we identify whether students in our sample have (non-)twin siblings in the first, second or third cohorts.Footnote 10 Panel B shows that 24.2 % of the students have a sibling in one of these cohorts. 2.6 % of the students are twins and 1.7 % of students are twins of the same gender. Panel C presents descriptive statistics for school dropout and criminal involvement among siblings, twins and same-gender twins. The rates of school dropout and criminal involvement are marginally lower among twins and same-gender twins.

Empirical Strategy

We estimate the relationship between criminal involvement and school dropout using the following linear regression modelFootnote 11:

$$Y_{ij} = \alpha_{0} + \alpha_{1} D_{ij} + \alpha_{2} X_{ij} + u_{ij} + \varepsilon_{ij} \tag{1}$$

where \(Y_{ij}\) is a binary variable which takes value 1 if individual \(i\) in family \(j\) is a school dropout and zero otherwise. The variable \(D_{ij}\) indicates if individual \(i\) was criminally involved, \(X_{ij}\) represents a vector of observed individual and household characteristics, and \(\varepsilon_{ij}\) is a random (zero mean) error term. In our baseline model (Model 0), we estimate the association between criminal involvement and school dropout without including controls. In Model 1, we include an extensive set of controls (i.e. the household, parent and student characteristics shown in Table 1). The estimated relationship may still be influenced by characteristics that are not observed, such as school and class effects, the family environment, peer effects and genetic endowments. The term \(u_{ij}\) represents this unobserved heterogeneity.

In the empirical literature, sibling and twin fixed effects approaches have been extensively used to better control for unobserved confounding factors (see an overview in Miller et al. 1995; Kohler et al. 2011; Holmlund et al. 2011). The intuition of these analyses is that siblings and twins share a similar social environment from the earliest years of life (e.g. McGuire and Segal 2013) and have an overlap in their genetic makeup (Bouchard et al. 1990). Therefore, using information on siblings and twins provides an opportunity to control for many socio-economic and genetic endowments. We deliberately use information on siblings who are close in age, because such siblings are more likely to be exposed to family cycles (shocks) in a similar way (e.g. parental income, death of a family member, divorce) than siblings who are far from each other in age. Moreover, siblings who are close in age are more likely to have the same peers. A fixed effect model that would account for characteristics of siblings who are far from each other in age is expected to explain away (comparably) little variation in the outcome.

We address unobserved heterogeneity by estimating the following fixed effect model:

$$Y_{ij} = \beta_{0} + \beta_{1} D_{ij} + \beta_{2} X_{ij} + \varphi_{j} + e_{ij} .$$
(2)

The variables \(Y_{ij}\), \(D_{ij}\) and \(X_{ij}\) are defined as in Eq. 1, and \(e_{ij}\) is a random (zero mean) error term. The term \(\varphi_{j}\) represents unobservable effects common to all members of group \(j\), for example family effects common to all siblings (twins) in family \(j\). We estimate five different models in which we use the following specifications of the fixed effects term: school fixed effects (Model 2), class fixed effects (Model 3), sibling fixed effects (Model 4), twin fixed effects (Model 5) and same-gender twin fixed effects (Model 6).
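The fixed effect in Eq. 2 can be illustrated with the within transformation: subtracting group means from \(Y_{ij}\) and \(D_{ij}\) removes \(\varphi_{j}\), and OLS on the demeaned data gives the fixed effects slope. The sketch below omits \(X_{ij}\) for brevity and uses made-up data; the function name is hypothetical and this is not the authors' implementation.

```python
from collections import defaultdict


def within_slope(y, d, group):
    """Fixed-effects (within) estimate of the slope of y on d.

    Demeans y and d within each group, then runs OLS through the origin
    on the demeaned data. Groups without variation in d contribute zero
    to both the numerator and the denominator.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # sum of y, sum of d, count
    for yi, di, g in zip(y, d, group):
        sums[g][0] += yi
        sums[g][1] += di
        sums[g][2] += 1
    num = den = 0.0
    for yi, di, g in zip(y, d, group):
        y_bar = sums[g][0] / sums[g][2]
        d_bar = sums[g][1] / sums[g][2]
        num += (di - d_bar) * (yi - y_bar)
        den += (di - d_bar) ** 2
    if den == 0:
        raise ValueError("no within-group variation in d")
    return num / den


# Toy data (made up): three families with two siblings each.
family = ["a", "a", "b", "b", "c", "c"]
d = [1, 0, 1, 0, 1, 1]  # family c has no variation in d
y = [1, 0, 0, 0, 1, 1]
beta_1 = within_slope(y, d, family)
```

In this toy example only families a and b identify the slope, because both siblings in family c have the same criminal involvement status.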

We compare the estimated coefficient of interest in each model with that of the baseline model. In this way, we determine what proportion of the unconditional correlation can be explained by controlling for observable and unobservable factors. Model 2 absorbs unobserved school fixed effects. Model 3 takes into account unobserved class fixed effects. Model 4 absorbs unobserved (and constant) family effects by controlling for sibling fixed effects.

It is frequently argued that the social environment of non-twin siblings is less similar than the social environment of twins. The timing of birth and the period in which children are raised are different and there can also be parental life-cycle differences, such as parents’ age and socio-economic conditions (see Behrman and Taubman 1976; Behrman et al. 1980). Fraternal (dizygotic) twins share on average 50 % of their genes, similar to non-twin siblings, while identical (monozygotic) twins share 100 % of their genes. Additionally, twins are more likely to grow up in a more similar environment than non-twin siblings (see Kohler et al. 2011). We estimate Model 5 and include a fixed effect for twins (both fraternal and identical), which will capture any unobserved heterogeneity specific to twins. Ideally, we would like to continue in this fashion and estimate a model which includes a fixed effect for identical twins. The economic literature often uses information on identical twins to make inferences about causality (see Holmlund et al. 2011 and references therein). In addition to the higher genetic overlap, identical twins tend to generate their own environments more similarly than fraternal twins (Stenberg 2013). Parents, moreover, treat identical twins more similarly (Borkenau et al. 2002). Therefore, the inclusion of an identical twin fixed effect accounts better for individual-specific effects, in addition to family-specific effects. Our data do not allow us to identify identical twins. Nevertheless, we can identify same-gender twins. Restricting the sample to same-gender twins increases the share of identical twins in the sample. Therefore, in Model 6, we include a fixed effect for same-gender twins.Footnote 12 It is known that approximately one-third of twins are identical, one-third are fraternal twins of the same gender, and one-third are fraternal twins of the opposite gender (e.g. Keith et al. 1995; Torrey et al. 1994). 
We can deduce that around half of same-gender twins in our sample are expected to be identical.Footnote 13 Therefore, using the same-gender twin effects estimators does not only control for environmental influences common to the sets of twins (e.g. parental education, family environment), but also takes into account genetic similarities between twins. At the same time, we acknowledge that same-gender twins can be different with regards to some features and they are not necessarily confronted with the same social influences (e.g. same-gender twins can have different peers; parents can treat same-gender twins unequally). Failure to account for such differences would lead to a bias in the estimates.
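The one-half figure follows from the population shares of twin types. As a short worked calculation, assuming that identical twins are always of the same gender and that the one-third shares apply to our sample:

$$P(\text{identical} \mid \text{same gender}) = \frac{1/3}{1/3 + 1/3} = \frac{1}{2} .$$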

The relationship between criminal involvement and school dropout across subgroups may be different and, therefore, we also estimate the following model:

$$Y_{ij} = \delta_{0} + \delta_{1} D_{ij} + \delta_{2} D_{ij} Z_{ij} + \delta_{3} Z_{ij} + \delta_{4} X_{ij} + \varphi_{j} + e_{ij} .$$
(3)

The variables in Eq. 3 are similar to those in Eq. 2, but we now additionally include a characteristic \(Z_{ij}\) and its interaction with the criminal involvement indicator, \(D_{ij} Z_{ij}\). The characteristic \(Z_{ij}\) indicates one of the following: juveniles who committed serious offenses, recidivists, juveniles who were sent to a restorative justice program, girls, juveniles who live in urban areas, and students in vocational education.
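To illustrate how the coefficients in Eq. 3 are read, consider a simplified version without \(X_{ij}\) and \(\varphi_{j}\), with binary \(D_{ij}\) and \(Z_{ij}\) (for example, \(Z_{ij} = 1\) for girls). In that saturated case the OLS coefficients are differences of cell means, and \(\delta_{2}\) is a difference-in-differences of dropout rates. The function names and data below are hypothetical and made up for illustration.

```python
def cell_mean(y, d, z, d_val, z_val):
    """Mean of y among observations with d == d_val and z == z_val."""
    vals = [yi for yi, di, zi in zip(y, d, z) if di == d_val and zi == z_val]
    if not vals:
        raise ValueError(f"empty cell d={d_val}, z={z_val}")
    return sum(vals) / len(vals)


def saturated_interaction(y, d, z):
    """Coefficients of y = d0 + d1*D + d2*D*Z + d3*Z for binary D and Z."""
    m00 = cell_mean(y, d, z, 0, 0)
    m10 = cell_mean(y, d, z, 1, 0)
    m01 = cell_mean(y, d, z, 0, 1)
    m11 = cell_mean(y, d, z, 1, 1)
    return {
        "delta_0": m00,
        "delta_1": m10 - m00,              # association for Z = 0
        "delta_2": (m11 - m01) - (m10 - m00),  # extra association for Z = 1
        "delta_3": m01 - m00,
    }


# Toy data (made up), two observations per (D, Z) cell.
d = [0, 0, 0, 0, 1, 1, 1, 1]
z = [0, 0, 1, 1, 0, 0, 1, 1]
y = [0, 0, 0, 0, 1, 0, 1, 1]
coefs = saturated_interaction(y, d, z)
```

The association between criminal involvement and dropout for the \(Z_{ij} = 1\) group is \(\delta_{1} + \delta_{2}\). This cell-mean reading does not apply to characteristics that are only defined for criminally involved students, such as serious offenses.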

Finally, we are interested in how the relationship between criminal involvement and school dropout is influenced by the age of first criminal involvement. For this purpose, we generate a set of dummy variables that indicate, for each age (in years), whether the student became criminally involved for the first time at that age. We include these indicator variables in the estimation model instead of \(D_{ij}\). Hence, the estimation model becomes:

$$Y_{ij} = \theta_{0} + \sum\limits_{a = 12}^{17} {\tau_{a} } D_{aij} + \theta_{1} X_{ij} + \varphi_{j} + e_{ij} .$$
(4)

Subscript \(a\) in Eq. 4 denotes the age at which a student was criminally involved for the first time, where \(a = 12\) refers to first criminal involvement at age 12 or earlier, and the reference category is ‘no criminal involvement’.
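The construction of the indicators \(D_{aij}\) in Eq. 4 can be sketched as follows. The function name and the coding of never-involved students as `None` are hypothetical choices; pooling ages above 17 into the last category is an assumption of this sketch.

```python
def first_involvement_dummies(age_first, min_age=12, max_age=17):
    """Return {age: 0 or 1} dummies for age at first criminal involvement.

    age_first is None for students who were never criminally involved
    (the reference category, all dummies zero). Ages at or below min_age
    are pooled into min_age, matching a = 12 in Eq. 4. Pooling ages above
    max_age into max_age is an assumption of this sketch.
    """
    dummies = {a: 0 for a in range(min_age, max_age + 1)}
    if age_first is not None:
        a = min(max(age_first, min_age), max_age)
        dummies[a] = 1
    return dummies


never = first_involvement_dummies(None)  # reference category
early = first_involvement_dummies(10)    # pooled into a = 12
mid = first_involvement_dummies(15)
```

Each criminally involved student has exactly one dummy equal to one, so the coefficients \(\tau_{a}\) are all measured relative to students without criminal involvement.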

Results

Our main estimation results are shown in Table 2. We check whether the change in the criminal involvement coefficient between each model and the previous one is statistically significant, using a seemingly unrelated estimation test (Weesie 1999). This test combines the estimation results under one parameter vector and generates a simultaneous covariance matrix of the robust type. The p-values resulting from this test are presented in the last row in Table 2. The unconditional correlation in the first column of Table 2, which we refer to as Model 0, shows that criminal involvement is associated with a 10.7 percentage point increase in school dropout. This coefficient is statistically significant at the 1 % level. When we include the extensive set of control variables in Model 1, this association equals 8 percentage points, and remains statistically significant at the 1 % level. The results of this model suggest that 25 % of the unconditional correlation is explained by including observed family and individual background characteristics.Footnote 14 The parameter difference for Crime is statistically significant between Model 1 and Model 0. In our further models, we control for school, class, family and twin fixed effects. Because these models aim to address unobserved heterogeneity, conditional on an extensive set of observable characteristics related to family and individual background, we name these analyses cumulative.Footnote 15

Table 2 The relationship between criminal involvement and school dropout (cumulative)

Fixed effect models effectively use only observations that contain variation in the explanatory variable (i.e. criminal involvement) within the set to identify its effect. This also implies that the smaller the number of groups with variation in criminal status, the lower the precision of the estimate. The last row in Table 2 presents the number of students for whom there is variation in criminal status within a set (i.e. schools, classes, siblings and twins). Model 2 presents the results when we account for school fixed effects. In this model, 534,375 out of 534,432 students are in schools in which there is variation in criminal involvement across students. In Model 3, 458,163 students are in classes in which there is variation in criminal involvement across students. Model 4 shows that out of 129,593 students with siblings in the sample 25,725 have a sibling whose criminal involvement status is different.Footnote 16 Similarly, Model 5 shows that 2159 out of 13,816 twins vary in criminal involvement status from their twin sibling. Finally, the same-gender twin sample (Model 6) contains information on 9278 twins, and 1187 of them vary in criminal involvement status compared to their same-gender twin brother or sister.
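The counts of students in sets with variation in criminal status can be obtained along the following lines. This is a minimal sketch with a hypothetical function name and made-up data, not the procedure used to produce Table 2.

```python
from collections import defaultdict


def count_in_varying_groups(d, group):
    """Number of observations in groups where d takes more than one value."""
    seen = defaultdict(set)
    for di, g in zip(d, group):
        seen[g].add(di)
    return sum(1 for g in group if len(seen[g]) > 1)


# Toy data (made up): families a and c have variation in d, family b does not.
group = ["a", "a", "b", "b", "c", "c", "c"]
d = [1, 0, 0, 0, 1, 1, 0]
n_identifying = count_in_varying_groups(d, group)
```

Only the observations counted here contribute to the fixed effects estimate of the criminal involvement coefficient.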

Model 2 presents the results of the analysis when school fixed effects are taken into account.Footnote 17 The coefficient for criminal involvement is estimated to be 7.9 percentage points and is highly statistically significant. This suggests that unobserved heterogeneity within schools adds little to the previous estimate in explaining away the unconditional correlation between criminal involvement and school dropout (3 %). This can be explained by the fact that populations in secondary schools in the Netherlands are rather homogeneous compared to school populations in other countries, such as the U.S. There is a large choice of schools in the Netherlands and all of them are equally financed by the Dutch government. It also follows from our data that the proportion of criminally involved students centers closely around the overall average of 16 % in the large majority of secondary schools.Footnote 18 The parameter difference for Crime between the current model and the previous model (Model 1) is, although low in magnitude, statistically significant. We obtain similar results when we control for class fixed effects, in Model 3. In particular, the coefficient falls only to 7.7 percentage points and remains highly statistically significant. The difference in the estimates for criminal involvement between Model 2 and Model 3, although small in magnitude, is statistically significant.

We further add sibling fixed effects in the analysis to account for family effects (Model 4). There is a statistically significant positive association between criminal involvement and school dropout in this model, equal to 5.1 percentage points. This indicates that an additional 24 % of the unconditional correlation can be attributed to unobservable family-specific characteristics. The change in the criminal involvement coefficient from Model 3 to Model 4 is statistically significant. In contrast, the difference in the criminal involvement coefficient between Model 4 and Model 5, when the twin fixed effects are included, is not statistically significant. The association for the twin subsample is estimated to be 4.3 percentage points and it is statistically significant at the 5 % level. We conclude that 8 % of the unconditional correlation between criminal involvement and school dropout can be attributed to genetic and environmental factors that are twin specific, in addition to variation already explained by controlling for observable characteristics and unobservable family characteristics. Model 6 uses information on same-gender twins, for which the share of identical twins in the sample is higher. The estimated positive association of 3 percentage points is marginally statistically significant. This result implies that an additional 12 % of the unconditional correlation can be attributed to genetic and environmental factors that are specific to same-gender twins.
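The incremental shares reported for the cumulative analysis correspond to the drop in the coefficient between consecutive models, divided by the Model 0 coefficient. The sketch below applies this to the rounded coefficients quoted in the text; because the inputs are rounded, the results are approximations and need not match the reported shares exactly. The variable and function names are hypothetical.

```python
# Criminal involvement coefficients by model, as quoted (rounded) in the text.
coef = {0: 0.107, 1: 0.080, 2: 0.079, 3: 0.077, 4: 0.051, 5: 0.043, 6: 0.030}


def incremental_share(coef, model):
    """Share of the Model 0 coefficient explained when moving to `model`."""
    return (coef[model - 1] - coef[model]) / coef[0]


def cumulative_share(coef, model):
    """Share of the Model 0 coefficient explained by `model` in total."""
    return (coef[0] - coef[model]) / coef[0]


# Incremental shares in percent, rounded to one decimal.
shares = {m: round(100 * incremental_share(coef, m), 1) for m in range(1, 7)}
total_model_6 = round(100 * cumulative_share(coef, 6), 1)
```

The incremental shares sum to the cumulative share for Model 6, which is the part of the unconditional correlation explained by all controls and fixed effects together.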

The estimate after controlling for same-gender twin effects can either represent factors that differ between same-gender twins and affect both crime and education, or the true effect resulting, for example, from the interaction with criminal justice. The difference between Models 5 and 6 represents an increase in the share of identical twins from 33 to 50 %, which leads to a larger average genetic overlap (from approximately 67 to 75 %), as well as a possibly larger similarity in environmental factors. One could extrapolate from this that a sample of only identical twins would lead to a near-zero estimate. However, these are point estimates with a substantial confidence interval. In fact, the estimates for criminal involvement from Models 5 and 6 are not statistically significantly different from each other. Moreover, Model 6 does not account for all possible confounding factors and therefore can still be driven by selection. Overall, the result from this model implies that any possible direct causal effect from criminal involvement to school dropout is likely to be relatively small.
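The average genetic overlap figures can be reproduced from the twin shares, taking an overlap of 100 % for identical twins and 50 % for fraternal twins. With one-third identical twins among all twins and one-half among same-gender twins:

$$\tfrac{1}{3} \cdot 100\,\% + \tfrac{2}{3} \cdot 50\,\% \approx 67\,\% , \qquad \tfrac{1}{2} \cdot 100\,\% + \tfrac{1}{2} \cdot 50\,\% = 75\,\% .$$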

It is difficult to decompose the role of school and class factors from family factors in the estimates presented in Table 2, because observable characteristics included in the model with school (class) fixed effects and the model with family fixed effects (e.g. working status of parents, single-parent household, education of mother, the level of urbanization, household characteristics, see Table 2) can be part of school (class) and family characteristics or can be correlated with these characteristics. Therefore, we analyze how much school, class, family and twin fixed effects explain if they are added conditional only on age, gender and ethnicity. The results of this analysis are presented in Table 3. We call this analysis separate, to contrast the analysis presented in Table 2. Using a different set of controls affects the overall R-squared: it is 8.8 % for Model 1 in Table 2 and only 2.0 % for Model 1 in Table 3.

Table 3 The relationship between criminal involvement and school dropout (separate)

It follows from Model 1 that 4.7 % of the unconditional correlation between early criminal involvement and school dropout [\(\delta\) = 0.107*** (standard error of 0.001)] is explained by controlling for age, gender and ethnicity. Model 2 shows that using school fixed effects adds relatively little to explaining the unconditional correlation between youth crime and school dropout, i.e. 9.3 % only. Model 3 shows that using class fixed effects explains 13.1 % of the unconditional correlation between youth crime and school dropout. Family characteristics, estimated using sibling fixed effects, explain a larger part of the unconditional correlation between early criminal involvement and school dropout, namely 49.5 %. The twin fixed effects model shows results very similar to those from the sibling fixed effects model; it explains 51.4 % of the relationship between criminal involvement and school dropout. Finally, controlling for same-gender twin fixed effects explains 79.2 % of the unconditional correlation between criminal involvement and school dropout. Using a seemingly unrelated estimation test (Weesie 1999), we check whether the change in the criminal involvement coefficient between each model and the previous one is statistically significant. The results of this test are presented in the last row of Table 3. This test shows that the difference in the parameter Crime between the given model and the previous model is statistically significant in all cases, with the exception of the difference between Model 4 and Model 5.

Heterogeneity in the Relationship Between Early Criminal Involvement and School Dropout

In this section, we estimate to what extent the relation between criminal status and dropout differs for different types of individuals. The estimation results of Eq. 3 are presented in Table 4. Panel A of this table suggests that the association between criminal involvement and school dropout is stronger by 21.3 percentage points (in the same-gender twin sample) if students were involved in severe criminal activities. It is likely that juveniles who committed severe criminal offences are sent to detention (Tak 2003; Van der Laan and Blom 2011). Therefore, this result is in line with previous studies showing that incarceration is likely to reduce the probability of graduation from high school by 26 percentage points (Hjalmarsson 2008) or 13 percentage points (Aizer and Doyle 2015), depending on the approach used to identify this effect and the reference group. The baseline estimates, representing non-serious crime, are similar to the estimates in our main results, in Table 2 and Table 3, when we do not distinguish between overall and severe criminal involvement. Another interesting finding is that the estimated difference in the effect between severe and overall criminal behavior is rather similar across estimation models (from Model 1 to Model 6). Most importantly, the coefficient of the interaction term does not change much when moving from the model with controls (Model 1) to the model with sibling fixed effects (Model 4). This is in sharp contrast with the estimated effect of overall criminal involvement, in particular when moving from Models 1–3 to Model 4. Hence, the estimated difference appears not to be driven by observable characteristics or unobserved family fixed effects. The estimates from Model 5 and Model 6 are imprecisely estimated and, therefore, somewhat inconclusive. The consistency in the coefficient between Models 1–4 relies on very small standard errors. 
Hence, the consistently stronger estimate for severe crime can possibly represent a true treatment effect of crime on school dropout, for example through an interruption in the educational process due to interaction with criminal justice, stigma effects or accumulation of criminal capital.

Table 4 The relationship between criminal involvement and school dropout across different demographic groups

Panel B shows how the association between criminal involvement and school dropout differs between juveniles who were sent to the restorative justice program Halt for a minor criminal offense and students in traditional juvenile justice. Restorative justice is targeted at juveniles who are first-time offenders and those who are involved in relatively less serious crimes. The coefficient for the interaction term indicates that the estimated relation between criminal involvement and school dropout is lower for students who went to restorative justice. This coefficient remains similar and statistically significant through all models, with the exception of the same-gender twin fixed effects model. In the last model, the coefficient for the interaction term has a negative sign but it is imprecisely estimated and not statistically significantly different from zero. Panel C compares reoffending juveniles with juveniles who were criminally involved only once. Recidivists are more likely to drop out from school; the interaction term remains statistically significant up to and including the sibling fixed effect model. The estimate is smaller and not statistically significant in the twin fixed effect models, but the level of precision in these models is relatively low.

The results from Panel D show that the association between crime and school dropout is stronger for boys in the more basic models, but these differences become statistically insignificant in richer models, in particular in the sibling fixed effects model and in the twin fixed effects model. Results from interactions with an urbanization level dummy (Panel E) suggest that the association between crime and school dropout is (slightly) stronger for juvenile offenders who live in urban areas, but these differences are small and the estimates in models with the sibling and twin fixed effects are statistically insignificant.

In Panel F, we use an interaction term for being criminally involved and being a student in a vocational track. We discard information on criminal involvement from grades 1 and 2 because tracking is often postponed to year 2 and sometimes also to year 3 of secondary education (see Table 1). Accordingly, we have recoded the status of criminal involvement from grade 3 and higher to make sure that criminal involvement did not affect tracking. The estimation results suggest that the relation between criminal involvement and school dropout is (slightly) stronger for juveniles in vocational education. The coefficient of the interaction term changes little across the estimated models, although it is not statistically significantly different from zero in the last two columns, because of the relatively high imprecision of these estimates.

We further estimate Eq. 4 to examine whether the timing of first criminal involvement matters for the estimated association between criminal involvement and school dropout. The estimates presented in Table 5 show that the unconditional correlation between criminal involvement at early ages and school dropout is stronger compared to criminal involvement in later adolescence. These estimates partially converge after we control for background characteristics, and sibling and twin fixed effects. The estimate for first criminal involvement at age 17 and later becomes statistically insignificant when we include sibling fixed effects. Additionally, there remains some difference in the estimated effect of crime before age 12 and crime at later ages, also in the sibling fixed effect model. All age-specific estimates are statistically insignificant in the final column, but this is mainly driven by the high imprecision of those results.

Table 5 The relationship between age at first criminal involvement and school dropout

Sensitivity Analysis

In Tables 2 and 3, we observe that the number of observations across Models 2–6 decreases in every step. Therefore, differences in estimates across the estimation models can also reflect differences in sample composition. In this section, we estimate the model without fixed effects, but controlling for the different sibling and twin subsamples. The results presented in Table 6 show that the estimates for all different subsamples are similar. In addition, using interaction terms we find that the effects of criminal involvement on school dropout are not statistically significantly different between students who have a sibling, a twin sibling or do not have any siblings in the sample we use.

Table 6 The relationship between criminal involvement and school dropout across different samples (without using fixed effects models)

As indicated above, criminal involvement is measured on a yearly basis, before the status of school dropout is identified. An alternative way to measure the effect of criminal involvement on school dropout would be to fix the measurement of criminal involvement to a certain age. Because the peak of school dropout in our data is when juveniles are 17 years old, as shown in Fig. 2, we choose age 16 as a threshold of criminal involvement. Table 7 presents estimates of the relationship between criminal involvement before age 16 and school dropout. These estimates are marginally higher than the estimates presented in our main analyses, in Tables 2 and 3. From this test we conclude that the way in which we measure criminal involvement is not likely to substantially affect our results.

Table 7 The relationship between criminal involvement by age 16 and school dropout (cumulative)

A final sensitivity check is related to a possible limitation of sibling and twin fixed effects approaches. In particular, it is possible that siblings and twins can be inclined to differentiate themselves from each other, which would reduce comparability between them. In such a way, sibling and twin fixed effects models might actually lead to overestimated effects. To test this, we remove the fixed effects and include on the right-hand side of the regression a dummy variable which reflects the school dropout status of the sibling or the twin. If there are more than two siblings (twins) in the set, we use the mean value of dropout status of other siblings (twins). These results are presented in Table 8. Compared to the fixed effects results in our main analyses, the coefficients for criminal involvement obtained from this alternative approach are higher. This suggests that a possible overestimation of using sibling or twin fixed effects through the described ‘differentiation’ mechanism is unlikely to operate here.
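The regressor used in this check, the dropout status of the sibling or the mean dropout status of the other siblings in the set, is a leave-one-out group mean. A minimal sketch with a hypothetical function name and made-up data:

```python
from collections import defaultdict


def other_sibling_dropout(dropout, family):
    """Mean dropout status of the other siblings in the same family.

    For a pair this is simply the sibling's own dropout status. Returns
    None for individuals without a sibling in the data.
    """
    total = defaultdict(float)
    count = defaultdict(int)
    for yi, f in zip(dropout, family):
        total[f] += yi
        count[f] += 1
    result = []
    for yi, f in zip(dropout, family):
        if count[f] < 2:
            result.append(None)
        else:
            # Exclude the individual's own outcome from the family mean.
            result.append((total[f] - yi) / (count[f] - 1))
    return result


# Toy data (made up): a pair, a set of three siblings, and a singleton.
family = ["a", "a", "b", "b", "b", "c"]
dropout = [1, 0, 1, 1, 0, 0]
sibling_mean = other_sibling_dropout(dropout, family)
```

Excluding the individual's own outcome avoids a mechanical correlation between the regressor and the dependent variable.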

Table 8 The relationship between criminal involvement and school dropout, without using fixed effects

Conclusions

In this study we have analyzed the relationship between adolescent criminal involvement and school dropout and have examined which factors underlie this relationship. Various models are estimated, in which we control for observable family and individual characteristics, school fixed effects, class fixed effects, sibling fixed effects, twin fixed effects, and same-gender twin fixed effects.

We find that controlling for an extensive set of observable family and individual characteristics explains around one fourth of the unconditional correlation between criminal involvement and school dropout. Separately controlling for unobserved school characteristics (conditional on age, gender and ethnicity) in the model with school fixed effects explains only around 9 % of this association, while including class fixed effects explains around 13 %. Generalization of this finding is limited by the fact that populations in secondary schools in the Netherlands are rather homogeneous and the share of criminally involved individuals in the majority of schools is rather similar. The sibling fixed effects model and the twin fixed effects model explain around half of the association between criminal involvement and school dropout (50 and 51 %, respectively). Controlling for same-gender twin fixed effects explains up to 73 % of the unconditional correlation between criminal involvement and school dropout. We conclude that family-specific characteristics and factors that are common to same-gender twins, but not common to siblings, play an important role in explaining the unconditional correlation between criminal involvement and school dropout, while the role of school-specific characteristics is relatively small. The part of the unconditional correlation that remains unexplained can represent either characteristics that we cannot control for and that simultaneously affect crime and education, such as remaining genetic differences in fraternal twins or differences in the generated environments among twins, or a true treatment effect of early criminal involvement on school dropout. Having no information on individual characteristics that might be different within the set of twins (e.g. motivation, ability level) makes it difficult to further separate the effect of criminal involvement on school dropout from the effect of remaining unobserved heterogeneity. 
Our results suggest that, even if there is a true treatment effect of criminal involvement on school dropout, it is likely to be relatively small. One, however, could also argue that criminal involvement of one twin in the family can affect the educational outcomes of another twin, and therefore it is also possible that the true treatment effect of criminal involvement on school dropout is underestimated in models that use twin fixed effects (see e.g. Sacerdote 2010).

We further find that the association between serious criminal behavior and school dropout is stronger than between overall criminal involvement and school dropout. The heterogeneity analysis shows that the estimate for this difference remains constant across different estimation models, in contrast to the base estimate for general criminal involvement which strongly reduces in the sibling and twin fixed effect models. The estimated difference between overall and severe crime does not appear to be driven by selection. We conclude that it is likely that at least part of the association between severe crime and school dropout represents a causal effect. If the difference between severe crime and overall crime is causal, then the total effect of severe crime would be causal (and positive) as well, unless the base effect of overall crime on dropout is negative, which is very unlikely.

The current study has contributed to the further understanding of the relationship between criminal involvement and educational outcomes. While this evidence is important, it remains unclear exactly which mechanisms are responsible for the negative effect of early criminal involvement on educational attainment. This issue deserves more attention in future studies. We further acknowledge that in the current study we do not measure crimes that remain undiscovered, which are highly prevalent according to information from self-reported surveys (see Van der Laan and Blom 2011). Future studies can combine information on registered and self-reported youth crime to estimate the relationship between such forms of crime and educational outcomes. Further research can also focus on assessing the relationship between criminal involvement and measures of academic performance, such as test scores.