Non-random Study Attrition: Assessing Correction Techniques and the Magnitude of Bias in a Longitudinal Study of Reentry from Prison

  • Original Paper
Journal of Quantitative Criminology

Abstract

Objectives

Longitudinal data offer many advantages to criminological research, yet they suffer from attrition, which can produce sample selection bias. Attrition may undermine valid inference by introducing systematic differences between the retained and attrited samples. We explored (1) whether attrition biases the correlates of recidivism, (2) the magnitude of that bias, and (3) how well correction methods account for it.

Methods

Using data from the LoneStar Project, a representative longitudinal sample of reentering men in Texas, we examined correlates of recidivism using official measures under four sample conditions: full sample, listwise deleted sample, multiply imputed sample, and two-stage corrected sample. We compared the results of regressing rearrest on a range of covariates derived from a pre-release baseline interview across the four sample conditions.

Results

Attrition bias was present in 44% of variables and null hypothesis significance tests differed for the correlates of recidivism in the full and retained samples. The bias was substantial, altering effect sizes for recidivism by a factor as large as 1.6. Neither the Heckman correction nor multiple imputation adequately corrected for bias. Instead, results from listwise deletion most closely mirrored the results of the full sample with 89% concordance.

Conclusions

It is vital that researchers examine attrition-based selection bias and recognize its implications for their data when generating evidence of theoretical, policy, or practical significance. We outline best practices for examining the magnitude of attrition and analyzing longitudinal data affected by sample selection.

Notes

  1. Other approaches include Manski bounds, Cosslett’s selection model, Newey’s series estimator, Powell’s two-step semiparametric estimator, Robinson’s estimator, along with other statistical methods such as xtARGLS, support vector regression, cluster-based estimating, and kernel mean matching (Winship and Mare 1992). Many of these models with limited dependent variables are estimated using LIMDEP’s statistical software package.

  2. There are circumstances, however, in which reincarceration may actually make someone easier to locate and study as they are in custody (Fahmy et al. 2019). But, overall, it is more difficult to make inferences about this group than those in the general population.

  3. In some cases, researchers rely on propensity score matching (PSM) to correct for bias in their samples (Dehejia and Wahba 2002), though this method is not relevant here and is beyond the scope of the paper. PSM matches subjects on relevant variables and should reduce potential bias due to confounding variables (Rosenbaum and Rubin 1983). Although some researchers suggest that PSM holds promise over a listwise deleted sample (Dehejia and Wahba 1998; Lennox et al. 2012), the technique rests on a strict set of assumptions that are difficult to meet in many cases (Dehejia 2005), may not be appropriately estimated, and does not provide a universal correction for selection (Dehejia 2005; Smith and Todd 2005). For instance, PSM’s conditional independence assumption holds that assignment to the treatment group (e.g., retained versus attrited) depends on relevant observed characteristics (Campbell et al. 2020; Tucker 2010). Because of this assumption, PSM cannot account for the hidden bias created by unobserved characteristics (Tucker 2010; Wolfolds and Siegel 2019). When the goal is to control for endogeneity that arises from unobservable characteristics, as in our case, the Heckman selection model is superior (Lennox et al. 2012).

  4. Although the Heckman two-step correction can be completed in two stages, the preferred method involves a Full Information Maximum Likelihood (FIML) model which estimates the equations simultaneously to reduce model error.

  5. We used the following search terms in a Boolean fashion: Heckman AND (“two-step” OR “two-stage” OR “Two-step correction” OR “Two-stage correction” OR “two step” OR “two stage” OR “Two step correction” OR “Two stage correction” OR “selection” OR “correction” OR “Mills ratio” OR “Mill's ratio” OR “rho”) NOT “two-stage least squares” and searched the following journals: Criminology, Criminal Justice and Behavior, Journal of Developmental and Life-Course Criminology, Journal of Experimental Criminology, Justice Quarterly, Journal of Quantitative Criminology, and Journal of Research in Crime and Delinquency. The initial search resulted in 67 articles. After careful review of the articles for relevance and the use of a Heckman correction, 46 articles were removed, resulting in a final sample of 21 articles.

  6. Although we understand that Little’s (1988) Missing Completely at Random (MCAR) test does not consider unobservable data and cannot determine whether missing data are truly MCAR, we ran the test to help justify our decision to listwise delete missing data. The test statistic was non-significant (p = 1.00), which provides support for our decision. Additionally, only 1.37% of cells were missing, which gives us confidence that the missing data were sparse enough to utilize listwise deletion. For variables that were missing more than one response, regressions were estimated with a dummy variable adjustment. Coefficients were compared between the mean/mode replaced models and the listwise deleted models and no statistically significant differences existed.

  7. In response to a reviewer’s comment, we have run additional analyses and created a table explicating the use of our exclusion restrictions from the LoneStar Project’s metadata. We closely examined the findings of linear probability models comparing our exclusion restrictions’ ability to predict wave 3 retention versus rearrest from the Clark et al. (2020) paper and are confident that our exclusion restrictions are appropriately justified (see Appendix 2).

  8. We also created a continuous measure, arrest count, representing the number of times a respondent was arrested after release from prison. This measure was necessary for one modeling strategy, the two-part model (TPM).

  9. Although probit models are available for analyzing a binary outcome, LPMs were the most appropriate choice because, as a reviewer suggested, they allow us to compare coefficients across models.

  10. Consistent with traditional Heckman models, we attempted to use a continuous measure of arrest. Because of the excess of zeros in our data (respondents with no arrests), this measure was not normally distributed. A heckpoisson command exists in Stata, but our data did not meet Poisson distribution assumptions. Given the challenges with normality, we also assessed a heckprobit; however, we ultimately ran a FIML LPM Heckman model with a binary outcome in order to compare coefficients across models. We therefore report both: the FIML LPM Heckman to test the equality of coefficients, and the heckprobit to maintain the integrity of the Heckman correction with a binary outcome. As demonstrated by Tables 4 and 5, statistically significant coefficients did not vary between Heckman models.

  11. It is possible that this variation is due to the loss of statistical power between the full and retained samples. It becomes more difficult to detect statistically significant differences when a sample shrinks from 791 to 506 people.

  12. These estimates were calculated in two steps. First, for each variable, we determined whether its coefficient was significant in each of the two models being compared. If both coefficients were significant or both were non-significant, the variable was coded as 2 (i.e., “agreement”); if one coefficient was significant and the other was non-significant, it was coded as 0 (i.e., “disagreement”). This step was conducted at the variable level, so each variable within the model had an agreement score. Second, we calculated the percent agreement between the models by summing the agreement scores across variables and dividing by the total number of coefficients across the two models: \(\text{Model Agreement}=\frac{\text{Sum of Agreement}}{2\times\text{Number of Variables}}\times 100\).

  13. We are aware that coefficient comparisons, similar to Paternoster et al. (1998), are designed for and assume independent samples. That is not the case for our data. However, this was the most viable way of comparing coefficients across models because we were not able to estimate seemingly unrelated regression (SUR). SUR is not feasible here for three reasons: the equations must be balanced in terms of number of observations; the sureg command in Stata does not allow for conditional statements (such as weights) or use of the same dependent variable; and SUR models the parameters of all equations simultaneously, so we have no way of fitting different model types. Given these limitations, we urge caution when interpreting the findings.

  14. The Stolzenberg and Relles (1997) approach requires a continuous outcome variable. Although this may prohibit its use for some research questions, we encourage researchers to move away from binary outcomes, which limit variation and reduce the social world to a binary.
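
The calculations described in Notes 12 and 13 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names and the example values are hypothetical, and the z statistic uses the commonly cited independent-samples form, z = (b1 − b2) / sqrt(SE1² + SE2²), which Note 13 acknowledges is not strictly appropriate for overlapping samples.

```python
import math


def model_agreement(sig_a, sig_b):
    """Percent agreement in significance between two models (Note 12).

    sig_a, sig_b: lists of booleans, one entry per variable, True when
    that variable's coefficient is statistically significant.
    """
    if len(sig_a) != len(sig_b) or not sig_a:
        raise ValueError("need two equal-length, non-empty lists")
    # Code each variable 2 when the models agree (both significant or
    # both non-significant) and 0 when they disagree.
    scores = [2 if a == b else 0 for a, b in zip(sig_a, sig_b)]
    # Sum of agreement / (2 x number of variables) x 100.
    return sum(scores) / (2 * len(sig_a)) * 100


def coefficient_z(b1, se1, b2, se2):
    """z statistic for the difference between two coefficients.

    Assumes independent samples, which is an assumption the source
    flags as violated in its own data.
    """
    return (b1 - b2) / math.sqrt(se1 ** 2 + se2 ** 2)


# Hypothetical significance patterns for four variables (not study data).
full_sample = [True, False, True, False]
retained_sample = [True, False, False, False]
agreement = model_agreement(full_sample, retained_sample)

# Hypothetical coefficients and standard errors (not study data).
z = coefficient_z(0.30, 0.03, 0.20, 0.04)
```

With these made-up inputs, three of the four variables agree, so the agreement score is 75%, and the z statistic is 2.0.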

Acknowledgements

Funding was provided by the National Institute of Justice (Grant No. 2014-MU-CX-0111).

Author information

Corresponding author

Correspondence to Meghan M. Mitchell.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix 1

See Table 6.

Table 6 Systematic review of Heckman correction from 2006 to 2019

Appendix 2

See Table 7.

Table 7 LPMs comparing our exclusion restrictions’ ability to predict wave 3 retention versus rearrest (N = 791)

Appendix 3

See Table 8.

Table 8 Uncorrected two-part regression model predicting arrest count

Appendix 4: Description of Constructs and Coding

Prison visits (mean score of family members and friends visiting); response options: (Never, A few times, Monthly, Bi-weekly, Weekly, Daily).

  • Lead in question: In the last six months, did you have visitation privileges? (No, Yes)

  • [IF YES] Please tell me if you have been visited by the following people:

    • Any family members?

    • Any friends who are not gang members?

Delinquent peers (mean score); response options: (No, Yes).

… Think about your closest friends before being incarcerated. A close friend is someone you could call in an emergency, someone you can trust.

Have any of those friends ever…

  • been arrested?

  • been convicted of a crime?

  • been in a correctional facility, such as a jail, prison, or juvenile correctional facility?

  • had problems with drugs or alcohol?

  • Are any of those friends currently in a correctional facility?

Procedural justice (mean score); response options: (Always, Most of the time, Sometimes, Never); alpha = 0.89.

How often do police officers…

  • give people a chance to tell their side of the story before they make decisions?

  • treat people fairly?

  • respect people’s rights?

  • make decisions that are good for everyone in the community?

  • clearly explain the reasons for their actions and decisions?

  • treat people with dignity and respect?

  • try to do what is best for the people they are dealing with?

Low self-control (mean score); response options: (1 = not at all like you, 2 = A little bit like you, 3 = Somewhat like you, 4 = More so like you, 5 = very much like you); alpha = 0.80.

  • You are good at resisting temptation. (reverse coded)

  • You have a hard time breaking bad habits.

  • You say inappropriate things.

  • You do certain things that are bad for you if they are fun.

  • You refuse things that are bad for you. (reverse coded)

  • Pleasure and fun sometimes keeps you from getting work done.

  • You have trouble concentrating.

  • You are able to work effectively toward long-term goals. (reverse coded)

  • Sometimes you can’t stop yourself from doing something, even if you know it is wrong.

  • You often act without thinking through all the alternatives.

  • You have iron self-discipline. (reverse coded)
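
As an illustration of how a mean score with reverse-coded items might be computed for the low self-control scale above, the sketch below flips reverse-coded responses on the 1 to 5 scale (a response of 5 becomes 1) before averaging. The function names and the example respondent are hypothetical, and the handling of missing responses is not addressed here because the source does not describe it.

```python
# Positions (0-based) of the reverse-coded items among the 11 low
# self-control items listed above: items 1, 5, 8, and 11.
REVERSE_ITEMS = {0, 4, 7, 10}


def reverse_code(value, low=1, high=5):
    """Flip a response on a low..high scale (1 <-> 5, 2 <-> 4, 3 stays 3)."""
    return low + high - value


def mean_score(responses, reverse_items=REVERSE_ITEMS):
    """Average the responses after flipping the reverse-coded items."""
    if not responses:
        raise ValueError("responses must be non-empty")
    scored = [
        reverse_code(value) if index in reverse_items else value
        for index, value in enumerate(responses)
    ]
    return sum(scored) / len(scored)


# Hypothetical respondent: "very much like you" (5) on every reverse-coded
# item and "not at all like you" (1) on every other item.
respondent = [5, 1, 1, 1, 5, 1, 1, 5, 1, 1, 5]
score = mean_score(respondent)
```

This hypothetical respondent scores 1.0, the lowest possible value, because every reverse-coded 5 is flipped to 1 before averaging.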

Stress (mean score); response options: (All of the time, Most of the time, Sometimes, None of the time); alpha = 0.64.

In the past month, how often have you felt…

  • that you were unable to control the important things in your life?

  • confident about your ability to handle your personal problems? (reverse coded)

  • that things were going your way? (reverse coded)

  • difficulties were piling up so high that you could not overcome them?

  • worried or stressed about your upcoming reentry to the community?

Social support (mean score); response options: (Strongly agree, Agree, Disagree, Strongly disagree); alpha = 0.95.

You have someone in your family who…

  • is willing to help you make decisions.

  • really tries to help you.

  • can give you the emotional help and support you need.

  • provide help or advice on finding a place to live.

  • provide help or advice on finding a job.

  • provide support for dealing with a substance abuse problem.

  • provide transportation to work or other appointments if needed.

  • provide financial support.

Social capital (mean score); response options: (Strongly agree, Agree, Disagree, Strongly disagree); alpha = 0.74.

  • You can do just about anything you really set your mind to.

  • You often feel helpless dealing with the problems of life. (reverse coded)

  • You have little control over the things that happen to you. (reverse coded)

  • There is really no way you can solve some of the problems you have. (reverse coded)

  • What happens to you in the future mostly depends on you.

  • You are tired of the problems caused by the crimes you committed.

  • You want to get your life straightened out.

  • You think you will be able to stop committing crimes when released.

  • You will give up friends and hangouts that get you into trouble.

About this article

Cite this article

Mitchell, M.M., Fahmy, C., Clark, K.J. et al. Non-random Study Attrition: Assessing Correction Techniques and the Magnitude of Bias in a Longitudinal Study of Reentry from Prison. J Quant Criminol 38, 755–790 (2022). https://doi.org/10.1007/s10940-021-09516-7

  • DOI: https://doi.org/10.1007/s10940-021-09516-7
