Now You See It, Now You Don’t: A Simulation and Illustration of the Importance of Treating Incomplete Data in Estimating Race Effects in Sentencing

  • Original Paper
Journal of Quantitative Criminology

Abstract

Objectives

Evaluate the impact of missing data on observed racial disparities in the likelihood of an incarceration sentence, given that complete case analysis is the common analytic approach used in criminological research.

Methods

Using a simulation study with data based on cases sentenced in the Court of Common Pleas in Pennsylvania between 2010 and 2019, we assess the differences in the likelihood of incarceration between similarly situated White and Black defendants based on varying sample sizes and patterns of missing data.

Results

Complete case analysis (CCA) of incomplete data can fail to provide unbiased estimates of the race effect, even with less than 10% of cases missing. The degree of bias introduced depends on the amount, pattern, assumptions, and treatment of missing data. Multiple imputation provides an established, valid methodology for the unbiased estimation of race effects when data are missing at random, and this holds across sample sizes and number of imputations.

Conclusions

The existence and magnitude of race effects on the likelihood of an incarceration sentence can vary greatly based on the degree, pattern, assumptions, and treatment of missing data. Limitations include that missing data mechanisms cannot be truly known outside of a data simulation. Future sentencing research should prioritize the identification, treatment, and reporting of missing data prior to isolating race effects, in line with calls from the field for more open science practices. Sensitivity analyses should also be prioritized.

Notes

  1. See https://www.sentencingproject.org/publications/racial-impact-statements/

  2. Relatedly, some studies erroneously include observations with “unknown” race in their samples. These observations should be considered missing but are often included in an “other race” category. This issue is discussed further in the conclusions as an important consideration for future research.

  3. The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.

  4. This estimate was not made with a causal inference procedure so it should not be treated as a valid estimate of the real race effect. Instead, this is only used as a plausible population value for our simulated data set. Within the simulated analyses, we make similarly flawed estimations. In addition, we do not attempt to address the missingness that is present in the Pennsylvania data, which we show can lead to flawed and unreliable estimates with our simulations.

  5. Each additional variable doubles the number of missing data patterns, so a fourth variable would result in 16 patterns with only 7 observed patterns.

  6. Iterations were run in parallel on 15 computation cores with the number of iterations chosen to balance computation time across a wide range of simulation settings and precision of the approximation to the simulated sampling distributions of interest.

Acknowledgements

The authors would like to thank the Pennsylvania Commission on Sentencing for providing the data that were used to construct the simulated data set. The authors would also like to thank the anonymous reviewers for their helpful feedback on earlier versions of the manuscript.

Author information

Corresponding author

Correspondence to C. Clare Strange.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Simulated Data Set Construction

Categorical and Ordinal Variables

Crime type (\(CrimeType\)): sampled from

| \(X\) | Persons | Property (ref.) | Drug | DUI | Other |
|---|---|---|---|---|---|
| \(P\left(CrimeType = x\right)\) | 0.157 | 0.254 | 0.228 | 0.238 | 0.123 |

Case disposition (\(CaseDisp\)): sampled from

| \(X\) | Yes | Plea (ref.) |
|---|---|---|
| \(P\left(CaseDisp = x\right)\) | 0.978 | 0.022 |

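Offense gravity score (\(OGS\)): sampled from 1, 2, …, 14 with the probability distribution

| \(X\) | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| \(P\left(OGS = x\right)\) | 0.295 | 0.099 | 0.264 | 0.031 | 0.156 | 0.048 | 0.036 |

| \(X\) | 8 | 9 | 10 | 11 | 12 | 13 | 14 |
|---|---|---|---|---|---|---|---|
| \(P\left(OGS = x\right)\) | 0.020 | 0.015 | 0.021 | 0.007 | 0.003 | 0.001 | < 0.001 |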

Prior record score (\(PRS\)): sampled from

| \(X\) | None (ref.) | 1/2/3 | 4/5 | REVOC/RFEL |
|---|---|---|---|---|
| \(P\left(PRS = x\right)\) | 0.486 | 0.303 | 0.182 | 0.030 |

Recommended minimum (\(RecMin\)): sampled from

| \(X\) | Yes | No (ref.) |
|---|---|---|
| \(P\left(RecMin = x\right)\) | 0.670 | 0.30 |

Sex (\(Sex\)): sampled from

| \(X\) | Male (ref.) | Female |
|---|---|---|
| \(P\left(Sex = x\right)\) | 0.774 | 0.226 |

Race (\(Race\)): sampled from

| \(X\) | White (ref.) | Black | Latino | Other |
|---|---|---|---|---|
| \(P\left(Race = x\right)\) | 0.717 | 0.268 | 0.009 | 0.006 |

County (\(County\)): sampled from indexes 1, 2, …, 67

| \(X\) | \(P\left(County = x\right)\) |
|---|---|
| Adams (ref.) | 0.0100 |
| Allegheny | 0.1177 |
| Armstrong | 0.0047 |
| Beaver | 0.0141 |
| Bedford | 0.0044 |
| Berks | 0.0318 |
| Blair | 0.0148 |
| Bradford | 0.0052 |
| Bucks | 0.0474 |
| Butler | 0.0148 |
| Cambria | 0.0116 |
| Cameron | 0.0004 |
| Carbon | 0.0088 |
| Centre | 0.0099 |
| Chester | 0.0290 |
| Clarion | 0.0032 |
| Clearfield | 0.0036 |
| Clinton | 0.0039 |
| Columbia | 0.0038 |
| Crawford | 0.0081 |
| Cumberland | 0.0201 |
| Dauphin | 0.0302 |
| Delaware | 0.0644 |
| Elk | 0.0022 |
| Erie | 0.0203 |
| Fayette | 0.0166 |
| Forest | 0.0005 |
| Franklin | 0.0173 |
| Fulton | 0.0016 |
| Greene | 0.0025 |
| Huntingdon | 0.0038 |
| Indiana | 0.0070 |
| Jefferson | 0.0037 |
| Juniata | 0.0019 |
| Lackawanna | 0.0163 |
| Lancaster | 0.0266 |
| Lawrence | 0.0059 |
| Lebanon | 0.0116 |
| Lehigh | 0.0297 |
| Luzerne | 0.0239 |
| Lycoming | 0.0152 |
| McKean | 0.0041 |
| Mercer | 0.0099 |
| Mifflin | 0.0047 |
| Monroe | 0.0124 |
| Montgomery | 0.0636 |
| Montour | 0.0012 |
| Northampton | 0.0251 |
| Northumberland | 0.0078 |
| Perry | 0.0040 |
| Philadelphia | 0.0567 |
| Pike | 0.0043 |
| Potter | 0.0016 |
| Schuylkill | 0.0123 |
| Snyder | 0.0035 |
| Somerset | 0.0047 |
| Sullivan | 0.0004 |
| Susquehanna | 0.0024 |
| Tioga | 0.0025 |
| Union | 0.0028 |
| Venango | 0.0053 |
| Warren | 0.0031 |
| Washington | 0.0098 |
| Wayne | 0.0031 |
| Westmoreland | 0.0317 |
| Wyoming | 0.0029 |
| York | 0.0517 |

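Year (\(Year\)): sampled from

| \(X\) | 2010 (ref.) | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 |
|---|---|---|---|---|---|---|---|---|---|---|
| \(P\left(Year = x\right)\) | 0.103 | 0.097 | 0.100 | 0.105 | 0.104 | 0.097 | 0.101 | 0.097 | 0.096 | 0.100 |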
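The sampling step for the categorical and ordinal variables can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' code: it assumes each variable is drawn independently from its tabulated marginal distribution, shows only three of the variables, and renormalizes the probabilities because rounded table entries need not sum to exactly one. The function name `sample_categorical`, the seed, and the sample size are hypothetical.

```python
import numpy as np

# Marginal distributions copied from the tables above (rounded values).
CRIME_TYPE = {"Persons": 0.157, "Property": 0.254, "Drug": 0.228,
              "DUI": 0.238, "Other": 0.123}
SEX = {"Male": 0.774, "Female": 0.226}
RACE = {"White": 0.717, "Black": 0.268, "Latino": 0.009, "Other": 0.006}


def sample_categorical(dist, n, rng):
    """Draw n labels from a {label: probability} mapping.

    Probabilities are renormalized (an assumption of this sketch) because
    rounded table entries need not sum to exactly one.
    """
    labels = list(dist)
    probs = np.array([dist[k] for k in labels], dtype=float)
    probs = probs / probs.sum()
    return rng.choice(labels, size=n, p=probs)


rng = np.random.default_rng(2023)  # arbitrary seed for reproducibility
n = 10_000                         # hypothetical sample size
sim = {
    "CrimeType": sample_categorical(CRIME_TYPE, n, rng),
    "Sex": sample_categorical(SEX, n, rng),
    "Race": sample_categorical(RACE, n, rng),
}

# Share of simulated defendants recorded as Black; should be near 0.268.
share_black = float(np.mean(sim["Race"] == "Black"))
```

The same helper would apply unchanged to the remaining tables, since each is just a mapping from category to probability.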

Quantitative Variable: Let \(A_{i} = Age_{i} - \min\left(Age\right)\) for \(i = 1, \ldots, N\).

Draw \(\overline{A_{i}} \sim Gamma\left(\hat{a}, \hat{b}\right)\), where \(\hat{a} = \frac{N - 1}{N}\frac{\overline{A}^{2}}{\widehat{Var}\left(A\right)}\) is the shape parameter and \(\hat{b} = \frac{N}{N - 1}\frac{\widehat{Var}\left(A\right)}{\overline{A}}\) is the scale parameter, with \(\overline{A} = \frac{1}{N}\sum_{i = 1}^{N} A_{i}\) and \(\widehat{Var}\left(A\right) = \frac{1}{N - 1}\sum_{i = 1}^{N}\left(A_{i} - \overline{A}\right)^{2}\). The simulated age for individual \(i\) is then \(\overline{Age_{i}} = \overline{A_{i}} + \min\left(Age\right)\). In the PCS data set, \(\overline{A} = 22.57\) and \(\widehat{Var}\left(A\right) = 132.434\).
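The age-generation step can be sketched as follows. The shape and scale follow the expressions above, taking the numerator of the shape estimate to be the squared sample mean (the reading under which the simulated mean equals \(\overline{A}\)). The number of PCS observations and the minimum age are not reported in this appendix, so `N_OBS` and `MIN_AGE` below are placeholders, and the function names are hypothetical.

```python
import numpy as np


def gamma_moment_estimates(mean_a, var_a, n_obs):
    """Shape and scale as written in the appendix, including the N factors."""
    shape = (n_obs - 1) / n_obs * mean_a**2 / var_a
    scale = n_obs / (n_obs - 1) * var_a / mean_a
    return shape, scale


def simulate_age(n, mean_a, var_a, n_obs, min_age, rng):
    """Draw shifted ages from Gamma(shape, scale), then add back the minimum age."""
    shape, scale = gamma_moment_estimates(mean_a, var_a, n_obs)
    return rng.gamma(shape, scale, size=n) + min_age


# Summary statistics reported for the PCS data.
MEAN_A, VAR_A = 22.57, 132.434
N_OBS = 100_000   # placeholder: the number of PCS cases is not given here
MIN_AGE = 18.0    # placeholder: the minimum age is not given here

rng = np.random.default_rng(0)
shape, scale = gamma_moment_estimates(MEAN_A, VAR_A, N_OBS)
ages = simulate_age(50_000, MEAN_A, VAR_A, N_OBS, MIN_AGE, rng)
```

Because the two \(N\) factors cancel in the product, the implied mean `shape * scale` equals \(\overline{A}\) for any \(N\); with the reported statistics the shape is roughly 3.85 and the scale roughly 5.87.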

Incarceration Outcome Variable: We used the estimated coefficients from the logistic regression model (Eq. 1) fitted to the PCS data as the “true” parameter values for the simulated population. We then sampled sentencing decisions from the same logistic regression model evaluated on the simulated independent and control variables.
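Generating the outcome from a logistic model amounts to computing each case's incarceration probability from its covariates and then drawing a Bernoulli variable. The sketch below illustrates the mechanics only: the three-column design matrix and the coefficient values in `beta_true` are placeholders, not the fitted PCS estimates, and the function name is hypothetical.

```python
import numpy as np


def simulate_incarceration(X, beta, rng):
    """Sample 0/1 outcomes with P(y = 1) = 1 / (1 + exp(-X @ beta))."""
    eta = X @ beta
    prob = 1.0 / (1.0 + np.exp(-eta))
    return rng.binomial(1, prob), prob


rng = np.random.default_rng(1)
n = 20_000  # hypothetical sample size

# Simulated design matrix: intercept, Black indicator, shifted age.
black = rng.binomial(1, 0.268, size=n)
age_shifted = rng.gamma(3.85, 5.87, size=n)
X = np.column_stack([np.ones(n), black, age_shifted])

# Placeholder coefficients on the log-odds scale; NOT the fitted PCS values.
beta_true = np.array([-0.5, 0.3, -0.01])

incar, prob = simulate_incarceration(X, beta_true, rng)
```

Because the outcomes are drawn from a known model, the coefficient on the race indicator in `beta_true` plays the role of the population value against which complete case and imputation estimates can be compared.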

Missingness Model Parameters

The multinomial model uses two variables to determine the probability of an observation being missing: incarceration and whether the defendant’s race is Black. Two interaction variables were created: one for defendants who are Black and not incarcerated, and one for defendants who are Black and incarcerated. The intercept of the model is set for each of the eight missing data patterns to reflect the observed probability of that pattern in the Pennsylvania data. The remaining parameters vary by pattern and are chosen to induce over- or under-estimation of the race effect in the complete case analysis. See Table 1 for a list of the missing data patterns in the PA data.

MNAR Missingness

The missingness model assigns slightly more missingness than we saw in the administrative data while still having a relatively low number of incomplete cases.

$$P\left(\text{pattern } r \text{ for subject } i\right) = \frac{\exp\left(\alpha_{r} + \beta_{1r} INCAR_{i} + \beta_{2r} Black_{i} + \beta_{3r} Z_{1i} + \beta_{4r} Z_{2i}\right)}{1 + \sum_{r = 1}^{8} \exp\left(\alpha_{r} + \beta_{1r} INCAR_{i} + \beta_{2r} Black_{i} + \beta_{3r} Z_{1i} + \beta_{4r} Z_{2i}\right)}$$
(5)
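A multinomial-logit assignment of missing data patterns of this form can be sketched as below. The sketch reads the "1 +" in the denominator as a reference pattern whose linear predictor is fixed at zero, so that the eight pattern probabilities sum to one; that reading, the intercept values, and the choice of which pattern receives non-zero slopes are all assumptions made for illustration, and the function names are hypothetical.

```python
import numpy as np


def build_covariates(incar, black):
    """Columns: INCAR, Black, Z1 (Black, not incarcerated), Z2 (Black, incarcerated)."""
    incar = np.asarray(incar)
    black = np.asarray(black)
    return np.column_stack([incar, black, black * (1 - incar), black * incar])


def pattern_probabilities(alpha, beta, covariates):
    """Multinomial-logit probabilities over missing data patterns.

    alpha: (R,) intercepts for the non-reference patterns.
    beta: (R, K) slopes for the non-reference patterns.
    covariates: (n, K) matrix from build_covariates.
    Returns an (n, R + 1) array; column 0 is the reference pattern, whose
    linear predictor is fixed at zero (an assumption of this sketch).
    """
    eta = alpha + covariates @ beta.T
    eta = np.column_stack([np.zeros(len(eta)), eta])
    eta -= eta.max(axis=1, keepdims=True)  # stabilize the exponentials
    expo = np.exp(eta)
    return expo / expo.sum(axis=1, keepdims=True)


rng = np.random.default_rng(3)
alpha = np.full(7, np.log(0.01))          # hypothetical intercepts
beta = np.zeros((7, 4))
beta[1] = np.log([10, 0.1, 0.001, 100])   # illustrative slopes on one pattern

# One row per (INCAR, Black) combination.
covs = build_covariates(incar=[0, 1, 0, 1], black=[0, 0, 1, 1])
probs = pattern_probabilities(alpha, beta, covs)

# Assign one pattern index (0 to 7) to each row.
patterns = np.array([rng.choice(8, p=row) for row in probs])
```

Each row of `probs` is a full probability distribution over the eight patterns, so a pattern can be drawn for every simulated defendant and the corresponding variables set to missing.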

To over-estimate the race effect, missingness in race is made conditional on both race being Black and incarceration, so we set the parameters of the model to \(\exp\left(\beta\right) = \left(10, 0.1, 0.001, 100\right)^{\prime} \in \mathbb{R}^{4}\). The first element of \(\beta\) emphasizes missingness based on incarceration, and the second de-emphasizes missingness based on being Black. The third element, corresponding to \(Z_{1}\), greatly de-emphasizes missingness for defendants who are Black and not incarcerated, and the fourth, corresponding to \(Z_{2}\), greatly emphasizes missingness for defendants who are Black and incarcerated.
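To see what these exponentiated coefficients imply, it helps to multiply them out for each combination of incarceration and race. The sketch below computes, for the stated setting, the factor by which the odds of the affected pattern change relative to a defendant who is neither incarcerated nor Black. The function name `odds_multiplier` is hypothetical, and the calculation ignores the pattern-specific intercepts.

```python
import math

# exp(beta) for INCAR, Black, Z1, Z2 in the setting described above.
EXP_BETA_OVER = (10, 0.1, 0.001, 100)


def odds_multiplier(incar, black, exp_beta):
    """Odds factor relative to a non-incarcerated, non-Black defendant."""
    z1 = black * (1 - incar)  # Black and not incarcerated
    z2 = black * incar        # Black and incarcerated
    covariates = (incar, black, z1, z2)
    return math.prod(e ** x for e, x in zip(exp_beta, covariates))


multipliers = {
    (incar, black): odds_multiplier(incar, black, EXP_BETA_OVER)
    for incar in (0, 1)
    for black in (0, 1)
}
```

Keyed by `(incar, black)`, the factors work out to 1 for `(0, 0)`, 10 for `(1, 0)`, about 0.0001 for `(0, 1)`, and about 100 for `(1, 1)`, so the two interaction terms separate Black defendants by a factor of roughly a million depending on whether they were incarcerated.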

$$P\left(\text{pattern } r \text{ for subject } i\right) = \frac{\exp\left(\alpha_{r} + \beta_{1r} INCAR_{i} + \beta_{2r} Black_{i} + \beta_{3r} Z_{1i}\right)}{1 + \sum_{r = 1}^{8} \exp\left(\alpha_{r} + \beta_{1r} INCAR_{i} + \beta_{2r} Black_{i} + \beta_{3r} Z_{1i}\right)}$$
(6)

To under-estimate the effect, missingness in race is again conditional on race being Black and incarceration, so we set the parameters of the model to \(\exp\left(\beta\right) = \left(5, 5, 10\right)^{\prime} \in \mathbb{R}^{3}\). The first element of \(\beta\) emphasizes missingness based on incarceration, and the second emphasizes missingness based on being Black. The third element, corresponding to \(Z_{1}\), greatly emphasizes missingness for defendants who are Black and not incarcerated.
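As a worked illustration of this setting, the exponentiated coefficients can be multiplied out for each combination of \(INCAR\) and \(Black\), with \(Z_{1} = Black \times (1 - INCAR)\). The factors below are relative to a defendant who is neither incarcerated nor Black and ignore the pattern-specific intercepts; they are simple arithmetic on the stated values, not results reported by the authors.

```latex
\begin{aligned}
(INCAR, Black) = (0, 0):&\quad 5^{0} \cdot 5^{0} \cdot 10^{0} = 1 \\
(INCAR, Black) = (1, 0):&\quad 5^{1} \cdot 5^{0} \cdot 10^{0} = 5 \\
(INCAR, Black) = (0, 1):&\quad 5^{0} \cdot 5^{1} \cdot 10^{1} = 50 \\
(INCAR, Black) = (1, 1):&\quad 5^{1} \cdot 5^{1} \cdot 10^{0} = 25
\end{aligned}
```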

MAR Missingness

We used similar parameter settings for the MAR simulations as for the MNAR simulations, but missingness now depends on race only for patterns 1, 3, 5, and 7, which correspond to no missing variables, missing age, missing recommended minimum, and missing both age and recommended minimum, respectively. This maintains the MAR assumption because missingness is not contingent on race in the patterns where race is missing. However, because the analysis uses complete cases only, we are still systematically excluding data in a way that depends on the values of race and incarceration and that dramatically biases the results.
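The point that complete case analysis remains biased when exclusion depends jointly on race and incarceration can be illustrated with a two-by-two table. If a case in cell (race, incarceration) is retained with some probability, the expected complete-case odds ratio equals the full-data odds ratio multiplied by the cross-product ratio of the retention probabilities. The counts and retention probabilities below are hypothetical, the function names are illustrative, and the sketch ignores the control variables in the authors' regression model.

```python
def odds_ratio(table):
    """Odds ratio of incarceration for Black vs. White defendants.

    table[(black, incar)] holds counts or expected counts.
    """
    return (table[(1, 1)] * table[(0, 0)]) / (table[(1, 0)] * table[(0, 1)])


def complete_case_table(table, retain):
    """Expected complete-case counts when each cell is kept with probability retain[cell]."""
    return {cell: count * retain[cell] for cell, count in table.items()}


# Hypothetical population counts, keyed by (black, incar).
population = {(0, 0): 4000, (0, 1): 3000, (1, 0): 1200, (1, 1): 1800}

# Hypothetical retention probabilities: non-incarcerated Black defendants
# are the most likely to be dropped from the complete cases.
retain = {(0, 0): 0.95, (0, 1): 0.95, (1, 0): 0.70, (1, 1): 0.95}

or_full = odds_ratio(population)
or_cca = odds_ratio(complete_case_table(population, retain))

# Cross-product ratio of retention probabilities: the multiplicative bias.
selection_factor = (retain[(1, 1)] * retain[(0, 0)]) / (retain[(1, 0)] * retain[(0, 1)])
```

With these numbers the full-data odds ratio is 2.0 and the complete-case value is inflated by the factor 0.95 / 0.70; reversing which cell is under-retained would deflate it instead. When retention depends on race only, or on incarceration only, the factor equals one and the odds ratio is unaffected.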

Slightly different parameter values are used for patterns 3 and 5 compared to patterns 1 and 7. See Table 1 for a list of the missing data patterns.

$$P\left(\text{pattern } r \text{ for subject } i\right) = \frac{\exp\left(\alpha_{r} + \beta_{1r} INCAR_{i} + \beta_{2r} Black_{i} + \beta_{3r} Z_{1i}\right)}{1 + \sum_{r = 1}^{8} \exp\left(\alpha_{r} + \beta_{1r} INCAR_{i} + \beta_{2r} Black_{i} + \beta_{3r} Z_{1i}\right)}$$
(7)

To get an over-estimated race effect estimate with CCA, we de-emphasize missingness based on \(Z_{1}\) for patterns where race is observed.

$$P\left(\text{pattern } r \text{ for subject } i\right) = \frac{\exp\left(\alpha_{r} + \beta_{1r} Z_{1i} + \beta_{2r} Z_{2i}\right)}{1 + \sum_{r = 1}^{8} \exp\left(\alpha_{r} + \beta_{1r} Z_{1i} + \beta_{2r} Z_{2i}\right)}$$
(8)

To under-estimate the effect with CCA, we emphasize \(Z_{1}\) and de-emphasize \(Z_{2}\) for patterns where race is observed (Tables 3, 4, 5 and 6).

Table 3 The parameters for the missingness model for CCA over-estimates with MNAR missing values
Table 4 The parameters for the missingness model for CCA under-estimates with MNAR missing values
Table 5 The parameters for the missingness model for CCA over-estimates with MAR missing values
Table 6 The parameters for the missingness model for CCA under-estimates with MAR missing values

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Stockton, B., Strange, C.C. & Harel, O. Now You See It, Now You Don’t: A Simulation and Illustration of the Importance of Treating Incomplete Data in Estimating Race Effects in Sentencing. J Quant Criminol (2023). https://doi.org/10.1007/s10940-023-09577-w

  • DOI: https://doi.org/10.1007/s10940-023-09577-w
