Abstract
Objectives
Evaluate the impact of missing data on observed racial disparities in the likelihood of an incarceration sentence, given that complete case analysis is the common analytic approach in criminological research.
Methods
Using a simulation study with data based on cases sentenced in the Court of Common Pleas in Pennsylvania between 2010 and 2019, we assess the differences in the likelihood of incarceration between similarly situated White and Black defendants based on varying sample sizes and patterns of missing data.
Results
Complete case analysis (CCA) of incomplete data can fail to provide unbiased estimates of the race effect, even with less than 10% of cases missing. The degree of bias introduced depends on the amount, pattern, assumptions, and treatment of missing data. Multiple imputation provides an established, valid methodology for the unbiased estimation of race effects when data are missing at random, and this holds across sample sizes and number of imputations.
Conclusions
The existence and magnitude of race effects on the likelihood of an incarceration sentence can vary greatly based on the degree, pattern, assumptions, and treatment of missing data. Limitations include that missing data mechanisms cannot be truly known outside of a data simulation. Future sentencing research should prioritize the identification, treatment, and reporting of missing data prior to isolating race effects, in line with calls from the field for more open science practices. Sensitivity analyses should also be prioritized.
Notes
Relatedly, some studies erroneously include observations with “unknown” race in their samples. These observations should be considered missing but are often included in an “other race” category. This issue is discussed further in the conclusions as an important consideration for future research.
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.
This estimate was not made with a causal inference procedure so it should not be treated as a valid estimate of the real race effect. Instead, this is only used as a plausible population value for our simulated data set. Within the simulated analyses, we make similarly flawed estimations. In addition, we do not attempt to address the missingness that is present in the Pennsylvania data, which we show can lead to flawed and unreliable estimates with our simulations.
Each additional variable doubles the number of missing data patterns, so a fourth variable would result in 16 patterns with only 7 observed patterns.
Iterations were run in parallel on 15 computation cores with the number of iterations chosen to balance computation time across a wide range of simulation settings and precision of the approximation to the simulated sampling distributions of interest.
Acknowledgements
The authors would like to thank the Pennsylvania Commission on Sentencing for providing the data that were used to construct the simulated data set. The authors would also like to thank the anonymous reviewers for their helpful feedback on earlier versions of the manuscript.
Appendix
Simulated Data Set Construction
Categorical and Ordinal Variables
Crime type (\(CrimeType\)): Sampled from
\(X\) | Persons | Property (ref.) | Drug | DUI | Other |
---|---|---|---|---|---|
\(P\left( {CrimeType = x} \right)\) | 0.157 | 0.254 | 0.228 | 0.238 | 0.123 |
Case disposition (\(CaseDisp\)): sampled from
\(X\) | Yes | Plea (ref.) |
---|---|---|
\(P\left( {CaseDisp = x} \right)\) | 0.978 | 0.022 |
Offense gravity score (\(OGS\)): Sampled from 1,2,…,14 with the probability distribution
\(X\) | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
\(P\left( {OGS = x} \right)\) | 0.295 | 0.099 | 0.264 | 0.031 | 0.156 | 0.048 | 0.036 | 0.020 | 0.015 | 0.021 | 0.007 | 0.003 | 0.001 | < 0.001 |
Prior record score (\(PRS\)): Sampled from
\(X\) | None (ref.) | 1/2/3 | 4/5 | REVOC/RFEL |
---|---|---|---|---|
\(P\left( {PRS = x} \right)\) | 0.486 | 0.303 | 0.182 | 0.030 |
Recommended minimum (\(RecMin\)): sampled from
\(X\) | Yes | No (ref.) |
---|---|---|
\(P\left( {RecMin = x} \right)\) | 0.670 | 0.30 |
Sex (\(Sex\)): Sampled from
\(X\) | Male (ref.) | Female |
---|---|---|
\(P\left( {Sex = x} \right)\) | 0.774 | 0.226 |
Race (\(Race\)): sampled from
\(X\) | White (ref.) | Black | Latino | Other |
---|---|---|---|---|
\(P\left( {Race = x} \right)\) | 0.717 | 0.268 | 0.009 | 0.006 |
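The categorical tables above can be turned into draws with a weighted sampler. The following is a minimal sketch, not the authors' code: it assumes each variable is drawn independently from its marginal table, and the sample size `n`, the seed, and the helper name `sample_categorical` are hypothetical.

```python
import random

# Category probabilities copied from the tables above.
RACE = {"White": 0.717, "Black": 0.268, "Latino": 0.009, "Other": 0.006}
SEX = {"Male": 0.774, "Female": 0.226}
PRS = {"None": 0.486, "1/2/3": 0.303, "4/5": 0.182, "REVOC/RFEL": 0.030}


def sample_categorical(table, n, rng):
    """Draw n independent labels using the table's probabilities as weights."""
    labels = list(table)
    weights = [table[label] for label in labels]
    # random.choices normalizes the weights internally, so published
    # probabilities that sum to 0.999 or 1.001 after rounding still work.
    return rng.choices(labels, weights=weights, k=n)


rng = random.Random(2023)  # arbitrary seed, used only for reproducibility
n = 10_000                 # hypothetical sample size
race = sample_categorical(RACE, n, rng)
sex = sample_categorical(SEX, n, rng)
prs = sample_categorical(PRS, n, rng)

# Sample share of Black defendants; should be close to 0.268.
share_black = race.count("Black") / n
```

The same helper would apply to the crime type, offense gravity score, county, and year tables once their probabilities are entered as dictionaries.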
County (\(County\)): sampled from indexes 1,2,…,67, where \(P\) denotes \(P\left( {County = x} \right)\)
County | \(P\) | County | \(P\) | County | \(P\) | County | \(P\) | County | \(P\) |
---|---|---|---|---|---|---|---|---|---|
Adams (ref.) | 0.0100 | Allegheny | 0.1177 | Armstrong | 0.0047 | Beaver | 0.0141 | Bedford | 0.0044 |
Berks | 0.0318 | Blair | 0.0148 | Bradford | 0.0052 | Bucks | 0.0474 | Butler | 0.0148 |
Cambria | 0.0116 | Cameron | 0.0004 | Carbon | 0.0088 | Centre | 0.0099 | Chester | 0.0290 |
Clarion | 0.0032 | Clearfield | 0.0036 | Clinton | 0.0039 | Columbia | 0.0038 | Crawford | 0.0081 |
Cumberland | 0.0201 | Dauphin | 0.0302 | Delaware | 0.0644 | Elk | 0.0022 | Erie | 0.0203 |
Fayette | 0.0166 | Forest | 0.0005 | Franklin | 0.0173 | Fulton | 0.0016 | Greene | 0.0025 |
Huntingdon | 0.0038 | Indiana | 0.0070 | Jefferson | 0.0037 | Juniata | 0.0019 | Lackawanna | 0.0163 |
Lancaster | 0.0266 | Lawrence | 0.0059 | Lebanon | 0.0116 | Lehigh | 0.0297 | Luzerne | 0.0239 |
Lycoming | 0.0152 | McKean | 0.0041 | Mercer | 0.0099 | Mifflin | 0.0047 | Monroe | 0.0124 |
Montgomery | 0.0636 | Montour | 0.0012 | Northampton | 0.0251 | Northumberland | 0.0078 | Perry | 0.0040 |
Philadelphia | 0.0567 | Pike | 0.0043 | Potter | 0.0016 | Schuylkill | 0.0123 | Snyder | 0.0035 |
Somerset | 0.0047 | Sullivan | 0.0004 | Susquehanna | 0.0024 | Tioga | 0.0025 | Union | 0.0028 |
Venango | 0.0053 | Warren | 0.0031 | Washington | 0.0098 | Wayne | 0.0031 | Westmoreland | 0.0317 |
Wyoming | 0.0029 | York | 0.0517 | | | | | | |
Year (\(Year\)): sampled from
\(X\) | 2010 (ref.) | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 |
---|---|---|---|---|---|---|---|---|---|---|
\(P\left( {Year = x} \right)\) | 0.103 | 0.097 | 0.100 | 0.105 | 0.104 | 0.097 | 0.101 | 0.097 | 0.096 | 0.100 |
Quantitative Variable: Let \(A_{i} = Age_{i} - \min\left( {Age} \right)\) for \(i = 1, \ldots, N\).
Draw \(\overline{{A_{i} }} \sim Gamma\left( {\hat{a},\hat{b}} \right)\), where \(\hat{a} = \frac{N - 1}{N}\frac{{\overline{A}^{2} }}{{\widehat{Var}\left( A \right)}}\) is the shape parameter and \(\hat{b} = \frac{N}{N - 1}\frac{{\widehat{Var}\left( A \right)}}{{\overline{A}}}\) is the scale parameter, with \(\overline{A} = \frac{1}{N}\sum\limits_{i = 1}^{N} A_{i}\) and \(\widehat{Var}\left( A \right) = \frac{1}{N - 1}\sum\limits_{i = 1}^{N} \left( {A_{i} - \overline{A}} \right)^{2}\). These are moment-matching estimates, so \(\hat{a}\hat{b} = \overline{A}\) and the simulated draws have the same mean as the observed shifted ages. The simulated age for individual \(i\) is then \(\overline{{Age_{i} }} = \overline{{A_{i} }} + \min\left( {Age} \right)\). In the PCS data set, \(\overline{A} = 22.57\) and \(\widehat{Var}\left( A \right) = 132.434\).
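As a rough illustration of the age draw, the sketch below plugs the reported \(\overline{A}\) and \(\widehat{Var}\left( A \right)\) into the shape and scale formulas, reading the numerator of \(\hat{a}\) as the squared sample mean. The values of `N` and `MIN_AGE` are not given in this appendix and are hypothetical placeholders, as are the seed and the number of draws.

```python
import random

A_BAR = 22.57     # mean of A = Age - min(Age) in the PCS data (from the text)
VAR_A = 132.434   # sample variance of A (from the text)
N = 100_000       # HYPOTHETICAL: the PCS sample size is not stated here
MIN_AGE = 18      # HYPOTHETICAL: min(Age) is not stated here

# Moment-matching estimates with the (N - 1)/N correction from the text.
shape = (N - 1) / N * A_BAR**2 / VAR_A
scale = N / (N - 1) * VAR_A / A_BAR

rng = random.Random(2023)  # arbitrary seed
# random.gammavariate(alpha, beta) takes shape alpha and scale beta.
sim_age = [rng.gammavariate(shape, scale) + MIN_AGE for _ in range(50_000)]

# Mean of the simulated shifted ages; should be close to A_BAR.
mean_shifted = sum(sim_age) / len(sim_age) - MIN_AGE
```

Because the two correction factors cancel, `shape * scale` equals `A_BAR` regardless of the value assumed for `N`.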
Incarceration Outcome Variable: We used the estimated coefficients from the fitted logistic regression model (Eq. 1) using the PCS data as the “true” parameter values for the simulated population. We then sample sentencing decisions using the same logistic regression model evaluated on the simulated independent and control variables.
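The outcome step amounts to a Bernoulli draw whose probability comes from the logistic model. The sketch below shows the mechanics only: the coefficients in `BETA` are hypothetical placeholders (the fitted values from Eq. 1 are not reproduced in this appendix), and only two of the covariates are included for brevity.

```python
import math
import random

# HYPOTHETICAL log-odds coefficients, for illustration only.
BETA = {"intercept": -0.5, "black": 0.2, "female": -0.4}


def incarceration_probability(black, female):
    """Inverse-logit of the linear predictor for 0/1 indicator inputs."""
    eta = BETA["intercept"] + BETA["black"] * black + BETA["female"] * female
    return 1.0 / (1.0 + math.exp(-eta))


def sample_outcome(black, female, rng):
    """Return 1 (incarcerated) with the model probability, else 0."""
    return int(rng.random() < incarceration_probability(black, female))


rng = random.Random(2023)  # arbitrary seed
p_ref = incarceration_probability(black=0, female=0)
draws = [sample_outcome(0, 0, rng) for _ in range(20_000)]
observed_rate = sum(draws) / len(draws)  # should be close to p_ref
```

With the full covariate set, the linear predictor would simply gain one term per simulated variable.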
Missingness Model Parameters
The multinomial model uses two variables to determine the probability of an observation being missing: incarceration and whether the defendant’s race is Black. Two interaction variables were also created: \(Z_{1}\), indicating a Black defendant who was not incarcerated, and \(Z_{2}\), indicating a Black defendant who was incarcerated. The intercept of the model is set for each of the eight missing data patterns to reflect the observed probability of that pattern in the Pennsylvania data. The remaining parameters vary by pattern and are chosen to induce over- or under-estimation of the race effect in the complete case analysis. See Table 1 for a list of the missing data patterns in the PA data.
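The eight patterns arise from three variables that can each be observed or missing. The sketch below enumerates them, assuming the three variables are recommended minimum, age, and race, as the pattern descriptions in this appendix suggest. The ordering is chosen so that patterns 1, 3, 5, and 7 match those descriptions; the numbering of the even patterns is an assumption, since Table 1 is not reproduced here.

```python
from itertools import product

# Variables subject to missingness (assumed from the appendix text).
VARIABLES = ("RecMin", "Age", "Race")

patterns = []
flag_sets = product((False, True), repeat=len(VARIABLES))
for index, flags in enumerate(flag_sets, start=1):
    # Collect the names of the variables flagged as missing in this pattern.
    missing = [name for name, is_missing in zip(VARIABLES, flags) if is_missing]
    patterns.append((index, missing))

# Patterns in which race is observed.
race_observed = [index for index, missing in patterns if "Race" not in missing]

for index, missing in patterns:
    print(index, missing or "complete")
```

Each additional variable doubles the count, so `2 ** 4` gives 16 possible patterns for four variables.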
MNAR Missingness
The missingness model assigns slightly more missingness than we saw in the administrative data while still having a relatively low number of incomplete cases.
To over-estimate the race effect, missingness in race is conditional on race being Black and on incarceration, so we set the parameters of the model to \(\exp\left( \beta \right) = \left( {10, 0.1, 0.001, 100} \right)^{\prime} \in R^{4}\). The first element of \(\beta\) emphasizes missingness based on incarceration, and the second de-emphasizes missingness based on being Black. The third element, corresponding to \(Z_{1}\), greatly de-emphasizes missingness for defendants who are Black and not incarcerated, and the fourth, corresponding to \(Z_{2}\), greatly emphasizes missingness for defendants who are Black and incarcerated.
To under-estimate the effect, missingness in race is again conditional on race being Black and on incarceration, so we set the parameters of the model to \(\exp\left( \beta \right) = \left( {5, 5, 10} \right)^{\prime} \in R^{3}\). The first element of \(\beta\) emphasizes missingness based on incarceration, and the second emphasizes missingness based on being Black. The third element, corresponding to \(Z_{1}\), greatly emphasizes missingness for defendants who are Black and not incarcerated.
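To see what the over-estimation setting \(\exp\left( \beta \right) = \left( {10, 0.1, 0.001, 100} \right)^{\prime}\) implies, the sketch below computes the multiplier it applies to the odds of a missingness pattern for each combination of incarceration and race. It simplifies the multinomial model to a single pattern versus a reference pattern, and the `baseline_odds` argument stands in for the exponentiated pattern intercept, whose values are not given here; both are assumptions made for illustration.

```python
import math

# exp(beta) for the over-estimation setting (from the text), ordered as
# (incarcerated, Black, Z1, Z2).
EXP_BETA_OVER = (10.0, 0.1, 0.001, 100.0)


def odds_multiplier(incarcerated, black, exp_beta=EXP_BETA_OVER):
    """Multiplier on the odds of the pattern for 0/1 indicator inputs."""
    z1 = (1 - incarcerated) * black  # Black and not incarcerated
    z2 = incarcerated * black        # Black and incarcerated
    covariates = (incarcerated, black, z1, z2)
    # exp(sum_j beta_j * x_j) equals the product of exp(beta_j) ** x_j.
    return math.prod(e ** x for e, x in zip(exp_beta, covariates))


def pattern_probability(incarcerated, black, baseline_odds):
    """Two-pattern simplification: probability of the pattern vs. a reference."""
    odds = baseline_odds * odds_multiplier(incarcerated, black)
    return odds / (1.0 + odds)


multipliers = {
    (incarcerated, black): odds_multiplier(incarcerated, black)
    for incarcerated in (0, 1)
    for black in (0, 1)
}
```

Under these settings the multiplier is 1 for non-Black, non-incarcerated defendants, 10 for non-Black incarcerated defendants, about 0.0001 for Black non-incarcerated defendants, and about 100 for Black incarcerated defendants.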
MAR Missingness
We used similar parameter settings for the MAR simulations as for the MNAR simulations, but now missingness depends on race only for patterns 1, 3, 5, and 7: no missing variables, missing age, missing recommended minimum, and missing both age and recommended minimum. This maintains the MAR assumption because missingness does not depend on race in the patterns where race is missing. However, because we analyze the data with complete case analysis, we still systematically exclude observations based on the values of race and incarceration, which dramatically biases the results.
Slightly different parameter values are used for patterns 3 and 5 compared to patterns 1 and 7. See Table 1 for a list of the missing data patterns.
To get an over-estimated race effect estimate with CCA, we de-emphasize missingness based on \(Z_{1}\) for patterns where race is observed.
To under-estimate the effect with CCA, we emphasize \(Z_{1}\) and de-emphasize \(Z_{2}\) for patterns where race is observed (Tables 3, 4, 5 and 6).
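The direction of the complete case bias can be checked with simple arithmetic on a 2×2 table of race by incarceration. The sketch below is not the authors' simulation: it uses hypothetical expected cell counts and hypothetical retention probabilities (the chance a case in each cell is fully observed) instead of random draws. It relies on the fact that, when retention depends on the race-by-outcome cell, the complete case odds ratio equals the full-data odds ratio multiplied by the corresponding ratio of retention probabilities.

```python
def odds_ratio(cells):
    """Odds ratio of incarceration for Black relative to White defendants."""
    return (cells["black_incar"] * cells["white_not"]) / (
        cells["black_not"] * cells["white_incar"]
    )


def complete_cases(cells, keep):
    """Expected cell counts after dropping incomplete cases."""
    return {name: count * keep[name] for name, count in cells.items()}


# HYPOTHETICAL full-data counts; the implied odds ratio is 1.5.
full = {"black_incar": 300, "black_not": 200, "white_incar": 600, "white_not": 600}

# HYPOTHETICAL retention probabilities. Dropping more Black non-incarcerated
# cases inflates the odds ratio; dropping more Black incarcerated cases
# deflates it.
keep_over = {"black_incar": 0.95, "black_not": 0.70, "white_incar": 0.90, "white_not": 0.90}
keep_under = {"black_incar": 0.70, "black_not": 0.95, "white_incar": 0.90, "white_not": 0.90}

or_full = odds_ratio(full)
or_cca_over = odds_ratio(complete_cases(full, keep_over))
or_cca_under = odds_ratio(complete_cases(full, keep_under))
```

In this toy table fewer than 15% of cases are dropped in either scenario, yet the complete case odds ratio moves noticeably away from the full-data value in both directions.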
About this article
Cite this article
Stockton, B., Strange, C.C. & Harel, O. Now You See It, Now You Don’t: A Simulation and Illustration of the Importance of Treating Incomplete Data in Estimating Race Effects in Sentencing. J Quant Criminol (2023). https://doi.org/10.1007/s10940-023-09577-w
DOI: https://doi.org/10.1007/s10940-023-09577-w