Measurement Error in Criminal Justice Data

Abstract

While accurate data are critical in understanding crime and assessing criminal justice policy, data on crime and illicit activities are invariably measured with error. In this chapter, we illustrate and evaluate several examples of measurement error in criminal justice data. Errors are evidently pervasive, systematic, frequently related to behaviors and policies of interest, and unlikely to conform to convenient textbook assumptions. Using both convolution and mixing models of the measurement error generating process, we demonstrate the effects of data error on identification and statistical inference. Even small amounts of data error can have considerable consequences. Throughout this chapter, we emphasize the value of auxiliary data and reasonable assumptions in achieving informative inferences, but caution against reliance on strong and untenable assumptions about the error generating process.

Notes

  1.

    An extensive literature attempts to document the validity and reliability of criminal justice data (Lynch and Addington 2007; Mosher et al. 2002). In this chapter, we make no attempt to fully summarize this literature and offer no specific suggestions for modifying surveys to improve the quality of collected data.

  2.

    To accommodate proxy errors, this model has often been generalized by including a factor of proportionality linking the observed and true variables. For example, \(y = \delta {y}^{\ast} + \nu\), where \(\delta\) is an unknown parameter. For brevity, we focus on the pure measurement model without an unknown factor of proportionality. Including a scaling factor of unknown magnitude or sign induces obvious complications beyond those discussed here. For additional details, see Wooldridge (2002, 63–67) and Bound et al. (2001, 3715–3716).

  3.

    The interested reader should consult more detailed presentations in Wooldridge (2002), Wansbeek and Meijer (2000) and Bound et al. (2001).

  4.

    When the available measurement of \({y}^{\ast}\) is a proxy variable of the form \(y = \delta {y}^{\ast} + \nu\), the probability limit of \(\hat{\beta}_{y,{x}^{\ast}}\) is \(\delta\beta\). If \(\delta\) is known in sign but not magnitude, then the sign, but not scale, of \(\beta\) is identified.

  5.

    Although a full treatment of the effects of measurement error in multivariate regression is beyond the scope of this chapter, several general results are worth mentioning. First, measurement error in any one regressor will usually affect the consistency of all other parameter estimators. Second, when only a single regressor is measured with classical error, the OLS estimator of the coefficient associated with the error-ridden variable suffers attenuation bias in the standard sense (see, for example, Wooldridge 2002, 75). In general, all other OLS parameters are also inconsistent, and the direction of inconsistency can be asymptotically signed by the available data. Finally, with measurement error in multiple regressors, classical assumptions imply that the probability limit of the OLS parameter vector is usually attenuated in an average sense, but there are important exceptions (Wansbeek and Meijer 2000, 17–20).

  6.

    A number of possible strategies are available, and the interested reader should consult the discussions in Wansbeek and Meijer (2000) and Bound et al. (2001).

  7.

    Of the required conditions for using a second measurement as an instrumental variable, the assumption that the two errors are uncorrelated, \(\sigma_{\eta,\mu} = 0\), may be the most difficult to satisfy in practice. Even if both errors are classical in other regards, errors in different measurements of the same variable may be expected to correlate so that \(\sigma_{\eta,\mu} > 0\). When the covariance between \(\mu\) and \(\eta\) is nonzero, the IV slope estimator is no longer a consistent point estimator of \(\beta\), though it may still provide an informative bound on \(\beta\) under certain circumstances (see, for example, Bound et al. 2001, 3730; Black et al. 2000).

  8.

    Klepper and Leamer (1984) suggest a similar strategy for the case where multiple regressors are measured with error. The general approach is described by Bound et al. (2001, 3722–3723).

  9.

    Similar logic suggests violation for any discrete variable, and any continuous but bounded variable.

  10.

    Failure of assumption A2 affects inference regarding \(\alpha\), but not \(\beta\) (see, for example, Illustration 1).

  11.

    Bollinger (1996) and Frazis and Loewenstein (2003) derive bounds when a binary regressor is measured with error.

  12.

    With panel data, assumptions A1–A5 must account for correlations in both the cross-section and the time series dimension. For detailed examples, see Wooldridge (2002) and Griliches and Hausman (1985).

  13.

    Note that the variance of \(\Delta {x}_{i}^{\ast}\) is smaller when \({x}_{i}^{\ast}\) has positive autocorrelation: \(\sigma_{\Delta {x}^{\ast}}^{2} = 2\sigma_{{x}^{\ast}}^{2}(1 - \rho_{{x}_{2}^{\ast},{x}_{1}^{\ast}})\). To see why the relative strength of serial correlation is a concern, suppose that \(\rho_{{x}_{2}^{\ast},{x}_{1}^{\ast}} > 1/2\) while random measurement errors exhibit no autocorrelation. This implies \(\sigma_{\Delta {x}^{\ast}}^{2} < \sigma_{{x}^{\ast}}^{2}\) while \(\sigma_{\Delta \mu}^{2} = 2\sigma_{\mu}^{2}\), so attenuation bias will be greater after first differencing the data.

  14.

    As discussed in the previous section, when a variable with bounded support is imperfectly classified, it is widely recognized that the classical errors-in-variables assumption of independence between the measurement error and the true variable cannot hold. Molinari (2008) presents an alternative and useful conceptualization of the data error problem for discrete outcome variables. This “direct misclassification” approach allows one to focus on assumptions related to classification error rates instead of restrictions on the mixing process.

  15.

    This type of restriction is used in the literatures on robust statistics (Huber 1981) and data errors with binary regressors (see, e.g., Bollinger 1996 and Frazis and Loewenstein 2003).

  16.

    In this discussion, we are concerned with drawing inferences on the true rate of aggravated assault reported to the police. Inferences regarding the overall rate of aggravated assault – known and unknown to the police – are complicated by the proxy variables problem discussed previously.
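
The arithmetic behind Note 13 can be sketched in a few lines. The snippet below is only an illustration under stated assumptions: it uses the standard classical errors-in-variables result that the OLS slope converges to the true slope times the ratio of signal variance to total variance, and it assumes two periods, equal variances in both periods, and measurement errors that are uncorrelated over time and with the true regressor. The function names and the numerical values are hypothetical and do not come from the chapter.

```python
from typing import Tuple


def attenuation_factor(var_signal: float, var_noise: float) -> float:
    """Ratio plim(beta_hat) / beta under classical error in the regressor."""
    return var_signal / (var_signal + var_noise)


def levels_vs_first_difference(
    var_x: float, var_mu: float, rho: float
) -> Tuple[float, float]:
    """Attenuation factors in levels and after first differencing.

    Assumption (labelled, not taken from the chapter's data): two periods,
    equal variances across periods, errors with no autocorrelation.
    """
    # Variance of the differenced true regressor: 2 * var_x * (1 - rho)
    var_dx = 2.0 * var_x * (1.0 - rho)
    # Variance of the differenced measurement error: 2 * var_mu
    var_dmu = 2.0 * var_mu
    return (
        attenuation_factor(var_x, var_mu),
        attenuation_factor(var_dx, var_dmu),
    )


if __name__ == "__main__":
    # Hypothetical values: error variance is a quarter of the signal variance,
    # and the true regressor has autocorrelation 0.8.
    levels, diffs = levels_vs_first_difference(var_x=1.0, var_mu=0.25, rho=0.8)
    print(f"levels: {levels:.3f}, first differences: {diffs:.3f}")
```

With these illustrative values the slope retains about 80 percent of its magnitude in levels but only about 44 percent after differencing. Setting `rho` to zero makes the two factors equal, which shows that the extra attenuation comes entirely from serial correlation in the true regressor that the errors do not share.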

References

  • Anglin MD, Caulkins JP, Hser Y (1993) Prevalence estimation: policy needs, current status, and future potential. J Drug Issues 23(2):345–360

  • Azrael D, Cook PJ, Miller M (2001) State and local prevalence of firearms ownership: measurement structure and trends, National Bureau of Economic Research: Working Paper 8570

  • Bennett T, Holloway K, Farrington D (2008) The statistical association between drug misuse and crime: a meta-analysis. Aggress Violent Behav 13:107–118

  • Black D (1970) Production of crime rates. Am Sociol Rev 35:733–48

  • Black DA, Berger MC, Scott FA (2000) Bounding parameter estimates with nonclassical measurement error. J Am Stat Assoc 95(451):739–748

  • Blumstein A, Rosenfeld R (2009) Factors contributing to U.S. crime trends. In: Goldberger AS, Rosenfeld R (eds) Understanding crime trends: workshop report. Committee on Understanding Crime Trends, Committee on Law and Justice, Division of Behavioral and Social Sciences and Education. The National Academies Press, Washington, DC

  • Bollinger C (1996) Bounding mean regressions when a binary variable is mismeasured. J Econom 73(2):387–99

  • Bound J, Brown C, Mathiowetz N (2001) Measurement error in survey data. In: Heckman J, Leamer E (eds) Handbook of econometrics, 5, Ch. 59:3705–3843

  • Chaiken JM, Chaiken MR (1990) Drugs and predatory crime. Crime Justice: Drugs Crime, 13:203–239

  • Dominitz J, Sherman R (2004) Sharp bounds under contaminated or corrupted sampling with verification, with an application to environmental pollutant data. J Agric Biol Environ Stat 9(3):319–338

  • Frazis H, Loewenstein M (2003) Estimating linear regressions with mismeasured, possibly endogenous, binary explanatory variables. J Econom 117:151–178

  • Frisch R (1934) Statistical confluence analysis by means of complete regression systems. University Institute for Economics, Oslo

  • Griliches Z, Hausman JA (1985) Errors in variables in panel data: a note with an example. J Econom 31(1):93–118

  • Harrison LD (1995) The validity of self-reported data on drug use. J Drug Issues 25(1):91–111

  • Harrison L, Hughes A (1997) Introduction – the validity of self-reported drug use: improving the accuracy of survey estimates. In: Harrison L and Hughes A (eds) The validity of self-reported drug use: improving the accuracy of survey estimates. NIDA Research Monograph, vol 167. US Department of Health and Human Services, Rockville, MD, pp 1–16

  • Horowitz J, Manski C (1995) Identification and robustness with contaminated and corrupted data. Econometrica, 63(2):281–302

  • Hotz J, Mullin C, Sanders S (1997) Bounding causal effects using data from a contaminated natural experiment: analyzing the effects of teenage childbearing. Rev Econ Stud 64(4):575–603

  • Huber P (1981) Robust statistics. Wiley, New York

  • Johnston LD, O’Malley PM, Bachman JG (1998) National survey results on drug use from the monitoring the future study, 1975–1997, Volume I: Secondary school students. NIH Publication No. 98-4345. National Institute on Drug Abuse, Rockville, MD

  • Kleck G, Gertz M (1995) Armed resistance to crime: The prevalence and nature of self-defense with a gun. J Crim Law Criminol 86:150–187

  • Klepper S, Leamer EE (1984) Consistent sets of estimates for regressions with errors in all variables. Econometrica 52(1):163–183

  • Koss M (1993) Detecting the scope of rape: A review of prevalence research methods. J Interpers Violence 8:198–222

  • Koss M (1996) The measurement of rape victimization in crime surveys. Crim Justice Behav 23:55–69

  • Kreider B, Hill S (2009) Partially identifying treatment effects with an application to covering the uninsured. J Hum Resour 44(2):409–449

  • Kreider B, Pepper J (2007) Disability and employment: reevaluating the evidence in light of reporting errors. J Am Stat Assoc 102(478):432–441

  • Kreider B, Pepper J (2008) Inferring disability status from corrupt data. J Appl Econom 23(3):329–349

  • Kreider B, Pepper J. (forthcoming). Identification of expected outcomes in a data error mixing model with multiplicative mean independence. J Business Econ Stat

  • Kreider B, Pepper J, Gundersen C, Jolliffe D (2009) Identifying the effects of food stamps on children’s health outcomes when participation is endogenous and misreported. Working Paper

  • Lambert D, Tierney L (1997) Nonparametric maximum likelihood estimation from samples with irrelevant data and verification bias. J Am Stat Assoc 92:937–944

  • Lynch JP, Addington LA (eds) (2007) Understanding crime statistics: revisiting the divergence of the NCVS and UCR. Cambridge University Press, Cambridge

  • Lynch J, Jarvis J (2008) Missing data and imputation in the uniform crime reports and the effects on national estimates. J Contemp Crim Justice 24(1):69–85. doi:10.1177/1043986207313028

  • Maltz M (1999) Bridging gaps in police crime data: a discussion paper from the BJS Fellows Program. Bureau of Justice Statistics, Government Printing Office, Washington, DC

  • Maltz MD, Targonski J (2002) A note on the use of county-level UCR data. J Quant Criminol 18:297–318

  • Manski CF (2007) Identification for prediction and decisions. Harvard University Press, Cambridge, MA

  • Manski CF, Newman J, Pepper JV (2002) Using performance standards to evaluate social programs with incomplete outcome data: general issues and application to a higher education block grant program. Eval Rev 26(4):355–381

  • McDowall D, Loftin C (2007) What is convergence, and what do we know about it? In Lynch J, Addington LA (eds) Understanding crime statistics: revisiting the divergence of the NCVS and UCR, Ch. 4: 93–124

  • McDowall D, Loftin C, Wiersema B (1998) Estimates of the frequency of firearm self-defense from the redesigned national crime victimization survey. Violence Research Group Discussion Paper 20

  • Molinari F (2008) Partial identification of probability distributions with misclassified data. J Econom 144(1):81–117

  • Mosher CJ, Miethe TD, Phillips DM (2002) The mismeasure of crime. Sage Publications, Thousand Oaks, CA

  • Mullin CH (2005) Identification and estimation with contaminated data: when do covariate data sharpen inference? J Econom 130:253–272

  • National Research Council (2001) Informing America’s policy on illegal drugs: what we don’t know keeps hurting us. Committee on Data and Research for Policy on Illegal Drugs. In: Manski CF, Pepper JV, Petrie CV (eds) Committee on Law and Justice and Committee on National Statistics. Commission on Behavioral and Social Sciences and Education. National Academy Press, Washington, DC

  • National Research Council (2003) Measurement problems in criminal justice research: workshop summary. Pepper JV, Petrie CV. Committee on Law and Justice and Committee on National Statistics, Division of Behavioral and Social Sciences and Education. The National Academies Press, Washington, DC

  • National Research Council (2005) Firearms and violence: a critical review. Committee to improve research information and data on firearms. In: Wellford CF, Pepper JV, Petrie CV (eds) Committee on Law and Justice, Division of Behavioral and Social Sciences and Education. The National Academies Press, Washington, DC

  • National Research Council (2008) Surveying victims: Options for conducting the national crime victimization survey. Panel to review the programs of the bureau of justice statistics. In: Groves RM, Cork DL (eds). Committee on National Statistics and Committee on Law and Justice, Division of Behavioral and Social Sciences and Education. The National Academies Press, Washington, DC

  • Office of Applied Studies (2003) Results from the 2002 National Survey on Drug Use and Health: summary of national findings (DHHS Publication No. SMA 03-3836, Series H-22). Substance Abuse and Mental Health Services Administration, Rockville, MD

  • Pepper JV (2001) How do response problems affect survey measurement of trends in drug use? In: Manski CF, Pepper JV, Petrie C (eds) Informing America’s policy on illegal drugs: What we don’t know keeps hurting us. National Academy Press, Washington, DC, 321–348

  • Rand MR, Rennison CM (2002) True crime stories? Accounting for differences in our national crime indicators. Chance 15(1):47–51

  • Tourangeau R, McNeeley ME (2003) Measuring crime and crime victimization: methodological issues. In: Pepper JV, Petrie CV (eds) Measurement Problems in Criminal Justice Research: Workshop Summary. Committee on Law and Justice and Committee on National Statistics, Division of Behavioral and Social Sciences and Education. The National Academies Press, Washington, DC

  • U.S. Department of Justice (2004) The Nation’s two crime measures, NCJ 122705, http://www.ojp.usdoj.gov/bjs/abstract/ntmc.htm.

  • U.S. Department of Justice (2006) National Crime Victimization Survey: Criminal Victimization, 2005. Bureau of Justice Statistics Bulletin. http://www.ojp.usdoj.gov/bjs/pub/pdf/cv05.pdf

  • U.S. Department of Justice (2008) Crime in the United States, 2007. Federal Bureau of Investigation, Washington, DC. (table 1). http://www.fbi.gov/ucr/cius2007/data/table_01.html

  • Wansbeek T, Meijer E (2000) Measurement error and latent variables in econometrics. Elsevier, Amsterdam

  • Wooldridge JM (2002) Econometric analysis of cross section and panel data. MIT Press, Cambridge, MA

Acknowledgments

We thank Stephen Bruestle, Alex Piquero, and David Weisburd for their helpful comments. Pepper’s research was supported, in part, by the Bankard Fund for Political Economy.

Copyright information

© 2010 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Pepper, J., Petrie, C., Sullivan, S. (2010). Measurement Error in Criminal Justice Data. In: Piquero, A., Weisburd, D. (eds) Handbook of Quantitative Criminology. Springer, New York, NY. https://doi.org/10.1007/978-0-387-77650-7_18
