Multiple Imputation of the Supplementary Homicide Reports, 1976–2005

  • Original Paper
Journal of Quantitative Criminology

Abstract

The Supplementary Homicide Reports (SHR), assembled by the Federal Bureau of Investigation (FBI), have for many years represented the most valuable source of information on the patterns and trends in murder and non-negligent manslaughter. Despite their widespread use by researchers and policy makers alike, these data are not completely without their limitations, the most important of which involves missing or incomplete incident reports. In this analysis, we develop methods for addressing missing data in the 1976–2005 SHR cumulative file, related to both non-reports (unit missingness) and incomplete reports (item missingness). For incomplete case data (that is, missing characteristics on victims, offenders or incidents), we implement a multiple imputation (MI) approach based on a log-linear model for incomplete multivariate categorical data. Then, to adjust for unit missingness, we adopt a weighting scheme linked to FBI annual estimates of homicide counts by state and National Center for Health Statistics mortality data on decedent characteristics in coroners’ reports for deaths classified as homicide. The result is a fully-imputed SHR database for 1976–2005. This paper examines the effects of MI and case weighting on victim/offender/incident characteristics, including standard errors of parameter estimates resulting from imputation uncertainty.

Notes

  1. In 1980, the FBI added ethnicity of the victim and offender to the SHR. By mid-decade, the FBI ceased analyzing these data due to the substantial amount of missing information.

  2. The relationship codes are often problematic in this regard as the relationship of the offender to the victim is coded instead of the relationship of the victim to the offender. This results in some intimate homicides where the “husband” is female and the “wife” is male.

  3. Because the victim–offender relationship, weapon, and circumstance codes are linked to the offender data, this information does not vary across multiple victims. These codes may be misleading for cases with multiple victims as these codes only refer to the first victim in the data file. Fortunately, multiple victim incidents are relatively rare in the dataset and in many instances these codes would be invariant across victims (such as “stranger” or “unknown” relationships).

  4. If an agency has between 3 and 11 monthly Return A crime submissions for a calendar year, the reported offense counts are weighted upwards by the ratio 12/N, where N is the number of complete submissions. For agencies submitting returns for fewer than 3 months, the data are ignored and annual counts for Part I crimes are imputed based on similar-sized agencies with complete returns in the same state (or region for states without matching agencies). Maltz (1999) discusses these procedures in more detail.

  5. In order to examine the possibility of MCAR, homicide case solution/non-solution was regressed on several characteristics of the homicide incident and victim. The results of these logistic regressions confirmed that the MCAR assumption is untenable in the SHR (for details, see Fox and Swatt 2007).

  6. We are also assuming that the parameters that will be estimated are distinct from the parameters that govern the missing data process (see Schafer 1997, p. 11). Missing data mechanisms that meet both the MAR criterion and the distinctiveness criterion are called ignorable.

  7. The standard error formula is predicated on a sample drawn from an infinite population. If instead the sample constitutes a virtual population (as might be assumed of the SHR), one could adjust the first term of the formula to include only the variability within the portion of the sample for which the item is missing. In this particular case, with over a half-million records, the adjustment to the standard error would be barely discernible. More generally, the first term of the standard error expression could be expanded to include two components: (1) the squared standard error of the estimated parameter among the observed cases, multiplied by the usual correction for sampling from a finite population (if such is the case) as well as the squared proportion of observed cases; and (2) the average of the variances of the estimated parameters from the M imputations calculated using only the missing cases, multiplied by the squared proportion of missing cases. Revising the usual standard error formula for multiply imputed estimates, we obtain

    $$ \text{SE}(\hat{\theta }_{\text{MI}} ) = \sqrt {\left( {\frac{n - m}{n}} \right)^{2} \left( {\frac{N - n}{N}} \right)s_{{\hat{\theta }_{o} }}^{2} + \left( {\frac{m}{n}} \right)^{2} \left( {\frac{1}{M}} \right)\sum\limits_{k = 1}^{M} {s_{{\hat{\theta }^{\prime}_{k} }}^{2} } + \left( {1 + \frac{1}{M}} \right)\left( {\frac{1}{M - 1}} \right)\sum\limits_{k = 1}^{M} {\left( {\hat{\theta }_{k} - \hat{\theta }_{\text{MI}} } \right)^{2} } } $$

    where N and n represent the population and sample sizes, respectively, m is the number of missing cases, \( \hat{\theta }_{o} \) is the parameter estimate based on the n − m observed (non-missing) cases, \( \hat{\theta }^{\prime}_{k} \) is the parameter estimate from the kth imputation but based only on the m cases that were imputed, and all other terms are as defined previously. Thus, for datasets that constitute virtual populations, the first of the three terms under the radical would vanish.

  8. Data imputed with EM can, for example, be used directly to derive maximum likelihood estimates of structural equation models. However, analyzing EM-imputed data as if they were completely observed will result in improper significance tests with inflated Type I error probabilities.

  9. As a check of this approach, we examined the weapon distribution for jurisdictions with at least 50 homicides and with weapon data at least 99.5% complete. For these 80 jurisdictions, the percentage of gun homicide was just over 64%, about the same as for the entire dataset when treating missing values as other weapons. As additional confirmation of the appropriateness of reclassifying missing weapon as other weapon, the resulting percentage of gun homicides matched precisely the figures from the NCHS mortality data drawn from death certificates.

  10. Intimate-partner homicides include episodes involving spouses, common-law spouses, ex-spouses, boyfriends/girlfriends, and homosexual relationships.

  11. We explored several alternative imputation strategies, including PROC MI in SAS, predictive score modeling in SOLAS, multiple hot-deck imputation in SOLAS, and sequential regression. Preliminary results indicated that the Bayesian IP method in S-Plus outperformed all of these alternatives (for additional details, see Fox and Swatt 2007). Horton and Lipsitz (2001) also provide a very detailed discussion of many of the other available software packages that can be used for multiple imputation.

  12. As previously mentioned, various attempts at imputation were conducted with age as a continuous variable using the mixed model options available in S-Plus. None of these imputations were satisfactory, as they resulted in too many cases assigned to the tails of the age distributions.

  13. In these initial models, the crosstabulations of all the covariates had over 2 million cells, which is approximately four times the number of available cases. As a result of sparseness in the observed table, attempts to estimate parameters simultaneously, including low level interactions, produced values at the boundary of the parameter space (see Schafer 1997).

  14. Notwithstanding the weak associations reflected in Fig. 2, one might reasonably question the assumption of joint independence of age, sex, and race of offender. We compared the joint distribution of offender age, offender sex, and offender race for the known cases and imputed cases. The largest differences occurred among black males in the 18–24, 25–34, and 35–49-year-old age groups. Even though the interactions among the three variables were in effect set to zero in the imputation model, whatever interactions do exist would have been captured by the weapon, region and especially urbanness covariates. Overall, the changes in the joint distribution after imputation are arguably more reflective of the nature of the unsolved homicide pool than of specification bias in the imputation model.

  15. With the assistance of Messner and Deane we also considered the log multiplicative association model for imputing missing victim–offender relationship data. Although there are no set criteria to evaluate which approach produced better imputations, preliminary analyses suggested that the Bayesian IP results were more reasonable. Further, it would be very difficult to incorporate additional covariates into the log multiplicative models. For these reasons, only the Bayesian IP results are included in the SHR archived files (for more details, see Fox and Swatt 2007).

  16. The unit-missingness imputation weights averaged 1.11 with a 0.17 standard deviation. At the maximum, the weights reached 2.34, although the vast majority of weights were close to one. Specifically, the middle 90% of case weights were in the 0.92–1.45 range.

  17. The actual data arguably represent a virtual universe (that is, there is no sampling error, notwithstanding the case underreporting). If this were the case, the standard errors shown on the right in Table 3 should technically be calculated from the multiply-imputed data alone, weighted for unit missingness, and not from the cases with known values. However, because the known observations number in the hundreds of thousands (and thus contribute little to the standard error estimates), the modified standard error calculations (that treat the actual cases as having fixed values) differ by a trivial amount, and therefore are not presented here.

  18. Although not presented here, there are also some noteworthy differences in regard to weapon use. After weighting, a higher percentage of the imputed cases were assigned to the other weapon category than to the knife or gun categories. The reduction in gun cases after imputation was a result of treating the small percentage of unknown weapon cases as neither gun nor knife. As it happens, the percentage of NCHS homicides involving a firearm is within 0.5 percentage points of the final result for the weighted SHR file (64.36%), and that for knives was close as well. This supports the decision to classify the 4.2% missing weapon cases as “other” weapon.

  19. In analyses not presented here, we found that the joint distribution of victim–offender age, victim–offender sex, and victim–offender race differed little after imputation. This suggested that our imputation strategy preserved the relationship between these variables as suspected. In addition, we examined several additional substantively interesting bivariate relationships to determine whether or not our imputation strategy yielded consistent results.
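
The arithmetic described in Notes 4 and 7 can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' or the FBI's code: the function and argument names are hypothetical, and the pooled estimate \( \hat{\theta }_{\text{MI}} \) is assumed to be the simple average of the M imputation-specific estimates, as in the usual multiple imputation combining rules.

```python
import math


def annualize_offense_count(reported_count, months_reported):
    """Note 4: weight a partial-year count upward by the ratio 12/N.

    Returns None for fewer than 3 months, because the note says such
    agencies are instead imputed from similar-sized agencies (not shown).
    """
    if not 0 <= months_reported <= 12:
        raise ValueError("months_reported must be between 0 and 12")
    if months_reported < 3:
        return None
    return reported_count * 12 / months_reported


def mi_standard_error(N, n, m, var_observed, var_imputed_only, estimates):
    """Note 7: revised standard error for a multiply imputed estimate.

    N, n             -- population and sample sizes
    m                -- number of cases with the item missing
    var_observed     -- squared SE of the estimate from the n - m observed cases
    var_imputed_only -- M variances, each computed from the m imputed cases only
    estimates        -- M estimates, one per imputed dataset
    """
    M = len(estimates)
    if M < 2 or len(var_imputed_only) != M:
        raise ValueError("need M >= 2 estimates and M matching variances")

    # Assumption: the pooled estimate is the mean of the M estimates.
    theta_mi = sum(estimates) / M

    # Observed-case variance, with the finite population correction.
    observed_term = ((n - m) / n) ** 2 * ((N - n) / N) * var_observed
    # Average within-imputation variance among the imputed cases.
    imputed_term = (m / n) ** 2 * sum(var_imputed_only) / M
    # Between-imputation variance, inflated by (1 + 1/M).
    between_term = (1 + 1 / M) * sum((t - theta_mi) ** 2 for t in estimates) / (M - 1)

    return math.sqrt(observed_term + imputed_term + between_term)
```

Setting N equal to n in `mi_standard_error` corresponds to the virtual population case: the finite population correction is zero, so only the imputed-case and between-imputation terms contribute.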

References

  • Allison P (2002) Missing data. Sage Publications, Thousand Oaks, CA

  • David M, Little RJA, Samuhel ME, Triest RK (1986) Alternative methods for CPS income imputation. J Am Stat Assoc 81:29–41

  • Decker SH (1993) Exploring victim-offender relationships in homicide: the role of the individual and event characteristics. Justice Q 10:585–612

  • Flewelling RL (2004) A nonparametric imputation approach for dealing with missing variables in SHR data. Hom Stud 8:255–266

  • Fox JA (2002) Uniform crime reports [United States]: supplementary homicide reports, 1976–1998 [Computer File]. ICPSR version. Boston, MA: Northeastern University, College of Criminal Justice [producer], 1997. Inter-University Consortium for Political and Social Research [distributor], Ann Arbor, MI

  • Fox JA (2004) Missing data problems in the SHR: imputing offender and relationship characteristics. Hom Stud 8:214–254

  • Fox JA, Swatt ML (2007) Multiple imputation of the supplementary homicide reports, 1976–2005. Final report. Submitted to the Law and Justice Statistics Program of the American Statistical Association and the Bureau of Justice Statistics

  • Greenfeld LA, Rand MR, Cravan D, Klaus PA, Perkins CA, Ringel C, Warchol G, Maston M, Fox JA (1998) Violence by intimates: analysis of data on crimes by current or former spouses, boyfriends, and girlfriends. Document No. NCJ-167237. Bureau of Justice Statistics, Washington, DC

  • Horton NJ, Lipsitz SR (2001) Multiple imputation in practice: comparison of software packages for regression models with missing variables. Am Stat 55:244–254

  • Insightful Corporation (2001) Analyzing data with missing values in S-Plus. Insightful Corporation, Seattle, WA

  • Little RJA, Rubin DB (2002) Statistical analysis with missing data, 2nd edn. Wiley, Hoboken, NJ

  • Maltz MD (1999) Bridging gaps in police crime data. Bureau of Justice Statistics Fellows Program Discussion Paper, Document No. NCJ-176365. Bureau of Justice Statistics, Washington, DC

  • Maxfield M (1989) Circumstances in the supplementary homicide reports: variety and validity. Criminology 27:671–695

  • Messner SF, Deane G, Beaulieu M (2002) A log-multiplicative association model for allocating homicides with unknown victim-offender relationships. Criminology 40:457–479

  • Pampel FC, Williams KR (2000) Intimacy and homicide: compensating for missing data in the SHR. Criminology 38:661–680

  • Regoeczi WC, Riedel M (2003) The application of missing data estimation models to the problem of unknown victim/offender relationships in homicide cases. J Quant Criminol 19:155–183

  • Rennison CM (2003) Intimate partner violence, 1993–2001. Document No. NCJ-197838. Bureau of Justice Statistics, Washington, DC

  • Riedel M (1987) Stranger violence: perspectives, issues, and problems. J Crim Law Criminol 78:223–258

  • Riedel M (1998) Counting stranger homicides: a case study of statistical prestidigitation. Hom Stud 2:206–219

  • Rubin DB (1976) Inference and missing data. Biometrika 63:581–592

  • Rubin DB (1977) The design of a general and flexible system for handling nonresponse in sample surveys. Manuscript prepared for the U.S. Social Security Administration. Reprinted in Am Stat 58:298–302 (2004)

  • Rubin DB, Schenker N (1986) Multiple imputation for interval estimation from simple random samples with ignorable nonresponse. J Am Stat Assoc 81:366–374

  • Schafer JL (1997) Analysis of incomplete multivariate data. Chapman & Hall, Boca Raton, FL

  • Williams KR, Flewelling RL (1987) Family, acquaintance, and stranger homicide: alternative procedures for rate calculations. Criminology 25:543–560

Acknowledgements

The authors would like to thank Paul Allison, Glenn Deane, Michael Maltz, Steven Messner, Donald Rubin, Joseph Schafer, and several anonymous reviewers for their insightful and helpful comments on previous versions of this manuscript. Support for this project was provided by the Law and Justice Statistics Program of the American Statistical Association and the Bureau of Justice Statistics. The views expressed are those of the authors and do not necessarily reflect the position of the United States Department of Justice.

Author information

Correspondence to James Alan Fox.

About this article

Cite this article

Fox, J.A., Swatt, M.L. Multiple Imputation of the Supplementary Homicide Reports, 1976–2005. J Quant Criminol 25, 51–77 (2009). https://doi.org/10.1007/s10940-008-9058-2

  • DOI: https://doi.org/10.1007/s10940-008-9058-2
