RNR Simulation Tool: A Synthetic Dataset and Its Uses for Policy Simulations



Evidence-based practice, as the name suggests, requires evidence to support policy and practice. Typically this evidence comes in the form of data on choices, policy options, and outcomes. Such data, however, are hard to come by for most jurisdictions and agencies. How, then, should they use the evidence that is available to support sound decisions? This chapter describes the use of synthetic datasets for this purpose. At their core, synthetic datasets consist of theoretically possible attribute profiles, which together represent the support space of a population of interest. The profiles are weighted (or re-weighted) so that the dataset reproduces specified aggregate properties of its attributes, such as means, rates, variances, covariances, and correlations. Once constructed, the synthetic dataset can be analyzed in much the same way as a real sample drawn from the population of interest.
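The abstract does not say how the weights are computed, so the following is only a minimal sketch under stated assumptions. It enumerates every combination of three hypothetical binary attributes (`drug_dependent`, `prior_arrest`, `employed`, invented for illustration) to form the support space, starts from uniform weights, and rescales them until they reproduce target rates that stand in for evidence from different sources. Each rescaling step is the minimum cross-entropy adjustment for one indicator constraint; cycling through the constraints (a raking or iterative-scaling scheme, assumed here rather than taken from the chapter) converges to the maximum entropy weights consistent with all targets, provided the targets are mutually consistent. The helper names `support_space`, `rake`, and `weighted_share` and all target values are hypothetical.

```python
from itertools import product

# Hypothetical binary attributes; the chapter's actual attribute set is not
# given in the abstract.
ATTRIBUTES = ("drug_dependent", "prior_arrest", "employed")


def support_space(attributes):
    """Enumerate every theoretically possible profile (all 0/1 combinations)."""
    return [dict(zip(attributes, values))
            for values in product((0, 1), repeat=len(attributes))]


def rake(profiles, weights, constraints, max_iter=10_000, tol=1e-10):
    """Rescale weights until every (predicate, target_share) constraint holds.

    Each step is the minimum cross-entropy adjustment for a single constraint:
    profiles satisfying the predicate are scaled toward target_share, the rest
    toward 1 - target_share. Cycling through the constraints converges to the
    minimum cross-entropy weights satisfying all of them, assuming the targets
    are mutually consistent and strictly between 0 and 1.
    """
    w = list(weights)
    for _ in range(max_iter):
        worst = 0.0
        for pred, target in constraints:
            total = sum(w)
            share = sum(wi for wi, p in zip(w, profiles) if pred(p)) / total
            worst = max(worst, abs(share - target))
            up, down = target / share, (1 - target) / (1 - share)
            w = [wi * (up if pred(p) else down) for wi, p in zip(w, profiles)]
        if worst < tol:
            break
    total = sum(w)
    return [wi / total for wi in w]


profiles = support_space(ATTRIBUTES)
uniform = [1.0] * len(profiles)  # maximum entropy starting point: no information

# Illustrative targets standing in for rates reported by different sources.
evidence = [
    (lambda p: p["drug_dependent"] == 1, 0.45),
    (lambda p: p["prior_arrest"] == 1, 0.70),
    (lambda p: p["employed"] == 1, 0.35),
    (lambda p: p["drug_dependent"] == 1 and p["prior_arrest"] == 1, 0.38),
]
weights = rake(profiles, uniform, evidence)


def weighted_share(pred):
    """Analyze the synthetic data as if it were a weighted sample."""
    return sum(w for w, p in zip(weights, profiles) if pred(p))


# A quantity no source reported directly: employment among drug-dependent
# people with a prior arrest.
joint = weighted_share(lambda p: p["drug_dependent"] and p["prior_arrest"])
employed_given_joint = weighted_share(
    lambda p: p["drug_dependent"] and p["prior_arrest"] and p["employed"]) / joint
print(employed_given_joint)
```

Because no evidence in this toy setup links employment to the other two attributes, the maximum entropy weights treat it as independent of them, so the conditional employment rate comes out close to the marginal 0.35. Adding a constraint that ties employment to arrest history would change that.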

Two aspects of synthetic datasets make them particularly appealing for policy simulations. First, the weights can be constructed to conform to disparate pieces of evidence, so that findings from different sources can be combined to populate a single synthetic dataset. Second, the synthetic dataset can be re-weighted to reflect jurisdiction-specific or localized attribute features. In other words, a synthetic dataset can be customized to the characteristics of a local jurisdiction, which makes it more useful for localized policy simulations. This chapter describes the methodology used to construct and re-weight synthetic datasets and demonstrates the procedure with real data from several jurisdictions. Finally, it describes how the synthetic data are used in the full web-based simulation tool.
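Localized re-weighting can be sketched in the same spirit, again as an assumption rather than the chapter's documented procedure. Suppose a base synthetic dataset already carries weights (for example, calibrated to national evidence) and a local jurisdiction reports only its average risk score. Among all weightings that match the local mean, the one closest to the base weights in Kullback-Leibler divergence is an exponential tilt, w_i * exp(lam * x_i) / Z, and because the weighted mean increases with lam, a simple bisection finds it. The risk scale, the base weights, the local mean of 6.0, the cutoff of 7, and the helpers `tilt` and `share_at_or_above` are all hypothetical.

```python
import math

# Hypothetical base synthetic data: (risk_score, weight) pairs on an
# illustrative 0-10 scale, e.g. weights calibrated to national evidence.
BASE = [(1, 0.10), (3, 0.20), (5, 0.30), (7, 0.25), (9, 0.15)]


def tilt(data, target_mean, lo=-50.0, hi=50.0, iters=200):
    """Re-weight data so its weighted mean equals target_mean.

    Among all weightings with that mean, the one closest to the base weights
    in Kullback-Leibler divergence has the form w_i * exp(lam * x_i) / Z.
    The weighted mean increases with lam, so lam is found by bisection; the
    bracket [lo, hi] is assumed wide enough for the requested target.
    """
    xs = [x for x, _ in data]
    if not min(xs) < target_mean < max(xs):
        raise ValueError("target mean must lie strictly inside the support")

    def reweight(lam):
        shift = max(lam * x for x in xs)  # guard against overflow in exp
        raw = [w * math.exp(lam * x - shift) for x, w in data]
        z = sum(raw)
        return [r / z for r in raw]

    def mean(lam):
        return sum(x * p for x, p in zip(xs, reweight(lam)))

    for _ in range(iters):
        mid = (lo + hi) / 2
        if mean(mid) < target_mean:
            lo = mid
        else:
            hi = mid
    return list(zip(xs, reweight((lo + hi) / 2)))


def share_at_or_above(data, cutoff):
    """Weighted share of the population at or above a risk cutoff."""
    return sum(p for x, p in data if x >= cutoff)


# Hypothetical local evidence: the jurisdiction's average risk score is 6.0.
local = tilt(BASE, target_mean=6.0)
print(share_at_or_above(BASE, 7), share_at_or_above(local, 7))
```

The base weights imply a mean score of 5.3, so matching a local mean of 6.0 requires a positive lam that shifts weight toward higher scores; the share at or above the illustrative cutoff therefore rises above its base value of 0.40. A policy simulation would then evaluate, say, a program's eligibility rule or capacity against the localized weights instead of the base ones.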

Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  • Avinash Bhati (1)
  • Erin L. Crites (2)
  • Faye S. Taxman (2)
  1. Maxarth, LLC, Gaithersburg, USA
  2. Department of Criminology, Law and Society, George Mason University, Fairfax, USA
