Synthetic Data for Small Area Estimation

  • Conference paper
Privacy in Statistical Databases (PSD 2010)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6344)

Abstract

Increasingly, researchers are demanding greater access to microdata for small geographic areas to compute estimates that may affect policy decisions at local levels. Statistical agencies are prevented from releasing detailed geographical identifiers in public-use data sets due to privacy and confidentiality concerns. Existing procedures allow researchers access to restricted geographical information through a limited number of Research Data Centers (RDCs), but this method of data access is not convenient for all. An alternative approach is to release fully synthetic, public-use microdata files that contain enough geographical detail to permit small area estimation. We illustrate this method by using a Bayesian hierarchical model to create synthetic data sets from the posterior predictive distribution. We evaluate the analytic validity of the synthetic data by comparing small area estimates obtained from the synthetic data with estimates obtained from the U.S. American Community Survey.
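The general idea of drawing synthetic records from a posterior predictive distribution can be sketched with a toy example. The code below is not the authors' model: it assumes a simple normal-normal hierarchical model with known variances and a flat prior on the overall mean, and the names `synthesize_area_data` and `area_means_across_sets` are hypothetical helpers introduced only for illustration. It draws area-level means from their posterior, generates synthetic records for each area, and then compares synthetic area means with the observed ones, which mirrors the kind of validity check the abstract describes.

```python
import numpy as np


def synthesize_area_data(y_by_area, sigma2, tau2, n_synth, rng):
    """Draw one synthetic data set from a toy normal-normal hierarchical model.

    Assumed model (illustrative only, not the paper's specification):
        y_ij | theta_i ~ N(theta_i, sigma2)   within-area variation
        theta_i | mu   ~ N(mu, tau2)          between-area variation
    sigma2 and tau2 are treated as known; mu has a flat prior.
    """
    n = np.array([len(y) for y in y_by_area], dtype=float)
    ybar = np.array([np.mean(y) for y in y_by_area])

    # Posterior of mu: precision-weighted average of the area means.
    w = 1.0 / (sigma2 / n + tau2)
    mu = rng.normal(np.sum(w * ybar) / np.sum(w), np.sqrt(1.0 / np.sum(w)))

    # Conditional posterior of each area mean: shrink ybar_i toward mu.
    shrink = tau2 / (tau2 + sigma2 / n)
    post_mean = shrink * ybar + (1.0 - shrink) * mu
    post_var = shrink * sigma2 / n
    theta = rng.normal(post_mean, np.sqrt(post_var))

    # Posterior predictive draw: fresh synthetic records for every area.
    return [rng.normal(t, np.sqrt(sigma2), size=n_synth) for t in theta]


def area_means_across_sets(synthetic_sets):
    """Average each area's mean over several synthetic data sets."""
    per_set = [[np.mean(area) for area in s] for s in synthetic_sets]
    return np.mean(per_set, axis=0)


# Simulated "confidential" data for three small areas (made-up values).
rng = np.random.default_rng(0)
observed = [rng.normal(m, 1.0, size=200) for m in (-2.0, 0.0, 3.0)]

# Release several synthetic data sets, then compare area-level estimates.
synthetic_sets = [
    synthesize_area_data(observed, sigma2=1.0, tau2=4.0, n_synth=500, rng=rng)
    for _ in range(50)
]
synth_means = area_means_across_sets(synthetic_sets)
for i, y in enumerate(observed):
    print(f"area {i}: observed {np.mean(y):.3f}, synthetic {synth_means[i]:.3f}")
```

In a realistic setting the variances would also be given priors and sampled, covariates would enter the model, and inferences from the synthetic data sets would use combining rules designed for synthetic data rather than a plain average.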

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Sakshaug, J.W., Raghunathan, T.E. (2010). Synthetic Data for Small Area Estimation. In: Domingo-Ferrer, J., Magkos, E. (eds) Privacy in Statistical Databases. PSD 2010. Lecture Notes in Computer Science, vol 6344. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15838-4_15

  • DOI: https://doi.org/10.1007/978-3-642-15838-4_15

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-15837-7

  • Online ISBN: 978-3-642-15838-4

  • eBook Packages: Computer Science, Computer Science (R0)
