Abstract
Researchers increasingly demand access to microdata for small geographic areas in order to compute estimates that may affect policy decisions at local levels. Privacy and confidentiality concerns prevent statistical agencies from releasing detailed geographical identifiers in public-use data sets. Existing procedures give researchers access to restricted geographical information through a limited number of Research Data Centers (RDCs), but this method of data access is not convenient for all researchers. An alternative approach is to release fully synthetic, public-use microdata files that contain enough geographical detail to permit small area estimation. We illustrate this method by using a Bayesian hierarchical model to create synthetic data sets from the posterior predictive distribution. We evaluate the analytic validity of the synthetic data by comparing small area estimates obtained from the synthetic data with estimates obtained from the U.S. American Community Survey.
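The idea of drawing synthetic records from a posterior predictive distribution can be sketched with a toy example. The code below is not the model used in the paper: it assumes a simple normal-normal hierarchy (outcomes normal around an area mean, area means normal around an overall mean) and, as a further simplification, treats the overall mean `mu` and the variances `sigma2` and `tau2` as known. The function name `synthesize` and all parameter names are hypothetical. A fully Bayesian treatment would also draw the hyperparameters from their posterior.

```python
import numpy as np

def synthesize(y, area, n_areas, mu, sigma2, tau2, n_synthetic, rng):
    """Draw synthetic outcomes from the posterior predictive distribution of
    a normal-normal hierarchical model with known mu, sigma2 and tau2.

    Assumed model (illustrative only):
        y_ij    ~ Normal(theta_j, sigma2)   outcome of unit i in area j
        theta_j ~ Normal(mu, tau2)          area-level mean
    """
    # Per-area sample sizes and sums of the observed outcomes.
    counts = np.bincount(area, minlength=n_areas)
    sums = np.bincount(area, weights=y, minlength=n_areas)

    # Conjugate posterior of each area mean: precision-weighted combination
    # of the area's data and the overall mean. Areas with no observations
    # fall back to the prior Normal(mu, tau2).
    post_var = 1.0 / (counts / sigma2 + 1.0 / tau2)
    post_mean = post_var * (sums / sigma2 + mu / tau2)

    datasets = []
    for _ in range(n_synthetic):
        # Step 1: draw area means from their posterior.
        theta = rng.normal(post_mean, np.sqrt(post_var))
        # Step 2: draw one synthetic outcome per original record,
        # keeping the record's area identifier.
        datasets.append(rng.normal(theta[area], np.sqrt(sigma2)))
    return post_mean, datasets

# Simulated input standing in for confidential survey data.
rng = np.random.default_rng(2010)
area = rng.integers(0, 5, size=200)                 # area identifier per record
true_theta = rng.normal(50.0, 5.0, size=5)          # simulated area means
y = rng.normal(true_theta[area], 10.0)              # simulated outcomes

post_mean, synthetic = synthesize(
    y, area, n_areas=5, mu=50.0, sigma2=100.0, tau2=25.0,
    n_synthetic=10, rng=rng,
)

# Compare an area-level estimate from the synthetic data with the observed one.
obs_means = np.bincount(area, weights=y, minlength=5) / np.bincount(area, minlength=5)
syn_means = np.mean(
    [np.bincount(area, weights=s, minlength=5) / np.bincount(area, minlength=5)
     for s in synthetic],
    axis=0,
)
print(np.round(obs_means, 2))
print(np.round(syn_means, 2))
```

Releasing several synthetic data sets rather than one lets an analyst average estimates across them, which is the kind of comparison the abstract describes when it evaluates analytic validity against estimates from the original survey.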
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Sakshaug, J.W., Raghunathan, T.E. (2010). Synthetic Data for Small Area Estimation. In: Domingo-Ferrer, J., Magkos, E. (eds) Privacy in Statistical Databases. PSD 2010. Lecture Notes in Computer Science, vol 6344. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15838-4_15
DOI: https://doi.org/10.1007/978-3-642-15838-4_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15837-7
Online ISBN: 978-3-642-15838-4
eBook Packages: Computer Science, Computer Science (R0)