Inference for Singly Imputed Synthetic Data Based on Posterior Predictive Sampling under Multivariate Normal and Multiple Linear Regression Models

Sankhya B

Abstract

Likelihood-based finite sample inference for singly imputed synthetic data generated via posterior predictive sampling is developed in this paper for multivariate normal and multiple linear regression models. Currently available methodology for drawing valid inference on population parameters using synthetic data is based on concepts of multiple imputation for missing data, and therefore requires the release of multiple synthetic datasets. The methodology developed in this paper demonstrates that, contrary to the usual belief, valid inference about meaningful model parameters can indeed be drawn based on a singly imputed synthetic dataset under the multivariate normal and multiple linear regression models, by fully utilizing the model structure.
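To make the data-generation step concrete, the following is a minimal sketch of posterior predictive sampling for a single synthetic dataset in the univariate special case (p = 1) of the normal model. The prior π(μ, σ²) ∝ 1/σ², the function name `synthesize_once`, the seed, and the simulated input are all assumptions made for illustration; the abstract does not state the prior used in the paper, and this sketch does not implement the inferential procedures the paper develops.

```python
import math
import random
import statistics


def synthesize_once(data, rng):
    """Draw one synthetic dataset by posterior predictive sampling (p = 1).

    Hypothetical helper for illustration only; not taken from the paper.
    """
    n = len(data)
    xbar = statistics.fmean(data)
    ss = sum((x - xbar) ** 2 for x in data)  # sum of squared deviations

    # Assumed prior: pi(mu, sigma^2) proportional to 1 / sigma^2.
    # Under it, sigma^2 | data is distributed as ss / chi2_{n-1},
    # and chi2_k is a Gamma variate with shape k/2 and scale 2.
    chi2 = rng.gammavariate((n - 1) / 2.0, 2.0)
    sigma2 = ss / chi2

    # mu | sigma^2, data ~ N(xbar, sigma^2 / n)
    mu = rng.gauss(xbar, math.sqrt(sigma2 / n))

    # Synthetic records: n independent draws from N(mu, sigma^2)
    sd = math.sqrt(sigma2)
    return [rng.gauss(mu, sd) for _ in range(n)]


rng = random.Random(2015)  # fixed seed so the sketch is reproducible
original = [rng.gauss(10.0, 2.0) for _ in range(200)]  # simulated "confidential" data
synthetic = synthesize_once(original, rng)
```

Only `synthetic` would be released in this setting. Because the parameters are drawn from their posterior before the records are generated, the synthetic values carry extra variability beyond that of the original sample, which is why analyzing them as if they were the original data is not appropriate.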

Author information

Corresponding author

Correspondence to Martin Klein.

Additional information

Disclaimer. This article is released to inform interested parties of ongoing research and to encourage discussion. The views expressed are those of the authors and not necessarily those of the U.S. Census Bureau.

About this article

Cite this article

Klein, M., Sinha, B. Inference for Singly Imputed Synthetic Data Based on Posterior Predictive Sampling under Multivariate Normal and Multiple Linear Regression Models. Sankhya B 77, 293–311 (2015). https://doi.org/10.1007/s13571-015-0100-8

  • DOI: https://doi.org/10.1007/s13571-015-0100-8
