Encyclopedia of Database Systems

2018 Edition
Editors: Ling Liu, M. Tamer Özsu

Synthetic Microdata

  • Josep Domingo-Ferrer
Reference work entry
DOI: https://doi.org/10.1007/978-1-4614-8265-9_1501

Synonyms

Imputed data; Multiple imputation; Simulated data

Definition

Publishing synthetic (i.e., simulated) data is an alternative to masking for statistical disclosure control of microdata. The idea is to generate data randomly, subject to the constraint that certain statistics or internal relationships of the original dataset are preserved.
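As a minimal illustration of "random generation under a preservation constraint", the Python sketch below handles a single numeric attribute. It draws fresh random values and then shifts and rescales them so that the synthetic column has exactly the same mean and population standard deviation as the original. The choice of mean and standard deviation as the preserved statistics is an assumption made for illustration; practical generators usually target richer multivariate relationships. The function name `synthesize_preserving_mean_sd` is hypothetical.

```python
import random
import statistics


def synthesize_preserving_mean_sd(original, seed=None):
    """Return len(original) random values whose mean and population
    standard deviation equal those of `original` (up to float rounding)."""
    if len(original) < 2:
        raise ValueError("need at least two values to define a spread")
    rng = random.Random(seed)
    mu = statistics.fmean(original)
    sd = statistics.pstdev(original)
    # Step 1: draw fresh random values that carry no record-level information.
    draws = [rng.gauss(0.0, 1.0) for _ in original]
    d_mu = statistics.fmean(draws)
    d_sd = statistics.pstdev(draws)
    # Step 2: standardize the draws, then map them onto the original moments,
    # so the preserved statistics hold exactly rather than only in expectation.
    return [mu + sd * (d - d_mu) / d_sd for d in draws]
```

Rescaling after drawing is the design choice that makes the constraint exact. Simply sampling from a normal distribution with the original mean and standard deviation would match those statistics only approximately.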

Key Points

Rubin's original proposal [2] works as follows. Consider an original microdata set X of n records drawn from a much larger population of N individuals, with background attributes A, non-confidential attributes B and confidential attributes C. The background attributes are observed for all N individuals in the population, whereas B and C are available only for the n records in the sample X. The first step is to construct from X a multiply-imputed population of N individuals. This population consists of the n records in X and M (the number of multiple imputations, typically between 3...
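The visible part of the procedure can be sketched in Python. This is a minimal sketch under several assumptions that are not taken from the entry. A single numeric background attribute A stands in for all background attributes, and a single numeric attribute y stands in for the sample-only attributes B and C. The imputation model is a simple linear regression of y on A, refit on a bootstrap resample of X for each of the M imputations, which roughly approximates drawing the model parameters from their posterior. The final step, releasing random samples of non-sampled individuals from each synthetic population, follows how the proposal is usually summarized rather than the truncated text above. All function names are hypothetical.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b, residual_sd)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    resid = [y - (a + b * x) for x, y in zip(xs, ys)]
    return a, b, statistics.pstdev(resid)


def multiply_imputed_populations(pop_a, sample_idx, sample_y, m=5, seed=0):
    """Build m synthetic populations of N = len(pop_a) individuals.

    pop_a      -- background attribute A, known for all N individuals
    sample_idx -- indices of the n individuals in the original sample X
    sample_y   -- sample-only attribute (stand-in for B and C) for those n
    """
    rng = random.Random(seed)
    observed = dict(zip(sample_idx, sample_y))
    xs = [pop_a[i] for i in sample_idx]
    populations = []
    for _ in range(m):
        # Refit the model on a bootstrap resample of X so that each imputation
        # reflects uncertainty about the model parameters (an approximation).
        while True:
            boot = [rng.randrange(len(xs)) for _ in xs]
            if len({xs[k] for k in boot}) > 1:  # need spread in A to fit a line
                break
        a, b, s = fit_line([xs[k] for k in boot], [sample_y[k] for k in boot])
        # Sampled individuals keep their real values; the other N - n get
        # values drawn from the fitted model given their known A.
        pop_y = [observed[i] if i in observed
                 else a + b * pop_a[i] + rng.gauss(0.0, s)
                 for i in range(len(pop_a))]
        populations.append(pop_y)
    return populations


def release_synthetic_samples(pop_a, populations, sample_idx, k, seed=1):
    """From each synthetic population, draw k individuals who were NOT in X,
    so every released (A, y) pair has an imputed rather than a real y."""
    rng = random.Random(seed)
    sampled = set(sample_idx)
    unsampled = [i for i in range(len(pop_a)) if i not in sampled]
    releases = []
    for pop_y in populations:
        chosen = rng.sample(unsampled, k)
        releases.append([(pop_a[i], pop_y[i]) for i in chosen])
    return releases
```

Because only non-sampled individuals are released, no released y value is a real observation. All information about the original B and C values reaches the user only through the fitted imputation model.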


Recommended Reading

  1. Hundepool A, Domingo-Ferrer J, Franconi L, Giessing S, Lenz R, Longhurst J, Nordholt ES, Seri G, De Wolf P-P. Handbook on statistical disclosure control. CENEX SDC Project, November 2006 (manuscript version 1.0). http://neon.vb.cbs.nl/CENEX/
  2. Rubin DB. Discussion of statistical disclosure limitation. J Off Stat. 1993;9(2):461–8.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Universitat Rovira i Virgili, Tarragona, Spain

Section editors and affiliations

  • Elena Ferrari (1)
  1. DiSTA, Univ. of Insubria, Varese, Italy