Statistical Methods & Applications, Volume 20, Issue 3, pp 383–407

Simulation of close-to-reality population data for household surveys with application to EU-SILC

  • Andreas Alfons
  • Stefan Kraft
  • Matthias Templ
  • Peter Filzmoser

Abstract

Statistical simulation in survey statistics is usually based on repeatedly drawing samples from population data. Population data may also be used in courses on survey statistics to explain issues such as sampling designs. Since real population data are generally available only to a very limited extent, synthetic data must be generated for such applications. The simulated data need to be as realistic as possible while at the same time ensuring data confidentiality. This paper proposes a method for generating close-to-reality population data for complex household surveys. The procedure consists of four steps: setting up the household structure, simulating categorical variables, simulating continuous variables, and splitting continuous variables into different components. Not all four steps need to be performed, so the framework is applicable to a broad class of surveys. In addition, the proposed method is evaluated in an application to the European Union Statistics on Income and Living Conditions (EU-SILC).
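The four steps named above can be illustrated with a deliberately simplified Python sketch. It is not the authors' method: replicating each sample household round(weight) times, drawing categorical and continuous values from weighted donors in the same cell (a hot-deck style substitute for model-based simulation), and all record keys (`hid`, `weight`, `region`, `status`, `income`, `parts`) are hypothetical choices made only for illustration.

```python
import random
from collections import defaultdict


def simulate_population(sample, seed=0):
    """Toy four-step generator of a synthetic population (illustrative only).

    Each record in `sample` is a dict with the hypothetical keys
    hid (household id), weight, region, status, income and parts,
    where parts is a tuple of component shares that sum to 1.
    """
    rng = random.Random(seed)

    # Group sample persons into their households.
    households = defaultdict(list)
    for person in sample:
        households[person["hid"]].append(person)

    # Step 1: household structure. Replicate each sample household
    # round(weight) times, keeping its size and the members' regions.
    # Assumes all members of a household share the same weight.
    population = []
    next_hid = 0
    for members in households.values():
        for _ in range(round(members[0]["weight"])):
            for member in members:
                population.append({"hid": next_hid, "region": member["region"]})
            next_hid += 1

    # Step 2: categorical variable. Draw `status` from the weighted
    # sample distribution within the same region.
    by_region = defaultdict(list)
    for p in sample:
        by_region[p["region"]].append(p)
    for person in population:
        donors = by_region[person["region"]]
        donor = rng.choices(donors, weights=[d["weight"] for d in donors])[0]
        person["status"] = donor["status"]

    # Step 3: continuous variable. Draw `income` from weighted donors in
    # the same (region, status) cell; the cell is never empty because the
    # status was itself drawn from a donor of that region.
    by_cell = defaultdict(list)
    for p in sample:
        by_cell[(p["region"], p["status"])].append(p)
    for person in population:
        donors = by_cell[(person["region"], person["status"])]
        cell_weights = [d["weight"] for d in donors]
        donor = rng.choices(donors, weights=cell_weights)[0]
        person["income"] = donor["income"]

        # Step 4: split the total into components using the shares of an
        # independently drawn donor from the same cell.
        share_donor = rng.choices(donors, weights=cell_weights)[0]
        person["parts"] = tuple(person["income"] * s for s in share_donor["parts"])

    return population


sample = [
    {"hid": 1, "weight": 3.0, "region": "A", "status": "employed", "income": 30000.0, "parts": (0.9, 0.1)},
    {"hid": 1, "weight": 3.0, "region": "A", "status": "retired", "income": 18000.0, "parts": (0.0, 1.0)},
    {"hid": 2, "weight": 2.0, "region": "B", "status": "employed", "income": 25000.0, "parts": (1.0, 0.0)},
]
population = simulate_population(sample, seed=1)
print(len(population))  # 3 copies of a 2-person household + 2 copies of a 1-person household = 8
```

Copying donor values directly keeps the sketch short, but it reproduces sample values exactly, which works against the confidentiality requirement stated above; in practice one would likely replace steps 2 and 3 with draws from fitted models.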

Keywords

Synthetic data · Simulation · Survey statistics · EU-SILC

Copyright information

© Springer-Verlag 2011

Authors and Affiliations

  • Andreas Alfons (1)
  • Stefan Kraft (1, 2)
  • Matthias Templ (1, 3)
  • Peter Filzmoser (1)
  1. Department of Statistics and Probability Theory, Vienna University of Technology, Vienna, Austria
  2. Institute for Quantitative Asset Management (IQAM), Vienna, Austria
  3. Methods Unit, Statistics Austria, Vienna, Austria