Estimation within the new integrated system of household surveys in Germany

  • Regular Article
  • Statistical Papers

Abstract

In 2015, the European Commission drafted a framework regulation for integrated European social statistics. This integration covers the Labour Force Survey, the Statistics on Income and Living Conditions, and other surveys. To avoid an undue response burden, administrative and other data sources are to be used in achieving accurate survey estimates. Combining information from different data sources has become a field of growing research interest among statistical offices and other institutions. In the statistical literature, this problem is known as data fusion or statistical matching and is widely regarded as a particular missing-data pattern. Assuming that budgets are limited and that only some additional information can be obtained to improve the quality of the data fusion, we investigate different scenarios for using these limited resources within an integrated system of household surveys. Our main objective is to develop a framework that supports both the estimation of statistical models from several surveys and the estimation of classical totals for the sub-classes and areas that are of special interest in official statistics.

References

  • Barnard J, Rubin DB (1999) Small-sample degrees of freedom with multiple imputation. Biometrika 86(4):948–955

  • Battese GE, Harter RM, Fuller WA (1988) An error-components model for prediction of county crop areas using survey and satellite data. J Am Stat Assoc 83(401):28–36

  • Burgard JP, Kolb JP, Merkle H, Münnich R (2017) Synthetic data for open and reproducible methodological research in social sciences and official statistics. AStA Wirtsch Soz Arch 11(3):233–244. https://doi.org/10.1007/s11943-017-0214-8

  • Carpenter J, Kenward M (2012) Multiple imputation and its application. Wiley, New York

  • Das K, Jiang J, Rao JNK (2004) Mean squared error of empirical predictor. Ann Stat 32(2):818–840

  • Datta GS, Lahiri P (2000) A unified measure of uncertainty of estimated best linear unbiased predictors in small area estimation problems. Stat Sin 10(2):613–627

  • European Commission (2016) Proposal for a regulation of the European parliament and of the council, establishing a common framework for European statistics relating to persons and households, based on data at individual level collected from samples. COM(2016) 551 final, 2016/0264 (COD)

  • Fay RE, Herriot RA (1979) Estimates of income for small places: an application of James-Stein procedures to census data. J Am Stat Assoc 74(366):269–277

  • Gelman A, King G, Liu C (1998) Not asked and not answered: multiple imputation for multiple surveys. J Am Stat Assoc 93(443):846–857

  • Goldstein H (2011) Multilevel statistical models. Wiley, New York

  • Horvitz DG, Thompson DJ (1952) A generalization of sampling without replacement from a finite universe. J Am Stat Assoc 47(260):663–685

  • Jiang J, Lahiri P (2006) Mixed model prediction and small area estimation. Test 15:1–96

  • Kamgar S, Navvabpour H (2017) An efficient method for estimating population parameters using split questionnaire design. J Stat Res Iran 14(1):77–99

  • Kennickell AB (1991) Imputation of the 1989 survey of consumer finances: stochastic relaxation and multiple imputation. In: Proceedings of the Survey Research Methods Section of the American Statistical Association, pp 1–10

  • Koller-Meinfelder F (2009) Analysis of incomplete survey data—multiple imputation via Bayesian bootstrap predictive mean matching. PhD thesis, University of Bamberg, Germany

  • Lehtonen R, Veijanen A (2009) Design-based methods of estimation for domains and small areas. In: Pfeffermann D, Rao C (eds) Sample surveys: inference and analysis, handbook of statistics, vol 29B, chap 31, pp 219–249. North-Holland, Amsterdam

  • Li H, Liu Y, Zhang R (2017) Small area estimation under transformed nested-error regression models. Stat Pap. https://doi.org/10.1007/s00362-017-0879-7

  • Little RJ (1988) Missing-data adjustments in large surveys. J Bus Econ Stat 6(3):287–296

  • McCulloch CE, Searle SR (2001) Generalized, linear and mixed models. Wiley, New York

  • Münnich R, Burgard J (2012) On the influence of sampling design on small area estimates. J Indian Soc Agric Stat 66(1):145–156

  • Münnich R, Burgard JP, Vogt M (2013) Small area-statistik: methoden und anwendungen. AStA Wirtsch Soz Archiv 6:149–191

  • Pfeffermann D, Sverchkov M (1999) Parametric and semi-parametric estimation of regression models fitted to survey data. Sankhyā: Indian J Stat Ser B (1960-2002) 61(1):166–186

  • R Core Team (2015) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna

  • Raghunathan TE, Grizzle JE (1995) A split questionnaire survey design. J Am Stat Assoc 90:54–63

  • Rao J, Molina I (2015) Small area estimation, 2nd edn. Wiley, New York

  • Rässler S (2002) Statistical matching. Lecture Notes in Statistics. Springer, New York

  • Riede T (2013) Die Weiterentwicklung des Systems der amtlichen Haushaltsstatistiken. In: Riede T, Bechtold S, Ott N (eds) Weiterentwicklung der amtlichen Haushaltsstatistiken. SciVero, Berlin

  • Rodgers WL (1984) An evaluation of statistical matching. J Bus Econ Stat 2:91–102

  • Rubin DB (1978) Multiple imputation in sample surveys: a phenomenological Bayesian approach to nonresponse. In: Proceedings of the Survey Research Methods Section of the American Statistical Association, pp 20–34

  • Rubin DB (1986) Statistical matching using file concatenation with adjusted weights and multiple imputation. J Bus Econ Stat 4:87–95

  • Rubin DB (1987) Multiple imputation for nonresponse in surveys. Wiley, New York

  • Särndal CE, Swensson B, Wretman J (2003) Model assisted survey sampling. Springer, New York

  • Schmid T, Münnich R (2014) Spatial robust small area estimation. Stat Pap 55(3):653–670

  • Schmid T, Tzavidis N, Münnich R, Chambers R (2016) Outlier robust small-area estimation under spatial correlation. Scand J Stat 43(3):806–826

  • Sims CA (1972) Comments (on Okner 1972). Ann Econ Soc Meas 1:343–345

  • Van Buuren S, Groothuis-Oudshoorn K (2011) MICE: multivariate imputation by chained equations in R. J Stat Softw 45(3):1–67

  • Van Buuren S, Brand JP, Groothuis-Oudshoorn CG, Rubin DB (2006) Fully conditional specification in multivariate imputation. J Stat Comput Simul 76(12):1049–1064

  • Verret F, Rao J, Hidiroglou MA (2015) Model-based small area estimation under informative sampling. Surv Methodol 41(2):333–347

  • Ządło T (2009) On MSE of EBLUP. Stat Pap 50(1):101–118

  • Zhu J, Raghunathan TE (2015) Convergence properties of a sequential regression multiple imputation algorithm. J Am Stat Assoc 110(511):1112–1124

Acknowledgements

This research was conducted within the RIFOSS project, which is financially supported by the German Federal Statistical Office. The first author wishes to thank Allameh Tabatabai University, Tehran, Iran, for providing financial support while working on this paper and during the six-month visit to Trier University. Further, we thank the editor and two anonymous reviewers for their very valuable comments, which helped to improve the paper.

Author information

Corresponding author

Correspondence to Ralf Münnich.

Appendices

Appendix A: simulation study steps

Here, we briefly describe all steps used in our simulation study.

  1. A fixed population of size \(N\) is defined, together with the variables of interest and the auxiliary information.

  2. The parameters of interest are defined: the model parameters (regression coefficients of a specified model) and the small area means (as local parameters).

  3. The proposed designs D0, D1, D2, D3 and D4 are defined.

  4. A sample of size \(n\), called the MC sample, is selected from the fixed population by simple random sampling without replacement (SRSWOR).

  5. For D0, all parameters of interest are estimated from the complete information in the MC sample.

  6. For each of the designs D1, D2, D3 and D4, steps 7–12 are performed.

  7. The subsample sizes and, if needed, the overlap size are determined. For D1 and D2, the subsample sizes are \(n_1\) and \(n_2\), where \(n_1+n_2=n\). For D3 and D4, the subsample sizes are \(n_1\) and \(n_2\) and the horizontal overlap sample size is \(n_3\), where \(n_1+n_2-n_3=n\).

  8. The MC sample \(S\) is randomly split into two subsamples \(S_1\) and \(S_2\) with \(S_1 \cup S_2 = S\), which are disjoint for D1 and D2 and overlapping for D3 and D4. The intersection \(S_1 \cap S_2\) is denoted by \(S_3\), so that \(S_3=\emptyset \) for D1 and D2 and \(S_3 \ne \emptyset \) for D3 and D4.

  9. The design is applied to the complete information available from the MC sample: following the definition of the design, NA values are inserted into the dataset for those variables that are not asked of the corresponding sample units.

  10. The NA values in the dataset are imputed with the function mice of the mice package for the statistical software R. At this step, the number of imputations (\(M\)), the number of iterations and the imputation method (e.g. predictive mean matching) are chosen. The mice function then constructs \(M\) completed datasets.

  11. To estimate the model parameters, the combined point estimates, the corresponding confidence intervals and the fractions of missing information are obtained from the \(M\) completed datasets using the function pool of the mice package.

  12. To estimate the small area means, the estimator is computed on each of the \(M\) completed datasets with each method (HT, GREG, SAE under the unit-level model, SAE under the area-level model). For each method, the resulting estimators are then combined using Rubin’s combination formula (defined in Sect. 3.1.3) to obtain the overall small area estimates \(\hat{\mu }_{d, \mathrm{HT}}\), \(\hat{\mu }_{d, \mathrm{GREG}}\), \(\hat{\mu }_{d, \mathrm{BHF}}\) and \(\hat{\mu }_{d, \mathrm{FH}}\).

  13. As this is a design-based Monte Carlo simulation study, steps 4–12 are repeated \(R\) times.

  14. Finally, all measures of interest are derived.
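The sample-splitting, masking and pooling steps (7–9 and 12) can be sketched in a few lines. The following Python fragment is only an illustration under stated assumptions, not the authors' implementation: the function names, the toy data and the choice of which variables are asked in which subsample are hypothetical, since the definitions of D1 to D4 are not given in this appendix. The imputation itself (step 10, done with mice in R) is not reproduced here, and the pooling function uses the standard form of Rubin's rules, whose notation may differ from that of Sect. 3.1.3.

```python
import random
import statistics


def split_sample(units, n1, n2, n3, rng):
    """Split the MC sample into S1 and S2 that share n3 units (steps 7-8)."""
    n = len(units)
    if n1 + n2 - n3 != n or not 0 <= n3 <= min(n1, n2):
        raise ValueError("need n1 + n2 - n3 = n and 0 <= n3 <= min(n1, n2)")
    shuffled = rng.sample(units, n)   # random permutation of the MC sample
    s3 = shuffled[:n3]                # overlap S3 (empty when n3 == 0)
    only_s1 = shuffled[n3:n1]         # n1 - n3 units that belong to S1 only
    only_s2 = shuffled[n1:]           # n2 - n3 units that belong to S2 only
    return set(s3 + only_s1), set(s3 + only_s2), set(s3)


def apply_design(records, s1, s2, s1_vars, s2_vars):
    """Insert None (the NA of step 9) where a variable was not asked of a unit."""
    masked = {}
    for unit, row in records.items():
        row = dict(row)               # copy, so the complete data stay intact
        if unit not in s1:
            for var in s1_vars:       # variables asked in S1 only (assumption)
                row[var] = None
        if unit not in s2:
            for var in s2_vars:       # variables asked in S2 only (assumption)
                row[var] = None
        masked[unit] = row
    return masked


def pool_rubin(estimates, variances):
    """Combine M completed-data results with Rubin's rules (step 12)."""
    m = len(estimates)
    q_bar = statistics.fmean(estimates)        # pooled point estimate
    within = statistics.fmean(variances)       # mean within-imputation variance
    between = statistics.variance(estimates)   # between-imputation variance, divisor M - 1
    total = within + (1 + 1 / m) * between     # total variance
    return q_bar, total


# Hypothetical toy data: 10 units, y1 asked only in S1, y2 asked only in S2.
rng = random.Random(2018)
records = {i: {"x": float(i), "y1": 2.0 * i, "y2": 3.0 * i} for i in range(10)}
s1, s2, s3 = split_sample(list(records), n1=6, n2=7, n3=3, rng=rng)
masked = apply_design(records, s1, s2, s1_vars=["y1"], s2_vars=["y2"])
q_bar, total_var = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

Setting n3 to 0 gives the disjoint split of D1 and D2, and a positive n3 gives the overlapping split of D3 and D4, so one function covers both cases of step 8.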

Appendix B: convergence diagnostics

Boxplots of groups of 10 consecutive iterations (for the run with 100 iterations) and of groups of 100 consecutive iterations (for the run with 1000 iterations) help to assess whether convergence in distribution can be assumed (Figs. 9 and 10).

Fig. 9

Convergence diagnostics (100 and 1000 iterations) for \(\beta _1\), based on D3 and D4

Fig. 10

Convergence diagnostics (100 and 1000 iterations) for standard deviations of \(\beta _1\), based on D3 and D4
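The grouping behind such boxplots can be sketched as follows. This is a minimal illustration, assuming that the traced quantity (for example the estimate of \(\beta _1\) after each iteration) is available as a plain list; the function name and the synthetic chain are hypothetical, and the quartile convention used here may differ from the one behind Figs. 9 and 10.

```python
import statistics


def block_summaries(chain, block_size):
    """Five-number summary for each block of consecutive iterations."""
    summaries = []
    # Only complete blocks are summarised; a shorter trailing block is skipped.
    for start in range(0, len(chain) - block_size + 1, block_size):
        block = chain[start:start + block_size]
        q1, median, q3 = statistics.quantiles(block, n=4, method="inclusive")
        summaries.append({
            "first_iteration": start + 1,
            "min": min(block),
            "q1": q1,
            "median": median,
            "q3": q3,
            "max": max(block),
        })
    return summaries


# Hypothetical chain of 100 iterations, summarised in groups of 10.
chain = [float(i) for i in range(1, 101)]
summaries = block_summaries(chain, block_size=10)
```

If the block medians and interquartile ranges stop drifting from one block to the next, that is consistent with convergence in distribution; a persistent trend across blocks suggests that more iterations are needed.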

Appendix C: coverage probabilities for small area estimates

The area-specific sample sizes are mostly around 200–300, with outliers of 16, 68 and 97 for small areas (shown as red crosses) and of 406 and 800 for large areas (shown as blue triangles). Small, medium-sized and large areas were separated at the first and third quartiles of the area-specific sample sizes (Fig. 11).

Fig. 11

Coverage probabilities (CP) versus log mean length of confidence intervals by design and method for \(\mu _{d}\) of Y (dashed lines denote nominal coverage of 95%). (Color figure online)
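The two quantities plotted against each other can be computed per area, design and method along the following lines. This is a sketch under assumptions rather than the authors' code: the function name and the toy numbers are hypothetical, and the intervals are built as estimate plus or minus a fixed multiple of the standard error, whereas the paper's intervals may rest on a different reference distribution.

```python
import math


def coverage_and_length(true_value, estimates, std_errors, z=1.96):
    """Return (coverage probability, mean CI length) over the replications."""
    if not estimates or len(estimates) != len(std_errors):
        raise ValueError("need one standard error per estimate")
    hits = 0
    total_length = 0.0
    for est, se in zip(estimates, std_errors):
        lower, upper = est - z * se, est + z * se
        hits += lower <= true_value <= upper   # True counts as 1
        total_length += upper - lower
    r = len(estimates)
    return hits / r, total_length / r


# Hypothetical results of three replications for one area; z = 2 is an assumption.
cp, mean_len = coverage_and_length(10.0, [10.0, 12.0, 9.5], [1.0, 0.5, 0.5], z=2.0)
log_mean_len = math.log(mean_len)   # the horizontal axis uses the log scale
```

A point close to the nominal 95% line with a small log mean length would indicate intervals that are both valid and short for that area.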

Cite this article

Kamgar, S., Meinfelder, F., Münnich, R. et al. Estimation within the new integrated system of household surveys in Germany. Stat Papers 61, 2091–2117 (2020). https://doi.org/10.1007/s00362-018-1023-z

  • DOI: https://doi.org/10.1007/s00362-018-1023-z
