Estimation within the new integrated system of household surveys in Germany

Kamgar, Saeideh; Meinfelder, Florian; Münnich, Ralf; Navvabpour, Hamidreza

doi:10.1007/s00362-018-1023-z


Regular Article
Published: 01 August 2018

Volume 61, pages 2091–2117, (2020)

Statistical Papers

Saeideh Kamgar¹,
Florian Meinfelder²,
Ralf Münnich ORCID: orcid.org/0000-0001-8285-5667³ &
…
Hamidreza Navvabpour¹


Abstract

In 2015, the European Commission drafted a framework regulation for integrated European social statistics. This integration covers the Labour Force Survey, the Statistics on Income and Living Conditions, and other surveys. To avoid an inappropriate response burden, administrative and other data sources shall be considered to achieve accurate survey estimates. Combining information from different data sources has become a field of growing research interest among statistical offices and other institutions. In the statistical literature, this problem is known as data fusion or statistical matching and is widely regarded as a particular missing-data pattern. Assuming that budgets are limited and that only some additional information can be obtained to improve the quality of the data fusion, we investigate different scenarios for using these limited resources within an integrated system of household surveys. Our main objective is to develop a framework that supports, on the one hand, the estimation of statistical models using several surveys and, on the other hand, the estimation of classical totals for different sub-classes and areas that are of special interest for official statistics.


References

Barnard J, Rubin DB (1999) Small-sample degrees of freedom with multiple imputation. Biometrika 86(4):948–955
Battese GE, Harter RM, Fuller WA (1988) An error-components model for prediction of county crop areas using survey and satellite data. J Am Stat Assoc 83(401):28–36
Burgard JP, Kolb JP, Merkle H, Münnich R (2017) Synthetic data for open and reproducible methodological research in social sciences and official statistics. AStA Wirtsch Soz Arch 11(3):233–244. https://doi.org/10.1007/s11943-017-0214-8
Carpenter J, Kenward M (2012) Multiple imputation and its application. Wiley, New York
Das K, Jiang J, Rao JNK (2004) Mean squared error of empirical predictor. Ann Stat 32(2):818–840
Datta GS, Lahiri P (2000) A unified measure of uncertainty of estimated best linear unbiased predictors in small area estimation problems. Stat Sin 10(2):613–627
European Commission (2016) Proposal for a regulation of the European parliament and of the council, establishing a common framework for European statistics relating to persons and households, based on data at individual level collected from samples. COM(2016) 551 final, 2016/0264 (COD)
Fay RE, Herriot RA (1979) Estimates of income for small places: an application of James-Stein procedures to census data. J Am Stat Assoc 74(366):269–277
Gelman A, King G, Liu C (1998) Not asked and not answered: multiple imputation for multiple surveys. J Am Stat Assoc 93(443):846–857
Goldstein H (2011) Multilevel statistical models. Wiley, New York
Horvitz DG, Thompson DJ (1952) A generalization of sampling without replacement from a finite universe. J Am Stat Assoc 47(260):663–685
Jiang J, Lahiri P (2006) Mixed model prediction and small area estimation. Test 15:1–96
Kamgar S, Navvabpour H (2017) An efficient method for estimating population parameters using split questionnaire design. J Stat Res Iran 14(1):77–99
Kennickell AB (1991) Imputation of the 1989 survey of consumer finances: stochastic relaxation and multiple imputation. In: Proceedings of the survey research methods section of the American Statistical Association, pp. 1–10
Koller-Meinfelder F (2009) Analysis of incomplete survey data—multiple imputation via Bayesian bootstrap predictive mean matching. PhD thesis, University of Bamberg, Germany
Lehtonen R, Veijanen A (2009) Design-based methods of estimation for domains and small areas. In: Pfeffermann D, Rao C (eds) Sample surveys: inference and analysis, handbook of statistics, vol 29B, chap 31, pp 219–249. North-Holland, Amsterdam
Li H, Liu Y, Zhang R (2017) Small area estimation under transformed nested-error regression models. Stat Pap. https://doi.org/10.1007/s00362-017-0879-7
Little RJ (1988) Missing-data adjustments in large surveys. J Bus Econ Stat 6(3):287–296
McCulloch CE, Searle SR (2001) Generalized, linear and mixed models. Wiley, New York
Münnich R, Burgard J (2012) On the influence of sampling design on small area estimates. J Indian Soc Agric Stat 66(1):145–156
Münnich R, Burgard JP, Vogt M (2013) Small area-statistik: methoden und anwendungen. AStA Wirtsch Soz Archiv 6:149–191
Pfeffermann D, Sverchkov M (1999) Parametric and semi-parametric estimation of regression models fitted to survey data. Sankhyā: Indian J Stat Ser B (1960-2002) 61(1):166–186
R Core Team (2015) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna
Raghunathan TE, Grizzle JE (1995) A split questionnaire survey design. J Am Stat Assoc 90:54–63
Rao J, Molina I (2015) Small area estimation, 2nd edn. Wiley, New York
Rässler S (2002) Statistical matching. Lecture Notes in Statistics. Springer, New York
Riede T (2013) Die Weiterentwicklung des Systems der amtlichen Haushaltsstatistiken. In: Riede T, Bechtold S, Ott N (eds) Weiterentwicklung der amtlichen Haushaltsstatistiken. SciVero, Berlin
Rodgers WL (1984) An evaluation of statistical matching. J Bus Econ Stat 2:91–102
Rubin DB (1978) Multiple imputation in sample surveys: a phenomenological Bayesian approach to nonresponse. In: Proceedings of the Survey Research Methods Section of the American Statistical Association, pp 20–34
Rubin DB (1986) Statistical matching using file concatenation with adjusted weights and multiple imputation. J Bus Econ Stat 4:87–95
Rubin DB (1987) Multiple imputation for nonresponse in surveys. Wiley, New York
Särndal CE, Swensson B, Wretman J (2003) Model assisted survey sampling. Springer, New York
Schmid T, Münnich R (2014) Spatial robust small area estimation. Stat Pap 55(3):653–670
Schmid T, Tzavidis N, Münnich R, Chambers R (2016) Outlier robust small-area estimation under spatial correlation. Scand J Stat 43(3):806–826
Sims CA (1972) Comments (on Okner 1972). Ann Econ Soc Meas 1:343–345
Van Buuren S, Groothuis-Oudshoorn K (2011) MICE: multivariate imputation by chained equations in R. J Stat Softw 45(3):1–67
Van Buuren S, Brand JP, Groothuis-Oudshoorn CG, Rubin DB (2006) Fully conditional specification in multivariate imputation. J Stat Comput Simul 76(12):1049–1064
Verret F, Rao J, Hidiroglou MA (2015) Model-based small area estimation under informative sampling. Surv Methodol 41(2):333–347
Ządło T (2009) On MSE of EBLUP. Stat Pap 50(1):101–118
Zhu J, Raghunathan TE (2015) Convergence properties of a sequential regression multiple imputation algorithm. J Am Stat Assoc 110(511):1112–1124


Acknowledgements

This research was conducted within the RIFOSS project, which is financially supported by the German Federal Statistical Office. The first author wishes to thank Allameh Tabataba’i University, Tehran, Iran, for providing financial support while working on this paper and during the six-month visit at Trier University. Further, we thank the editor and two anonymous reviewers for their very valuable comments, which helped to improve the paper.

Author information

Authors and Affiliations

Department of Statistics, Allameh Tabataba’i University, PO Box 14155-8473, Tehran, Iran
Saeideh Kamgar & Hamidreza Navvabpour
Lehrstuhl für Statistik und Ökonometrie, Universität Bamberg, Feldkirchenstraße 21, 96052, Bamberg, Germany
Florian Meinfelder
Wirtschafts- und Sozialstatistik, Universität Trier, FB IV, VWL, Universitätsring 15, 54296, Trier, Germany
Ralf Münnich


Corresponding author

Correspondence to Ralf Münnich.

Appendices

Appendix A: simulation study steps

Here, we briefly describe all steps used in our simulation study.

1. A fixed population (size N), variables of interest, and auxiliary information are defined.
2. The parameters of interest are defined, including the model parameters (regression coefficients of a specified model) and the small area means (as local parameters).
3. The proposed designs D0, D1, D2, D3, and D4 are defined.
4. A sample of size n, called the MC sample, is selected from the fixed population by simple random sampling without replacement (SRSWOR).
5. For D0, all parameters of interest are estimated based on the complete information from the MC sample.
6. For each of the designs D1, D2, D3, and D4, steps 7–12 are performed.
7. The subsample sizes and the overlap size (if needed) are determined. For D1 and D2, the subsample sizes are \(n_1\) and \(n_2\), where \(n_1+n_2=n\). For D3 and D4, the subsample sizes are \(n_1\) and \(n_2\), and the horizontal overlap sample size is \(n_3\), where \(n_1+n_2-n_3=n\).
8. The MC sample (S) is randomly split into two disjoint (for D1 and D2) or overlapping (for D3 and D4) subsamples, called \(S_1\) and \(S_2\), where \(S_1 \cup S_2 = S\). The intersection \(S_1 \cap S_2\) is denoted by \(S_3\), where \(S_3=\emptyset \) for D1 and D2, and \(S_3 \ne \emptyset \) for D3 and D4.
9. The design is applied to the complete information available from the MC sample: according to the definition of the design, NA values are inserted into the dataset for those variables that are not asked of the corresponding sample units.
10. We use the function mice from the mice package of the statistical software R to impute the NA values in the dataset. Here, the number of imputations (M), the number of iterations, and the imputation method (e.g. predictive mean matching) are specified. The mice function then constructs M completed datasets.
11. To estimate the model parameters, the combined point estimates, the corresponding confidence intervals, and the fractions of missing information are obtained from the M completed datasets using the function pool of the mice package.
12. To estimate the small area means from the M completed datasets, we obtain the estimator for each completed dataset using different methods (HT, GREG, SAE under a unit-level model, SAE under an area-level model). Then, for each method, the resulting estimators are combined (using Rubin’s combination formula defined in Sect. 3.1.3) to obtain the overall small area estimates \(\hat{\mu }_{d, \mathrm{HT}}\), \(\hat{\mu }_{d, \mathrm{GREG}}\), \(\hat{\mu }_{d, \mathrm{BHF}}\) and \(\hat{\mu }_{d, \mathrm{FH}}\).
13. As a design-based Monte Carlo simulation study, we repeat steps 4–12 R times.
14. Finally, all measures of interest are derived.
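As an illustration of steps 7–9 and of the combination in step 12, the following Python sketch splits an MC sample into two subsamples with a given overlap, inserts missing values according to a design, and pools M estimates. It is a minimal sketch under assumptions, not the authors' implementation: the study itself uses R and the mice package, the function names and the choice of which columns are "not asked" in each subsample are hypothetical, and since Sect. 3.1.3 is not reproduced here, the pooling function uses the standard form of Rubin's rules (mean of the M estimates, with total variance equal to the within-imputation variance plus (1 + 1/M) times the between-imputation variance).

```python
import numpy as np


def split_sample(n, n1, n2, rng):
    """Split unit indices 0..n-1 into subsamples S1 and S2 (steps 7 and 8).

    The overlap size follows from n1 + n2 - n3 = n. An overlap of zero gives
    the disjoint case (D1, D2); a positive overlap gives the overlapping
    case (D3, D4).
    """
    n3 = n1 + n2 - n
    if n3 < 0 or n3 > min(n1, n2):
        raise ValueError("subsample sizes must satisfy max(n1, n2) <= n <= n1 + n2")
    perm = rng.permutation(n)
    s3 = perm[:n3]        # units belonging to both subsamples
    only1 = perm[n3:n1]   # units only in S1 (n1 - n3 of them)
    only2 = perm[n1:]     # units only in S2 (n - n1 = n2 - n3 of them)
    s1 = np.concatenate([s3, only1])
    s2 = np.concatenate([s3, only2])
    return s1, s2, s3


def apply_design(data, s1, s2, not_asked_s1, not_asked_s2):
    """Insert NaN for variables not asked of a unit (step 9).

    ASSUMPTION: units only in S1 miss the columns in `not_asked_s1`, units
    only in S2 miss the columns in `not_asked_s2`, and overlap units are
    fully observed. The actual designs D1 to D4 may differ.
    """
    out = np.array(data, dtype=float)  # copy, so the complete data stay intact
    in1 = np.zeros(len(out), dtype=bool)
    in2 = np.zeros(len(out), dtype=bool)
    in1[s1] = True
    in2[s2] = True
    rows_only1 = np.flatnonzero(in1 & ~in2)
    rows_only2 = np.flatnonzero(in2 & ~in1)
    out[np.ix_(rows_only1, not_asked_s1)] = np.nan
    out[np.ix_(rows_only2, not_asked_s2)] = np.nan
    return out


def pool_rubin(estimates, variances):
    """Combine M point estimates and their variances with Rubin's rules."""
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = len(q)
    q_bar = q.mean()           # pooled point estimate
    within = u.mean()          # average within-imputation variance
    between = q.var(ddof=1)    # between-imputation variance
    total = within + (1.0 + 1.0 / m) * between
    return q_bar, total
```

In a full replication, the masked dataset would be passed to an imputation routine (mice in the study), the chosen estimator would be computed on each of the M completed datasets, and `pool_rubin` would be applied per parameter or per area.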

Appendix B: convergence diagnostics

Boxplots for groups of 10 subsequent iterations (for the version with 100 iterations) and groups of 100 subsequent iterations (for the version with 1000 iterations) help to assess whether convergence in distribution can be assumed (Figs. 9 and 10).
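The grouping behind such boxplots can be sketched in Python as follows. This is only an illustrative sketch: the function name is hypothetical, and the monitored chain (for example, the per-iteration mean of the imputed values of one variable) is an assumption, since the quantity plotted in Figs. 9 and 10 is not specified here. The function returns, for each block of consecutive iterations, the five-number summary on which a boxplot is based.

```python
import numpy as np


def block_summaries(chain, block_size):
    """Five-number summary for consecutive blocks of iterations.

    Returns an array of shape (number of blocks, 5) with the columns
    minimum, first quartile, median, third quartile, maximum.
    Iterations beyond the last complete block are dropped.
    """
    chain = np.asarray(chain, dtype=float)
    n_blocks = len(chain) // block_size
    blocks = chain[: n_blocks * block_size].reshape(n_blocks, block_size)
    summary = np.percentile(blocks, [0, 25, 50, 75, 100], axis=1)
    return summary.T
```

If the summaries of later blocks overlap strongly and show no trend in location or spread, this is consistent with the chain having reached its stationary distribution; a systematic drift across blocks suggests that more iterations are needed.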

Appendix C: coverage probabilities for small area estimates

The area-specific sample sizes vary mainly around 200–300, with outliers of 16, 68, and 97 for small areas (shown as red crosses) and of 406 and 800 for large areas (shown as blue triangles). The separation into small, medium-size, and large areas is based on the first and third quartiles of the area-specific sample sizes (Fig. 11).
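A quartile-based grouping of this kind can be sketched in Python as follows. The function name is hypothetical, and two details are assumptions because the text does not specify them: areas lying exactly on a quartile are assigned to the medium-size group, and the quartiles are computed with NumPy's default linear interpolation.

```python
import numpy as np


def classify_areas(sample_sizes):
    """Label each area as small, medium, or large by sample-size quartiles."""
    sizes = np.asarray(sample_sizes, dtype=float)
    q1, q3 = np.percentile(sizes, [25, 75])
    labels = np.where(sizes < q1, "small",
                      np.where(sizes > q3, "large", "medium"))
    return labels, q1, q3
```

Coverage probabilities can then be summarised separately within each group, for instance by averaging the area-specific coverage rates over all areas that share a label.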


About this article

Cite this article

Kamgar, S., Meinfelder, F., Münnich, R. et al. Estimation within the new integrated system of household surveys in Germany. Stat Papers 61, 2091–2117 (2020). https://doi.org/10.1007/s00362-018-1023-z


Received: 30 August 2017
Revised: 14 July 2018
Published: 01 August 2018
Issue Date: October 2020
DOI: https://doi.org/10.1007/s00362-018-1023-z
