Abstract
In multipurpose surveys, several variables of interest and a very large number of auxiliary variables are collected. Auxiliary variables are usually used in calibration to improve estimates, but very often some of them are included for the sole purpose of increasing consistency. Consistency is important for National Statistical Institutes, especially as a means of promoting the credibility of published statistics. As a direct result, the number of auxiliary variables considered in calibration continues to grow over time. In the literature, several methods show how to manage many auxiliary variables in order to prevent unwanted consequences for the accuracy of estimates. They consist mainly of variable selection or dimension reduction, and they are very useful for deriving more accurate calibrated estimates. However, these methods do not make it easy to infer how much each auxiliary variable contributes, especially when there are many of them. The Shapley decomposition, applied in the calibration context, can be a useful tool for better understanding the net effects of auxiliary variables and, in addition, provides further information to support researchers in choosing the best calibration system. It provides a direct measure of the change, with respect to Horvitz–Thompson estimates and their sampling variances, due to the introduction of each auxiliary variable in the calibration. The method has been applied to real data from the Italian Labour Force Survey, which makes extensive use of auxiliary variables in calibration.
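The idea of attributing the gap between a calibrated estimate and the Horvitz–Thompson estimate to individual auxiliary variables can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes linear (chi-square distance) calibration, uses simulated data with made-up "known" totals, covers only the point estimate (not the sampling variance), and the function names `calibrated_total` and `shapley_contributions` are hypothetical.

```python
from itertools import combinations
from math import factorial

import numpy as np


def calibrated_total(y, d, X, totals, subset):
    """Linear calibration estimate of the total of y, using only the
    auxiliary columns listed in `subset` (empty subset = Horvitz-Thompson)."""
    if not subset:
        return float(d @ y)
    cols = list(subset)
    Xs = X[:, cols]
    T = (Xs * d[:, None]).T @ Xs                      # sum_k d_k x_k x_k'
    lam = np.linalg.solve(T, totals[cols] - d @ Xs)   # Lagrange multipliers
    w = d * (1.0 + Xs @ lam)                          # calibrated weights
    return float(w @ y)


def shapley_contributions(y, d, X, totals):
    """Shapley value of each auxiliary variable, where the 'worth' of a
    subset is the calibrated estimate obtained with that subset."""
    p = X.shape[1]
    phi = np.zeros(p)
    for j in range(p):
        others = [k for k in range(p) if k != j]
        for size in range(p):
            weight = factorial(size) * factorial(p - size - 1) / factorial(p)
            for S in combinations(others, size):
                gain = (calibrated_total(y, d, X, totals, S + (j,))
                        - calibrated_total(y, d, X, totals, S))
                phi[j] += weight * gain
    return phi


# Simulated sample (assumption): n = 200 units, equal design weights of 10.
rng = np.random.default_rng(0)
n = 200
d = np.full(n, 10.0)
X = np.column_stack([np.ones(n), rng.normal(5, 1, n), rng.normal(2, 1, n)])
y = 2.0 + 3.0 * X[:, 1] + rng.normal(0, 1, n)
totals = np.array([2000.0, 10100.0, 3900.0])  # made-up population totals

phi = shapley_contributions(y, d, X, totals)
ht = calibrated_total(y, d, X, totals, ())
full = calibrated_total(y, d, X, totals, (0, 1, 2))
print("Shapley contributions:", phi)
print("Calibrated minus HT:  ", full - ht)
```

By the efficiency property of the Shapley value, the contributions sum to the difference between the fully calibrated estimate and the Horvitz–Thompson estimate. The exhaustive enumeration of subsets grows exponentially with the number of auxiliary variables, so this sketch is only practical for a handful of them.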
Notes
For further details on the Italian LFS see, e.g., ISTAT (2014).
CAPI stands for Computer Assisted Personal Interview. Except in particular circumstances, LFS households are interviewed in this mode on the first occasion (first rotational group).
CATI stands for Computer Assisted Telephone Interview. Except in particular circumstances, LFS households are interviewed in this mode from the second occasion onwards (second, third and fourth rotational groups).
References
Bankier MD, Rathwell S, Majkowski M (1992) Two step generalized least squares estimation in the 1991 Canadian Census. Statistics Canada, Ottawa
Bardsley P, Chambers RL (1984) Multipurpose estimation from unbalanced samples. J R Stat Soc C-Appl 33(3):290–299
Beaumont JF (2008) A new approach to weighting and inference in sample surveys. Biometrika 95(3):539–553
Bethlehem JG, Keller JW (1987) Linear weighting of sample survey data. J Off Stat 3(2):141–153
Cardot H, Goga C, Shehzad MA (2017) Calibration and partial calibration on principal components when the number of auxiliary variables is large. Stat Sin 2017:243–260
Cassel CM, Särndal CE, Wretman JH (1979) Prediction theory for finite populations when model-based and design-based principles are combined: with an application to a study of choice of transportation mode across the Öresund Straits. Scand J Stat 1979:97–106
Ceccarelli C, Guandalini A (2011) Increasing the accuracy of IT-SILC estimates through the use of auxiliary variables from labour force survey. Ital J Appl Stat Stat Appl 24(1):103–115
Chambers RL (1996) Robust case-weighting for multipurpose establishment surveys. J Off Stat 12(1):3–32
Chambers RL, Skinner C, Wang S (1999) Intelligent calibration. Bull Int Stat Inst 58(2):321–324
Clark RG, Chambers RL (2008) Adaptive calibration for prediction of finite population totals. Surv Methodol 34(2):163–172
Deutsch J, Silber J (2007) Decomposing income inequality by population subgroups: a generalization. Inequality and poverty. Emerald Group Publishing Limited, pp. 237–253
Devaud D, Tillé Y (2019) Deville and Särndal’s calibration: revisiting a 25-years-old successful optimization problem. TEST 28:1033–1065. https://doi.org/10.1007/s11749-019-00681-3
Dever JA, Valliant R (2010) A comparison of variance estimators for poststratification to estimated control totals. Surv Methodol 36(1):45–56
Deville J-C, Särndal C-E (1992) Calibration estimators in survey sampling. J Am Stat Assoc 87(418):376–382
Deville J-C, Särndal C-E, Sautory O (1993) Generalized raking procedures in survey sampling. J Am Stat Assoc 88(423):1013–1020
Deville JC (1999) Simultaneous calibration of several surveys. In: Proceedings of statistics Canada symposium 99 - Combining data from different sources, Ottawa, Canada. Statistics Canada, pp 207–212
Fuller WA (2002) Regression estimation for survey samples. Surv Methodol 28(1):5–23
Giorgi GM, Guandalini A (2016) Bonferroni index decomposition and the Shapley method. Rivista Italiana Di Economia Demografia e Statistica 70(4):67–78
Giorgi G, Guandalini A (2018) Decomposing the Bonferroni inequality index by subgroups: Shapley value and balance of inequality. J Econom 6(2):1–16
Guandalini A, Tillé Y (2017) Design-based estimators calibrated on estimated totals from multiple surveys. Int Stat Rev 85(2):250–269
Guggemos F, Tillé Y (2010) Penalized calibration in survey sampling: design-based estimation assisted by mixed models. J Stat Plan Inference 140(11):3199–3212
Hausken K, Mohr M (2001) The value of a player in n-person games. Soc Choice Welf 18(3):465–483
Isaki CT, Fuller WA (1982) Survey design under the regression superpopulation model. J Am Stat Assoc 77(377):89–96
Israeli O (2007) A Shapley based decomposition of the R-square of a linear regression. J Econ Inequal 5:199–212
ISTAT (2014) Rilevazione sulle Forze di Lavoro: Aspetti metodologici dell’indagine. https://www.istat.it/microdata/download.php?id=/import/fs/pub/wwwarmida/2/2019/4/Nota.pdf (Accessed 22 Jan 2021)
Kassaye MH, Demir Y (2012) Calibration based on principal components. Thesis dissertation. Orebro University School of Business
Kish L (1992) Weighting for unequal Pi. J Off Stat 8(2):183–200
McConville KS, Breidt FJ, Lee TMC, Moisen GG (2017) Model-assisted survey regression estimation with the LASSO. J Surv Stat Methodol 5:131–158
Nascimento Silva PLD, Skinner CJ (1997) Variable selection for regression estimation in finite populations. Surv Methodol 23(1):23–32
Rao JNK, Singh AC (1997) A ridge-regression method for range-restricted weight calibration in survey sampling. In: Proceedings of the section on survey research methods, pp 57–65
Rota BJ, Laitila T (2017) Calibrating on principal components in the presence of multiple auxiliary variables for non-response adjustment. S Afr Stat J 51(1):103–125
Särndal C-E (1980) On π-inverse weighting versus best linear unbiased weighting in probability sampling. Biometrika 67(3):639–650
Särndal C-E (2007) The calibration approach in survey theory and practice. Surv Methodol 33(2):99–119
Särndal C-E, Lundström S (2005) Estimation in surveys with nonresponse. John Wiley and Sons
Särndal C-E, Lundström S (2008) Assessing auxiliary vectors for control nonresponse bias in the Official Statistics. J Off Stat 24(2):167–191
Särndal C-E, Swensson B, Wretman J (1989) The weighted residual technique for estimating the variance of the general regression estimator of the finite population total. Biometrika 76(3):527–537
Sastre M, Trannoy A (2002) Shapley inequality decomposition by factor components: some methodological issues. J Econ 77(1):51–89
Shapley LS (1953) A value for n-person games. Contrib Theory Games 2(28):307–317
Shorrocks AF (2013) Decomposition procedures for distributional analysis: a unified framework based on the Shapley value. J Econ Inequal 11:1–28
Singh AC, Mohl CA (1996) Understanding calibration estimators in survey sampling. Surv Methodol 22(2):107–116
Wold S, Sjöström M, Eriksson L (2001) PLS-regression: a basic tool of chemometrics. Chemom Intell Lab Syst 58(2):109–130
Théberge A (1999) Calibration and restricted weights. Surv Methodol 26(1):99–107
Théberge A (2000) Extension of calibration in survey sampling. J Am Stat Assoc 94(446):635–644
Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc Series B Stat Methodol 58(1):267–288
Tibshirani R (2011) Regression shrinkage and selection via the lasso: a retrospective. J R Stat Soc Series B Stat Methodol 73(3):273–282
Tillé Y, Wilhelm M (2017) Probability sampling designs: Balancing and principles for choice of design. Stat Sci 32(2):176–189
Traat I, Särndal CE (2011) Domain estimators calibrated on information from other surveys. Acta et Commentationes Universitatis Tartuensis de Mathematica 2, University of Tartu, Tartu, Estonia
Wright RL (1983) Finite population sampling with multivariate auxiliary information. J Am Stat Assoc 78(384):879–884
Wu C, Sitter RR (2001) A model-calibration approach to using complete auxiliary information from survey data. J Am Stat Assoc 96(453):185–193
Zardetto D (2015) ReGenesees: an advanced R system for calibration, estimation and sampling error assessment in complex sample surveys. J Off Stat 31(2):177–203
Acknowledgements
The authors express their gratitude for the meticulous reading and the constructive comments from the Editor and two anonymous referees.
Guandalini, A., Ceccarelli, C. Impact measurement and dimension reduction of auxiliary variables in calibration estimator using the Shapley decomposition. Stat Methods Appl 31, 759–784 (2022). https://doi.org/10.1007/s10260-021-00616-z
DOI: https://doi.org/10.1007/s10260-021-00616-z