
Impact measurement and dimension reduction of auxiliary variables in calibration estimator using the Shapley decomposition

  • Original Paper
Statistical Methods & Applications


In multipurpose surveys, several variables of interest and a very large number of auxiliary variables are collected. Auxiliary variables are usually included in calibration to improve estimates, but very often some of them are included for the sole purpose of increasing consistency, that is, making the survey estimates agree with known totals. Consistency is important for National Statistical Institutes, especially as a means of promoting the credibility of published statistics. As a direct result, the number of auxiliary variables used in calibration continues to grow over time. The literature offers several methods for managing many auxiliary variables so as to prevent adverse effects on the accuracy of estimates. They consist mainly of variable selection or dimension reduction, and they are very useful for deriving more accurate calibrated estimates. However, they do not make it easy to infer how much each auxiliary variable contributes, especially when there are many of them. The Shapley decomposition applied in the calibration context can be a useful tool for better understanding the net effects of auxiliary variables, and it also provides further information to support researchers in choosing the best calibration system. It provides a direct measure of the change, with respect to the Horvitz–Thompson estimates and their sampling variances, due to the introduction of each auxiliary variable in the calibration. The method has been applied to real data from the Italian Labour Force Survey, which makes extensive use of auxiliary variables in calibration.
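The abstract describes the decomposition only in words. As an illustration, here is a minimal Python sketch, assuming linear (chi-square distance) calibration in the sense of Deville and Särndal (1992) with constants q_k = 1, and taking as the value v(S) of a set S of auxiliary variables the estimate of the total of y calibrated on the variables in S, so that v(∅) is the Horvitz–Thompson estimate. The Shapley value of variable j is then φ_j = Σ_{S ⊆ N∖{j}} |S|!(m − |S| − 1)!/m! · [v(S ∪ {j}) − v(S)], and the φ_j add up to the total change v(N) − v(∅). The function names, the toy data and the choice of distance function are hypothetical; the paper's own characteristic function may differ, and the paper also decomposes sampling variances, which this sketch does not attempt.

```python
from itertools import combinations
from math import factorial


def solve(A, b):
    """Solve the small square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def linear_calibration(d, X, totals, subset):
    """Linear (chi-square distance) calibration weights w_k = d_k (1 + x_k' lam),
    where x_k holds only the auxiliaries in `subset` (an assumption of this sketch).
    The weights reproduce the known totals of those auxiliaries exactly."""
    if not subset:
        return list(d)  # no auxiliaries: plain Horvitz-Thompson design weights
    xs = [[row[j] for j in subset] for row in X]
    p = len(subset)
    T = [[sum(dk * xk[a] * xk[c] for dk, xk in zip(d, xs)) for c in range(p)] for a in range(p)]
    ht = [sum(dk * xk[a] for dk, xk in zip(d, xs)) for a in range(p)]
    lam = solve(T, [totals[j] - ht[a] for a, j in enumerate(subset)])
    return [dk * (1 + sum(xk[a] * lam[a] for a in range(p))) for dk, xk in zip(d, xs)]


def estimate(w, y):
    """Weighted estimate of the population total of y."""
    return sum(wk * yk for wk, yk in zip(w, y))


def shapley_contributions(d, X, totals, y):
    """Exact Shapley value of each auxiliary variable for the change
    (estimate calibrated on all auxiliaries) - (Horvitz-Thompson estimate)."""
    m = len(totals)
    cache = {}

    def v(S):  # value of a coalition S (a sorted tuple of column indices)
        if S not in cache:
            cache[S] = estimate(linear_calibration(d, X, totals, list(S)), y)
        return cache[S]

    phi = []
    for j in range(m):
        others = [i for i in range(m) if i != j]
        contribution = 0.0
        for size in range(m):
            weight = factorial(size) * factorial(m - size - 1) / factorial(m)
            for S in combinations(others, size):
                contribution += weight * (v(tuple(sorted(S + (j,)))) - v(S))
        phi.append(contribution)
    return phi


if __name__ == "__main__":
    # Hypothetical toy sample: design weights d_k = 1/pi_k, two auxiliaries, one target y.
    d = [10, 10, 10, 20, 20, 20]
    X = [[1, 2], [1, 3], [0, 1], [1, 5], [0, 2], [1, 4]]
    totals = [70, 300]  # hypothetical known population totals of the two auxiliaries
    y = [5, 7, 2, 11, 4, 9]
    ht = estimate(d, y)
    cal = estimate(linear_calibration(d, X, totals, [0, 1]), y)
    phi = shapley_contributions(d, X, totals, y)
    print("HT:", ht, "calibrated:", cal, "Shapley:", phi, "sum:", sum(phi))
```

Exact enumeration runs the calibration on all 2^m subsets of auxiliary variables, so it is practical only for a modest number of variables (or of groups of variables); with many auxiliaries one would need to group them or approximate the Shapley values.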


Figs. 1–6

Notes


  1. Sometimes auxiliary totals estimated from surveys can also be considered. See Deville (1999), Dever and Valliant (2010), Traat and Särndal (2011), Ceccarelli and Guandalini (2011), Guandalini and Tillé (2017) and references therein.

  2. For further details on the Italian LFS see, e.g., ISTAT (2014).

  3. CAPI stands for Computer Assisted Personal Interview. Except in particular circumstances, LFS households are interviewed in this mode on the first occasion (first rotation group).

  4. CATI stands for Computer Assisted Telephone Interview. Except in particular circumstances, LFS households are interviewed in this mode from the second occasion onwards (second, third and fourth rotation groups).


References

  • Bankier MD, Rathwell S, Majkowski M (1992) Two step generalized least squares estimation in the 1991 Canadian Census. Statistics Canada, Ottawa

  • Bardsley P, Chambers RL (1984) Multipurpose estimation from unbalanced samples. J R Stat Soc C-Appl 33(3):290–299

  • Beaumont JF (2008) A new approach to weighting and inference in sample surveys. Biometrika 95(3):539–553

  • Bethlehem JG, Keller JW (1987) Linear weighting of sample survey data. J Off Stat 3(2):141–153

  • Cardot H, Goga C, Shehzad MA (2017) Calibration and partial calibration on principal components when the number of auxiliary variables is large. Stat Sin 27:243–260

  • Cassel CM, Särndal CE, Wretman JH (1979) Prediction theory for finite populations when model-based and design-based principles are combined: with an application to a study of choice of transportation mode across the Öresund Straits. Scand J Stat 6:97–106

  • Ceccarelli C, Guandalini A (2011) Increasing the accuracy of IT-SILC estimates through the use of auxiliary variables from the Labour Force Survey. Ital J Appl Stat Stat Appl 24(1):103–115

  • Chambers RL (1996) Robust case-weighting for multipurpose establishment surveys. J Off Stat 12(1):3–32

  • Chambers RL, Skinner C, Wang S (1999) Intelligent calibration. Bull Int Stat Inst 58(2):321–324

  • Clark RG, Chambers RL (2008) Adaptive calibration for prediction of finite population totals. Surv Methodol 34(2):163–172

  • Deutsch J, Silber J (2007) Decomposing income inequality by population subgroups: a generalization. In: Inequality and poverty. Emerald Group Publishing Limited, pp 237–253

  • Devaud D, Tillé Y (2019) Deville and Särndal’s calibration: revisiting a 25-years-old successful optimization problem. TEST 28:1033–1065

  • Dever JA, Valliant R (2010) A comparison of variance estimators for poststratification to estimated control totals. Surv Methodol 36(1):45–56

  • Deville J-C, Särndal C-E (1992) Calibration estimators in survey sampling. J Am Stat Assoc 87(418):376–382

  • Deville J-C, Särndal C-E, Sautory O (1993) Generalized raking procedures in survey sampling. J Am Stat Assoc 88(423):1013–1020

  • Deville JC (1999) Simultaneous calibration of several surveys. In: Proceedings of Statistics Canada Symposium 99: Combining data from different sources, Ottawa, Canada. Statistics Canada, pp 207–212

  • Fuller WA (2002) Regression estimation for survey samples. Surv Methodol 28(1):5–23

  • Giorgi GM, Guandalini A (2016) Bonferroni index decomposition and the Shapley method. Rivista Italiana di Economia Demografia e Statistica 70(4):67–78

  • Giorgi G, Guandalini A (2018) Decomposing the Bonferroni inequality index by subgroups: Shapley value and balance of inequality. J Econom 6(2):1–16

  • Guandalini A, Tillé Y (2017) Design-based estimators calibrated on estimated totals from multiple surveys. Int Stat Rev 85(2):250–269

  • Guggemos F, Tillé Y (2010) Penalized calibration in survey sampling: design-based estimation assisted by mixed models. J Stat Plan Inference 140(11):3199–3212

  • Hausken K, Mohr M (2001) The value of a player in n-person games. Soc Choice Welf 18(3):465–483

  • Isaki CT, Fuller WA (1982) Survey design under the regression superpopulation model. J Am Stat Assoc 77(377):89–96

  • Israeli O (2007) A Shapley based decomposition of the R-square of a linear regression. J Econ Inequal 5:199–212

  • ISTAT (2014) Rilevazione sulle Forze di Lavoro: Aspetti metodologici dell’indagine. (Accessed 22 Jan 2021)

  • Kassaye MH, Demir Y (2012) Calibration based on principal components. Thesis, Örebro University School of Business

  • Kish L (1992) Weighting for unequal Pi. J Off Stat 8(2):183–200

  • McConville KS, Breidt FJ, Lee TMC, Moisen GG (2017) Model-assisted survey regression estimation with the LASSO. J Surv Stat Methodol 5:131–158

  • Nascimento Silva PLD, Skinner CJ (1997) Variable selection for regression estimation in finite populations. Surv Methodol 23(1):23–32

  • Rao JNK, Singh AC (1997) A ridge-regression method for range-restricted weight calibration in survey sampling. In: Proceedings of the Section on Survey Research Methods, pp 57–65

  • Rota BJ, Laitila T (2017) Calibrating on principal components in the presence of multiple auxiliary variables for non-response adjustment. S Afr Stat J 51(1):103–125

  • Särndal C-E (1980) On π-inverse weighting versus best linear unbiased weighting in probability sampling. Biometrika 67(3):639–650

  • Särndal C-E (2007) The calibration approach in survey theory and practice. Surv Methodol 33(2):99–119

  • Särndal C-E, Lundström S (2005) Estimation in surveys with nonresponse. John Wiley and Sons

  • Särndal C-E, Lundström S (2008) Assessing auxiliary vectors for control nonresponse bias in the Official Statistics. J Off Stat 24(2):167–191

  • Särndal C-E, Swensson B, Wretman J (1989) The weighted residual technique for estimating the variance of the general regression estimator of the finite population total. Biometrika 76(3):527–537

  • Sastre M, Trannoy A (2002) Shapley inequality decomposition by factor components: some methodological issues. J Econ 77(1):51–89

  • Shapley LS (1953) A value for n-person games. Contrib Theory Games 2(28):307–317

  • Shorrocks AF (2013) Decomposition procedures for distributional analysis: a unified framework based on the Shapley value. J Econ Inequal 11:1–28

  • Singh AC, Mohl CA (1996) Understanding calibration estimators in survey sampling. Surv Methodol 22(2):107–116

  • Théberge A (1999) Calibration and restricted weights. Surv Methodol 26(1):99–107

  • Théberge A (2000) Extension of calibration in survey sampling. J Am Stat Assoc 94(446):635–644

  • Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc Series B Stat Methodol 58(1):267–288

  • Tibshirani R (2011) Regression shrinkage and selection via the lasso: a retrospective. J R Stat Soc Series B Stat Methodol 73(3):273–282

  • Tillé Y, Wilhelm M (2017) Probability sampling designs: balancing and principles for choice of design. Stat Sci 32(2):176–189

  • Traat I, Särndal CE (2011) Domain estimators calibrated on information from other surveys. Acta et Commentationes Universitatis Tartuensis de Mathematica 2, University of Tartu, Tartu, Estonia

  • Wold S, Sjöström M, Eriksson L (2001) PLS-regression: a basic tool of chemometrics. Chemom Intell Lab Syst 58(2):109–130

  • Wright RL (1983) Finite population sampling with multivariate auxiliary information. J Am Stat Assoc 78(384):879–884

  • Wu C, Sitter RR (2001) A model-calibration approach to using complete auxiliary information from survey data. J Am Stat Assoc 96(453):185–193

  • Zardetto D (2015) ReGenesees: an advanced R system for calibration, estimation and sampling error assessment in complex sample surveys. J Off Stat 31(2):177–203


The authors express their gratitude for the meticulous reading and the constructive comments from the Editor and two anonymous referees.

Author information



Corresponding author

Correspondence to Alessio Guandalini.



About this article


Cite this article

Guandalini, A., Ceccarelli, C. Impact measurement and dimension reduction of auxiliary variables in calibration estimator using the Shapley decomposition. Stat Methods Appl 31, 759–784 (2022).

