Entropy balancing: a maximum-entropy reweighting scheme to adjust for coverage error

Watson, Samantha K.; Elliot, Mark

doi:10.1007/s11135-015-0235-8

Published: 20 June 2015

Volume 50, pages 1781–1797, (2016)

Quality & Quantity

Samantha K. Watson^1,2 & Mark Elliot^2


Abstract

This paper presents a newly available technique to adjust for bias in non-probabilistically selected samples. To date, applications of this innovative technique—termed entropy balancing—have been restricted to evaluation settings, where the goal is to reduce model dependence prior to the estimation of treatment effects. In a novel application, we demonstrate the technique’s utility in cases where the goal is to correct for sample bias originating in coverage error. The appeal of entropy balancing in this latter setting lies in its capacity to optimise the twin goals of improved balance in covariate distribution and maximum retention of information. Entropy balancing combines the opportunity to incorporate a large set of moment conditions in the calculation of weights, with the ability to directly implement exact balance. The technique thus builds upon the theoretical appeal of the more widely known and applied propensity score adjustment method, while addressing that method’s practical limitations. We demonstrate the utility of the entropy balancing technique empirically, through an example using the Young Lives Project survey data for rural Andhra Pradesh, South India. We conclude by summarising the potential of this procedure to contribute to robust survey-based research more widely.


Notes

See Hainmueller (2012) for a comprehensive presentation of the theoretical framework.
In cases where only marginal population probabilities are available (from summarised census data for example) the ebalance procedure allows for values to be manually specified to reweight the non-probability sample covariates in line with available known population targets.
All analysis is conducted in Stata 13; Hainmueller’s “ebalance” suite of commands for the entropy balancing procedure can be installed in Stata in the usual manner, i.e. “ssc install ebalance, all replace”.
The survey was sponsored by the UK Department for International Development (DFID), and is led by the Oxford Department of International Development at the University of Oxford, in collaboration with academic institutions in each of the four project countries.
In the second round of data collection all individuals resident in a selected household were included in the survey.
Andersson (1996) discusses the general method of sentinel site sampling in some detail.
At the all India level a total of 124,680 households and 602,833 individuals took part in the survey for schedule 10 of the 61st round of the NSS.
Household class is calculated on the basis of household landholding and dominant labour relations.
The default iteration number is 20, the default tolerance level 0.015, and both can be increased if convergence fails.
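
The note on manually specified targets can be illustrated with a small sketch. Assuming population marginals are available only as category shares (from summarised census data, say), each share becomes one target moment and each category becomes one 0/1 indicator row for the sample. The helper names `moment_targets` and `indicator_rows`, the covariates, and the shares below are hypothetical and chosen only for illustration; they are not part of the ebalance package, and Python is used here purely as a neutral notation.

```python
def moment_targets(marginals):
    """Turn {covariate: {category: population share}} into (labels, targets)."""
    labels, targets = [], []
    for covariate, shares in marginals.items():
        if abs(sum(shares.values()) - 1.0) > 1e-9:
            raise ValueError(f"shares for {covariate!r} must sum to 1")
        # Drop the first category: its share is implied by the others,
        # so including it would make the constraints redundant.
        for category in list(shares)[1:]:
            labels.append(f"{covariate}={category}")
            targets.append(shares[category])
    return labels, targets


def indicator_rows(records, marginals):
    """Build one 0/1 row per retained category, one column per sample unit."""
    rows = []
    for covariate, shares in marginals.items():
        for category in list(shares)[1:]:
            rows.append([1.0 if rec[covariate] == category else 0.0
                         for rec in records])
    return rows


# Hypothetical population shares and a tiny hypothetical sample.
marginals = {
    "sex": {"male": 0.49, "female": 0.51},
    "age_group": {"15-29": 0.2, "30-59": 0.5, "60+": 0.3},
}
sample = [
    {"sex": "female", "age_group": "15-29"},
    {"sex": "male", "age_group": "30-59"},
    {"sex": "female", "age_group": "60+"},
]

labels, M = moment_targets(marginals)
C = indicator_rows(sample, marginals)
```

Dropping one reference category per covariate is a design choice of this sketch: because the weights are normalised to sum to one, the omitted share is matched automatically once the others are.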

References

Abadie, A., Imbens, G.W.: Bias-corrected matching estimators for average treatment effects. J. Bus. Econ. Stat. 29(1), 1–11 (2011)
Andersson, N.: Evidence-based planning: the philosophy and methods of sentinel community surveillance. Economic Development Institute of the World Bank, Washington (1996)
Duffy, B., Terhanian, G., Bremer, J., Smith, K.: Comparing data from online and face-to-face surveys. Int. J. Market Res. 47(6), 615–639 (2005)
Frölich, M.: Propensity score matching without conditional independence assumption—with an application to the gender wage gap in the United Kingdom. Econ. J. 10(2), 359–407 (2007)
Galab, S., Reddy, G. M., Antony, P., McCoy, A., Ravi, C., Raju, D. S., Mayuri, K., Reddy, P. P.: Young Lives Preliminary Country Report: India. Young Lives, Oxford (2003). http://www.younglives.org.uk/files/country-reports/country-report-1-india-2003. Retrieved 12 Oct 2013
Hainmueller, J.: Entropy balancing for causal effects: a multivariate reweighting method to produce balanced samples in observational studies. Polit. Anal. 20(1), 25–46 (2012)
Hainmueller, J., Xu, Y.: Ebalance: a Stata package for entropy balancing. J. Stat. Softw. 54(7) (2013). http://www.jstatsoft.org/v54/i07/paper. Retrieved 3 Mar 2014
Heckman, J.J., Ichimura, H., Todd, P.: Matching as an econometric evaluation estimator. Rev. Econ. Stud. 65(2), 261–294 (1998)
Ho, D.E., Imai, K., King, G., Stuart, E.A.: Matching as nonparametric preprocessing for reducing model dependence in parametric causal inference. Polit. Anal. 15(3), 199–236 (2007)
Isaksson, A., Forsman, G.: A comparison between using the web and using the telephone to survey political opinions. In: Proceedings of the section on survey research methods, American Statistical Association, Alexandria (2003)
Kalton, G.: Models in the practice of survey sampling (revisited). J. Off. Stat. 18(2), 129–154 (2002)
Kumra, N.: An assessment of the young lives sampling approach in Andhra Pradesh, India. Young lives technical note 2. Young lives, Oxford (2008). http://www.younglives.org.uk/files/technical-notes/an-assessment-of-the-young-lives-sampling-approach-in-andhra-pradesh-india. Retrieved 11 Mar 2014
Lee, S.: Propensity score adjustment as a weighting scheme for volunteer panel web surveys. J. Off. Stat. 22(2), 329–349 (2006)
MSPI: Annexure II—population projection. National Sample Survey Organisation, New Delhi (2006)
MSPI: How to use unit level data. Ministry of Statistics and Programme Implementation, New Delhi (2008)
NSSO: Note on estimation procedure of NSS 61st round. Government of India, New Delhi (2004)
Rivers, D.: Sampling for web surveys. Joint statistical meetings, Salt Lake City (2012). http://www.laits.utexas.edu/txp_media/html/poll/files/Rivers_matching.pdf. Retrieved 1 Dec 2012
Rosenbaum, P.R., Rubin, D.B.: The central role of the propensity score in observational studies for causal effects. Biometrika 70(1), 41–55 (1983)
Schonlau, M., Van Soest, A., Kapteyn, A., Couper, M.: Selection bias in web surveys and the use of propensity scores. Sociol. Methods Res. 37(3), 291–318 (2009)
Sekhon, J.S.: Opiates for the matches: matching methods for causal inference. Annu. Rev. Polit. Sci. 12, 487–508 (2009)
Steinmetz, S.M., Tijdens, K.: Can weighting improve the representativeness of volunteer online panels? Insights from the German wage indicator data. Concepts and Methods 5(1), 7–11 (2009). http://www.concepts-methods.org/newsletters/20091215_55_C&M_Newsletter_2009_2.pdf. Retrieved 9 Dec 2012
Steinmetz, S.M., Bianchi, A., Tijdens, K.G., Biffignandi, S.: Improving web survey quality—potentials and constraints of propensity score weighting. In: Callegaro, M., Baker, R., Bethlehem, J., Göritz, A., Krosnick, J., Lavrakas, P. (eds.) Online panel research: a data quality perspective. Wiley, New York (2014)
Stuart, E., Cole, S.R., Bradshaw, C.P., Leaf, P.J.: The use of propensity scores to assess the generalizability of results from randomized trials. J. R. Stat. Soc. 174(2), 369–386 (2011)
UN: Household sample surveys in developing and transition countries. New York (2006). http://unstats.un.org/unsd/hhsurveys/. Retrieved 2 Mar 2014
Vavreck, L., Rivers, D.: The 2006 cooperative congressional election study. J. Elect. 18(4), 355–366 (2008)
Wilson, I., Huttly, S.R., Fenn, B.: A case study of sample design for longitudinal research: Young Lives. Int. J. Soc. Res. Methodol. 9(5), 351–365 (2006)
Yoshimura, O.: Adjusting responses in a non-probability web panel survey by the propensity score weighting. In: Proceedings of the American Statistical Association (2004). http://www.amstat.org/sections/srms/Proceedings/y2004f.html. Retrieved 11 Mar 2014
Zhao, Z.: Sensitivity of propensity score methods to the specifications. IZA Discussion Paper No. 1873. Forschungsinstitut zur Zukunft der Arbeit (Institute for the Study of Labour), Bonn (2005). ftp.iza.org/dp1873.pdf


Acknowledgments

We are grateful to Natalie Shlomo for her detailed comments on an earlier draft. This paper is an outcome of research funded by the UK Economic and Social Research Council (ESRC). Grant number: ES/G015473/1.

Author information

Authors and Affiliations

Department of Global Health & Development, London School of Hygiene & Tropical Medicine, London, WC1E 7HT, UK
Samantha K. Watson
Institute for Social Research, University of Manchester, Manchester, M13 9PL, UK
Samantha K. Watson & Mark Elliot


Corresponding author

Correspondence to Samantha K. Watson.

Appendix: A condensed version of the theoretical framework for entropy balancing

Under ebalance, weights are selected to minimize the entropy distance metric:

$$ \min_{w_{i}} H(w) = \sum_{\{i \mid D = 0\}} w_{i} \log (w_{i} /q_{i}) $$

(1)

where $ w_{i} $ is the weight selected for each non-random sample unit.

$ D_{i} \in \{1,0\} $ is a binary indicator coded 1 if unit i is drawn from the reference sample and 0 if it is drawn from the non-random sample. $ q_{i} = 1/n_{0} $ is a base weight, with $ n_{0} $ the number of units in the non-random sample.
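
As a minimal sketch of what Eq. 1 measures, the entropy distance can be evaluated for a candidate weight vector against uniform base weights. The function name `entropy_distance` and the four-unit example are illustrative assumptions, not taken from the paper or from the ebalance package.

```python
import math


def entropy_distance(w, q):
    """H(w) = sum_i w_i * log(w_i / q_i), taking 0 * log(0) as 0."""
    return sum(wi * math.log(wi / qi) for wi, qi in zip(w, q) if wi > 0)


n0 = 4                      # hypothetical number of non-random sample units
q = [1.0 / n0] * n0         # uniform base weights q_i = 1/n0

uniform = entropy_distance(q, q)                     # weights equal to base weights
skewed = entropy_distance([0.7, 0.1, 0.1, 0.1], q)   # weights far from uniform
```

The distance is zero when the weights equal the base weights and grows as the weights concentrate on fewer units, which is why minimising it retains as much information from the sample as the constraints allow.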

The selection of weights is subject to the balance constraints defined in Eq. 2.1, the normalising constraint defined in Eq. 2.2, and the non-negativity constraints defined in Eq. 2.3:

$$ \sum_{\{i \mid D = 0\}} w_{i} c_{ri}(X_{i}) = m_{r} \quad \text{with}\; r \in 1, \ldots, R $$

(2.1)

$$ \sum\limits_{{\left\{ {i\left| {D = 0} \right.} \right\}}} {w_{i} = 1} $$

(2.2)

$$ w_{i} \ge 0 \quad {\text{for}}\;{\text{all}}\;i\;{\text{such}}\;{\text{that}}\;D = 0 $$

(2.3)

X is a matrix that contains the data of J exogenous pre-treatment covariates, with $ X_{ij} $ denoting the value of the j-th covariate characteristic for unit i.

$ c_{ri}(X_{i}) = m_{r} $ describes a set of R balance constraints imposed on the covariate moments of the reweighted non-random sample group.
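
The three constraint sets in Eqs. 2.1 to 2.3 can be checked mechanically for any candidate weight vector. The sketch below assumes a single first-moment constraint (a target mean) on one made-up covariate; the function name `satisfies_constraints`, the covariate values, and the target are hypothetical.

```python
def satisfies_constraints(w, C, M, tol=1e-9):
    """Check Eqs. 2.1-2.3 for weights w, constraint rows C and targets M."""
    # Eq. 2.1: each weighted moment sum_i w_i * c_ri matches its target m_r.
    balance = all(
        abs(sum(wi * ci for wi, ci in zip(w, row)) - m) <= tol
        for row, m in zip(C, M)
    )
    # Eq. 2.2: the weights sum to one.
    normalised = abs(sum(w) - 1.0) <= tol
    # Eq. 2.3: no weight is negative.
    non_negative = all(wi >= 0.0 for wi in w)
    return balance and normalised and non_negative


C = [[20.0, 30.0, 40.0, 50.0]]   # hypothetical covariate values, one row per moment
M = [38.0]                       # hypothetical target mean in the reference sample

uniform_ok = satisfies_constraints([0.25, 0.25, 0.25, 0.25], C, M)  # weighted mean is 35
tilted_ok = satisfies_constraints([0.1, 0.2, 0.5, 0.2], C, M)       # weighted mean is 38
```

Many weight vectors can satisfy the constraints; entropy balancing selects, among them, the one closest to the base weights in the sense of Eq. 1.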

The primal problem is high-dimensional, since one weight is assigned to each control (non-random sample) unit. The ebalance approach accommodates this by computing the weights that solve the entropy balancing scheme from a dual problem that is unconstrained and reduces to a system of non-linear equations in R Lagrange multipliers. The dual problem is given by:

$$ \min_{Z} L^{d} = \log \left( Q^{\prime}\exp ( - C^{\prime}Z) \right) + M^{\prime}Z $$

(3)

where $ Z = \{\lambda_{1}, \ldots, \lambda_{R}\}^{\prime} $ is a vector of Lagrange multipliers for the balance constraints and $ Q $ is the vector of base weights $ q_{i} $. The balance constraints are rewritten in matrix form as $ CW = M $, with the $ (R \times n_{0}) $ constraint matrix $ C = [c_{1}(X_{i}), \ldots, c_{R}(X_{i})]^{\prime} $ and the moment vector $ M = [m_{1}, \ldots, m_{R}]^{\prime} $. The vector $ Z^{*} $ that solves the dual problem also solves the primal problem. The solution weights are recovered using:

$$ W^{*} = \frac{Q \cdot \exp ( - C^{\prime}Z^{*})}{Q^{\prime}\exp ( - C^{\prime}Z^{*})} $$

(4)
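
To make the mapping from multipliers to weights in Eq. 4 concrete, the following sketch evaluates it for a given multiplier vector. It is an illustration under stated assumptions, not the ebalance implementation: the function name `weights_from_multipliers`, the two made-up constraint rows, and the multiplier values are hypothetical, and the maximum exponent is subtracted only for numerical stability (it cancels in the ratio).

```python
import math


def weights_from_multipliers(Q, C, Z):
    """Eq. 4: W = Q * exp(-C'Z) / (Q' exp(-C'Z)), element-wise in the numerator."""
    n = len(Q)
    exponents = [-sum(C[r][i] * Z[r] for r in range(len(Z))) for i in range(n)]
    shift = max(exponents)                       # cancels between numerator and denominator
    raw = [Q[i] * math.exp(exponents[i] - shift) for i in range(n)]
    total = sum(raw)
    return [x / total for x in raw]


Q = [0.25, 0.25, 0.25, 0.25]          # uniform base weights, n0 = 4
C = [[20.0, 30.0, 40.0, 50.0],        # hypothetical continuous covariate
     [1.0, 0.0, 1.0, 0.0]]            # hypothetical binary indicator

at_zero = weights_from_multipliers(Q, C, [0.0, 0.0])     # no tilting: base weights
tilted = weights_from_multipliers(Q, C, [-0.04, -0.8])   # arbitrary non-zero multipliers
```

Whatever the multipliers, the resulting weights are positive and sum to one, so Eqs. 2.2 and 2.3 hold by construction and only the R balance constraints remain to be solved for.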

An iterative Levenberg–Marquardt scheme exploits second order information to solve the dual problem:

$$ Z^{\text{new}} = Z^{\text{old}} - I\,\nabla^{2}_{Z}(L^{d})^{-1}\,\nabla_{Z}(L^{d}) $$

(5)

Here, I is a scalar denoting the step length. The optimal step length (either the full Newton step or I) is selected for each iteration (Hainmueller and Xu 2013).
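
The iteration in Eq. 5 can be sketched as a damped Newton method on the dual. This is a simplified illustration, not the routine used by ebalance: step halving stands in for the package's step-length selection, the linear solve is a plain Gaussian elimination, and all function names and data are hypothetical. The gradient of the dual is $ M - CW $ and its Hessian is $ C(\mathrm{diag}(W) - WW^{\prime})C^{\prime} $, with W given by Eq. 4.

```python
import math


def _exponents(C, Z, n):
    return [-sum(C[r][i] * Z[r] for r in range(len(Z))) for i in range(n)]


def weights(Q, C, Z):
    """Eq. 4, with the largest exponent subtracted for stability."""
    e = _exponents(C, Z, len(Q))
    s = max(e)
    raw = [q * math.exp(x - s) for q, x in zip(Q, e)]
    total = sum(raw)
    return [x / total for x in raw]


def dual_loss(Q, C, M, Z):
    """Eq. 3: log(Q' exp(-C'Z)) + M'Z."""
    e = _exponents(C, Z, len(Q))
    s = max(e)
    log_sum = s + math.log(sum(q * math.exp(x - s) for q, x in zip(Q, e)))
    return log_sum + sum(m * z for m, z in zip(M, Z))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[k]] for k, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= f * aug[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][k] * x[k] for k in range(r + 1, n))) / aug[r][r]
    return x


def entropy_balance(Q, C, M, max_iter=20, tol=1e-8):
    R, n = len(C), len(Q)
    Z = [0.0] * R
    for _ in range(max_iter):
        W = weights(Q, C, Z)
        fitted = [sum(C[r][i] * W[i] for i in range(n)) for r in range(R)]
        grad = [M[r] - fitted[r] for r in range(R)]
        if max(abs(g) for g in grad) < tol:
            break
        hess = [[sum(W[i] * C[r][i] * C[s][i] for i in range(n)) - fitted[r] * fitted[s]
                 for s in range(R)] for r in range(R)]
        step = solve(hess, grad)
        current, length = dual_loss(Q, C, M, Z), 1.0
        while True:  # try the full Newton step first, then halve until no increase
            trial = [Z[r] - length * step[r] for r in range(R)]
            if dual_loss(Q, C, M, trial) <= current + 1e-12 or length < 1e-8:
                break
            length /= 2.0
        Z = trial
    return weights(Q, C, Z), Z


Q = [0.25, 0.25, 0.25, 0.25]
C = [[20.0, 30.0, 40.0, 50.0], [1.0, 0.0, 1.0, 0.0]]   # hypothetical covariates
M = [38.0, 0.6]                                        # hypothetical target moments
W, Z = entropy_balance(Q, C, M)
```

The iteration cap of 20 mirrors the default mentioned in the notes; the convergence tolerance used here is an arbitrary choice for the sketch and is applied to the largest absolute moment deviation.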

About this article

Cite this article

Watson, S.K., Elliot, M. Entropy balancing: a maximum-entropy reweighting scheme to adjust for coverage error. Qual Quant 50, 1781–1797 (2016). https://doi.org/10.1007/s11135-015-0235-8

Issue Date: July 2016
