Entropy balancing: a maximum-entropy reweighting scheme to adjust for coverage error

Quality & Quantity

Abstract

This paper presents a newly available technique to adjust for bias in non-probabilistically selected samples. To date, applications of this innovative technique—termed entropy balancing—have been restricted to evaluation settings, where the goal is to reduce model dependence prior to the estimation of treatment effects. In a novel application, we demonstrate the technique’s utility in cases where the goal is to correct for sample bias originating in coverage error. The appeal of entropy balancing in this latter setting lies in its capacity to optimise the twin goals of improved balance in covariate distribution and maximum retention of information. Entropy balancing combines the opportunity to incorporate a large set of moment conditions in the calculation of weights, with the ability to directly implement exact balance. The technique thus builds upon the theoretical appeal of the more widely known and applied propensity score adjustment method, while addressing that method’s practical limitations. We demonstrate the utility of the entropy balancing technique empirically, through an example using the Young Lives Project survey data for rural Andhra Pradesh, South India. We conclude by summarising the potential of this procedure to contribute to robust survey-based research more widely.

Notes

  1. See Hainmueller (2012) for a comprehensive presentation of the theoretical framework.

  2. In cases where only marginal population probabilities are available (from summarised census data for example) the ebalance procedure allows for values to be manually specified to reweight the non-probability sample covariates in line with available known population targets.

  3. All analysis is conducted in Stata 13; Hainmueller’s “ebalance” suite of commands, which performs the entropy balancing procedure, can be installed in Stata in the usual manner, i.e. “ssc install ebalance, all replace”.

  4. The survey was sponsored by the UK Department for International Development (DFID), and is led by the Oxford Department of International Development at the University of Oxford, in collaboration with academic institutions in each of the four project countries.

  5. In the second round of data collection all individuals resident in a selected household were included in the survey.

  6. Andersson (1996) discusses the general method of sentinel site sampling in some detail.

  7. At the all India level a total of 124,680 households and 602,833 individuals took part in the survey for schedule 10 of the 61st round of the NSS.

  8. Household class is calculated on the basis of household landholding and dominant labour relations.

  9. The default iteration number is 20, the default tolerance level 0.015, and both can be increased if convergence fails.

References

  • Abadie, A., Imbens, G.W.: Bias-corrected matching estimators for average treatment effects. J. Bus. Econ. Stat. 29(1), 1–11 (2011)

  • Andersson, N.: Evidence-based planning: the philosophy and methods of sentinel community surveillance. Economic Development Institute of the World Bank, Washington (1996)

  • Duffy, B., Terhanian, G., Bremer, J., Smith, K.: Comparing data from online and face-to-face surveys. Int. J. Market Res. 47(6), 615–639 (2005)

  • Frölich, M.: Propensity score matching without conditional independence assumption—with an application to the gender wage gap in the United Kingdom. Econ. J. 10(2), 359–407 (2007)

  • Galab, S., Reddy, G. M., Antony, P., McCoy, A., Ravi, C., Raju, D. S., Mayuri, K., Reddy, P. P.: Young Lives Preliminary Country Report: India. Young Lives, Oxford (2003). http://www.younglives.org.uk/files/country-reports/country-report-1-india-2003. Retrieved 12 Oct 2013

  • Hainmueller, J.: Entropy balancing for causal effects: a multivariate reweighting method to produce balanced samples in observational studies. Polit. Anal. 20(1), 25–46 (2012)

  • Hainmueller, J., Xu, Y.: Ebalance: a stata package for entropy balancing. J. Stat. Softw. 54(7), (2013). http://www.jstatsoft.org/v54/i07/paper. Retrieved 3 Mar 2014

  • Heckman, J.J., Ichimura, H., Todd, P.: Matching as an econometric evaluation estimator. Rev. Econ. Stud. 65(2), 261–294 (1998)

  • Ho, D.E., Imai, K., King, G., Stuart, E.A.: Matching as nonparametric preprocessing for reducing model dependence in parametric causal inference. Polit. Anal. 15(3), 199–236 (2007)

  • Isaksson, A., Forsman, G.: A comparison between using the web and using the telephone to survey political opinions. In: Proceedings of the section on survey research methods, American Statistical Association, Alexandria (2003)

  • Kalton, G.: Models in the practice of survey sampling (revisited). J. Off. Stat. 18(2), 129–154 (2002)

  • Kumra, N.: An assessment of the young lives sampling approach in Andhra Pradesh, India. Young lives technical note 2. Young lives, Oxford (2008). http://www.younglives.org.uk/files/technical-notes/an-assessment-of-the-young-lives-sampling-approach-in-andhra-pradesh-india. Retrieved 11 Mar 2014

  • Lee, S.: Propensity score adjustment as a weighting scheme for volunteer panel web surveys. J. Off. Stat. 22(2), 329–349 (2006)

  • MSPI: Annexure II—population projection. National Sample Survey Organisation, New Delhi (2006)

  • MSPI: How to use unit level data. Ministry of Statistics and Programme Implementation, New Delhi (2008)

  • NSSO: Note on estimation procedure of NSS 61st round. Government of India, New Delhi (2004)

  • Rivers, D.: Sampling for web surveys. Joint statistical meetings, Salt Lake City (2012). http://www.laits.utexas.edu/txp_media/html/poll/files/Rivers_matching.pdf. Retrieved 1 Dec 2012

  • Rosenbaum, P.R., Rubin, D.B.: The central role of the propensity score in observational studies for causal effects. Biometrika 70(1), 41–55 (1983)

  • Schonlau, M., Van Soest, A., Kapteyn, A., Couper, M.: Selection bias in web surveys and the use of propensity scores. Sociol. Methods Res. 37(3), 291–318 (2009)

  • Sekhon, J.S.: Opiates for the matches: matching methods for causal inference. Annu. Rev. Polit. Sci. 12, 487–508 (2009)

  • Steinmetz, S.M., Tijdens, K.: Can weighting improve the representativeness of volunteer online panels? Insights from the German wage indicator data. Concepts and Methods 5(1), 7–11 (2009). http://www.concepts-methods.org/newsletters/20091215_55_C&M_Newsletter_2009_2.pdf. Retrieved 9 Dec 2012

  • Steinmetz, S.M., Bianchi, A., Tijdens, K.G., Biffignandi, S.: Improving web survey quality—potentials and constraints of propensity score weighting. In: Callegaro, M., Baker, R., Bethlehem, J., Göritz, A., Krosnick, J., Lavrakas, P. (eds.) Online panel research: a data quality perspective. Wiley, New York (2014)

  • Stuart, E., Cole, S.R., Bradshaw, C.P., Leaf, P.J.: The use of propensity scores to assess the generalizability of results from randomized trials. J. R. Stat. Soc. Ser. A 174(2), 369–386 (2011)

  • UN: Household sample surveys in developing and transition countries. New York (2006). http://unstats.un.org/unsd/hhsurveys/. Retrieved 2 Mar 2014

  • Vavreck, L., Rivers, D.: The 2006 cooperative congressional election study. J. Elect. 18(4), 355–366 (2008)

  • Wilson, I., Huttly, S.R., Fenn, B.: A case study of sample design for longitudinal research: Young Lives. Int. J. Soc. Res. Methodol. 9(5), 351–365 (2006)

  • Yoshimura, O.: Adjusting responses in a non-probability web panel survey by the propensity score weighting. In: Proceedings of the American Statistical Association (2004). http://www.amstat.org/sections/srms/Proceedings/y2004f.html. Retrieved 11 Mar 2014

  • Zhao, Z.: Sensitivity of propensity score methods to the specifications. IZA Discussion Paper No. 1873. Forschungsinstitut zur Zukunft der Arbeit (Institute for the Study of Labour), Bonn (2005). Available at ftp.iza.org/dp1873.pdf

Acknowledgments

We are grateful to Natalie Shlomo for her detailed comments on an earlier draft. This paper is an outcome of research funded by the UK Economic and Social Research Council (ESRC). Grant number: ES/G015473/1.

Author information

Corresponding author

Correspondence to Samantha K. Watson.

Appendix: A condensed version of the theoretical framework for entropy balancing

Under ebalance, weights are selected to minimize the entropy distance metric:

$$ \min_{w_i} H(w) = \sum_{\{i \mid D = 0\}} w_i \log (w_i / q_i) $$
(1)

where \( w_i \) is the weight selected for each non-random sample unit.

\( D_i \in \{1, 0\} \) is a binary indicator coded 1 if unit i is drawn from the reference sample and 0 if it is drawn from the non-random sample. \( q_i = 1/n_0 \) is a base weight, where \( n_0 \) is the number of units in the non-random sample.
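As a small numerical illustration of Eq. 1 (a sketch only, not part of the original Stata analysis), the entropy distance can be evaluated directly for a given set of weights. The function name `entropy_distance`, the sample size of four, and the uneven weight vector are hypothetical choices made for this example.

```python
import math

def entropy_distance(w, q):
    """Entropy distance H(w) = sum_i w_i * log(w_i / q_i), as in Eq. 1.

    Terms with w_i == 0 are skipped: they contribute 0, the limit of
    w * log(w) as w -> 0.
    """
    return sum(wi * math.log(wi / qi) for wi, qi in zip(w, q) if wi > 0)

n0 = 4                              # hypothetical non-random sample size
q = [1.0 / n0] * n0                 # base weights q_i = 1/n0

# Weights equal to the base weights: no information is discarded.
h_base = entropy_distance(q, q)

# Uneven (illustrative) weights that still sum to one.
h_skew = entropy_distance([0.7, 0.1, 0.1, 0.1], q)

print(h_base, h_skew)
```

The distance is zero when the weights equal the base weights and grows as the weights move away from them, which is why minimising H(w) retains as much information from the sample as the constraints allow.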

The selection of weights is subject to the balance constraints defined in Eq. 2.1, the normalising constraint defined in Eq. 2.2, and the non-negativity constraints defined in Eq. 2.3:

$$ \sum_{\{i \mid D = 0\}} w_i \, c_{ri}(X_i) = m_r \quad \text{with}\; r \in 1, \ldots, R $$
(2.1)
$$ \sum_{\{i \mid D = 0\}} w_i = 1 $$
(2.2)
$$ w_i \ge 0 \quad \text{for all}\; i \;\text{such that}\; D = 0 $$
(2.3)

X is a matrix that contains the data of J exogenous pre-treatment covariates, with \( X_{ij} \) denoting the value of the j-th covariate for unit i.

\( c_{ri}(X_i) = m_r \) describes a set of R balance constraints imposed on the covariate moments of the reweighted non-random sample group.
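The constraints in Eqs. 2.1 to 2.3 can be checked mechanically for any candidate weight vector. The following is a minimal sketch under stated assumptions: the helper names `constraint_matrix` and `constraints_hold` are hypothetical, the moment functions are taken to be the first and second moments of each covariate (one possible choice), and the three-unit data set and target values are invented for illustration.

```python
def constraint_matrix(X):
    """Build an (R x n0) list-of-lists C from the covariate data X.

    X holds n0 rows of J covariate values. Each covariate contributes two
    rows to C: its values (first moment) and its squares (second moment).
    This choice of moment functions c_r is an illustrative assumption.
    """
    J = len(X[0])
    first = [[row[j] for row in X] for j in range(J)]
    second = [[row[j] ** 2 for row in X] for j in range(J)]
    return first + second

def constraints_hold(w, C, M, tol=1e-9):
    """Return True if weights w satisfy Eqs. 2.1, 2.2 and 2.3."""
    balance = all(
        abs(sum(wi * c for wi, c in zip(w, row)) - m) <= tol
        for row, m in zip(C, M)
    )                                            # Eq. 2.1
    normalised = abs(sum(w) - 1.0) <= tol        # Eq. 2.2
    non_negative = all(wi >= 0 for wi in w)      # Eq. 2.3
    return balance and normalised and non_negative

X = [[0.0], [1.0], [2.0]]     # hypothetical sample: one covariate, n0 = 3
C = constraint_matrix(X)      # rows: x and x**2
M = [1.0, 1.5]                # hypothetical target mean and second moment
w = [0.25, 0.5, 0.25]         # candidate weights

print(constraints_hold(w, C, M))
print(constraints_hold([1 / 3] * 3, C, M))
```

In this toy case the candidate weights reproduce both target moments, whereas uniform weights match the mean but not the second moment and therefore fail the balance constraint.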

The ebalance approach accommodates high dimensionality, assigning one weight to each control unit (here, each unit of the non-random sample). The weights that solve the entropy balancing scheme are computed from a dual problem that is unconstrained and reduces to a system of non-linear equations in R Lagrange multipliers. The dual problem is given by:

$$ \min_{Z} L^{d} = \log \left( Q' \exp ( - C' Z) \right) + M' Z $$
(3)

where \( Z = \{ \lambda_1, \ldots, \lambda_R \}' \) is a vector of Lagrange multipliers for the balance constraints, which are rewritten in matrix form as \( CW = M \) with the \( (R \times n_0) \) constraint matrix \( C = [c_1(X_i), \ldots, c_R(X_i)]' \) and the moment vector \( M = [m_1, \ldots, m_R]' \); \( W \) and \( Q \) denote the vectors of weights \( w_i \) and base weights \( q_i \). The vector \( Z^{*} \) that solves the dual problem also solves the primal problem. The solution weights are recovered using:

$$ W^{*} = \frac{Q \cdot \exp ( - C' Z^{*} )}{Q' \exp ( - C' Z^{*} )} $$
(4)
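To make Eqs. 3 and 4 concrete, the sketch below evaluates the dual objective and recovers weights from a given multiplier vector using plain Python lists. It is an illustration under stated assumptions, not the ebalance implementation: the function names are hypothetical, the single-constraint data are invented, and the gradient expression M - CW(Z) is obtained by differentiating Eq. 3 rather than quoted from the source.

```python
import math

def weights_from_multipliers(Z, C, Q):
    """Eq. 4: w_i = q_i * exp(-sum_r Z_r c_ri), divided by the sum over i."""
    raw = [
        Q[i] * math.exp(-sum(Z[r] * C[r][i] for r in range(len(Z))))
        for i in range(len(Q))
    ]
    total = sum(raw)                       # the scalar Q' exp(-C'Z)
    return [v / total for v in raw]

def dual_objective(Z, C, Q, M):
    """Eq. 3: log(Q' exp(-C'Z)) + M'Z."""
    total = sum(
        Q[i] * math.exp(-sum(Z[r] * C[r][i] for r in range(len(Z))))
        for i in range(len(Q))
    )
    return math.log(total) + sum(m * z for m, z in zip(M, Z))

def dual_gradient(Z, C, Q, M):
    """Gradient of Eq. 3 in Z, derived here as M - C W(Z)."""
    w = weights_from_multipliers(Z, C, Q)
    return [m - sum(wi * c for wi, c in zip(w, row)) for row, m in zip(C, M)]

C = [[0.0, 1.0, 2.0]]        # hypothetical: one constraint on a covariate mean
Q = [1 / 3, 1 / 3, 1 / 3]    # base weights q_i = 1/n0
M = [1.0]                    # target equal to the unweighted sample mean

w0 = weights_from_multipliers([0.0], C, Q)   # multipliers of zero
g0 = dual_gradient([0.0], C, Q, M)
print(w0, g0)
```

With all multipliers at zero the recovered weights equal the base weights, and because the target here already matches the unweighted mean, the gradient vanishes. A zero gradient is exactly the balance condition CW = M, which is why solving the unconstrained dual also satisfies the primal constraints.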

An iterative Levenberg–Marquardt scheme exploits second order information to solve the dual problem:

$$ Z^{\text{new}} = Z^{\text{old}} - I \left[ \nabla_{Z}^{2} L^{d} \right]^{-1} \nabla_{Z} L^{d} $$
(5)

Here, \( I \) is a scalar denoting the step length. The optimal step length (either the full Newton step or \( I \)) is selected at each iteration (Hainmueller and Xu 2013).
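The iteration in Eq. 5 can be sketched in a few lines of dependency-free Python. This is a simplified stand-in, not the ebalance code: the step length is chosen by halving the full Newton step until the dual objective decreases, which is an assumed line-search rule; the Hessian C diag(W) C' - (CW)(CW)' is derived from Eq. 3 rather than quoted; the defaults `max_iter=20` and `tol=0.015` echo note 9, but treating the tolerance as the largest absolute deviation from the target moments is an assumption. All function names and the four-unit example are hypothetical.

```python
import math

def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

def tilt(Z, C, Q):
    """Unnormalised terms q_i * exp(-sum_r Z_r c_ri)."""
    return [q * math.exp(-sum(z * row[i] for z, row in zip(Z, C)))
            for i, q in enumerate(Q)]

def weights(Z, C, Q):
    """Eq. 4: normalised weights for multipliers Z."""
    raw = tilt(Z, C, Q)
    total = sum(raw)
    return [v / total for v in raw]

def dual(Z, C, Q, M):
    """Eq. 3: the dual objective."""
    return math.log(sum(tilt(Z, C, Q))) + sum(m * z for m, z in zip(M, Z))

def entropy_balance(C, M, Q, max_iter=20, tol=0.015):
    """Damped Newton iteration on the dual (Eq. 5), starting from Z = 0."""
    R = len(C)
    Z = [0.0] * R
    for _ in range(max_iter):
        w = weights(Z, C, Q)
        moments = [sum(wi * c for wi, c in zip(w, row)) for row in C]
        grad = [m - mu for m, mu in zip(M, moments)]      # M - C W
        if max(abs(g) for g in grad) < tol:               # assumed criterion
            break
        hess = [[sum(wi * a * b for wi, a, b in zip(w, C[r], C[s]))
                 - moments[r] * moments[s] for s in range(R)]
                for r in range(R)]
        newton = solve_linear(hess, grad)                 # Hessian^-1 * grad
        step, current = 1.0, dual(Z, C, Q, M)
        trial = [z - step * d for z, d in zip(Z, newton)]
        # Halve the step until the dual objective decreases.
        while dual(trial, C, Q, M) >= current and step > 1e-8:
            step /= 2.0
            trial = [z - step * d for z, d in zip(Z, newton)]
        Z = trial
    return Z, weights(Z, C, Q)

C = [[0.0, 1.0, 2.0, 3.0]]   # hypothetical covariate values, n0 = 4
Q = [0.25] * 4               # base weights
M = [1.0]                    # hypothetical target mean (sample mean is 1.5)
Z_star, w_star = entropy_balance(C, M, Q, tol=1e-6)
print(Z_star, w_star)
```

Because the target mean lies below the sample mean, the resulting weights decline with the covariate value while remaining strictly positive and summing to one. The example does not guard against infeasible targets or collinear constraints, either of which would make the Hessian singular.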

Cite this article

Watson, S.K., Elliot, M. Entropy balancing: a maximum-entropy reweighting scheme to adjust for coverage error. Qual Quant 50, 1781–1797 (2016). https://doi.org/10.1007/s11135-015-0235-8
