Sampling Weights for Analyses of Couple Data: Example of the Demographic and Health Surveys

Demography

Abstract

In some surveys, women and men are interviewed separately in selected households, allowing matching of partner information and analyses of couples. Although individual sampling weights exist for men and women, sampling weights specific for couples are rarely derived. We present a method of estimating appropriate weights for couples that extends methods currently used in the Demographic and Health Surveys (DHS) for individual weights. To see how results vary, we analyze 1912 estimates (means; proportions; linear regression; and simple and multinomial logistic regression coefficients, and their standard errors) with couple data in each of 11 DHS surveys in which the couple weight could be derived. We used two measures of bias: absolute percentage difference from the value estimated with the couple weight and ratio of the absolute difference to the standard error using the couple weight. The latter shows greater bias for means and proportions, whereas the former and a combination of both measures show greater bias for regression coefficients. Comparing results using couple weights with published results using women’s weights for a logistic regression of couple contraceptive use in Turkey, we found that 6 of 27 coefficients had a bias above 5 %. On the other hand, a simulation of varying response rates (27 simulations) showed that median percentage bias in a logistic regression was less than 3 % for 17 of 18 coefficients. Two proxy couple weights that can be calculated in all DHS surveys perform considerably better than either male or female weights. We recommend that a couple weight be calculated and made available with couple data from such surveys.

Notes

  1. The husband’s weight is preferred rather than the wife’s weight because response rates are usually lower and more variable for men than for women, so the inclusion of the couple in the sample depends more on completing an interview with the husband than with the wife.

  2. The probability that an eligible couple resides in the household is not needed.

  3. For DHS surveys, probabilities \( p^1 \) to \( p^3 \) and \( r^h \) are already incorporated in the household weight, which ICF International provides with the household survey data. Therefore, using DHS data, it is necessary only to estimate \( r^c \) and then multiply it by the inverse of the household weight. To form the couple weight, this result is inverted and normalized to sum to the sample size for couples with completed information. Details of the algebra are shown in the appendix.

  4. In the DHS, the sampling domains in a survey usually correspond to regions of the country.

  5. The use of sampling weights in regression is open to debate. DuMouchel and Duncan (1983) and Deaton (1998) showed that coefficients estimated from both sample-weighted and sample-unweighted analyses are not consistent estimators in the general case (e.g., when coefficients vary across sampling strata). However, Deaton (1998) argued that for regressions that are meant to be descriptive, the weighted estimates are preferred. Also, for the results to retain representativeness at the national level, weights are essential. Winship and Radbill (1994) showed that if the weights are solely a function of independent variables, then unweighted analysis is more efficient. This condition can be met by inclusion of indicator variables for sampling strata in the model. Of course, the assumption that coefficients for other covariates do not vary across strata is also implicit.

  6. The value of 5 % is only somewhat arbitrary. Of the more than 100 DHS surveys available in which couples can be matched, the largest sample was in Nigeria (2008), with 8,731 couples; the one exception is the India National Family Health Survey of 2005/2006, which had 39,000 couples. Using the average design effect of 3.3 for the Nigeria survey yields an effective sample size of 8,731 / 3.3 = 2,646. One-half of the width of the 95 % confidence interval for a proportion in a sample of this size is given by \( 1.96\times \sqrt{p\times \left(1-p\right)/2,646} \). Choosing p = .5 maximizes the estimated variance and gives a value of 0.019, or about 4 % (0.019 / 0.50). For surveys with smaller effective sample sizes, standard errors would be greater than this. Thus, biases of less than 4 % or 5 % would nearly always be within sampling error for the usual DHS surveys. Other surveys with couples usually have sample sizes of the same order of magnitude or smaller, so the same calculation probably applies.

  7. The true bias of these survey estimates with any given weight is actually unknown because that would necessitate population-level data for comparison, which are not available in most developing countries. Even the couple weight will be correct only if all of the following are true:

    1. The sampling was accurate: (a) the sampling frame was up-to-date; (b) the sampling probabilities were correctly calculated; and (c) the sample implementation was correctly done.

    2. All eligible couples in selected households were correctly identified.

    3. The characteristics of nonresponse households and nonresponse couples have the same distributions as those of sampled households and couples with completed questionnaires.

    Assumptions 1a, 1c, and 2 can be checked by completing these steps again with independent interview teams, although it is still only reliability that is measured rather than validity. Assumption 3 is likely to be violated, although judicious formation of weighting cells can reduce the bias. However, given these assumptions, the couple weights are the correct weights for analyses of couple data rather than weights for all women or all men. For conciseness of language, in the exposition of results, we will treat the estimates using the couple weights as unbiased and often refer simply to bias in other estimates compared with the estimated value using the couple weight.

  8. Kulczycki (2008:131) noted that data were “weighted by the couple weight provided by DHS,” but this turned out to be wrong; the author had used female weights (personal communication, Andrzej Kulczycki, 2011). Also, because the DHS public-use couple data file was updated between the time of the published analyses and the present analyses (the number of couples was reduced from 1,971 to 1,906) and because there were several inconsistencies in coding in the article (e.g., his table 5 contains no variable to identify 399 couples with both partners younger than 30 years of age), we could not replicate exactly those results. However, our results with women’s weights are close.

  9. If response rates were to reach 100 %, then all the weights would be the product of some constant and the household weight. That is, separate weights for women, men, and couples would not be needed; only normalization to equal the sample size of the women, men, or couples would be needed.

  10. The simulation for the scenario with the original female and male response rates matched with the original analyses to three significant digits.

  11. For reasons of confidentiality, DHS does not retain the household sampling probabilities. However, in Burkina Faso, the sampling was done such that the probabilities were identical for all households in a stratum, and with information from the survey report, they could be reconstructed, albeit tediously.

  12. To explore the extent to which our estimates of bias depended on the scale of covariates used in the regressions, we reestimated the regressions that have number of children ever born as a covariate, this time using an ordinal recoded covariate (parity 0–2 = 0; 3–4 = 1; 5+ = 2). The results were nearly identical for absolute percentage difference in standard errors, the ratio of standard errors, and the ratio of MSE. However, the absolute percentage differences in estimates (from the result with the couple weight) were different (not shown). The median of these percentage differences was higher with the grouped variable for all weights except the female weight (an average of 16 % higher). Interestingly, the means of these percentage differences were consistently lower with the grouped variable. In summary, although the estimates of standard errors and MSEs were quite robust to the scale of covariates, the estimates of percentage bias alone varied somewhat according to the scale of the covariate. Because covariates in our analyses include both binary (urban/rural) and continuous (e.g., age) scales, the results summarized across surveys are probably close to what would be found with other variables and surveys. This also further strengthens the usefulness of the deviation measure.

  13. Of course, the values of the deviation measure depend on the sample size of couples. However, a regression of the proportion of the deviation measures above 0.08 by sample size across the 11 surveys does not produce a significant coefficient (p = .94).
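The arithmetic behind the 5 % threshold in note 6 can be checked with a short sketch. The function names are illustrative and do not come from the article; the inputs (8,731 couples, a design effect of 3.3, and p = .5) are the values quoted in the note.

```python
import math


def effective_sample_size(n: int, design_effect: float) -> float:
    """Sample size deflated by the survey design effect."""
    return n / design_effect


def ci_half_width(p: float, n_eff: float, z: float = 1.96) -> float:
    """Half-width of the normal-approximation 95% CI for a proportion."""
    return z * math.sqrt(p * (1.0 - p) / n_eff)


n_eff = effective_sample_size(8731, 3.3)   # Nigeria 2008 couples, average design effect
half_width = ci_half_width(0.5, n_eff)     # p = .5 maximizes the variance
relative = half_width / 0.5                # half-width as a share of the estimate

print(round(n_eff), round(half_width, 3), round(relative, 3))
# 2646 0.019 0.038
```

The relative half-width of about 0.038 corresponds to the "about 4 %" stated in the note.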

References

  • Allendorf, K. (2007). Couples’ reports of women’s autonomy and health-care use in Nepal. Studies in Family Planning, 38, 35–46.

  • Bankole, A. (1995). Desired fertility and fertility among the Yoruba of Nigeria: A study of couple preferences and subsequent fertility. Population Studies, 49, 317–328.

  • Bankole, A., & Singh, S. (1997). Couples’ fertility and contraceptive decision-making in developing countries: Hearing the man’s voice. International Family Planning Perspectives, 24, 15–24.

  • Becker, S. (1999). Measuring unmet need: Wives, husbands and/or couples. International Family Planning Perspectives, 25, 172–180.

  • Becker, S., & Costenbader, E. (2001). Husbands’ and wives’ reports of contraceptive use. Studies in Family Planning, 32, 111–129.

  • Becker, S., Hossain, M. B., & Thomson, E. (2006). Disagreement in spousal reports of current contraceptive use in sub-Saharan Africa. Journal of Biosocial Science, 38, 779–796.

  • Chandra Sekar, C., & Deming, W. E. (1949). On a method of estimating birth and death rates and the extent of registration. Journal of the American Statistical Association, 44, 101–115.

  • Chemaitelly, H., & Abu-Raddad, L. J. (2016). Characterizing HIV epidemiology in stable couples in Cambodia, the Dominican Republic, Haiti, and India. Epidemiology & Infection, 144, 90–96.

  • Chemaitelly, H., Cremin, I., Shelton, J., Hallett, T. B., & Abu-Raddad, L. J. (2012). Distinct HIV discordancy patterns by epidemic size in stable sexual partnerships in Sub-Saharan Africa. Sexually Transmitted Infections, 88, 51–57.

  • Chiao, C., Mishra, V., & Ksobiech, K. (2011). Spousal communication about HIV prevention in Kenya. Journal of Health Communication, 16, 1088–1105.

  • Deaton, A. (1998). The analysis of household surveys. Baltimore, MD: Johns Hopkins University Press.

  • DHS Program User Forum. (2015). Sampling and weighting [Webinar]. Washington, DC: U.S. Agency for International Development. Available at: http://userforum.dhsprogram.com/index.php?t=thread&frm_id=65&S=e46003ffddd267d2d25ebc06ad5d927d

  • DuMouchel, W. H., & Duncan, G. J. (1983). Using sample survey weights in multiple regression analyses of stratified samples. Journal of the American Statistical Association, 78, 535–543.

  • Eyawo, O., de Walque, D., Ford, N., Gakii, G., Lester, R. T., & Mills, E. J. (2010). HIV status in discordant couples in sub-Saharan Africa: A systematic review and meta-analysis. Lancet Infectious Diseases, 10, 770–777.

  • Ezeh, A. C., Seroussi, M., & Raggers, H. (1996). Men’s fertility, contraceptive use, and reproductive preferences (DHS Comparative Studies No. 18). Calverton, MD: Macro International.

  • Gipson, J., & Hindin, M. (2009). The effect of husbands’ and wives’ fertility preferences on the likelihood of a subsequent pregnancy, Bangladesh 1998–2003. Population Studies, 63, 135–146.

  • Gouskova, E., Heeringa, S. G., McGonagle, K., & Schoeni, R. F. (2008). Panel Study of Income Dynamics: Revised longitudinal weights, 1993–2005 (Technical Series Paper, No. 08-05). Ann Arbor: Institute for Social Research, University of Michigan.

  • Hertz, R. (1995). Separate but simultaneous interviewing of husbands and wives: Making sense of their stories. Qualitative Inquiry, 1, 429–451.

  • ICF International. (2012). Demographic and Health Survey methodology: Sampling and household listing manual. Calverton, MD: MEASURE DHS/ICF International.

  • ICF International. (2017). Demographic and Health Surveys: Countries [Map illustration]. Retrieved from http://dhsprogram.com/where-we-work/

  • Kreuter, F., Olson, K., Wagner, J., Yan, T., Ezzati-Rice, T. M., Casas-Cordero, C., . . . Raghunathan, T. E. (2010). Using proxy measures and other correlates of survey outcomes to adjust for non-response: Examples from multiple surveys. Journal of the Royal Statistical Society A, 173, 389–407.

  • Kulczycki, A. (2008). Husband-wife agreement, power relations and contraceptive use in Turkey. International Family Planning Perspectives, 34, 127–137.

  • Lasee, A., & Becker, S. (1997). Husband-wife communication on family planning and couple’s current contraceptive use in Kenya. International Family Planning Perspectives, 23, 15–20.

  • Little, R. (1993). Post-stratification: A modeler’s perspective. Journal of the American Statistical Association, 88, 1001–1012.

  • Little, R., & Rubin, D. (1987). Statistical analysis with missing data. New York, NY: Wiley.

  • McClintock, E. (2017). Occupational sex composition and gendered housework performance: Compensation or conventionality? Journal of Marriage and the Family, 79, 475–510.

  • Ngom, P. (1997). Men's unmet need for family planning: Implications for African fertility transitions. Studies in Family Planning, 28, 192–202.

  • Schoen, R., Astone, N. M., Kim, Y. J., Nathanson, C. A., & Fields, J. M. (1999). Do fertility intentions predict behavior? Journal of Marriage and the Family, 61, 790–799.

  • StataCorp. (2013). Stata statistical software: Release 13. College Station, TX: StataCorp LP.

  • StataCorp. (2015). Stata statistical software: Release 14. College Station, TX: StataCorp LP.

  • Story, W. T., & Burgard, S. A. (2012). Couples’ reports of household decision-making and the utilization of maternal health services in Bangladesh. Social Science & Medicine, 75, 2403–2411.

  • Taylor, M. F., Brice, J., Buck, N., & Prentice-Lane, E. (Eds.). (2010). British Household Panel Survey user manual. Volume A: Introduction, Technical Report and Appendices. Colchester, UK: University of Essex.

  • Thomson, D. R., Bah, A. B., Rubanzana, W. G., & Mutesa, L. (2015). Correlates of intimate partner violence against women during a time of rapid social transition in Rwanda: Analysis of the 2005 and 2010 Demographic and Health Surveys. BMC Women’s Health, 15, 96. https://doi.org/10.1186/s12905-015-0257-3

  • Thomson, E. (1990). Two into one: Structural models of couple behavior. In T. Draper & A. Marcos (Eds.), Family variables: Conceptualization, measurement and use (pp. 129–142). Newbury Park, CA: Sage Publications.

  • Thomson, E. (1997). Couple childbearing desires, intentions, and births. Demography, 34, 343–354.

  • Thomson, E., & Hoem, J. M. (1998). Couple childbearing plans and births in Sweden. Demography, 35, 315–322.

  • UNICEF. (2014). Multiple Indicator Cluster Surveys—Sample weights calculation template. New York, NY: UNICEF. Retrieved from http://mics.unicef.org/tools?round=mics5

  • Upadhyay, U. D., & Karasek, D. (2012). Women's empowerment and ideal family size: An examination of DHS empowerment measures in sub-Saharan Africa. International Perspectives on Sexual and Reproductive Health, 38, 78–89.

  • U.S. Census Bureau. (2006). Current Population Survey: Design and methodology (Technical Paper No. 66). Washington, DC: U.S. Bureau of Labor Statistics and U.S. Census Bureau.

  • U.S. Census Bureau. (2008). Appendix C: Computing the SIPP sampling weights. In SIPP users guide (pp. C-1–C-22). Washington, DC: U.S. Census Bureau.

  • Valliant, R., Dever, J. A., & Kreuter, F. (2013). Practical tools for designing and weighting survey samples. New York, NY: Springer.

  • Wilcox, W. B., & Dew, J. (2016). The social and cultural predictors of generosity in marriage: Gender egalitarianism, religiosity, and familism. Journal of Family Issues, 37, 97–118.

  • Winship, C., & Radbill, L. (1994). Sampling weights and regression analysis. Sociological Methods & Research, 23, 230–257.

Acknowledgments

We thank Tom Pullum, Ren Ruilen, and Mahmoud Elkasabi of ICF International for comments on an earlier draft of the manuscript; Bryan Sayer for helping with the original formulation of this research; Qingfeng Li, Chuck Rohde, and Saifuddin Ahmed for giving advice on the equations; and Scott Zeger and Larry Moulton for advice on the simulations. Also thanks go to Abishek Singh and Visseho Adjiwanou for trying out these methods already. We are grateful that funding for this research was provided by Grant R03HD068716 from the National Institute for Child Health and Development.

Author information

Corresponding author

Correspondence to Stan Becker.

Electronic supplementary material

ESM 1

(PDF 61 kb)

Appendices

Appendix: Derivation of Couple Weight and Relationship to Household Weight Available in DHS

Here we derive a couple weight for the general DHS sampling design of two-stage sampling within each stratum: clusters are sampled within strata, and households are then sampled within each selected cluster. Algebra to derive the couple weight from the normalized household weight provided with the DHS surveys is also given. Women’s, men’s, and couple weights differ from the household weight and from each other only because of the different response rates of these groups.

Notation

\( i \) = stratum identifier

\( j \) = cluster identifier

\( h \) = household identifier

\( l \) = identifier of person or couple in the household

\( {n}^h \) = total number of households with completed interviews in the sample

\( {n}^w \) = total number of women with completed interviews in the sample

\( {n}^m \) = total number of men with completed interviews in the sample

\( {n}^c \) = total number of couples with completed interviews in the sample

\( I \) = number of strata

\( {J}_i \) = number of clusters selected in stratum \( i \)

\( {H}_{ij} \) = number of households selected in cluster \( j \) in stratum \( i \)

\( z \) = identifier for woman (\( z=w \)), man (\( z=m \)), or couple (\( z=c \))

\( {L}_{ijh} \) = number of eligible \( z \) in household \( ijh \)

We assume clusters and households are selected from all strata:

$$ {p}_{ij}^1=\Pr \left\{j\mathrm{th}\ \mathrm{cluster}\ \mathrm{in}\ \mathrm{stratum}\ i\ \mathrm{is}\ \mathrm{selected}\right\} $$

$$ {p}_{ijh}^2=\Pr \left\{\mathrm{household}\ h\ \mathrm{in}\ \mathrm{cluster}\ j\ \mathrm{in}\ \mathrm{stratum}\ i\ \mathrm{is}\ \mathrm{selected}\mid \mathrm{cluster}\ j\ \mathrm{is}\ \mathrm{selected}\right\} $$

$$ {p}_{ijh}^3=\Pr \left\{\begin{array}{c}\mathrm{household}\ h\ \mathrm{in}\ \mathrm{cluster}\ j\ \mathrm{in}\ \mathrm{stratum}\ i\ \mathrm{is}\ \mathrm{selected}\ \mathrm{for}\ \mathrm{male}\ \mathrm{interview}\mid \\ {}\mathrm{household}\ h\ \mathrm{in}\ \mathrm{cluster}\ j\ \mathrm{in}\ \mathrm{stratum}\ i\ \mathrm{is}\ \mathrm{selected}\end{array}\right\}. $$

However, by design in DHS, \( {p}_{ijh}^3={p}^3 \), a fixed constant, given that the sampling design calls for a fixed proportion of households to be sampled for the male interview in addition to the household and female interviews.

$$ {r}_i^h=\Pr \left\{\mathrm{household}\ \mathrm{is}\ \mathrm{completed}\ \mathrm{in}\ \mathrm{stratum}\ i\mid \mathrm{household}\ \mathrm{is}\ \mathrm{selected}\ \mathrm{in}\ \mathrm{stratum}\ i\right\} $$

$$ {r}_i^z=\Pr \left\{\mathrm{interview}\ \mathrm{with}\ z\ \mathrm{is}\ \mathrm{completed}\ \mathrm{in}\ \mathrm{stratum}\ i\mid \mathrm{household}\ \mathrm{is}\ \mathrm{completed}\ \mathrm{in}\ \mathrm{stratum}\ i\right\}. $$

Whereas \( {p}_{ij}^1 \), \( {p}_{ijh}^2 \), and \( {p}^3 \) are probabilities from the sampling design, \( {r}_i^h \) and \( {r}_i^z \) are proportions estimated ex post facto. These latter proportions are typically computed at the stratum level in DHS, but the equations could be modified for other designs.

The household weight is derived as follows:

$$ {x}_{ijh}^h=\frac{1}{p_{ij}^1\times {p}_{ijh}^2\times {r}_i^h}. $$

However, for many applications, it is desirable to have a normalized household weight (that sums to the sample size of households). So let the normalized weight be \( {w}_{ijh}^h \). Let S denote the set of all households with completed questionnaires. Then define the indicator function:

\( 1\left[h\in S\right]=1 \) if selected household h (in stratum i and cluster j) had a completed household interview, and 0 otherwise.

Then also define

$$ {T}^h={\sum}_{i=1}^I{\sum}_{j=1}^{J_i}{\sum}_{h=1}^{H_{ij}}1\left[h\in S\right]\times {x}_{ij h}^h. $$

Thus, for completed households, \( {w}_{ijh}^h=\frac{n^h\times {x}_{ijh}^h}{T^h} \).

Similarly, to derive women’s weights, men’s weights or couple weights,

$$ {x}_{ij h}^z=\frac{1}{p_{ij}^1\times {p}_{ij h}^2\times {p}^3\times {r}_i^h\times {r}_i^z}. $$

Note that for women’s weights we take \( {p}^3=1.0 \).

Define B to be the set of all completed interviews of individuals/couples z, and let \( 1\left[l\in B\right]=1 \) if selected household h in stratum i and cluster j had a completed interview(s) of the lth individual/couple, and 0 otherwise; this is also 0 if there is no eligible woman/man/couple in the household.

Then let

$$ {T}^z={\sum}_{i=1}^I{\sum}_{j=1}^{J_i}{\sum}_{h=1}^{H_{ij}}{\sum}_{l=1}^{L_{ijh}}1\left[l\in B\right]\times {x}_{ijh}^z, $$

and

$$ {w}_{ijh}^z=\frac{x_{ijh}^z\ {n}^z}{T^z}. $$
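The raw and normalized weights defined above can be sketched in a few lines. This is only an illustration under stated assumptions: the `Unit` class, its field names, and the probabilities and response rates in the example are all hypothetical and are not taken from any DHS file.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    """One completed interview (woman, man, or couple); fields are hypothetical."""
    p1: float   # Pr(cluster selected)
    p2: float   # Pr(household selected | cluster selected)
    r_h: float  # household response rate in the unit's stratum
    r_z: float  # response rate for z in the unit's stratum


def raw_weight(u: Unit, p3: float = 1.0) -> float:
    """x^z = 1 / (p1 * p2 * p3 * r_h * r_z); p3 = 1.0 for women."""
    return 1.0 / (u.p1 * u.p2 * p3 * u.r_h * u.r_z)


def normalized_weights(units: list[Unit], p3: float = 1.0) -> list[float]:
    """w^z = x^z * n^z / T^z, so the weights sum to the number of completed units."""
    raw = [raw_weight(u, p3) for u in units]
    total = sum(raw)   # T^z: sum of raw weights over completed interviews
    n = len(units)     # n^z: number of completed interviews
    return [x * n / total for x in raw]


# Made-up design values for three completed couples in two strata.
units = [
    Unit(p1=0.10, p2=0.20, r_h=0.95, r_z=0.90),
    Unit(p1=0.10, p2=0.20, r_h=0.95, r_z=0.90),
    Unit(p1=0.05, p2=0.25, r_h=0.90, r_z=0.80),
]
weights = normalized_weights(units, p3=0.5)
print(weights, sum(weights))
```

Because \( {p}^3 \) is a constant, it cancels in the normalization; in this sketch, calling `normalized_weights` with `p3=1.0` gives the same normalized weights up to floating-point error.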

Obtaining the Couple Weight From \( {w}_{ijh}^h \) and \( {r}_i^c \)

With the DHS data files that are publicly available, the original sampling probabilities are not given. Thus, in deriving the couple weight for these data, the normalized household weight must be used as the starting point. Using this weight then,

$$ {w}_{ij h}^h=\frac{n^h\times {x}_{ij h}^h}{T^h}={n}^h\times \frac{1}{p_{ij}^1\times {p}_{ij h}^2\times {r}_i^h\ }/{T}^h. $$

As described in the text, we invert this and multiply by \( {r}_i^c \) and a constant \( k={p}^3\times {n}^h/{T}^h \); that is,

$$ k\times {r}_i^c\times \left(1/{w}_{ijh}^h\right)={p}_{ij}^1\times {p}_{ijh}^2\times {p}^3\times {r}_i^h\times {r}_i^c=1/{x}_{ijh}^c. $$

So now for couples, the normalized weight will be

$$ {w}_{ijh}^c={x}_{ijh}^c\ {n}^c/{T}^c, $$

which by definition will sum, across the entire completed sample of couples, to \( {n}^c \).
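As a rough illustration of this step, the sketch below starts from normalized household weights and stratum-level couple response rates. The function name, the stratum labels, and all numbers are made up for the example. The constant k is omitted because any constant multiplier of the raw weight drops out when the weights are normalized.

```python
def couple_weights(hh_weights, strata, r_c):
    """Normalized couple weights from normalized household weights.

    hh_weights: household weight w^h for each completed couple
    strata:     stratum label for each completed couple
    r_c:        dict mapping stratum label -> couple response rate
    """
    # x^c is proportional to w^h / r_c; the constant k drops out on normalization.
    raw = [w / r_c[s] for w, s in zip(hh_weights, strata)]
    total = sum(raw)   # plays the role of T^c (up to the constant k)
    n_c = len(raw)     # number of completed couples
    return [x * n_c / total for x in raw]


hh_weights = [1.2, 0.8, 1.0, 1.0]
strata = ["north", "north", "south", "south"]
r_c = {"north": 0.80, "south": 0.50}   # made-up response rates

w_c = couple_weights(hh_weights, strata, r_c)
print([round(w, 3) for w in w_c], round(sum(w_c), 6))
```

Couples in the stratum with the lower response rate receive larger weights, and the weights sum to the number of completed couples.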

In our analyses, we utilize women’s, men’s, and the couple weights as well as two proxy couple weights. The two proxies estimate the proportion of couples completed from information available on individual in-union persons. Specifically, following the Venn diagram of Fig. 1, information on couples in which both partners were nonrespondents is not available in the DHS survey data (other than in these 11 surveys). However, in virtually all DHS surveys, it is possible to enumerate wives whose husbands are nonrespondents and husbands whose wives are nonrespondents because spouse’s line number is given in the completed individual questionnaires. Thus, one proxy (ALT) adds these two numbers to the completed couples to form the denominator of the couple response rate—that is, the estimate of \( {r}_i^c \). For the other proxy (EST), the Chandra-Sekar–Deming method is used to estimate the number of couples in which both partners were nonrespondents, assuming independence of the probabilities of nonresponse.
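The two proxy response rates can be sketched as follows. The function names and the counts are hypothetical. The sketch also assumes that the EST denominator is the sum of all four cells of the Venn diagram, with the unobserved cell estimated under independence as (wife only × husband only) / both.

```python
def couple_response_rate_alt(both: int, wife_only: int, husband_only: int) -> float:
    """ALT proxy: denominator omits couples in which neither partner responded."""
    return both / (both + wife_only + husband_only)


def couple_response_rate_est(both: int, wife_only: int, husband_only: int) -> float:
    """EST proxy: add the Chandra Sekar-Deming estimate of the unobserved cell."""
    neither = wife_only * husband_only / both   # assumes independent nonresponse
    return both / (both + wife_only + husband_only + neither)


# Made-up counts for one stratum.
both, wife_only, husband_only = 400, 80, 20
alt = couple_response_rate_alt(both, wife_only, husband_only)
est = couple_response_rate_est(both, wife_only, husband_only)
print(alt, est)
```

In this sketch the EST rate is never higher than the ALT rate, because its denominator adds a nonnegative estimate of the couples in which both partners were nonrespondents.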

Cite this article

Becker, S., Kalamar, A. Sampling Weights for Analyses of Couple Data: Example of the Demographic and Health Surveys. Demography 55, 1447–1473 (2018). https://doi.org/10.1007/s13524-018-0688-1

  • DOI: https://doi.org/10.1007/s13524-018-0688-1
