Using a complex multistage sampling design, the NLSY 1979 cohort recruited men and women aged 14–21 years in 1979 and has since followed them through in-person and telephone interviews [6]. Pregnancy data were collected beginning in 1986: pregnancies before 1986 were recorded retrospectively in that survey year, and subsequent pregnancies were recorded prospectively [6]. We excluded women with non-singleton births and, to maintain temporal ordering, considered only births before age 40, the age at which the main outcome (obesity) was recorded. This yielded 4,780 eligible women, who had 10,908 births (Figure 1).
Following the notation of VanderWeele [7], we considered six types of variables (Figure 2), described below. The outcome (Y) was obesity at midlife, defined as body mass index (BMI) ≥30 kg/m² at age 40 or 41 (collected 2002–2010). The exposure (A) was early-life socioeconomic disadvantage, as reflected by baseline SEP. We used seven binary variables, each in a separate model, to classify disadvantage: education of the respondent’s father <12 years; education of the respondent’s mother <12 years; education of both parents <12 years; 1978 income of the household in which the respondent lived as a dependent <200% of the federal poverty level; the same income <100% of the federal poverty level; respondent’s father/stepfather working in a blue-collar occupation in 1979; and respondent’s father/stepfather working less than full-time in 1979 (part-time or unemployed). We chose multiple measures of early-life SEP because the relationship between SEP and health outcomes often differs depending on the measure considered [8]. We dichotomized the exposure variables partly because some SEP categories would otherwise have contained few observations when stratified by race/ethnicity (e.g. few parents of black and Hispanic NLSY respondents attained ≥16 years of education). The putative mediator (Z) was ever experiencing excessive GWG in at least one birth before age 40, versus never gaining excessively. Total GWG for each birth was calculated by subtracting self-reported pre-pregnancy weight from self-reported delivery weight and was then categorized according to the 2009 Institute of Medicine guidelines [2]. Excessive GWG was defined as weight gain above the guideline upper limit for the woman’s pre-pregnancy BMI category.
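The per-birth classification and the ever-never mediator can be sketched as follows. This is a minimal illustration, not the authors’ code: it assumes weights in kilograms, a known height for computing pre-pregnancy BMI, and the 2009 IOM recommended total-gain ranges for singleton pregnancies (underweight 12.5–18 kg, normal weight 11.5–16 kg, overweight 7–11.5 kg, obese 5–9 kg), with no adjustment for gestational duration. All function and constant names are hypothetical.

```python
# Hypothetical sketch: classify total gestational weight gain (GWG) against
# the 2009 IOM total-gain ranges for singleton pregnancies, then derive the
# ever/never excessive GWG indicator (Z) for one woman.

# (lower, upper) recommended total gain in kg, by pre-pregnancy BMI category.
IOM_2009_RANGES_KG = {
    "underweight": (12.5, 18.0),  # BMI < 18.5
    "normal": (11.5, 16.0),       # 18.5 <= BMI < 25
    "overweight": (7.0, 11.5),    # 25 <= BMI < 30
    "obese": (5.0, 9.0),          # BMI >= 30
}


def bmi_category(bmi: float) -> str:
    """Map pre-pregnancy BMI (kg/m^2) to an IOM 2009 category."""
    if bmi < 18.5:
        return "underweight"
    if bmi < 25.0:
        return "normal"
    if bmi < 30.0:
        return "overweight"
    return "obese"


def classify_gwg(prepreg_kg: float, delivery_kg: float, height_m: float) -> str:
    """Return 'inadequate', 'adequate', or 'excessive' for one birth."""
    gwg = delivery_kg - prepreg_kg  # total GWG = delivery minus pre-pregnancy weight
    bmi = prepreg_kg / height_m ** 2  # pre-pregnancy BMI (assumes height is known)
    lower, upper = IOM_2009_RANGES_KG[bmi_category(bmi)]
    if gwg < lower:
        return "inadequate"
    if gwg > upper:
        return "excessive"
    return "adequate"


def ever_excessive(births: list[tuple[float, float, float]]) -> int:
    """Z = 1 if any birth (prepreg_kg, delivery_kg, height_m) had excessive GWG."""
    return int(any(classify_gwg(*b) == "excessive" for b in births))
```

For example, a woman 1.65 m tall who weighed 60 kg before pregnancy (normal BMI) and 78 kg at delivery gained 18 kg, above the 16 kg upper limit, so that birth alone makes her Z = 1.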
We restricted estimates of the A ~ Y and A ~ Z associations to women with measured exposure information (Figure 1), rather than imputing missing childhood experiences. GWG data were available for 9,347 of 10,908 births (85.7%), and ever-never excessive GWG status was known for 4,124 of 4,780 women (86.3%). Compared with the exposure variables, GWG data had fewer missing values and a richer set of predictors; we therefore multiply imputed missing GWG data at the level of each pregnancy and used the imputed values to assign ever-never excessive GWG status. As a sensitivity analysis, we repeated all analyses restricted to women with measured GWG data for all recorded births; this did not appreciably alter the estimates.
We distinguished between confounders of the A ~ Y and A ~ Z relationships (denoted X) and confounders of the Z ~ Y relationship (denoted W), both specified a priori (Figure 2). X confounders were all binary, individual-level variables: birth outside the US, urban residence as a child, and residence in the South as a child. Variables plausibly on a causal path between childhood SEP and adult obesity (e.g. adult SEP, pre-pregnancy BMI) were excluded from X. W confounders included all X confounders plus additional birth-specific and potentially time-varying maternal variables: age, marital status, smoking during pregnancy, educational attainment (<12, 12–15, ≥16 years), pre-pregnancy BMI (linear and quadratic terms), equivalized household income [9], and previous excessive or inadequate GWG. Missing values for X and W confounders were addressed using multiple imputation. Race/ethnicity (denoted V) was categorized as non-Hispanic black, non-black Hispanic, and non-black non-Hispanic (of whom 98.4% self-identified as white). We present results separately by race/ethnicity because early-life socioeconomic factors, such as educational attainment, may have different associations with adult health across racial/ethnic groups [10,11].
Using the potential outcomes framework, we denote by \( Y^{a_1} \) an individual’s midlife obesity status had, possibly counter to fact, her early-life SEP been disadvantaged, and by \( Y^{a_0} \) that same individual’s midlife obesity status had, possibly counter to fact, her early-life SEP not been disadvantaged. Averaged over the population, the cumulative incidence ratio (risk ratio, RR) for midlife obesity under these two settings of early-life SEP is \( E[Y^{a_1}]/E[Y^{a_0}] \). With seven measures of early-life SEP and three race/ethnicity subgroups, we were therefore interested in 21 different A ~ Y associations. Additionally, we wished to estimate, in each race/ethnicity subgroup, the effect of ever experiencing excessive GWG on midlife obesity, \( E[Y^{z_1}]/E[Y^{z_0}] \), as well as the effect of early-life socioeconomic disadvantage on ever experiencing excessive GWG, \( E[Z^{a_1}]/E[Z^{a_0}] \), for each of the seven SEP measures and three race/ethnicity subgroups.
We estimated the above parameters using marginal structural models (MSMs) and inverse probability of treatment weighting estimators [12-14]. Briefly, observations that, given their covariates, were less likely to have their observed exposure status were up-weighted. The weighting balances the exposed and unexposed populations with respect to the confounding variables used to estimate the weights. In this weighted “pseudo-population,” the adjusted exposure-outcome association has a population-average interpretation. Causal interpretation requires further assumptions: consistency, positivity, exchangeability, and correct specification of the treatment models that generated the weights [14,15]. We also assume correct specification of the imputation models and that missing data are missing at random. We refer to our results as “associations” rather than “effects” to emphasize the strength of these assumptions.
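To make the “pseudo-population” idea concrete, the sketch below uses a hypothetical cohort of 200 people with one binary confounder X. It estimates \( P(A=a \mid X) \) nonparametrically as the exposure frequency within each X stratum (a stand-in for the GEE regression models used in the study) and uses unstabilized weights for simplicity. Before weighting, X is far more common among the exposed (60/90) than the unexposed (20/110); after weighting, its prevalence is 0.4 in both groups, the same as in the full sample (80/200).

```python
from collections import Counter

# Hypothetical data: (x, a) pairs with binary confounder x and exposure a.
RECORDS = [(1, 1)] * 60 + [(1, 0)] * 20 + [(0, 1)] * 30 + [(0, 0)] * 90


def iptw_weights(records):
    """Unstabilized weights 1 / P(A = a_i | X = x_i).

    The propensity is estimated nonparametrically as the share of each
    x stratum with exposure a (an illustrative stand-in for a regression model).
    """
    n_x = Counter(x for x, _ in records)   # stratum sizes
    n_xa = Counter(records)                # counts per (x, a) cell
    return [n_x[x] / n_xa[(x, a)] for x, a in records]


def weighted_prevalence(records, weights, a_value):
    """Weighted share with X = 1 among those with exposure a_value."""
    num = sum(w for (x, a), w in zip(records, weights) if a == a_value and x == 1)
    den = sum(w for (x, a), w in zip(records, weights) if a == a_value)
    return num / den


if __name__ == "__main__":
    w = iptw_weights(RECORDS)
    for a in (1, 0):
        print(a, weighted_prevalence(RECORDS, [1.0] * len(RECORDS), a),
              weighted_prevalence(RECORDS, w, a))
```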
Weights were the inverse probability of the observed exposure status, given putative confounding variables. We obtained these probabilities from regression models for the exposure, fit with generalized estimating equations (GEE) using an exchangeable correlation structure (clustering on household). For the Z ~ Y association, where the “exposure” was ever experiencing excessive GWG, we first estimated the probability of excessive GWG in each observed birth (clustered within women) and then used chain probabilities to obtain the probability of ever-never status; this approach allowed for time-varying W confounders, such as age and previous excessive GWG events. Using notation from VanderWeele [16], for each individual i, the exposure weights for A (\( w_i^A \)) and for Z (\( w_i^Z \)) were
$$ w_i^A=\frac{P\left(A=a_i \mid V=v_i\right)}{P\left(A=a_i \mid X=x_i,V=v_i\right)} $$
and
$$ w_i^Z=\frac{P\left(Z=z_i \mid V=v_i\right)}{P\left(Z=z_i \mid A=a_i,X=x_i,W=w_i,V=v_i\right)} $$
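The chain-probability step and the stabilized weights reduce to short calculations for a binary variable. In this hypothetical sketch, the per-birth probabilities \( p_j \) of excessive GWG are assumed to be predictions from the birth-level model evaluated along the “never excessive” history (that is, with previous excessive GWG set to 0), so that \( P(Z=0)=\prod_j (1-p_j) \) and \( P(Z=1)=1-\prod_j (1-p_j) \). The source does not state how this detail was handled; the function names are illustrative.

```python
import math


def prob_ever_excessive(birth_probs):
    """P(Z = 1) = 1 - prod_j (1 - p_j).

    birth_probs: per-birth probabilities of excessive GWG, assumed to be
    predicted with 'previous excessive GWG' set to 0 at every birth.
    """
    p_never = math.prod(1.0 - p for p in birth_probs)
    return 1.0 - p_never


def stabilized_weight(observed, p_num, p_den):
    """Stabilized weight for a binary variable with observed value 0 or 1.

    p_num: P(variable = 1 | V) from the numerator model.
    p_den: P(variable = 1 | confounders, V) from the denominator model.
    Returns P(observed | V) / P(observed | confounders, V).
    """
    num = p_num if observed == 1 else 1.0 - p_num
    den = p_den if observed == 1 else 1.0 - p_den
    return num / den
```

For a woman with two births and \( p_j \) of 0.3 and 0.5, \( P(Z=1)=1-0.7\times 0.5=0.65 \). If she was in fact ever excessive and her numerator probability were 0.4, her stabilized weight would be 0.4/0.65, about 0.62.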
Having seven measures of early-life SEP, we estimated seven MSMs for the A ~ Y association, seven MSMs for the A ~ Z association, and one MSM for the Z ~ Y association. All MSMs were log-linear models and included interaction terms for race/ethnicity (V).
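Because each MSM conditions only on race/ethnicity, a log-linear MSM with main effects of A and V and all A-by-V products is saturated in (A, V). Its weighted maximum-likelihood fit then reproduces the weighted risk in every (A, V) cell, so each stratum-specific risk ratio can be read directly from weighted risks; a weighted Poisson or log-binomial GLM with those terms would give the same point estimates. The sketch below assumes that saturated specification (the text does not list the exact model terms) and takes each woman’s final weight w as given. The function name and stratum labels are hypothetical.

```python
from collections import defaultdict


def weighted_rr_by_stratum(rows):
    """Risk ratio for A = 1 vs A = 0 within each stratum of V.

    rows: iterable of (v, a, y, w) with binary a and y and final weight w.
    Assumes a saturated log-linear MSM in (A, V), whose weighted fit
    reproduces the weighted risk in every (v, a) cell.
    """
    weighted_cases = defaultdict(float)  # sum of w * y per (v, a) cell
    weighted_total = defaultdict(float)  # sum of w per (v, a) cell
    for v, a, y, w in rows:
        weighted_cases[(v, a)] += w * y
        weighted_total[(v, a)] += w
    result = {}
    for v in {v for v, _ in weighted_total}:
        risk1 = weighted_cases[(v, 1)] / weighted_total[(v, 1)]
        risk0 = weighted_cases[(v, 0)] / weighted_total[(v, 0)]
        result[v] = risk1 / risk0
    return result
```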
To assess potential mediation by excessive GWG, we estimated controlled direct effects following the methodology proposed by VanderWeele [7]. Pearl [17] defines the controlled direct effect as the effect of exposure on outcome under a hypothetical intervention that holds the mediator at a specific value. Of interest was the association between early-life SEP and midlife obesity if excessive GWG were prevented in all pregnancies, or \( E[Y^{a_1 z_0} - Y^{a_0 z_0}] \). A controlled direct effect smaller in magnitude than the total effect suggests that the putative mediator lies on a pathway between exposure and outcome [7]. The controlled direct effect was estimated from an MSM containing exposure weights for both A and Z, and model terms for A, Z, and the A-by-Z interaction [7].
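By the same saturation argument, an MSM with terms for A, Z, and A-by-Z, fit within one race/ethnicity stratum, reproduces the weighted risk in each (A, Z) cell, with each woman weighted by the product of her exposure and mediator weights \( w_i^A w_i^Z \) (the product-weight construction used in VanderWeele’s MSM approach). The controlled direct effect with excessive GWG prevented (Z = 0) then compares the weighted risks in the (A = 1, Z = 0) and (A = 0, Z = 0) cells. The sketch returns both the risk difference, matching the definition above, and the risk ratio, the scale of a log-linear MSM; the saturated specification and the function name are assumptions.

```python
def controlled_direct_effect_z0(rows):
    """Controlled direct effect of A with the mediator fixed at Z = 0.

    rows: iterable of (a, z, y, w_a, w_z) for one race/ethnicity stratum.
    Each woman is weighted by w_a * w_z. Assuming a saturated MSM in A, Z,
    and A-by-Z, the fitted risks equal the weighted cell risks, so only the
    Z = 0 cells are needed. Returns (risk difference, risk ratio).
    """
    cases = {0: 0.0, 1: 0.0}  # weighted cases in the (A = a, Z = 0) cells
    total = {0: 0.0, 1: 0.0}  # summed weights in the (A = a, Z = 0) cells
    for a, z, y, w_a, w_z in rows:
        if z == 0:
            cases[a] += w_a * w_z * y
            total[a] += w_a * w_z
    risk1 = cases[1] / total[1]
    risk0 = cases[0] / total[0]
    return risk1 - risk0, risk1 / risk0
```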
All inverse probability weights were stabilized [14]. To reduce the variability of estimates, final weights were truncated at the 1st and 99th percentiles [14]. In addition to exposure and mediator weights, as appropriate, all MSMs included NLSY baseline (1979) sampling weights so that estimates would be nationally representative. We included inverse probability of censoring weights to account for loss to follow-up [14]; variables in the censoring models were the childhood SEP measures and all X and W confounders. Point estimates were averaged over 25 imputed datasets. We used bootstrap resampling of households to estimate 95% confidence intervals (the 0.025 and 0.975 quantiles of the bootstrap distribution). Associations were considered statistically significant if the 95% confidence interval for the risk ratio excluded one. We did not adjust for multiple comparisons [18]. Analyses were completed in R version 3.1.0 (http://www.r-project.org/). The University of California, Berkeley Committee for Protection of Human Subjects did not consider this study human subjects research because the data were de-identified and publicly available on the Internet.
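Weight truncation and the household bootstrap can be sketched with the Python standard library alone. This is an illustration, not the authors’ R code: percentiles use linear interpolation between order statistics (`statistics.quantiles` with `method="inclusive"`), the number of bootstrap replicates and the seed are arbitrary placeholders, and `estimator` is a hypothetical callable that maps a resampled dataset to a risk ratio (in a full analysis it would also re-estimate the weights, which is not shown here).

```python
import random
import statistics


def truncate_weights(weights, lower_pct=1, upper_pct=99):
    """Cap weights at the given percentiles (default: 1st and 99th)."""
    # With n=100, cuts[k - 1] is the k-th percentile (linear interpolation).
    cuts = statistics.quantiles(weights, n=100, method="inclusive")
    lo, hi = cuts[lower_pct - 1], cuts[upper_pct - 1]
    return [min(max(w, lo), hi) for w in weights]


def household_bootstrap_ci(households, estimator, n_boot=1000, seed=1):
    """Percentile 95% CI from resampling whole households with replacement.

    households: dict mapping household id -> list of rows for its members.
    estimator: callable taking a list of rows and returning an estimate.
    Returns the 0.025 and 0.975 quantiles of the bootstrap estimates.
    """
    rng = random.Random(seed)
    ids = list(households)
    estimates = []
    for _ in range(n_boot):
        drawn = rng.choices(ids, k=len(ids))  # sample households, not women
        sample = [row for hid in drawn for row in households[hid]]
        estimates.append(estimator(sample))
    # With n=40, the cut points are the 1/40 = 0.025, ..., 39/40 = 0.975 quantiles.
    cuts = statistics.quantiles(estimates, n=40, method="inclusive")
    return cuts[0], cuts[-1]
```

Resampling whole households, rather than individual women, preserves the within-household correlation that the GEE models also account for.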