1 Introduction

The gender pay gap remains both a salient social problem and a puzzle to social scientists, because it persists despite the institutionalization of egalitarian gender norms in the labor market and the reversal of the male advantage in education. To resolve the puzzle, scholars have sought to identify the relative contribution of different factors to the gender gap. Research on work-family compatibility, overwork, and the motherhood penalty (Waldfogel 1997; Gangl and Ziefle 2009; Cha and Weeden 2014; Goldin 2014) suggests that the strongly gendered effect of family formation accounts for a substantial part of the gap. By contrast, a highly cited 2007 Sociology of Education article that scrutinized the contribution of educational factors relative to family formation among the college-educated concluded that “family formation has virtually no effect on the income gap” (Bobbitt-Zeher 2007, p. 13). It also found that “values appear to matter only modestly” (Ibid.). According to that study, gender differences in fields of study instead explain the lion’s share of the income gap. In this article, I assess the robustness of these results by way of replication. This endeavor is worth undertaking because the original findings have important policy implications: they suggest that improvements in work-family compatibility would do nothing to reduce economic gender inequality.

I first conduct a pure replication. Statistical models that closely emulate those of the original study reproduce the original findings. However, I argue that these estimates are biased because Bobbitt-Zeher restricts her sample to year-round full-time workers and thus conditions on labor force participation, which, according to her own theoretical argument, mediates the effect of family formation on income. I show that her estimates for the importance of values are negatively biased for the same reason. Equally important, I show that the original study misspecified and misallocated the hypothesized gendered effect of family formation in both relevant decompositions. The results from a series of models that correct for endogenous sample selection and misspecification suggest that the original study severely underestimated the importance of family formation, and also that of values. According to my findings, these two factors explain nearly a third of the gender income gap in the sample at hand, and thus about as much as the educational factors emphasized in the original article. My re-analysis corroborates Bobbitt-Zeher’s finding that education-related factors, in absolute terms, explain a sizeable share of the gender income gap.

For the sake of brevity, I choose not to debate here Bobbitt-Zeher’s decision to analyze incomes rather than wages or the issue of omitted variable bias in testing the theorized devaluation of college majors. These issues have been discussed elsewhere (Petersen 1989; Morgan and Arthur 2005; Gerber and Cheung 2008; Ochsenfeld 2014).

My intention is not to single out Bobbitt-Zeher’s analysis for criticism. In fact, studies published in the top U. S. and German sociology journals have drawn similar conclusions from very similar designs both before and after it (Marini and Fan 1997; Leuze and Strauß 2009). The purpose of this replication is thus to reflect critically on this enduring research tradition rather than to fault an individual author.

2 Overview of the original study

Bobbitt-Zeher (2007) studies the causes of income inequality between women and men in a cohort of young college graduates. She places particular emphasis on comprehensively examining the roles of education and family formation, arguing that the two have rarely been studied in conjunction (Ibid., p. 6). The declared aim of her study is thus to “understand the weight of the two sets of influences relative to one another” (Ibid.). The author investigates this issue with data from the National Education Longitudinal Study of 1988 (NELS:88), which she restricts to persons with 4‑year college degrees who worked at least 35 h per week throughout 1999 (Ibid., p. 7). She derives her main findings from two regression-based income decompositions.

The first decomposition (Table 1; Bobbitt-Zeher 2007, p. 13) starts from a baseline model with gender as the only predictor and sequentially adds variable groups that measure respondents’ background, values, education, family status, and work characteristics, thereby decomposing the unconditional gender income gap into these five components. The order in which the author adds these variable groups to the set of predictors is based on their position in the life course trajectory. A reduction in the coefficient for the dummy variable female can thus be interpreted as a mediation and explanation of the gender income gap by the added set of covariates. Bobbitt-Zeher finds that differences in how much women and men value ‘having lots of money’ explain only 7% of the gender income gap in addition to differences in background, whereas adding education-related covariates explains a third of the gap. In her analysis, the variables that measure respondents’ family formation status do not further reduce the conditional income gap, but work-related factors such as hours worked, industry, and occupation account for an additional 25% of the gap.
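The percentages reported for this sequential decomposition follow from comparing the coefficient for female across nested models: the share attributed to a variable group is the reduction in that coefficient when the group is added, divided by the coefficient from the baseline model with gender as the only predictor. The following Python sketch illustrates this arithmetic. The helper `sequential_decomposition` and the simulated data (one background variable and one binary ‘well-paid major’ indicator) are hypothetical and are not taken from the original study or its data.

```python
import numpy as np


def ols(y, X):
    """Ordinary least squares coefficients of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def sequential_decomposition(y, female, groups):
    """Add variable groups one at a time (in the assumed causal order) and
    record how much each one shrinks the coefficient for `female`,
    expressed as a share of the baseline (gender-only) coefficient."""
    X = np.column_stack([np.ones(len(y)), female])
    b0 = ols(y, X)[1]                      # baseline: unconditional gap
    previous, shares = b0, {}
    for name, Z in groups:
        X = np.column_stack([X, Z])        # keep earlier groups, add the next one
        b = ols(y, X)[1]
        shares[name] = (previous - b) / b0
        previous = b
    return b0, shares, previous


# Hypothetical data: women are less likely to choose a well-paid major.
rng = np.random.default_rng(0)
n = 5_000
female = (rng.random(n) < 0.5).astype(float)
background = rng.normal(0, 1, n)
major = (rng.random(n) < np.where(female == 1, 0.2, 0.4)).astype(float)
income = (35_000 + 2_000 * background + 8_000 * major
          - 3_000 * female + rng.normal(0, 10_000, n))

b0, shares, b_final = sequential_decomposition(
    income, female, [("background", background), ("education", major)])
for name, share in shares.items():
    print(f"{name}: {100 * share:.1f}% of the gap explained")
```

Because each group is credited only with the reduction that remains after all earlier groups have been entered, the shares depend on the order of entry, which is why the life-course ordering matters. In the baseline model, the female coefficient equals the raw difference in mean incomes.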

Table 1 Regression coefficients for female and percentage of the gender income gap explained with alternate models, including background, values, education, family formation, and work factors

The second (Oaxaca-Blinder) decomposition (Table 2; Bobbitt-Zeher 2007, p. 14) fits income regressions separately for women and men, using the same measures of background, values, education, family formation, and job characteristics as covariates. To calculate the percentage explained by gender differences in endowment with the covariates, this procedure averages women’s and men’s coefficients to obtain hypothetical gender-neutral rates of return to the covariates. It then uses these hypothetical rates to estimate the degree to which gender differences in income are due to gender differences in endowment with the various covariates, and the degree to which they are instead due to gender differences in the rates of return to these endowments. Compared with the first decomposition, Bobbitt-Zeher’s Oaxaca-Blinder decomposition attributes less explanatory power to education-related factors and more to work-related variables. This is because the Oaxaca-Blinder decomposition ignores the causal order of variable groups and hence the fact that work-related variables mediate the effects of the education-related characteristics on income. As in the first decomposition, family formation seemingly plays no role whatsoever in the generation of women’s economic disadvantage.
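The averaging of coefficients described above can be sketched as follows. This is a minimal illustration under stated assumptions, not Bobbitt-Zeher’s code: equal weights of 0.5 for women’s and men’s coefficients, a single hypothetical covariate, and simulated data. Because each group regression includes an intercept, the explained (endowments) and unexplained parts add up exactly to the difference in mean incomes.

```python
import numpy as np


def ols(y, X):
    """Ordinary least squares coefficients of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def twofold_average_weights(y_m, X_m, y_f, X_f):
    """Two-fold decomposition of mean(y_m) - mean(y_f), using the simple
    average of men's and women's coefficients as the reference returns.
    Column 0 of each design matrix must be an intercept."""
    b_m, b_f = ols(y_m, X_m), ols(y_f, X_f)
    b_ref = 0.5 * (b_m + b_f)              # hypothetical gender-neutral returns
    xbar_m, xbar_f = X_m.mean(axis=0), X_f.mean(axis=0)
    explained = (xbar_m - xbar_f) * b_ref  # endowments effect, per covariate
    unexplained = xbar_m @ (b_m - b_ref) + xbar_f @ (b_ref - b_f)
    return explained, unexplained


rng = np.random.default_rng(1)


def simulate(n, p_major, slope):
    """Hypothetical data with one binary covariate (a well-paid major)."""
    major = (rng.random(n) < p_major).astype(float)
    y = 30_000 + slope * major + rng.normal(0, 8_000, n)
    return y, np.column_stack([np.ones(n), major])


y_m, X_m = simulate(2_000, p_major=0.4, slope=9_000)
y_f, X_f = simulate(2_000, p_major=0.2, slope=7_000)
explained, unexplained = twofold_average_weights(y_m, X_m, y_f, X_f)
gap = y_m.mean() - y_f.mean()
print(f"gap {gap:,.0f}: explained {explained.sum():,.0f}, "
      f"unexplained {unexplained:,.0f}")
```

The intercept contributes nothing to the explained part because both groups have a mean of one on that column.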

Table 2 Regression decompositions showing contributions of background, values, education, family formation, and work characteristics to the gender income gap

Based on the results from the first and second decompositions, the author concludes that, besides differences in job characteristics, horizontal sex segregation in fields of study is crucial to understanding income differences between college-educated women and men, whereas “values appear to matter only modestly, while family formation has virtually no effect on the income gap for this sample of young workers.” (Bobbitt-Zeher 2007, p. 13).

3 Pure replication

My first set of analyses (replications 1A and 2A) aims to mimic the original model specifications and sample restrictions as closely as possible, in order to reproduce the original results and to provide a reference for assessing the impact of the corrections to the original design introduced later on. An exact replication in the strict sense was not feasible because I could not obtain the original code. When preparing the analysis sample, I therefore followed the original article whenever possible and resorted to educated guesses where its instructions were insufficiently detailed.¹ A replication package documents these decisions in detail and is publicly and permanently available at the Harvard Dataverse (Ochsenfeld 2016a). The resulting analysis sample is slightly smaller (N = 1924) than that of the original study (N = 1946). A comparison of descriptive statistics (see Table 3 in the appendix and Bobbitt-Zeher [2007, pp. 11, 18 f.]) indicates that the independent variables are distributed very similarly but not identically. Average annual incomes are slightly lower in my sample for both women ($ 32,573) and men ($ 39,243) than in the original study ($ 32,953 and $ 39,891, respectively). The resulting unconditional gender income gap is thus slightly smaller in my sample ($ 6670) than in the original study ($ 6938).
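The gaps reported above follow directly from the group means. The short Python check below also expresses each gap relative to men’s mean income; the function name `income_gap` is illustrative only.

```python
# Mean annual incomes (US dollars) as reported in the text.
mean_income = {
    "replication": {"women": 32_573, "men": 39_243},
    "original study": {"women": 32_953, "men": 39_891},
}


def income_gap(women, men):
    """Absolute gap and women's shortfall as a percentage of men's mean."""
    gap = men - women
    return gap, round(100 * gap / men, 1)


for sample, m in mean_income.items():
    gap, pct = income_gap(m["women"], m["men"])
    print(f"{sample}: ${gap:,} ({pct}% of men's mean income)")
```

In both samples, women’s mean income falls short of men’s by roughly 17%.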

The original analyses were conducted on the NELS:88 Restricted Use Data. Because researchers outside the U. S. cannot access these data, my replication is based on the NELS:88 Public Use Data, which are identical to the Restricted Use Data except that they do not include students’ SAT scores or a measure of the selectivity of the college attended. Because these variables proved entirely peripheral in the original analysis, it is nevertheless possible to replicate the original results very closely.

With regard to the percentage of the income gap explained by the various variable groups in the sequential decomposition, my results almost exactly reproduce those of the original study, except for the work variables, whose explanatory power is lower than in the original study (Table 1). I can only speculate that this may be due to differences in how I recoded certain industries and occupations to arrive at the categories reported in Bobbitt-Zeher (2007, p. 18 f.).

The estimates from the Oaxaca-Blinder decomposition (Table 2) are nearly identical to those reported in Bobbitt-Zeher (2007). Only for grades is this not the case, in all likelihood because this variable group encompasses 12th grade standardized test scores, 12th grade grades, and SAT scores in the original study whereas my analysis cannot include SAT scores.

If I were to draw conclusions about the importance of education-related factors relative to family formation and values based on the results of this replication exercise, they would echo Bobbitt-Zeher’s: family formation seems to explain none of the gender income gap, and gender differences in the importance of ‘having lots of money’ are almost irrelevant, whereas education-related factors (most notably segregation in college majors) explain a sizeable share (nearly a third) of women’s lower incomes. In the following, I argue that this conclusion needs to be revised with respect to family formation and values.

4 Replication with corrections for endogenous sample restriction and misspecification

To explicate why I consider both the restriction Bobbitt-Zeher imposes on her sample and her model specifications to be at odds with her theoretical argument, I briefly summarize her argument regarding the potential effect of family formation on the gender income gap. She states that

The effects of family formation, particularly marriage and parenthood and their impact on participation in paid labor, are implicated in gender income disparities. For example, net of other factors, such as education, women with children make 10 percent to 15 percent less than do women without children, and there is a 7 percent wage penalty for each child that a young woman has. […] The same patterns do not hold for men; fathers experience no comparable wage penalty for their parental status. (Bobbitt-Zeher 2007, p. 4)

This argument suggests that gender moderates the effect of family formation on income: whereas the effect is thought to be negative for women, it is thought to be non-existent for men. Furthermore, the author argues that labor force participation is a key mechanism that brings about the negative effect of motherhood on income:

The impact of family formation on gender differences in earnings appears to operate through women’s decreased labor force participation. Both length of job experience and part-time employment contribute to lower earnings. (Bobbitt-Zeher 2007, p. 5)

Fig. 1 illustrates these statements.

Fig. 1 Causal diagram for the effect of family formation on income for women and men (F family formation, I income, H hours worked, E job experience; based on Bobbitt-Zeher [2007, p. 5])

In her study, Bobbitt-Zeher chose to restrict her sample to persons working full time (≥35 h per week throughout the year). She provides no justification for this decision other than to “avoid part-time and inconsistent workers from biasing the analysis” (Bobbitt-Zeher 2007, p. 7). However, the restriction is more likely to introduce bias than to avoid it, because she thereby conditions on hours worked, which, according to her own theoretical argument, mediate part of the motherhood penalty (Fig. 1). In consequence, her estimates for the effect of family formation suffer from overcontrol bias (Elwert and Winship 2014).²

Fig. 2 suggests that Bobbitt-Zeher’s decision to exclude persons who work less than full-time year-round indeed disproportionately removes mothers from the sample and thus understates the frequency of motherhood. Worse, the restriction retains only those mothers who experienced no or merely a weak effect of motherhood on hours worked: the difference in hours worked between mothers and fathers is significantly larger in the sample that includes part-time (≥10 h per week) and non-year-round workers than in the restricted sample (see Fig. 2).
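The logic of overcontrol bias through sample restriction can be illustrated with a small simulation. The following sketch is hypothetical throughout: it considers women only, assumes that motherhood affects income exclusively by lowering hours worked, and approximates the year-round full-time criterion by a threshold of 35 weekly hours. Restricting the simulated sample to full-time workers both lowers the share of mothers and shrinks the estimated motherhood penalty.

```python
import numpy as np


def mean_difference(y, d):
    """Bivariate OLS slope of y on a boolean dummy d (difference in means)."""
    return y[d].mean() - y[~d].mean()


rng = np.random.default_rng(2)
n = 50_000
mother = rng.random(n) < 0.12                       # hypothetical 12% mothers
# Hypothetical mechanism: motherhood lowers weekly hours (floor of 10 h),
# and hours alone drive annual income.
hours = np.clip(42 - 15 * mother + rng.normal(0, 6, n), 10, None)
income = 900 * hours + rng.normal(0, 5_000, n)

full_time = hours >= 35                             # the sample restriction
total_effect = mean_difference(income, mother)
restricted_effect = mean_difference(income[full_time], mother[full_time])

print(f"share of mothers: {mother.mean():.1%} -> {mother[full_time].mean():.1%}")
print(f"motherhood effect: {total_effect:,.0f} -> {restricted_effect:,.0f}")
```

Because the mothers who remain after the restriction are those whose hours were least affected, the restricted estimate understates the total effect of motherhood even though no variable is omitted.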

Fig. 2 Hours worked by mothers and fathers, separately for the sample restricted to year-round full-time working persons (lower row) and the sample including part-time and non-year-round working persons (upper row)

The estimated effect of values is likely affected by overcontrol bias, too: persons who did not consider ‘making lots of money’ important during high school are likely to work fewer hours and, for that reason, to earn lower annual incomes later in life. Hours worked thus mediate the effect of values (i. e. the importance of ‘making lots of money’) on income. Bobbitt-Zeher’s decision to restrict the sample to persons working full-time year-round therefore biases her estimate of the explanatory power of this factor downward. Indeed, finding ‘making lots of money’ unimportant in 12th grade is weakly associated with not working full-time year-round 7 years later.³

To assess how much Bobbitt-Zeher’s decision to restrict her sample to full-time year-round workers has biased the estimated explanatory power of family formation and values in her study, Table 1 (replication 1B) reports the results from a sequential decomposition that also includes persons who work less than year-round full-time and thus allows hours worked to bring about the effects of family formation and values. The explanatory power attributed to values in the sequential decomposition doubles from 8 to 16% – nearly half of what all education-related variables taken together contribute (34%).

The estimate for the contribution of family formation, however, continues to suggest that this factor is completely irrelevant (Table 1, replication 1B). This is because Bobbitt-Zeher’s models do not reflect her theoretical argument. Her statement that gender moderates the effect of family formation on income (see above) calls for a model that includes gender and the family formation variables (parenthood and marriage) as well as terms that interact them with gender (model 5b). Bobbitt-Zeher’s models, however, do not include these interaction terms (model 5).

$$\text{Model }5\colon \text{ Income}\, =\beta _{0}+\beta _{1}\text{female}+\ldots +\beta _{k-1}\text{parenthood}+\beta _{k}\text{marriage}+\varepsilon$$
$$\text{Model }5\mathrm{b}\colon \text{ Income}=\beta _{0}+\beta _{1}\text{female}+\ldots +\beta _{k-1}\text{parenthood}+\beta _{k}\text{marriage}+\beta _{k+1}\text{female}\times \text{parenthood}+\beta _{k+2}\text{female}\times \text{marriage}+\varepsilon$$

When I use model 5b instead of model 5 in the sequential decomposition on the sample that includes part-time workers, family formation adds 15 percentage points to the explanation of the gender income gap rather than nothing at all (Table 1, replication 1C). The work-related covariates, in turn, add much less explanatory power (13 instead of 23%), because part of their association with income stems from their mediating the motherhood penalty (Fig. 1) and is therefore already captured by model 5b (but not by model 5).⁴
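Why omitting the interaction terms can make family formation look irrelevant in a sequential decomposition can be seen in simulated data. The sketch below assumes, hypothetically, that parenthood is equally common among women and men and lowers income only for women; marriage is left out for brevity. Model 5, with a main effect only, leaves the female coefficient essentially unchanged, whereas model 5b, with a female × parenthood interaction, reduces it to the gap among non-parents and thereby credits family formation with the mothers’ penalty.

```python
import numpy as np


def ols(y, X):
    """Ordinary least squares coefficients of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def female_coefficient(y, female, *covariates):
    """Coefficient on `female` from an OLS model with an intercept."""
    X = np.column_stack([np.ones(len(y)), female, *covariates])
    return ols(y, X)[1]


rng = np.random.default_rng(3)
n = 50_000
female = (rng.random(n) < 0.5).astype(float)
parent = (rng.random(n) < 0.12).astype(float)   # hypothetical, same share for both
# Hypothetical process: a 10,000 penalty for mothers, none for fathers.
income = (40_000 - 3_000 * female - 10_000 * female * parent
          + rng.normal(0, 8_000, n))

b_before = female_coefficient(income, female)                 # before family vars
b_model5 = female_coefficient(income, female, parent)         # main effect only
b_model5b = female_coefficient(income, female, parent, female * parent)

share_model5 = (b_before - b_model5) / b_before
share_model5b = (b_before - b_model5b) / b_before
print(f"model 5: {share_model5:.1%}, model 5b: {share_model5b:.1%}")
```

In model 5 the parenthood coefficient averages the mothers’ penalty and the fathers’ null effect, so nothing is subtracted from the female coefficient; only the interaction term assigns the penalty to women.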

The same issue plays out somewhat differently in the Oaxaca-Blinder decomposition (Blinder 1973; Oaxaca 1973), which allows the gender gap to be decomposed into membership, coefficients, and endowments components plus an interaction between coefficients and endowments (Fig. 3; Jones and Kelley 1984; Jann 2008).

Fig. 3 Oaxaca-Blinder decomposition

The membership, coefficients, and interaction components are often combined into a single component. In the resulting two-fold decomposition, the endowments component is conventionally referred to as the ‘explained’ or ‘non-discriminatory’ component and the sum of the other components as the ‘unexplained’ or ‘discriminatory’ component (Fig. 3). However, interpreting the sum of the membership, interaction, and coefficients components as ‘unexplained’ is warranted only if no theory of interest explains any part of these components (and interpreting it as ‘discriminatory’ only if all of these components can be entirely and unambiguously attributed to discrimination). This condition usually holds in the most common application of the decomposition, in which human capital theory is tested against discrimination theory and wages are regressed solely on productivity-related characteristics (e. g. Braakmann 2013).
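For reference, the components named above can be computed as follows, writing the gap in mean incomes between men and women as the sum of a membership component (the difference in intercepts), a coefficients component, an endowments component, and their interaction. This Python sketch uses women’s coefficients and means as the reference, which is one common convention but an assumption here; the simulated data and the function name `four_components` are hypothetical.

```python
import numpy as np


def ols(y, X):
    """Ordinary least squares coefficients of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def four_components(y_m, X_m, y_f, X_f):
    """Split mean(y_m) - mean(y_f) into membership, coefficients, endowments
    and interaction components. Women's coefficients serve as the reference
    (an assumption of this sketch); column 0 must be the intercept."""
    b_m, b_f = ols(y_m, X_m), ols(y_f, X_f)
    xbar_m, xbar_f = X_m.mean(axis=0)[1:], X_f.mean(axis=0)[1:]
    dx, db = xbar_m - xbar_f, b_m[1:] - b_f[1:]
    return {
        "membership": b_m[0] - b_f[0],   # difference in intercepts
        "coefficients": xbar_f @ db,     # different returns, women's endowments
        "endowments": dx @ b_f[1:],      # different endowments, women's returns
        "interaction": dx @ db,          # both differ simultaneously
    }


rng = np.random.default_rng(4)


def simulate(n, p_major, slope, intercept):
    """Hypothetical data with one binary covariate (a well-paid major)."""
    major = (rng.random(n) < p_major).astype(float)
    y = intercept + slope * major + rng.normal(0, 8_000, n)
    return y, np.column_stack([np.ones(n), major])


y_m, X_m = simulate(2_000, 0.4, 9_000, 31_000)
y_f, X_f = simulate(2_000, 0.2, 7_000, 30_000)
parts = four_components(y_m, X_m, y_f, X_f)
gap = y_m.mean() - y_f.mean()
# The conventional two-fold 'unexplained' part lumps three components together.
unexplained = parts["membership"] + parts["coefficients"] + parts["interaction"]
print({k: round(float(v)) for k, v in parts.items()}, round(float(gap)))
```

Summing the membership, coefficients, and interaction components yields the two-fold ‘unexplained’ part, which hides any coefficients effect that a theory of interest, such as the family formation argument, actually predicts.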

It does not hold in the study at hand, however, because Bobbitt-Zeher’s statements concerning the gendered effect of family formation provide strong theoretical grounds for attributing the coefficients effects for the variables parenthood and marriage to family formation. Given the argument summarized above (and given that she does not control for job experience), we expect a direct effect of family formation on income that is more negative for women than for men. The coefficients effects for the parenthood and marriage variables should thus be treated as predicted by the family formation argument. In the original study, however, they were not; instead, they remained unreported. The negligible percentage of the total gap explained by family formation in the original study and in replications 2A and 2B (Table 2) refers merely to the endowments effect for family formation, that is, to a potentially higher incidence of parenthood and marriage among women than among men. Table 2 reports both the endowments and the coefficients effects for the family formation variables. For the unrestricted sample (Table 2, replication 2B), the coefficients effect of family formation explains 5% of the income gap.
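A detailed decomposition makes the coefficients effect for individual variables visible. The sketch below computes, for each variable, women’s mean times the difference between men’s and women’s coefficients, and sums these contributions over two hypothetical family formation dummies. Weighting by women’s means and the simulated penalties and premia are assumptions for illustration only. Moreover, the detailed coefficients effect of dummy variables is known to depend on which category is omitted, so such numbers should be read with that caveat in mind.

```python
import numpy as np


def ols(y, X):
    """Ordinary least squares coefficients of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


rng = np.random.default_rng(5)
names = ["intercept", "parent", "married"]


def simulate(n, p_parent, p_married, parent_effect, married_effect):
    """Hypothetical group data with two family formation dummies."""
    parent = (rng.random(n) < p_parent).astype(float)
    married = (rng.random(n) < p_married).astype(float)
    y = (35_000 + parent_effect * parent + married_effect * married
         + rng.normal(0, 8_000, n))
    return y, np.column_stack([np.ones(n), parent, married])


# Hypothetical: penalties for mothers and wives, premia for fathers and husbands.
y_m, X_m = simulate(20_000, 0.10, 0.30, parent_effect=1_000, married_effect=2_000)
y_f, X_f = simulate(20_000, 0.14, 0.35, parent_effect=-8_000, married_effect=-1_000)

b_m, b_f = ols(y_m, X_m), ols(y_f, X_f)
xbar_f = X_f.mean(axis=0)
# Detailed coefficients effect per variable, weighted by women's means.
coef_effect = dict(zip(names, xbar_f * (b_m - b_f)))
family_coef_effect = coef_effect["parent"] + coef_effect["married"]
print({k: round(float(v)) for k, v in coef_effect.items()},
      round(float(family_coef_effect)))
```

The entry for the intercept corresponds to the membership component; the family formation entries are the part of the conventionally ‘unexplained’ gap that the family formation argument predicts.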

This estimate, however, still does not reflect the full explanatory power of the family formation argument. To the degree that the gender difference in hours worked is an outcome of family formation, the endowment component for hours worked should be attributed to family formation.

Because virtually all other work variables (occupation, industry, sector, job training, job autonomy) can also be expected to mediate the effects of education, values, and family formation, including them in the Oaxaca-Blinder decomposition renders its results uninformative for the purpose at hand: assessing the explanatory power of education-related variables relative to family formation. Table 2 (replication 2C) therefore shows results from a decomposition that excludes the work-related variables. These more meaningful results are in line with those from the preceding sequential decomposition (Table 1, replication 1C), as they suggest that education, family formation, and values all independently explain sizeable shares of the income gap (Table 2, replication 2C).

Based on my preferred decomposition (Table 1, replication 1C), I conclude that values and family formation each explain approximately 15% of the gender income gap. Given that only a few respondents (12%) have yet entered parenthood, I consider this a sizeable contribution of family formation.⁵

5 Conclusion

In a highly cited article, Bobbitt-Zeher (2007) assessed the roles of education and family formation in generating income inequality between young college-educated women and men. Her analysis produced results that led her to conclude that educational factors, in particular gender segregation in college majors, clearly dominate over family formation, which she found to be entirely irrelevant for explaining economic gender inequality.

I argued here that Bobbitt-Zeher’s conclusions concerning the effects of family formation and values must be revised because her analysis misspecified and misallocated the hypothesized gendered effect of family formation in both relevant decompositions and, in addition, was conducted on a sample restricted in a way that biases the estimated contributions of family formation and values downward. By way of replication, I showed that, when these shortcomings are corrected, the combined importance of family formation and values is comparable in magnitude to that of education-related factors. The results from another replication I conducted on comparable German data further corroborate my argument (Ochsenfeld 2016b).

Bobbitt-Zeher’s original finding of a null effect for family formation can be interpreted to suggest that improvements to the work-family compatibility of workplaces or to family policies would not have the potential to reduce the gender income gap. My finding that family formation and values together explain almost as much as education suggests the opposite. The analysis was conducted on a sample of young persons who had only recently graduated from college, only a small minority of whom had children. Hence, the findings refer to a point in the life course when the importance of educational factors for gender inequality is at its maximum and the importance of family formation is still close to its minimum. The more persons enter parenthood and the more years pass after graduation from college, the more education’s relative importance will diminish (Braakmann 2013) and the more family formation’s role will increase. The mechanisms that bring about the motherhood penalty range from discrimination against mothers (Correll et al. 2007), to work organization (Goldin and Katz 2016), to gendered parenthood roles (Grunow et al. 2012), their interaction with family policy (Gangl and Ziefle 2009, 2015; Ochsenfeld 2012), and employment mismatch (Soerenson and Dahl 2016). These mechanisms deserve a prominent place both in our search for the causes of economic gender inequality and in the design of solutions to it.