In this section, we provide evidence that infertility insurance mandates cause a strategic delay of motherhood. We start in Sect. 4.1 by showing that relatively young women are less likely to have children after the mandates. Although this exercise is similar to Buckles’ (2005), we use more adequate data and a different empirical specification. The premise in Buckles (2005) is that insurance for infertility treatment allows women to postpone motherhood and invest in their careers.Footnote 19 Buckles uses the enactments of mandates to cover infertility treatment in several states of the United States during the late 1980s and 1990s as natural experiments and reports that the mandates increased the probability that relatively old women (40–49) would have young children. We refer to this non-strategic response as a short-term compositional effect or as ex-post moral hazard.Footnote 20 Additionally, Buckles (2005) shows that the mandates reduced the probability that relatively young women (22–26 and 26–30) would have young children. Showing a decrease in the probability of having children while young, however, falls short of proving delay because it fails to consider what these women do when they grow older. Indeed, these women could decide to remain childless. To demonstrate delay, one needs to combine this evidence with evidence of the timing of first-time motherhood, i.e., when (if at all) women decide to become mothers and stop delaying motherhood. This is exactly what we do in Sect. 4.2. The idea is as follows: if young women postpone childbearing because of the mandates, then we should see an increase in the average age at first birth several years after the policies are implemented. 
We compute the average age of first-time mothers at the state-year level from birth certificate data and show that mandates not only increase the average age in the short-run—consistent with a compositional effect—but, more importantly, that the first motherhood age-gap relative to the counterfactual increases with time since enactment—consistent with a behavioural effect. The counterfactual in this exercise is constructed using the synthetic control method recently developed by Abadie et al. (2010) (henceforth ADH), which exhibits several advantages over the conventional DID estimator. The ADH methodology is explained in Sect. 4.2 in detail.
The probability of having a first child by 30 and 35 years of age
In this section, we estimate the effect of time since the enactment of a mandate on the probability of having at least one biological child by the ages of 30 and 35 using the June CPS data.
Table 2 presents the estimated marginal effects from probit estimations of the number of years of mandated coverage at age 30 on the probability of having at least one biological child by that age. In all regressions, women are, by definition, older than 30. The first four columns show the marginal effects for all strongly treated states, whereas the last four columns show the marginal effects for IL-MA-RI. Each set of four columns is further split between the effects for ‘all’ women and the effects for ‘whites’ only. Finally, the table presents the results of regressions where the control group is composed of all non-treated states (columns labelled ‘control’) and of regressions where the control group is restricted to the states belonging to the synthetic control group constructed in Sect. 4.2 with the ADH methodology (columns labelled ‘synth’; see the note to Table 2 for a complete list of states in this control group).Footnote 21 Panel A shows the marginal effects when the independent variables of interest are year intervals since enactment by the age of 30, i.e., ‘1–5’ and ‘6–10 years’. Panel B shows the marginal effects when the independent variable of interest is instead expressed as a quadratic polynomial of years since enactment by the age of 30. In a given state and year, the number of years of mandated coverage by age 30 varies across women according to their age. For example, a woman from Maryland who turned 30 before 1985 had zero years of mandated coverage by 30; a woman from Maryland who turned 30 in 1990 had 5 years of mandated coverage by age 30 but would have had 10 years had she turned 30 in 1995. Therefore, the coefficient on the variable ‘1–5 years of mandated coverage at age 30’ is identified by relatively older women, while the coefficient on the variable ‘6–10 years of mandated coverage at age 30’ is identified by the younger cohorts.
Additionally, note that because states enacted their mandates in different years, the number of years of mandated coverage by a certain age is not collinear with age (e.g., a woman who experienced 5 years of mandated coverage by age 30 in Illinois would be 6 years younger than a woman from Maryland with the same duration of coverage by age 30). A large set of controls is included in all regressions: state fixed effects, year fixed effects, educational attainment dummy variables (viz., high school, beyond high school), working status, marital status, and age dummy variables in 5-year intervals. Standard errors are clustered at the state level.
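The construction of the exposure variable can be illustrated with a minimal Python sketch. The enactment years used (Maryland 1985, Illinois 1991) are those mentioned in the text; the function names, the state codes, and the catch-all bin for more than 10 years are hypothetical and introduced only for illustration.

```python
# Enactment years as stated in the text; other states are omitted in this sketch.
ENACTMENT_YEAR = {"MD": 1985, "IL": 1991}


def years_covered_by_30(state, birth_year, enactment=ENACTMENT_YEAR):
    """Years of mandated coverage accumulated by the year a woman turns 30."""
    if state not in enactment:
        return 0  # state without a mandate in this sketch
    year_turned_30 = birth_year + 30
    return max(0, year_turned_30 - enactment[state])


def exposure_bin(years):
    """Map years of coverage to the Panel A categories ('0' is the omitted one)."""
    if years <= 0:
        return "0"
    if years <= 5:
        return "1-5"
    if years <= 10:
        return "6-10"
    return ">10"  # hypothetical catch-all, not a category used in the paper


# A Maryland woman who turned 30 in 1990 (born 1960) has 5 years of coverage.
md_years = years_covered_by_30("MD", 1960)
# An Illinois woman needs to be born 6 years later to reach the same exposure.
il_years = years_covered_by_30("IL", 1966)
print(md_years, il_years, exposure_bin(md_years))
```

Because the two women share the same exposure but differ in birth year, exposure and age are not collinear once several enactment years are present.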
The results in Panel A of Table 2 show that having a strong or comprehensive mandate enacted by age 30 for 1–5 years is associated with a higher probability of having at least one child before age 30, and this effect is statistically significant for the strong mandates (‘whites’ sample) and for the comprehensive mandates (both for ‘all’ and for ‘whites’). As noted above, the 1–5 years effect is identified by the relatively older cohorts, who did not act strategically and who increased their fertility due to the mandates. This result, which is consistent with that of Bundorf et al. (2007), is suggestive of a moral hazard effect among relatively fertile couples.Footnote 22 However, facing a strong or comprehensive mandate by the age of 30 for longer than 6 years is associated with a lower probability of having a biological child by 30. The marginal effects for ‘6–10 years’ are identified by the relatively younger cohorts, and, hence, constitute evidence of strategic delay. The magnitudes of these marginal effects are, as expected, larger for IL-MA-RI and are generally statistically significant. Moreover, the marginal effects are not small in magnitude, as they imply a reduction of 2.9–3.3 percentage points (pp) in the probability of having a child by age 30 for all women (representing a decrease of 4.2–5 % in the average probability of the treated presented at the bottom of Table 2) and between 1.1 and 2.2 pp for white women (or a decrease of 1.7–3.4 % in the average probability of the treated presented at the bottom of Table 2).
The fit obtained with the quadratic polynomial specification (Panel B) is essentially identical to the one obtained in Panel A, as can be observed from the values of the log-likelihood. The marginal effects of the mandates are now consistently negative when evaluated at 5 and 10 years after their enactment and are always statistically significant, except for the strong-mandates ‘all’-women sample. The quadratic specification also delivers more intuitive results; for example, the effect for ‘whites’ is always larger than the effect for ‘all’. The magnitudes of the marginal effects evaluated at 10 years are also quite large, especially for the strong-mandates ‘whites’ sample as well as for the comprehensive mandates.
A general concern with policy evaluation studies based on non-experimental designs is the possibility that such studies are flawed because the adoption of policies is often endogenous. This would be the case if, for example, the enactment of infertility treatment mandates was linked to low fertility or to a systematic pattern of motherhood delay. As briefly explained in Sect. 2, the endogeneity of the infertility treatment mandates is rejected by several authors (Bitler and Schmidt 2012; Hamilton and McManus 2012; Abramowitz 2014). Nonetheless, because our approach and variables are different from theirs, we assess the extent to which endogeneity might be an issue in our own data. Similar to Bitler and Schmidt (2012), we have included leads of the mandate variables in our analysis of the probability of at least one child by age 30. In particular, we have experimented with a linear measure of ‘years to enactment of mandate’ as well as with indicator variables for ‘future mandate’ together with indicator dummies for 1–3 or 1–5 years to the enactment of the mandates (these results are not displayed for the sake of brevity). None of these variables is statistically significantly different from zero.
Finally, Table 3 reports the marginal effects on the probability of having at least one child by age 35. Most of these marginal effects are not statistically significantly different from zero, indicating that the delay observed at age 30 is no longer detectable by the age of 35.
Our estimations, although similar, offer some advantages over those of Buckles (2005). First, our dependent variable does not suffer from two important shortcomings present in her indicator ‘own children younger than six present in the household’. Her variable, constructed from the March CPS, does not distinguish between biological and non-biological children, and its universe is restricted to children who live in the household. To construct our variable, we use data from the June CPS on biological children ever born. Second, in our case, the relevant number of years since the mandates is measured at age 30, and all women in the sample are, by definition, older than 30. Buckles (2005) uses the number of years since the mandates at the time of the interview in a sample that includes women who were children (as young as 8 years old) at the time the mandates were enacted. Her linear specification forces the coefficients on the interaction terms between the years since the mandates and age to be close to zero and not significant because a 22-year-old woman who experiences a mandate for 14 years, for example, is hardly more affected than a 22-year-old who has only been under a mandate for 2 years. Unfortunately, our approach comes at a cost: the variable from the June CPS used to construct the dependent variable in the regressions of Tables 2 and 3 was not recorded beyond 1995, thereby restricting the estimation of the effects of the mandates to the medium run.Footnote 23
Using the March CPS proxy ‘age of own eldest child in the household’, which was recorded until 2008, to increase the number of available periods would be unlikely to bias the results for relatively young women (whose children have not yet left the household and whose probability of adoption is smaller), but it would likely bias the results for older women whose eldest child has already left the household. Indeed, the latter could be wrongly classified as having zero children by the ages of 30/35. Moreover, older women are also more likely to have zero years of mandated coverage by 30/35 (the omitted category in the regressions of Panel A). Consequently, using the March CPS data would imply mistakenly assigning lower fertility by age 30/35 to older women who have had less exposure to the mandates by the age of 30 or 35, causing an upward bias in the estimated effect of the number of years of coverage on the probability of having at least one child by ages 30/35. Buckles’ (2005) estimates of the probability of the ‘presence of small children in the household’ for older women (Table 4 of her paper) are likely to suffer from this bias.
It is important to recognise that the June and March CPS data share a shortcoming due to the lack of information on past states of residence. This limitation would be problematic if women with infertility issues were more likely to move to mandated states to pay lower prices for infertility treatments. Abramowitz (2014) claims that this is unlikely because interstate migration during 1981–2010 was only 3 %, and mandated states in general had lower in-migration than non-mandated states. Finally, some readers may wonder whether the welfare reform enacted in 1996 [the Personal Responsibility and Work Opportunity Reconciliation Act (PRWORA)] affects or biases our results; we explain in the “Appendix” why it does not.
Average maternal age at first birth
We begin this part of the analysis by providing some descriptive evidence that broadly characterises the patterns of fertility timing across groups of states and over time. Figure 2a–c plot the evolution of the average age of new mothers in control states versus all treated states, all strongly treated states, and IL-MA-RI (the states with ‘comprehensive coverage’), respectively. The two vertical lines in each figure indicate the years in which the first and last of the corresponding mandates were passed for all of the treated states (1977; 1991), for the strongly treated states (1985; 1991) and for IL-MA-RI (1987; 1991). Although the average age of first-time mothers was higher in treated states than in control states even before any mandate was enacted, Fig. 2b, c show that, for states with ‘strong mandates to cover’ and for IL-MA-RI, the treated-control gap became larger after the passage of the mandates. For example, in 2001, the age gap between IL-MA-RI and the control states was slightly more than 16 months, that is, nearly 4 months longer than in 1991, the year in which the latest strong mandate passed in Illinois.Footnote 24
More interestingly, from the viewpoint of this paper, the observed increase in the treated-control gap is statistically significant at standard levels of testing, which suggests that the effect of the mandates is larger in the long run than in the short run.Footnote 25 The observed treated-control gap follows an analogous pattern, and its magnitude is similar, albeit somewhat larger, when the sample is restricted to white mothers.Footnote 26 It is also worth noting that the increasing trend that we have documented is not so evident when all of the thirteen treated states are considered together (Fig. 2a). The reason lies in the much more limited scope of the ‘weak mandates to cover’ and the ‘mandates to offer’, described in Sect. 2. Interestingly, visual inspection of Fig. 2 indicates that the treated-control gap may have been increasing even before the mandates were enacted, especially for IL-MA-RI. Although this evidence is merely descriptive, it highlights the importance of selecting a control group that successfully mimics the dynamics of the treated states to estimate the true impact of the mandates.
To construct a control group that maximises the similarities between women in treated and control states, we use the synthetic control method (Abadie et al. 2010),Footnote 27 which has several advantages over the conventional DID estimator. The synthetic control approach limits the discretion of researchers in the choice of control units by offering a procedure for constructing an ‘ideal’ control group, denoted the ‘synthetic’ control group. The synthetic control group is a weighted average of the potential control units, which provides a better counterpart for the treated units than any single actual control unit or set of actual control units. The weights assigned to each control unit are chosen to minimise the differences in pre-treatment trends and other predictors between the treated unit and the synthetic control group. This estimation procedure is transparent because it reports the estimated relative contribution, which may be zero, of each control unit to the synthetic group. Although the synthetic control approach is related to the standard DID estimator, which it extends, it also has features in common with matching estimators insofar as both approaches attempt to minimise observable differences between the treatment and control units. Indeed, some of the latest developments in the literature attempt to minimise the chances of selection into treatment based on unobservables.Footnote 28 The synthetic control approach is a step in this direction because it relies on more general identifying assumptions than the standard DID model, allowing the effects of unobserved variables on the outcome to vary with time.
To apply the synthetic control group, the birth certificate data on the age of new mothers for the period 1972–2001 must be aggregated at the state and year levels.Footnote 29 This aggregation is advantageous in our case because it allows us to control for socioeconomic characteristics by merging the aggregated birth certificate data with socioeconomic variables available in the March CPS for the period 1977–2001 (also aggregated at the state and year levels).Footnote 30 Moreover, births from all strongly treated states are also aggregated as if they belonged to the same state with initial treatment in the year 1985, the year the first strong mandate was enacted. Similarly, we aggregated the data for the subset of comprehensive states (IL-MA-RI) with initial treatment in the year 1987 when the first mandate was enacted in Massachusetts.Footnote 31 The synthetic control group is constructed as the convex combination of control states that are most similar to the states with strong coverage and comprehensive coverage with respect to various socioeconomic predictors as well as lagged values of the average age of first motherhood before treatment (i.e., before 1985 and 1987, respectively). 
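The idea of a convex combination of control states can be sketched in a few lines of Python. This is only an illustration under strong simplifying assumptions: all predictors receive equal weight (ADH additionally optimise the predictor weights), predictors are assumed to be on comparable scales, and the solver is a plain Frank-Wolfe iteration chosen for brevity rather than the routine used in the paper. The function name and the toy numbers are hypothetical.

```python
def synth_weights(x_treated, x_controls, iterations=2000):
    """Convex weights (non-negative, summing to one) over control units.

    x_treated:  list of K predictor values for the treated unit.
    x_controls: list of J lists, each holding the K predictors of one control.
    Minimises the squared distance between the treated predictors and the
    weighted average of control predictors using Frank-Wolfe steps.
    """
    J, K = len(x_controls), len(x_treated)
    w = [1.0 / J] * J  # start from equal weights
    for _ in range(iterations):
        synth = [sum(w[j] * x_controls[j][k] for j in range(J)) for k in range(K)]
        resid = [synth[k] - x_treated[k] for k in range(K)]
        # Gradient of 0.5 * ||resid||^2 with respect to w[j].
        grad = [sum(x_controls[j][k] * resid[k] for k in range(K)) for j in range(J)]
        best = min(range(J), key=grad.__getitem__)
        # Move from the current synthetic unit toward the best single control.
        direction = [x_controls[best][k] - synth[k] for k in range(K)]
        denom = sum(d * d for d in direction)
        if denom == 0.0:
            break
        # Exact line search, clipped so the weights stay in the simplex.
        step = -sum(r * d for r, d in zip(resid, direction)) / denom
        step = min(1.0, max(0.0, step))
        if step == 0.0:
            break
        w = [(1.0 - step) * wj for wj in w]
        w[best] += step
    return w


# Toy example: the treated unit lies halfway between the first two controls.
weights = synth_weights([1.0, 2.0], [[0.0, 2.0], [2.0, 2.0], [1.0, 9.0]])
print([round(x, 2) for x in weights])
```

In the toy example the third control is a poor match and receives a weight close to zero, which mirrors the statement that the contribution of a control state may be zero.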
More precisely, the predictors chosen include the following: (1) variables that control for the demographic and family structure of the female population, such as the percentage of new mothers older than 35 and the percentage of married women in the state; (2) variables that control for the state’s race composition, such as the percentage of white and black females; (3) variables that control for the education level of the female population, such as the percentage of highly educated women; (4) variables related to the female labour market, such as the participation rate and employment rate, the average logarithm of the hourly wage, and the percentage of women covered by Employer-Sponsored Insurance (ESI); (5) variables that control for differences in abortion laws or attitudes, such as the abortion rate per 1000 women by state of residency; and (6) several lags of average age at first birth.Footnote 32 All of these predictors are averaged over different periods to maximise the fit of the estimation. Although the predictors are roughly the same for the four estimations (strong, strong whites only, IL-MA-RI, IL-MA-RI whites only), the composition of the synthetic control group is not exactly the same. In all four cases, however, New Jersey is the most important state in the synthetic control group, representing between 26 and 41 %, followed by Minnesota, whose contribution ranges between 11 and 17 % of the estimated synthetic control group.Footnote 33
Table 4 presents the pre-treatment (i.e., before 1985) sample averages of all predictors for the states with strong coverage (column 2), as well as for the synthetic control group (column 3), and for the full group of control states (column 4). As shown, prior to the passage of the first strong mandate to cover, new mothers in control states were already younger than in states where strong mandates to cover eventually passed. These mothers also earned lower wages on average and were less educated, more likely to be married, less likely to have an abortion, less likely to participate in the labour market, less likely to be employed and less likely to have employer-provided health insurance coverage. The predictors’ pre-treatment values for the strongly treated states resemble the pre-treatment values of the synthetic control group (column 3) much more than the pre-treatment values for the full set of control states (column 4). Hence, the synthetic control group should be a better counterfactual for the treated groups. Tables for the white sample and for IL-MA-RI are similar but are not reported here in the interest of brevity.
Our synthetic control estimate of the impact of the infertility coverage mandates on the timing of the first child is the difference between the average age of new mothers in states with strong mandates to cover (or the subset of IL-MA-RI) and the synthetic control group at a given date. Panel A of Table 5 shows the estimates for the group of states with strong mandates to cover, whereas Panel B shows the same estimates for IL-MA-RI. The second column reports the synthetic control group estimate in 2001, that is, 16 and 10 years after the first and the last strong mandates were passed, respectively. We refer to this estimate as the long-term effect of the mandates. For the group of states with strong mandates, the long-term effect amounts to 0.266 and 0.317 years, approximately 3.2 months for all women and 3.8 months for white women, respectively. For IL-MA-RI, the effects are larger despite the shorter period since the first mandate: an increase of approximately 4.1–5.4 months in the average age at first child for all and for white new mothers, respectively. The estimated long-term effects of the mandates are considerable—between 15.7 and 18.8 % of the total increase from 1985 to 2001 for the group with strong coverage and between 24.8 and 34.3 % for IL-MA-RI. The synthetic control estimates are slightly smaller than the raw DID aggregate estimate, which amounts to approximately 0.42 years (5 months). The third column of Table 5 shows the root mean squared prediction error (rmspe), which is a measure of the difference in age at first birth between the treated and the synthetic control group during the pre-treatment period. Hence, the lower the rmspe, the better is our counterfactual. The rmspe values, displayed in column 3, are all small, demonstrating the good fit of the models.
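The two quantities described in this paragraph can be written compactly. The notation below is assumed for illustration and is not taken from the paper: $\bar{A}_{1t}$ is the average age at first birth in the treated aggregate in year $t$, $\bar{A}_{jt}$ the same average in control state $j$, $w_j$ the synthetic control weights, and $T_0$ the number of pre-treatment years.

```latex
\begin{align*}
\hat{\alpha}_t &= \bar{A}_{1t} - \sum_{j=2}^{J+1} w_j\,\bar{A}_{jt},
  \qquad w_j \ge 0,\quad \sum_{j=2}^{J+1} w_j = 1,\\
\mathrm{rmspe}_{\mathrm{pre}} &= \sqrt{\frac{1}{T_0}\sum_{t=1}^{T_0} \hat{\alpha}_t^{\,2}} .
\end{align*}
```

Under this notation, the long-term effect reported in Table 5 corresponds to $\hat{\alpha}_{2001}$, and the conversion to months is a multiplication by 12: $0.266 \times 12 \approx 3.2$ months and $0.317 \times 12 \approx 3.8$ months.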
Inference in the synthetic control estimation method is often non-standard because the number of non-treated units is typically small. However, as ADH argue in both their 2010 and their 2014 papers, “by systematizing the process of estimating the counterfactual of interest, the synthetic control method enables researchers to conduct a wide array of falsification exercises” or “placebo studies” that can be used for inference. We follow this approach and apply the synthetic control method to every potential control state to create distributions of 38 placebo treatment effects and other statistics. ADH recommend using the resulting distribution of the ratio post/pre-intervention rmspe values to construct a p value for this statistic. The p value is constructed by simply calculating the proportion of the estimated placebo ratios of post/pre-intervention rmspe values that are greater than or equal to the ratio for the truly treated states. The idea is that, in the absence of a treatment effect, the ratio of the post/pre-intervention fit should be similar for treated and non-treated units. As the last column in Table 5 shows, the p values for the post/pre-intervention rmspe are all very small, implying the existence of a statistically significant treatment effect for all four samples.
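The placebo-based p value described above can be sketched as follows. The function names and the toy numbers are hypothetical; the rule itself, the share of placebo ratios greater than or equal to the treated ratio, follows the description in the text. The sketch assumes a non-zero pre-intervention rmspe.

```python
import math


def rmspe(gaps):
    """Root mean squared prediction error of a series of treated-minus-synthetic gaps."""
    return math.sqrt(sum(g * g for g in gaps) / len(gaps))


def post_pre_ratio(gaps, n_pre):
    """Ratio of post- to pre-intervention rmspe.

    gaps:  yearly gaps between a unit and its synthetic control.
    n_pre: number of leading entries that belong to the pre-treatment period.
    """
    return rmspe(gaps[n_pre:]) / rmspe(gaps[:n_pre])


def placebo_p_value(treated_ratio, placebo_ratios):
    """Proportion of placebo ratios that are at least as large as the treated ratio."""
    hits = sum(1 for r in placebo_ratios if r >= treated_ratio)
    return hits / len(placebo_ratios)


# Toy illustration with made-up gaps: four pre-treatment and three post-treatment years.
treated_gaps = [0.1, -0.1, 0.1, -0.1, 0.3, 0.4, 0.5]
ratio = post_pre_ratio(treated_gaps, n_pre=4)
# Made-up placebo ratios, standing in for the 38 control-state ratios.
print(placebo_p_value(ratio, [1.0, 2.0, 5.0, 0.5]))
```

A small p value means that few control states, when treated as if they had enacted a mandate, show a deterioration in fit as large as that of the truly treated states.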
Figure 3 shows the annual average age at first birth in strongly treated states and in IL-MA-RI compared with the synthetic control group counterpart for the sample period (1972–2001) for all women and for white women. The synthetic control group does a good job in tracking the pre-treatment evolution of new mothers’ ages in states with strong coverage and in IL-MA-RI, which indicates we have a good approximation to the counterfactual trend in maternal age at first birth that states with strong or comprehensive coverage would have experienced had the mandates not been enacted. It is worth noting the contrast with the evolution of all non-treated units used in Fig. 2b, c, which has failed to track the treated states’ pattern as closely as the synthetic control group has done. This result is not surprising, given the low rmspe values and the closeness in terms of predictor values between the states with strong coverage and their synthetic version shown in Table 4, for example.
More important than the size of the estimated long-term effect is its evolution over time, which is shown in Fig. 4. Regressions of the estimated annual effects of the 17 post-treatment periods for the strong mandate states (and the 15 post-treatment periods for IL-MA-RI) on indicators of time since the mandates (i.e., less than 5 years since the mandate, between 6 and 10 years, or more than 10 years), shown in Table 6, confirm that the impact of the mandates grew significantly over time. Figure 4 and Table 6 are crucial because they demonstrate that the long-term cumulative impact of the mandates on the timing of first births extends beyond its short-term non-strategic impact on older women with infertility problems. The increasing impact of the mandates together with the results from Sect. 4.1 constitute evidence of the delay of motherhood. We believe that the mechanism operating here is simple; suppose no supply constraints existed for infertility treatments when the mandates were enacted. If mandates had only a non-strategic effect on older women (i.e., ex-post moral hazard), the estimated effect should therefore be positive but nearly constant over time. The long-term effect may be larger than the short-term effect because, for example, women who were young when the mandates were enacted strategically delay motherhood (i.e., exert ex-ante moral hazard). An alternative explanation for the increasing effect may be that supply constraints for fertility treatments existed when the mandates were enacted but gradually disappeared due to technological improvements and/or price reductions, giving access to a larger number of users of infertility treatments. Our analysis cannot identify the exact contribution of each of these potential explanations to the increasing effect of the mandates.Footnote 34