Appendix
Selection Effects by Sex
Figure 9 plots selection effects by sex. Differences are particularly large for educational attainment, height, ADHD and MDD. Several polygenic scores for mental illness and personality traits are more selected for (or less against) among women, including major depressive disorder (MDD), schizophrenia and neuroticism, while extraversion is more selected for among men.
Alternative Weighting Schemes
We compare results for our main weights to three alternative weighting schemes: weighting by age/qualification; geographical (weighting by Middle Super Output Area); and, for women only, weighting by age, qualification and age at first live birth (AFLB). Population data for weighting are taken from the 2011 UK Census and the 2006 General Household Survey (GHS). The Age/Qualification and Age/Qualification/AFLB weights were computed using marginal totals from a linear model, using the calibrate() function in the R “survey” package (Jones and Tertilt 2020). Geographical weighting was done with iterative post-stratification using the rake() function, on Census Middle Layer Super Output Areas, sex and presence/absence of a partner.
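The idea behind iterative post-stratification (raking) can be illustrated with a small sketch. This is a minimal Python illustration on two margins, not the implementation in the R “survey” package; the function name `rake`, the group labels and the target totals are hypothetical and chosen only for the example.

```python
def rake(weights, groups_a, groups_b, targets_a, targets_b, iterations=50):
    """Rescale weights in turn so weighted totals match each set of margins.

    groups_a / groups_b give each unit's category on two variables;
    targets_a / targets_b map each category to its population total.
    All names and numbers here are illustrative assumptions.
    """
    w = list(weights)
    for _ in range(iterations):
        for groups, targets in ((groups_a, targets_a), (groups_b, targets_b)):
            # Current weighted total in each category of this variable.
            totals = {}
            for wi, g in zip(w, groups):
                totals[g] = totals.get(g, 0.0) + wi
            # Scale every unit so its category hits the target total.
            w = [wi * targets[g] / totals[g] for wi, g in zip(w, groups)]
    return w


# Hypothetical sample of four units classified by sex and partner status.
sex = ["f", "f", "m", "m"]
partner = ["p", "n", "p", "n"]
raked = rake([1.0] * 4, sex, partner,
             {"f": 60.0, "m": 40.0}, {"p": 70.0, "n": 30.0})
```

After raking, the weighted totals by sex and by partner status both match the hypothetical population margins, even though no joint population table was supplied.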
Table 2 gives effect sizes as a proportion of the unweighted effect size, for all polygenic scores which are consistently signed and which are significantly different from zero in unweighted regressions.
Table 2 Weighted effect sizes as a proportion of unweighted effect sizes
Stabilizing and Disruptive Selection
Stabilizing selection reduces variance in the trait under selection, while disruptive selection increases variance. To check for these, we rerun Eq. (1), adding a quadratic term in \(PGS_i\). Scores for hip circumference show significant stabilizing selection (\(p < 0.05/33\), negative coefficient on quadratic term). The EA2 score for educational attainment shows significant disruptive selection (\(p < 0.05/33\), positive coefficient), which reduces the strength of selection against educational attainment at very high levels of the PGS. (The quadratic on the EA3 score has a similar coefficient but is not significant at \(p < 0.05/33\).) Figure 10 plots predicted number of children against polygenic score from these regressions.
We also checked for stabilizing selection in the parents’ generation, using weights multiplied by the inverse of number of siblings. Scores for EA2 and EA3 show significant disruptive selection (\(p < 0.05/33\), positive coefficient on quadratic). Other scores including hip circumference were not significant.
Controlling for Age
Results in Fig. 3 could be explained by age, if older respondents have lower income and are less educated, and also show more natural selection on polygenic scores. However, when we rerun the regressions, interacting the polygenic score with income category and also with a quadratic in age, the interaction with income remains significant at p < 0.05/33 for 17 out of 33 regressions. Similarly, if we interact the PGS with age of leaving full-time education and a quadratic in age, the interaction with age of leaving full-time education remains significant at p < 0.05/33 for 12 out of 33 regressions.
Number of Partners and Presence of Partner by Sex
Figure 11 splits up Fig. 4 by sex. The pattern of results is the same in both sexes: selection effects are stronger among those with more lifetime sexual partners, and among those not currently living with a partner.
Parents’ Generation
Selection Effects and Change Over Time
The UK Biobank data contains information on respondents’ number of siblings (including themselves), i.e. their parents’ number of children. Since respondents’ polygenic scores are equal in expectation to the mean scores of their parents, we can use this to look at selection effects in the parents’ generation. We estimate equation (1) using parents’ RLRS as the dependent variable (footnote 7). The parents’ generation has an additional source of ascertainment bias: sampling parents of respondents overweights parents who have many children. For instance, parents of three children will have, on average, three times more children represented in UK Biobank than parents of one child. Parents of no children will by definition not be represented. To compensate, we multiply our weights by the inverse of number of siblings.
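The effect of the inverse-sibling correction can be seen in a toy example. The sketch below is a hypothetical illustration in Python, not the authors' code: it assumes an invented population with equal numbers of one-, two- and three-child families, samples every child as a respondent, and compares the naive and reweighted mean family size.

```python
def weighted_mean(values, weights):
    """Weighted average of values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical parents' generation: 100 families each with 1, 2 and 3 children.
family_sizes = [1] * 100 + [2] * 100 + [3] * 100

# Sampling respondents: a family with n children appears n times,
# and each respondent reports n as their number of siblings (including themselves).
respondent_sibships = [n for n in family_sizes for _ in range(n)]

# Naive mean over respondents overweights large families.
naive_mean = sum(respondent_sibships) / len(respondent_sibships)

# Weighting each respondent by 1/n undoes the over-representation.
inverse_weights = [1.0 / n for n in respondent_sibships]
corrected_mean = weighted_mean(respondent_sibships, inverse_weights)

# True mean among parents with at least one child, for comparison.
true_mean = sum(family_sizes) / len(family_sizes)
```

In this toy population the naive mean over respondents exceeds the true mean family size among parents, while the reweighted mean recovers it. Childless parents remain unobserved in either case.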
Figure 12 shows regressions of parents’ RLRS on polygenic scores. For a clean comparison with the respondents’ generation, we rerun regressions on respondents’ RLRS excluding those with no children, and show results in the figure. Selection effects are highly correlated across the two generations, and most share the same sign. Absolute effect size estimates are larger for the parents’ generation. We treat this result cautiously, because effect sizes in both generations may depend on polygenic scores’ correlation with childlessness, and we cannot estimate this for the parents’ generation.
To learn more about this, we compare effect sizes excluding and including childless people in the current generation. The correlation between the two sets of effect sizes is 0.95. So, patterns across different scores are broadly similar whether the childless are counted or not. However, absolute effect sizes are smaller when the childless are excluded, for 27 out of 33 scores; the median percentage change is –41%.
The fact that childless people have such a strong effect on estimates makes it hard to compare total effect sizes across generations. In particular, since the parents’ generation has a different distribution of numbers of children, childless people may have had more or less effect in that generation. Another issue is that we are estimating parents’ polygenic scores by the scores of their children. This introduces noise into our independent variable, which might lead to errors-in-variables and bias coefficients towards zero.
As an alternative approach, we run regressions interacting polygenic scores with birth year, median split at 1950 (“early born” versus “late born”). We use both respondents’ RLRS and parents’ RLRS as a dependent variable. We use our standard weights, and further adjust for selection in the parents’ generation (see above).
Table 3 summarizes the results. We report the number of scores showing significant changes over time (i.e. a significant interaction between polygenic score and the “late born” dummy): either a significant change in sign, a significant increase in effect size, or a significant decrease in size. There is little evidence for changes in selection effects within the parents’ generation, with just one score showing a significant decrease in size. In the respondents’ generation, effect sizes were significantly larger in absolute size among the later-born for eight polygenic scores: ADHD, age at menopause, cognitive ability, Coronary Artery Disease, EA2, EA3, extraversion and Major Depressive Disorder. These changes are inconsistent with the intergenerational change, where estimated effect sizes were larger among the earlier, parents’ generation.
Overall, while there is some suggestive evidence for an increase in the strength of selection in recent history, the clearest result is that the pattern of relative effect sizes across scores is broadly consistent over time.
Table 3 Numbers of polygenic scores showing changes in selection effects between early and late born. Parental generation weights multiplied by 1/number of siblings
Area Deprivation
Figure 13 plots effects on parents’ RLRS by Townsend deprivation quintile of birth area.
For comparison, Fig. 14 plots effects on respondents’ RLRS by Townsend deprivation quintile of birth area.
Age at First Live Birth
Among the parents’ generation, we can control for age at first live birth using the subsets of respondents who reported their mother’s or father’s age, and who had no elder siblings. We run regressions on parents’ RLRS on these subsets. Figure 15 shows selection effects by terciles of age at first live birth, for mothers and fathers. As in the respondents’ generation, effect sizes are smaller, or even oppositely signed, for older parents. Importantly, this holds for both sexes.
Figure 16 shows the regressions controlling for either parent’s age at their birth. Effect sizes are very similar, whether controlling for father’s or mother’s age. As in the respondents’ generation, effect sizes are negatively correlated with the effect sizes from bivariate regressions without the control for age at birth (father’s age at birth: \(\rho\) –0.43; mother’s age at birth: \(\rho\) –0.59).
Effects of Polygenic Scores on Age at First Live Birth
Our results suggest that polygenic scores may directly correlate with age at first live birth. Figure 17 plots estimated effect sizes from bivariate regressions for respondents. Figure 18 does the same for their parents, using only eldest siblings (footnote 8). Effect sizes are reasonably large. They are also highly correlated across generations. Effect sizes of polygenic scores on father’s age at own birth, and on own age at first live birth, have a correlation of 0.99; for mother’s age and own age it is 0.99.
Mediation Analysis
We run a standard mediation analysis in the framework of Baron and Kenny (1986). For each polygenic score where the bivariate correlation with RLRS is significant at p < 0.05/33, we estimate
$$\begin{aligned} RLRS_i&= \alpha + \beta PGS_i + \gamma EA_i + X_i\mu + \varepsilon _i \end{aligned}$$
(3)
$$\begin{aligned} EA_i&= \delta + \zeta PGS_i + X_i\mu + \eta _i \end{aligned}$$
(4)
where \(RLRS_i\) is relative lifetime reproductive success, \(PGS_i\) is the polygenic score, \(EA_i\) is educational attainment (age of leaving fulltime education), and \(X_i\) is a vector of controls. The total effect of PGS on RLRS is \(\beta + \gamma \zeta\). The “indirect effect” mediated by EA is \(\gamma \zeta\). The standard error of the indirect effect can be calculated as
$$\begin{aligned} \sqrt{{\hat{\gamma }}^2 {\hat{\sigma }}_\zeta ^2 + {\hat{\zeta }}^2{\hat{\sigma }}_\gamma ^2} \end{aligned}$$
where \({\hat{\sigma }}_\zeta\) is the standard error of \({\hat{\zeta }}\), etc. We include controls for age and sex in X.
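The indirect effect and its standard error follow directly from the estimates of equations (3) and (4). The sketch below is a minimal Python illustration of those formulas; the function name and the numerical inputs are hypothetical and are not estimates from the paper.

```python
import math


def mediation_summary(beta, gamma, zeta, se_gamma, se_zeta):
    """Indirect effect gamma*zeta, its standard error, and proportion mediated.

    beta and gamma come from equation (3), zeta from equation (4);
    se_gamma and se_zeta are the standard errors of gamma and zeta.
    """
    indirect = gamma * zeta
    total = beta + indirect
    # Standard error of the indirect effect, as given in the text.
    se_indirect = math.sqrt(gamma**2 * se_zeta**2 + zeta**2 * se_gamma**2)
    proportion = indirect / total
    return indirect, se_indirect, proportion


# Hypothetical coefficient values, for illustration only.
indirect, se_indirect, proportion = mediation_summary(
    beta=-0.010, gamma=-0.05, zeta=0.3, se_gamma=0.01, se_zeta=0.02
)
```

The proportion mediated divides by the total effect, which is why its confidence interval can be unbounded when the total effect is close to zero.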
Table 4 Mediation analysis
Table 4 shows results. For 22 out of 23 scores, the indirect effect on fertility via human capital is significantly different from 0 at p = 0.05/23 and has the same sign as the total effect. We also calculate the proportion of the total effect that is mediated via the indirect effect, along with uncorrected 95% confidence intervals (100 bootstraps). Note that if the confidence interval for the total effect contains zero, the confidence interval for the proportion may be unbounded (Franz 2007).
Within-Siblings Regressions
Results in the main text support our theory that natural selection on polygenic scores is driven by their correlation with human capital. Here, we test whether polygenic scores cause fertility by running within-siblings regressions. We run a single regression on 29 polygenic scores within 17161 sibling groups (N = 31169). Thus, we control both for environmental confounds (since scores are randomly allocated within sib-groups by meiosis), and for genetic confounds captured by our polygenic scores. We remove four scores which correlate highly with other scores (educational attainment 2, hip circumference, waist circumference and waist-hip ratio). Figure 19 shows the results.
With a reduced sample size, all within-sibling effects are insignificant after Bonferroni correction. However, effect sizes are positively correlated with effect sizes from the pooled model, and about 70% smaller (regressing within-sibling on pooled effect sizes, b = 0.292). This attenuation is broadly consistent with the decrease in heritability in within-sibling GWASs on age at first birth and educational attainment (Howe et al. 2021). We see these results as providing tentative evidence that polygenic scores cause fertility, with effects being partly driven by correlations with environmental variation in human capital. We also reran within-siblings regressions adding a control for education. Most effect sizes barely change, suggesting that our measure of education does not in general mediate differences in fertility among siblings.
Effects on Inequality
Table 5 shows correlations between children’s polygenic scores and household income (UKB data field 738). Column “With selection” uses respondents’ scores, multiplying weights by number of children. Column “Without selection” uses our standard weights, i.e. it estimates the counterfactual correlation if all respondents had the same number of children.
Table 5 Correlations of polygenic scores with income group
Further Results
Selection Effects on Raw Polygenic Scores
Figure 20 compares selection effects on polygenic scores residualized for the top 100 principal components of the genetic data, to selection effects on raw, unresidualized polygenic scores. In siblings regressions, effect sizes are larger for raw scores—sometimes much larger, as in the case of height. 29 out of 33 “raw” effect sizes have a larger absolute value than the corresponding “residualized” effect size. The median proportion between raw and controlled effect sizes is 0.8. Among the children regressions, this no longer holds. Effect sizes are barely affected by controlling for principal components.
Overall, 72.73 per cent of effect sizes are consistently signed across all four regressions (on children and siblings, and with and without residualization).
To get a further insight into this we regress respondents’ and parents’ RLRS on individual principal components. Figure 21 shows the results. Labels show the top principal components. These have larger effect sizes in siblings regressions. One possibility is that the parents’ generation was less geographically mobile, and so geographic patterns of childrearing were more correlated with principal components, which partly capture the location of people’s ancestors.
Genetic Correlations with EA3
Another way to examine the “earnings” theory of natural selection is to compare selection effects of polygenic scores with their genetic correlation with educational attainment (EA3). Since EA3 strongly predicts earnings, if earnings drive differences in fertility, we would expect a correlation between the two sets of results. Figure 22 shows this is so: the correlation, after excluding EA2, is –0.82. Genetic correlations were calculated using LD score regression from GWAS summary statistics.
Model Proofs
Solution for the One-Period Model
Differentiating and setting \(\frac{dU}{dN}=0\) gives the first order condition for an optimal choice of children \(N^{*}>0\):
$$\begin{aligned} \frac{bW}{(W(1-bN^{*}))^{\sigma }}\ge a\text {, with equality if }N^{*}>0. \end{aligned}$$
Rearranging gives
$$\begin{aligned} N^{*}=\max \left\{ \frac{1}{b}\left( 1-\left( \frac{b}{a}\right) ^{1/\sigma }W^{(1-\sigma )/\sigma }\right) ,0\right\} . \end{aligned}$$
(5)
Note that when \(\sigma <1\), for high enough W, \(N^{*}=0\). Differentiating gives the effect of wages on fertility for \(N^{*}>0\). This is also the fertility-human capital relationship:
$$\begin{aligned} \frac{dN^{*}}{dh}=\frac{dN^{*}}{dW}=-\frac{1}{b}\left( \frac{b}{a}\right) ^{1/\sigma }\frac{1-\sigma }{\sigma }W^{(1-2\sigma )/\sigma }. \end{aligned}$$
(6)
This is negative if \(\sigma <1\). Also,
$$\begin{aligned} \frac{d^{2}N^{*}}{dW^{2}}=-\frac{1}{b}\left( \frac{b}{a}\right) ^{1/\sigma }\frac{1-\sigma }{\sigma }\frac{1-2\sigma }{\sigma }W^{(1-3\sigma )/\sigma } \end{aligned}$$
For \(0.5<\sigma <1\), this is positive, so the effect of wages on fertility shrinks towards zero as wages increase (and becomes 0 when \(N^{*}=0\)). Next, we consider the time cost of children b:
$$\begin{aligned} \frac{d^{2}N^{*}}{dWdb}=-\left( \frac{1}{a}\right) ^{1/\sigma }\left( \frac{1-\sigma }{\sigma }\right) ^{2}(Wb)^{(1-2\sigma )/\sigma }<0. \end{aligned}$$
Lastly we consider the effect of a. From (5), \(N^{*}\) is increasing in a. Differentiating (6) by a gives
$$\begin{aligned} \frac{d^{2}N^{*}}{dadW}=b^{1/\sigma -1}\frac{1-\sigma }{\sigma ^{2}}W^{(1-2\sigma )/\sigma }a^{-1/\sigma -1} \end{aligned}$$
which is positive for \(\sigma <1\).
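The one-period results can be checked numerically. The following Python sketch evaluates equations (5) and (6); the function names are hypothetical, and the parameter values \(a=0.4, b=0.25, \sigma =0.7\) are borrowed from the two-period example purely for illustration.

```python
def n_star(W, a, b, sigma):
    """Optimal number of children from equation (5), including the corner at 0."""
    interior = (1 - (b / a) ** (1 / sigma) * W ** ((1 - sigma) / sigma)) / b
    return max(interior, 0.0)


def dn_dw(W, a, b, sigma):
    """Effect of wages on fertility from equation (6), valid while N* > 0."""
    return (-(1 / b) * (b / a) ** (1 / sigma)
            * ((1 - sigma) / sigma) * W ** ((1 - 2 * sigma) / sigma))


# Illustrative parameters (an assumption, not a calibration).
a, b, sigma = 0.4, 0.25, 0.7

n_low_wage = n_star(1.0, a, b, sigma)    # interior solution
n_high_wage = n_star(10.0, a, b, sigma)  # corner solution N* = 0

# The first order condition should hold with equality at the interior solution.
foc_residual = b * 1.0 / (1.0 * (1 - b * n_low_wage)) ** sigma - a

# A central finite difference should agree with the analytic derivative.
step = 1e-6
numeric_slope = (n_star(1.0 + step, a, b, sigma)
                 - n_star(1.0 - step, a, b, sigma)) / (2 * step)
```

With \(\sigma <1\) the slope is negative at low wages and fertility reaches zero once wages are high enough, in line with the discussion of equation (5).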
Solution for the Two-Period Model
Period 1 and period 2 income are:
$$\begin{aligned} Y_{1}&=1-s-bN_{1} \end{aligned}$$
(7)
$$\begin{aligned} Y_{2}&=w(s,h)(1-bN_{2}) \end{aligned}$$
(8)
Write the Lagrangian of utility U (2) as
$$\begin{aligned} {\mathcal {L}}(N_{1},N_{2},s)=u(Y_{1})+u(Y_{2})+a(N_{1}+N_{2})+\lambda _{1}N_{1}+\lambda _{2}N_{2}+\lambda _{3}(\frac{1}{b}-N_{2})+\mu s \end{aligned}$$
Lemma 5 below shows that if \(\sigma >0.5\), this problem is globally concave, guaranteeing that the first order conditions identify a unique solution. We assume \(\sigma >0.5\) from here on.
Plugging (7) and (8) into the above, we can derive the Karush-Kuhn-Tucker conditions for an optimum \((N_{1}^{*},N_{2}^{*},s^{*})\) as:
$$\begin{aligned}&\frac{d{\mathcal {L}}}{dN_{1}}=-bY_{1}^{-\sigma }+a+\lambda _{1} =0\text {, with }\lambda _{1}=0\text { if }N_{1}^{*}>0; \end{aligned}$$
(9)
$$\begin{aligned}&\frac{d{\mathcal {L}}}{dN_{2}}=-bs^{*}hY_{2}^{-\sigma }+a+\lambda _{2} -\lambda _{3}=0\text {, with }\lambda _{2}=0\text { if }N_{2}^{*}>0,\lambda _{3}=0\text { if }N_{2}^{*}<\frac{1}{b}; \end{aligned}$$
(10)
$$\begin{aligned}&\frac{d{\mathcal {L}}}{ds}=-Y_{1}^{-\sigma }+h(1-bN_{2}^{*})Y_{2}^{-\sigma }+\mu =0; \end{aligned}$$
(11)
$$\begin{aligned}&N_{1}^{*},N_{2}^{*},s^{*},\lambda _{1},\lambda _{2},\lambda _{3},\mu \ge 0;N_{2}^{*}\le \frac{1}{b}. \end{aligned}$$
(12)
Note that the Inada condition (that marginal utility of income grows without bound as income approaches zero, \(\lim _{x\rightarrow 0}u'(x)=\infty\)) for period 1 rules out \(s^{*}=1\) and \(N_{1}=1/b\), so we need not impose these constraints explicitly. Also, so long as \(N_{2}^{*}<1/b\), the same condition rules out \(s^{*}=0\). We consider four cases, of which only three can occur.
Case 1: \(N_{1}^{*}>0,N_{2}^{*}>0\)
Rearranging (9), (10) and (11) gives:
$$\begin{aligned} N_{1}^{*}&=\frac{1}{b}\left( 1-s^{*}-\left( \frac{b}{a}\right) ^{1/\sigma }\right) ; \end{aligned}$$
(13)
$$\begin{aligned} N_{2}^{*}&=\frac{1}{b}\left( 1-\left( \frac{b}{a}\right) ^{1/\sigma }(s^{*}h)^{(1-\sigma )/\sigma }\right) ; \end{aligned}$$
(14)
$$\begin{aligned} s^{*}&=\frac{1-bN_{1}^{*}}{1+\left( (1-bN_{2}^{*})h\right) ^{1-1/\sigma }}. \end{aligned}$$
(15)
Plugging the expressions for \(N_{1}^{*}\) and \(N_{2}^{*}\) into \(s^{*}\) gives
$$\begin{aligned} s^{*}=\frac{s^{*}+\left( \frac{b}{a}\right) ^{1/\sigma }}{1+\left( \left( \frac{b}{a}\right) ^{1/\sigma }s^{*(1-\sigma )/\sigma }h{}^{1/\sigma }\right) ^{1-1/\sigma }} \end{aligned}$$
which simplifies to
$$\begin{aligned} s^{*}=\left( \frac{b}{a}\right) ^{1/(2\sigma -1)}h^{(1-\sigma )/(2\sigma -1)}. \end{aligned}$$
(16)
Plugging the above into (13) and (14) gives:
$$\begin{aligned} N_{1}^{*}&=\frac{1}{b}\left( 1-\left( \frac{b}{a}\right) ^{1/(2\sigma -1)}h^{(1-\sigma )/(2\sigma -1)}-\left( \frac{b}{a}\right) ^{1/\sigma }\right) ;\\ N_{2}^{*}&=\frac{1}{b}\left( 1-\left( \frac{b}{a}\right) ^{1/(2\sigma -1)}h^{(1-\sigma )/(2\sigma -1)}\right) . \end{aligned}$$
Note that \(N_{1}^{*}<N_{2}^{*}\). For these both to be positive requires low values of h if \(\sigma <1\) and high values of h if \(\sigma >1\). Also:
$$\begin{aligned} w(s^{*},h)\equiv s^{*}h=\left( \frac{b}{a}\right) ^{1/(2\sigma -1)}h^{\sigma /(2\sigma -1)}. \end{aligned}$$
Observe that \(w(s^{*},h)\) is increasing in h for \(\sigma >0.5\), and convex iff \(0.5<\sigma <1\).
While \(N_{1}^{*}\) and \(N_{2}^{*}\) are positive, they have the same derivative with respect to h:
$$\begin{aligned} \frac{dN_{t}^{*}}{dh}=-\frac{1}{b}\left( \frac{b}{a}\right) ^{1/(2\sigma -1)}\frac{1-\sigma }{2\sigma -1}h^{(1-\sigma )/(2\sigma -1)-1} \end{aligned}$$
(17)
Examining this and expression (16) gives:
Lemma 1
For \(\sigma <1\), case 1 holds for h low enough, and in case 1, \(N_{1}^{*}\) and \(N_{2}^{*}\) decrease in h, while \(s^{*}\) increases in h.
For \(\sigma >1\), case 1 holds for h high enough, and in case 1 \(N_{1}^{*}\) and \(N_{2}^{*}\) increase in h, while \(s^{*}\) decreases in h.
\(N_{t}^{*}\) is convex in h for \(\sigma >2/3\), and concave otherwise. \(s^{*}\) is convex in h if \(\sigma <2/3\), and concave otherwise.
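As a check on the case 1 algebra, the closed forms can be substituted back into the first order conditions. The Python sketch below does this for one hypothetical parameter set (\(a=0.4, b=0.25, \sigma =0.7, h=1\)), chosen so that both \(N_{1}^{*}\) and \(N_{2}^{*}\) are positive; the function name is an assumption made for the example.

```python
def case1_solution(h, a, b, sigma):
    """Closed-form s*, N1*, N2* for case 1, from equation (16) and the lines after it."""
    s = (b / a) ** (1 / (2 * sigma - 1)) * h ** ((1 - sigma) / (2 * sigma - 1))
    n1 = (1 - s - (b / a) ** (1 / sigma)) / b
    n2 = (1 - s) / b
    return s, n1, n2


# Illustrative parameters (assumed for the example).
a, b, sigma, h = 0.4, 0.25, 0.7, 1.0
s, n1, n2 = case1_solution(h, a, b, sigma)

# Incomes in each period, equations (7) and (8) with w(s, h) = s * h.
y1 = 1 - s - b * n1
y2 = s * h * (1 - b * n2)

# Residuals of conditions (9), (10) and (11) with all multipliers set to zero.
foc_n1 = -b * y1 ** (-sigma) + a
foc_n2 = -b * s * h * y2 ** (-sigma) + a
foc_s = -y1 ** (-sigma) + h * (1 - b * n2) * y2 ** (-sigma)
```

All three residuals should be zero up to floating-point error whenever the returned \(N_{1}^{*}\) is positive; if it is negative, the parameters fall outside case 1 and the formulas do not apply.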
Case 2: \(N_{1}^{*}=0,N_{2}^{*}>0\)
Substitute \(N_{1}^{*}=0\) into the first order condition for \(s^{*}\) from (11), and rearrange to give:
$$\begin{aligned} s^{*}=\frac{1}{1+\left( (1-bN_{2}^{*})h\right) ^{1-1/\sigma }}. \end{aligned}$$
Now since \(N_{2}^{*}>0\), we can rearrange (10) to give
$$\begin{aligned} N_{2}^{*}=\frac{1}{b}\left( 1-\left( \frac{b}{a}\right) ^{1/\sigma }(s^{*}h)^{(1-\sigma )/\sigma }\right) . \end{aligned}$$
(18)
Plugging this into \(s^{*}\) gives
$$\begin{aligned} s^{*}&=\frac{1}{1+\left( \frac{bh}{a}\right) ^{(\sigma -1)/\sigma ^{2}}(s^{*}){}^{-(1-\sigma )^{2}/\sigma ^{2}}} \end{aligned}$$
which can be rearranged to
$$\begin{aligned} (1-s^{*})(s^{*})^{(1-2\sigma )/\sigma ^{2}}=\left( \frac{a}{bh}\right) ^{(1-\sigma )/\sigma ^{2}}. \end{aligned}$$
(19)
Differentiate the left hand side of the above to get
$$\begin{aligned}&\frac{1-2\sigma }{\sigma ^{2}}(1-s^{*})(s^{*})^{(1-2\sigma )/\sigma ^{2}-1}-(s^{*})^{(1-2\sigma )/\sigma ^{2}}\nonumber \\&\quad = \frac{1-2\sigma }{\sigma ^{2}}(s^{*})^{(1-2\sigma )/\sigma ^{2}-1}-\frac{\sigma ^{2}+1-2\sigma }{\sigma ^{2}}(s^{*})^{(1-2\sigma )/\sigma ^{2}}\nonumber \\&\quad = \frac{1-2\sigma }{\sigma ^{2}}(s^{*})^{(1-2\sigma )/\sigma ^{2}-1}-\frac{(1-\sigma )^{2}}{\sigma ^{2}}(s^{*})^{(1-2\sigma )/\sigma ^{2}}. \end{aligned}$$
(20)
This is negative if and only if
$$\begin{aligned} s^{*}&>\frac{1-2\sigma }{(1-\sigma )^{2}} \end{aligned}$$
which is always true since \(\sigma >0.5\). Note also that since \(\sigma >0.5\), then the left hand side of (19) approaches infinity as \(s^{*}\rightarrow 0\) and approaches 0 as \(s^{*}\rightarrow 1\). Thus, (19) implicitly defines the unique solution for \(s^{*}\).
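Because the left hand side of (19) is strictly decreasing in \(s^{*}\) on (0, 1), the solution can be found by bisection. The sketch below is one possible Python implementation; the function name, tolerance and parameter values are assumptions for illustration, and the function only solves (19) without checking whether case 2 actually applies at the given h.

```python
def solve_s_case2(h, a, b, sigma, tol=1e-12):
    """Solve (1 - s) * s**k = (a / (b*h))**((1 - sigma) / sigma**2) for s in (0, 1).

    k = (1 - 2*sigma) / sigma**2 is negative for sigma > 0.5, so the left
    hand side falls from infinity to zero as s rises from 0 to 1.
    """
    k = (1 - 2 * sigma) / sigma**2
    rhs = (a / (b * h)) ** ((1 - sigma) / sigma**2)
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if (1 - mid) * mid**k > rhs:
            lo = mid  # left hand side too large, so s must be larger
        else:
            hi = mid
    return (lo + hi) / 2


# Illustrative parameters (assumed for the example).
a, b, sigma = 0.4, 0.25, 0.7
s_at_3 = solve_s_case2(3.0, a, b, sigma)
s_at_4 = solve_s_case2(4.0, a, b, sigma)
```

For \(\sigma <1\) the computed \(s^{*}\) rises with h, matching the comparative static derived from (19).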
To find how \(s^{*}\) changes with h, note that the right hand side of the above decreases in h for \(\sigma <1\), and increases in h for \(\sigma >1\). Putting these facts together: for \(\sigma <1\), when h increases the RHS of (19) decreases, hence the LHS decreases and \(s^{*}\) increases, i.e. \(s^{*}\) is increasing in h. For \(\sigma >1\), \(s^{*}\) is decreasing in h.
To find how \(N_{2}^{*}\) changes with h, we differentiate (18):
$$\begin{aligned} \frac{dN_{2}^{*}}{dh}&=-\frac{1}{b}\left( \frac{b}{a}\right) ^{1/\sigma }\frac{1-\sigma }{\sigma }(s^{*}h)^{(1-2\sigma )/\sigma }(s^{*}+h\frac{ds^{*}}{dh}) \end{aligned}$$
(21)
which is negative for \(\sigma <1\), since \(\frac{ds^{*}}{dh}>0\) in this case.
Differentiating again:
$$\begin{aligned} \frac{d^{2}N_{2}}{dh^{2}}&=-X \left[ \frac{1-2\sigma }{\sigma }(s^{*}h)^{(1-3\sigma )/\sigma }(s^{*}+h\frac{ds^{*}}{dh})^{2}\right. \\&\quad \left. +(s^{*}h)^{(1-2\sigma )/\sigma }(2\frac{ds^{*}}{dh}+h\frac{d^{2}s^{*}}{dh^{2}})\right] \\&=X(s^{*}h)^{(1-3\sigma )/\sigma }\left[ \frac{2\sigma -1}{\sigma }(s^{*}\right. \\&\left. \quad +h\frac{ds^{*}}{dh})^{2}-(s^{*}h)(2\frac{ds^{*}}{dh}+h\frac{d^{2}s^{*}}{dh^{2}})\right] \end{aligned}$$
where \(X=\frac{1}{b}\left( \frac{b}{a}\right) ^{1/\sigma }\frac{1-\sigma }{\sigma }>0\). Note that \(\frac{d^{2}N_{2}}{dh^{2}}\) is continuous in \(\sigma\) around \(\sigma =1\). Note also from (19) that for \(\sigma =1\), \(s^{*}\) becomes constant in h, so that \(\frac{ds^{*}}{dh}\) and \(\frac{d^{2}s^{*}}{dh^{2}}\) vanish. The term in square brackets then reduces to \((s^{*})^{2}>0\). Putting these facts together, for \(\sigma\) sufficiently close to 1, \(\frac{d^{2}N_{2}^{*}}{dh^{2}}>0\), i.e. \(N_{2}^{*}\) is convex in h.
This case holds for intermediate values of h. Equation (21) shows that for \(\sigma < 1\), \(N_2\) decreases in h; the requirement that \(N_2>0\) therefore puts a maximum on h. When \(\sigma >1\), \(N_2\) increases in h and this puts a minimum on h. The requirement \(N_1 = 0\) provides the other bound. Equation (9) requires \(-bY_{1}^{-\sigma }+a \le 0\) since \(\lambda _1\) must be non-negative. The LHS is increasing in \(Y_1\), and hence decreasing in s as \(Y_1 = 1-s\) since \(N_1=0\). Lastly, optimal choice of education \(s^*\) increases in h for \(\sigma <1\), and decreases for \(\sigma > 1\). Hence for \(\sigma < 1\), (9) puts a minimum on h, and for \(\sigma > 1\) it puts a maximum on h.
Summarizing:
Lemma 2
Case 2 holds for intermediate values of h. In case 2: for \(\sigma <1\), \(s^{*}\) is increasing in h and \(N_{2}^{*}\) is decreasing in h. For \(\sigma >1\), \(s^{*}\) is decreasing in h. For \(\sigma\) close enough to 1, \(N_{2}^{*}\) is convex in h.
Case 3: \(N_{1}^{*}=0,N_{2}^{*}=0\)
We can solve for \(s^{*}\) by substituting values of \(Y_{1}\) and \(Y_{2}\) into (11):
$$\begin{aligned} -(1-s^{*}){}^{-\sigma }+h(s^{*}h)^{-\sigma }=0 \end{aligned}$$
which rearranges to
$$\begin{aligned} s^{*}=\frac{1}{1+h^{(\sigma -1)/\sigma }}. \end{aligned}$$
(22)
Conditions (9) and (10) become:
$$\begin{aligned} -b(1-s^{*})^{-\sigma }+a&\le 0\\ -bs^{*}h(s^{*}h)^{-\sigma }+a&\le 0 \end{aligned}$$
equivalently
$$\begin{aligned} \frac{a}{b}&\le (1-s^{*})^{-\sigma }\\ \frac{a}{b}&\le s^{*}h(s^{*}h)^{-\sigma } \end{aligned}$$
which can both be satisfied for a/b close enough to zero. Note from (22) that as \(h\rightarrow \infty\), \(s^{*}\) increases towards 1 for \(\sigma <1\), and decreases towards 0 for \(\sigma >1\). Note also that the right hand side of the first inequality above approaches infinity as \(s^{*}\rightarrow 1\), therefore also as \(h\rightarrow \infty\) for \(\sigma <1\). Rewrite the second inequality as
$$\begin{aligned} \frac{a}{b}\le (s^{*}h)^{1-\sigma }=\left( \frac{h}{1+h^{(\sigma -1)/\sigma }}\right) ^{1-\sigma }=\left( h^{-1}+h^{-1/\sigma }\right) ^{\sigma -1} \end{aligned}$$
and note that again, as \(h\rightarrow \infty\), the RHS increases towards infinity for \(\sigma <1\), and decreases towards zero otherwise. Thus, for \(\sigma <1\), both inequalities will be satisfied for h high enough. For \(\sigma >1\), they will be satisfied for h low enough. Summarizing:
Lemma 3
For \(\sigma <1\), case 3 holds for h high enough, and in case 3, \(s^{*}\) increases in h. For \(\sigma >1\), case 3 holds for h low enough and \(s^{*}\) decreases in h.
Case 4: \(N_{1}^{*}>0,N_{2}^{*}=0\)
Rearranging the first order conditions (9) and (10) for \(N_{1}^{*}\) and \(N_{2}^{*}\) gives
$$\begin{aligned} \frac{a}{b}&=(1-s^{*}-bN_{1}^{*})^{-\sigma }\\ \frac{a}{b}&\le s^{*}hY_{2}^{-\sigma } \end{aligned}$$
hence
$$\begin{aligned} (1-s^{*}-bN_{1}^{*})^{-\sigma }&\le s^{*}hY_{2}^{-\sigma }=(s^{*}h)^{1-\sigma }\\ \Leftrightarrow (1-s^{*}-bN_{1}^{*})^{\sigma }&\ge (s^{*}h)^{\sigma -1}\\ \Leftrightarrow 1-s^{*}-bN_{1}^{*}&\ge (s^{*}h)^{1-1/\sigma } \end{aligned}$$
Now rearrange the first order condition for \(s^{*}\) from (11), noting that since \(N_{2}^{*}=0\), \(s^{*}>0\) by the Inada condition.
$$\begin{aligned} h^{1/\sigma -1}(1-s^{*}-bN_{1}^{*})&=s^{*}\\ 1-s^{*}-bN_{1}^{*}&=s^{*}h^{1-1/\sigma } \end{aligned}$$
This, combined with the previous inequality, implies
$$\begin{aligned} (s^{*}h)^{1-1/\sigma }&\le s^{*}h^{1-1/\sigma }\\ \Leftrightarrow (s^{*})^{-1/\sigma }&\le 1 \end{aligned}$$
which cannot hold since \(0<s^{*}<1\).
Comparative Statics
We can now examine how the fertility-human capital relationship
$$\begin{aligned} \frac{dN^{*}}{dh},\text { where }N^{*}\equiv N_{1}^{*}+N_{2}^{*}, \end{aligned}$$
changes with respect to other parameters. We focus on the case \(\sigma <1\), since it gives the closest match to our observations, and since it also generates “reasonable” predictions in other areas, e.g. that education levels increase with human capital. Figure 23 shows how \(N^{*}\) changes with h for \(a=0.4,b=0.25,\sigma =0.7\).
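The shape of \(N^{*}\) as a function of h can be reproduced by combining the three feasible cases. The Python sketch below is one possible way to do so, assuming \(0.5<\sigma <1\): it tries the case 1 closed form, then solves (19) by bisection for case 2, and otherwise falls back to case 3. It is not the authors' code; the function names are hypothetical and the default parameters are those quoted above.

```python
def solve_case2_s(h, a, b, sigma, tol=1e-12):
    """Solve equation (19) for s* by bisection; its LHS is decreasing in s."""
    k = (1 - 2 * sigma) / sigma**2
    rhs = (a / (b * h)) ** ((1 - sigma) / sigma**2)
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if (1 - mid) * mid**k > rhs:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def optimum(h, a=0.4, b=0.25, sigma=0.7):
    """Return (s*, N1*, N2*) for human capital h, assuming 0.5 < sigma < 1."""
    ratio = b / a
    # Case 1: children in both periods, equations (13), (14) and (16).
    s = ratio ** (1 / (2 * sigma - 1)) * h ** ((1 - sigma) / (2 * sigma - 1))
    n1 = (1 - s - ratio ** (1 / sigma)) / b
    if n1 > 0:
        return s, n1, (1 - s) / b
    # Case 2: children in period 2 only, equations (18) and (19).
    s = solve_case2_s(h, a, b, sigma)
    n2 = (1 - ratio ** (1 / sigma) * (s * h) ** ((1 - sigma) / sigma)) / b
    # Condition (9) with N1 = 0 requires a - b * Y1**(-sigma) <= 0.
    if n2 > 0 and a - b * (1 - s) ** (-sigma) <= 1e-9:
        return s, 0.0, n2
    # Case 3: no children, equation (22).
    s = 1 / (1 + h ** ((sigma - 1) / sigma))
    return s, 0.0, 0.0


def total_fertility(h, **params):
    """Total fertility N* = N1* + N2*."""
    _, n1, n2 = optimum(h, **params)
    return n1 + n2


# Evaluate N* on a grid of human capital values from 0.5 to 10.
grid = [0.5 + 0.1 * i for i in range(96)]
fertility = [total_fertility(h) for h in grid]
```

On this grid, fertility falls as h rises and reaches zero at high h, moving through cases 1, 2 and 3 in the order described in Lemma 4.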
Lemma 4
For \(\sigma < 1\) in a neighbourhood of 1, \(N^*\) is globally convex in h.
Proof
From Lemmas 1, 2 and 3, as h increases we move from \(N_{1}^{*},N_{2}^{*}>0\) to \(N_{1}^{*}=0,N_{2}^{*}>0\) to \(N_{1}^{*}=N_{2}^{*}=0\). Furthermore, for \(\sigma >2/3\), \(N_{1}^{*}\) and \(N_{2}^{*}\) are convex in h when they are both positive, and for \(\sigma\) close enough to 1, \(N_{2}^{*}\) is convex in h when \(N_{1}^{*}=0\). All that remains is to check that the derivative is increasing around the points where these 3 regions meet. That is trivially satisfied where \(N_{2}^{*}\) becomes 0, since thereafter \(\frac{dN^{*}}{dh}\) is zero. The derivative as \(N_{1}^{*}\) approaches zero is twice the expression in (17):
$$\begin{aligned} -\frac{2}{b}\left( \frac{b}{a}\right) ^{1/(2\sigma -1)}\frac{1-\sigma }{2\sigma -1}h^{(1-\sigma )/(2\sigma -1)-1} \end{aligned}$$
(23)
and the derivative to the right of this point is given by (21):
$$\begin{aligned} -\frac{1}{b}\left( \frac{b}{a}\right) ^{1/\sigma }\frac{1-\sigma }{\sigma }(s^{*}h)^{(1-2\sigma )/\sigma }(s^{*}+h\frac{ds^{*}}{dh}) \end{aligned}$$
(24)
We want to prove that the former is larger in magnitude (i.e. more negative). Dividing (23) by (24) gives
$$\begin{aligned} 2\frac{\sigma }{2\sigma -1}\left( \frac{b}{a}\right) ^{(1-\sigma )/(\sigma (2\sigma -1))}\frac{h^{(1-\sigma )^{2}/(\sigma (2\sigma -1))}}{s^{*}(s^{*}+h\frac{ds^{*}}{dh})} \end{aligned}$$
Examining (19) shows that as \(\sigma \rightarrow 1\), \(s^{*}\rightarrow 0.5\) and \(\frac{ds^{*}}{dh}\rightarrow 0\), and therefore the above approaches
$$\begin{aligned} 2\frac{1}{(0.5)^{2}}=8. \end{aligned}$$
\(\square\)
We can now gather the theoretical predictions stated in Table 1.
Prediction 1 for \(\sigma <1\), total fertility \(N^{*}\equiv N_{1}^{*}+N_{2}^{*}\) is decreasing in human capital h.
Furthermore, for \(\sigma\) close enough to 1, fertility is convex in human capital, i.e.
Prediction 2 part 1 the fertility-human capital relationship is closer to 0 at high levels of h.
For \(\sigma <1\), education levels \(s^{*}\) increase in h, and so therefore do equilibrium wages \(w(s^{*},h)\). This, plus fact 1, gives:
Prediction 2 part 2 for \(\sigma <1\) and close to 1, the fertility-human capital relationship is weaker among higher earners.
Prediction 4 for \(\sigma <1\) and close to 1, the fertility-human capital relationship is weaker at high levels of education.
Next, we compare people who start fertility early (\(N_{1}^{*}>0\)) versus those who start fertility late (\(N_{1}^{*}=0\)). Again, for \(\sigma <1\) the former group have lower h than the latter group. Thus we have:
Prediction 5 for \(\sigma <1\) and close to 1, the fertility-human capital relationship is weaker among those who start fertility late.
Lastly, we prove prediction 3. Differentiating \(dN_{t}^{*}/dh\) in (17) with respect to b, for when \(N_{1}^{*}>0\) gives:
$$\begin{aligned}&\frac{d^{2}N_{t}^{*}}{dhdb} =\frac{2\sigma -2}{2\sigma -1}b^{(3-4\sigma )/(2\sigma -1)}\\&\quad \left( \frac{1}{a}\right) ^{1/(2\sigma -1)}\frac{1-\sigma }{2\sigma -1}h^{(\sigma -1)^{2}/(\sigma (2\sigma -1))} \end{aligned}$$
which is negative for \(0.5<\sigma <1\). When \(N_{1}^{*}=0\), differentiating \(dN_{2}^{*}/dh\) in (21) gives:
$$\begin{aligned}&\frac{d^{2}N_{2}^{*}}{dhdb}=-\frac{1-\sigma }{\sigma }b^{(1-2\sigma )/\sigma }\\&\quad \left( \frac{1}{a}\right) ^{1/\sigma }\frac{1-\sigma }{\sigma }(s^{*}h)^{(1-2\sigma )/\sigma }(s^{*}+h\frac{ds^{*}}{dh}) \end{aligned}$$
which again is negative for \(\sigma <1\). Therefore:
Prediction 3 for \(\sigma <1\), the fertility-human capital relationship is more negative when the burden of childcare b is larger.
Including a Money Cost
The model can be extended by adding a money cost m per child. Utility is then
$$\begin{aligned} U = u(1 - s - bN_1 - mN_1) + u(w(s,h)(1 - bN_2) - mN_2) + a(N_1 + N_2) \end{aligned}$$
Figure 24 shows a computed example with \(a = 0.4, b = 0.175, \sigma = 0.7, m = 0.075\). Fertility first declines steeply with human capital, then rises. In addition, for parents with low AFLB (\(N_1 > 0\)), the fertility-human capital relationship is negative, while for parents with higher AFLB (\(N_1 = 0\)) it is positive.
Concavity
Lemma 5
For \(\sigma >0.5\), U in equation (2) is concave in \(N_{1},N_{2}\) and s.
Proof
We examine the Hessian matrix of utility in each period. Note that period 1 utility is constant in \(N_{2}\) and period 2 utility is constant in \(N_{1}\). For period 1 the Hessian with respect to \(N_{1}\) and s is:
$$\begin{aligned}&\left[ \begin{array}{cc} d^{2}u/dN_{1}^{2} &{} d^{2}u/dsdN_{1}\\ d^{2}u/dsdN_{1} &{} d^{2}u/ds^{2} \end{array}\right] \\&\quad =\left[ \begin{array}{cc} -\sigma b^{2} &{} -\sigma b\\ -\sigma b &{} -\sigma \end{array}\right] Y_{1}^{-\sigma -1} \end{aligned}$$
with determinant
$$\begin{aligned} (\sigma ^{2}b^{2}-\sigma ^{2}b^{2})Y_{1}^{-2\sigma -2}=0. \end{aligned}$$
Thus, first period utility is weakly concave. For period 2 with respect to \(N_{2}\) and s, the Hessian is:
$$\begin{aligned}&\left[ \begin{array}{cc} d^{2}u/dN_{2}^{2} &{} d^{2}u/dsdN_{2}\\ d^{2}u/dsdN_{2} &{} d^{2}u/ds^{2} \end{array}\right] \\&\quad =\left[ \begin{array}{cc} -\sigma (bsh)^{2}Y_{2}^{-\sigma -1} &{} -(1-\sigma )bhY_{2}^{-\sigma }\\ -(1-\sigma )bhY_{2}^{-\sigma } &{} -\sigma [h(1-bN_{2}^{*})]^{2}Y_{2}^{-\sigma -1} \end{array}\right] \end{aligned}$$
with determinant
$$\begin{aligned}&(-\sigma (bsh)^{2}Y_{2}^{-\sigma -1})(-\sigma [h(1-bN_{2}^{*})]^{2}Y_{2}^{-\sigma -1})-(-(1-\sigma )bhY_{2}^{-\sigma })^{2}\\&\quad = \sigma ^{2}(bsh)^{2}[h(1-bN_{2}^{*})]^{2}Y_{2}^{-2\sigma -2}-(1-\sigma )^{2}(bh)^{2}Y_{2}^{-2\sigma }\\&\quad = \sigma ^{2}(bh)^{2}Y_{2}^{-2\sigma }-(1-\sigma )^{2}(bh)^{2}Y_{2}^{-2\sigma }\text {, using that }Y_{2}=(sh)(1-bN)\\&\quad = (bh)^{2}Y_{2}^{-2\sigma }(\sigma ^{2}-(1-\sigma )^{2}) \end{aligned}$$
which is positive if and only if \(\sigma >0.5\). Thus, if \(\sigma >0.5\) then the Hessian is negative definite and thus utility is concave; this combined with weak concavity of period 1, and linearity of \(a(N_{1}+N_{2})\), shows that (2) is concave. \(\square\)
Effect of a
Lemma 6
For \(\sigma <1\), \(d^2N^*/dadh > 0\), i.e. the effect of a increases at higher levels of human capital.
Proof
Differentiating (17) with respect to a gives
$$\begin{aligned} \frac{d^2N^*_t}{dadh} = -\frac{1}{b}\frac{1}{1-2\sigma }a^{-1}\left( \frac{a}{b}\right) ^{1/(1-2\sigma )}\frac{1-\sigma }{2\sigma -1}h^{(1-\sigma )/(2\sigma -1)-1} \end{aligned}$$
for \(t=1,2\) when \(N^*_1,N^*_2>0\). For \(\sigma > 0.5\) this is positive.
When \(N^*_1 = 0,N^*_2>0\), differentiating (21) with respect to a gives
$$\begin{aligned} \frac{d^2N_{2}^{*}}{dadh} = \frac{1}{b}\frac{1}{\sigma }a^{-1}\left( \frac{a}{b}\right) ^{-1/\sigma }\frac{1-\sigma }{\sigma }(s^{*}h)^{(1-2\sigma )/\sigma }(s^{*}+h\frac{ds^{*}}{dh}) \end{aligned}$$
which is \(-a^{-1}/\sigma\) times (21) and hence is positive.
\(\square\)