1 Introduction

Whether and to what extent changes in fertility over time are mostly associated with period or cohort effects has been a central point of discussion in the demographic literature for decades (e.g., Bhrolchain 1992; Ryder 1986). For instance, in their classic 1982 review, Hobcraft, Menken and Preston stated that “In one way or another, demography has concerned itself with the measurement of age, period, and cohort effects for well over a century” (Hobcraft et al. 1982). However, there is still no commonly agreed method to separate period and cohort effects in the age-specific fertility data that constitute the typical point of departure for macro-level fertility analysis.

The baby boom in the United States, a unique phenomenon for its magnitude and significance, has prompted several discussions on whether its main drivers were at the period or at the cohort level (e.g., Butz and Ward 1979a; Pullum 1980; Healy 2018). In this paper, we present a novel Age-Period-Cohort (APC) analysis of this key fertility phenomenon, using U.S. age-specific fertility rates (ASFRs) between 1933 and 2015. APC analysis aims at breaking down a phenomenon of interest into constituent effects associated with the three typical time scales of demography: age, period (or calendar year), and cohort. As is well known, the principal challenge of APC models consists in simultaneously identifying the distinct effect of each of the three time scales. This identification carries relevant implications for the interpretation of past trends, as well as for forecasting the future (Myrskyla et al. 2013; Ryder 1990; Smith 2020).

In what follows, we analyze U.S. fertility data using what we see as a realistic approach, which does not rely on specific, and untestable, identifying assumptions on the separation between age, period, and cohort effects (Kuang et al. 2008; Nielsen and Nielsen 2014). By relying on second differences, this approach is also related to the earlier study of Pullum (1980) on U.S. fertility.

The main contributions of our study are twofold. First, we contribute to the methodological literature on the analysis of age-specific demographic rates by outlining and using a novel approach, which we apply to what one can now define as a standard demographic source, the Human Fertility Database. These analyses can be extended to other countries or periods. Second, we contribute to the literature on the interpretation of the U.S. baby boom, by demonstrating that both period and cohort changes have contributed to fertility change, with period, however, being relatively more important than cohort. Our analyses also document the different time scales at which period and cohort effects have an impact on fertility. While period contributes to the explanation of long-term fertility trends through both temporal discontinuities and short-term effects (Foster 1990), and their cumulation, cohort contributes mostly through the cumulation of effects over time. Given our focus on non-linear effects, it is plausible that our approach downplays the role of cohorts in shaping fertility, which is more likely captured by linear trends (differently from age and period effects). This problem, however, is intrinsically linked to the basic APC problem of not being able to identify linear period and cohort trends separately (Smith 2020).

The remainder of this paper is structured as follows. In Sect. 2, we briefly review recent developments in APC analysis. In Sect. 3.1 we introduce and describe the data we study. In Sect. 3.2 we define the model we apply in our analyses. The results of the APC analyses are presented in Sect. 4. Section 5 concludes and discusses.

2 The challenges of age-period-cohort analysis

In this section, we take as a starting point data on age-specific fertility rates (ASFRs) for a number of consecutive years, such as those available through the Human Fertility Database for a number of countries. Almost all ideas, however, can be extended to any setting with age-specific demographic rates. In order to carry out Age-Period-Cohort (APC) analyses, we refer to the following standard model, used to separate the theoretical effects of age, period, and cohort:

$$\begin{aligned} g(E(Y_{ij}))=\mu _{ij}=\alpha _i +\beta _j+\gamma _{k} + \delta . \end{aligned}$$
(1)

In Eq. (1), we define \(i=1,\ldots ,I\) ages, \(j=1,\ldots ,J\) periods, and \(k=1,\ldots ,K\) cohorts, with \(K=I+J-1\). \(E(Y_{ij})\) is the expected value of the ASFR for the i-th age group and the j-th period, and g is a suitable link function. If we model ASFRs directly, g is the identity function. \(\alpha _i\) denotes the deviation from the overall mean \(\delta\) associated with the i-th age category, \(\beta _j\) is the deviation from the overall mean associated with the j-th time period, and \(\gamma _k\) is the deviation associated with the k-th cohort. The usual ANOVA constraint applies, so that the coefficients of each effect sum to 0. As is well known, model (1) suffers from a lack of identifiability, due to the linear dependency among the three components: when two of the three are known, the third is also determined (Cohort \(=\) Period \(-\) Age). This is a special case of collinear regressors producing a singular design matrix. Indeed, if we stack the time effects in a single vector \(\theta =(\alpha _1, \ldots ,\alpha _I, \beta _1, \ldots , \beta _J, \gamma _1,\ldots ,\gamma _K,\delta )^{'},\) then, for a suitable choice of design matrix \(\varvec{X},\) model (1) can be written as a regression model \(g(\varvec{\mu })=\varvec{X}\theta\). The design matrix \(\varvec{X}\) is rank deficient, due to the linear dependency of the entries related to cohort on the entries related to age and period. As a consequence, the equation has infinitely many solutions, so that the distinct effects of the three time dimensions cannot be uniquely estimated. Observed trends therefore cannot be uniquely assigned to age, period, and cohort.
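The rank deficiency described above can be checked numerically. The following is a minimal sketch, not taken from the original analyses: it builds a dummy-coded design matrix for model (1) on a small, hypothetical age-by-period table and computes its rank exactly. The function names `apc_design` and `matrix_rank` are illustrative assumptions, and the ANOVA constraints are deliberately left out so that the full over-parametrized vector \(\theta\) is used.

```python
from fractions import Fraction


def apc_design(n_age, n_period):
    """Dummy-coded design matrix for model (1) on an age-by-period table.

    Columns: n_age age dummies, n_period period dummies,
    n_age + n_period - 1 cohort dummies, and one intercept (delta).
    """
    n_cohort = n_age + n_period - 1
    rows = []
    for i in range(n_age):
        for j in range(n_period):
            # Cohort = Period - Age, shifted so that the index starts at 0.
            k = j - i + n_age - 1
            row = [0] * (n_age + n_period + n_cohort + 1)
            row[i] = 1                      # age effect alpha_i
            row[n_age + j] = 1              # period effect beta_j
            row[n_age + n_period + k] = 1   # cohort effect gamma_k
            row[-1] = 1                     # overall level delta
            rows.append(row)
    return rows


def matrix_rank(rows):
    """Exact rank by Gauss-Jordan elimination on rational numbers."""
    m = [[Fraction(v) for v in r] for r in rows]
    rank = 0
    for col in range(len(m[0])):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
        if rank == len(m):
            break
    return rank


# Hypothetical toy dimensions: 4 ages and 5 periods, hence 8 cohorts.
X = apc_design(n_age=4, n_period=5)
print(len(X[0]), matrix_rank(X))  # 18 columns, rank 14
```

In this toy case the matrix has 18 columns but rank 14. The four missing dimensions correspond to the three level shifts and the one linear trend that can be moved between age, period, and cohort without changing the predictor.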

Several solutions have been proposed to this identification problem; see, among others, Smith and Wakefield (2016) for a detailed review. In what follows, we briefly refer to some of these proposals.

Mason et al. (1973) introduced the Constrained Generalized Linear Model (CGLM). The idea of the CGLM is to impose at least one identifying constraint on the coefficients. In general, these are equality constraints. For example, the effects of the first two age groups, periods or cohorts are constrained to be equal on the basis of theoretical or external information (Mason et al. 1973). Clayton and Schifflers (1987a) restrict the first differences of the period effects to be equal to zero on average. Bayesian methods based on intrinsic conditional autoregressive (ICAR) priors include sum-to-zero constraints and penalties on the linear trends in age, period and cohort effects; see Berzuini and Clayton (1994), Besag et al. (1995), and Knorr-Held and Rainer (2001). An approach of this kind has essentially two problems. First, the assumptions underlying the imposed constraints require strong prior information and cannot in general be empirically validated. Second, the results of the analyses depend strongly on the imposed constraints: different choices of identifying constraints can lead to different estimates of the age, period or cohort effects (Rodgers 1982; Holford 1991). A large literature in demography, epidemiology and statistics has discussed the sensitivity of the APC model to the specification of these assumptions (see Mason and Wolfinger 2001).

A further approach starts from nonlinear parametric functions for one of the components, so as to break the linear dependency among them. The idea is to treat the time effects as continuous variables and to model their effects by means of polynomial functions. Carstensen (2007) suggested modelling the age, period and cohort effects by means of penalized splines, while again imposing a set of constraints for identifiability. Unfortunately, a solution of this kind does not completely solve the identification problem since, as emphasized by Fienberg and Mason (1979), the linear components of such models remain unspecified. As pragmatically argued by Smith (2020), following a broad and critical reflection on the wide literature on the topic, it is clear that identification in Age-Period-Cohort analysis always entails some constraints on the linear terms.

In biostatistics, demography and epidemiology, alternative approaches have been suggested. Robertson and Boyle (1986) suggest relying on individual records to construct a three-way APC table, but this approach is again based on assumptions. In particular, Clayton and Schifflers (1987, 1987a), Holford (1983) and Tarone and Chu (1992) suggest expressing the model only in terms of the functions of the time effect parameters that are estimable, namely linear combinations of curvature effects and some specific functions of the slopes of the time effects. The limits of a hierarchical-model approach to APC modelling have been documented, among others, in the simulation study of Bell and Jones (2014).

Fu (2000), Knight and Fu (2000), Fu et al. (2004) and Yang and Land (2004) followed the so-called “estimable function” approach to derive a new APC estimator called the Intrinsic Estimator (IE). The IE is based on the singular value decomposition of matrices. Yang et al. (2008) provide a comparison of the intrinsic estimator approach with conventional solutions to the identification problem. The IE is presented as providing robust estimates of the trends in the time effects and as uniquely determining the coefficient estimates. However, Luo (2013) shows, through simulation studies, that the Intrinsic Estimator is also based on constraints that are difficult to verify in practice, and on which its statistical properties rely.

For cases in which one is willing to cast hypotheses on the mechanisms by which age, period, and cohort affect a specific variable, Winship and Harding (2008) proposed a mechanism-based approach that draws on causal modelling. This approach relies, however, on having data on the variables measuring these mechanisms over the same time scale.

After discussing the data we start from, we introduce our realistic approach that only focuses on vital rates as input.

3 Data and methods

3.1 Data

We analyze the set of age-specific fertility rates available from the Human Fertility Database for the United States. The Human Fertility Database (HFD) is a joint project of the Max Planck Institute for Demographic Research (MPIDR) in Rostock, Germany and the Vienna Institute of Demography (VID) in Vienna, Austria, based at MPIDR. The HFD provides, for instance, data for U.S. ASFRs between 1933 and 2015, for ages between 15 and 44.

We separately conduct analyses of a) general ASFRs; b) first birth ASFRs (parity 1); c) second and higher parity ASFRs. These rates are visualized in Figs. 1, 2, and 3, respectively. These plots are broadly consistent with well-known studies of the baby boom and bust (Butz and Ward 1979; Bongaarts and Feeney 1998; Pullum 1980), as well as with the more recent visualization of Healy (2018). The differences between the three plots, and in particular Fig. 2 on first births, also point to the need to analyze ASFRs of parity 1 separately from other parities. This is in line with other analyses of the U.S. baby boom (Bongaarts and Feeney 1998).

Fig. 1 U.S. age-specific fertility rates (ASFRs) by age and period, all parities, 1933–2015

Fig. 2 U.S. age-specific fertility rates (ASFRs) by age and period, parity 1, 1933–2015

Fig. 3 U.S. age-specific fertility rates (ASFRs) by age and period, parities 2 and higher, 1933–2015

3.2 Methods

In our analyses, we use the APC approach proposed by Kuang et al. (2008), which has been discussed and applied to the analysis of mortality data in Nielsen and Nielsen (2014) and in Martinez Miranda et al. (2015). Kuang et al. (2008) abandon the standard APC model described in Eq. (1) and suggest overcoming the identification problem by modeling the time effects in terms of parameters that are identifiable. The model, described in detail below, is parametrized in terms of a freely varying vector given by three initial points, which are functions of the unidentified first differences, and the full set of second differences of the time effects. Indeed, Clayton and Schifflers (1987a) show that ratios of relative risks are identifiable and that, on a logarithmic scale, these correspond to second differences. Pullum (1980) provides an early study of first differences on U.S. ASFRs of all parities.

The statistical model we rely upon is defined as in Kuang et al. (2008) and Nielsen and Nielsen (2014) with respect to an age-cohort coordinate system, referred to as a general trapezoid. The general trapezoid unifies the three Lexis diagrams, which are the standard tabular formats for the storage of vital rates: tables by age and period, by age and cohort, and by period and cohort. A general trapezoid is defined as an index set \(\mathcal {I}_{ac}\) such that

$$\begin{aligned} \mathcal{{I}}_{ac} =\{(i,k): 1\le i \le I,\ 1\le k \le K,\ L+1\le j \le L+ J\}, \qquad j=i+k-1, \end{aligned}$$

with I, J and K being the numbers of age, period and cohort indexes, \(j=i+k-1\) the period index implied by age i and cohort k, and \(L+1\) the lowest period index. Let \(Y_{ik}\) be the vital rate value (in our case, the ASFR) for the i-th age group and the k-th cohort, with \((i,k)\) belonging to the trapezoid \(\mathcal{{I}}_{ac}\), and let \(\mu _{ik}=g(E(Y_{ik}))\) be the corresponding predictor, as in Eq. (1). The APC model is expressed in terms of the parameter \(\xi\) defined as follows

$$\begin{aligned} \xi= & {} (\mu _{U,U},\mu _{U+1,U}, \mu _{U,U+1}, \Delta ^2\alpha _3,\ldots ,\Delta ^2\alpha _I, \nonumber \\{} & {} \Delta ^2\beta _{L+3},\ldots , \Delta ^2\beta _{L+J},\Delta ^2\gamma _{3},\ldots , \Delta ^2\gamma _{K}) \end{aligned}$$
(2)

where \(U=\text{ integer }\{(L+3)/2\}\) identifies the middle of the first diagonal of odd length in the trapezoid, \(\Delta\) is the first difference operator, i.e., \(\Delta \alpha _i= \alpha _i-\alpha _{i-1}\), and \(\Delta ^2\) is the second difference operator, \(\Delta ^2 \alpha _i= \Delta \alpha _i- \Delta \alpha _{i-1}=\alpha _i -2\alpha _{i-1} +\alpha _{i-2}\). Note that \(\xi\) has four fewer components than \(\theta\): three initial points plus \(I-2\), \(J-2\) and \(K-2\) second differences, against \(I+J+K+1\) components in \(\theta\). Indeed, the identification problem of the conventional APC model (1) arises from its over-parametrization.

The linear predictor in (1) is then specified in terms of the parameter \(\xi\) as follows

$$\begin{aligned} \mu _{ik} = \mu _{U,U}+(i-U)(\mu _{U+1,U}-\mu _{U,U})+(k-U)(\mu _{U,U+1}-\mu _{U,U})+ A_i +B_j +C_k \end{aligned}$$
(3)

where

$$\begin{aligned} A_i= & {} I_{(i<U)}\sum _{t=i+2}^{U+1}\sum _{s=t}^{U+1}\Delta ^2 \alpha _s + I_{(i>U+1)}\sum _{t=U+2}^{i}\sum _{s=U+2}^{t}\Delta ^2 \alpha _s \\ B_j= & {} I_{(L \,{\textrm{odd}} \& j=2U-2)} \Delta ^2 \beta _{2U}+ I_{(j>2U)}\sum _{t=2U+1}^{j}\sum _{s=2U+1}^{t}\Delta ^2 \beta _{s}\\ C_k= & {} I_{(k<U)}\sum _{t=k+2}^{U+1}\sum _{s=t}^{U+1}\Delta ^2 \gamma _s + I_{(k>U+1)}\sum _{t=U+2}^{k}\sum _{s=U+2}^{t}\Delta ^2 \gamma _s \end{aligned}$$

From (3), the predictor has a single level expressed by \(\mu _{U,U}\). In the age-cohort coordinate system the period index is \(j=i+k-1\), so the cell \((U,U)\) falls in period \(2U-1\) and the level satisfies \(\mu _{U,U}=\alpha _{U} +\beta _{2U-1}+\gamma _{U}+\delta\). Hence \(\mu _{U,U}\) is identifiable, but the individual levels \(\alpha _U\), \(\beta _{2U-1}\), \(\gamma _U\) and \(\delta\) are not identifiable from the model. The model also includes two linear trends, with slopes defined as the one-step slopes in age and cohort, \((\mu _{U+1,U}-\mu _{U,U})\) and \((\mu _{U,U+1}-\mu _{U,U})\), which can also be expressed as \(\mu _{U+1,U}-\mu _{U,U}=\Delta \alpha _{U+1}+\Delta \beta _{2U}\) and \(\mu _{U,U+1}-\mu _{U,U}= \Delta \beta _{2U}+\Delta \gamma _{U+1}\). Such trends are estimable, but the individual slopes \(\Delta \alpha _{U+1}, \Delta \beta _{2U}, \Delta \gamma _{U+1}\) are not identifiable. Note that the choice of the three initial points \(\mu _{U,U},\mu _{U+1,U}, \mu _{U,U+1}\) is not unique: any three points can be chosen as long as their indexes are not linearly dependent. Here, we follow Nielsen and Nielsen (2014) and anchor the level at the middle of the first diagonal of odd length, so as to derive a model that is symmetric in age and cohort. Finally, the contribution of the time components age, period and cohort is modeled in terms of the cumulative sums of the double differences with respect to age \(A_i,\) period \(B_j\) and cohort \(C_k\), respectively, which are identifiable.

Kuang et al. (2008) show that \(\xi\) is a function of \(\theta\), so that the predictor defined in (3) is a function of the time effects. The authors also show that the predictor is invariant with respect to any translation of the time effects and to the addition of linear trends to them, which implies that \(\xi\) is identifiable. Furthermore, when the link function g is set equal to the canonical link, such results make it possible to draw on exponential family theory. The statistical model belongs to the exponential family and, since the predictor is linear in \(\xi\), which is freely varying, the model is regular with \(\xi\) as the canonical parameter.
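The invariance property can be illustrated with a few lines of code. The sketch below is only an illustration under stated assumptions: it works in age-period coordinates as in Eq. (1) rather than in the age-cohort trapezoid, the effect values are hypothetical integers chosen for exact arithmetic, and the helper names (`predictor`, `add_linear_trend`, `second_differences`) are not part of any package. It shifts a linear trend between the three sets of effects and checks that the predictor and the second differences stay the same, while the first differences change.

```python
def second_differences(values):
    """Delta^2 v_t = v_t - 2 v_{t-1} + v_{t-2}."""
    return [values[t] - 2 * values[t - 1] + values[t - 2]
            for t in range(2, len(values))]


def first_differences(values):
    """Delta v_t = v_t - v_{t-1}."""
    return [values[t] - values[t - 1] for t in range(1, len(values))]


def predictor(alpha, beta, gamma, delta):
    """mu[i][j] = alpha_i + beta_j + gamma_k + delta, with 0-based cohort
    index k = j - i + (number of ages - 1), i.e. cohort = period - age."""
    n_age = len(alpha)
    return [[alpha[i] + beta[j] + gamma[j - i + n_age - 1] + delta
             for j in range(len(beta))]
            for i in range(n_age)]


def add_linear_trend(alpha, beta, gamma, delta, c):
    """Move a linear trend of slope c between the effects.

    Age and cohort gain slope c, period loses slope c, and the level
    absorbs the remaining constant, so the predictor is unchanged.
    """
    n_age = len(alpha)
    alpha2 = [a + c * i for i, a in enumerate(alpha)]
    beta2 = [b - c * j for j, b in enumerate(beta)]
    gamma2 = [g + c * k for k, g in enumerate(gamma)]
    delta2 = delta - c * (n_age - 1)
    return alpha2, beta2, gamma2, delta2


# Hypothetical effects for 4 ages, 5 periods and 8 cohorts.
alpha = [0, 3, 5, 4]
beta = [1, 2, 2, 0, -1]
gamma = [0, 1, 3, 2, 2, 1, 0, -2]
delta = 10

alpha2, beta2, gamma2, delta2 = add_linear_trend(alpha, beta, gamma, delta, c=7)

same_predictor = predictor(alpha, beta, gamma, delta) == predictor(alpha2, beta2, gamma2, delta2)
same_curvature = second_differences(beta) == second_differences(beta2)
same_slopes = first_differences(beta) == first_differences(beta2)
print(same_predictor, same_curvature, same_slopes)  # True True False
```

Two very different sets of period slopes thus produce exactly the same fitted surface, which is why only the second differences (and the three anchoring points) can be estimated from the data.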

The predictor in (3) can be written in linear form as \(\varvec{X} \xi\), for a suitable specification of the design matrix \(\varvec{X}\), so that the model turns out to be a generalized linear model, and maximum likelihood estimates of the parameter vector \(\xi\) can be obtained with any standard statistical software. In addition, it can easily be seen that two-factor models (AP, or age-period; AC, or age-cohort; PC, or period-cohort), as well as one-factor models (A, P, or C), or others such as the drift model of Clayton and Schifflers (1987), are nested in (3). Model selection criteria such as the Akaike Information Criterion (AIC) or, following standard likelihood theory, likelihood ratio tests can be used to compare the APC model in (3) with any of these nested models.

A dedicated R package, apc, has been written to estimate models within this APC approach (Nielsen 2015). Our analyses rely on the apc package.

4 Results

4.1 Model selection

In our main analyses, we adopt the canonical identity link, so that the model is a Gaussian linear regression model with the predictor specified in (3). Our first results concern model selection, based on a deviance table that makes it possible to compare the complete APC model with the nested models that include only one or two of the three time components (age, period, and cohort). Models can be compared using likelihood ratio tests or the Akaike Information Criterion (AIC). This first analysis allows us to detect whether all components are needed for the analysis of U.S. fertility rates, i.e., whether period and cohort are both needed (in addition to age), and also to investigate their relative importance by assessing the loss of fit when either period or cohort is omitted.

Table 1 displays the deviance table for U.S. all-parity ASFRs. According to the likelihood ratio tests, the complete model with all three components needs to be selected. The APC model is also the preferred one on the basis of a comparison of the AIC values, since it attains the lowest value (\(-\)12867.39). Consequently, the first main result is that both period and cohort components, in addition to age, are needed to model long-term trends in U.S. ASFRs. A second reading of this table aims at assessing which of the two components, period or cohort, is worth keeping if one opts for a two-factor model. Here we compare the AP (age-period) and the AC (age-cohort) models; the AP model has the lower AIC (\(-\)12133.33 vs. \(-\)11177.77).
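The AIC comparison for all parities can be retraced with a short calculation. The sketch below uses only the three AIC values quoted in the text for Table 1, reading the AC value as \(-\)11177.77; the helper names `aic_differences` and `rank_models` are illustrative and not part of the apc package.

```python
# AIC values quoted in the text for all-parity ASFRs (Table 1).
reported_aic = {"APC": -12867.39, "AP": -12133.33, "AC": -11177.77}


def aic_differences(aic_by_model):
    """Difference between each model's AIC and the lowest AIC."""
    best = min(aic_by_model.values())
    return {name: round(value - best, 2) for name, value in aic_by_model.items()}


def rank_models(aic_by_model):
    """Model names ordered from lowest (preferred) to highest AIC."""
    return sorted(aic_by_model, key=aic_by_model.get)


print(rank_models(reported_aic))      # ['APC', 'AP', 'AC']
print(aic_differences(reported_aic))  # AP is about 734 above APC, AC about 1690
```

Dropping cohort (AP) costs roughly 734 AIC points relative to the full model, whereas dropping period (AC) costs roughly 1690, which is the sense in which period is the more important of the two components here.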

Table 2 displays the same deviance table on first birth ASFRs. Results are similar to what is obtained for all-parity ASFRs, with the preferred model including all three components (APC, AIC=\(-\)16255.01), and the second-best model including age and period (AIC=\(-\)15632.43). Similar conclusions can be reached when the deviance table of parities 2 and over is considered (Table 3).

Therefore, according to this reading, while all three components matter, period is relatively more relevant than cohort in the analysis of U.S. fertility data on top of linear trends. This holds both when all parities are considered and when only the first parity is considered. This general result is consistent with the early analysis on all parities by Pullum (1980), who concluded that “the cohort identification was less important than the period identification” (p. 241). Yet, given the focus on second differences, our analyses do not rule out the idea that linear trends in fertility are mostly attributable to cohorts rather than to periods, as biological and social effects rule out a linear age effect, and theoretical interpretations of period effects emphasize non-linearities and discontinuities (Smith 2020).

Table 1 Deviance table for models on all parities
Table 2 Deviance table for models on parity 1
Table 3 Deviance table for models on all parities but parity 1

We can assess the goodness of fit of the selected APC model using the graphical visualizations provided by the R package, including plots of the residuals, of the fitted values and of the linear predictor. In particular, we focus here on a “Probability Transform Plot”, displayed in Fig. 4 for all parities. For each age and period, the plot returns the probability of observing the actual age-specific rate under the fitted model. In this way, the plot reveals where there are extreme values given the fit, and whether they form a specific pattern. As we can see, the great majority of points are in black, the color corresponding to observed responses falling in the central 80% of the fitted distribution. Green points and red triangles flagging responses falling in the 1% tail of the fitted distribution are rare, and they mainly concern lower ages, where data are sparse. More specifically, there is a cluster of flagged points associated with ages between 20 and 25 and periods between 1955 and 1962, signaling the specific nature of this phase of the baby boom concerning lower ages at childbearing. The Probability Transform Plots for the two other cases, parity 1 and parity 2+, are reported in Appendix A, and the same conclusions can be drawn.
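The logic behind such a plot can be sketched as follows. This is a simplified illustration and not the apc package's implementation: it assumes a Gaussian fitted distribution with a common standard deviation `sigma`, and it assumes that the 10%, 5% and 1% labels refer to one-sided tail probabilities, so that the central 80% corresponds to probability transforms between 0.10 and 0.90. The function names are hypothetical.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def classify_response(observed, fitted, sigma):
    """Probability transform of one response and the band it falls in.

    Assumption: bands are defined by the one-sided tail probability
    min(u, 1 - u), where u is the fitted CDF evaluated at the response.
    """
    u = normal_cdf((observed - fitted) / sigma)
    tail = min(u, 1.0 - u)
    if tail < 0.01:
        return "1% tail"
    if tail < 0.05:
        return "5% tail"
    if tail < 0.10:
        return "10% tail"
    return "central 80%"


# Hypothetical standardized examples (fitted value 0, sigma 1).
for z in (0.0, 1.5, -2.0, 3.0):
    print(z, classify_response(z, fitted=0.0, sigma=1.0))
```

Under these assumptions, a response 1.5 standard deviations above its fitted value lands in the 10% band, one 2 standard deviations below in the 5% band, and one 3 standard deviations away in the 1% band.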

Fig. 4 Probability Transform Plots, Gaussian model, all parities: central 80% in black, tails (10%, 5% and 1%) in green, blue and red

We acknowledge that modelling the age-specific rates with an identity link might not be the only choice, especially from a forecasting perspective. As an alternative, we estimated a Poisson regression model on the observed birth counts. In this case too, based on the deviance tables for all considered cases, the APC model turned out to be the best one. Figure 5 displays the probability transform plot for the Poisson regression model, again for the case of all parities. As we can see, almost all points are red, so that we can conclude that the Gaussian model performs better on the basis of this diagnostic. The same conclusions can be drawn for parity 1 and parity 2+, as we can see from the plots reported in Appendix A.

Fig. 5 Probability Transform Plots, Poisson model, all parities: central 80% in black, tails (10%, 5% and 1%) in green, blue and red

We completed the evaluation of the goodness of fit of the Gaussian regression model with the analysis of studentized residuals and leverage measures. Such measures cannot be directly obtained through the standard R package apc. However, it is possible to extract the design matrix and use it to fit a standard glm model. In all cases, more than 95% of studentized residuals fall in the interval \([-2,2]\) and more than 95% of leverage measures are below the corresponding cut-off, as we can see from the tables provided in Appendix B.
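For a Gaussian model with identity link, both diagnostics follow directly from the design matrix. The sketch below is a generic illustration on hypothetical toy data, not the procedure used for the reported tables: it obtains leverages from an orthonormal basis of the column space (so it also tolerates redundant columns), computes internally studentized residuals, and uses \(2p/n\) as a leverage cut-off, which is a common rule of thumb assumed here because the text does not state the cut-off that was used.

```python
import math


def orthonormal_basis(columns, tol=1e-10):
    """Modified Gram-Schmidt; linearly dependent columns are dropped."""
    basis = []
    for col in columns:
        v = list(col)
        for q in basis:
            dot = sum(a * b for a, b in zip(v, q))
            v = [a - dot * b for a, b in zip(v, q)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > tol:
            basis.append([a / norm for a in v])
    return basis


def leverage_and_studentized(X, y):
    """Leverages h_ii and internally studentized residuals of a
    least-squares fit of y on the columns of X."""
    basis = orthonormal_basis(list(zip(*X)))
    n, p = len(y), len(basis)
    fitted = [0.0] * n
    for q in basis:
        coef = sum(a * b for a, b in zip(q, y))
        fitted = [f + coef * a for f, a in zip(fitted, q)]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    leverage = [sum(q[i] ** 2 for q in basis) for i in range(n)]
    s2 = sum(e * e for e in resid) / (n - p)  # residual variance estimate
    student = [e / math.sqrt(s2 * (1.0 - h)) for e, h in zip(resid, leverage)]
    return leverage, student, p


# Hypothetical toy data: intercept plus one regressor, one unusual response.
X = [[1.0, float(x)] for x in range(8)]
y = [0.1, 0.9, 2.2, 2.8, 4.1, 5.0, 9.0, 7.1]

leverage, student, p = leverage_and_studentized(X, y)
cutoff = 2.0 * p / len(y)  # assumed rule-of-thumb cut-off
large_residuals = [i for i, r in enumerate(student) if abs(r) > 2.0]
high_leverage = [i for i, h in enumerate(leverage) if h > cutoff]
print(large_residuals, high_leverage)
```

In practice, `X` would be the design matrix extracted from the fitted APC model and `y` the vector of observed ASFRs; the shares of residuals inside \([-2,2]\) and of leverages below the cut-off can then be tabulated.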

4.2 Analysis of the APC model

We now consider the results of APC models. We rely on the graphical representation of results suggested and discussed in Nielsen (2014). This representation makes it possible to disentangle the contribution of the time effects in terms of deviations above and below arbitrary linear trends. Results are displayed respectively in Fig. 6 for all parities, and in Fig. 7 for first births. Results on parities 2 and over are available upon request.

Each figure should be read as horizontally divided into two panels. The first panel displays the estimates of the second differences for each time component (age, period, and cohort). The second panel displays the detrended estimates of the double sums, which correspond to the terms \(A_i\), \(B_j\) and \(C_k\) in (3) for the age, period and cohort effects respectively, and are constrained to start and end at 0. This panel displays the effect of the time components in terms of deviations from the linear trend due to each of them. Note that the red and the blue dotted lines need to be read as 10% and 5% significance bands, respectively. Any value above or below the two lines should be considered as statistically significant.

Let us begin by discussing the results of the analysis for all parities, focusing on the interpretation of period and cohort. As shown in the first panel of Fig. 6, box (b), as far as period is concerned, a clear discontinuity emerges after the end of World War II. Indeed, the second differences at 1946 (involving data from 1946, 1945, and 1944) and 1948 (involving 1948, 1947, and 1946) are the only significant ones, with opposite values of 0.017 and \(-0.016\), respectively. This denotes a statistically significant period acceleration in U.S. ASFRs located around 1946, followed by a deceleration in 1948. Considering the second panel, in cumulative terms, the baby boom and the baby bust are clearly visible. Box (e) displays a significant deviation above the trend between 1940 and 1960, with a peak reached around 1960, and a significant deviation below the trend between 1960 and 2000. As far as cohort is concerned, while there are no significant discontinuities (see box (c)), the highest “deviant” fertility is for cohorts born around the 1940s, with peaks of low fertility for cohorts born around the 1920s. Figure 7 displays the analysis on first births only. Despite the qualitative differences arising from the visualization of the two panels, the APC model results are broadly in line with those on all parities.

Fig. 6 USA: Plots of the results of the APC model fit on all parities

Fig. 7 USA: Plots of the results of the APC model fit on parity 1

5 Conclusions

In this article, we analyzed age-specific fertility data from the Human Fertility Database on the United States between 1933 and 2015. We outlined and used a novel age-period-cohort (APC) approach based on the proposal introduced in Kuang et al. (2008) and Nielsen and Nielsen (2014) to analyze long-term trends in all-parity, parity 1, and parity 2+ age-specific fertility rates.

Our analyses relate to the literature on the relative role of period vs. cohort in shaping fertility choices (e.g., Bhrolchain 1992; Ryder 1986). Model selection statistics show that the preferred representations include both period and cohort components (in addition to age). However, in a period vs. cohort tournament on non-linear trends, period prevails, as age-period models represent the data better than age-cohort models. This result holds for all-parity, parity 1 and parity 2+ data, and is in line with the earlier results on the U.S. baby boom of Pullum (1980).

Fine-grained analyses of the models help explain this result, by pointing to the different time scales at which period effects and cohort effects operate. In particular, period effects can show interpretable discontinuities, as measured by second differences in estimated period effects and exemplified by what happens immediately after World War II. Cohort effects are more continuous over time.

The approach we adopted can be used to analyze data from standard demographic databases with time series of age-specific vital rates, such as the Human Fertility Database or the Human Mortality Database. Results are easily obtained using the specially-designed and freely accessible apc package (Nielsen 2015).

6 Supplementary information

Scripts and datasets for replicating all analyses in the paper with R are provided as Supplementary online information.