The first part of this section comprises a short review of the estimation of the intergenerational income elasticity, with a focus on how to handle attenuation bias and life cycle bias. In the second part, I explain the concepts of relative and absolute mobility, which are used to compare the Swedish local labor markets. A brief introduction to multilevel modelling and the specific model used in this study are given in the last part of this section.
The IGE, attenuation bias, and life cycle bias
Income mobility refers broadly to the extent to which (some measure of) child income varies with (some measure of) parent income. By far the most commonly employed mobility measure in the literature is the intergenerational elasticity (IGE). This is typically the slope parameter of a regression of log lifetime income of generation t on log lifetime income of generation t − 1. The closer the IGE is to zero, the more mobile the sample under consideration is said to be. Estimates of the IGE in the literature center around 0.4, with higher estimates for the USA and usually smaller estimates for European and especially Nordic countries (see Björklund and Jäntti 1997; Solon 1992, 1999, 2004; or Mazumder 2005). Recent summaries of economic research on intergenerational mobility are provided by Björklund and Jäntti (2009) and Black and Devereux (2011). Extensions include studies of more than two generations, such as Lindahl et al. (2015). One should bear in mind that knowing the intergenerational elasticity does not tell us, for example, how many and which of the children improve or worsen their economic status compared to their parents, i.e., the actual patterns of movement in income status between generations. Mobility in this sense can be captured, for example, using transition matrices.
The IGE is typically estimated using the following benchmark equation:
$$ {y_{f}^{C}}=\alpha+\beta {y_{f}^{P}}+{\varepsilon_{f}^{C}} $$
(1)
where β is the parameter of interest, the elasticity between parent and child income; \({y_{f}^{C}}\) and \({y_{f}^{P}}\) are the logs of child and parent lifetime earnings in family f, respectively; and \({\varepsilon_{f}^{C}}\) is assumed to be an iid error term representing all other influences on child earnings that are not correlated with parental income. I will use the terms income and earnings interchangeably in this section due to the range of different income/earnings concepts used in this literature.
What complicates the estimation of the IGE is the need for lifetime income data for both generations. Approximations made for lack of sufficient data lead to at least two well-known measurement problems: attenuation bias and life cycle bias. Attenuation bias is caused by measurement error in the regressor and is seen most clearly when single-year income observations are used to estimate the IGE, as was typical in early studies such as Solon (1992).
Assuming a classical errors-in-variables model, measured income \(y_{f}\) then equals the true income \(y_{f}^{\star}\) plus an error:
$$ y_{f}=y_{f}^{\star}+\nu_{f}\;. $$
(2)
The well-known implication (Hausman 2001) is an IGE estimate that is inconsistent and biased downward. The bias can be reduced by using an average of T income observations to approximate true average lifetime income:
$$ {y_{f}^{P}}=\frac{1}{T}\underset{t=1}{\overset{T}{\sum}}\left( y_{f,t}^{P\star}+\nu_{f,t}^{P}\right). $$
(3)
Björklund and Jäntti (1997) showed that in this case the inconsistency diminishes as the number of observed years T increases (assuming the measurement errors/transitory fluctuations are not serially correlated). Mazumder (2005) used simulations to show that using a 5-year average (a typical number in the literature) to measure fathers' lifetime income still results in a downward bias of around 30%.
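A small simulation can make the attenuation argument concrete. The sketch below is illustrative only and not based on the study's data: it assumes a hypothetical true IGE of 0.3, normally distributed true parent log lifetime income, annual parent observations equal to true lifetime income plus classical, serially uncorrelated noise (a special case of Eq. 3 in which \(y_{f,t}^{P\star}\) is constant over t), and no measurement error on the child side. Under these assumptions the probability limit of the OLS slope is \(\beta\,\sigma_{\star}^{2}/(\sigma_{\star}^{2}+\sigma_{\nu}^{2}/T)\), so the attenuation shrinks as T grows. All parameter values are hypothetical.

```python
import numpy as np

# Hypothetical parameters, chosen only for illustration.
rng = np.random.default_rng(0)
n_families, T_max, beta_true = 200_000, 17, 0.3
sigma_star, sigma_nu = 0.5, 0.6  # SD of true parent log income and of the annual noise

y_parent_star = rng.normal(0.0, sigma_star, n_families)
y_child = beta_true * y_parent_star + rng.normal(0.0, 0.5, n_families)

# T_max noisy annual observations per parent: true value plus iid noise (Eq. 2).
annual = y_parent_star[:, None] + rng.normal(0.0, sigma_nu, (n_families, T_max))


def ols_slope(x, y):
    """Slope of an OLS regression of y on x with an intercept."""
    x_c = x - x.mean()
    return float(x_c @ (y - y.mean()) / (x_c @ x_c))


def attenuation_factor(t):
    """plim(beta_hat) / beta when the regressor is a t-year average (no serial correlation)."""
    return sigma_star**2 / (sigma_star**2 + sigma_nu**2 / t)


for t in (1, 5, T_max):
    estimate = ols_slope(annual[:, :t].mean(axis=1), y_child)
    print(f"T={t:2d}  OLS={estimate:.3f}  theory={beta_true * attenuation_factor(t):.3f}")
```

With these hypothetical values, the single-year estimate is less than half the true elasticity, while the 17-year average recovers most of it.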
I address attenuation bias by averaging over a large number of annual income observations: T is 17 for most parents in the sample (see Section 3.1 for more details). Importantly, income is observed for all individuals over the same age span, in the middle of their working lives.
Life cycle bias arises when single-year income observations of the child systematically deviate from the average of annual lifetime income (left-hand-side measurement error). One can think of this as a coefficient on the true income \(y_{f}^{\star}\) in Eq. 2 that varies with the age at which income is observed. In this case, the inconsistency of the OLS coefficient varies as a function of the age at which annual income is measured.
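The life cycle argument can be illustrated with a stylized version of this idea in which annual child log income at age t equals \(\lambda_{t}\) times lifetime log income plus transitory noise; this formulation and all numbers below are assumptions for illustration. When parent income is measured without error, the OLS slope then converges to \(\lambda_{t}\beta\), so the estimate depends on the age at which child income is observed and is unbiased only where \(\lambda_{t}\approx 1\). The age profile of \(\lambda_{t}\) used here is purely hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n, beta_true = 200_000, 0.3

# Parent lifetime log income measured without error; child lifetime log income (hypothetical).
y_parent = rng.normal(0.0, 0.5, n)
y_child_star = beta_true * y_parent + rng.normal(0.0, 0.5, n)

# Hypothetical lambda_t by child age: annual income understates lifetime differences
# early in the career and overstates them later on.
lambda_by_age = {25: 0.6, 32: 1.0, 45: 1.3}


def ols_slope(x, y):
    """Slope of an OLS regression of y on x with an intercept."""
    x_c = x - x.mean()
    return float(x_c @ (y - y.mean()) / (x_c @ x_c))


estimates = {}
for age, lam in lambda_by_age.items():
    # Annual child log income at this age: lambda_t * lifetime income + transitory noise.
    y_child_age = lam * y_child_star + rng.normal(0.0, 0.3, n)
    estimates[age] = ols_slope(y_parent, y_child_age)
    print(f"age {age}: OLS={estimates[age]:.3f}  lambda_t*beta={lam * beta_true:.3f}")
```

In this stylized setup, measuring child income at ages where \(\lambda_{t}\) is close to one removes the bias, which is the logic behind choosing ages at which annual earnings approximate the lifetime average.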
Since fewer years of income data are available for the child generation, I handle life cycle bias by averaging over three income years in the early thirties. During these years, Swedish men have been shown to earn approximately as much as their yearly average over a whole lifetime (Bhuller et al. 2011; Nybom and Stuhler 2016b). However, there are no similar studies focusing on women. In general, women have been excluded from most studies on intergenerational mobility. One potential reason could be their lower labor market participation and more frequent work absences related to childbearing.
It does not seem too far-fetched to interpret childbearing in terms of life cycle bias: the income trajectories of women over the life cycle differ systematically depending on whether they have children (the so-called “family gap”; see, for example, Waldfogel 1998 or Budig and England 2001). In particular, motherhood, as well as its timing, has been shown to affect wages, both directly and indirectly through motherhood-related choices such as lower labor market participation and working to a larger extent in the public sector (Simonsen and Skipper 2006; Miller 2011).
However, these aspects pose problems for the approximation of lifetime income similar to those caused, for example, by heterogeneity in schooling decisions. Nybom and Stuhler (2016b) have shown that the shape of earnings over the life cycle for men (and thus the relationship between average lifetime income and annual incomes) varies systematically with education levels and other background variables. Thus, life cycle bias is presumably a problem for both genders, and there is no strong reason to exclude daughters in particular. In addition, including daughters makes the results of this study more comparable to Chetty et al. (2014a), who also studied all children, sons and daughters, as one group.
There are two additional problems associated with measuring the IGE. Chetty et al. (2014a) showed for US data that the relationship between the log incomes of children and their parents is not well represented by a linear regression model. This point has also been raised by Couch and Lillard (2004) and Bratsberg et al. (2007). One suggested remedy is to use income ranks instead of log incomes. A second problem is zero-income observations, which have to be dropped or transformed for an analysis in log incomes. Dropping individuals with zero income will overstate mobility if children with zero incomes are over-represented in low-income families. Recoding all zeros, on the other hand, leads to highly variable results depending on the replacement values chosen. A detailed analysis of this issue for my data can be found in Appendix A: Ranks versus logged incomes. Income ranks are found to be the preferred choice and are thus used exclusively in the regional analysis.
The relationship between income ranks
Instead of log incomes, income ranks can be used to measure intergenerational income mobility. Importantly, observations with zero income do not need any special treatment here (Dahl and DeLeire 2008). Nybom and Stuhler (2016a) show that income ranks for Swedish men are significantly more stable over the life cycle than log incomes, especially when measured above the age of 30. I rank children based on their approximated average lifetime incomes relative to other children in the same birth cohort. Parents are ranked similarly, by income relative to other parents in the same birth cohort. The ordered income levels are transformed into percentile ranks, i.e., normalized fractional ranks.Footnote 2 The following equation is then estimated by OLS:
$$ {R_{f}^{c}}=\alpha+\beta\,{R_{f}^{p}}+{\varepsilon_{f}^{c}} $$
(4)
where \({R_{f}^{c}}\) and \({R_{f}^{p}}\) are the ranks of the child and the parents in family f, respectively. The coefficient β (the rank-rank slope) is (approximately) equal to the correlation coefficient between the ranks since, by construction, the ranks in both generations are approximately uniformly distributed and therefore have approximately equal variances. Both the IGE and the rank-rank slope measure the persistence of income between the parent and child generations. The measures differ conceptually when income inequality is larger in the child generation than in the parent generation: with growing inequality, moving one rank down corresponds to a larger income loss in absolute terms, since the income distance between ranks increases.
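The ranking step and Eq. 4 can be sketched as follows. This is a minimal illustration on simulated data, not the study's code: it assumes percentile ranks defined as 100 times the (average) rank divided by the cohort size, with tied incomes such as zeros sharing their average rank; the normalization actually used in the study (see Footnote 2) may differ. Birth years and income parameters are hypothetical. Because both rank variables are close to uniform, the OLS rank-rank slope should be close to the rank correlation.

```python
import numpy as np


def percentile_ranks(x):
    """Percentile ranks in (0, 100]; tied values (e.g. zero incomes) share their average rank."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x))
    ranks[order] = np.arange(1, len(x) + 1)
    _, inverse = np.unique(x, return_inverse=True)
    inverse = inverse.ravel()
    mean_rank = np.bincount(inverse, weights=ranks) / np.bincount(inverse)
    return 100.0 * mean_rank[inverse] / len(x)


def ranks_within_cohort(income, cohort):
    """Rank each individual only against others born in the same year."""
    out = np.empty(len(income))
    for c in np.unique(cohort):
        members = cohort == c
        out[members] = percentile_ranks(income[members])
    return out


# Hypothetical families: correlated log-normal incomes, about 5% of children with zero income.
rng = np.random.default_rng(2)
n = 50_000
child_cohort = rng.integers(1968, 1973, n)
parent_cohort = rng.integers(1935, 1945, n)
log_p = rng.normal(10.0, 0.6, n)
log_c = 10.0 + 0.3 * (log_p - 10.0) + rng.normal(0.0, 0.7, n)
income_p = np.exp(log_p)
income_c = np.where(rng.random(n) < 0.05, 0.0, np.exp(log_c))  # zeros are kept, not dropped

rank_p = ranks_within_cohort(income_p, parent_cohort)
rank_c = ranks_within_cohort(income_c, child_cohort)

slope, intercept = np.polyfit(rank_p, rank_c, 1)  # OLS fit of Eq. 4
corr = np.corrcoef(rank_p, rank_c)[0, 1]
print(f"rank-rank slope={slope:.3f}  rank correlation={corr:.3f}")
```

The zero-income children enter the regression without any recoding: they simply share a low rank within their cohort.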
When estimating rank-rank relationships at the regional level below, I follow Chetty et al. (2014a) and keep the national ranks assigned to each individual. If we were to use regional ranks instead, i.e., order individuals within each region, the results would be hard to interpret: what does it mean that sons from low-income families in Stockholm reach on average the 38th percentile rank (within Stockholm), while sons from low-income families in Gothenburg reach on average the 35th percentile rank (within Gothenburg)? Is the income level at the 38th percentile within Stockholm higher or lower than that at the 35th percentile within Gothenburg? Using national ranks creates a common scale that makes regional comparisons meaningful.Footnote 3
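The following sketch illustrates the point with two hypothetical regions, where region B's log income distribution is region A's shifted up by log(1.2), i.e., incomes are 20% higher at every quantile. Ranking within each region erases this level difference entirely, while ranking within the pooled national distribution preserves it. The data are simulated, and the ranking helper ignores ties, which is harmless for continuous incomes.

```python
import numpy as np


def percentile_ranks(x):
    """Percentile ranks in (0, 100] for continuous data (ties are not handled)."""
    x = np.asarray(x, dtype=float)
    ranks = np.empty(len(x))
    ranks[np.argsort(x)] = np.arange(1, len(x) + 1)
    return 100.0 * ranks / len(x)


rng = np.random.default_rng(5)
n_per_region = 20_000
# Hypothetical regions: B's log incomes follow A's distribution shifted up by log(1.2).
log_income_a = rng.normal(10.0, 0.6, n_per_region)
log_income_b = rng.normal(10.0 + np.log(1.2), 0.6, n_per_region)

# National ranks: rank everyone in the pooled distribution, then split by region.
national = percentile_ranks(np.concatenate([log_income_a, log_income_b]))
national_a, national_b = national[:n_per_region], national[n_per_region:]

# Regional ranks: rank within each region separately.
regional_a = percentile_ranks(log_income_a)
regional_b = percentile_ranks(log_income_b)

print(f"mean national rank: A={national_a.mean():.1f}, B={national_b.mean():.1f}")
print(f"mean regional rank: A={regional_a.mean():.1f}, B={regional_b.mean():.1f}")
```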
I analyze two mobility measures at the regional level: relative and absolute mobility. Relative mobility is computed according to the following equation:
$$ \bar{R}_{100,r}^{c}-\bar{R}_{0,r}^{c}=100\times\beta_{r} $$
(5)
where \(\bar{R}_{p,r}^{c}\) is the average child rank at parent percentile p in region r and \(\beta_{r}\) is the rank-rank slope parameter for region r. Relative mobility is thus simply a rescaled slope: \(\beta_{r}\) is the average number of ranks a child rises in the income distribution for each one-rank increase in parent income rank, and \(100\times\beta_{r}\) is the corresponding rise over the whole parent distribution. Since all income ranks lie between 0 and 100, the scaled rank-rank slope can also be viewed as a measure of outcome inequality between children from the top and the bottom of the parent distribution in a region. As seen from the left-hand side of Eq. 5, relative mobility equals the difference in expected child rank between children from the families with the highest and the lowest parent income, respectively. A larger value of this measure in a region therefore implies a larger spread in expected child outcomes across parent incomes, i.e., lower relative mobility.
A relative mobility of 43 in region A, for example, means that the expected adult long-run income ranks of children from the richest and the poorest families in that region differ by 43 ranks. In terms of the slope, we can also say that, compared to a region B where relative mobility is 38, the association between child and parent income is stronger in region A. It is important to keep in mind that both the IGE and relative mobility are relative measures and therefore do not reveal whether higher relative mobility, i.e., a lower rank-rank slope, is driven by better outcomes for some poorer families or solely by worse outcomes for richer families. Therefore, a measure of absolute mobility is necessary to obtain a more comprehensive picture of income mobility.
Absolute mobility is defined as the mean adult rank of children whose parents are located at a given percentile p of the parent distribution. It is a prediction based on both the intercept and the slope estimates for each region. I compare the regions in terms of absolute mobility at percentile 25 in order to learn about the prospects of children from low-income families, as well as to facilitate comparisons with the US study. Outcomes at other percentiles can easily be constructed from the relative and absolute mobility results in Table 7. Absolute mobility at p = 25 is calculated according to the following formula:
$$\begin{array}{@{}rcl@{}} \bar{R}_{25,r}^{c} =\alpha_{r}+\beta_{r}\times25\:. \end{array} $$
(6)
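Given a region's intercept \(\alpha_{r}\) and slope \(\beta_{r}\), both mobility measures are simple functions of the fitted line. The helpers below are a hypothetical sketch (function names and example values are illustrative, not estimates from this study). The last helper shows how the mean child rank at any other parent percentile p can be recovered from the two reported measures alone, using \(\bar{R}_{p,r}^{c}=\bar{R}_{25,r}^{c}+(p-25)\times\text{relative mobility}/100\).

```python
def relative_mobility(beta_r):
    """Eq. 5: expected child-rank gap between parents at percentiles 100 and 0."""
    return 100.0 * beta_r


def absolute_mobility(alpha_r, beta_r, p=25.0):
    """Eq. 6, generalized to any parent percentile p: expected child rank alpha_r + beta_r * p."""
    return alpha_r + beta_r * p


def mean_child_rank_from_measures(p, abs_mob_25, rel_mob):
    """Expected child rank at parent percentile p, using only the two reported measures."""
    return abs_mob_25 + (p - 25.0) * rel_mob / 100.0


# Hypothetical region: intercept 30, rank-rank slope 0.25.
alpha_r, beta_r = 30.0, 0.25
rel = relative_mobility(beta_r)              # 25.0 ranks
abs25 = absolute_mobility(alpha_r, beta_r)   # 36.25
print(rel, abs25, mean_child_rank_from_measures(75.0, abs25, rel))
```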
The left panel in Fig. 1 illustrates relative and absolute mobility. The former is given by the difference in mean child rank (Y-axis) between children of parents with the highest and the lowest income rank (X-axis) or, equivalently, by the rank-rank slope multiplied by 100. The latter is measured by the mean child rank given parents at the 25th percentile. The right panel shows three example regions for clarification. Regions 1 and 3 share the same level of relative mobility, i.e., the outcome inequality measured in ranks for children in those regions is the same. However, mobility differs in absolute terms: for every parent percentile, the mean child rank is higher in region 3. Regions 1 and 2 have the same level of absolute mobility at parent percentile 25. However, relative mobility is lower in region 2, as can be seen from the steeper rank-rank slope, which indicates a larger spread in the mean ranks that children from different parent backgrounds obtain in this region. Children with parents at the top of the income distribution reach considerably higher outcomes in region 2 than in region 1. Note that a steeper rank-rank slope means a larger wedge between children from top- and bottom-ranked parents and thus a lower level of relative mobility.
It is important to be aware of which aspects the mobility measures above can and cannot capture. The IGE, the slope coefficient of a regression of log incomes, reflects both the correlation between log incomes and the spread of the child and parent income distributions, since it is equal to
$$ \beta=\frac{Cov\left( {y_{f}^{C}},{y_{f}^{P}}\right)}{Var\left( {y_{f}^{P}}\right)}=\frac{Cov\left( {y_{f}^{C}}, {y_{f}^{P}}\right)}{\sigma_{P}\sigma_{C}}\frac{\sigma_{C}}{\sigma_{P}}=corr\left( {y_{f}^{C}},{y_{f}^{P}}\right) \frac{\sigma_{C}}{\sigma_{P}}, $$
(7)
where \(\sigma_{C}\) (\(\sigma_{P}\)) is the standard deviation of the child (parent) log income distribution. The rank-rank slope, on the other hand, is (approximately) equal to the correlation coefficient between the income ranks since, after income levels are transformed into percentile ranks, incomes in both generations are approximately uniformly distributed between 0 and 100 and the ratio of standard deviations cancels out.
If income inequality had grown more from one generation to the next, everything else equal (i.e., an increase in \(\sigma_{C}\) only), the IGE would be larger, while the rank-rank slope would not change. A change in the mean of the income distribution (a shift of the complete distribution to the left or right), however, shows up in neither the IGE nor the rank-rank slope, since covariances, standard deviations, and ranks are not affected by such a shift, ceteris paribus.
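The identity in Eq. 7 and the two thought experiments above can be checked numerically. The sketch below uses simulated, hypothetical log incomes: it verifies that the OLS slope equals the correlation times \(\sigma_{C}/\sigma_{P}\), then widens the child distribution by a factor of 1.5 around its mean (an increase in \(\sigma_{C}\) only), and finally shifts all child log incomes by a constant. Both changes are increasing transformations of child income and leave the ranks unchanged, so only the IGE responds, and only to the change in spread.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
y_p = rng.normal(10.0, 0.5, n)                              # parent log income (hypothetical)
y_c = 10.0 + 0.3 * (y_p - 10.0) + rng.normal(0.0, 0.5, n)   # child log income (hypothetical)


def ige(y_child, y_parent):
    """OLS slope of child log income on parent log income (Eq. 1)."""
    return np.polyfit(y_parent, y_child, 1)[0]


def ranks(x):
    """Percentile ranks in (0, 100] for continuous data."""
    r = np.empty(len(x))
    r[np.argsort(x)] = np.arange(1, len(x) + 1)
    return 100.0 * r / len(x)


def rank_slope(y_child, y_parent):
    """OLS slope of child rank on parent rank (Eq. 4)."""
    return np.polyfit(ranks(y_parent), ranks(y_child), 1)[0]


# Eq. 7: IGE = corr * sigma_C / sigma_P.
decomposed = np.corrcoef(y_c, y_p)[0, 1] * y_c.std() / y_p.std()

y_c_wider = y_c.mean() + 1.5 * (y_c - y_c.mean())  # more inequality among children only
y_c_shifted = y_c + 0.2                             # level shift of the whole distribution

for label, yc in [("baseline", y_c), ("wider", y_c_wider), ("shifted", y_c_shifted)]:
    print(f"{label:8s}  IGE={ige(yc, y_p):.3f}  rank-rank slope={rank_slope(yc, y_p):.3f}")
```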
Regional estimation
The estimation of rank-rank slopes and intercepts by region can be implemented in a variety of ways. The simplest is to estimate R separate equations as in Eq. 4 for regions r = 1, ..., R by OLS, resulting in R different slopes and intercepts (as done in Chetty et al. 2014a). Let us call this the no-pooling case. Ignoring the regional information completely and estimating the equation for the whole sample as one group gives one slope and one intercept estimate, i.e., the overall national estimates. We can call this the complete-pooling case for further reference below.
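The no-pooling and complete-pooling estimators can be sketched with simulated data for a handful of hypothetical regions of very different size (the largest 400 times the smallest, echoing the size disparity described below). All parameter values are assumptions. The standard error of each separate regional slope grows roughly with \(1/\sqrt{n_{r}}\), which is what makes the no-pooling estimates for small regions so noisy.

```python
import numpy as np


def ols_with_se(x, y):
    """OLS intercept, slope and conventional standard error of the slope."""
    slope, intercept = np.polyfit(x, y, 1)
    resid = y - (intercept + slope * x)
    s2 = resid @ resid / (len(x) - 2)
    se = np.sqrt(s2 / np.sum((x - x.mean()) ** 2))
    return intercept, slope, se


rng = np.random.default_rng(4)
sizes = np.array([4000, 2000, 500, 20, 10])            # hypothetical region sizes
true_slope = np.array([0.22, 0.25, 0.24, 0.26, 0.23])  # hypothetical regional slopes
region = np.repeat(np.arange(len(sizes)), sizes)
rank_p = rng.uniform(0.0, 100.0, region.size)
rank_c = 30.0 + true_slope[region] * rank_p + rng.normal(0.0, 27.0, region.size)

# No pooling: a separate regression (Eq. 4) for every region.
no_pooling = {r: ols_with_se(rank_p[region == r], rank_c[region == r]) for r in range(len(sizes))}
# Complete pooling: one national regression that ignores the regions.
complete_pooling = ols_with_se(rank_p, rank_c)

for r, (a, b, se) in no_pooling.items():
    print(f"region {r} (n={int(sizes[r]):4d}): slope={b:6.3f} (se {se:.3f})")
print(f"national: slope={complete_pooling[1]:.3f} (se {complete_pooling[2]:.3f})")
```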
A third and potentially better alternative is not only to recognize the grouped nature of the problem at hand (individuals are sorted into different regions) but also to model this structure explicitly, taking into account both the within- and the between-region variances, using a multilevel (or hierarchical) model. Multilevel models are widely used in political science (modelling, for instance, election turnout or state-level public opinion; see, for example, Lax and Phillips 2009, Galbraith and Hale 2008, Shor et al. 2007, or Steenbergen and Jones 2002 for an overview) and in the context of education (students are grouped into classrooms and classrooms into schools and school districts; see, for example, Koth et al. 2008). The terminology and notation below follow Gelman and Hill (2006).
The multilevel model is characterized by a level-1 equation for the smallest units (Eq. 8), in this case modelling the relationship between child income rank and parent income rank for family f in region r, and a set of level-2 equations for the larger units, here the regions. The level-2 equations (Eqs. 9 and 10) explicitly model the intercepts and slope coefficients across regions:
$$\begin{array}{@{}rcl@{}} {R_{f}^{c}} & =&\alpha_{r}+\beta_{r}{R_{f}^{p}}+{\varepsilon_{f}^{c}} \end{array} $$
(8)
$$\begin{array}{@{}rcl@{}} \alpha_{r} & =&\gamma^{\alpha}+\eta_{r}^{\alpha} \end{array} $$
(9)
$$\begin{array}{@{}rcl@{}} \beta_{r} & =&\gamma^{\beta}+\eta_{r}^{\beta} \end{array} $$
(10)
where \({\varepsilon_{f}^{c}}\), \(\eta_{r}^{\alpha}\), and \(\eta_{r}^{\beta}\) are random errors centered around zero with variances \({\sigma_{R}^{2}}\), \(\sigma_{\alpha}^{2}\), and \(\sigma_{\beta}^{2}\), respectively. Another common and equivalent way to write this model is
$$\begin{array}{@{}rcl@{}} {R_{f}^{c}} & \sim & N\left( \alpha_{r}+\beta_{r}{R_{f}^{p}}\:,\:{\sigma_{R}^{2}}\right),\text{ for }f=1,...,F \end{array} $$
(11)
$$\begin{array}{@{}rcl@{}} \left( \begin{array}{c} \alpha_{r}\\ \beta_{r} \end{array}\right) & \sim & N\left( \left( \begin{array}{c} \gamma^{\alpha}\\ \gamma^{\beta} \end{array}\right),\left( \begin{array}{cc} \sigma_{\alpha}^{2} & \rho\sigma_{\alpha}\sigma_{\beta}\\ \rho\sigma_{\alpha}\sigma_{\beta} & \sigma_{\beta}^{2} \end{array}\right)\right),\text{ for }r=1,...,R \end{array} $$
(12)
which emphasizes the fact that the coefficients \(\alpha_{r}\) and \(\beta_{r}\) are given a probability distribution with means and variances estimated from the data. Substituting Eqs. 9 and 10 into Eq. 8, the model can be re-expressed as a mixed model
$$ {R_{f}^{c}}=\gamma^{\alpha}+\eta_{r}^{\alpha}+\gamma^{\beta}{R_{f}^{p}}+\eta_{r}^{\beta}{R_{f}^{p}}+{\varepsilon_{f}^{c}} $$
(13)
where in multilevel terminology, the γ’s are “fixed effects” (= averages across all regions) and the η’s are “random effects” (= draws from the estimated distributions).Footnote 4
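To make the distributional assumptions in Eqs. 11 and 12 concrete, the sketch below draws regional intercepts and slopes from the bivariate normal distribution in Eq. 12 and then generates child ranks from Eq. 11. It is a data-generating sketch under hypothetical parameter values, not an estimation routine; note that the normal level-1 distribution does not restrict simulated child ranks to the interval [0, 100].

```python
import numpy as np


def simulate_multilevel(n_per_region, gamma_a, gamma_b, sigma_a, sigma_b, rho, sigma_R, rng):
    """Draw (alpha_r, beta_r) from Eq. 12, then child ranks from Eq. 11."""
    cov = np.array([[sigma_a**2, rho * sigma_a * sigma_b],
                    [rho * sigma_a * sigma_b, sigma_b**2]])
    coefs = rng.multivariate_normal([gamma_a, gamma_b], cov, size=len(n_per_region))
    region = np.repeat(np.arange(len(n_per_region)), n_per_region)
    rank_p = rng.uniform(0.0, 100.0, region.size)      # parent ranks, roughly uniform
    mean_c = coefs[region, 0] + coefs[region, 1] * rank_p
    rank_c = rng.normal(mean_c, sigma_R)                # Eq. 11
    return region, rank_p, rank_c, coefs


# Hypothetical parameter values, for illustration only.
rng = np.random.default_rng(6)
sizes = rng.integers(10, 2000, size=300)
region, rank_p, rank_c, coefs = simulate_multilevel(
    sizes, gamma_a=30.0, gamma_b=0.25, sigma_a=3.0, sigma_b=0.03, rho=-0.5, sigma_R=27.0, rng=rng)
print(coefs.mean(axis=0), np.corrcoef(coefs.T)[0, 1])
```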
The multilevel model appears similar to the random or fixed effects models often used in economics, but there are some important differences. We could, for instance, estimate a fixed effects model by simply adding 2 × (R − 1) regional terms to Eq. 4: R − 1 regional dummies for the intercepts and R − 1 interactions of these dummies with the parent rank for the slopes. This approach would essentially control for all between-region differences rather than model them. In a multilevel model, the between-region variance is explicitly estimated from the data and used to predict the regional effects. Also, if there are only few observations in some regions, the estimates using regional dummies will be inefficient. The multilevel model, on the other hand, makes use of all observations when estimating the variance components and therefore leads to more precise estimates for regions with few observations. Importantly, it is thus not necessary to have observations over the whole parent percentile distribution in each of the regions in order to efficiently estimate the model parameters.
Note also that ordinary least squares corresponds to two limiting special cases of the multilevel model: the variance of the regionally varying parameters is zero in the complete-pooling case (national OLS) and infinite in the no-pooling case (distinct OLS regressions by region). With multilevel data, however, we can explicitly estimate this variance and do not need to assume it to be either zero or infinite.
Again, in the no-pooling case, the \(\alpha_{r}\)’s and \(\beta_{r}\)’s in Eq. 8 are the OLS estimates from separate regressions, varying completely freely from each other. In the complete-pooling case, the \(\alpha_{r}\)’s and \(\beta_{r}\)’s are constrained to one common α and β. In the multilevel model, where Eqs. 8–10 are fitted simultaneously by maximum likelihood, the \(\alpha_{r}\)’s and \(\beta_{r}\)’s are given a “soft constraint”: they are assigned the probability distribution in Eq. 12, with means and standard deviations estimated from the data, which pulls the coefficient estimates partially towards their mean.
The amount of pooling depends on the number of observations in each group as well as on the between-region variance of the parameters. In fact, in the simplest case of a varying-intercept model without the parent-rank predictor, the estimate of a regional intercept can be expressed as a weighted average of the mean across all regions, \(\gamma^{\alpha}\) (complete pooling), and the average of the \({R_{f}^{c}}\)’s within the region, \(\bar{R}_{r}^{c}\) (no pooling):
$$\begin{array}{@{}rcl@{}} \hat{\alpha_{r}}^{multilevel} & = & \omega_{r}\hat{\alpha}^{complete-pooling}+\left( 1-\omega_{r}\right)\hat{\alpha_{r}}^{no-pooling}. \end{array} $$
(14)
$$\begin{array}{@{}rcl@{}} \hat{\alpha_{r}}^{multilevel} & = & \omega_{r}\gamma^{\alpha}+\left( 1-\omega_{r}\right)\bar{R}_{r}^{c} \end{array} $$
(15)
where the pooling factor \(\omega_{r}\) is calculated according to
$$ \omega_{r}=1-\frac{\sigma_{\alpha}^{2}}{\sigma_{\alpha}^{2}+\frac{{\sigma_{R}^{2}}}{n_{r}}}. $$
(16)
Thus, the intercept in a region with few observations is deemed less reliable and is pulled towards the average value of all regions. The estimates for a region with many observations, on the other hand, will be close to those from a separate OLS regression.
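Eqs. 15 and 16 translate directly into code. The sketch below assumes the simple varying-intercept case, in which the no-pooling intercept is the regional mean child rank, and treats the variance components \(\sigma_{\alpha}^{2}\) and \(\sigma_{R}^{2}\) as known; in practice they are estimated by maximum likelihood. The variance values and regional means in the example are hypothetical.

```python
import numpy as np


def pooling_factor(n_r, sigma_alpha2, sigma_R2):
    """Eq. 16: weight on the all-region mean gamma^alpha (close to 1 for small regions)."""
    n_r = np.asarray(n_r, dtype=float)
    return 1.0 - sigma_alpha2 / (sigma_alpha2 + sigma_R2 / n_r)


def partially_pooled_intercepts(region_means, n_r, gamma_alpha, sigma_alpha2, sigma_R2):
    """Eq. 15: weighted average of the all-region mean and each region's own mean."""
    w = pooling_factor(n_r, sigma_alpha2, sigma_R2)
    return w * gamma_alpha + (1.0 - w) * np.asarray(region_means, dtype=float)


# Hypothetical example: a small and a large region with the same variance components.
n_r = np.array([10, 4000])
omega = pooling_factor(n_r, sigma_alpha2=9.0, sigma_R2=729.0)
pooled = partially_pooled_intercepts([50.0, 40.0], n_r, gamma_alpha=45.0,
                                     sigma_alpha2=9.0, sigma_R2=729.0)
print(omega.round(3), pooled.round(2))
```

Setting `sigma_alpha2` to zero gives a pooling factor of one (complete pooling), while letting it grow without bound drives the factor towards zero (no pooling), which are the two OLS limits discussed above.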
This is the main argument for using multilevel modelling in this particular study: there are many regions in Sweden with relatively few observations. The largest regions have more than 400 times as many observations as the smallest regions. A separate regression for those small regions leads to extreme mobility estimates with large standard errors. In other words, we would not trust those estimates (even though they might seem appealing, since we could report some exceptionally low and high levels of intergenerational mobility). Another useful aspect of multilevel models is that it is possible to include region indicators along with regional-level predictors, which would be perfectly collinear in OLS.
In a second model, I add five regional types (as described in Section 3.2 below) as a regional-level predictor, in the form of dummies, to Eqs. 9 and 10:
$$\begin{array}{@{}rcl@{}} \alpha_{r} & = & \gamma_{1}^{\alpha}+\sum\limits_{i=2}^{6}\gamma_{i}^{\alpha}T_{i}+\eta_{r}^{\alpha} \end{array} $$
(17)
$$\begin{array}{@{}rcl@{}} \beta_{r} & = & \gamma_{1}^{\beta}+\sum\limits_{i=2}^{6}\gamma_{i}^{\beta}T_{i}+\eta_{r}^{\beta}. \end{array} $$
(18)
This gives the following mixed model:
$$ {R_{f}^{c}}=\gamma_{1}^{\alpha}+\eta_{r}^{\alpha}+\sum\limits_{i=2}^{6}\gamma_{i}^{\alpha}T_{i}+\gamma_{1}^{\beta} {R_{f}^{p}}+\sum\limits_{i=2}^{6}\gamma_{i}^{\beta}T_{i}\,{R_{f}^{p}}+\eta_{r}^{\beta}{R_{f}^{p}}+{\varepsilon_{f}^{c}} $$
(19)
which allows the type of region during childhood to affect both the regional intercepts and the regional slopes via the coefficients \(\gamma_{i}^{\alpha}\) and \(\gamma_{i}^{\beta}\), \(i=2,\ldots,6\).
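The fixed part of Eq. 19 amounts to a design matrix with an intercept, the regional-type dummies, the parent rank, and the type-by-parent-rank interactions. The helper below is a hypothetical sketch that follows the indexing of Eqs. 17 and 18 (types coded 1 to 6, with type 1 as the omitted reference category); the random effects \(\eta_{r}^{\alpha}\) and \(\eta_{r}^{\beta}\) are not part of this matrix. The noise-free check at the end only confirms that the columns are arranged so that the gammas are identified.

```python
import numpy as np


def fixed_effects_design(rank_p, region_type, n_types=6):
    """Columns: 1, T_2..T_n, R^p, T_2*R^p..T_n*R^p (type 1 is the reference category)."""
    rank_p = np.asarray(rank_p, dtype=float)
    region_type = np.asarray(region_type)
    dummies = np.column_stack([(region_type == i).astype(float) for i in range(2, n_types + 1)])
    return np.column_stack([np.ones_like(rank_p), dummies, rank_p, dummies * rank_p[:, None]])


# Hypothetical check: with noise-free outcomes, least squares recovers the gammas.
rng = np.random.default_rng(7)
rank_p = rng.uniform(0.0, 100.0, 600)
region_type = rng.integers(1, 7, 600)
X = fixed_effects_design(rank_p, region_type)
# Order: gamma_1^alpha, gamma_2..6^alpha, gamma_1^beta, gamma_2..6^beta (hypothetical values).
gammas = np.array([30.0, 1.0, -2.0, 0.5, 3.0, -1.0, 0.25, 0.01, -0.02, 0.0, 0.03, -0.01])
recovered = np.linalg.lstsq(X, X @ gammas, rcond=None)[0]
print(X.shape, np.allclose(recovered, gammas))
```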
The model is built stepwise, starting with a random intercept per region and then adding random slopes and predictors. After each step, a likelihood ratio test is used to assess whether the model fits the data better than classical regression (first model) or better than the model from the previous step.
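The likelihood ratio statistic is twice the difference in maximized log-likelihoods, compared with a chi-squared distribution whose degrees of freedom equal the number of added parameters. The sketch below uses only the Python standard library, with closed-form chi-squared tail probabilities for integer degrees of freedom. The parameter counts in the comments are inferred from Eqs. 12, 17 and 18 rather than stated in the text. When the restriction sets a variance to zero (a value on the boundary of the parameter space), the naive chi-squared p-value is known to be conservative.

```python
import math


def chi2_sf(x, df):
    """P(X > x) for X ~ chi-squared(df), integer df >= 1, via closed-form series."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        return math.exp(-h) * sum(h**k / math.factorial(k) for k in range(df // 2))
    tail = math.erfc(math.sqrt(h))
    tail += math.exp(-h) * sum(h ** (k - 0.5) / math.gamma(k + 0.5) for k in range(1, (df + 1) // 2))
    return tail


def likelihood_ratio_test(loglik_restricted, loglik_full, df):
    """Return the LR statistic 2*(llf_full - llf_restricted) and its chi-squared p-value."""
    stat = 2.0 * (loglik_full - loglik_restricted)
    return stat, chi2_sf(stat, df)


# Added parameters per step (inferred from Eqs. 12, 17 and 18):
#   random intercept vs. classical regression: 1 (sigma_alpha^2)
#   adding a random slope: 2 (sigma_beta^2 and rho)
#   adding the regional-type dummies to intercepts and slopes: 10
```

For example, `likelihood_ratio_test(llf_random_intercept, llf_random_slope, df=2)` would test the random-slope step, where the two hypothetical arguments are the maximized log-likelihoods of the two fitted models.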
Maximum likelihood estimation is used to fit the model. The “fixed effects” (regional average) parameters of intercept and slope given by the gammas in Eq. 12 are analogous to standard regression coefficients and are directly estimated. The regional effects given by \(\eta _{r}^{\alpha }\) and \(\eta _{r}^{\beta }\) are not directly estimated but summarized in terms of their estimated variances and covariances. The best linear unbiased predictors (BLUPs) of the regional effects and their standard errors are computed based upon those estimated variance components as well as the “fixed effects” estimates.Footnote 5