Intergenerational mobility in the Netherlands: models, outcomes and trends

We reconstruct the genealogical tree of all individuals ever appearing in Dutch municipal records starting in 1995. Combining microdata from tax authorities with education records, we compute measures of permanent income and education. We estimate the degree of intergenerational persistence in education and income in the population and across time, showing that it is higher than previous estimates would suggest, although it appears to be decreasing. Finally, exploiting information on the education of grandparents, we estimate a model of intergenerational mobility in which endowments are transmitted through a latent factor. Estimates suggest an even higher persistence.


Introduction
The degree to which labour market outcomes are transmitted from parent to offspring has been extensively studied to investigate the role of family background in the cross-distribution of economic outcomes. Scholars have debated the theoretical mechanisms behind intergenerationally transmitted inequalities (e.g., Becker and Tomes 1976, 1979, 1986; Loury 1981) as well as between- and within-country differences in intergenerational mobility (e.g., Solon 2002; Hertz et al. 2007; Chetty et al. 2014) and their underlying drivers (e.g., Durlauf 1994; Borjas 1992; Dustmann 2004). More recently, the increased availability of administrative records has allowed researchers to identify extended family linkages, leading to a growing number of contributions exploiting data on extended families to measure different parameters of the intergenerational process (e.g., Adermon et al. 2021; Collado et al. 2019; Lindahl et al. 2015). Importantly, the use of extended family linkages allows testing for the transmission of latent advantage across generations (e.g., Nybom and Stuhler 2019). In the book The Son Also Rises: Surnames and the History of Social Mobility (Clark 2014), Clark and coauthors (Clark and Cummins 2015; Hsu 2021) argue that socioeconomic status is transmitted across generations through an underlying latent factor whose persistence is not only much higher than the usually observed parent-to-child correlation of socioeconomic status but is also constant across time and space. Relying on partial measures of socioeconomic status leads one to "systemically overestimate the underlying mobility rate" (Clark 2014, p. 110). While early research on the claim seemed to reject it (Vosters and Nybom 2017; Vosters 2018; Braun and Stuhler 2018), other studies found that, in some countries, Clark's hypotheses prove difficult to reject (Colagrossi et al. 2020; Barone and Mocetti 2021).
We exploit administrative data from the Dutch Centraal Bureau voor de Statistiek (CBS) to reconstruct the genealogical tree of all individuals who ever appeared in Dutch municipal records starting from 1995. Matching individual-level microdata from tax authorities with education records, we provide measures of permanent income and educational attainment, which we convert into years of education.
We compute intergenerational correlation coefficients (IGC) in education and income following both the patrilineal and matrilineal lineages. By (partially) addressing the attenuation bias using at least three years of positive income, we show that intergenerational persistence in income is higher than suggested by previous estimates using Dutch administrative data (Carmichael et al. 2020). Father-to-son and mother-to-daughter pairs provide IGC estimates of 0.22 and 0.20, respectively. Persistence in education is even greater, with coefficients for both patrilineal and matrilineal lineages around 0.28.
We then exploit information on the education of grandparents to estimate a model of intergenerational mobility in which endowments are transmitted through a latent factor (as in Braun and Stuhler 2018). Such a model can help distinguish between "family's surface or apparent social status and their deeper social competence" (Clark 2014, p. 108), thus potentially addressing any errors-in-variables critique.
We estimate separate latent factor parameters for male and female offspring. For this set of estimates, we use, for each individual from the offspring generation, the highest educational attainment (in terms of years of schooling) among her/his parents. Similarly, we use the highest number of years of schooling among her/his grandparents. As in previous studies (e.g., Braun and Stuhler 2018; Neidhöfer and Stockhausen 2018; Colagrossi et al. 2020), latent factor estimates are higher than what traditional parent-to-offspring models would suggest, being around 0.6 for both male and female offspring, more than two times larger than estimates of IGC in education.
Finally, to provide a picture of the temporal evolution of the intergenerational transmission process, we provide estimates by year of birth of the offspring for education (years of birth 1950-1989) and income (years of birth 1970-1989). For both men and women, IGC in education, partly thanks to the educational expansion, decreases for those born from the 1970s onwards. Income mobility instead shows a different pattern. While the mother-to-daughter correlation has been fairly stable over time, that between fathers and sons shows diverging patterns in the 1970s (where it is stable around 0.25) and in the 1980s, when it decreases rapidly to 0.20. Results from the latent factor model (for years of birth 1975-1989) are less clear-cut. Although the point estimates are not statistically different across years (also because of imprecise estimates in older birth cohorts), our analysis does not unreservedly support Clark's hypothesis of a universal law. Estimates for daughters range from 0.49 to 0.78; similarly, estimates for sons range from 0.43 to 0.68, implying a large difference in economic terms across birth cohorts. While this is in line with much of the literature on the topic (as reviewed in Solon 2018), it must also be acknowledged that Clark and co-authors have a longer time horizon in mind when arguing that intergenerational mobility is constant across time and space.
The remainder of this paper is structured as follows: Section 2 explains the data used and the selection criteria we used for each of the samples in our analysis; Section 3 discusses our empirical approach while Section 4 presents population-wide findings and trends over time in intergenerational mobility; Section 5 offers some concluding remarks.

Data
We use administrative data from the Dutch Centraal Bureau voor de Statistiek (CBS). CBS collects information on all individuals residing (or that have resided) in the Netherlands. We use 2020 municipal population registers, which contain anonymised demographic information on all persons ever appearing in the municipal population registers starting 1 January 1995. By merging this information with that on the legal parent(s) of each person appearing in the register, we recreate the family lineages for the entire Dutch population up to their earliest available ancestors.
We then match individuals through a unique anonymised identifier with their corresponding microdata on income and education. Income microdata is collected from Dutch public administrations, of which the most important data provider is the Tax and Customs Administration. This provides information on the persons belonging to the population of the Netherlands on 1 January of each year. Our analysis is based on what CBS defines as gross personal income, which includes income from employment, income from own business, income insurance benefits as well as social security benefits, except for child benefits. It instead excludes income from property, other child-related transfers and housing benefits. Microdata on education, which is provided by Dutch education authorities, contains the highest attained education level of individuals. CBS derives it by combining information on education levels from registers and the Labour Force Survey. We convert the highest educational attainment into years of education (see Table A.1 for the conversion scheme).
Finally, we create three samples based on different selection criteria. The first sample, Education, which will be used to examine intergenerational mobility in education, contains all individuals born in 1989 or before for whom we can observe the highest educational attainment as well as that of the mother or the father. It results in about 1,000,000 daughter-mother and son-father pairs.
The second sample, Income, which will be used to study intergenerational mobility in income, contains all individuals born in 1989 or before for whom (at least) three positive annual income observations are available in the period 2003-2020 in the age range 28-60. This restriction addresses two concerns. On the one hand, because offspring and parents are at different points of their life-cycle, a single-year estimate might lead to underestimating offspring permanent income and overestimating that of parents (Haider and Solon 2006; Nybom and Stuhler 2016), resulting in a downward-biased estimate of intergenerational income persistence. On the other hand, the individual income process can be represented as a mixture of a permanent component and transitory shocks (Jenkins 1987; Solon 1992; Zimmerman 1992). To capture the intergenerational transmission process of the permanent component, we compute a proxy of permanent income by averaging all available income observations. Despite the imposed restrictions, this sample contains about 1,200,000 daughter-mother and son-father pairs.
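The selection rule and averaging step described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the record layout (pairs of calendar year and gross income) and the convention that age equals calendar year minus birth year are all assumptions made for the example.

```python
from statistics import mean

MIN_YEARS = 3              # at least three positive annual observations
AGE_RANGE = (28, 60)       # age window used for the Income sample
YEAR_RANGE = (2003, 2020)  # calendar years with income microdata


def permanent_income(records, birth_year):
    """Return the permanent-income proxy, or None if the person is not eligible.

    records: iterable of (calendar_year, gross_income) pairs (hypothetical layout).
    Age is approximated as calendar_year - birth_year (an assumption).
    """
    eligible = [
        income
        for year, income in records
        if YEAR_RANGE[0] <= year <= YEAR_RANGE[1]
        and AGE_RANGE[0] <= year - birth_year <= AGE_RANGE[1]
        and income > 0
    ]
    if len(eligible) < MIN_YEARS:
        return None  # fewer than three usable observations: excluded from the sample
    return mean(eligible)


# Illustrative records for someone born in 1979 (made-up numbers).
example = [(2006, 30000), (2007, 32000), (2008, 0), (2009, 34000), (2010, 36000)]
proxy = permanent_income(example, birth_year=1979)
```

In the example, the 2006 observation is dropped because the person is 27, and the 2008 observation is dropped because income is not positive, so the proxy averages the remaining three years.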
The third sample, Latent Factor, which will be used to estimate a latent factor model of intergenerational transmission (as in Braun and Stuhler 2018; Nybom and Stuhler 2019), contains all individuals born in 1989 or before for whom we can observe the highest educational attainment (irrespective of their sex) for both their parents and grandparents. This allows us to exploit education data for approximately 74,000 male offspring and 76,000 female offspring from generation t and their parents (generation t − 1) and grandparents (generation t − 2).
Figure 1 shows the distribution of income and education in the different samples considered. A similar figure for the year of birth of the individuals in our samples is available in the Appendix, Fig. A.1. Light grey and dark grey bars detail, respectively, female and male offspring. In the Education and Income samples, t − 1 values refer to those of the mother and the father of the female and male offspring. In the Latent Factor sample, t − 1 and t − 2 values correspond to the highest educational attainment among their parents and grandparents, respectively. Descriptive statistics for the full set of variables are presented in Table A.2 in the Appendix.
Overall, the Education and Income samples present similar age profiles. In both the Education and Income samples, offspring are born, on average, in 1979, while their mothers and fathers are born in 1952 and 1949, respectively. The gender gap is more noticeable in income than in education. While for the offspring generation the average educational attainment is similar between men and women, women from generation t (the daughters in our samples) have an average income of about 34,000 euros compared to 50,000 euros for men. The gender pay gap worsens at t − 1, where women earn, on average, about 30,000 euros less than men. There is instead only a one-year difference in the average years of schooling at generation t − 1, with an average of 12 and 13 for women and men, respectively. Importantly, while women in generation t earn, on average, 12,000 euros more than their mothers in generation t − 1, men in generation t have average earnings slightly lower than those of their fathers. These descriptive findings might be a consequence of two distinct phenomena. First, there may be a residual downward income bias for generation t due to the different income-age profiles at which offspring and parents are observed, despite the use of more than one year of observations. Second, in particular for women, the pattern signals the increased labour force participation witnessed over the last decades, doubling from 35% in the early 1980s to 70% in 2016 (OECD 2017).

Fig. 1 Years of education and gross income: descriptives. Note: Each bar represents, from top to bottom, the 75th, the 50th and the 25th percentile of the distribution of the completed years of education (top panel) and gross income (bottom panel) for each of the samples used (education, income or latent factor). The lines extend up to the 10th percentile (bottom) and the 90th percentile (top). Light grey and dark grey bars detail, respectively, female and male offspring. In the Education and Income samples, t − 1 values refer to those of the mother and the father of the female and male offspring. In the Latent Factor sample, t − 1 and t − 2 values correspond to the highest educational attainment among their parents and grandparents, respectively.
Individuals in the Latent Factor sample are, on average, younger. Offspring are born, on average, in 1983/84, while their mothers and fathers in 1957 and 1955, respectively. The grandparents of daughters and sons from generation t are born, on average, around 1928. The educational attainment at generation t is similar to that of the Education sample (approximately 15 years of schooling). The maximum educational attainment among mothers and fathers (generation t − 1) is approximately equal, on average, to 14 years of schooling for both female and male offspring in the Latent Factor sample, compared to 12 years and 13 years for, respectively, mothers and fathers in the Education sample. The highest number of years of schooling among grandparents (generation t − 2) in the Latent Factor sample is approximately 3 years lower on average than the corresponding value for generation t − 1 for both female and male offspring.

Methods
To summarise the degree to which offspring's socioeconomic status depends on that of their parents, scholars usually estimate the slope coefficient of the regression of the socioeconomic outcome $y_{i,t}$ of offspring $i$ from generation $t$ on that of their parent, $y_{i,t-1}$:

$$y_{i,t} = \alpha + \beta_{-1}\, y_{i,t-1} + \varepsilon_{i,t} \qquad (1)$$

The slope parameter $\beta_{-1}$ measures the persistence of socioeconomic status across generations. In our analysis, instead of using the regression coefficient $\beta_{-1}$ we estimate the intergenerational correlation coefficient (IGC), which accounts for changes in the variance of the socioeconomic outcome of interest across generations. Yet, this parameter might underestimate the persistence of inequalities across generations if the outcome chosen is only a partial representation of a broader (latent) measure of social status including preferences, cognitive skills, family commodities (e.g., reputation and connections) and other endowments not captured by the realised socioeconomic outcome(s). Braun and Stuhler (2018) formalise this "latent factor" representation of the intergenerational process as follows:

$$y_{i,t} = \rho\, e_{i,t} + u_{i,t}, \qquad e_{i,t} = \lambda\, e_{i,t-1} + v_{i,t} \qquad (2)$$

where $e_{i,t}$ is the family's underlying social status and $u_{i,t}$ and $v_{i,t}$ are error terms uncorrelated with other variables and past values. The heritability coefficient $\lambda$ captures the degree to which the unobserved latent endowment is transmitted from parents to their offspring. The transferability coefficient $\rho$ measures how much of the latent endowment is transformed into the observed outcome. Equation 2 can be estimated by exploiting information on multiple generations, e.g., estimating also the grandparents-to-offspring relationship. Within the framework of Eq. 2, assuming $g$ generations, it can be shown that $\beta_{-g} = \rho^{2}\lambda^{g}$, which yields a parent-to-child estimate $\beta_{-1} = \rho^{2}\lambda$ and a grandparent-to-child estimate $\beta_{-2} = \rho^{2}\lambda^{2}$. It follows that:

$$\lambda = \frac{\beta_{-2}}{\beta_{-1}}, \qquad \rho = \sqrt{\frac{\beta_{-1}^{2}}{\beta_{-2}}} \qquad (3)$$

We estimate $\beta_{-1}$ and $\beta_{-2}$ separately using, respectively, parent-offspring and grandparent-offspring correlations.
We use the estimated parameters to compute $\lambda$ and $\rho$ using Eq. 3. It is worth noticing that estimating Eq. 2 reduces the classical measurement error that leads to a downward bias in the estimated social status persistence $\beta_{-1}$ in Eq. 1 (Solon 2014). Indeed, if the signal-to-noise ratio remains constant across generations, the attenuation bias cancels out in the ratio $\lambda = \beta_{-2}/\beta_{-1}$. Yet, estimating the latent factor model presents some challenges. In particular, our data do not allow estimating $\lambda$ using a measure of permanent income, as we only observe pension income for individuals from generation $t-2$, which would yield a distorted estimate of the transmission process. Therefore, parameters $\lambda$ and $\rho$ can only be estimated using education as a proxy of socioeconomic status.
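As a worked illustration of this step, the sketch below recovers λ and ρ from a parent-offspring and a grandparent-offspring correlation, using λ = β₋₂/β₋₁ and ρ = √(β₋₁²/β₋₂), which follow from β₋g = ρ²λ^g. The function names are hypothetical, and the input values are back-computed from the rounded λ ≈ 0.61 and ρ ≈ 0.65 purely for illustration; they are not estimates taken from the paper's tables.

```python
import math


def implied_correlation(lam, rho, generations):
    """Correlation between outcomes g generations apart: rho^2 * lambda^g."""
    return rho ** 2 * lam ** generations


def latent_factor_parameters(beta_1, beta_2):
    """Recover (lambda, rho) from parent (beta_1) and grandparent (beta_2) correlations."""
    if beta_1 <= 0 or beta_2 <= 0:
        raise ValueError("both correlations must be positive")
    lam = beta_2 / beta_1
    rho = math.sqrt(beta_1 ** 2 / beta_2)
    return lam, rho


# Illustrative inputs implied by lambda = 0.61 and rho = 0.65 (assumed values).
beta_1 = implied_correlation(0.61, 0.65, 1)
beta_2 = implied_correlation(0.61, 0.65, 2)
lam, rho = latent_factor_parameters(beta_1, beta_2)

# A common attenuation factor on both correlations leaves lambda unchanged,
# because it cancels in the ratio; rho, by contrast, is scaled down.
lam_att, rho_att = latent_factor_parameters(0.8 * beta_1, 0.8 * beta_2)
```

The last two lines illustrate the cancellation argument: scaling both correlations by the same factor does not move λ, whereas ρ absorbs the attenuation.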
As mentioned in the introduction, we estimate both the IGC (for income and education) and the parameters of the Latent Factor model separately for female and male offspring (generation t). We also provide both population-wide estimates, pooling the entire sample irrespective of the birth cohort, and cohort-specific parameters. The latter are meant to provide a characterization of the temporal evolution of intergenerational mobility.
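To make the IGC concrete, the following sketch computes it as a Pearson correlation between parent and offspring outcomes, shows its relation to the regression slope, and attaches a bootstrapped standard error. It is an illustration under assumptions: the helper names and the synthetic data are invented for the example, and the resampling scheme (pairs drawn with replacement, 200 replications by default) is one common choice rather than a description of the authors' exact procedure.

```python
import random
from math import sqrt
from statistics import mean, stdev


def moments(x, y):
    """Return the centred sums of squares and cross-products (sxx, syy, sxy)."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxx, syy, sxy


def ols_slope(parent, child):
    """Slope of the regression of the child outcome on the parent outcome."""
    sxx, _, sxy = moments(parent, child)
    return sxy / sxx


def igc(parent, child):
    """Intergenerational correlation: slope rescaled by sd(parent) / sd(child)."""
    sxx, syy, sxy = moments(parent, child)
    return sxy / sqrt(sxx * syy)


def igc_with_bootstrap(pairs, replications=200, seed=0):
    """Point estimate and bootstrapped standard error, resampling pairs."""
    rng = random.Random(seed)
    point = igc(*zip(*pairs))
    draws = []
    for _ in range(replications):
        sample = rng.choices(pairs, k=len(pairs))
        draws.append(igc(*zip(*sample)))
    return point, stdev(draws)


# Synthetic parent-child years of schooling (made-up data, not CBS microdata).
_rng = random.Random(1)
parents = [_rng.gauss(12, 3) for _ in range(500)]
children = [10 + 0.3 * p + _rng.gauss(0, 2) for p in parents]
estimate, std_err = igc_with_bootstrap(list(zip(parents, children)))
```

Cohort-specific estimates would follow by applying `igc_with_bootstrap` to the pairs of each offspring birth year in turn.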

Results
Results for both IGC and the latent factor model are presented in Table 1. IGC in education (school years) is about 0.28 for both father-son and mother-daughter pairs. Our estimates are in line with those of Colagrossi et al. (2020), who document a correlation of about 0.25, but lower than those reported by Hertz et al. (2007), who document a correlation of 0.36. Correlations in income are 0.20 for mother-daughter pairs and about 0.22 for father-son pairs. Results are similar if we compute permanent gross income differently (Table A.4). Considering only individuals for whom we can collect at least five years of positive annual income observations in the period 2003-2020 in the age range 28-60, changes are negligible: the matrilineal IGC moves from 0.199 to 0.204, while the patrilineal one moves from 0.224 to 0.238. Nearly identical results are also obtained if (i) income is deflated to 2015 constant prices; or (ii) only individuals for whom we do not observe any unemployment spell over the years in which income is observed are kept in the sample.
Finally, estimating a latent factor model yields a considerably higher persistence than suggested by standard IGC measures. For both male and female offspring, we estimate a heritability coefficient λ ≈ 0.61 and a transferability coefficient ρ ≈ 0.65. We also estimate the parameters of the full matrilineal and full patrilineal latent factor models, i.e., using information on maternal grandmothers-mothers-daughters and paternal grandfathers-fathers-sons. The point estimate of the heritability coefficient for the matrilineal lineage is equal to

Therefore, to understand the degree of the potential bias arising from using only one yearly income observation at the earlier stage of the income-age profile, we begin by replicating the set-up of Carmichael et al. (2020). We compute IGC for parent-offspring pairs for which we can observe a positive (gross) income for the daughter (son) of age 28 in 2016, matching it with their mothers (fathers) for whom we can observe a positive (gross) income in 2003. Results are available in Table 2. Using this estimation strategy yields an IGC of 0.11 and 0.14 for mother-daughter (panel A) and father-son pairs (panel B), respectively. We then add consecutive years of positive gross income observations one at a time (up to five years) for each offspring (left facets), parent (middle facets), or both (right facets).
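The mechanism that this exercise probes can be illustrated with a small simulation. The sketch below is not a replication of Table 2: it assumes standardised permanent incomes with a true correlation of 0.3, independent transitory shocks with unit variance, and the same number of observed years for parents and offspring. All names and parameter values are hypothetical. Under these assumptions the expected correlation between k-year averages is the true correlation divided by (1 + 1/k).

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sqrt(sxx * syy)


def simulated_igc(years, n_pairs=20000, true_corr=0.3, noise_sd=1.0, seed=42):
    """IGC between k-year income averages when yearly income is noisy."""
    rng = random.Random(seed)
    parent_obs, child_obs = [], []
    for _ in range(n_pairs):
        # Permanent components with correlation true_corr and unit variance.
        parent_perm = rng.gauss(0.0, 1.0)
        child_perm = true_corr * parent_perm + sqrt(1 - true_corr ** 2) * rng.gauss(0.0, 1.0)
        # Observed income = permanent component + transitory shock, averaged over years.
        parent_obs.append(sum(parent_perm + rng.gauss(0.0, noise_sd) for _ in range(years)) / years)
        child_obs.append(sum(child_perm + rng.gauss(0.0, noise_sd) for _ in range(years)) / years)
    return pearson(parent_obs, child_obs)


# Theory under these assumptions: 0.3 / (1 + 1/k), i.e. 0.15, 0.225 and 0.25.
igc_by_years = {k: simulated_igc(k) for k in (1, 3, 5)}
```

Averaging more years shrinks the transitory variance and moves the estimate towards the true correlation, but some attenuation remains even with five years.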
The results in Table 2 show that, by using only one income observation early in the offspring income life-cycle, results might be severely downward biased. For the matrilineal lineage, using five years of income observations for both mothers and daughters yields estimates that are about 50% larger, i.e., 0.18 against the initial 0.11. For the patrilineal lineage, using five years of income observations for both fathers and sons yields estimates that are about 37% larger, i.e., 0.20 against the initial 0.14. The correction of the bias seems more difficult for the matrilineal than for the patrilineal lineage. Adding years of income observations for both daughters and mothers yields increasingly larger estimates, whereas in the patrilineal lineage it is the addition of income observations for the son that yields the larger increase in the point estimates, i.e., adding income observations for the fathers only matters marginally. This could potentially be explained by the fact that women might have, on average, more volatile careers. In this case, a proxy of permanent income based on fewer years of income observations might reflect transitory deviations.

Note: Bootstrapped standard errors are in parentheses (200 replications). N details the number of observations. Panel (a) presents mother-to-daughter correlations, whereas panel (b) presents father-to-son correlations. Years Inc. represents the number of years included for calculating the intergenerational correlation in income. Daughter (Son) panel implies changes in the number of years of income observed for the offspring while keeping the mother (father) years of observations constant at 1. Mother (Father) panel implies changes in the number of years of income observed for the parent while keeping the daughter (son) years of observations constant at 1. Panel Both implies increasing changes in income observability for both parents and offspring.

Overall, this evidence places the Netherlands closer to Central European countries than to Nordic countries concerning intergenerational mobility, in particular with respect to income mobility and latent factor estimates. Indeed, the coefficient on income mobility is one quarter higher than what is usually reported for Nordic countries (see Aaberge et al. 2002). In addition, estimates for Eq. 2 are in line with those for Germany (Braun and Stuhler 2018; Neidhöfer and Stockhausen 2018), although they are slightly higher than the latent factor estimates reported in Colagrossi et al. (2020) (≈ 0.51) using cross-country European survey data.

Figure 2 shows IGC estimates in education and income separately for each offspring's year of birth. In both cases, the left panel shows the mother-to-daughter persistence while the right panel shows the father-to-son persistence. The sample is restricted to those years in which we can observe at least 1000 matrilineal or patrilineal pairs, therefore covering the period 1950-1989 for education and 1970-1989 for income. Detailed yearly point estimates and the number of yearly observations for education and income are reported in the Appendix, Tables A.5 and A.6, respectively.
Concerning education, for offspring born in the period 1950-1970, the matrilineal and the patrilineal lineages show different trends. While the mother-to-daughter IGC increases from around 0.20 to 0.30, the father-to-son IGC remains stable with values not statistically different from 0.30. In both cases, starting from those born in the 1970s, the IGC steadily decreases, reaching a value of roughly 0.25 for both daughters and sons. This can be explained by the educational expansion that men and, in particular, women witnessed starting in the aftermath of World War II (Fig. A.2).

The matrilineal and the patrilineal lineages also show different trends for income. Mother-to-daughter IGC in income shows no significant decrease over the period 1970-1989. On the contrary, a slow upward trend can be noticed. Daughters born in the early 1970s enjoyed slightly higher social mobility than those born in the 1980s. Father-to-son IGC is instead higher. However, starting in the 1980s, and in particular in 1985, a large drop in the persistence of income can be noticed. It is important to highlight that these estimates might be driven by life-cycle issues, as for those born in the years 1985-1989 we can only observe income streams between the late twenties (starting from age 28) and the early thirties.

Few other studies discuss trends in intergenerational mobility over time, mainly due to stringent data requirements. While other studies look at different measures of mobility over time (e.g., intergenerational elasticities, absolute mobility), we focus on intergenerational correlations in income to partially address changing variances (i.e., inequality) across birth cohorts. Lee and Solon (2009) document how, in the US, for the birth cohort 1977-2000 no sizable changes can be observed in intergenerational elasticities for either men or women.
However, looking at absolute upward mobility rather than elasticities (i.e., whether offspring have a higher income than their parents), Chetty et al. (2017) find that rates of absolute mobility have fallen from 90% for children born in 1940 to approximately 50% for those born in the 1980s. These findings for the US are in line with those by Manduca et al. (2020), who find a similar effect. Manduca et al. (2020) also show that (i) for Finnish children born between 1965 and 1985 absolute upward mobility remained constant at less than 70%; (ii) the same trends apply to Norwegian children over the same time span, although the degree of mobility is higher (about 75%); (iii) Danish children show a decrease in their upward mobility opportunities, from about 70% for those born in the late 1960s to 50% for those born in the early 1980s; and (iv), importantly, absolute upward mobility in the Netherlands decreases from 80% for those born in the mid-1970s to 70% for those born in the mid-1980s. Pekkala and Lucas (2007) document instead how, in Finland, there is a decline in elasticities from the 1930 birth cohort until the baby boom cohorts of the early 1950s, whereas for the 1950s and 1960s birth cohorts they document increasing persistence, a result that holds for both daughters and sons.

Finally, Fig. 3 shows yearly estimates of the latent factor parameters, λ and ρ. The left panel shows the results for daughters from generation t, whereas the right panel refers to sons. As for Fig. 2, the sample in Fig. 3 is restricted to those years in which we can observe at least 1000 triplets, therefore reducing the period to 1975-1989. Detailed yearly point estimates and the number of yearly observations are reported in the Appendix, Table A.7. For both daughters and sons, point estimates do not appear to differ significantly across cohorts.
Both ρ and, more importantly, λ are not statistically different from the values recorded for the overall population (Table 1). However, our analysis does not unconditionally support Clark's hypothesis of persistence as high as ≈ 0.75 and constant across time. While our point estimates for λ are indeed higher than what Markovian father-to-son models would suggest, they are rarely as high as 0.75. In addition, the variation in the estimates, although not statistically significant owing to low precision in the earlier cohorts (which contain fewer observations), suggests meaningful differences in economic terms. Estimates for daughters range from 0.49 to 0.78, and for sons from 0.43 to 0.68. This evidence, albeit partial and restricted to only 15 years of observations, is partially supportive of Clark's idea that conventional measures underestimate the true underlying persistence (Clark 2014), but it does not fully corroborate the idea of persistence as high as 0.75 and constant over time. Of course, it must also be acknowledged that Clark and co-authors had a longer time horizon in mind when claiming that mobility follows a universal law.

Conclusion
Estimating the degree to which inequalities are transmitted across generations provides information on whether, as Becker and Tomes (1986) argue, offspring can go from "shirtsleeves to shirtsleeves in three generations" or whether offspring born in relatively poor circumstances would perpetuate poverty.
In this paper, we addressed the issue by exploiting administrative records to reconstruct the genealogical tree of all people residing in the Netherlands starting in 1995. We then matched individuals with microdata from tax and education authorities to compute a measure of permanent gross income as well as education. We showed that intergenerational persistence in education is as high as 0.28 for both male and female offspring. Persistence in income is lower (around 0.20). We then argued that previous estimates are likely to be downward biased due to the use of a single income observation at the early stages of the income life-cycle. For young offspring, adding years of income observations yields figures more in line with those recorded in the population.
We also documented the temporal evolution of the intergenerational transmission process in the Netherlands. We provided evidence of a slow but steady decrease in education persistence for individuals born between the 1970s and the 1980s, after the increase recorded for women born between 1950 and 1970. Regarding income, we showed diverging patterns. Mother-to-daughter correlations slowly increased in the period 1970-1989, whereas father-to-son correlations were fairly stable for those born between 1970 and 1985. Finally, we showed the results from a latent factor model in which social status is transmitted through (unobserved) endowments. Population-wide results partially support Clark's (2014) hypothesis, pointing towards a much higher transmission (≈ 0.6) of inequality across generations. However, when we analyse the latent factor model across time, the results are more ambiguous. The variation of point estimates across the offspring's years of birth 1975-1989, although not statistically significant, suggests economically meaningful changes. While Clark and co-authors had a longer time horizon in mind when discussing their universal law of mobility, these results could signal that some variation might occur.

Funding
The authors did not receive support from any organization for the submitted work.

Data Availability
The data that support the findings of this study are available from the Centraal Bureau voor de Statistiek (CBS). Restrictions apply to the availability of these administrative data, which are not publicly available and are accessible only directly through CBS via a secure connection. Additional information is available at https://www.cbs.nl/en-gb/onze-diensten/customised-services-microdata/microdata-conducting-your-own-research.

Code Availability
The code used in the preparation of this study is available on the authors' GitHub page.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.