Intergenerational transmission of education: the relative importance of transmission channels
- Cite this article as:
- Wendelspiess Chávez Juárez, F. Lat Am Econ Rev (2015) 24: 1. doi:10.1007/s40503-014-0015-1
This paper aims at quantifying the relative importance of different transmission channels generating the high levels of intergenerational correlations in education, especially in Latin America. A simultaneous equations model is applied to rich survey data from Mexico. The results show that the economic situation of the family has the highest impact, even more than heritability of cognitive abilities. The long-run economic situation seems to matter more than the current consumption level. Parental education affects the schooling outcome directly but also indirectly through the economic situation, which is particularly true for the father.
Keywords: Intergenerational transmission of education; Social mobility; IQ transmission; Inequality; Mexico
JEL Classification: D31, I21, I24, I62
Education is a main ingredient for a successful life in modern societies. However, in many countries the opportunities to get well educated strongly depend on the family background, particularly in Latin America. As a result, we observe very low intergenerational mobility. Understanding the mechanisms generating this intergenerational persistence in education is essential to target policy measures adequately.
In this paper, I try to shed some light on these mechanisms by applying a simultaneous equations model to data from Mexico. The main goal of the study is to estimate the relative importance of different transmission channels and their interactions.
The three main mechanisms put forward in recent theoretical and empirical contributions can be broadly described as being the biological, the economic and the direct education-to-education channel.
The biological channel refers to the genetic transmission of ability, often measured by the IQ, which explains a part of the relationship (Anger and Heineck 2010; van Leeuwen et al. 2008; Björklund et al. 2010; Black et al. 2009).
Poor families facing credit constraints are an example of the economic channel: because they cannot borrow against the expected future earnings of their offspring, a link arises between the socioeconomic situation of the parents and the schooling of their children (see for instance Black and Devereux 2010; Attanasio and Kaufmann 2009; Stinebrickner and Stinebrickner 2007; Carneiro and Heckman 2002; Alfonso 2009).
Finally, a higher return to education for children with highly educated parents is an argument for the direct education-to-education channel (Black and Devereux 2010). It might also include preferences for education, non-cognitive skills, aspirations and many other factors.
The empirical literature focuses on estimating the causal effect of parental education on children’s education. Holmlund et al. (2011) review this literature and compare different methods by applying them to the same dataset from Sweden. They conclude that the estimates differ substantially across identification strategies and that no method is perfect. They find relatively modest causal effects of parental education and point to the importance of analyzing in more detail other mechanisms explaining the intergenerational transmission of education. They hypothesize that education may have an indirect effect through the better socioeconomic environment that can be offered to children. This conclusion on the need to better understand the mechanisms is shared by recent literature surveys such as Black and Devereux (2010), Björklund and Jäntti (2009) and Piketty (2000), who consistently argue that more empirical research must be undertaken to understand the mechanisms behind educational mobility and social mobility in general. Black and Devereux (2010, p. 69) conclude that “[...] there is still much work to do to pin down which family background factors are most important” and Björklund and Jäntti (2009, p. 516) argue that “a major challenge for future research is to find out what in the family other than income is important for the future of children”.
This study tries to contribute in the proposed direction by moving from the estimation of one single causal effect to the estimation of a larger system of effects, incorporating simultaneously the three channels previously outlined. This focus on different mechanisms at the same time aims at getting a better understanding of the larger picture—the whole process of intergenerational transmission of education. In this respect, this approach should be seen as a complement to the single causal effect estimation and not as an alternative.
The choice of Mexican data is primarily motivated by the importance of high intergenerational correlations in education in Latin America. Hertz et al. (2007) compare educational mobility of 42 countries including 7 Latin American ones. These seven countries take the first seven places ranked according to their intergenerational education correlation. The country with the highest correlation is Peru (0.66), followed by Ecuador, Panama and Chile.
Among Latin American countries, Mexico displays relatively high intergenerational persistence in education. Dahan and Gaviria (2001) compare 16 Latin American countries using data from the late 1990s and find that Mexico has the second lowest intergenerational mobility level behind El Salvador. de Hoyos et al. (2010) use recent data on social mobility in Mexico. They report correlations between the years of education of parents and children, finding the highest correlation of about 0.6 for the cohorts of children born between 1942 and 1951, followed by a reduction of the correlation to about 0.5 for the cohort 1962–1971 and finally a new increase of the correlation to 0.55 for the youngest cohort, composed of children born between 1972 and 1981. According to the same authors, this recent increase is even higher when using different data sources. The same pattern of increasing educational mobility prior to the economic crisis in the 1980s and a subsequent decrease was found by Binder and Woodruff (2002), who use different cohorts to estimate the intergenerational link.
Hence, the Mexican case is not only interesting in its own right but might also be representative of other Latin American countries. Another advantage of Mexico is the high-quality data available. The suitability of the survey for this study is underlined by the availability of cognitive ability scores. Nevertheless, the analysis faces a series of empirical challenges in the form of trade-offs. This study tries to find a balance between a more complete model with very high data requirements and a simpler model in which less data are lost due to unavailable information.
The main result is that even when controlling for parental education and ability, the economic situation of the family has the largest direct effect on the schooling outcome of children. This important source of inequality of opportunity could be reduced by policy interventions targeting the link between economic requirements and schooling. Moreover, the estimation shows that there are important interactions between the channels, suggesting that the exclusion of some channels might seriously bias estimates.
In Sect. 2 I review the literature on the mechanisms of educational mobility, which motivates the empirical models I present in Sect. 3. In Sect. 4 I describe the data and the required transformations in detail and present some descriptive evidence. In Sect. 5 I present the main results, which are complemented by some figures in “Appendix B”. Finally, Sect. 6 concludes the paper.
2 Theory and empirical evidence on educational mobility
Intergenerational mobility in education is a complex phenomenon that does not rely on a single mechanism. The literature identified three main channels of transmission (Chevalier 2004). The first channel is the biological transmission of ability, the second refers to the dependence of schooling outcome on the economic situation of the parents and the third deals with direct education-to-education effects. In this section, I present some theoretical and empirical contributions to the understanding of these channels.
2.1 Ability transmission through genes: the biological channel
The direct transmission of abilities, which is not limited to simple IQ transmission, represents a biological explanation of the phenomenon. For instance, Becker and Tomes (1979) use the term endowments acquired from parents to describe this direct transmission. They provide a theory of intergenerational transmission based on rational choices through a human capital theory approach, where the ability level of the offspring is a key determinant of the decision. Their model was subsequently extended by Loury (1981) and Solon (2004) and serves as a benchmark in many analyses.
Empirically, much work has been done to determine the importance of this channel. In a meta-analysis of 212 IQ studies, Devlin et al. (1997) quantify the genetic transmission and find the broad-sense heritability of IQ to be 48 %.
Social scientists put more emphasis on quantifying the overall IQ correlations between parents and children, which might also include environmental effects in addition to the pure heritability measured by Devlin et al. (1997). For instance, Anger and Heineck (2010) use German panel-data with two ultra-short IQ-tests to estimate the parent-offspring relation. They find that a 1-point increase in parents’ score results in a 0.45-point increase in the coding speed (inherent ability) and 0.50-point increase in word fluency scores. The estimated coefficients remain stable at the inclusion and exclusion of control variables. Björklund et al. (2010) use Swedish data from military IQ tests and official registers. They estimate intergenerational and sibling IQ correlations. The estimated values are all highly significant and attain values of 0.346 for father-son, 0.510 for siblings and 0.65 for twins. According to the authors, their estimations represent rather a lower bound of the true values. Black et al. (2009) find a similar father-son IQ-correlation (0.38) in a comparable study with Norwegian data.
van Leeuwen et al. (2008) go even further by decomposing IQ transmission into its components. They find no evidence for cultural transmission of IQ and no indication that intelligent parents provide their children with intelligence-promoting circumstances. Individual differences in intelligence are found to be largely accounted for by genetic differences. Moreover, they find a spousal IQ correlation of about 0.33, suggesting a relatively high degree of assortative mating.
2.2 Credit constraints and the economic situation: the economic channel
According to the logic of the economic channel the intergenerational correlation in education is the fruit of an underinvestment in education by poor families. This idea that poorer families might face credit constraints making the optimal investment in the human capital of their offspring impossible can be found, for example, in Banerjee and Newman (1994) and Loury (1981).
Empirical research was not able to conclude on the exact importance of credit constraints and the economic environment in general. While it seems that the impact of credit constraints is relatively modest in richer countries, some evidence was found that in developing countries the effect is larger.
Stinebrickner and Stinebrickner (2007) analyze the situation at a college in the US using panel data of students. They find that a group of students is credit constrained in consumption during their stay at the college, but that many of them are not willing to borrow.
Carneiro and Heckman (2002) critically review the literature on credit constraints. Using modern US data, they compute that only about 8 % of students really face short-term credit constraints. They argue that long-term effects, such as the family environment during the whole schooling period of children, play a much bigger role. Winter (2007) develops a computable general equilibrium model to evaluate the role of credit constraints in the decision whether to go to college. The model is calibrated for the US economy and predicts observed patterns quite well. The findings contrast with those of other studies reporting few credit-constrained students, and the author argues that econometric estimates such as those in Carneiro and Heckman (2002) are downward-biased. In line with the results of other studies is the observation that the share of financially constrained people has increased (dramatically) over the past decades.
Alfonso (2009) presents a study of 4 Latin American countries (Mexico, Chile, Colombia and Peru). She shows that the effect of credit constraints disappears in regression analysis when controlling for long run family variables (parental education, family assets, etc.). However, the relatively small effect of credit constraints increases from the oldest to the newest datasets used in the study.
Attanasio and Kaufmann (2009) use Mexican data to analyze the relationship between post-secondary school decisions and subjective expectations. Among other findings on the role of expectations, they show that credit constraints represent an important issue for poor Mexicans, in contrast to some literature coming from higher developed countries, where these effects do not seem to be as present.
To sum up, the literature finds evidence for the existence of the economic channel in producing high intergenerational correlations in education. It remains somewhat unclear if short-run credit constraints or the long-run economic situation are more important determinants. It seems, however, that in developing countries, both contribute to the low educational mobility.
2.3 Education to education transmission
Finally, the third channel considered is the direct effect of parental education on the schooling attainment of the children. This channel is generally known as the nurture effect, capturing the direct causal effect of parental education (Holmlund et al. 2011; Chevalier 2004). Dickson et al. (2013) show that this direct causal effect starts at very early ages and remains visible years later when comparing students’ performances. Different explanations why parental education should have a direct causal effect can be found in the literature. One possible explanation is that highly educated parents tend to encourage their children more to achieve high levels of education (Merton 1953; Boudon 1973, 1974; Sewel and Shah 1968). For instance, Steinberg et al. (1992) show that parental encouragement and parental school involvement have important effects on the school performance of children. Besides this active encouragement and involvement of parents, it can also be argued that the child’s aspirations increase when parents have more education (Sewel and Shah 1968; Ermisch et al. 2006). Ermisch et al. (2006) argue that parental education can alter the productivity of parents’ time investments in children. It could also be argued that expected returns to education depend on parental education. Jensen (2010) shows that students with higher educated parents tend to perceive higher returns to education. Hence, this third channel is motivated by a series of arguments and most likely composed of different sub-channels. The distinction of these different sub-channels is beyond the scope of this paper. In this paper I consider a compound channel linking high parental education to high offspring’s schooling.
3.1 Conceptual model of educational mobility
First, there is a direct link between abilities of parents and children, presented with the dotted line. The ability of the parents influences the ability of children, which in turn increases their propensity for education. The economic channel is represented by gray arrows using the compound term economic situation, which includes short-term credit constraints and long-run effects of assets. Finally, the third channel illustrated with solid black arrows represents the direct education-to-education transmission, which is based on many different hypotheses as explained above.
A channel that I do not consider in this study is the health channel. The health status of the child is likely to be influenced by the family background on the one hand (Bradley and Corwyn 2002; Rosa Dias 2009; Delajara and Wendelspiess Chávez Juárez 2013), while it might also have important effects on education on the other hand (Case et al. 2005; Doyle et al. 2009). There are two reasons for not including this channel, both related to the data. First, there is a problem of timing in the health data. While the literature emphasizes the importance of the prenatal period and early childhood in the health dimension (Heckman 2006; Doyle et al. 2009; Delajara and Wendelspiess Chávez Juárez 2013), in the data I can at most observe the current health status of the child. Unfortunately, no retrospective information on the child’s health conditions at birth is available. Second, as I will highlight in Sect. 4, the inclusion of additional variables would seriously reduce the sample and increase the risk of sample selection problems. In “Appendix E” I present some regressions where I include variables for health and personality traits to illustrate the problems just described.
Different strategies are possible to analyze such a framework empirically. One way is to focus on one particular link. For instance, Holmlund et al. (2011) present different methods to estimate the causal relationship between the education of parents and children. In this paper I use two different strategies. First, I focus on a single equation approach where I aim at estimating all the determinants of the child’s education outcome in one regression. In a second step, I move to a simultaneous equations model where I estimate several equations to describe the whole system outlined in Fig. 1. I will now explain the two approaches along with their intuition, advantages and challenges.
3.2 The single regression approach
The main goal of the empirical application is to estimate the intergenerational links determining the educational outcome of the child. Therefore, a first empirical approach consists in regressing the educational outcome on the possible intergenerational determinants while controlling for some contemporaneous effects. The intergenerational determinants are parental education and the economic situation of the family. Moreover, I also control for parental age, whether the parents have an indigenous background and some variables capturing the family structure1. The most important contemporaneous effect I control for is the cognitive ability of the child. Additionally, I also control for contemporaneous effects such as child labor, government program benefits, state fixed effects and indicators for girls and rural areas.
Besides its appealing simplicity, the single regression approach has the advantage that we do not have to impose a lot of structure on the model. Only the standard assumptions for ordinary least squares must be fulfilled. One concern that could arise is that some variables do not satisfy these conditions and are likely to be correlated with the error term. The most likely reason for this to happen in our context is an omitted variable bias. A first critical variable is the ability level of the child, which might also capture, for instance, motivation or the ability to perform well in an examination situation. Both potentially omitted variables are also likely to influence the schooling outcome. To a large extent, these concerns are reduced by the type of cognitive ability measure I am using in this study. The ability measure is based on a short version of the Raven’s progressive matrices test, which is one of the most culture-free and education-independent IQ tests (Désert et al. 2009). I discuss this in more detail when describing the data in Sect. 4. A second variable that might suffer from an omitted variable bias is parental education, as parental education might also be influenced by preferences and taste for education. If these preferences and tastes are also transmitted to the next generation, we are likely to have an endogeneity problem as well.
In order to account for these potential endogeneity issues I also perform instrumental variable regressions for the single regression approach. Father’s and mother’s cognitive ability is used to instrument both the parental education and the child’s ability level. The cognitive ability of the parents should be a strong predictor of both parental education 2 and the cognitive ability of the child. At the same time, the cognitive ability of the parents should not have a direct impact on the schooling outcome of the child. For the case of parental education, I additionally use information on the place of living of the parents when they were 12 years old. A dummy capturing whether they lived in a town or not is used.3 The idea is that parents living in cities had substantially more access to education than parents living in rural areas. At the same time, the place of living of the parents when they were 12 years old should not directly affect the education outcome of their children today. In the results section, I will present in detail the different tests for the instruments, which clearly indicate that the instruments are valid and strong.
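The logic of the instrumental variable correction can be illustrated with a minimal simulation. This is only a sketch under stated assumptions, not the estimation used in this paper: it uses a single instrument and a single endogenous regressor (the simple covariance-ratio IV estimator), and all variable names, coefficients and the data-generating process are hypothetical. An unobserved "taste for education" raises both parental education and the child's outcome, so OLS overstates the effect, while an instrument that moves parental education but has no direct effect on the outcome recovers it.

```python
import random

def cov(a, b):
    """Population covariance of two equally long lists."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / len(a)

def ols_slope(x, y):
    """Slope of a simple regression of y on x."""
    return cov(x, y) / cov(x, x)

def iv_slope(z, x, y):
    """Simple IV estimator with one instrument z for one regressor x."""
    return cov(z, y) / cov(z, x)

def simulate(n=4000, true_effect=0.5, seed=0):
    """Hypothetical data: 'taste' is unobserved and confounds x and y."""
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        instrument = rng.gauss(0, 1)   # e.g. a parental background shifter
        taste = rng.gauss(0, 1)        # unobserved taste for education
        parent_educ = instrument + taste + rng.gauss(0, 1)
        child_outcome = true_effect * parent_educ + taste + rng.gauss(0, 1)
        z.append(instrument)
        x.append(parent_educ)
        y.append(child_outcome)
    return z, x, y

z, x, y = simulate()
ols_estimate = ols_slope(x, y)    # biased upward by the omitted taste
iv_estimate = iv_slope(z, x, y)   # close to the assumed true effect of 0.5
print(round(ols_estimate, 2), round(iv_estimate, 2))
```

The sketch relies on the same two conditions discussed above: the instrument must be correlated with parental education (relevance) and must not enter the outcome equation directly (exclusion).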
Let me first introduce the second estimation approach, where I simultaneously estimate the whole system presented in Fig. 1.
3.3 The simultaneous regression approach
In addition to the single regression approach previously presented, I also use a simultaneous equations model approach to estimate not only the intergenerational links, but also the different transmission channels in more detail. There are two main advantages of focusing on the whole system. First, it allows us to estimate the relative importance of the three channels put forward by the literature. This is important because analyzing a specific channel and finding significant effects does not tell us much about the relative importance of the analyzed channel with respect to other possible channels. A channel might show very significant effects and at the same time be relatively irrelevant for the whole system. Second, estimating a system allows us to consider interactions between channels and, as a consequence, direct and indirect effects. For instance, parents’ ability is likely to have both. The direct effect of parental ability refers to the biological channel introduced earlier. The indirect effect goes through parental education and the economic situation. More able parents are likely to have more education and a better socioeconomic status. Behrman and Rosenzweig (2002) show that intergenerational correlations between mother’s and children’s education might be biased when such system aspects are not considered. For instance, they mention that such estimates are upward biased when they do not control for the ability channel or for assortative mating.
The white filled boxes refer to exogenous variables, the gray boxes represent endogenous variables and the arrows describe direct effects. Note that for the sake of readability of the graph, I present both parents and both economic indicators together. Even though the graphical representation is more illustrative, I will now discuss the model mainly based on the equations presented above.
Equation (1) describes the genetic transmission of cognitive ability, where the main explanatory factors of the child’s ability are the parental cognitive ability scores.4 In addition to the parental ability scores, I add some control variables such as the gender of the child, a dummy for first-born children and two dummies for children with a small (\(<\)20 years) and large (\(>\)40 years) age difference to their parents. Parental education is excluded from this regression because we can assume that there is no direct effect on the cognitive ability score of the child. This assumption is based on the nature of the cognitive ability test used, which is education- and culture-independent (Désert et al. 2009). For the same reason, no feedback effect from education to ability is included. The economic situation is not included as an explanatory variable because the transmission through genes took place at birth, while the economic situation indicators are contemporaneous values and have, therefore, no direct effect.
Equations (2) and (3) are simplified education production functions for the father and the mother, respectively. The idea is to link parental ability to their schooling outcome and to control for cohort- and ethnicity-based differences in the educational level of parents. I also include the instrument of the single regression approach capturing the place of living of the parents when they were 12 years old. The idea is that parents who grew up in towns had substantially better access to schooling than those living in the countryside. Both indicators of the economic situation are excluded from this regression because of the timing of these variables. The economic situation today does not directly explain the educational achievement of the parents when they were of school age.
Equations (4) and (5) estimate the effect of parental education and ability on the two indicators of the economic situation. The economic situation is split into consumption and a wealth index. This index was obtained by taking the first component of a principal component analysis on several indicators of durable good holdings and housing conditions.5 Taking this index instead of the full set of indicator variables allows me to reduce the dimensionality and to use the wealth index as an indicator for the long-run economic situation. The use of both the wealth index and the current consumption level is motivated by the finding in the literature that the (long-run) economic environment is more important than current consumption. In addition to the exogenous variables, the economic situation is also influenced by parental education.
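How a wealth index can be built from the first principal component is sketched below. The four binary asset indicators and the five households are hypothetical, the indicators are standardized before the analysis (an assumption, since the text does not state it), and the first component is extracted by power iteration, a simple stand-in for whatever routine was actually used.

```python
import math

def first_principal_component(rows, iterations=200):
    """Return (loadings, scores) of the first principal component.

    rows: list of households, each a list of asset indicators.
    Indicators are standardized, so the analysis uses the correlation matrix.
    """
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n)
           for j in range(k)]
    z = [[(r[j] - means[j]) / sds[j] for j in range(k)] for r in rows]
    corr = [[sum(zr[a] * zr[b] for zr in z) / n for b in range(k)]
            for a in range(k)]
    # Power iteration: repeatedly multiply by the matrix and renormalize.
    v = [1.0 / math.sqrt(k)] * k
    for _ in range(iterations):
        w = [sum(corr[a][b] * v[b] for b in range(k)) for a in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    scores = [sum(zr[j] * v[j] for j in range(k)) for zr in z]
    return v, scores

# Hypothetical indicators per household: tv, fridge, car, solid floor
households = [
    [0, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [1, 1, 1, 1],
]
loadings, wealth_index = first_principal_component(households)
```

In this toy example all indicators are positively correlated, so every loading is positive and the index increases with each additional asset a household owns.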
Finally, Eq. (6) is the main equation corresponding to the single equation approach outlined before. The explanatory variables of interest are parental education, the economic situation indicators and the ability score of the child. I also control for some contemporaneous effects by including control variables capturing the family structure, the place of living, the government program benefits, the working conditions of the child and the gender of the child.
To sum up, the coefficients of main interest are the \(\beta \)’s and to a lesser extent the \(\psi \)’s. The \(\beta \)’s estimate the direct impact of family background variables and the child’s ability on educational attainment. The \(\psi \)’s permit us to estimate the relationship between parental and child’s ability, i.e. the biological transmission. Through the \(\psi \)’s and \(\beta _1\) the total effect of the biological transmission on the educational outcome can be estimated. This setting allows us to estimate the relative importance of the different channels in the educational transmission. The model is estimated using the maximum likelihood method under normality assumptions (Muthén 2004). In contrast to the instrumental variables techniques used in the single equation approach, the identification of the simultaneous equations model is somewhat more complicated to show. The model presented in this paper is easily identified due to its quasi-recursive structure. I use the term quasi-recursive because I do not assume independence of the error terms for contemporaneous equations. As a consequence, I use more restrictive identifying conditions to show that all parameters in all equations are identified. A detailed discussion of the identification conditions along with the proofs can be found in “Appendix D”.
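The structure described in words for Eqs. (1) to (6) can be summarized in the following sketch. Only the \(\beta \)’s in Eq. (6), the \(\psi \)’s in Eq. (1) and the role of \(\beta _1\) as the coefficient on the child’s ability are taken from the text. All other symbols are assumed notation introduced here for illustration: \(A\) for ability, \(E\) for education, \(C\) for consumption, \(W\) for the wealth index, \(S\) for the child’s education index, superscripts \(c\), \(f\), \(m\) for child, father and mother, and \(X_j\) for the control variables of equation \(j\). The assignment of consumption to Eq. (4) and wealth to Eq. (5) is also an assumption.

\[
\begin{aligned}
A^{c} &= \psi_0 + \psi_1 A^{f} + \psi_2 A^{m} + X_1'\gamma_1 + \varepsilon_1 && (1)\\
E^{f} &= \alpha_0 + \alpha_1 A^{f} + X_2'\gamma_2 + \varepsilon_2 && (2)\\
E^{m} &= \delta_0 + \delta_1 A^{m} + X_3'\gamma_3 + \varepsilon_3 && (3)\\
C &= \theta_0 + \theta_1 E^{f} + \theta_2 E^{m} + \theta_3 A^{f} + \theta_4 A^{m} + X_4'\gamma_4 + \varepsilon_4 && (4)\\
W &= \lambda_0 + \lambda_1 E^{f} + \lambda_2 E^{m} + \lambda_3 A^{f} + \lambda_4 A^{m} + X_5'\gamma_5 + \varepsilon_5 && (5)\\
S &= \beta_0 + \beta_1 A^{c} + \beta_2 E^{f} + \beta_3 E^{m} + \beta_4 C + \beta_5 W + X_6'\gamma_6 + \varepsilon_6 && (6)
\end{aligned}
\]

Under this assumed notation, the effect of the father’s ability on the schooling outcome through the biological channel is the product \(\psi_1 \beta_1\), and the total effect of the father’s education combines the direct effect with the indirect effects through the economic situation:

\[
\frac{\partial S}{\partial E^{f}} = \beta_2 + \theta_1 \beta_4 + \lambda_1 \beta_5 .
\]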
4.1 Data description
The analysis of this paper requires very complete data at the micro level for both children and parents. The Mexican Family Life Survey (MXFLS)6 is a very rich and award-winning panel data project from Mexico and fits these requirements quite well. I use information from the first two waves (2002 and 2005), focusing on the latter wave. The panel structure was mainly used to reduce the amount of measurement error, for instance, by identifying and correcting impossible values for time-invariant variables.7 To the best of my knowledge, this is the best data source from a Latin American country for this kind of analysis, particularly because it includes short cognitive ability tests. Nevertheless, the data are not perfect and before starting with the analysis I discuss some trade-offs faced and the resulting decisions taken.
4.1.1 Choosing the age range of the sample and the schooling outcome variable
A first challenge is to choose correctly the age range of the primary units of analysis. In order to estimate properly the correlation of years in education one would have to limit the analysis to people having finished their education, i.e. mostly people over 25, implying two major problems.
First, older individuals are probably no longer living with their parents. However, as I do not have administrative data, I can only establish the link between children and parents when they are living in the same household. Those still living with their parents years after completing school are most likely not representative for the whole population.
The second problem is that the schooling period of older people having finished education lies potentially far in the past. Therefore, the economic situation for that time would be hard to proxy and the mechanisms I would analyze would be those prevailing some years ago, which is not necessarily very policy relevant.
At the lower bound of the age range we cannot include too young children, as they are only about to start their educational path. Therefore, the information on years of schooling is likely to be much less related to their final schooling outcome as compared to slightly older children.
For these reasons, I focus on children and young adults from 12 to 25 years and use a constructed education index instead of years of schooling. The idea behind this index is very simple: instead of measuring the final outcome, I consider the delay in schooling that people have with respect to their peers. The index is computed by dividing an individual’s years of schooling by the average years of schooling of her age cohort. A value of 1 corresponds to a child that is just on time compared to its peers; a value below 1 suggests a delay.
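The construction of the index can be sketched in a few lines. The records below are hypothetical, and two details are assumptions not stated in the text: the cohort average is unweighted and includes the individual herself.

```python
from collections import defaultdict

def education_index(records):
    """Compute years of schooling relative to the age-cohort average.

    records: list of (age, years_of_schooling) tuples.
    Returns one index value per record, in the same order.
    Assumes every age cohort has a positive average.
    """
    totals = defaultdict(lambda: [0.0, 0])   # age -> [sum of years, count]
    for age, years in records:
        totals[age][0] += years
        totals[age][1] += 1
    cohort_mean = {age: s / c for age, (s, c) in totals.items()}
    return [years / cohort_mean[age] for age, years in records]

# Hypothetical sample: three 12-year-olds and three 18-year-olds
sample = [(12, 6), (12, 5), (12, 7), (18, 12), (18, 9), (18, 9)]
index = education_index(sample)
# 12-year-olds average 6 years, 18-year-olds average 10 years, so the
# first child is exactly on time (1.0) and the fifth is delayed (0.9).
```

Because the reference level is computed separately for each age, the index is comparable across the whole 12 to 25 age range even though completed years of schooling are not.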
On the left graph we can see that the dispersion of the index is relatively stable for the ages corresponding to secondary and high school. For the ages corresponding to tertiary education the dispersion increases, especially the 95th percentile increases. This change is due to the fact that a substantial proportion of individuals do not continue education beyond the high school level. Therefore, the reference level remains at lower levels and those actually attending tertiary education achieve higher levels of the education index. On the right-hand side graph we can see that there is a considerable amount of variation in the index, starting at values close to zero for those with no or very little education and going up to almost 2. For the younger age group a stronger concentration around the value of 1 can be observed. This is due to the fact that less variation in the years of education is observed.8
The underlying assumption of this indicator is that a delay in schooling translates later on into fewer years of schooling. In “Appendix A” I present some empirical evidence of this relationship and provide some additional information on the Mexican education system, along with some basic statistics such as enrollment rates.
4.1.2 Variable selection and construction
A second data challenge is to include as much relevant information as possible while minimizing the loss of observations due to missing values. My strategy in the variable selection process was to give absolute priority to the three main channels discussed earlier. At the same time, I tried to avoid unnecessary loss of observations due to less relevant variables. To handle this trade-off, I started by defining a set of indispensable variables that cannot be excluded from the analysis without seriously changing the model. For this type of variable, dropping observations due to missing values is unavoidable. A second set of interesting but not indispensable variables was then selected, avoiding variables that would cause a large loss of observations. In this respect, some variables potentially able to capture soft skills and personality traits were excluded, because too many observations would have been lost.9
One of the main reasons to use the MXFLS data is the availability of cognitive ability measures based on Raven’s progressive matrices (RPM). According to Désert et al. (2009), RPM is a frequently used intelligence test with proven reliability and validity in measuring cognitive aptitudes and reasoning. Désert et al. (2009) further highlight that this IQ test is less education dependent than others, reducing the risk of feedback effects from education.10 Different versions of the test were applied to children (5–12 years) and adults (13–65 years). To obtain comparable scores across age groups, the values were normalized to a distribution with mean 100 and standard deviation 15 for each age. The choice to normalize to the mean and standard deviation of the IQ scale is essentially for illustrative purposes; it does not imply that the cognitive ability scores can be seen as a complete measure of IQ. Moreover, the normalization is not relevant for the results, because I report only standardized coefficients, which are by definition independent of previous normalizations. Given the panel structure of the survey, two test scores per person are available, allowing us to compute the person’s average score to reduce measurement error. Observations where the two scores differed by more than 2 standard deviations were dropped from the analysis. For people with only one valid test score, that single score was used to avoid losing too many observations.
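The treatment of the ability scores described above (normalization within age, averaging across waves, and the 2 standard deviation rule) can be illustrated as follows. This is a hedged sketch rather than the original procedure: the function names are hypothetical, the population standard deviation is an assumption, and the cutoff of 2 standard deviations is read here as 30 points on the normalized scale.

```python
from statistics import mean, pstdev


def normalize_by_age(raw):
    """Rescale raw scores to mean 100 and SD 15 within each age.

    raw: list of (age, raw_score) tuples (hypothetical layout).
    Assumes each age has at least two distinct scores, so the SD is non-zero.
    """
    by_age = {}
    for age, score in raw:
        by_age.setdefault(age, []).append(score)
    stats = {age: (mean(v), pstdev(v)) for age, v in by_age.items()}
    return [100 + 15 * (s - stats[a][0]) / stats[a][1] for a, s in raw]


def combine_waves(score_1, score_2, sd=15.0, max_gap_in_sd=2.0):
    """Combine the normalized scores of two waves; None marks a missing score.

    Returns None when the observation would be dropped.
    """
    if score_1 is None and score_2 is None:
        return None
    if score_1 is None:
        return score_2  # only one valid score: keep it
    if score_2 is None:
        return score_1
    if abs(score_1 - score_2) > max_gap_in_sd * sd:
        return None  # implausibly large gap between the two waves
    return (score_1 + score_2) / 2


normalized = normalize_by_age([(10, 1), (10, 3), (30, 20), (30, 40)])
```

For example, `combine_waves(100.0, 110.0)` averages the two waves, whereas `combine_waves(70.0, 110.0)` returns `None` because the gap of 40 points exceeds the assumed cutoff of 30.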
Parental education in years was obtained by computing the average time spent in school to achieve the reported education level. Repeated years are, therefore, not counted as schooling years, as one can argue that they do not provide additional human capital. Note that the question on the achieved education level is asked twice in the survey: once in the roster questionnaire and once in the individual questionnaires. I primarily took the information from the individual data and supplemented it with the roster data when the individual data were missing.
The family log-consumption per capita was constructed from a series of survey items on consumption and normalized to consumption per equivalent adult following the methodology proposed by Rojas (2007), who provides estimates for Mexico based on the subjective well-being approach.11 The wealth index was obtained by taking the first component of a principal component analysis performed on several household assets and indicators of the housing conditions. A list of the included indicators, their relative importance for the wealth index, and the index’s possible relation with parental age are reported in “Appendix C”. The remaining variables included in the study were constructed in a straightforward way according to the standards in the literature and are reported with a short description in Table 1.
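To make the construction of the wealth index more concrete, the following sketch extracts the first principal component from a small set of standardized asset indicators using power iteration. It is an illustration under stated assumptions, not the procedure used for the study: the three indicators and the five toy households are invented, and the actual list of indicators is the one reported in “Appendix C”.

```python
from statistics import mean, pstdev


def standardize(column):
    # Assumes the indicator varies across households (non-zero SD).
    m, s = mean(column), pstdev(column)
    return [(x - m) / s for x in column]


def first_component(rows, iterations=500):
    """Return (loadings, scores) of the first principal component.

    rows: one list of asset indicators per household (hypothetical layout).
    The component is computed on the correlation matrix of the indicators.
    """
    cols = [standardize(list(c)) for c in zip(*rows)]
    k, n = len(cols), len(rows)
    corr = [[sum(a * b for a, b in zip(cols[i], cols[j])) / n for j in range(k)]
            for i in range(k)]
    # Power iteration converges to the eigenvector with the largest eigenvalue.
    v = [1.0] * k
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    scores = [sum(v[j] * cols[j][h] for j in range(k)) for h in range(n)]
    return v, scores


# Toy households: [owns refrigerator, owns car, number of rooms]
households = [[0, 0, 1], [1, 0, 2], [1, 1, 3], [1, 1, 4], [0, 0, 2]]
loadings, wealth_index = first_component(households)
```

In this toy example all indicators are positively correlated, so every loading is positive and the household with both assets and the most rooms receives the highest score. With real data, the sign and size of the loadings would have to be inspected before interpreting the component as wealth.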
4.1.3 Sample size and sample selection
Initially, 11,273 children and young adults aged between 12 and 25 years were present in the database. Of these, only 8,155 individuals lived with both parents in the same household. This is a necessary condition for this study, since otherwise no cognitive ability scores of the parents are available. Proxies for other variables such as education of absent family members would be available, but the cognitive scores are not. Missing values in parents’ and children’s characteristics caused a further loss of observations, reducing the sample to 4,266 observations. The large loss of observations is not surprising considering the data requirements of the study and the fact that these are survey data from an emerging country. Such data are obviously not as good as the administrative data from European countries that were used in some other studies on the topic. It could be argued that the loss of observations introduces sample selection bias. Being fully aware of this, I try to show that the sample used in this study produces results very comparable to findings in the literature and that the analysis is quite robust to changes in the sample. I also estimate the benchmark model with larger samples, where I relax some data requirements. For instance, excluding the channel of the father allows me to take into account the numerous single-mother households, and merging the effects of the father and the mother allows me to include every single-parent household. These larger sample regressions are reported in “Appendix B”.
4.2 Descriptive statistics
Let us now have a closer look at the data. Table 1 presents some univariate descriptive statistics of the sample I use in the econometric models. The different variables are divided into blocks corresponding to their role in the econometric model. The main dependent variable is the education index previously introduced. The index was constructed using the largest sample possible and not only the observations used in the econometric estimation. Therefore, the average value is slightly higher than 1. The same logic applies to the ability measures, which were estimated using all available information.
Table 1 Variables used in the study
Dependent outcome variable
Years of education divided by the average years of education of the age group
Average log consumption per equivalent adult
Father’s years of education
Mother’s years of education
Child’s ability measure
Exogenous regressors and control variables
Age of the father
Age of the mother
Ability measure for the father
Ability measure for the mother
Dummy variable for indigenous father
Dummy variable for indigenous mother
Father grew up in an urban area
Mother grew up in an urban area
Dummy for girls (=1)
Age in years
Dummy variable for rural areas
Dummy for working activities in 2002
Dummy for working activities in 2005
Number of children below 12 years
Number of teenagers (12–18 years)
Dummy for program Alianza para el campo
Dummy for program Coinversión social
Dummy for program Crédito a la palabra
Dummy for program FONAES
Dummy for program Fondo para la Micro, Pequeña y Mediana Empresa
Dummy for any other assistance program
Dummy for program Programa de empleo temporal
Dummy for program PROCAMPO
Dummy for program VIVAH
Dummy for program Oportunidades
The average and the standard deviation of the fathers’ years of education are slightly higher than those of the mothers. The proportion of indigenous parents is around 15 %, which corresponds to the national average. The age of parents is measured in years, and fathers are slightly older than mothers. About one-quarter of parents grew up in a city. The sample is well balanced between girls and boys and also between families living in rural and urban areas. The indicator of rural areas is based on the official definition of rural zones in Mexico and the place of living at the time of the survey. As people might have lived in a different place during their education, I use additional information on migration to correct the variable accordingly.12 Two variables (Work02 and Work05) capture whether the children were working in 2002 and 2005, respectively. The proportion grows from around 17 to 25 %, reflecting the aging of the cohort. The indicators on the number of children and teenagers allow us to control for the composition of the households. On average, there is one child below 12 years and about 1.6 teenagers present in a household. The set of dichotomous program variables captures the beneficiary status of families for different government programs. The proportions of beneficiaries are generally very low, with the exception of Oportunidades and PROCAMPO, where the proportion is above 10 %.
More interesting than the averages of the variables are the relationships among them. I now present some simple linear correlations between important variables. On the one hand, they should provide a good impression of the data and outline some potentially interesting phenomena. On the other hand, they should give us an impression of the comparability of the data with data used in other studies. By showing that the descriptive statistics are surprisingly comparable to those of other studies in the literature, I hope to reduce some concerns regarding sample selection and the definition of some main variables.
A first issue that one might discuss regarding the data is the use of Raven’s progressive matrices test as a measure of cognitive ability or even IQ. In the sample, the correlation between the child’s ability measure and that of the father is 0.363. This value is very close to the 0.347 and 0.38 estimated by Björklund et al. (2010) and Black et al. (2009), respectively, both using more detailed IQ measures. The same correlation with respect to the mother was found to be 0.387, which is slightly higher than the correlation with the father. Considering only the two oldest siblings in a family gives a sibling IQ correlation of 0.506, which is again relatively close to the values reported by Björklund et al. (2010), who find estimates between 0.473 and 0.510. Interestingly, and providing first evidence of assortative mating, the spousal IQ correlation is 0.400. The spousal education correlation based on years of schooling is 0.646, which is even higher and supports the idea of an important role of non-random spousal selection.
Intergenerational correlation of education
Secondary and high school
The correlations are substantially higher for the older age group than for the younger group.13 Several explanations are possible. First, the intergenerational transmission is likely to be a cumulative process; thus, the older the children become, the stronger the relationship between parental education and the educational outcome of their children. Second, it could also be due to the precision of the educational attainment indicator used in this study. The older the children are, the more values the indicator can take and, therefore, the correlations might be estimated with more precision.14
The correlations for the older age group are slightly below the correlation of 0.55 estimated by de Hoyos et al. (2010) for children born between 1972 and 1981 in Mexico. The likely reason for the difference is that de Hoyos et al. (2010) use older individuals who have completed their education. Given the increase from the younger to the older age group, we would very likely obtain values similar to those of de Hoyos et al. (2010) if we could include older individuals.
Finally, Table 3 gives a comparison between the used and the full sample for the main variables of interest. We can see that the differences are not statistically significant for father’s education and both parental ability measures. For consumption the difference is only significant at the 10 % level. For other variables we observe statistically different means, which is not very surprising with that many observations. However, by looking at the column ‘Diff/SD’ we can see that the difference in terms of standard deviation of the variable never exceeds 0.2; thus they are probably not as problematic as the statistical tests might lead us to think.
Table 3 Comparison of sample with excluded observations
Ability of the child
Number of children
Number of teenagers
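The ‘Diff/SD’ measure discussed for Table 3 expresses a difference in means in units of the variable’s standard deviation. A minimal sketch of such a measure is given below. Which standard deviation enters the denominator is not stated in the text, so the pooled population standard deviation used here is an assumption, as are the function name and the toy values.

```python
from statistics import mean, pstdev


def diff_over_sd(used, excluded):
    """Difference in means between two groups, scaled by the pooled SD.

    The pooled population SD is an assumption made for this illustration.
    """
    pooled = list(used) + list(excluded)
    return (mean(used) - mean(excluded)) / pstdev(pooled)


# Toy values for one variable (e.g. years of education), not from the MXFLS
used_sample = [9, 10, 11, 12]
excluded_sample = [8, 9, 10, 11]
standardized_difference = diff_over_sd(used_sample, excluded_sample)
```

Unlike a t-test, this measure does not shrink or grow with the number of observations, which is why it is a useful complement when samples are large and even small differences are statistically significant.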
In Sect. 3 I introduced the two approaches to estimate the intergenerational transmission in education. I now present the results following the same structure. First, I present the single regression approach, where I estimate simple OLS and IV models, and then I move to the discussion of the simultaneous equations model.
5.1 Single regression approach
Table 4 Single equation results (OLS and IV estimates)
Ability of the child\(^+\)
Worked in 2002
Worked in 2005
Number of children
Number of teenagers
Child’s ability instrumented
Parental education instrumented
Hansen J-statistic (p value)
Weak instr. test (statistic)
Weak instr. test (p value)
Endogeneity test (\(\chi ^2\) stat.)
Endogeneity test (p value)
Let us first discuss the main coefficients of the OLS regression, all of which are reported as standardized coefficients. The ability of the child has a strong and highly significant effect on the schooling outcome. The direct effect of parental education is also highly significant and positive. The effect of the mother is larger than that of the father, which is in accordance with the literature. Both indicators of the economic situation of the family display positive and significant effects. Note that this estimation does not directly allow us to draw conclusions about the biological channel, as we do not estimate the link between parental ability and the child’s ability. To see whether these OLS estimates are reliable, I now turn to the IV estimates. First, we can see in models IV-1 and IV-2 that the coefficient on the child’s ability does not change much compared to the OLS estimate. The endogeneity test based on Baum et al. (2007) indicates that the variable is not endogenous and that instrumenting it is therefore not required. However, this test is only valid under the assumption of valid instruments. To test the validity, I use the Hansen J statistic, which indicates that the instruments are valid.15 Hence, for the ability measure of the child we do not seem to have an endogeneity problem.16 As mentioned earlier, this might be due to a large extent to the nature of the cognitive ability test, which is much less related to education and cultural aspects than other ability measures.
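Since all coefficients are reported in standardized form, it may help to see why such coefficients do not depend on the scale of the regressors, for instance on the normalization of the ability scores to mean 100 and standard deviation 15. The sketch below uses the closed-form expression for standardized OLS coefficients with two regressors. It is an illustration only: the function names and the toy data are assumptions, and the models in this paper contain many more covariates.

```python
from statistics import mean, pstdev


def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (pstdev(x) * pstdev(y))


def standardized_betas(y, x1, x2):
    """Standardized OLS coefficients of y on x1 and x2 (two-regressor closed form)."""
    r_y1, r_y2, r_12 = pearson(y, x1), pearson(y, x2), pearson(x1, x2)
    denom = 1 - r_12 ** 2
    return (r_y1 - r_y2 * r_12) / denom, (r_y2 - r_y1 * r_12) / denom


# Toy data, not from the MXFLS
ability_raw = [1, 2, 3, 4, 5, 6]
mother_edu = [6, 9, 6, 12, 9, 12]
edu_index = [0.8, 0.9, 1.0, 1.1, 1.0, 1.3]

# Rescale ability to mean 100 and SD 15, as done for the cognitive scores
m, s = mean(ability_raw), pstdev(ability_raw)
ability_iq_scale = [100 + 15 * (a - m) / s for a in ability_raw]

betas_raw = standardized_betas(edu_index, ability_raw, mother_edu)
betas_rescaled = standardized_betas(edu_index, ability_iq_scale, mother_edu)
```

The two sets of coefficients coincide up to floating-point error, because the correlations that enter the formula are unaffected by shifting or positively rescaling a variable.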
First stage regressions
Father’s ability measure\(^+\)
Mother’s ability measure\(^+\)
Father grew up in a town
Mother grew up in a town
Worked in 2002
Worked in 2005
Number of children
Number of teenagers
Ability of the child\(^+\)
Angrist-Pischke \(p\) value
The Angrist-Pischke F-statistic is very large and suggests that the instruments are strong (Angrist and Pischke 2009). Thus, in terms of the standard tests for IV regression, these estimates seem to be valid. However, there is another problem stemming from the underlying nature of the analysis, which goes beyond the standard challenges of IV. Note that both father’s and mother’s education are correlated with the cognitive ability measure and the place of living of either parent. This is a direct result of assortative mating. It is clear that these correlations are not causal. The consequence is that the instrumented variables of the two parents are very strongly correlated. I computed the predicted education of the father and the mother using the first stage regressions and found a correlation of nearly 0.95, while the actual parental education correlation is close to 0.65. Hence, in the main regression, we have a strong problem of multicollinearity, which can explain why the increase in the coefficient related to the father is compensated by the coefficient related to the mother. This problem can also be highlighted with the additional regressions IV-3b and IV-3c reported in Table 4. In these two regressions I excluded one of the two parents and used only the instruments related to the parent included in the regression. We can see that in both regressions the parameter of the included parent is highly significant.
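One way to gauge how serious a correlation of 0.95 between the two predicted education variables is, compared with the 0.65 observed for actual education, is the variance inflation factor. The sketch below uses the textbook formula for the case of two regressors. It is a simplification that ignores the other covariates and the additional sampling variability of IV estimation, so it should be read as an order of magnitude only.

```python
def variance_inflation_factor(r):
    """VIF for one of two regressors whose mutual correlation is r.

    Two-regressor simplification: other covariates are ignored.
    """
    return 1.0 / (1.0 - r * r)


vif_actual = variance_inflation_factor(0.65)     # actual parental education
vif_predicted = variance_inflation_factor(0.95)  # first-stage predicted education
```

Under these simplifying assumptions, the variance of each parental coefficient is inflated by a factor of about 1.7 with the actual education variables and by a factor of about 10.3 with the predicted ones, which is in line with the unstable coefficients observed when both parents are instrumented at the same time.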
Overall, the single equation approach provided very coherent and expected results. The education of the mother seems to matter slightly more than the education of the father. The wealth index seems to matter more than the short-run consumption and the cognitive ability of the child is also an important predictor of the schooling outcome. Finally, the endogeneity tests performed on the IV estimates did not allow us to conclude that we have a serious problem of endogeneity. I now move to the simultaneous equations model which will allow us to learn more about the different channels and their relative importance.
5.2 Simultaneous regression approach
Table 6 Estimation results of Eq. (6)
Model 1 is the complete model including ability, father’s and mother’s education and the economic situation proxied by two variables. These main regressors were accompanied by control variables such as gender, a rural area dummy, state fixed effects, social program dummies and child labor indicators which are not reported in Table 6. The full estimation results of model 1, including the remaining equations of the model, can be found in Table 11 in “Appendix B”.
Considering model 1, the estimation is quite precise and all coefficients are significant at the 1 % level. The coefficient on the child’s ability measure attains the highest value, at 0.218. Both father’s and mother’s education have a highly significant and positive effect. The coefficient for the mother is substantially larger than the one for the father. With respect to the economic variables, we can also observe a significant difference between the two: the effect of the wealth index is substantially larger than that of consumption. This finding is consistent with Carneiro and Heckman (2002), who argue that the long-run economic environment matters more than short-term credit constraints. When considering both economic effects together, we see that the economic situation has the largest direct intergenerational effect on the schooling outcome of the child. In general, these results are relatively close to what was found in the OLS regression in Table 4.
Model 1 is estimated on a sample of 4,266 individuals and could potentially suffer from a sample selection bias. As discussed in the data section, I also estimate the full model relaxing some requirements on the data. In a first step, I include single-mother households by dropping the channel of the father, which increases the sample size to 6,547 individuals. In a second step, I include all households where data are available on either of the parents, taking the maximum value when both are available. This allows us to include 143 additional individuals, because in model 1 some were dropped just because of one missing parental characteristic. These larger sample estimates are reported in Table 11 in “Appendix B”. The general pattern is very encouraging, as almost no changes in the main regression are observed. Most coefficients increase slightly, but remain at very similar levels. The relative importance of the effects remains unchanged. Overall, these additional regressions lend some support to the validity of model 1, since the results hold even when the sample changes considerably. In what follows, I take model 1 as the benchmark model, as it is the only one allowing us to control for all the different channels.
5.2.1 Direct versus indirect effects
Let us now return to the discussion of model 1 from Table 6. An interesting feature of simultaneous equation models is that they enable us to compute direct and indirect effects. For example, it is clear that parental education does not only affect the schooling outcome through the direct effect discussed before, but also through the economic situation of the family. Figure 4 shows the direct and indirect effects based on the results of model 1, fully reported in Table 11. As in the discussion before, one can easily see that the ability measure of the child has the largest direct effect (black bars). The wealth index has the second largest direct effect, followed by the mother’s education. However, the total effect of mother’s education is larger than the total effect of the wealth index. This is due to the fact that besides the direct effect we also have an indirect effect of maternal education through both economic indicators. The same is true for the father, where the relative importance of the indirect effect is even bigger. Nevertheless, the total effect of father’s education remains smaller than the one of the mother.
Finally, parental ability has no direct effect but only indirect effects through the genetic transmission and the other two channels. The total effects attain values of about 0.13 for the mother and 0.10 for the father.
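The decomposition into direct and indirect effects follows the usual logic of path analysis: the total effect of one variable on another is the sum, over all directed paths connecting them, of the products of the coefficients along each path. The sketch below illustrates this rule. The variable names and all coefficient values are hypothetical and chosen for readability; they are not the estimates of model 1.

```python
def total_effect(paths, source, target):
    """Sum over all directed paths from source to target of the product of coefficients.

    paths: dict mapping (cause, effect) -> standardized coefficient.
    The path diagram must be acyclic, otherwise the recursion does not terminate.
    """
    total = 0.0
    for (cause, effect), coef in paths.items():
        if cause != source:
            continue
        if effect == target:
            total += coef  # direct path
        else:
            total += coef * total_effect(paths, effect, target)  # path via a mediator
    return total


# Hypothetical coefficients, NOT the estimates reported in the paper
paths = {
    ("mother_edu", "schooling"): 0.10,
    ("mother_edu", "wealth"): 0.30,
    ("mother_edu", "consumption"): 0.20,
    ("wealth", "schooling"): 0.15,
    ("consumption", "schooling"): 0.05,
    ("mother_ability", "child_ability"): 0.25,
    ("mother_ability", "mother_edu"): 0.30,
    ("child_ability", "schooling"): 0.20,
}

direct = paths[("mother_edu", "schooling")]
total = total_effect(paths, "mother_edu", "schooling")
indirect = total - direct
ability_total = total_effect(paths, "mother_ability", "schooling")
```

With these invented numbers, maternal education has a direct effect of 0.10 and an indirect effect of 0.30 × 0.15 + 0.20 × 0.05 = 0.055 through the two economic indicators. Maternal ability has no direct path to schooling in this diagram, so its total effect consists of indirect paths only, as in the discussion above.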
5.2.2 Biases when neglecting channels
Models 2–5 in Table 6 each include only one of the four possible channels, assuming the others to have no impact. The last model includes the often available data on the education of parents and the economic situation but not the ability measures. The one-covariate models always give strongly upward biased estimates of the coefficients when compared to the benchmark model in the first column. Not surprisingly, the bias in relative terms is lower for the important channels, namely ability and the wealth index, where the new coefficient is roughly 1.5 times higher than in model 1. The upward bias for parental education is much larger, since the coefficient attains 2–3 times higher values for mother’s and father’s education, respectively. However, the biases become much smaller when all variables but the ability measure are included. Because ability measures are missing in most surveys, this setting corresponds to the best we can normally do. The coefficients are about 20 % higher than in the benchmark model, which is considerably less than in models 2–5. More importantly, the relative importance of the coefficients in model 6 is very similar to that in model 1.
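The direction of these biases can be understood with the standard omitted-variable logic. In a linear regression with standardized variables, the coefficient obtained when only one regressor is kept equals its coefficient in the full model plus, for each omitted regressor, that regressor’s coefficient times its correlation with the kept one. The sketch below applies this identity. It relies on single-equation OLS reasoning rather than on the simultaneous equations model estimated here, and all numbers are hypothetical.

```python
def one_covariate_coefficient(beta_own, omitted):
    """Standardized slope obtained when only one regressor is kept.

    beta_own: standardized coefficient of the kept regressor in the full model.
    omitted:  list of (beta_k, r_k) pairs, where beta_k is the full-model
              standardized coefficient of an omitted regressor and r_k is its
              correlation with the kept regressor.
    """
    return beta_own + sum(beta_k * r_k for beta_k, r_k in omitted)


# Hypothetical values, NOT the estimates reported in Table 6
beta_full = 0.10
biased = one_covariate_coefficient(beta_full, [(0.20, 0.5), (0.15, 0.4)])
relative_bias = biased / beta_full
```

With these invented values the one-covariate coefficient is 0.26, that is, 2.6 times the full-model coefficient. The example also shows why the relative bias tends to be larger for channels with a small own coefficient: the same absorbed amount weighs more when the benchmark coefficient is small.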
5.2.3 Regression by age groups
Table 7 Estimation results of Eq. (6) for different samples
As for the simple correlations, I find differences between the two age groups. In general, the coefficients are slightly higher for the older group. A sharp increase is observed for the effect of mother’s education. The model fit also increases substantially from the younger to the older age group. As for the simple correlations presented before, there are several possible explanations. First, it could be argued that this is due to a more precise measure of the dependent variable for the older age group. The second explanation is that the inequalities in education are a cumulative process and that the relative importance of the channels can evolve with the age of the child. Most likely both phenomena are present in these results. The fact that all indicators become more important supports the idea that the measurement is more precise for the older group. However, the fact that not all explanatory variables increase their effect in the same way points to something beyond this argument. In particular, the coefficient of the mother’s education increases substantially more than the others. Hence, we might have reasons to believe that the impact of the mother becomes more important with age. This could be due to the role of the mother in pushing the child to continue at school. Of course, additional research is required to confirm this conclusion, because the results could also be driven by the greater precision of the dependent variable.
5.2.4 Regression by gender
Given that mother’s and father’s education have different effects, it might be interesting to see whether the effects are also different for boys and girls. The last two columns in Table 7 present model 1 for girls and boys, respectively. We can see that the two economic indicators have slightly higher coefficients for boys, while the child’s ability seems to matter a bit more for girls. The education of the father is somewhat more important for boys. A large difference can be observed for the role of mother’s education, which has an effect almost twice as large for girls as for boys. The exact reasons for this difference are beyond the scope of this analysis, but it might be a very interesting question for future research.
The present study tries to contribute to the understanding of the mechanisms generating the high intergenerational education correlations observed all over the world and especially in Latin American countries. A particularly important issue is to distinguish the different channels of transmission outlined by the literature over the past years. Using very rich data from Mexico, a simultaneous equations model of the educational transmission can be estimated, allowing me to distinguish between the different channels: the biological transmission of ability, transmission through the economic situation and the education-to-education channel. Additional channels such as health or non-cognitive abilities are not considered in this study. Unfortunately, the data and especially the unavailability of retrospective information on health did not allow me to include such channels. However, these channels might be important as they might upward bias the importance of the included channels, particularly the education-to-education channel. This caveat must be kept in mind when discussing the results.
The results suggest that the economic situation of the family is the most important direct intergenerational channel, with an even larger effect than the ability of the child when the effects of both economic indicators are considered together. In the analysis, I distinguish between consumption as a proxy for the current economic situation and a wealth index capturing the long-term economic situation. I find a larger effect of the wealth index, which is in accordance with findings in the literature. Parental education matters in explaining children’s schooling, but not as strongly as the intergenerational correlation might lead us to expect. The mother’s education directly and significantly influences the schooling outcome of children. The education of the father also affects the schooling outcome directly and additionally has a strong indirect effect through the economic situation of the family.
The finding that the economic situation plays an important role suggests that the current situation is likely to be inefficient: investment in the education of poorer children is below the optimal level and, therefore, these children cannot exploit their potential. On the other hand, the finding is encouraging in the sense that low educational mobility does not seem to be inevitable. The strong influence of the economic situation of the parents can be targeted by public policies. In this respect, cash transfer programs (conditional or not) might help to increase social mobility, as they allow poorer families to invest more in education. In recent years many such programs have been implemented and it is, therefore, possible that they already generate beneficial effects in terms of educational mobility.
In addition to the main results I also performed the same analysis on sub-samples. These additional estimates provided interesting insights.
First, the intergenerational links are higher for older children. All coefficients increase with the age group and particularly the education of the mother becomes much more important with age. This result might suggest that the intergenerational links are following a cumulative process, suggesting that even at higher ages policy interventions can be useful. However, the differences found for the different age groups could also be due to the more precise measure of the educational attainment for the older age group. Additional research is required to distinguish the two possible explanations found in this study.
Second, I find differences in the relative importance of transmission channels between girls and boys. The economic situation of the family matters slightly more for boys while the ability of the child is somewhat more important for girls. The biggest gender difference is found for maternal education. The effect of maternal education is almost twice as high for girls as compared to boys.
The analysis demonstrates that estimates ignoring important alternative channels of transmission tend to overestimate the effects of the analyzed variables. Remaining unobserved channels such as personality traits could bias my results upward to a certain extent. However, the data used do not allow me to consider additional channels, as they would imply a large drop in the sample size and aggravate the problem of a non-random sample. Finally, the analysis should be seen as one piece among others in the recent literature aiming at understanding the mechanisms of educational mobility. For future research I see mainly three interesting directions. First, it would be useful to conduct similar analyses for other countries with low educational mobility to see whether the findings also hold outside the Mexican context. Second, the results suggest that cash transfer programs could potentially help to increase educational mobility. Future research could look at this effect and try to find out more about the most effective features of such programs. Third, while most effects were relatively stable across sub-samples, the effect of maternal education changes substantially with the age and gender of the child. It would, therefore, be interesting to further investigate the role of the mother in the intergenerational transmission of education.
These variables include the number of children (up to 12 years old) and teenagers (12–18 years old) in the household and a dummy for the first-born child.
The use of parental cognitive ability to instrument parental education could be problematic if the ability measure was influenced by education. However, as mentioned in the data section, the RPM test used is less education dependent than other measures of cognitive ability. Therefore, I argue that the assumption of no reverse causality might be reasonable. Additionally, I discuss the empirical tests related to the validity of the instruments.
The original variable included more categories to describe the situation outside towns. However, they were relatively unclear and did not provide additional explanatory power. For this reason, I regrouped all non-town answers.
Note that ability refers to cognitive ability and does not include non-cognitive ability. For this reason, I do not include indicator variables for non-cognitive abilities and estimate a latent factor. This choice allows me to focus on the nature and not on the nurture effect.
The original Spanish name is Encuesta Nacional sobre Niveles de Vida de los Hogares (ENNVIH).
For instance, if in one wave the father was younger than the child and in the other wave the difference was plausible, then only the plausible value was taken. However, if there was no plausible value, the observation was dropped.
Note that when plotting the cumulative density function by age, we even find for the youngest children a similar range of values.
Experimental regressions were performed including such variables in order to see if their exclusion alters the results. I discuss these briefly in the “Appendix E”.
Using the official Mexican equivalence scales based on CONEVAL (2008) gives essentially the same results, where only the third decimal place changes by at most two units. I prefer to follow Rojas (2007) as his definition is concave in the number of people, while the official equivalence scales are not.
Unfortunately, it is not possible to use exclusively the information on the place of living when people were of school age, because the variable is measured differently. I correct the variable rural only in cases where people reported that they lived in a city during their education and in a rural area at the time of the survey. The large majority of the individuals (around 90 %) never changed their place of living and, therefore, the information on the place of living at the time of the survey is accurate for the education period as well.
Note that when computing the same correlation for younger children (say 7–11 years), the values are even lower.
This argument applies even more strongly to younger children of primary school age. A previous version of this study included them. They were removed mainly because the educational attainment indicator is not sufficiently precise for the youngest individuals.
The null hypothesis of the test is that the instruments are valid.
In regression IV-2, the null hypothesis can be rejected at the 10 % level. I therefore performed the endogeneity test on child’s ability only in model IV-1, where the instruments are clearly valid. The test also shows that child’s ability is not endogenous.
However, the two coefficients are not significantly different from each other (\(p\) values of 0.344 and 0.320 for IV-1 and IV-3, respectively). Hence, they do not contradict the results found in the OLS regression.
Except for the state fixed effects and the dummies for beneficiaries of government programs other than Oportunidades.
The covariance matrix is symmetric. For presentational purposes, I show only the upper triangular part.
I only assume uncorrelated error terms for the ability transmission Eq. (1) and the child’s education Eq. (6). This last assumption is confirmed by the IV estimates I will present in Sect. 5.1. Note, however, that, as I will show below, these restrictions are not required for identification.
I excluded the parameters of some control variables to save space. They are not required for identification.
I am grateful for very helpful comments on earlier versions of this paper by Tobias Müller, Jaya Krishnakumar, Marcelo Olarreaga, Dirk Van de gaer, Duncan Thomas and the participants at conferences and seminars in Geneva, Buenos Aires, Bordeaux, Washington DC, and Mexico City and two anonymous referees. A special thanks to Isidro Soloaga for enlightening discussions on the topic and to Ian MacKenzie for correcting the writing of an earlier version.
This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.