Throughout this analysis, we look at the relationship between intergenerational educational mobility and four outcome variables at age 25/26: the probability of employment, log annual wage, hours worked, and log hourly wage. Employment is a binary variable indicating whether the individual is employed. Hours worked is a continuous variable indicating how many hours one works during a usual work week. Log wages are continuous variables capturing the natural logarithm of self-assessed gross annual and hourly wages. Naturally, we observe wages only for those who were working at the time of the data collection and reported wage data. As mentioned before, on average, 81% of the sample work and 82% of those employed report wage data (Table 1). In Table A6 in Appendix A, we provide a robustness check to investigate any potential estimation bias due to selection into employment and into reporting a wage.
We are using observational data and cannot exploit a random or natural experiment to identify the causal effects of being FiF on labor market outcomes. We do not claim that our findings are causal; instead, we aim to decrease the selection bias by using a rich set of control variables, including prior educational attainment to control for ability and compulsory school progression, to get closer to the causal impacts of intergenerational educational mobility on labor market outcomes. As robustness checks, we explore two quasi-experimental methods: entropy balancing and propensity score matching.
This paper looks at FiF graduates from two angles. First, we look at differences in the labor market outcomes of FiF and non-FiF university graduates. Second, we estimate returns to graduation among those who could have gone to university based on their secondary school achievements.
Comparing the labor market outcomes of FiF and non-FiF graduates
We start by examining whether being first in family influences the probability of employment, hours worked, and wages among graduates, conditional on pre-university individual characteristics. Note that being FiF, i.e. parental education, could theoretically have already affected some of these characteristics well before going to university (such as test scores at age 11 and 16), and thus, they might be bad controls (Angrist and Pischke 2008). This would most likely cause a downward bias in the magnitude of the estimated FiF coefficients. To address these concerns, we differentiate between control variables and potential channels of the effects of being FiF on labor market outcomes based on the timing of observation. Individual characteristics observed before university participation are considered as controls and are included in our main model in Section 5.1, while variables observed after going to university are considered as channels and added to the model in Section 5.2.
We estimate the following linear regression models:
$${y}_{i} = {a}_{1} + {b}_{1} \times {\mathrm{FiF}}_{i} + {c}_{1} \times {X}_{i} + {u}_{1i} \tag{1}$$
where
- $y_i$ is one of the four outcome variables;
- $\mathrm{FiF}_i$ is a binary variable taking the value “1” when neither of the individual’s (step) parents has a university degree;
- $X_i$ is a vector of pre-university individual characteristics; and
- $u_{1i}$ is an error term, with standard errors robust and clustered by sampling schools.
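As an illustration of how Eq. (1) could be estimated with school-clustered standard errors, the following is a minimal sketch in Python with NumPy. The software choice, the function name `ols_cluster`, the synthetic data, and the absence of a small-sample correction are all assumptions of this sketch and are not taken from the paper.

```python
import numpy as np

def ols_cluster(y, X, clusters):
    """OLS coefficients with cluster-robust (sandwich) standard errors.

    X must already contain a column of ones for the intercept.
    No small-sample correction is applied (an assumption of this sketch).
    """
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    # "Meat" of the sandwich: sum over clusters of (X_g' u_g)(X_g' u_g)'
    meat = np.zeros((X.shape[1], X.shape[1]))
    for g in np.unique(clusters):
        idx = clusters == g
        score = X[idx].T @ resid[idx]
        meat += np.outer(score, score)
    cov = XtX_inv @ meat @ XtX_inv
    return beta, np.sqrt(np.diag(cov))

# Synthetic illustration only; none of these numbers come from Next Steps.
rng = np.random.default_rng(0)
n, n_schools = 2000, 100
school = rng.integers(0, n_schools, size=n)       # hypothetical school IDs
fif = rng.integers(0, 2, size=n).astype(float)    # FiF indicator
x = rng.normal(size=n)                            # stand-in for X_i
school_shock = rng.normal(scale=0.3, size=n_schools)[school]
y = 2.0 - 0.10 * fif + 0.50 * x + school_shock + rng.normal(scale=0.5, size=n)

X = np.column_stack([np.ones(n), fif, x])
beta, se = ols_cluster(y, X, school)
print("b1 (FiF):", beta[1], "clustered SE:", se[1])
```

Clustering by school allows the error terms of individuals sampled from the same school to be correlated, which is why the variance is built from within-school sums of scores rather than from individual observations.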
In the first model, we do not include any control variables besides FiF (model 1). In model 2, we control for whether the individual belongs to the boost sample. Then, in model 3, following the empirical strategy of Blundell et al. (2005) and Belfield et al. (2018), we control for demographic and family background characteristics: individuals’ age measured in months, ethnicity, fixed effects (FE) for the region of school at age 13/14, whether individuals were born in the UK, mother’s and father’s age, mother’s and father’s social class, and the number of siblings, all measured when individuals were aged 13/14, and free school meal (FSM) eligibility at age 15/16. Model 3 also controls for whether individual i belongs to the sample boost added to the survey in wave 4 (see footnote 5).
Lastly, in model 4, we extend the model with key stage 2 exam score quintiles (see footnote 6), measured at age 11, in math and reading as a proxy for cognitive abilities, and with quintiles of the capped linear GCSE (key stage 4) score (see footnote 7), measured at age 16, to control for educational progression in compulsory schooling. We include the quintiles of test scores instead of their continuous values because this allows us to include a missing category for the proportion of our dataset that was not successfully linked to administrative education data. To make sure that our results are not driven by missing values (or by the categorization), we provide a robustness check in Table A3 in Appendix A where we use the scores themselves and apply mean imputation (and a separate missing dummy) for the missing values.
We consider model 4 as our main model. We control for the missing values of the explanatory variables using missing flags as mentioned above, except in the case of first in family. The number of missing values of FiF among graduates is eight among men and 10 among women in the total sample of graduates and six and nine, respectively, among those reporting hourly wage. We drop these observations and provide a robustness check showing that dropping them does not drive our results. In particular, we re-estimate our main results allocating either 0 or 1 to all individuals with missing FiF and show in Table A5 in Appendix A that our results stay similar. We also provide a robustness check in Table A3 in Appendix A where we employ mean imputation for the missing values of the key control variables. The descriptive statistics of all variables in the models are shown in Table A1 in Appendix A of Adamecz-Völgyi et al. (2022).
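The two ways of handling unlinked test scores described here can be sketched as follows. This is a minimal Python/NumPy illustration under assumptions: the function names, the coding of the missing category as 0, and the toy scores are hypothetical and do not come from the paper.

```python
import numpy as np

def quintile_with_missing(scores):
    """Return codes 1-5 for quintiles of observed scores and 0 for missing."""
    scores = np.asarray(scores, dtype=float)
    observed = ~np.isnan(scores)
    # Interior cut points at the 20th, 40th, 60th and 80th percentiles
    cuts = np.percentile(scores[observed], [20, 40, 60, 80])
    codes = np.zeros(scores.shape, dtype=int)   # 0 = separate missing category
    codes[observed] = np.searchsorted(cuts, scores[observed], side="left") + 1
    return codes

def mean_impute_with_flag(scores):
    """Robustness variant: mean-impute missing scores and add a missing dummy."""
    scores = np.asarray(scores, dtype=float)
    missing = np.isnan(scores)
    imputed = np.where(missing, np.nanmean(scores), scores)
    return imputed, missing.astype(int)

# Toy scores with one unlinked (missing) record
scores = np.array([10.0, 20.0, 30.0, 40.0, 50.0, np.nan,
                   60.0, 70.0, 80.0, 90.0, 100.0])
codes = quintile_with_missing(scores)            # categorical control
imputed, flag = mean_impute_with_flag(scores)    # continuous control plus dummy
print(codes.tolist())
print(imputed[5], flag.tolist())
```

In a regression, the codes would enter as a set of indicator variables, so that individuals without linked scores stay in the estimation sample instead of being dropped.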
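The bounding exercise for missing FiF values can be sketched in a few lines. This is an illustrative Python/NumPy snippet under assumptions: the helper name `fif_bounds` and the toy vector are hypothetical. Each returned vector would then replace FiF in a re-estimation of the main model.

```python
import numpy as np

def fif_bounds(fif):
    """Return two copies of FiF with missing values set to 0 and to 1."""
    fif = np.asarray(fif, dtype=float)
    missing = np.isnan(fif)
    return np.where(missing, 0.0, fif), np.where(missing, 1.0, fif)

# Toy indicator with two missing values
fif = np.array([1.0, 0.0, np.nan, 1.0, np.nan])
fif_all0, fif_all1 = fif_bounds(fif)
print(fif_all0.tolist(), fif_all1.tolist())
```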
We provide three further robustness checks to these main results in the Appendix. First, we apply two quasi-experimental evaluation methods in Section A4: entropy balancing and propensity score matching. These results confirm that the negative FiF wage gap is robust among women; however, the positive FiF wage gap among men is not (Table A4).
Second, as mentioned before, we do not observe wage data for all individuals. We aim to control for selection into employment and into reporting a wage using a selection model (Heckman 1979) in Table A6 in Appendix A. While we have to rely on the same control variables that we used before (i.e. no exclusion restriction), we believe that, because these models are estimated on the full sample, we still exploit additional information. These results again confirm that the negative FiF wage gap in hourly wages is robust among women; however, the positive FiF wage gap among men is not.
Finally, we apply two methods to look at the potential channels of the estimated relationship between FiF and labor market outcomes. First, we extend the main model (model 4) with a set of university and post-university variables using the same regression framework in Section 5.2. Second, we decompose the raw FiF gaps using a Kitagawa-Blinder-Oaxaca decomposition (Blinder 1973; Oaxaca 1973) and estimate the share of the gap originating from the different distribution of individual characteristics (endowments) across FiF and non-FiF graduates in Section 5.3. This method reveals how large a share of the gap is the consequence of the different endowments of FiF and non-FiF graduates, and how large a share remains unexplained. We apply common coefficients estimated from a pooled regression (Neumark 1988); thus, the estimated coefficient of the unexplained gap is identical to the coefficient of FiF in a regression model that pools together the data of the two groups and controls for FiF as well as the same control variables (as model 5 in Table 4). In other words, the unexplained gap in the pooled Oaxaca model is the gap that still remains after controlling for all control variables. The value added of the method compared to a regression is that it shows, in one step, how large the relative contribution of each endowment to the raw gap is as well as how the returns to these characteristics differ across the two groups.
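The equivalence between the unexplained gap and the pooled-regression FiF coefficient can be checked numerically. The sketch below uses Python with NumPy on synthetic data. It is a simplified two-fold decomposition with pooled coefficients from a regression that includes the group indicator; the function name, variable names, and data are assumptions of this sketch, and it covers only the endowment contributions, not group-specific returns.

```python
import numpy as np

def pooled_oaxaca(y, X, group):
    """Two-fold decomposition with pooled coefficients.

    y: outcome; X: (n, k) covariates without intercept;
    group: 1 = FiF, 0 = non-FiF.
    The pooled regression includes the group indicator, so the unexplained
    part equals the coefficient on that indicator.
    """
    n = len(y)
    Z = np.column_stack([np.ones(n), group, X])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    delta, beta = coef[1], coef[2:]
    raw_gap = y[group == 1].mean() - y[group == 0].mean()
    mean_diff = X[group == 1].mean(axis=0) - X[group == 0].mean(axis=0)
    contributions = mean_diff * beta          # per-covariate explained part
    explained = contributions.sum()
    unexplained = raw_gap - explained
    return raw_gap, explained, unexplained, delta, contributions

# Synthetic illustration only; not the paper's data.
rng = np.random.default_rng(42)
n = 1000
group = rng.integers(0, 2, size=n).astype(float)
X = np.column_stack([
    rng.normal(loc=-0.3 * group, scale=1.0),  # e.g. a prior attainment score
    rng.normal(loc=0.2 * group, scale=1.0),   # e.g. another endowment
])
y = 1.0 - 0.05 * group + X @ np.array([0.4, 0.1]) + rng.normal(scale=0.5, size=n)

raw_gap, explained, unexplained, delta, contributions = pooled_oaxaca(y, X, group)
print("raw gap:", raw_gap)
print("explained by endowments:", explained, "per covariate:", contributions)
print("unexplained:", unexplained, "pooled FiF coefficient:", delta)
```

The identity holds exactly because a regression with an intercept and a group indicator has residuals that average to zero within each group, so the difference in group means of the outcome splits into the indicator coefficient plus the coefficient-weighted difference in covariate means.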
Estimating returns to graduation
In Section 6, we estimate the returns to graduation for a subsample of Next Steps (including those who did and did not go to university) and look at whether they are heterogeneous by parental graduation. We follow Belfield et al. (2018) and construct a subsample of those who could theoretically have gone to university, i.e. achieved high-enough grades at the GCSE exams at age 16 (at least five A*-C GCSEs). This would have enabled them to pursue A-levels, and therefore university, and should assuage some concerns about the comparability of the control group. We then estimate the following wage models separately by gender:
$${\mathrm{wage}}_{i} = {a}_{2} + {b}_{2} \times {\mathrm{graduate}}_{i} + {c}_{2} \times {X}_{i} + {u}_{2i} \tag{2}$$
where
- $\mathrm{wage}_i$ is log hourly wages;
- $\mathrm{graduate}_i$ is a binary variable capturing whether individual i is a university graduate;
- $X_i$ is a vector of individual characteristics, which in some models includes:
  - $\mathrm{parents\_nodegree}_i$, a binary variable capturing whether individual i’s parents do not have university degrees; and
  - $\mathrm{FiF}_i$, the interaction of “parents_nodegree” and “graduate”;
- $u_{2i}$ is an error term, with standard errors robust and clustered by sampling schools.
We estimate Eq. (2) using ordinary least squares and sequentially introduce our control variables as before. In model 1, we do not control for any characteristics other than the variable of interest, “graduate”. In model 2, we add whether the individual belongs to the sample boost added to the survey in wave 4, along with demographic and family background characteristics (age in months, mother’s and father’s social class, region at age 13/14, ethnicity). In model 3, we add pre-university educational attainment (GCSE and A-level raw scores) as well as indicator variables for A-level subjects (math, sciences, social science, humanities, arts, languages, and other), whether the individual attended Level 3 studies, whether they obtained vocational qualifications, and whether they attended an independent secondary school at age 13/14. In model 4, we add potential FiF (i.e. parents without a university degree, non-graduates), and in model 5, we add the interaction term of potential FiF and whether or not the individual obtained a university degree. This allows us to disentangle the effects of an individual’s own graduation from their parents’ educational attainment.
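To illustrate how the interaction term in model 5 separates the return to graduation by parental education, the following is a minimal Python/NumPy sketch on synthetic data. It omits all other controls, and the coefficient values and variable names are assumptions chosen for illustration, not estimates from the paper.

```python
import numpy as np

# Synthetic illustration only; coefficient values are arbitrary.
rng = np.random.default_rng(1)
n = 4000
graduate = rng.integers(0, 2, size=n).astype(float)
parents_nodegree = rng.integers(0, 2, size=n).astype(float)  # potential FiF
fif = graduate * parents_nodegree                            # interaction term
log_wage = (2.0 + 0.20 * graduate - 0.05 * parents_nodegree
            - 0.04 * fif + rng.normal(scale=0.3, size=n))

# OLS with intercept, own graduation, potential FiF, and their interaction
Z = np.column_stack([np.ones(n), graduate, parents_nodegree, fif])
coef, *_ = np.linalg.lstsq(Z, log_wage, rcond=None)
a2, b_grad, b_par, b_fif = coef

# Return to graduation for those with graduate parents
return_parents_degree = b_grad
# Return to graduation for potential FiF individuals
return_potential_fif = b_grad + b_fif
print(return_parents_degree, return_potential_fif)
```

Without further controls this specification is saturated in the two binary variables, so each implied return equals the difference in mean log wages between graduates and non-graduates within the corresponding parental-education group. Once the vector of controls is added, the coefficients are conditional on those controls and this exact equality no longer applies.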