1 Introduction

Overeducation is a widespread phenomenon that characterizes the labour market in developed countries, especially for recent tertiary graduates. It refers to the attainment of an educational level that is higher than the one required to perform the current job adequately (i.e. human capital surplus). It is usually understood as a market phenomenon resulting from the increase in the number of university graduates and the inability of labour demand to absorb them in the short run (Chevalier & Lindley, 2009). The economic consequences of overeducation have been widely documented. Overeducation is negatively related to satisfaction (e.g. Allen & van der Velden, 2001), job productivity (e.g. Duncan & Hoffman, 1981) and workplace labour relations (e.g. Belfield, 2010), and translates into a wage penalty for overeducated workers relative to their well-matched peers (e.g. McGuinness & Bennett, 2007). The latter is consistent with assignment theory (Sattinger, 1993), by which wages are determined by workers’ human capital and job characteristics together with potential educational mismatch. Furthermore, mismatched workers experience longer unemployment spells (e.g. Ordine & Rose, 2011). Because of this, the identification of the drivers of overeducation is relevant to policy.

A growing body of research has explored the factors that distinguish overeducated from well-matched workers. Great emphasis has been placed on the effect of regional mobility, both before (e.g. Di Pietro, 2012) and after graduation (e.g. Jauhiainen, 2011), on-the-job training (e.g. Alba-Ramírez, 1993), internships (e.g. Meroni & Vera-Toscano, 2017), field of study (e.g. Barone & Ortiz, 2011) and job search strategies (e.g. Albert & Davía, 2018). However, less attention has been paid to the potential existence of gender differences in overeducation. McGuinness, Bergin, et al. (2018) show that overeducation is higher among females in most European countries. This raises important policy implications for both higher education and labour market institutions. According to the theory of differential overqualification (Frank, 1978), gender differences emerge due to females’ greater reluctance to migrate and their stronger preference for fields of knowledge with larger overeducation prevalence (Buser et al., 2014). Nonetheless, empirical evidence on this is mixed. Whereas some scholars show females are more likely to be overeducated (Charalambidou & McIntosh, 2021; Di Pietro & Urwin, 2006; McGuinness & Bennett, 2007; Rubb, 2014), others do not find significant differences (Acosta-Ballesteros et al., 2018; Chevalier, 2003; McGoldrick & Robst, 1996). Since the college gender gap has reversed over time (Goldin et al., 2006), it seems relevant to explore whether human capital credentials are differently related to the risk of overeducation by gender.

This paper examines gender differences in the drivers of overeducation. Our research question is whether males and females exhibit different overeducation risks and whether the drivers of that risk differ by gender. Unlike most studies that compare differences in the likelihood of overeducation between males and females through a dummy variable, ceteris paribus, we assess if personal and background characteristics are differently associated with overeducation based on gender. We study whether pre- and post-graduation mobility, pre- and post-graduation labour experience, on-the-job training, first job search strategies, field of knowledge and English language proficiency translate into distinct overeducation likelihoods for males and females. Therefore, the paper contributes to the literature by shedding new light on whether educational credentials are differently valued depending on gender.

While several studies analyse overeducation in the general population (Bar-Haim et al., 2019; Battu & Sloane, 2004; Frei & Souza-Poza, 2012; Jauhiainen, 2011), we focus on a cohort of recent university graduates. Consistent with career mobility theory (Sicherman & Galor, 1990), overeducation is more prevalent among new entrants into the labour market and decreases with age (e.g. Alba-Ramírez, 1993). As individuals get experience and on-the-job training, they move to better matched jobs. As a result, younger cohorts are more exposed to overeducation for the same educational attainment. From this perspective, our paper is close to those by Aina and Pastore (2020), Caroleo and Pastore (2018), Cattani et al. (2018), Chevalier and Lindley (2009), Di Pietro and Urwin (2006), Dolton and Silles (2008) and Ordine and Rose (2009), who study the case of young graduates.

We analyse graduates’ overeducation in Spain, a country that has experienced rapid tertiary education expansion and exhibits one of the highest incidences of overeducation among young workers (McGuinness, Bergin, et al., 2018). Previous works on overeducation prevalence in Spain include those by Acosta-Ballesteros et al. (2018), Alba-Ramírez (1993), Albert and Davía (2018) and Turmo-Garuz et al. (2019). We use microdata from a representative sample of graduates observed in 2014 who graduated in the academic year 2009–2010. This graduation cohort is relevant for several reasons. Firstly, graduating during a recession translates into earnings decline in the long term, particularly for first entrants (Escalonilla et al., 2021; Oreopoulos et al., 2012). Moreover, educational mismatches between labour demand and supply have been shown to be greater during downturns (Liu et al., 2016). Therefore, young workers who graduated during the Great Recession might have a greater risk of being overeducated. Secondly, overeducation appears to have risen faster among females following the 2008 economic crisis (McGuinness, Bergin, et al., 2018), which highlights the need to study gender differences in overeducation following recessionary shocks. Thirdly, the scars and crowding-out effects from entering the labour market in a recession might be deeper for young graduates in Spain than in other European countries given the peculiarities of its labour market, which has exhibited persistent unemployment rates (Blanchard & Jimeno, 1995), low intensity of on-the-job training (Dolado et al., 2000), and a high incidence of temporary work (Dolado et al., 2002).

Despite being a widespread phenomenon, overeducation does not affect all population groups in the same way. Similar to Albert and Davía (2018) and Carroll and Tani (2015), we distinguish between graduates aged under 30 and those aged 30 or over at the time of the survey. As shown in some studies (e.g. Vera-Toscano & Meroni, 2021), birth cohorts exhibit different overeducation risks because of lifetime outcomes, which can conflate the role of labour experience (McGuinness, 2006). Moreover, by doing separate analyses for the two cohorts, we reduce the heterogeneity in unobservables within each age group.

We evaluate the classic conceptualization of overeducation as excess education based on the self-assessed method (Dolton & Silles, 2008). The statistical approach (Verdugo & Verdugo, 1989) and an overeducation measure based on the earnings penalty in the spirit of Gottschalk and Hansen (2003) are used as robustness checks. Therefore, overskilling and field of study mismatch are beyond the scope of the paper. First, we use matching estimators (Abadie & Imbens, 2006) to compare whether females and males exhibit different overeducation risks conditional on their characteristics and labour market participation propensities. Next, we estimate a Heckman probit regression that deals with self-selection into the labour market considering interaction terms between the drivers of overeducation and the gender dummy.

The remainder of the paper is structured as follows. After this introductory section, a review of the related literature is presented in Sect. 2. The database, definition of the variables and summary statistics are outlined in Sect. 3. We describe the econometric modelling in Sect. 4. The estimation results are shown and discussed in Sect. 5. Finally, Sect. 6 concludes.

2 Literature review

In this section, we first review the economic theories that explain overeducation and then outline the empirical evidence for its drivers. Finally, we discuss the rationale for the existence of gender differences in the overeducation prevalence.

2.1 Conceptual framework

The labour economics literature has proposed different explanations for the overeducation phenomenon, though all are related. Within Becker’s human capital framework (Becker, 1964), overeducation is the result of a temporary mismatch between firms’ technology and the human capital of the labour force that vanishes in equilibrium. However, sustained overeducation could arise as a penalty for the lack of other human capital components such as training, experience or ability. Relatedly, the career mobility theory (Sicherman & Galor, 1990) postulates that workers get into overeducated job positions because at the beginning of their labour career they lack clear signals about their productivity. Workers are thus temporarily employed in jobs with lower education requirements to acquire work experience (on-the-job training) that signals their true productivity and to have a greater probability of being promoted. From this viewpoint, overeducation is a short-run stepping-stone phenomenon. Once the signal becomes credible, workers are promoted to better jobs (Sicherman, 1991).

From a different perspective, the job-matching or job-shopping theory (Jovanovic, 1979) postulates that job switching takes place as a search for better quality and better matched jobs. Due to the existence of frictions in the labour market, matches are imperfect and so workers need to engage in an on-the-job search in pursuit of a better job (Dolado et al., 2009). An alternative explanation for overeducation is job competition theory (Thurow, 1975), by which workers compete for jobs based on their training costs. Under this framework, overeducation could be the result of the distribution of queues within occupations and the matching between schooling and job-specific requirements (Barnichon & Zylberberg, 2019). When the market demand for qualified jobs is scarce, only a low share of qualified workers is assigned to them, the rest being allocated to jobs requiring comparatively less education. Finally, assignment theory (Sattinger, 1993) considers that workers’ selection into job positions responds to a utility maximization problem that takes into account both monetary and non-monetary job characteristics. From this viewpoint, workers could accept overeducated job positions in exchange for other job aspects.

2.2 The drivers of overeducation

There are many factors that explain why some individuals are overeducated in their jobs. We group them into five categories: (i) socioeconomic factors; (ii) credentials and field of study; (iii) labour experience and on-the-job training; (iv) geographical mobility; and (v) job search strategies.

2.2.1 Socioeconomic factors

Several studies have analysed the incidence of educational mismatches based on nationality, focusing on differences among ethnic minorities. These studies typically find that overeducation is higher among non-whites (Battu & Sloane, 2004) and recently arrived migrants (Kifle et al., 2019). Regarding differences by age cohorts, Bar-Haim et al. (2019) document that tertiary education has become more necessary and less sufficient for younger cohorts to survive in a competitive labour market, which implies that younger generations are more exposed to overeducation. When it comes to family responsibilities, some studies have looked at the effect of having children. Empirical evidence on this is inconclusive: whereas some works detect a marginally significant relationship between having children and the risk of overeducation (Barone & Ortiz, 2011; Caroleo & Pastore, 2018; Jauhiainen, 2011; Rubb, 2014), others do not detect significant effects (Acosta-Ballesteros et al., 2018; Büchel & van Ham, 2003; Devillanova, 2013; Dolton & Silles, 2008).

Family background is another important determinant of educational outcomes and has also been associated with overeducation. Research has shown that those who come from less advantaged family backgrounds are more likely to be overeducated (Barone & Ortiz, 2011; Meroni & Vera-Toscano, 2017; Turmo-Garuz et al., 2019). In this sense, the father’s education speeds up the transition to a job (Ordine & Rose, 2015) and reduces the likelihood of overeducation (Ordine & Rose, 2009). This is because those who face higher schooling costs underinvest in education (Charlot & Decreuse, 2010). In the absence of valid data about family characteristics, some studies use information on whether the individual held a scholarship for tertiary education, finding that it is positively related to overeducation (Verhaest & Omey, 2010). In this vein, evidence by Denning et al. (2018) shows that grants for college students increase the probability of degree completion and later earnings.

2.2.2 Credentials and field of study

Since firms cannot directly observe workers’ abilities, all types of credentials that help signal a worker’s human capital play a role in lowering the overeducation prevalence. Graduating with distinction (high final marks) has been shown to increase weekly earnings after college graduation (Khoo & Ost, 2018) and to reduce the likelihood of overeducation (Albert & Davía, 2018; Devillanova, 2013; Di Pietro & Urwin, 2006; Turmo-Garuz et al., 2019; Verhaest & Omey, 2010). In this sense, Agopsowicz et al. (2020) document that graduates with low grade point averages (GPA) are less likely to hold a college job. The same applies to years of schooling, those graduating later than expected being more exposed to overeducation (Caroleo & Pastore, 2018; Ordine & Rose, 2009). Another relevant credential is knowledge of a second language, especially English. Albert and Davía (2018) find that a good level of English protects Spanish graduates from overeducation.

Universities and vocational programmes also play a role in students’ acquisition of human capital. Ordine and Rose (2009, 2011) point to universities’ instructional quality as a major determinant of overeducation. Their public or private character might be important due to their different financial resources. Additionally, Verhaest et al. (2018) point out that programmes that combine theoretical knowledge with specific workplace learning provide new labour entrants with qualifications that are immediately usable, thereby reducing educational mismatches.

Another issue that has attracted attention is whether overeducation is more concentrated in some academic fields. Barone and Ortiz (2011), Dolton and Silles (2008), Meroni and Vera-Toscano (2017), and Turmo-Garuz et al. (2019) find that graduates from humanistic areas are at a higher risk of being overeducated. In contrast, graduating from scientific, health and technical disciplines protects workers from overeducation (Acosta-Ballesteros et al., 2018; Aina & Pastore, 2020). This has been explained by the fact that technical fields of knowledge may put more emphasis in their curricula on the acquisition of specific skills (Reimer et al., 2008). Nevertheless, skills obsolescence also matters, since fast-changing occupations like STEM fields continually demand new abilities (Deming & Noray, 2020).

2.2.3 Labour experience and on-the-job training

According to the career mobility theory, on-the-job training and internships help workers both to signal their productivity levels and to invest in occupation-specific human capital (Barron et al., 1989), so that overeducation prevalence should decline smoothly with labour experience. However, empirical evidence on this is inconclusive. On the one hand, some scholars present evidence that the lack of experience of new labour market entrants makes them more exposed to overeducation (e.g. Sloane et al., 1996) and that experience is negatively related to the risk of overeducation (Kiker et al., 1997; Robst, 2008). On the other hand, other studies do not find significant differences (Dolton & Silles, 2008; Verhaest & Omey, 2010).

2.2.4 Geographical mobility

There is inconclusive evidence on the career benefits of transnational educational mobility (i.e. stays abroad during the completion of studies). A well-known example of an exchange programme is Erasmus. On the one hand, internationally mobile students exhibit better transitions to employment (e.g. Di Pietro, 2012) and achieve higher incomes after graduation (e.g. Kratz & Netz, 2018). This is due to the acquisition of soft skills related to cultural understanding, open-mindedness and sociability during international experiences (Crossman & Clarke, 2010). These social skills are highly valued in the labour market (e.g. Deming, 2017). What is more, those who engaged in Erasmus are more likely to move to other regions to find a better job match (Krabel & Flöther, 2014) and to work abroad later in life (Oosterbeek & Webbink, 2011; Parey & Waldinger, 2010). In this sense, Albert and Davía (2018) find that having studied abroad is negatively related to overeducation. However, other studies find that individuals who stayed abroad need more time to find a first job after graduation and are more likely to be overeducated (e.g. Wiers-Jenssen & Try, 2005), possibly due to problems with human capital transferability to local labour markets.

Since there are regional differences in the incidence of overeducation (Lenton, 2012), one way to overcome labour demand and supply mismatches is through mobility, either domestically or abroad. Jauhiainen (2011) shows that long-distance migration reduces the probability of overeducation. However, evidence on this is not robust. Devillanova (2013) finds that short distance mobility is negatively correlated with overeducation but emphasizes the need to control for workers’ characteristics. In this respect, a large body of research finds that immigrants and minority ethnic groups are more likely to be overeducated (Beckhusen et al., 2013). As such, mobility increases the likelihood of migrant graduates being matched but at the cost of higher overeducation exposure in the short run.

Another relevant factor is past job turnover. In line with the job-matching theory (Jovanovic, 1979), overeducated workers are those with shorter average job durations and a higher turnover (Alba-Ramírez, 1993). Rubb (2013) finds that it might be optimal for overeducated workers to engage in firm-switching behaviour for occupational mobility. Similar results are reported by Romanov et al. (2017), who show that the number of employers in the past three years is negatively related to the likelihood of overeducation.

2.2.5 Job search strategies

Although some studies show that overeducation is transient and is the first step to climb the occupational ladder (Frei & Souza-Poza, 2012), others find that being overeducated in the first job positively influences being mismatched in subsequent occupations (Acosta-Ballesteros et al., 2018; Baert et al., 2013; Meroni & Vera-Toscano, 2017). Therefore, first employment characteristics can exert an impact on the long-run prevalence of overeducation. In light of this evidence, another stream of research has explored the role of universities and third-party job placement institutions as intermediaries that connect recent graduates with firms. Carroll and Tani (2015) and McGuinness et al. (2016) report that job placement assistance by education institutions reduces overeducation incidence in graduates’ first jobs. Concerning the use of the internet for getting the first job, these authors show it is positively related to overeducation. Furthermore, old work networks have been shown to improve labour market opportunities for new entrants (Simon & Warner, 1992). Albert and Davía (2018) report that those who resort to temporary work agencies have a higher risk of underemployment and overeducation. The same finding holds for mass media and the internet. However, informal job search methods such as contacting employers directly or getting a job through friends and relatives are not related to skill underutilization.

2.3 Gender differences

The literature has explored gender differences in overeducation with mixed findings. Whereas most studies show that females are more likely to be overeducated (Barone & Ortiz, 2011; Charalambidou & McIntosh, 2021; Di Pietro & Urwin, 2006; Jauhiainen, 2011; Meroni & Vera-Toscano, 2017; Ordine & Rose, 2009; Rubb, 2014), others do not find significant differences (Acosta-Ballesteros et al., 2018; Aina & Pastore, 2020; Caroleo & Pastore, 2018; Chevalier, 2003; Devillanova, 2013; Dolton & Silles, 2008; Kiker et al., 1997). One argued reason for a gender gap in overeducation is the theory of differential overqualification (Frank, 1978). According to this theory, females are more exposed to overeducation because they are less mobile due to family responsibilities. However, McGoldrick and Robst (1996) do not find evidence supporting this.

Another source of gender differences could stem from differences in career prospects. Robst (2007) argues that males and females have different reasons for accepting a mismatched job position, and this directly relates to career expectations and preferences. Redmond and McGuinness (2019) document that females prefer jobs that are closer to home and offer job security, whereas males are more motivated by financial gains and promotion. A large body of literature has documented gender differences in risk-taking behaviour (e.g. Jetter & Watter, 2020), competitiveness (e.g. Gneezy et al., 2003) and ambition (e.g. Chevalier, 2007) in favour of males. This has been thought to explain the gender wage gap (e.g. Le et al., 2011), deterrence factors in relation to the job entry decision for females (e.g. Flory et al., 2015) and occupational choice (e.g. Kleinjans, 2009).

Furthermore, occupational sorting and gender segregation might contribute to explaining potential gender differences in overeducation. Females continue to be underrepresented in STEM fields and prestigious academic tracks (Buser et al., 2014). Recent evidence by Barone and Assirelli (2020) points out that early curricular track choices affected by peers’ preferences strongly impact the subsequent choice of field of study. Females’ engagement in STEM is affected by the share of females that graduate from STEM fields (Griffith, 2010). Moreover, females are more likely to leave the STEM field for majors that are less competitive (Astorne-Figari & Speer, 2019) and to prioritize the work-life balance (Jiang, 2021).

Finally, gender differences in overeducation could arise due to taste-based or statistical discrimination by employers (Guryan & Charles, 2013). Taste-based discrimination refers to the employer consciously prioritizing offering a matched job to a male due to biased preferences, even when male and female candidates have the same human capital. Statistical discrimination implies that statistical information about the group to which the worker belongs is used to infer his/her unobserved productivity. For instance, Deming et al. (2016) show that employers strongly prefer applicants with degrees from public institutions because they negatively value credentials from for-profit institutions. Oreopoulos (2011) reports that job applicants with foreign names and backgrounds are much less likely to receive a call back for a job interview. Additionally, gender differences could be explained by implicit discrimination (e.g. Bertrand et al., 2005), by which employers hold unconscious negative attitudes towards the capacity of females, especially in certain job positions. This relates to the existence of stereotypes, which shape beliefs about the capacities of others in different domains (Bordalo et al., 2019).

3 Data

3.1 Database

Our analysis is based on the 2014 Labour Insertion of University Graduates Survey conducted by the Spanish National Statistics Institute. This dataset is representative of all graduates in the Spanish university system during the 2009/2010 academic year. The data were collected between September 2014 and February 2015, leaving a minimum period of four years since the completion of university studies. The sample has a cross-sectional structure and consists of 30,379 observations.

As mentioned, this graduation cohort is of particular relevance because university graduates who enter the labour market in the middle of the Great Recession are the most exposed to overeducation in their jobs. Overeducation has been argued to arise due to mismatches between the labour supply and demand for educated workers (O’Leary & Sloane, 2016). During economic downturns, highly educated workers compete for a lower number of qualified job positions through underbidding the wages they demand (Leuven & Oosterbeek, 2011), thereby making new entrants during recessions more exposed to overeducation in their jobs. In this regard, several studies have shown that the poor performance of the youth labour market in Spain in terms of overeducation and high unemployment is partially due to highly educated workers crowding out lower educated ones because of the excess supply of college graduates (Dolado et al., 2000, 2009).

We follow Albert and Davía (2018) and Carroll and Tani (2015) and split the sample between graduates aged ‘less than’ and ‘equal or greater than’ 30 years at the time of the survey. They are denoted ‘less than’ and ‘over 30’ throughout the paper, those exactly 30 years old being included in the latter group. Consistent with the theory of career mobility (Sicherman & Galor, 1990), the literature typically finds that older workers are less likely to be overeducated (Rubb, 2003). Beyond this, recent studies like that by Bar-Haim et al. (2019) show that younger cohorts face greater labour market competition, plausibly due to a gradual upgrading of schooling requirements. Accordingly, we conduct separate analyses to inspect whether overeducation prevalence differs across birth cohorts.

3.2 Measuring overeducation

There is debate in the literature about the appropriate way to measure overeducation (Verhaest & Omey, 2006). Given the worker’s educational level, the main problem is how to define the required education to perform a certain job. Three alternatives have been proposed.

  1. Job analysis (JA): This procedure consists of developing ‘dictionaries’ of the different occupations by which experts value the education required for each job (Rumberger, 1987). Studies that rely on the JA approach include those by Acosta-Ballesteros et al. (2018), Baert et al. (2013) and Sellami et al. (2017). However, apart from being expensive and demanding, this approach is likely to become obsolete if the dictionary is not frequently updated (Hartog, 2000; McGuinness, 2006), especially in those occupations that are related to new technologies. Moreover, the same job occupation might have different educational requirements across industries and firms.

  2. Individual self-assessment (ISA): Workers are directly asked to report the necessary education level ‘to do’ their job. Obviously, this method has the drawback of being self-reported, which implies that respondents might consider their own expectations and disappointments (Cattani et al., 2018; Verhaest & Omey, 2010). As discussed in Hartog (2000), individuals might exhibit a tendency to overstate the job requirements to upgrade the status of their position. Moreover, this method could be subject to qualification inflation in firms’ hiring strategies in the case of newly hired workers (Barone & Ortiz, 2011). Nonetheless, it is the most used (Alba-Ramírez, 1993; Chevalier, 2003; Dolton & Silles, 2008; Ordine & Rose, 2009).

  3. The statistical approach (SA): This method defines overeducation by comparing the worker’s educational level with the mean (or mode) value within a given occupation. Some examples include Charalambidou and McIntosh (2021), Kiker et al. (1997) or Verdugo and Verdugo (1989). Although it is easy to implement, the SA merely reflects the credentials of all workers in a job position. Therefore, it can gather the education level ‘to get’ the job rather than ‘to do’ it (McGuinness, Pouliakas, et al., 2018). Other flaws include the aggregation of broad occupational groups and the fact that occupation averages might reflect historical entry requirements. Furthermore, because overeducation within occupations is the result of the interaction between demand and supply forces, this method does not reflect job requirements only (Leuven & Oosterbeek, 2011).
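As a rough illustration of the statistical approach, the following Python sketch flags a worker as overeducated when their education level exceeds the modal level in their occupation. The integer education ranks, the function names, the tie-breaking rule and the toy data are all assumptions made for illustration; they are not taken from the survey or from the cited studies, which may use the mean plus a threshold instead of the mode.

```python
from collections import Counter, defaultdict


def modal_education(workers):
    """Map each occupation to the most frequent education rank among its workers.

    `workers` is a list of (occupation, education_rank) pairs, where a higher
    rank means more education. Ties are broken in favour of the higher rank
    (an arbitrary choice made for this sketch).
    """
    counts = defaultdict(Counter)
    for occupation, level in workers:
        counts[occupation][level] += 1
    return {
        occ: max(c, key=lambda lvl: (c[lvl], lvl))
        for occ, c in counts.items()
    }


def overeduc_sa(workers):
    """Return 1 for workers whose rank exceeds their occupation's mode, else 0."""
    mode = modal_education(workers)
    return [int(level > mode[occ]) for occ, level in workers]


# Toy data: (occupation, education rank); the values are invented.
sample = [("clerk", 3), ("clerk", 3), ("clerk", 4),
          ("engineer", 4), ("engineer", 4)]
flags = overeduc_sa(sample)
```

Because the benchmark is computed from the workers actually observed in each occupation, the flag depends on who currently holds the job, which is the ‘to get’ versus ‘to do’ concern raised above.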

The three approaches have advantages and weaknesses. Verhaest and Omey (2010) examine how the effect of the determinants of overeducation is sensitive to the measure used. They show that objective and subjective assessments can lead to different results, but the JA does not result in more reliable outcomes than the ISA. A similar discussion is provided in Capsada-Munsech (2019) and McGuinness (2006). The use of one or another typically depends on the information available. Nevertheless, there is wider acceptance of the ISA, as illustrated by Capsada-Munsech (2019). The reason is that it measures ‘genuine’ rather than ‘apparent’ overeducation (Chevalier, 2003) because it exploits individual perceptions about the educational requirements to perform the job.

3.3 The self-assessment approach (ISA)

In the survey, respondents are asked the following:

Which do you think is the most suitable educational level to perform your current job?

This question mimics the one implemented in Alba-Ramírez (1993) and Dolton and Silles (2008) for assessing self-reported overeducation in Spain and the UK, respectively. It is also very close to the one implemented in the REFLEX survey (Barone & Ortiz, 2011; Capsada-Munsech, 2019). This question captures educational requirements ‘to do’ the job. This is different from the job recruitment standards used in Duncan and Hoffman (1981) and collected in the PIAAC survey, which could relate to screening issues. Answers to this question can be any of the following: (a) doctorate, (b) university degree, (c) higher national diploma, (d) certificate of higher education, or (e) compulsory education. This variable is denoted by EDUC_REQ.

To create an indicator of educational mismatch, we define a variable that compares the self-assessed educational level to perform the job adequately (EDUC_REQ) with actual educational level (EDUC_LEVEL) as follows:

$$EDUC\_M=\left\{\begin{array}{rl}1 & \text{if }\, EDUC\_REQ<EDUC\_LEVEL\\ 0 & \text{if }\, EDUC\_REQ=EDUC\_LEVEL\\ -1 & \text{if }\, EDUC\_REQ>EDUC\_LEVEL\end{array}\right.$$

Positive educational mismatch (overeducation) emerges when the required level of education is lower than that attained. Conversely, there is negative educational mismatch (undereducation) when the required education is higher than that attained. The latter case (\(EDUC\_M=-1\)) is not considered here since the share of individuals in this situation is very low (1.4%). These graduates are excluded, and so our final sample comprises 26,807 observations. Our dependent variable is therefore a dummy (OVEREDUC) that takes value 1 if EDUC_M > 0 and 0 otherwise. Nonetheless, in subsection 5.4 we use alternative dependent variables for robustness: (i) the degree of overeducation through an ordered indicator (labelled OVEREDUC_DEGREE, see Supplementary Material, Section A), (ii) the statistical approach (labelled OVEREDUC_SA, see Supplementary Material, Section F) and (iii) an alternative overeducation indicator based on wage differences within occupations (labelled OVEREDUC_EARNINGS, see Supplementary Material, Section G) in the spirit of Gottschalk and Hansen (2003).
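The construction of EDUC_M and OVEREDUC can be illustrated with a minimal Python sketch. The numeric ranks assigned to the five answer categories and the function names are assumptions made for illustration only; the survey's actual coding may differ.

```python
# Hypothetical ordinal ranks for the answer categories of EDUC_REQ
# (higher = more education). These ranks are an assumption of this sketch.
EDUC_RANK = {
    "compulsory education": 1,
    "certificate of higher education": 2,
    "higher national diploma": 3,
    "university degree": 4,
    "doctorate": 5,
}


def educ_m(educ_req, educ_level):
    """Return 1 (overeducated), 0 (matched) or -1 (undereducated)."""
    req, level = EDUC_RANK[educ_req], EDUC_RANK[educ_level]
    if req < level:
        return 1
    if req == level:
        return 0
    return -1


def overeduc(educ_req, educ_level):
    """Return the OVEREDUC dummy, or None for undereducated graduates,
    who are excluded from the final sample."""
    m = educ_m(educ_req, educ_level)
    if m == -1:
        return None
    return 1 if m > 0 else 0
```

For example, under these assumed ranks a graduate holding a university degree who reports that a higher national diploma suffices for the job has EDUC_M = 1 and OVEREDUC = 1, whereas one who reports that a doctorate is required has EDUC_M = -1 and is dropped.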

Our definition of overeducation has an important advantage over other self-assessed methods used in the related literature. For instance, unlike the European Community Household Panel that asks workers to directly indicate if they feel they are qualified to do a more demanding job, overeducation is defined here as a comparison between the reported educational level required to perform the job and the one the respondent holds. That is, as opposed to directly asking whether they feel they are overeducated in their job, the elicitation is more indirect and minimizes the risk of misinterpretation as discussed in Capsada-Munsech (2019).

3.4 Explanatory variables

Here we describe the explanatory variables to be used in our analysis.

  • Sociodemographic characteristics (SocDem): To explore the potential gender gap in overeducation, we initially include a dummy variable for females (FEMALE). We also consider a dummy for whether the graduate has Spanish nationality (SPANISH).

  • University studies characteristics (Univ): We include a dummy for whether the university from which the individual graduated is public (PUB_UNI). We also take into account the field of knowledge to which they belong: sciences (SCIENCES), arts and humanities (ARTS), engineering and architecture (ENGINEERING), and health sciences (HEALTH). Social sciences (SOCIALS) is the base category. Since the data include university graduates from academic degrees with different lengths, we also control for the years of tertiary schooling through the variable STU_LENGTH.Footnote 9 Additionally, to proxy graduates’ marks, we define two dummy variables relative to the attainment of grants during their university stage: (i) a general scholarship for university students (GRANT_STU); and (ii) a scholarship of excellence for those with the highest marks (GRANT_EXC). Since the latter is only awarded to those who hold outstanding grades, this variable partially captures ability in the form of cognitive intelligence and other soft skills like diligence.Footnote 10

  • Additional studies (AddStu): The question about the required level of education to perform the job (EDUC_REQ) does not distinguish between types of university certificate (graduate, Master’s, etc.). It might happen that those with a Master’s degree declare themselves overeducated because their education level is higher. To control for this, we include a dummy for whether the individual has completed a Master’s degree (MASTER). In the same vein, we consider a dummy for whether the graduate has other vocational training courses (VOC_TRAINING) apart from university studies, and whether they are currently studying (CONT_STUDY).

  • English language (English): A good level of English is expected to protect graduates against overeducation. To proxy it, we define two dummy variables for whether the individual has (i) a certificate in English from Cambridge (CAMB_CERTIFICATE) or (ii) a certificate in English from the Official Language School (OLS_CERTIFICATE).Footnote 11

  • Pre-graduation work experience (PregradExp): Other variables related to work experience prior to graduating are also included. In particular, we consider dummies for whether the respondent (i) worked while she was studying (WORK_STU); (ii) performed work practice before graduation (i.e., internships as part of the study program, denoted by CURRIC_PRACT); and (iii) performed work practice by the end of her university studies (i.e., non-curricular practices, labelled NONCURRIC_PRACT).

  • Pre-graduation mobility (PregradMob): To control for the completion of stays in other universities, we define two dummy variables: NAT_STAY for staying at another Spanish university, and ERASMUS if the graduate participated in an Erasmus mobility programme.

  • Post-graduation work experience (PostgradExp): We include a dummy for whether the graduate has been working for at least two years since graduation (EXP_2YEARS). As a proxy for job turnover, we also add a variable for the number of employers she has had (N_EMPLOYERS). Implicitly, the latter variable gathers whether the graduate remains in her first job.

  • Post-graduation mobility (PostgradMob): To control for graduates’ mobility after graduation, we define LAB_MOB if the individual moved to another Spanish region for labour reasons, and ABROAD_RESIDENCE if the graduate has lived abroad.

  • Job search strategies (JSStrat): We consider information on how graduates looked for their first job. Specifically, we add four binary variables for whether the respondent resorted to: (i) newspaper or internet advertisements (JS_ADV), (ii) temporary employment agencies (JS_TEA), (iii) university platforms (JS_UNI) or (iv) work or family networks (JS_NETWORKS). The base category involves other methods like self-employment or public competition.

  • Job rejection (JobRej): To control for heterogeneity in career prospects, we define a binary variable for whether the respondent has rejected a job because they considered it unsuitable (REJECT_JOB).

  • Regional fixed effects (RegFE): Variations in overeducation prevalence might be affected by regional factors like the size of the regional labour market (Büchel & van Ham, 2003; McGuinness, 2006). Indeed, McGuinness, Bergin, et al. (2018) point out that gender differences in overeducation across Europe relate to country-specific dimensions. To capture this, we include a full set of regional fixed effects (autonomous community).

3.5 Summary statistics

Summary statistics of the variables are presented in Table 1, for the whole sample and separately for males and females. The last column reports a t-test for equality of means. In general, graduates’ sample characteristics differ by gender. About 77 per cent of females are working compared to 81 per cent of males. Conditional on working, 24 per cent of male and 27 per cent of female graduates are overeducated in their jobs.Footnote 12 This difference is statistically significant, suggesting that overeducation is more prevalent among females in the sample. As for age, while only 49 per cent of males are below 30, in the case of females this percentage is 64 per cent.Footnote 13 Likewise, we observe significant differences in the gender composition by field of study. Males are mainly concentrated in engineering (40%) and social sciences (36%). In contrast, there is a higher percentage of females in social sciences (50%) and health (17%). About 35 per cent of graduates have benefited from a general scholarship during their studies. However, only 6 per cent have achieved a scholarship of excellence.

Table 1 Descriptive statistics

The average length of the university studies is around 4 years. Around 33 per cent have completed a Master's degree and 13 per cent have additional vocational training studies. Interestingly, the share of females who have a certificate in English from the Official Language School is significantly greater than that of males (22% vs 17%). In contrast, the share of graduates with a certificate in English from Cambridge is the same for both sexes (29%). Pre-graduation educational mobility is quite low: 11 per cent of males and 8 per cent of females stayed at another Spanish university, and 9 per cent of males and 8 per cent of females took part in an Erasmus programme. By contrast, post-graduation labour mobility is somewhat larger (16%). Regarding work experience, about 62 per cent of graduates combined studying and working, with 69 per cent having more than two years of labour experience. The average number of past employers is almost three. Remarkably, 71 per cent of females undertook curricular practices compared to 54 per cent of males. The most used job search strategies for first employment are work/family networks (37%) and job advertisements (32%). Finally, 36 per cent of males and 28 per cent of females rejected a job because they considered it unsuitable.

Table 2 distinguishes the sample sizes, proportion of employed graduates and overeducation prevalence for males and females separately by age cohort. Employability rates are higher among males, with almost no differences between those aged less than 30 and over 30 in the case of females. Overeducation is four percentage points higher among graduates over 30 for both sexes (29% vs 25% for females and 26% vs 22% for males). Since all the sample graduated in 2010, this result could reflect that those over 30 graduated later than expected, which is associated with greater overeducation exposure (Caroleo & Pastore, 2018; Ordine & Rose, 2009).

Table 2 Summary statistics of dependent variables by gender and age cohort

4 Empirical strategy

4.1 Matching estimators

To evaluate the existence of gender differences in overeducation, we first use matching estimators based on covariates (Abadie & Imbens, 2006). Males and females are likely to exhibit distinct labour market participation propensities for different reasons, which might induce selection effects in the analysis of their different overeducation risks. To deal with this, we match employed and non-employed graduates at the time of the survey based on characteristics using propensity score matching (PSM). In doing so, we impose the common support condition. Next, we implement Inverse Probability Weighting Regression Adjustment (IPWRA), which consists of running separate regressions for males and females by reweighting on their labour market participation propensity scores.Footnote 14 In this way, we compute the average gender difference in conditional-on-covariates overeducation prevalence while considering males and females’ potential differences in employability based on their endowment of observed characteristics. This is done both for the pooled sample and separately for the less than 30 and over 30 cohorts.
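The reweighting idea can be sketched in a deliberately simplified form. The example below assumes that employment propensity scores have already been estimated, and it reduces the regression adjustment step to an intercept-only model, so that each gender's adjusted outcome is just an inverse-probability-weighted mean of OVEREDUC among the employed. The function name, the row layout and the toy data are hypothetical, and the sketch is not the estimator used to produce the paper's results.

```python
def ipw_gender_gap(rows) -> float:
    """Weighted female-minus-male gap in overeducation among the employed.

    rows: iterable of (female, employed, overeduc, pscore), where pscore is an
    already-estimated probability of being employed (assumed given here).
    """
    # For each gender: [sum of weight * outcome, sum of weights]
    sums = {0: [0.0, 0.0], 1: [0.0, 0.0]}
    for female, employed, overeduc, pscore in rows:
        if not employed:
            continue  # overeducation is only observed for those who work
        if not 0.0 < pscore <= 1.0:
            raise ValueError("propensity score must lie in (0, 1]")
        weight = 1.0 / pscore  # up-weight workers who were unlikely to be employed
        sums[female][0] += weight * overeduc
        sums[female][1] += weight
    mean_female = sums[1][0] / sums[1][1]
    mean_male = sums[0][0] / sums[0][1]
    return mean_female - mean_male


# Toy data with made-up propensity scores.
toy = [
    (1, 1, 1, 0.50), (1, 1, 0, 0.50), (1, 0, None, 0.50),
    (0, 1, 1, 0.25), (0, 1, 0, 0.75), (0, 1, 0, 0.75),
]
print(ipw_gender_gap(toy))
```

A full IPWRA would replace the weighted means with weighted regressions of OVEREDUC on the covariates, estimated separately by gender, and would average the resulting predictions.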

4.2 Heckman probit

Having analysed whether males and females have a different average prevalence of being overeducated, the subsequent step is to examine the factors that could drive it. Therefore, we move to regression analysis that studies the differences in the drivers of overeducation by gender.

Overeducation is only observed for the subsample that works. Individuals are likely to self-select into the labour market based on observed and unobserved characteristics. If the unobservable characteristics that drive participation correlate with the error term in the overeducation equation, then the estimates from the probit suffer from selectivity bias. Accordingly, we propose a probit model with sample selection (Heckman probit), as done by Büchel and van Ham (2003), Jauhiainen (2011), Ordine and Rose (2009), and Rubb (2014). This model comprises two equations that are jointly estimated: a binary probit for the probability of working (selection equation) and a probit model for OVEREDUC for those who participate in the labour market (outcome equation).

Because our purpose is to examine whether human capital credentials are differently valued by gender, in both equations we consider interaction terms between the explanatory variables and the dummy for being a female. In this way, the model allows for gender heterogeneity in the determinants of the labour participation decision and the overeducation prevalence.Footnote 15

The model structure of the Heckman probit is:

  1. Selection equation:

    $${WORK}_{i}^{*}={\gamma }_{1}{Z}_{i}+ {\gamma }_{2}{Z}_{i} \times {FEMALE}_{i}+ {v}_{i}$$
    (1)

    where \({WORK}_{i}^{*}\) is the latent utility from the market perspective of hiring individual i, \({Z}_{i}\) is a vector of exogenous explanatory variables, \({\gamma }_{1}\) and \({\gamma }_{2}\) are vectors of coefficients to be estimated and \({v}_{i}\) is a random error term.

  2. Outcome equation:

    $${OVEREDUC}_{i}^{*}= {\beta }_{1}{X}_{i}+{\beta }_{2}{X}_{i} \times {FEMALE}_{i}+ {u}_{i}$$
    (2)

    where \({OVEREDUC}_{i}^{*}\) is a latent variable that measures the gap between an individual’s level of education and the required level of education for the job, \({X}_{i}\) is a vector of exogenous explanatory variables, \({\beta }_{1}\) and \({\beta }_{2}\) are vectors of coefficients to be estimated and \({u}_{i}\) is a random error term.

The selection issue is considered by allowing the error terms \({u}_{i}\) and \({v}_{i}\) to be correlated so that:

$${v}_{i} \sim N\left(0, 1\right)$$
$${u}_{i} \sim N\left(0, 1\right)$$
$$Corr (u_{i} , v_{i} ) = \rho_{w,OV}$$
(3)

The correlation term \({\rho }_{w,OV}\) will capture any omitted factor like unobserved ability that impacts employability and overeducation risk. The greater its magnitude, the larger the weight of shared unobservables.
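For reference, the log-likelihood that a probit with sample selection typically maximizes can be sketched as follows. The sketch assumes the usual observation rules, which are not spelled out above: \(WORK_i = 1\) if \(WORK_i^{*} > 0\) (and 0 otherwise), and \(OVEREDUC_i = 1\) if \(OVEREDUC_i^{*} > 0\), observed only when \(WORK_i = 1\). The shorthand \(x_i'\beta\) and \(z_i'\gamma\) for the linear indices is introduced here only for compactness; \(\Phi\) and \(\Phi_2\) denote the univariate and bivariate standard normal distribution functions.

$$x_i'\beta \equiv \beta_1 X_i + \beta_2 X_i \times FEMALE_i, \qquad z_i'\gamma \equiv \gamma_1 Z_i + \gamma_2 Z_i \times FEMALE_i$$

$$\ln L = \sum_{WORK_i=1,\; OVEREDUC_i=1} \ln \Phi_2\left(x_i'\beta,\; z_i'\gamma;\; \rho_{w,OV}\right) + \sum_{WORK_i=1,\; OVEREDUC_i=0} \ln \Phi_2\left(-x_i'\beta,\; z_i'\gamma;\; -\rho_{w,OV}\right) + \sum_{WORK_i=0} \ln \Phi\left(-z_i'\gamma\right)$$

When \(\rho_{w,OV}=0\), each bivariate term factorizes into a product of univariate probabilities, and the selection and outcome equations can be estimated as two separate probits.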

The model is estimated by maximum likelihood. We consider all the explanatory variables introduced before (SocDem, Univ, AddStu, English, PregradExp, PregradMob, PostgradExp, PostgradMob, JSStrat, JobRej and RegFE) in both equations. To minimize omitted variable bias, scholars recommend including a wide set of controls for worker heterogeneity (McGuinness, 2006). For identification, it is convenient that \({Z}_{i}\) contains at least one variable not included in \({X}_{i}\) (exclusion restriction) (Puhani, 2000). We use the regional employment rate that corresponds to the respondent’s gender as our exclusion restriction (EMP.RATE). This variable is retrieved from the Spanish National Statistics Institute for the year 2014 and complemented by Eurostat for the low share of graduates that live abroad (5.53% of the sample).Footnote 16 Büchel and van Ham (2003) document that the regional unemployment rate does not impact overeducation but affects employment probabilities. Similar findings are reported in Charalambidou and McIntosh (2021). Moreover, Jauhiainen (2011) uses the unemployment rate as the exclusion restriction in a similar Heckman probit regression for modelling overeducation likelihood. We, instead, use the employment rate for this purpose. The rationale is that as the share of employed people in the working-age population increases, the probability of working increases, plausibly through lower inactivity rates. At the same time, the extent to which available labour resources are being used is predicted to be unrelated to the risk of overeducation since it does not hinder (at least directly, given other controls) the possibility of finding a matched job.Footnote 17 Auxiliary regressions (Table A4 in the Supplementary Material) support that EMP.RATE is strongly correlated with employability but uncorrelated with overeducation conditional on covariates. Therefore, this variable can be considered a valid exclusion restriction.

5 Results

5.1 Gender differences in overeducation prevalence based on matching

Table 3 presents the average difference in overeducation risk between females and males using inverse probability weighting regression adjustment, for the whole sample and separately for graduates under and over 30 years of age. That is, the gender average gap in overeducation prevalence is computed while reweighting observations by the employment propensity scores after PSM. Tables A5 and A6 in Supplementary Material report the number of observations per propensity block. Plots for overlapping in the propensity scores by employment status are shown in Figures A5 and A6, Supplementary Material. These diagnostic analyses suggest that covariates are sufficiently balanced, thereby allowing an appropriate comparison.

Table 3 Average differences in overeducation likelihood between females and males using Inverse probability weighting regression adjustment (IPWRA)

We find no significant differences in the probability of being overeducated between males and females conditional on their characteristics and their employment propensities. This is sustained for the pooled sample and for the two age cohorts considered. Therefore, it seems the gender difference in overeducation detected in descriptive statistics could be the result of differences in the sample composition of workers (self-selection into the labour market) and job matches based on characteristics. In the following subsection, we move to the Heckman probit regression with interactions to inspect whether the predictors of overeducation exert significantly distinct effects by gender.

5.2 The drivers of overeducation

Tables 4 and 5 report the parameter estimates and robust standard errors for the Heckman probit model with interaction terms specified in Columns (1–3) for the less than 30 and over 30 age groups. Since the specification contains many interaction terms and the coefficients do not have a direct interpretation, we also present the average marginal effects (AME) for each equation. We focus first on the overall results for the drivers of overeducation; then we come back to gender differences.

Table 4 SE-Probit model parameter estimates and AME for less than 30 cohort
Table 5 SE-Probit model parameter estimates and AME for over 30 cohort

Starting with the selection equation, the likelihood of being employed is not significantly different by gender for the over 30 cohort, everything else being equal. However, females under 30 are significantly more likely to be working (+ 5.4%). For both cohorts, labour market participation is positively associated with the employment rate at the place of residence and graduating from health sciences (relative to social sciences). In contrast, graduates from sciences and arts are significantly less likely to work. Labour participation is positively associated with holding a certificate in English, the length of the university studies, pre-graduation mobility through an Erasmus programme, and post-graduation mobility in terms of moving to another region. Similarly, the working likelihood is higher among those who worked while studying, those with two or more years of labour experience, and those who have rejected a job in the past. Conversely, having lived abroad after graduation, the number of past employers and currently studying are negatively associated with labour participation.

For the less than 30 cohort, employability is also more prevalent among those who earned an excellence grant (+ 3.5%) and graduates in engineering (+ 5.1%). On the contrary, individuals who accessed their first job through a temporary employment agency (− 3.6%) or through the intermediation of the university (− 1.4%) are less likely to be working. The same applies to having received a general grant for completing their university studies (− 1.3%).

Moving to the outcome equation, there are no gender differences in the prevalence of overeducation for either cohort, ceteris paribus. This is consistent with the results from the matching estimators presented in Table 3. Males and females appear not to have different probabilities of being overeducated conditional on their characteristics, in line with Acosta-Ballesteros et al. (2018). Nonetheless, we explore this in detail in the following subsection. We find that graduates from public universities are more likely to be overeducated (+ 2.5% for less than 30 and + 11.5% for over 30, respectively). In contrast, overeducation is unrelated to being Spanish.

Overeducation is less prevalent among graduates in the fields of health (− 25% and − 24.5%), engineering (− 9.1% and − 9.3%) and sciences (− 6.6%, only for less than 30 cohort). However, graduates in the arts are the most likely to be overeducated (+ 6%, only for less than 30 cohort). This is consistent with Aina and Pastore (2020), Barone and Ortiz (2011), Meroni and Vera-Toscano (2017) and Turmo-Garuz et al. (2019). Overeducation is also more prevalent among those who received a grant for completing their university studies (+ 4.3% and + 6.7%) and those who have a vocational training certificate (+ 12% and + 8.3%). The former can reflect a lower family background (Verhaest & Omey, 2010) whereas the latter might imply that, even though they hold a university degree, those with additional vocational training studies find it easier to work in occupations that only demand such training. Having worked while studying also increases overeducation prevalence, but only for graduates under 30 (+ 2.4%). Conversely, the likelihood of overeducation is negatively related to holding a Master’s (− 2.7% and − 5.7%), the length of tertiary studies (− 2.5% and − 2.6% per year), currently studying (− 5.1% and − 2.9%) and having earned an excellence grant (− 4.6% and − 5.7%), which can be understood as a proxy of innate ability. This confirms earlier results by Devillanova (2013), Di Pietro and Urwin (2006), Verhaest and Omey (2010), and suggests that the greater the accumulation of human capital, the lower the likelihood of overeducation. We also find that having a certificate in English, either from the OLS (− 3.8% and − 3.3%) or from Cambridge (− 7.1% and − 2.7%), significantly protects Spanish graduates from overeducation, in line with Albert and Davía (2018).

Concerning pre-graduation mobility, having participated in the Erasmus programme reduces the probability of being overeducated, but only for graduates under 30 (− 4.3%). Curiously, completing part of their university studies at another Spanish university reduces overeducation only for the over 30 cohort (− 3.6%). As discussed before, this could be explained by the acquisition of soft skills and personal independence during the stays. The same happens with having moved to another Spanish region to look for a job (− 7% and − 4.4%). Therefore, national mobility helps to reduce the incidence of educational mismatch, in line with Devillanova (2013). By contrast, we do not find evidence that internships within the study programme (curricular practices) reduce the likelihood of being overeducated. This could indicate that internship programmes offered by Spanish universities are not effective at providing graduates with the necessary on-the-job training to get matched jobs. However, non-curricular practices do reduce the likelihood of overeducation for the less than 30 cohort (− 2%). Having at least two years of work experience also reduces the incidence of educational mismatch (− 6.2% and − 5.6%), in line with Kiker et al. (1997) and Robst (2008).

We also document that those with a larger number of past employers are more likely to be overeducated in their jobs (+ 1.2% and + 0.8%). This contrasts with Romanov et al. (2017) and Rubb (2013), who suggest that firm-switching reduces the likelihood of overeducation. Our findings could be explained by the particularities of our study period. Borgna et al. (2019) show that in the aftermath of the 2008 economic crisis, overeducation risks were higher among those who experienced job mobility after the outbreak of the crisis. Regarding job search strategies, looking for the first job through advertisements on the internet or in newspapers (+ 3.4%, only for the less than 30 cohort), temporary employment agencies (+ 7.9% and + 10.3%) or work/family networks (+ 3.6% and + 3.3%) is positively related to overeducation prevalence. This is consistent with the findings by Albert and Davía (2018) using the same dataset. Please note that the omitted category gathers other options like self-employment or becoming a civil servant, which seem to be the most effective ways to avoid overeducation at the beginning of the labour career. Finally, having rejected a job in the past because it was considered unsuitable is negatively related to the likelihood of overeducation (− 4.5% and − 5.6%). This could reflect that individuals from less advantaged backgrounds are in greater need of work and are therefore more often forced to accept jobs requiring lower qualifications (Barone & Ortiz, 2011). Another explanation relates to McCormick’s theory of signalling during the job search (McCormick, 1990). Since individuals know their own ability, those who are more skilled might be reluctant to accept jobs for which they are overeducated and will wait for a better job opportunity (they have a higher reservation wage) to signal their capabilities.

The error terms of the two equations are not significantly correlated. This implies that there seems not to be common unobservables in the residuals of the two equations given the wide set of observed characteristics. Robustness checks on this are discussed in Sect. 5.4.

To rule out potential concerns and misinterpretation of our results, we want to clarify the following. Firstly, the interaction coefficients do not have an associated AME. The reason is that the interaction term is contained in the AME formula for each variable. Furthermore, Greene (2010) warns about the difficulties of interpreting the partial effects for interaction terms in non-linear models. Secondly, as a direct consequence of this, the significance of the AME does not always correspond to that of the associated coefficient estimates. This is because if there is some heterogeneity in the effect of a given covariate on the dependent variable by gender, the effects might cancel out and lead to an AME non-distinguishable from zero. This is also affected by the coefficient covariances. Thirdly, the statistical significance (and the sign) of the interaction effects based on the t-test must be taken with caution (Ai & Norton, 2003).Footnote 18
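The first two points can be illustrated with a minimal numerical sketch. It assumes a single binary covariate x, a simple probit index without selection correction, and hypothetical coefficient values chosen so that the effect of x has opposite signs for males and females; none of these numbers come from the estimated model.

```python
import math


def norm_cdf(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_overeduc(x: int, female: int, b) -> float:
    """P(OVEREDUC = 1) from a probit index with a gender interaction.

    b = (const, b_x, b_female, b_x_female); all values are hypothetical.
    """
    index = b[0] + b[1] * x + b[2] * female + b[3] * x * female
    return norm_cdf(index)


def ame_dummy(females, b) -> float:
    """Average discrete change in P(OVEREDUC = 1) when x switches from 0 to 1.

    The interaction coefficient enters through the index, so it has no
    separate marginal effect of its own.
    """
    effects = [prob_overeduc(1, f, b) - prob_overeduc(0, f, b) for f in females]
    return sum(effects) / len(effects)


# Hypothetical coefficients: x raises the index for males (+0.5)
# and lowers it for females (0.5 - 1.0 = -0.5).
b = (0.0, 0.5, 0.0, -1.0)
females = [0, 1, 0, 1]  # balanced toy sample

ame_male = ame_dummy([0], b)         # about +0.19
ame_female = ame_dummy([1], b)       # about -0.19
ame_pooled = ame_dummy(females, b)   # about 0: the two effects cancel
print(ame_male, ame_female, ame_pooled)
```

In this toy case both the main and the interaction coefficients are non-zero, yet the pooled AME is indistinguishable from zero, which is why AME computed separately by gender are more informative about heterogeneity.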

5.3 Marginal effects and gender differences in predictors

To further examine the existence of gender differences in the drivers of overeducation, we proceed as follows. First, we compute the overall AME for the outcome equation separately for males and females for each age cohort. Then, we test the hypothesis of whether they are statistically different (i.e. H0: \({AME}_{female}={AME}_{male}\)) using a Wald test. In this way, computation of the separate AME considers the differences in characteristics and coefficient estimates across gender, with the additional advantage that the AME are measured on the same scale and are directly comparable. Columns 1–2 (5–6) in Table 6 present the AME for males and females under 30 (over 30). Column 3 (7) reports the difference between the two and column 4 (8) the chi-squared statistic for the significance of the difference.
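A minimal sketch of such a Wald test for a single covariate is given below. It assumes that the two AME, their variances and their covariance are already available (for instance from the delta method); the function name and the numerical inputs are hypothetical. Under H0 the statistic follows a chi-squared distribution with one degree of freedom, whose upper tail can be written with the complementary error function.

```python
import math


def wald_test_equal_ame(ame_female, ame_male, var_female, var_male, cov_fm=0.0):
    """Wald test of H0: AME_female = AME_male for one covariate.

    Returns the chi-squared statistic (1 degree of freedom) and its p-value.
    cov_fm is the covariance between the two AME estimates; it is generally
    non-zero when both come from one model and is set to 0 here by default.
    """
    diff = ame_female - ame_male
    var_diff = var_female + var_male - 2.0 * cov_fm
    if var_diff <= 0.0:
        raise ValueError("variance of the difference must be positive")
    stat = diff ** 2 / var_diff
    # For a chi-squared variable with 1 df: P(X > w) = erfc(sqrt(w / 2))
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical inputs: AME of -0.10 (females) and -0.04 (males),
# each with a standard error of 0.02 and zero covariance.
stat, p_value = wald_test_equal_ame(-0.10, -0.04, 0.02 ** 2, 0.02 ** 2)
print(round(stat, 2), round(p_value, 3))  # 4.5 0.034
```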

Table 6 AME by gender and age cohort

We find that graduating from health sciences protects females from overeducation to a greater extent than males for the less than 30 cohort. For graduates over 30, the difference is not statistically significant. As regards other fields of study, males from architecture and engineering degrees seem to be less likely to be overeducated than females for both cohorts. However, the difference is not significant. Descriptive statistics (see Table 1) show that health sciences is a female-dominated field of knowledge while males are a majority in engineering and architecture. Statistical discrimination might make employers believe males (females) are better at engineering (health sciences). Independently of whether females are initially, on average, less productive in STEM fields of knowledge due to past discrimination or lower self-esteem (Gneezy et al., 2003), implicit discrimination might be present. Social stereotypes as presented in Bordalo et al. (2019) might unconsciously make employers believe that females are better for health care and males for mechanics.

Strikingly, having moved to another Spanish region protects females more than males from overeducation in the less than 30 cohort. This could be associated with the theory of differential overqualification developed by Frank (1978) and the findings of Redmond and McGuinness (2019). If females are more reluctant to move and stick more to local markets, the fact of having moved to another region might signal the possession of soft skills that could be valued more in the case of females. Nonetheless, there are no differences for the over 30 age group. The estimates also indicate that holding a Master’s degree acts as a stronger signal for males than for females in the over 30 cohort. Additionally, Erasmus (national) stays reduce the incidence of overeducation only for males (females) in the less than 30 cohort.

Remarkably, the increased prevalence of overeducation associated with having looked for the first job through advertisements is only significant for females in the less than 30 cohort. This suggests that accessing first employment through public calls in which there are no intermediaries seems to penalize females. In job interviews with more candidates, females might shy away from showing their capacities and end up working in jobs for which they are overeducated in a greater proportion than males. For the rest of the variables, although the AME slightly differ in magnitude, the differences are not statistically significant. Overall, human capital seems to be valued equally for males and females.

5.4 Robustness checks

We have performed some robustness checks. Firstly, we inspected collinearity problems. The Variance Inflation Factors (VIF) after separate OLS regressions with interactions are 5.50 and 6.33 (5.12 and 4.98) for the selection and outcome equations for the less than 30 (over 30) cohort.Footnote 19 All of them lie within acceptable boundaries. Secondly, we ran separate Heckman probit regressions for males and females (Tables A7–A8, Supplementary Material). Although the coefficient estimates cannot be compared due to scale differences, the statistical significance of the variables is consistent with our main findings.

Thirdly, we inspected the robustness of the non-significance of the gender dummy in the overeducation equation and the correlation between the error terms of the selection and outcome equations by conducting a stepwise estimation in which each group of variables was included in the model specification sequentially. Figures A3 and A4 in Supplementary Material present the coefficient estimates for the gender dummy and the rho parameter (\({\rho }_{w,OV}\)). We document that the coefficient estimate for the gender dummy is consistently non-significant under different specifications. Even though we face the risk of omitted variables, under the standard assumption that selection on unobservables is proportional to selection on observables, the stability of the female dummy across specifications indicates that unobserved factors would have to weigh more than observables to overturn the non-significant effect. Additionally, the correlation between the error terms of the selection and the outcome equations becomes non-significant after the inclusion of the regional fixed effects, which might capture omitted factors at the local labour market level. In this regard, we further examined the sensitivity of our findings to different potential values of the correlation between the error terms using the procedure recently developed by Cook et al. (2021). Our main findings hold (see Figure A7 in Supplementary Material).

Fourthly, we examined the potential existence of non-random measurement error in our self-assessed measure of overeducation that could vary by gender or correlate with workers’ ability. To this end, we first implemented the test to detect heteroskedasticity in binary outcomes as proposed by Wooldridge (2010). Next, we ran a heteroskedastic probit regression in which the variance of the error term is assumed to be an exponential function of gender and having earned an excellence grant as a crude proxy of ability. The results and technical details are shown in Appendix E, Table A9. According to this check, there is no evidence of measurement error associated with gender or ability. Therefore, even though we cannot completely rule out the possibility of measurement error in overeducation because of being self-reported, it appears this does not alter our findings.

Fifthly, instead of the dummy variable, we used an ordinal indicator capturing the degree of overeducation. Details on the variable construction and the parameter estimates are presented in the Supplementary Material (Tables A1 and A2). The significance and direction of the effects remain unchanged. Finally, we repeated our analysis measuring overeducation using (i) the statistical approach (SA), and (ii) a novel method that combines the earnings premium procedure proposed by Gottschalk and Hansen (2003) with the statistical approach. The variable construction and the regression results are shown in Sections F and G in Supplementary Material. Although the prevalence of overeducation slightly differs across the three measures (which is common in related studies, see Verhaest and Omey (2010)), in all cases the percentage of overeducated workers is larger among females and for the over 30 cohort.Footnote 20 Additionally, overeducation based on the ISA approach is consistent with other indicators of a bad job match like the share of graduates that regret having studied the attained college degree or even having enrolled in university studies (Table A19 in Supplementary Material). Importantly, the results obtained from regression analyses using these alternative measures of overeducation are similar to those derived from the self-assessed approach.Footnote 21

6 Conclusions

This paper studied the drivers of overeducation among recent university graduates, paying particular attention to the existence of gender differences. Using a representative sample of two age groups from the same graduation cohort in Spain, observed four years after the completion of their university studies, we first examined gender differences in overeducation prevalence using matching estimators. Next, we estimated a Heckman probit model that accounts for self-selection into the labour market and includes interactions between the explanatory variables and the gender dummy.

Our results show that, conditional on their characteristics, males and females have about the same probability of being overeducated in their jobs. The differences in overeducation prevalence documented in the descriptive statistics appear to be driven by differences in the probability of being employed and by gender differences in some of the drivers of overeducation. Conditional on labour market participation, the probability of being overeducated is lower among graduates from sciences, engineering and health sciences, among those who earned an excellence grant, and among those with an English language certificate. Mobility, labour experience and years of tertiary schooling also reduce the prevalence of overeducation. In contrast, graduates from arts, those who worked while studying, those who accessed their first employment through advertisements, temporary employment agencies or family networks, and those who benefited from a scholarship (a proxy for a less advantaged family background) are more exposed to overeducation. Interestingly, we document some differences in the magnitude of the effects across age cohorts, with non-curricular practices and Erasmus stays reducing overeducation only among graduates under 30.

By analysing the differences in the drivers of overeducation by gender, the study has assessed potential implicit discrimination based on characteristics. Based on separate regressions by age cohort with gender interactions, we exploited the non-linearity of the model and examined the average marginal effects by gender. Beyond the finding that there are no gender differences, all else being equal, personal characteristics generally appear to have the same effect on the likelihood of overeducation for both sexes. As such, the characteristics of highly educated males and females are, for the most part, valued equally by the market.

Nevertheless, the estimates show that, ceteris paribus, female (male) graduates from health sciences (engineering and architecture) under 30 are less likely to be overeducated in their jobs than their male (female) peers. This could reflect a stereotyping of cognitive ability or statistical discrimination that stems from the typically higher proportion of males (females) in the STEM (health science) fields of knowledge. Similarly, in the over-30 cohort, having earned an excellence grant, which could partially capture intelligence or innate ability, protects females against overeducation more than it protects males. Conversely, females under 30 are significantly more likely to be overeducated if they accessed their first job through public advertisements that involved directly contacting the employer. It seems that when third parties intermediate, the gender gap vanishes. We also show that moving to another Spanish region for labour-related reasons reduces the likelihood of overeducation to a greater extent among females. This could be due to females’ higher reluctance to move, which could make moving a positive signal for female movers.

From a policy viewpoint, our results contribute to the debate about the economic consequences of educational mismatch. Education represents one of the most relevant public expenditures, and its return to society is only achieved when graduates are well matched to their jobs. The evidence on the differences in overeducation by gender based on the field of study suggests that educational institutions should redesign the supply side of their study programmes and incentivize females (males) to enter masculinized (feminized) fields of study to avoid stereotypes and statistical discrimination. Although our findings show, at least for the case of recent university graduates, that females and males have an equal likelihood of being overeducated conditional on characteristics, the fact that labour mobility and the completion of national stays reduce the risk of overeducation to a greater extent among females deserves further attention. Since females have traditionally been more reluctant to move, it appears that moving to another region to look for a job is an effective job search strategy for them to avoid the risk of overeducation. This calls for educational institutions to encourage graduates, and particularly females, to undertake stays at other Spanish universities, since such stays not only increase their chances of finding a suitable job but also provide them with soft skills that are highly valued by the market. On a more general level, public policies aimed at improving the matching of university graduates with local job positions appear to be necessary in Spain to reduce the share of university graduates who end up in jobs for which they are overeducated.

Our study has some limitations. The cross-sectional nature of the dataset limits the analysis to a ‘between’ comparison at a single point in time. Therefore, since we cannot completely rule out omitted confounders, our findings need to be interpreted with caution. Future research using longitudinal data could extend our study by examining the dynamics of gender differences in overeducation throughout the labour career or in expansionary rather than recessionary periods. We also lack information on family background and on whether graduates have children. Future studies should examine in greater depth the effect of parental characteristics and family responsibilities on educational mismatch.