1 Introduction

Over the last decade, many researchers in Italy have described university students’ mobility flows between geographical regions, highlighting that mobility choices are closely associated with several demographic, economic and social features of contemporary Italian history and society. Indeed, this phenomenon is related to: (1) demographic issues linked to the depopulation of the disadvantaged areas of the country; (2) the so-called brain drain phenomenon, which contributes to depleting the less developed regions; (3) families’ strategies aimed at ensuring better education opportunities and future employment for young generations, and the inter-generational mechanisms that reproduce disparities and inequalities; (4) gender stereotypes and students’ aspirations; (5) the consequences of the educational policies and tertiary education reforms carried out in the last decade; and (6) the effects of the local context in terms of geographical and sociocultural characteristics (Attanasio and Enea 2019; Barone and Assirelli 2020; Ciriaci 2014; D’Agostino et al. 2019b; Impicciatore and Tosi 2019; Santelli et al. 2022). Empirical studies have well documented the presence of an almost unidirectional flow of students from southern to central or northern Italian regions for bachelor’s studies, a phenomenon that becomes even more pronounced in the transition from bachelor’s to master’s degrees (Columbu et al. 2021; Enea 2018) and that may anticipate future migration choices of highly educated individuals (Oggenfuss and Wolter 2019). Moreover, these elements are even more crucial given that students and universities affect regional competitiveness and growth potential (Salter and Martin 2001; Valero and Van Reenen 2019), and may promote intergenerational mobility (Chetty et al. 2020; Chetty and Hendren 2018).
Moreover, several studies document that an important share of variability in students’ mobility choices is ascribable to divergences in contextual factors and the socioeconomic contexts between the geographical areas of origin and destination (D’Agostino et al. 2019b; Giambona et al. 2017). Santelli et al. (2022) highlight that students attending a liceo in the main cities in Campania have a higher propensity to be attracted by the reputation of the universities; moreover, other studies confirm the higher propensity to select non-local universities for students attending schools with the highest profiles in terms of socioeconomic conditions of students’ families, namely classical and scientific high schools (Rizzi et al. 2021).

According to the literature, the divergences in students’ educational choices can be related to two theoretical paradigms: investment and consumption theories (Attanasio 2022; Foot and Pervin 1983). According to the former, individuals’ education choices depend on comparing the expected returns of education and its expected costs, which are functions of both tangible and intangible assets. Meanwhile, according to the consumption theory, the demand for education follows the conventional economic theory, and it depends on the interplay between students’ preferences and income, education’s relative price and universities’ characteristics (Krezel and Krezel 2017). Recent findings provided by Rizzi et al. (2021) suggest that in Italy the mobility of northern first-year students toward cities such as Milan and Turin is mainly driven by the investment perspective; nevertheless, as highlighted by Attanasio (2022), the Italian framework is characterized by an almost unidirectional flow toward the main northern cities that are also those that can provide students with higher standards in terms of services, infrastructures, leisure activities and opportunities. Indeed, these peculiarities would suggest an intersection of the two perspectives behind students’ mobility choices.

Nevertheless, both perspectives see students’ decision process as a multi-stage process in which individuals develop their educational aspirations, search for information on universities and colleges, and finally choose their university (see for example Hossler and Gallagher 1987). In this decision process, the role of the high school environment is paramount since it affects both the development of students’ capabilities and the information available to them regarding universities, which, in turn, shape inter-generational social mobility patterns and the presence of social inequalities.Footnote 1 However, despite the proliferation of studies on the phenomenon, the literature still shows a relevant gap in understanding the role that high schools play in shaping students’ mobility decisions.

Another critical aspect of the previous literature is a high level of heterogeneity in the definition of students’ local area and, therefore, students’ mobility choices. Indeed, these are usually classified by using ad hoc definitions that depend essentially on the aims of the analysis. For example, studies interested in analyzing the pull and push factors that drive students’ and families’ choices mainly focus on the analysis of the South-North mobility pathways or bound the analysis to flows between macro geographical areas (e.g. Pitzalis and Porcu 2015; Genova et al. 2019). Others focus their attention on the patterns of mobility of students that live in specific areas in order to shed light on educational policies which may counteract the outgoing flows of students by enhancing local universities’ attractiveness (e.g. Santelli et al. 2019). In the same frame, studies that investigate university prestige, and its influence on students’ university choice, consider student flows as an indirect indicator of university quality and mainly focus on the classification of students as stayers or movers with respect to a given university or a given town (e.g. Giambona et al. 2017).

Our contribution aims to fill these gaps by proposing a robust classification of students’ mobility choices and estimating how these choices relate to the high school attended by students.

Concerning the classification of students’ choices, we define our outcome variable based on: (1) the supply of tertiary education in students’ local area, (2) the chosen subject of study, and (3) the travel time needed to reach the nearest university. In particular, students’ local area is defined as the territory between students’ town of residence and the nearest university. This definition avoids assumptions that rely only on administrative borders or deterministic thresholds, since it depends instead on the availability of universities in each student’s area of residence. Based on this setting, we define three choice categories: local universities, forced non-local universities, and free non-local universities. Local universities are those placed within students’ local areas. Forced non-local universities are those placed outside the student’s local area that are nevertheless the nearest universities providing a program in the chosen field of study. Finally, free non-local universities are those hosted in towns farther away than both the local and the forced non-local universities. This definition allows us to distinguish between two kinds of mobility: one related to students’ preferences for a specific field of study, the other related to students’ preferences for a specific university or territory in Italy. Students who choose the forced non-local alternative may be more prone to stay in their local area than those who choose the free non-local alternative. Therefore, their mobility choices should be considered more similar to those of students who choose the local alternatives. Indeed, our results show that students in local universities have similar characteristics to those who choose forced non-local universities. This distinction is significant from a policy point of view, since the out-migration of students who choose the forced non-local alternatives may be reduced by increasing the supply of degree programs in their local area.
Moreover, to account for the uncertainty in the definition of local and non-local universities, we generate multiple thresholds for the definition of students’ local area by adding a random amount of travel time to the observed thresholds. Based on this categorization, we apply Multiple Imputation Analysis procedures to assess the sensitivity of the results to the random selection of the thresholds and to combine the results into a single inferential statement.

As for the effect of high schools on students’ mobility decisions, we start from the hypothesis that the high school environment, characterized by a set of peers, teachers, and counseling policies, can be treated as an unobservable variable that affects students’ preferences toward local or non-local universities. To this end, we look at the choices of students who attended the same high school before enrolling at university and compare students who belong to different schools but have similar socio-demographic characteristics. Nevertheless, the propensity to move varies widely across disciplinary fields. These differences are related not just to the presence of the curricula in the local area but also to the opportunities for lifestyle changes offered by the new destination. Moreover, the propensity to choose a non-local university and the choice of the field of study are also connected to the type of secondary school attended. Thus, to disentangle the school effect in orienting student choices, it is crucial to remove the variability in the propensity to move that is connected to these aspects. To our knowledge, this is the first attempt to use Italian micro-data from the National Student Archive managed by the Italian Ministry of University to shed light on the effect of the secondary school environment on students’ university choices in terms of selection of local or non-local universities.

To this end, we adopt a multilevel multinomial approach to estimate the choice probabilities associated with local, forced non-local and free non-local choices. In particular, we jointly consider the clustering of university applicants in secondary schools and the heterogeneity in the choices between macro geographical areas and subjects of study. This approach allows us to estimate two parameters for each high school that inform on the role that the school plays in affecting students’ propensity to choose a non-local university (free non-local or forced non-local), holding constant across schools the other sources of heterogeneity related to students’ characteristics and geographical peculiarities which can affect students’ mobility choices.

The analysis indicates that high schools have a relevant role in affecting students’ mobility choices, especially when considering local universities. In this case, the variance partition coefficients indicate that, depending on students’ macro area of residence, differences between high schools explain between 24.38% and 53.41% of the variability in students’ mobility choices that is not explained by students’ characteristics or differences between fields of study. The high school effect is stronger for students who reside in the Islands and central Italy and weaker for those residing in the North of the country. From the high schools’ perspective, the results show that schools with a higher than average propensity to steer students toward local universities may have a relevant role in keeping southern students and those residing in the Islands close to their local area, counteracting the well-known asymmetric flow of students from southern to northern regions of the country.

This paper is structured as follows. Section 2 briefly reviews the relevant literature on students’ university choices and peer effects. Section 3 provides information on the higher education system in Italy and describes the data. Section 4 presents the empirical method adopted and explains our analysis strategy. Section 5 presents the main empirical results. Finally, Sect. 6 provides our discussion of the results and conclusions.

2 Setting the background

Among the causes of students’ mobility choices, many studies have highlighted how the economic conditions of the territories that host the universities affect students’ university choices, pointing out that differences in employment opportunities are one of the main drivers of families’ decisions to invest in tertiary education in the areas where the return to education can be faster and more profitable (Dotti et al. 2013; Impicciatore and Tosi 2019; Giambona et al. 2017).

Besides economic rewards, many other important factors may influence students’ and families’ decisions, such as the willingness to travel to reach the university, the distance between students’ town of residence and the university, the accessibility of competing institutions, the availability of public transport, and policies aimed at financially supporting students, such as scholarships or places in dormitories (Castleman and Long 2016; Cattaneo et al. 2017b; Pigini and Staffolani 2016; Spiess and Wrohlich 2010; Suhonen 2014; Türk 2019). As for universities’ characteristics, other authors have highlighted the role played by tuition fees (Dwenger et al. 2012; Hübner 2012; Long 2004) and institutions’ quality in specific domains related to research, services, teaching, internationalization, and the perception that graduates have of their overall university experience (Biancardi and Bratti 2019; Bratti and Verzillo 2019; Ciriaci 2014).

Cattaneo et al. (2017a) highlight that students’ university choices can be rationalized within three main theories: Human Capital Theory, Signalling Theory and Preference Theory. Human Capital Theory interprets student mobility from an investment perspective, where individuals’ choices concerning investments in education result from a decision process aimed at maximizing students’ job opportunities. Focusing on the Italian case, where the demand for qualified jobs is lower than the supply, Signalling Theory assigns a predominant role to the credentials of universities, which act as a criterion to differentiate the value of students’ degrees. Lastly, Preference Theory is closely related to the consumption perspective in explaining human migration choices: individuals make their choices based on the lifestyles that the socioeconomic environment of the destination place offers. The prevalence of one of these theories in explaining students’ choices depends on a complex interaction between many factors, such as the business cycle and gender-related stereotypes, as documented by Cattaneo et al. (2017a) in analyzing the determinants of students’ choices during the economic crisis period.

Recent research focuses on the peculiarities of the internal mobility of students in Italy as a process that contributes to reproducing social inequalities and widens the disparities in opportunities between southern and northern students, or between those who come from less-educated families and those who belong to families with higher educational levels. For example, Impicciatore and Tosi (2019) provide evidence that parental education is one of the main determinants of students’ university choices and that the South-North migration flow is closely linked to the high cultural resources of families that perceive the investment in education as the foremost opportunity for reinforcing their social status. In contrast, internal mobility within southern regions is not associated with parental background.

Other studies have recently investigated the hypothesis of the existence of chain migration effects by looking at the role of private information on university experiences shared by communities of students (Genova et al. 2019). The joint use of Social Network Analysis tools and clustering techniques allowed the authors to disentangle the role played by family and friendship ties as the residual effect after accounting for all the push and pull effects related to origin and destination places (Santelli et al. 2022).

Barone and Assirelli (2020) investigated gender segregation in higher education in Italy, highlighting that peers’ behaviours influence students’ choice of field of study. Moreover, individuals’ choices are also driven by the school environment, counsellor services, teachers’ recommendations and classmates’ preferences. These findings suggest that students’ university choices are not merely the result of rational choices based on cost-benefit analysis but are mediated by several factors related to students’ cultural identity and sense of belonging to a community. Focusing on the United States, Engberg and Wolniak (2010) develop a conceptual framework in which the high school’s environment is the driver of college choice decisions. Their research shows that the effects of the high school context on university enrollment depend on the school’s endowment in terms of human, social and cultural capital resources, above and beyond individual characteristics. Using a multilevel approach on longitudinal data, which includes individual and school-level predictors (as compositional variables of individual characteristics), the authors conclude that both peers’ and parents’ networks have a relevant impact on enrollment decisions in two- and four-year programs. Moreover, a comparison between school-level and student-level coefficients reveals a higher impact on four-year program enrollments of school-level variables, college aspirations of family and friends, parent-to-parent contact, and the number of friends who plan to attend a four-year program.

3 Data

This section describes the main features of the educational system in Italy and the data used in the present analysis. We focus on high school graduates’ decisions to continue their studies at the tertiary level in a local or non-local university, considering the availability of university degree programs in their residence area, their secondary school background, and their characteristics.

3.1 Data sources and eligibility criteria

The analysis relies on administrative data from the MOBYSU.IT database, covering the entire population of Italian high school leavers enrolled in a bachelor’s program at an Italian university between 2016 and 2020.Footnote 2

Given our interest in understanding the role of secondary school background in shaping students’ mobility choices, we define our data according to a set of eligibility criteria. First, we do not consider the students enrolled in programs with a national entrance test, since their location decisions depend on their ranking positions rather than their preferences. Second, we do not consider students enrolled in e-learning universities, which do not require students to move to attend their degree program classes. Third, we retain in our data only those freshmen who gained their high school diploma in an Italian institution after 2015, because school-level information is available only from the academic year 2016/2017 onward. Therefore, starting from a population of 1,244,934 freshmen enrolled in an Italian university, our population of interest consists of 1,041,755 students. Among these students, we do not consider those who do not report any information on their high school background (18,434 records) or their city of residence (1,138 records) and those who enrolled at university before obtaining their high school diploma (818 records). In total, we lose information on 20,390 students. However, they represent only 1.96% of our population of interest. Thus, since excluding them should not affect our results, we assume that these records are missing at random.
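
The sample-selection figures reported above can be checked with a few lines of arithmetic. The snippet below is only a bookkeeping sketch: the variable names are illustrative, and the counts are those stated in the text.

```python
# Counts taken from the text; variable names are illustrative only.
all_freshmen = 1_244_934
population_of_interest = 1_041_755

# Records dropped because of missing or inconsistent information.
dropped = {
    "no high school information": 18_434,
    "no city of residence": 1_138,
    "enrolled before diploma": 818,
}

excluded_by_eligibility = all_freshmen - population_of_interest
total_dropped = sum(dropped.values())
final_records = population_of_interest - total_dropped
dropped_share_pct = 100 * total_dropped / population_of_interest

print(f"excluded by eligibility criteria: {excluded_by_eligibility:,}")
print(f"dropped records: {total_dropped:,} ({dropped_share_pct:.2f}%)")
print(f"final records: {final_records:,}")
```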

Following this strategy, our final data consists of 1,021,365 records. For each student, we observe several characteristics regarding the chosen university, their town of residence, and the type of high school they attended that will be described in the following sections.

3.2 University supply in students’ local area

The dataset includes enrollments in 82 universities located in 209 towns across all the Italian regions. Each university provides a set of degree programs that may be classified into three categories: 3-year bachelor’s programs, 5- or 6-year degree programs, and master’s degree programs. In this analysis, we focus only on bachelor’s programs.Footnote 3 Moreover, we consider only universities whose admissions rely on an ex-post screening system, in which the only requirement for enrollment is a high school diploma.Footnote 4 Following this strategy, we observe 49 different degree programs that can be classified according to the ‘International Standard Classification of Education: Fields of Education and Training’ (ISCED-F 2013, see UNESCO Institute for Statistics 2014) into the 10 fields of study shown in Table 8 in the Appendix. The ISCED-F 2013 classification is based on the similarities between programs’ disciplinary contents. Therefore, it allows us to place in the same category the programs that students may consider similar or substitutes. In our analysis, this feature is crucial since we want to distinguish students who choose a non-local university to attend a degree program that would also be available at their local university (i.e., the one closest to their residence) from those who are forced to move to attend a course that is not provided in their local universities.

We classify students’ mobility choices depending on: (1) the supply of universities in their local area, (2) the travel distance between their city of residence and the nearest university, and (3) the chosen field of study. Data on travel distances are gathered by considering the minimum travel distance by car between any pair of Italian towns, available from the National Statistical Institute (ISTAT), together with information from Google Maps.Footnote 5 These data have been used to compute the travel time (in minutes) between each student’s city of residence and all the Italian universities. Table 1 shows, for each Italian region and macro region, descriptive statistics on the average travel time needed by students residing in the region to reach their chosen university from their city of residence, along with the minimum travel time to the nearest university computed considering (a) all the fields of study (column 3) and (b) the field of study chosen by the student (column 4).

Table 1 Descriptive statistics on travel time in minutes

As we can note, the average distance students travel depends heavily on the macro region of residence considered. The average distance recorded for students residing in the Islands is 2.5 times the national average, while in the South and North-West it is respectively about 1.5 times and 0.5 times the average distance recorded in the country. This element is related to the well known Italian North–South divide (e.g. Attanasio and Enea 2019), characterized by an almost unidirectional flow of students from southern to northern regions. This considerable difference among macro regions is not mirrored by the data regarding the distance to the nearest university, where the average distance varies between a minimum of 10.9 min in the North-West and a maximum of 20.6 min in the Islands. Indeed, we can see that, although students in southern regions and Islands need to travel more to reach the nearest university with respect to the others, this difference is minimal. Nevertheless, if we look at the differences between regions, we have a different picture: the values for minimum distances are more heterogeneous, ranging from a minimum of 6.1 min in Lazio to a maximum of 33.1 min in Calabria if we consider all fields of study, and from 13.9 min in Lazio to 53.9 min in Basilicata while considering only the chosen field.

These data highlight differences in the supply of tertiary education across the country: students in some regions need to travel more to reach the nearest university and, being used to different amounts of travel for study, may therefore have a different perception of which universities are local or non-local. To account for this element and to avoid inconsistent assumptions that rely only on administrative borders or deterministic thresholds, we classify students’ choices depending on two values: \(d_{i,univ}\), given by the distance between student i’s town of residence and the nearest university, and \(d_{i,field}\), given by the distance between student i’s town and the nearest university that provides a degree program in the chosen field of study. Moreover, since many Italian universities are located close to each other but in different cities, universities located close to the nearest institution may also be perceived as local. To account for this element, we apply a non-deterministic approach to define students’ local areas. Our strategy consists in increasing both thresholds by a random amount of time \(\delta \in [0;30]\) minutes and estimating a separate model for each value of \(\delta\). As we will show in Sect. 4, we generate \(m=5\) random values of \(\delta _m\) to account for the uncertainty in the definition of the thresholds, and then we combine the results of the analyses carried out with each value of \(\delta _m\) by using Rubin’s rule (Rubin 1987). It is worth highlighting that (i) since the distance is computed between cities, a value of 0 can be observed only for universities in the student’s own city, and that (ii) results do not change substantially when alternative and even more extreme intervals for \(\delta _m\) are considered (Porcu et al. 2021).
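
As a rough illustration of this step, the sketch below draws m = 5 values of δ and pools per-model results with the standard form of Rubin’s rules (pooled estimate as the mean of the m estimates; total variance as the within-imputation variance plus (1 + 1/m) times the between-imputation variance). The uniform draw, the seed, and the function names `draw_deltas` and `pool_rubin` are assumptions made for illustration: the text only states that δ is a random amount of time in [0, 30] minutes.

```python
import random
from statistics import mean, variance


def draw_deltas(m=5, low=0.0, high=30.0, seed=0):
    """Draw m threshold shifts in minutes (uniform draw is an assumption)."""
    rng = random.Random(seed)
    return [rng.uniform(low, high) for _ in range(m)]


def pool_rubin(estimates, variances):
    """Combine m point estimates and their variances with Rubin's rules.

    estimates: one coefficient estimate per value of delta_m
    variances: the squared standard error of each estimate
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two estimates and matching variances")
    pooled = mean(estimates)        # average of the m estimates
    within = mean(variances)        # average within-model variance
    between = variance(estimates)   # sample variance across models (m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total


deltas = draw_deltas()
# Placeholder inputs, not results from the analysis.
pooled, total = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

In this sketch one model would be estimated for each element of `deltas`, and `pool_rubin` would then be applied coefficient by coefficient.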

Following this strategy, we define three categories of university choices: local, forced non-local, and free non-local. Universities are considered local if placed closer than \(d_{i,univ} + \delta _m\) minutes of travel from the student’s town of residence, while they are considered non-local if placed farther than \(d_{i,univ} + \delta _m\) minutes. For example, with a value of \(\delta =0\), the local area of all the students who reside in Milan comprises all the universities in the city of Milan. If instead a value of \(\delta =20\) is considered, the local area of students residing in Milan also includes other cities such as Sesto San Giovanni and Desio, where there are two separate branches of the University of Milano-Bicocca.

Moreover, to account for differences in universities’ supply that depend on the chosen field of study, we distinguish between two types of non-local universities: forced and free. Non-local universities are considered forced if they are located within \(d_{i,field} + \delta _m\) minutes of travel from the student’s town. Namely, a forced non-local university is, up to the tolerance \(\delta _m\), the nearest university providing a program in the student’s field of study. Finally, free non-local universities are those located farther than both thresholds.
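
The three categories can be summarized in a minimal sketch, assuming travel times are expressed in minutes. The function name, the labels, and the treatment of exact ties (a travel time equal to a threshold falls in the nearer category) are illustrative assumptions rather than the authors’ implementation.

```python
def classify_choice(d_i, d_univ, d_field, delta):
    """Classify a student's university choice.

    d_i:     travel time to the chosen university
    d_univ:  travel time to the nearest university (any field)
    d_field: travel time to the nearest university offering the chosen field
    delta:   random shift added to both thresholds
    """
    if d_field < d_univ:
        raise ValueError("d_field cannot be smaller than d_univ")
    if d_i <= d_univ + delta:
        return "local"
    if d_i <= d_field + delta:
        return "forced non-local"
    return "free non-local"


# A hypothetical student: nearest university at 10 minutes,
# nearest university offering the chosen field at 40 minutes.
for travel_time in (10, 30, 60):
    print(travel_time, classify_choice(travel_time, 10, 40, delta=0))
```

With a larger shift, for example `delta=20`, the same 30-minute choice would be classified as local, which mirrors the Milan example above.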

3.3 Secondary schools

The MOBYSU.IT database allows us to precisely identify the high school attended by each student. It provides information on the town hosting the school, the specific curriculum provided, and whether the school is private or publicly funded. We observe 20,164 high schools.

Each observed high school provides several types of programs with significant differences in disciplinary contents. Table 9 in the Appendix lists the 13 high school curricula we consider in the analysis. Each curriculum is usually established at the national level, with a set of compulsory subjects that are taught in all the programs (mathematics, sciences, literature, history, one or two foreign languages, and gym classes) and a set of subjects that depend on the type of curriculum attended. Besides the differences in specializations, the curricula also differ in the amount of time allocated to each subject and expected outcomes. Indeed, following Contini et al. (2017), we can group these curricula into three general categories of institutes: the liceo, technical schools, and vocational schools. Liceo schools are generally oriented to provide students with a solid academic background and present four specializations: humanities, sciences, languages, and arts. Technical institutes provide students with a specialization in a particular field (e.g., accounting, surveyor, industrial) and general education. Vocational institutes are even more oriented to training students to enter the workforce with lab and professional activities programs.

These differences in programs and curricula are also likely to affect students’ choices regarding local and non-local universities. Indeed, students with a more specialized background may prefer a particular university. Moreover, students from a liceo are more likely to come from socioeconomically advantaged families, which, on average, can provide more financial resources to support their studies in tertiary education (Barone and Assirelli 2020; Impicciatore and Tosi 2019).

We account for these differences in two ways. First, we consider the type of high school attended among the predictors of the model. Second, as explained in Sect. 4, we apply a Cross-Classified Multinomial Logit (CCMNL) by defining the second level of clustering as the interaction between the high school and the curricula offered.

3.4 Students’ characteristics

For each observed student, MOBYSU.IT provides information on several individual characteristics affecting students’ university choices. In particular, we observe students’ gender, age, the town of residence, school final grade, the attended high school, and the high school curriculum. Moreover, since we observe the years of diploma and enrollment, we can calculate the years between high school graduation and enrollment (indicated as Late Enrollment) and identify the students who obtained their high school diploma after turning 19 (indicated as Irregular). To account for students’ commuting experience at high school, we also computed the High school mover indicator that takes value 1 if the high school attended was not in the student’s city of residence. Table 2 shows the descriptive statistics on students’ characteristics by macro region of residence.
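
The three derived indicators can be sketched as follows. The record layout and field names are hypothetical (they are not the MOBYSU.IT schema), and the Irregular indicator is approximated here with calendar years, which is an assumption about how “after turning 19” is operationalized.

```python
from dataclasses import dataclass


@dataclass
class Student:
    # Hypothetical field names, not the MOBYSU.IT schema.
    birth_year: int
    diploma_year: int
    enrollment_year: int
    residence_town: str
    school_town: str


def late_enrollment(s: Student) -> int:
    """Years elapsed between high school graduation and enrollment."""
    return s.enrollment_year - s.diploma_year


def irregular(s: Student, regular_age: int = 19) -> int:
    """1 if the diploma was obtained after the year the student turned 19."""
    return int(s.diploma_year - s.birth_year > regular_age)


def high_school_mover(s: Student) -> int:
    """1 if the high school was not in the student's town of residence."""
    return int(s.school_town != s.residence_town)


example = Student(1998, 2018, 2019, "Town A", "Town B")
indicators = (late_enrollment(example), irregular(example), high_school_mover(example))
```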

Table 2 Descriptive statistics on students’ characteristics by macro area of residence

From Table 2, we can see that 64.9% of university students in our data reside in central or northern regions, while 35.1% in southern regions and Islands. As for students’ characteristics, most students are females in all the macro regions and the average diploma grade is similar among areas, with lower average values in the North-West regions. Students generally enroll at university the same year when they get their high school degree. Indeed, the average number of years between high school graduation and enrollment is 0.12 at the national level. Most of the enrolled students were regular in the high school time schedule, with 13.2% who got their high school diploma with at least one year of delay. This share is lower in southern regions. Moreover, 50.9% of students in the country attended a high school in a different city than their city of residence. This share is higher in northern regions. The second panel of Table 2 reports information on the distribution of students among high school institutes. As expected, 70.5% of enrolled students came from a liceo, while only 21.7% came from a technical school. Students from vocational schools are the residual category, with an average share in the country of 7.8%. The last panel shows that the number of enrolled students has increased in the last five years, with similar patterns in all the macro regions.

Finally, to control for the economic conditions of students’ area of residence, we have collected from ISTAT data on the provincial unemployment rate and the regional GDP per capita. Indeed, students in less economically developed areas, or in regions with higher unemployment rates, may be more likely to migrate to a different region to seek better employment opportunities (see for example Cattaneo et al. 2017b; Giambona et al. 2017). Moreover, to account for differences in students’ preferences that may be related to regional characteristics, we estimate a separate model for each macro area of residence. This strategy allows us to model the heterogeneity in students’ preferences that depends on their macro area of residence without including a full set of interaction terms between each predictor and the macro area fixed effects. In addition, to account for geographical differences within each macro area, we add a set of provincial fixed effects, which allows us to control for all the time-invariant attributes that vary among provinces within the same macro area. With this strategy, we aim to disentangle the high school effect on students’ mobility choices from the contextual effects that depend on regional and provincial differences.

4 Empirical analysis

To estimate the influence of the attended high school on students’ choices on whether to enroll in a local or non-local university, we specify a CCMNL model with three levels. Students (level-1 units) are classified according to the interaction between the attended high school and the curriculum (level-2) and the chosen field of study (level-3). Specifically, we observe 1,021,365 students cross-classified according to 126,138 high school/curricula pairs and 10 fields of study. This specification allows us to estimate the effect of high schools on students’ choices by accounting for the variability in students’ propensity to choose a non-local university that depends on differences among fields of study and students’ characteristics. Moreover, as explained in Sect. 3.4, we account for the heterogeneity in students’ preferences that depend on their macro area of origin by estimating 5 models, one for each considered area (North-East, North-West, Centre, South, Islands).Footnote 6

Let \(\pi _{ijkg} = P(Y_{i} = j \vert X,k,g)\) be the probability of student i to choose category j given the set of observable characteristics X, the high school/curriculum k, and the chosen field of study g. As explained in Sect. 3.2, students’ university choices are classified into three categories depending on: the distance traveled by the student \(d_i\), the minimum distance between students’ town of residence and the nearest university \(d_{i,univ}\), and the minimum distance needed to reach the nearest university which provides a program in the chosen field of study \(d_{i,field}\).

As said above, to classify universities as local or non-local with respect to each student’s local area, a random amount of time \(\delta _m\) has been added to these distances. In particular, j can take three values:

  • \(Y_{i}=0\), i chooses a free non-local university, i.e. \(d_i > d_{i,field}+\delta _m\)

  • \(Y_{i}=1\), i chooses a forced non-local university, i.e. \(d_{i,univ}+\delta _m < d_i \le d_{i,field}+\delta _m\)

  • \(Y_{i}=2\), i chooses a local university, i.e. \(d_i \le d_{i,univ}+\delta _m\)

This classification of students’ choices allows us to differentiate between local and non-local universities by accounting for the actual supply of universities in students’ local area and the chosen field of study. Indeed, the minimum distances are computed for each student by considering her/his town of residence. Nonetheless, we distinguish between students who move to attend a program that is not available in their local area from the ones that migrate even when the nearest university provides a program in their field of study, introducing two categories for classifying non-local choices: forced and free non-local.
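The three-way classification can be illustrated with a minimal Python sketch. The function name `classify_choice`, the assumption that all quantities are travel times in minutes, and the example values are hypothetical and not taken from the data. Checking the local condition first keeps the three categories mutually exclusive.

```python
def classify_choice(d_i, d_univ, d_field, delta_m):
    """Classify a university choice for one student.

    d_i     : distance travelled by the student (assumed here: minutes)
    d_univ  : minimum distance to the nearest university
    d_field : minimum distance to the nearest university offering the field
    delta_m : tolerance added to both thresholds
    Returns 2 (local), 1 (forced non-local) or 0 (free non-local).
    """
    if d_i <= d_univ + delta_m:
        return 2  # local
    if d_i <= d_field + delta_m:
        return 1  # forced non-local: field not offered nearby
    return 0      # free non-local: student goes beyond the nearest suitable program


# Hypothetical student: nearest university at 20 minutes,
# nearest program in the chosen field at 50 minutes, delta_m = 10.
examples = [classify_choice(d, 20, 50, 10) for d in (25, 55, 90)]
print(examples)  # [2, 1, 0]
```

With a larger tolerance, more choices fall into the local category, which is why the shares of the three categories change with \(\delta _m\).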

Given this setting, and considering free non-local university (i.e. \(Y_i=0\)) as our baseline, the probability of student i to choose between local (i.e. \(Y_i=2\)) and forced non-local (i.e. \(Y_i=1\)) is estimated by using a multinomial logistic function:

$$\begin{aligned} \log \bigg (\frac{\pi _{ijkg}^m}{\pi _{i2kg}^m}\bigg )= X'\beta _j^m + u_{jk}^m + v_{jg}^m \,\text { with } j=0,1\, \text { and } m=1,\dots ,5 \end{aligned}$$
(1)

where X is the set of variables used in estimation, \(u_{jk}\sim {\mathcal {N}}(0,\,\Omega _u)\) is the random intercept that captures the between school/curriculum (k) variability, as it is shared by students nested in the same high school/curriculum. Meanwhile, \(v_{jg}\sim {\mathcal {N}}(0,\,\Omega _v)\) is the random intercept that captures the between fields of study variability which is shared by students who enrolled in the same field of study. The superscript m indicates that several models are estimated considering different values of the random term \(\delta _m\). X contains several variables that account for students’ characteristics, their high school background and the characteristics of their residence area. In particular, we apply the empirical approach by estimating Eq. 1 for each macro area of residence separately and including the set of provincial fixed effects to account for geographical differences within each macro area in students’ propensity to choose local universities. In Table 3 we report the definitions of each regressor used in estimation, along with the average value observed and the standard deviation.
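As an illustration of how the log-odds in Eq. 1 translate into choice probabilities, the following Python sketch inverts the multinomial logit link for a single student. It labels the alternatives by name, with free non-local as the reference, and all numerical values (fixed part, school/curriculum intercepts, field intercepts) are hypothetical rather than estimates from the paper.

```python
import math


def choice_probabilities(eta_local, eta_forced):
    """Invert the multinomial logit link with free non-local as reference.

    eta_local and eta_forced are the linear predictors X'beta + u + v of the
    two contrasts (local vs free, forced vs free).
    """
    denom = 1.0 + math.exp(eta_local) + math.exp(eta_forced)
    return {
        "local": math.exp(eta_local) / denom,
        "forced": math.exp(eta_forced) / denom,
        "free": 1.0 / denom,
    }


# Hypothetical components of the linear predictor for one student.
xb = {"local": -0.4, "forced": -0.9}  # fixed part X'beta
u = {"local": 0.5, "forced": 0.1}     # school/curriculum random intercepts
v = {"local": -0.2, "forced": 0.3}    # field-of-study random intercepts

eta_local = xb["local"] + u["local"] + v["local"]
eta_forced = xb["forced"] + u["forced"] + v["forced"]
probs = choice_probabilities(eta_local, eta_forced)
print(probs)
```

The two random intercepts enter additively, so a school/curriculum pair with a larger intercept in a given contrast shifts the log-odds of that alternative against the reference by the same amount for all of its students.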

Table 3 Definitions and descriptive statistics of variables used in estimation

Concerning the random parameters, the joint introduction in the predictors of the two random terms, \(u_{jk}\) and \(v_{jg}\), allowed us to capture divergences in the preferences for local and non-local universities over and above the individual characteristics and the preferences for different fields of study. Therefore, the set of parameters \(u_{jk}\) captures the residual heterogeneity in students’ mobility choices between students attending different high school/curriculum pairs after controlling for the differences in students’ preferences that depend on their characteristics and the chosen field of study. Thus, given this specification, the posterior estimates of \(u_{jk}\) can be used as a measure of the influence of the high school environment on students’ mobility choices.

Finally, to account for the uncertainty in the definition of students’ local areas and the related classification of students’ university choices as local, free non-local, and forced non-local, we estimate the model for each macro area of origin by considering \(M=5\) values for \(\delta _m\) selected as random draws from the interval \(\delta _m \in [0;30]\) minutes. The results are combined by using Rubin’s rule (Rubin 1987) to obtain a single statement for each macro area of residence considered. Formally, denoting by \(\theta ^m=[\beta _j^m, u_{jk}^m, v_{jg}^m]\) the set of parameters obtained in each estimation m, the pooled estimate \({\bar{\theta }}\) is computed as the average value over the M estimates:

$$\begin{aligned} {\bar{\theta }}= M^{-1} \sum _{m=1}^{M} {\hat{\theta }}^m. \end{aligned}$$
(2)

The standard error related to each parameter \({\bar{\theta }}\) is obtained as:

$$\begin{aligned} SE({\bar{\theta }})=\sqrt{W + B + \frac{B}{M}} \end{aligned}$$
(3)

where W is the within estimates variance:

$$\begin{aligned} W= M^{-1} \sum _{m=1}^{M} SE({{\hat{\theta }}^m})^2 \end{aligned}$$
(4)

and B is the between estimates variance:

$$\begin{aligned} B=\frac{\sum _{m=1}^{M} ({\hat{\theta }}^m - {\bar{\theta }})^2}{M-1} \end{aligned}$$
(5)

According to the steps described above, we can estimate the influence of secondary school on students’ mobility choices by accounting for the uncertainty in the definition of the thresholds \(d_{i,univ} + \delta _m\) and \(d_{i,field} + \delta _m\).
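The pooling in Eqs. 2 to 5 can be sketched in a few lines of Python for a single scalar parameter. The function name `pool_rubin` and the input values are illustrative assumptions; in the analysis the same computation would be applied to each element of \(\theta\).

```python
import math


def pool_rubin(estimates, std_errors):
    """Pool M estimates of one parameter with Rubin's rule.

    Returns the pooled estimate (Eq. 2) and its standard error (Eq. 3).
    """
    m = len(estimates)
    theta_bar = sum(estimates) / m                                 # Eq. 2
    within = sum(se ** 2 for se in std_errors) / m                 # Eq. 4
    between = sum((t - theta_bar) ** 2 for t in estimates) / (m - 1)  # Eq. 5
    pooled_se = math.sqrt(within + between + between / m)          # Eq. 3
    return theta_bar, pooled_se


# Hypothetical coefficient estimated under M = 5 values of delta_m.
estimates = [0.50, 0.54, 0.47, 0.52, 0.49]
std_errors = [0.10, 0.11, 0.10, 0.09, 0.10]
theta_bar, pooled_se = pool_rubin(estimates, std_errors)
print(theta_bar, pooled_se)
```

When the estimates barely change across values of \(\delta _m\), the between component is close to zero and the pooled standard error is close to the average within-model standard error.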

The main advantage of adopting such an approach to deal with missing information is that the parameter estimates explicitly rely on different values of the thresholds used to define a university as local or non-local, and the standard errors specifically account for the related uncertainty.

5 Results

We begin by reporting the parameter estimates of the CCMNL, together with a discussion of each predictor’s effect on students’ choice probabilities. Then, we focus on the role of high schools by exploiting the posterior parameter estimates associated with the random intercepts \(u_{jk}\). In particular, we show how the high schools’ effect, defined as the value associated with each \(u_{jk}\), varies among students and high schools.

5.1 Cross-classified multinomial logit results

Table 4 reports, for each macro area of residence, the parameter estimates obtained by pooling the 5 vectors of coefficients resulting from estimating the set of CCMNLs, one for each value of \(\delta _m\). All the parameters have been estimated with the runmlwin routine in Stata, which runs a Markov chain Monte Carlo algorithm through the program MLwiN (Leckie and Charlton 2013).

Table 5 depicts the distribution of students in the three identified choice categories, which describe the preferences for local or non-local universities for the different values of \(\delta _m\). It is worth highlighting that the definition of the thresholds has a relevant influence on the classification of a university as local, forced non-local and free non-local. The difference in the rate of free non-local is about 15% between the two extreme values of \(\delta _m\). However, the model estimates for each value of \(\delta _m\) are similar in terms of their signs and magnitude.Footnote 7 This suggests that the final results do not depend on the specific thresholds used to define students’ local areas. Indeed, using a non-deterministic approach that considers different values for \(\delta _m\) ensures that the parameter estimates of the CCMNL model take into account the uncertainty in the definition of students’ local areas.

Turning to the pooled estimates presented in Table 4, we focus in this subsection on interpreting the main results related to students’ and local areas’ characteristics. In the following subsection, we will address the results regarding the posterior estimates of random intercepts.

Table 4 Pooled CCMNL (Baseline = Free Non-Local university)
Table 5 Shares of students in each choice category

The first element that arises from Table 4 is the variation in the magnitude and signs of some coefficients across macro areas. Moreover, as will be clear from the discussion on the average marginal effects in Table 6, there is a relevant difference in the relative magnitude and signs associated with the constants. This suggests heterogeneity across macro areas in the average probability of choosing each of the considered alternatives.

As for other students’ characteristics, we can see that females are more likely to choose a free non-local university than males when residing in northern or central regions of the country, with a more substantial effect in the contrast local vs free non-local. Instead, if we look at students from the South and Islands, the coefficients are positive, even though with a lower relative magnitude with respect to the other characteristics (i.e. the type of high school curricula or the high school mover status). This result may be related, among others, to two elements. First, the presence of some cultural traits that may incentivize families in the South to invest more in their sons’ education than in their daughters’ (see, for example, the discussion in Ballarino et al. 2022). Second, the uneven distribution across the country of some gender-oriented degree programs and the heterogeneity in their quality (e.g. the most important polytechnics are located in the North) may motivate males to leave their region of residence more often than females (D’Agostino et al. 2019a; Gibbons and Vignoles 2012).

Moreover, students who have obtained their high school degree in technical or vocational institutes and those who had to commute to reach their high school have a higher probability of choosing a local or forced non-local alternative. These results are similar across the macro areas and confirmed by analysing the average marginal effects reported in Table 6.

Looking at the economic characteristics of students’ areas of origin, once we control for the differences between provinces, the regional GDP has a heterogeneous effect depending on the macro area considered. It negatively influences the probability of selecting a local university in northern regions and the Islands, while the effect is positive for students in the Centre (albeit negligible) and in southern regions. Concerning forced non-local universities, we can see that the GDP has a negative effect of relevant magnitude for students residing in the North-West and in the Islands, while it is positive for students who reside in the North-East, in the Centre, or in the South. In contrast, the provincial unemployment rate has a negligible effect in most of the areas considered. However, it is important to note that the interpretation of these coefficients is not trivial. Indeed, the inclusion of provincial fixed effects may absorb a relevant part of the effect related to GDP and unemployment. Therefore, the purpose of including these variables in the model is to account for the variation in geographical economic conditions within the country rather than to estimate the effect of these characteristics.

In general, we can see that all the coefficients associated with students’ characteristics have the same sign in influencing the probability of both outcomes in all the macro areas considered. This suggests that local and forced non-local alternatives are considered similar from students’ perspectives. These alternatives attract students who prefer universities located within or near their local area, depending on the availability of a program in their field of study.

Table 6 Average marginal effects of students’ choices main determinants

To better understand these results, we reported in Table 6 the average marginal effects on students’ choice probabilities of the main predictors included in Table 4 and the differences in probabilities associated with students’ macro areas of residence. In particular, for each alternative considered, we report two values: the predicted choice probability associated with each profile and the difference in choice probabilities between each profile and its baseline. Following Long and Mustillo (2021), we computed the average marginal effects as average discrete changes. For example, the choice probabilities associated with the vocational institute indicator are computed by predicting the individual probabilities assuming that all the students have attended this high school curriculum and holding all the other characteristics to their observed values. These probabilities are computed by using the estimates associated with students’ macro areas of residence. Then, the average marginal effect associated with the indicator is computed by taking the difference between the choice probabilities related to the liceo profile and those associated with the vocational institutes. In this case, our results indicate that, ceteris paribus, the probability of choosing a free non-local university for students with a vocational or technical diploma is, respectively, 6.61 and 7.68 percentage points lower than the one for those who attended a liceo. This result can be related to the fact that students from the liceo are usually more oriented toward an academic career and, therefore, may be more interested in university characteristics besides the distance from their towns, such as quality of research and job opportunities. These elements may encourage students from liceo high schools to leave their local area to reach a university that satisfies their preferences better. 
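The average discrete change described above can be sketched as follows. This is a simplified illustration that omits the random intercepts and the provincial fixed effects, and the coefficients, covariate names and student records are hypothetical rather than taken from Table 4.

```python
import math


def predict(row, coef):
    """Predicted probabilities for one student; free non-local is the reference."""
    eta = {c: sum(b * row[name] for name, b in betas.items())
           for c, betas in coef.items()}
    denom = 1.0 + sum(math.exp(e) for e in eta.values())
    probs = {c: math.exp(e) / denom for c, e in eta.items()}
    probs["free"] = 1.0 / denom
    return probs


def average_discrete_change(rows, coef, indicator):
    """Average change in probabilities when `indicator` goes from 0 to 1
    for every student, holding the other covariates at observed values."""
    n = len(rows)
    change = {"local": 0.0, "forced": 0.0, "free": 0.0}
    for row in rows:
        p1 = predict({**row, indicator: 1}, coef)  # everyone in the profile
        p0 = predict({**row, indicator: 0}, coef)  # everyone in the baseline
        for c in change:
            change[c] += (p1[c] - p0[c]) / n
    return change


# Hypothetical coefficients (log-odds vs free non-local) and students.
coef = {
    "local": {"const": -0.3, "vocational": 0.6, "grade": -0.02},
    "forced": {"const": -1.0, "vocational": 0.4, "grade": -0.01},
}
rows = [
    {"const": 1, "vocational": 0, "grade": 5.0},
    {"const": 1, "vocational": 1, "grade": -3.0},
    {"const": 1, "vocational": 0, "grade": 10.0},
]
ame = average_discrete_change(rows, coef, "vocational")
print(ame)
```

Because the three probabilities sum to one for every student, the three average changes sum to zero, which is why an increase in one alternative is always mirrored by reductions elsewhere.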
With respect to the high school mover status, we can see that students who travelled to reach their high school in the past prefer to enroll in a forced non-local university more than the other students. This indicates that these students are willing to travel more to reach the first university providing a program in their preferred field of study. Indeed, most of these students do not have a university in their town of residence and, therefore, must commute to reach the nearest university. Thus, they may perceive as similar, in terms of travel time, local and forced non-local universities, and choose the second because it provides a program in their field of study.

As for students’ macro areas of residence, we can see that this is one of the most important determinants of students’ choices: changes in this variable are associated with the most relevant changes in students’ choice probabilities. From the first column of Table 6 we can notice that the majority of students in central regions and in the Islands are enrolled in local universities. Ceteris paribus, students from the North-East are those with the lowest probability to choose a local university and are more likely to choose a free non-local alternative. In general, the reduction in the probability to choose a local alternative is associated with an increase in the one related to the free non-local alternative. The only exception is given by students in the North-West who prefer to enroll in forced non-local alternatives rather than free non-local. These results confirm that, from students’ perspective, local and forced non-local alternatives are perceived as similar. Moreover, the fact that students residing in the North-West are less mobile provides evidence of the role that these regions have in attracting students from other areas of the country. These differences among macro areas are also related to the existence of a strong and persistent flow of students who migrate from the Islands and southern regions to the Centre-North of the country. As for students’ past performances and careers, we can see that those with higher diploma grades prefer the free non-local alternative over the local alternative. A standard deviation increase in the diploma grade (12 points) is associated with an increase of 1.61 percentage points in free non-local choice probabilities and a reduction of 1.46 points in local choice probabilities. In contrast, the effect on the forced non-local alternative is lower (\(-\)0.15). 
Moreover, we can note that, once controlling for other students’ determinants, their irregular status and years of delay in enrollment do not have a strong impact on location choices.

Finally, concerning students’ sex, we find that, on average, female students are 0.97 percentage points more likely to choose a free non-local university than males. This effect is almost totally driven by a reduction in local choice probabilities. It is worth mentioning that the literature does not agree on which gender has the higher probability of being mobile (see, for example, Cattaneo et al. 2017a; Columbu et al. 2021; D’Agostino et al. 2019a), and results provide contrasting evidence. Indeed, as shown with respect to the results in Table 4, the differences between females and males are not constant across the country, with females in southern regions and in the Islands being more prone to choose local alternatives. Moreover, D’Agostino et al. (2019a) show a prevalence of males among movers in STEM students in Italy between 2008 and 2014. These results have to be considered taking into account the high variability between disciplinary fields in students’ propensity to be mobile, and the fact that the choice of the disciplinary field is itself gender-oriented and affected by family socioeconomic conditions. Moreover, findings suggest that female students have a higher propensity to be mobile when they come from a high socioeconomic background (see, for example, Gibbons and Vignoles 2012).

5.2 The high school effect

In this section, we address the interpretation of the results concerning the set of estimated random intercepts by focusing mainly on the results associated with the high school/curriculum level parameters \(u_{jk}\).

The bottom panel of Table 4 presents the information related to the set of random intercepts \(u_{jk}\sim {\mathcal {N}}(0,\,\Omega _u)\) and \(v_{jg}\sim {\mathcal {N}}(0,\,\Omega _v)\) for each macro area of residence. In particular, for each level, we have estimated the variances and the covariances that help to recover the variance-covariance matrices \(\Omega _u\) and \(\Omega _v\). The results show that the variance estimated at the high school/curriculum level for contrast local vs free non-local is higher than the one estimated for the field of study level in all the models, with the exception of the one estimated for students residing in the North-West. This suggests that, in general, the differences among high school/curricula pairs (level k units) explain much of the variability in students’ choices for local or free non-local universities that is not explained by the other predictors, especially if we consider students in the Centre or in the Islands where the variances in the first level are more than three times the ones estimated for the second level. Instead, if we consider the contrast between forced non-local and free non-local we can see that the variances estimated for the second level (fields of study) are higher than the ones estimated at the high school/curricula level in all the macro areas with the exception of the Islands. This result indicates that, as expected, the role of the field of study is more important when considering the choice to enroll in forced non-local universities.

To inform on the relative importance of each level of the cross-classification, Table 7 reports the variance partition coefficients (VPC) for each estimated model. This measure can be interpreted as the proportion of the unexplained variability in students’ choices at each level of the multilevel structure (Leckie 2013; Snijders and Bosker 2011).Footnote 8
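For intuition, a VPC can be computed from the estimated variances as in the Python sketch below. It assumes the common latent-variable formulation for logit models, in which the student-level residual variance is fixed at \(\pi ^2/3\); this is an assumption made here for illustration and may differ from the exact definition used for Table 7. The variances passed to the function are hypothetical.

```python
import math


def variance_partition(var_school, var_field):
    """Variance partition coefficients for one contrast.

    Assumes (for illustration) a latent-variable formulation where the
    student-level variance of a logit model equals pi^2 / 3.
    """
    var_student = math.pi ** 2 / 3
    total = var_school + var_field + var_student
    return {
        "school_curriculum": var_school / total,
        "field_of_study": var_field / total,
        "student": var_student / total,
    }


# Hypothetical variances for the local vs free non-local contrast.
vpc = variance_partition(var_school=2.0, var_field=0.6)
print(vpc)
```

Under this formulation, a larger school/curriculum variance relative to the other two components directly raises the share of unexplained variability attributed to high schools.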

Table 7 Variance partition coefficient

The VPCs show that differences in students’ high school backgrounds play a crucial role in their decision process, and that this role varies with students’ macro areas of residence. Indeed, the share of unexplained variation in students’ choices for the contrast local vs free non-local that is attributable to the variability at the high school/curricula level ranges from 24.38% in the North-West to 53.41% in the Islands, where the high school effect appears to be more important. As expected, the importance of high schools is lower when considering the contrast forced vs free non-local, where the differences among fields of study have a prominent role. In this case, the VPCs associated with the high schools range from 9.17% in the North-West to 35.55% in the Islands. Therefore, these results indicate that the high school effect is weaker in northern regions with respect to the rest of the country.

Therefore, the results show that, once we account for covariates, the high school maintains a relevant role in affecting students’ mobility choices, particularly if we consider the choice regarding the local university and students that reside in central or southern regions and the Islands.

Another valuable piece of information in Table 4 is the estimated covariance between the random intercepts in the two levels considered. As we can see, the estimated covariance at the high school/curricula level is very small when considering students residing in the South and the Centre of the country. These values become more relevant when considering the other areas but do not indicate the existence of a strong correlation between the two alternatives. This suggests that, from high schools’ perspective, the two alternatives are unrelated: schools that advise students on local universities are not likely to suggest to their students to enroll also in a forced non-local university. This finding is somewhat surprising considering that the evidence presented in Sect. 5.1 indicates that students who choose a local university are similar to those that choose a forced non-local alternative.

A similar interpretation also applies to the estimated covariance at the field of study level, which is not significant, with an estimated standard error that is often higher than its point estimate. This suggests that the distributions of the random intercepts in the two contrasts are not correlated.

To highlight the importance of the high school in shaping students’ choices, Figs. 1 and 2 depict the relationship between predicted choice probabilities and the estimated random intercepts at the high school/curriculum level. Negative values for the random intercepts can be interpreted as a propensity of high schools to refer students toward a free non-local alternative. In contrast, positive values indicate that schools have a positive effect in steering students toward local or forced non-local alternatives, depending on the contrast considered.

The plots in Fig. 1 are built by computing students’ choice probabilities according to the model estimated for their macro area of residence and then looking at how these probabilities vary when the high school effect changes for a student with average characteristics. The first plot considers the contrast local vs free non-local while the second considers the contrast forced non-local vs free non-local. From the first plot, we can notice that the high school has a key role in the comparison between local and free non-local universities. If we consider the average school (i.e. \(u_{0k}=0\)), we can notice that the choice probabilities related to local and forced non-local alternatives are very similar, between 20% and 30%, with an advantage for the former. If we move from the average toward negative values, we can see that the probability of choosing local universities decreases, with an increase in both free non-local and forced non-local choice probabilities. At the same time, if we consider schools that have a positive propensity to point students to local universities, we can see that the probability related to local alternatives increases, surpassing the one related to free non-local alternatives when the random effect reaches a value close to 1, which is lower than the lowest standard deviation estimated for the distribution of the random effect \(u_{0k}\) (1.19 for North-East). This suggests that schools have a relevant role in steering students toward local alternatives even when the estimated high school effect is not very different from the average.
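The mechanics behind these curves can be reproduced with a small Python sketch: hold the linear predictors of an average student fixed and vary only the school/curriculum intercept of the local vs free non-local contrast. The linear predictors used below are hypothetical and are not read from Fig. 1.

```python
import math


def probabilities(eta_local, eta_forced, u_local=0.0):
    """Return (local, forced, free) probabilities; free non-local is the
    reference and u_local shifts only the local vs free log-odds."""
    e_local = math.exp(eta_local + u_local)
    e_forced = math.exp(eta_forced)
    denom = 1.0 + e_local + e_forced
    return e_local / denom, e_forced / denom, 1.0 / denom


# Hypothetical linear predictors for a student with average characteristics.
ETA_LOCAL, ETA_FORCED = -0.8, -1.0

for u in (-2.0, -1.0, 0.0, 1.0, 2.0):
    local, forced, free = probabilities(ETA_LOCAL, ETA_FORCED, u)
    print(f"u={u:+.1f}  local={local:.3f}  forced={forced:.3f}  free={free:.3f}")

# Local overtakes free exactly where the log-odds ETA_LOCAL + u cross zero.
crossing = -ETA_LOCAL
```

In this setup a negative school effect lowers the local probability and raises both other probabilities, while the point at which local overtakes free non-local depends only on the average log-odds of local against free non-local.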

Fig. 1
Relationship between predicted choice probabilities and the high school effect. Note: The figure reports two plots depicting the relationship between the predicted choice probabilities and the estimated high school effect, defined as the set of high school/curriculum level random intercepts \(u_{jk}\). The choice probabilities are computed by considering a student with average characteristics and by changing only the value of the random intercept considered according to the model estimated for students’ macro areas of residence. The first row (plot a) shows the results for the local vs free non-local contrast (\(u_{0k}\)), while the second (plot b) regards the forced vs free non-local contrast (\(u_{1k}\))

From the second plot in Fig. 1 we can confirm that the high school has, in general, a less important role when considering the contrast forced non-local vs free non-local. Indeed, comparing the two plots, we can notice that the probability of choosing the forced non-local alternative becomes prevalent with respect to both other alternatives only for values of the random effect higher than 1. Namely, to make the student indifferent between the three alternatives, the high school effect has to be stronger. This is because the average predicted probability of choosing a forced non-local university is much lower than the one related to the other alternatives (see Table 6).

Focusing on the high school effect with respect to the contrast between local and free non-local alternatives, Fig. 2 reports the relationship between the predicted choice probabilities and the high school effect considering the macro area of residence. Namely, each plot depicts how choice probabilities change as the school effect increases for the average student residing in a region of the considered macro area.

Fig. 2
The high school effect by macro area of residence, Local vs Free non-local contrast. Note: The figure reports the relationship between the predicted choice probabilities and the high school effect for the contrast local vs free non-local (\(u_{0k}\)). Predicted probabilities have been computed considering the average student in each macro area of residence. The dashed lines indicate the units with an average value of the random intercept equal to zero and those with a value of \(+/-\) one standard deviation

As we can see, the relationship between choice probabilities and the high school effect changes in each macro area considered. In line with the results discussed in Sect. 5.1, we can see that, if we consider the average school, the free non-local alternative is prevalent in the North-East while local alternatives are preferred by students in the Centre and in the North-West. In contrast, free non-local and local alternatives present similar choice probabilities in the Islands and southern regions. The results indicate that in these regions these two alternatives are perceived as similar and that even a small deviation from the average may have a relevant role in steering students toward local or free non-local alternatives.

Therefore, while in the Centre and North-West of the country there is an average tendency to choose a local alternative, in the other areas the high school effect has to be positive to keep students in their local area, especially if we consider students in the North-East. The difference between the North-East and the North-West can also be related to the heterogeneity in the university supply in these regions. Indeed, the North-West is characterized by the presence of important universities such as those hosted in Milan (e.g. Polytechnic of Milan, Milan-Bocconi). Moreover, this element also helps to explain the similarities between North-West regions and the Centre, where some important universities are hosted in Rome (e.g. Sapienza).

Nevertheless, these results indicate that high schools with a higher-than-average propensity toward local universities may help in counteracting the asymmetric flow of students from southern to northern regions.

To better explore the role of high schools in the country, Fig. 3 shows the geographical distribution of the high school effects in Italian provinces. In particular, we have computed the average high school effect in each province by considering either students who have attended a liceo (on the left) or students who have attended a technical or vocational institute (on the right). In each map, the blue areas indicate that, on average, the high school effect is positive toward local universities, while the yellow-red areas indicate that, on average, the high schools are orienting more toward the free non-local alternative.

Fig. 3
The high school effect in Italy. Notes: The two maps consider, for each Italian province, the average value of the estimated high school effect for the contrast local vs free non-local considering the high school curriculum liceo (on the left) and the technical and vocational institutes (on the right). The high school effect is defined as the posterior estimate for the random intercepts \(u_{0k}\) obtained in the pooled CCMNL estimated for each macro area. The positive values (i.e. schools orienting toward local universities) are shown in blues, while negative values (i.e. schools orienting toward free non-local universities) are shown in yellow-red colors. For each observed university, the black dots indicate the town hosting the most important branch in terms of enrolled students

From the maps, we can highlight some important elements. First, as noted with respect to Fig. 2, the results concerning the liceo show that the high school effect is positive in the provinces that host very big universities like Milan, Turin, Florence, Naples, Rome and Bologna. Instead, if we look at Technical and Vocational high schools we estimate an average positive effect also in provinces that are more peripheral with respect to important universities. Moreover, we estimate a negative effect in Rome, Milan and Bologna. This result confirms the ones discussed with respect to marginal effects in Table 6: students in Technical and Vocational high schools prefer to stay in local or forced non-local universities. Considering Islands, we can see the attractive role of the two cities of Cagliari and Palermo. Another important element is the absence of a clear asymmetry between the northern and southern regions of the country. Indeed, we have areas with high schools with a strong propensity toward local alternatives in all the regions and geographical areas. These elements suggest that high schools may contribute to polarizing students’ choices toward the universities that already attract higher numbers of students.

This evidence, together with the results reported in Fig. 2, helps to explain the existence of an asymmetric flow of students from the South and the Islands toward the northern and central regions and to provide some policy insights. Indeed, the results show that students in southern regions and the Islands prefer free non-local alternatives over local ones. However, the local alternative becomes prevalent in the presence of even a small positive high school effect toward it. As we can see from the map, this is the case only in the presence of big cities and universities, which, in turn, are less common in these regions. Nevertheless, this result suggests that an increase in high school policies that encourage students to choose local universities may play a role in counteracting the so-called North–South divide.

6 Conclusion

This work provides information related to the Italian context on how much of the divergences in students’ mobility choices can be explained by differences in their high school backgrounds. Indeed, high schools are the environment where students interact with their peers, develop their capabilities, and form their expectations on their university careers. All these elements interact with established networks between high schools and local universities and the presence of promotion activities to inform students’ choices.

Moving from this framework, our analysis uses the administrative data on Italian university students to infer the factors that drive students’ mobility decisions toward local or non-local universities. This is done by comparing the choices of students who have attended the same high schools and applying a set of CCMNL models that jointly consider the clustering of students in macro areas of residence and high schools, and the heterogeneity in their choices that depends on the chosen field of study. Moreover, we define students’ mobility choices using a non-deterministic approach to account for the uncertainty in the definition of students’ local area, overcoming issues related to the arbitrariness of administrative boundaries and thresholds.

Our results indicate that, among students’ characteristics, the macro area of residence is one of the most important determinants of students’ choices. The probability of choosing a free non-local alternative is higher for students from the South, the North-East or the Islands than for those living in central regions of the country and in the North-West. Moreover, we report evidence that the effect of students’ characteristics is similar for local and forced non-local alternatives and, therefore, that these two alternatives attract the same kind of students. An exception is given by high school commuters, who, on average, are more oriented toward forced non-local alternatives.

A different picture arises from the analysis of the high school effect, which provides innovative findings in understanding the role that schools play in influencing students’ choices. Indeed, our results indicate that high schools have a different propensity to refer their students to local and forced non-local universities. While the differences between high schools explain between 24.38% (North-West) and 53.41% (Islands) of the unexplained variability in students’ choices in the contrast between local and free non-local universities, this effect is lower when considering forced non-local universities. In this case, differences between high schools explain between 9.17% (North-West) and 35.55% (Islands) of the unexplained variability. This difference is common to all the Italian macro areas. Moreover, the high schools’ influence on students’ choices is stronger than the one associated with the chosen field of study only when considering the local alternative, with the exception of students in North-West regions, where differences in the field of study explain 41.55% of the unexplained variability. In general, the differences among fields account for a percentage of the unexplained variability that ranges from 4.17% in the Islands to 41.55% in the North-West.
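To make the notion of a share of unexplained variability concrete, the following is a minimal sketch of a variance partition for a logit model with two crossed random intercepts (high school and field of study). It assumes the common latent-variable convention in which the residual variance is fixed at \(\pi^2/3\); the exact formula used in the estimation may differ, and the function name and the variance values below are purely illustrative, not estimates from this analysis.

```python
import math


def variance_shares(var_school: float, var_field: float) -> dict:
    """Share of unexplained variance per component (latent-variable convention).

    Assumption: the level-1 residual variance is that of the standard
    logistic distribution, pi^2 / 3.
    """
    residual = math.pi ** 2 / 3
    total = var_school + var_field + residual
    return {
        "school": var_school / total,
        "field": var_field / total,
        "residual": residual / total,
    }


# Hypothetical variance components, chosen only for illustration.
shares = variance_shares(var_school=1.5, var_field=0.4)
for name, share in shares.items():
    print(f"{name}: {share:.2%}")
```

Under this convention the three shares sum to one, so a larger high school variance mechanically reduces the share left to the field of study and to the residual.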

These results show that high schools are relevant in affecting students’ choices, especially when considering local universities. Moreover, the weak correlation between the distributions of the posterior estimates of the random effects at the high school/curriculum level indicates that, from the high schools’ point of view, local and forced non-local universities are considered two different alternatives: schools that steer students toward local alternatives do not also suggest forced non-local universities. This element can be interpreted as evidence of strong relationships between schools and nearby universities.

As for the differences between macro areas of residence, the results show that for students from southern regions and the Islands, even a small positive effect toward local universities can counteract the general propensity of these students to prefer free non-local alternatives with respect to students in the other regions. Instead, students in central regions or the North-West prefer a local university even in the absence of a positive high school effect. For these students, free non-local and forced non-local alternatives become relevant only when considering schools with a strong attitude toward these two categories. Finally, students residing in the North-East seem to prefer free non-local alternatives on average and to be less sensitive to the high school effect. Indeed, students in these regions are indifferent between local and free non-local alternatives only in high schools that present a stronger than average high school effect toward local alternatives.

With respect to the distribution of the high school effects in the country, our results indicate that high schools have a higher propensity toward local alternatives when considering provinces with cities that host the most important universities in terms of students such as, for example, Milan, Turin, Naples and Rome. These elements suggest that high schools may contribute to polarizing students’ choices in these areas of the country. However, policies that improve the relationship between high schools and local universities may help in retaining students in their local area.

Finally, the approach has shown that the results are similar across different specifications even when choosing different values for the thresholds \(\delta _m\) used to define students’ mobility choices. These results allow us to consider the main findings from this analysis robust to the uncertainty in the definition of students’ local areas.

As far as the authors know, there are several elements of novelty in the proposed analysis. Namely, we advance a robust definition of students’ local area, considering university accessibility and the supply of tertiary education institutions instead of administrative geographical boundaries. This approach should overcome the possible bias due to the absence of information on students’ travel habits in the administrative archive by providing multiple classifications of students’ choices according to their travel distance. Moreover, it is the first contribution that relies on Italian administrative data covering several academic years to assess the role of high schools’ policies in influencing students’ choices toward non-local universities. The main limitation of the analysis is the lack of information on students’ socioeconomic profiles and, thus, the impossibility of disentangling the school effect from other confounding factors related to students’ and families’ income and wealth or to parents’ profession and education. However, the MOBYSU.IT database also contains information on INVALSI’s Economic, Social and Cultural status (ESCS) for all the students enrolled in an Italian university between 2018 and 2019. This indicator measures students’ socio-economic and cultural background based on parents’ education and employment, and the availability of specific resources at home. Since this variable is observed only in two cohorts, we are not able to use it to account for students’ socio-economic backgrounds in the estimation. Nevertheless, to investigate the presence of a relationship between students’ socio-economic background and the high school effect, we have gathered the data regarding the average ESCS for all the high schools that are observed both in these two cohorts and in our database. The results show that the random effects that measure the high school effects are not correlated with high schools’ average ESCS.
Although this result is not conclusive, it supports the robustness of the main finding, indicating that the high schools’ effect on students’ choices is not related to students’ socio-economic background. Footnote 9