Introduction

Knowledge and knowledge mobility have enormous social impact. The location choice of scholars, who are in the upper tail of the human capital distribution, has a measurable effect on knowledge formation, productivity and innovation. Bahar and Rapoport (2018) show that the location choice of high-skilled migrants shapes the path of knowledge diffusion, which has relevant effects on productivity. Analyses of locations and birthplaces can also help disentangle economic competitiveness and complexity. When there is skill-complementarity, more diverse birthplaces result in more complex economic systems and higher economic growth (Alesina et al., 2016; Bahar et al., 2019; Docquier et al., 2020). The connection between human capital and modern economic growth is also confirmed by theoretical and empirical results. The essential role of education in adapting to changing contexts and in driving modernization was first claimed by Nelson and Phelps (1966). Lucas (1988) modeled the positive spillover effects of education, and more recently many empirical studies have presented clear evidence about the essential role of human capital and its institutions in improving current societies (Agasisti et al., 2019; Barra & Zotti, 2017, 2018; Barro, 1991, 2001; Beine et al., 2001; Cohen & Soto, 2007; Cottini et al., 2019; Hanushek & Woessmann, 2008).

Notwithstanding its fundamental role in socio-economic progress, investments in knowledge and educational institutions are not linear over time. There is a non-steady process of human capital accumulation, with periods of growth, decline, and recovery that may seed a new cycle of expansion (Artige et al., 2004). Italy is an interesting example of this fluctuating process: it dominated intellectual activities until the late Renaissance, but its contemporary cultural and educational system has weak incentives (Checchi et al., 2021). Italian universities outperformed learning institutions in other countries from the 14th century to the first part of the 16th century (de la Croix et al., 2022), but only three universities are counted among the 200 top institutions in the 2022 QS World University RankingsFootnote 1.

I study the mobility of modern-day Italian academics, and how they choose where to develop their careers, to learn more about the modern university system of the peninsula. I take several factors into account: distance increases the cost of travelling, but this cost might be offset by the skills and knowledge acquired by the scholar and by the prestige of the university. I estimate professors’ location choice as a function of distance and quality, given the location of universities. I map the current academic market with its professors’ human capital and compare this to previous eras of Italian academia. By comparing how scholars are moving nowadays to how they moved in the past, I locate Italy in the fluctuating cycle, estimate the path of Italian academia, and map out future directions.

For my research I have built a new dataset of contemporary Italian professors in the economic field, capturing information about their origins and their individual quality. To collect information about the birthplaces of living persons I had to secure privacy authorizations, which could have hindered data collection. To overcome this missing-data issue, I used a more accessible proxy: the location of professors’ lowest level of education. Once I had a value for the birth and/or education location, I used a Principal Component Analysis (PCA) to build a composite indicator of individual quality out of eight bibliometric indexes.

I use a Random Utility Model (RUM), specifically a multinomial logistic regression, to compare scholars’ utility of living in a region other than their birthplace (Bertoli & Moraga, 2013; Bertoli & Rapoport, 2015; Beine et al., 2016; Ortega & Peri, 2013). I limit the analysis to choices made within the academic world, because I cannot observe the choice academics face when they decide whether to become a professor or to follow other career paths. I use the approach developed by de la Croix et al. (2022), who study the European Academic Market from 1000 CE to 1800 CE, to compare past and present outcomes in academia. My main estimations rely on information about geographical distance, individual quality of current professors (human capital hereafter), and aggregate quality of Italian universities (notability hereafter). With these variables I test three hypotheses: (1) agglomeration, i.e., whether Italian scholars are attracted by universities with higher notability; (2) positive sorting, i.e., whether scholars with higher human capital weigh the notability of universities more heavily than professors with lower individual quality do, tested through the interaction term between human capital and notability; and (3) positive selection, i.e., whether scholars with higher human capital move further, tested through the interaction between human capital and distance. There is an extensive literature showing that better-educated individuals are the most mobile portion of the population, with their higher growth prospects giving them stronger incentives to move (Beine et al., 2021; Docquier & Marfouk, 2006; Faggian & McCann, 2009; Handler, 2018; Schiller & Cordes, 2016; Zhao et al., 2021). Grogger and Hanson (2011) showed precisely that highly educated people are more likely to move (positive selection) and that these highly specialised migrants choose destinations that reward knowledge better (positive sorting).

My hypotheses assume the presence of strong complementarity in knowledge and skills (Easterly, 2001), which leads to positive assortative matching: working with better scholars (i.e., with higher human capital) would increase the marginal gain of each professor, and this increase is greater for better scholars than for academics with lower human capital (Kremer, 1993). High complementarity also explains why more notable universities, populated with better scholars, attract more and more high-quality professors, which may initiate a virtuous cycle of human capital accumulation.

I estimate these effects for Italian academia, and find that the standard distance effect is negative and has a magnitude in line with the migration literature (Beine et al., 2011). To study agglomeration I include attractive features of the city in which the university is located (size and wealth), in addition to notability. The coefficient on notability is positive but not significant, signalling that public policy could raise the relevance of Italian universities’ quality and so increase its power to attract contemporary scholars. Agglomeration is instead driven by the disposable household income in the city (i.e., city wealth). However, the estimated coefficient for the size of the city is negative, implying a dispersion effect, although with a lower magnitude than agglomeration. This finding is crucial for understanding mobility patterns and policy directions: it is essential to attract high-skilled people to create a dynamic context and generate positive spillovers for society, which may lead the country into the virtuous part of the cycle (Grogger & Hanson, 2015; Kerr et al., 2016, 2017; Stephan & Levin, 2001).

I also find evidence of positive selection and positive sorting. Positive selection (interaction between distance and individual quality) is a solid result, which confirms that the higher the individual quality, the stronger the incentives for the scholar to travel to progressively better destinations. Positive sorting (interaction between human capital and notability) has a weaker significance level than selection. The weakness of sorting is due to the structure of Italian higher education, which may still be influenced by the traditional seniority-based system (Capano, 2008; MacLeod & Urquiola, 2021; Rebora & Turri, 2008). Reforms to increase the autonomy of universities, like the decentralization of the recruitment process of university professors,Footnote 2 may have led to the greater importance of local excellence for mobility decisions. However, such reforms are too recent for their effects to be fully captured in the current project, and so sorting only reaches a low level of significance. The seniority-based system may explain why Italian universities lost their leading position: there is evidence of a highly significant positive sorting only until 1526, which fades towards 1800. The sorting effect only regains power in the sample of present-day scholars, but the significance of current sorting is not as strong as at the birth of Italian universities. This is probably due to the very recent academic reforms. Either way, these results are key to understanding Italy’s current position in the cycle, where there seems to be new momentum for Italian universities.

In addition to the main regressions, I test for gender differences and find no significant outcomes. Men and women have similar patterns of mobility in Italy, although women represent only 30% of the sample. I do find important differences between public and private universities. A variant of the standard logit model shows a greater expected utility for scholars in choosing private universities over public ones. This bolsters the argument in favour of a more autonomous, excellence-driven academic apparatus.

My analysis contributes to the migration and knowledge-based mobility literature. To the best of my knowledge, much of this literature deals with more general samples of highly educated or high-skilled people (Beine et al., 2011; Docquier & Marfouk, 2006; Grogger & Hanson, 2011; Handler, 2018; Kerr et al., 2016, 2017). Only a few articles investigate the mobility of academics or scientists. Stephan and Levin (2001) find evidence of the extra vitality brought by foreign scientists (foreign-born and foreign-educated) to the U.S. in the fields of Science and Engineering (S&E). Grogger and Hanson (2015) study the mobility of foreign-born students in S&E after earning an American Ph.D. degree, claiming positive spillover effects for destination countries. The migration of German-affiliated researchers is addressed in Zhao et al. (2021), who find a net outflow of researchers from Germany. The current research keeps the focus on academia and aims to add to the knowledge-based mobility literature on the Italian university system. To the best of my knowledge, published papers on Italian scholars have studied the role of individual quality in selection processes (Checchi & Verzillo, 2014; Checchi et al., 2014) or its link with the competition and incentives generated within the Italian scientific sector (Checchi et al., 2021). There have also been some case studies connecting mobility and human capital of professors [see Abramo et al. (2022) for Italy, Ejermo et al. (2020) for Sweden, and Aksnes et al. (2013) for Norway]. These studies focus on the effects of professors’ mobility on their performance. This project investigates the same relationship, but the other way round: it considers human capital as a possible driver of researchers’ mobility. Within the Italian university system, it is mainly student mobility that has been analysed (Agasisti & Bianco, 2007; Bratti & Verzillo, 2019; Triventi & Trivellato, 2008), and no previous works have investigated the drivers of scholars’ location choice in Italy.

Data sampling

Institutional context

Italy is home to the oldest university in EuropeFootnote 3 and has a long tradition of literati and scholars such as Giovanni Boccaccio, Leonardo da Vinci and Galileo Galilei, who belong in the upper tail of the human capital distribution. The Italian academic system has interesting peculiarities, which are worth mentioning before the empirical analysis. Italy’s education system was centralized for a long time (Cottini et al., 2019), making it subject to the whims of the governing body. This increased the importance of hierarchy within academia, based on informal relationships between the most important chaired scholars and government ministers (Capano, 2008; Rebora & Turri, 2008). In the 20th century, this centralization of the system was intended to reduce the inequalities in the Italian education system (Cottini et al., 2019; Triventi & Trivellato, 2008) and there were some positive outcomes. Social mobility improved (Barone & Guetto, 2016) and performance among geographical areas converged (Baldissera & Cornali, 2020), but academia remains seniority-based (Rebora & Turri, 2008), not only in Italy but throughout Europe (MacLeod & Urquiola, 2021). In 1946, to improve the functionality of the system, the universities’ autonomy principle (art. 33 paragraph 6) was defined in the Italian Constitution. This precept aimed to underline local excellence (Checchi & Verzillo, 2014), giving each university the autonomy to hire eligible professors. However, this Constitutional principle entered into force only at the end of the 1990s, due to the lack of technical standards. The actual implementation of the reformsFootnote 4 fragmented the Italian academic market, and it retained some elements of the seniority-based apparatus (Bertola & Sestito, 2011; Bini & Chiandotto, 2003; Cottini et al., 2019; Rebora & Turri, 2008). Among the other modifications, it is important to note that Berlinguer’s decree (DPR n. 390/1998) shifted recruitment from a national to a local process.Footnote 5 Some could argue that the decentralization of the recruitment process may have had a negative impact [i.e., more opportunistic behaviour and nepotism (Perotti, 2008)] on the scientific productivity of selected professors. Nevertheless, Battistin et al. (2014) show no significant changes in the quality and meritocracy of the university system after the decentralization in 1998. In 2010,Footnote 6 the selection procedure was modified again and became a two-stage process. Nowadays, a scholar has to pass a national open competitive exam to be eligible, and then must win a local contest to be hired by a university (Checchi & Verzillo, 2014; Rossi, 2016).

In a system where seniority was the main driver of an academic career, quality and individual ability may be irrelevant. However, Checchi et al. (2021) found evidence of the opposite. They showed how the most productive scholars are those who responded best to an increase in the level of competition within the university sector, even in the presence of weak incentives.

The literature about mobility in Italian academia is thin and mostly focuses on student mobility (Agasisti & Bianco, 2007; Bratti & Verzillo, 2019); I have not found any literature on drivers of professors’ location choices. Insight into scholars’ mobility within the country, and a characterization of the forces that attract them to an institution, can inform public policy.

Professors and universities

This research is based on a new dataset. The data collection started with RePEc’sFootnote 7 ranking of the “Top 25% Institutions and Economists in Italy”. I decided to focus on scholars in the economic field to exploit the high quantity and quality of information provided by the RePEc website. It uses the EDIRC database (Economics Departments, Institutes, and Research Centers in the World), which includes universities, public agencies, central banks, independent research centres, and associations [for more details see sect. 2.3 in Zimmermann (2013)]. Each institution gains from every author’s affiliation RePEc collates, implying an advantage for more populous entities (sect. 6 in Zimmermann, 2013; Seiler & Wohlrabe, 2011).

For the present work, only the universities in Table 10 (Appendix) are taken into account: 16 public universities, one polytechnic (UNIVPM), and four privately funded universities (BOCCONI, CATT, FUB and LUISS). I consider as privately funded any university not fully funded by the State: BOCCONI and LUISS may be defined as fully private, while CATT and FUB are hybrid, receiving funds from both public and private institutions. I include all of them in the privately funded group because their funding system differs from that of fully public universities.

Each institution includes a list of members (registered in the RePEc Author Service) and I include these observations in the dataset.Footnote 8 The people registered on the server have different roles inside academia. In this study, I only include professors—full, associate, adjunct and assistant—and research fellows (also postdoctoral).Footnote 9 I include a few emeritus professors who are still teaching. Only scholars who are active in teaching are included in the sample: I call this a “teaching disclaimer” and it captures emeritus professors and academics taking part in visiting programs or national/international collaborations. Hence, a visiting professor is only included in the sample if she explicitly mentions her teaching activity at the host university. Scholars “on leave” were not considered part of the sample, given the absence of the teaching disclaimer. This rule excluded research centres like CEPR, IZA, CESifo, given the honorific nature of their appointments. Table 11 (Appendix) presents the precise taxonomy for the scholars included in the dataset, with quantities and percentages.

Once a scholar is identified, they are associated with their university. This process required a careful investigation for each academic. The Curriculum Vitae (CV) was the main source, but where it was out of date or incomplete I used LinkedInFootnote 10 and personal web pages (institutional and/or private). I used the most up-to-date affiliation at the moment of consultation.Footnote 11 Affiliations to telematic universities were not taken into account and research centres were excluded. For those universities with multiple locations, I counted the main location, assuming that the majority of the scholars teaching in one location are also teaching in the other(s). This can generate some bias when locations are far away from each other, as in the case of the Catholic University, which has four locations: Milan (main building), Brescia, Piacenza, and Rome. I discuss the robustness check for this in Appendix 7.

Some scholars are associated with more than one university, in Italy or abroad. Multiple affiliations comprise 7.06% of the sample, with a maximum of four affiliations. In the past, academics linked with multiple institutions were associated with high-quality scores (de la Croix et al., 2022), whereas nowadays it is more common to encounter multiple-affiliated scholars with low bibliometric indicators. Usually, these academics are younger and hold a postdoc position in one university while teaching in another institution. Empirically, each affiliation of the same scholar is treated as if it were chosen by a different individual, which over-represents multiple-affiliated scholars relative to unique-affiliated ones (see section “Robustness checks”). In the remainder of the paper, the former are called repeated movers (RM) and the latter single movers (SM).

Treating multiple affiliations in this way, the initial sample counts 1440 observations. A cleaning process removed from the sample scholars who are no longer members of the Italian academy, Ph.D. students, non-teaching emeritus and visiting professors, and those who are on leave.Footnote 12 The cleaning process reduced the sample to 1077 names. This procedure identified 76 universities, of which 39 are foreign and 37 are Italian. From this set, universities with fewer than 20 scholars have been excluded, given their minor relevance for academics’ choice and in order to obtain balanced choices. In Appendix 9, I also show the main results with a threshold of 5 scholars per university. The lower threshold implies less balanced choices but a more comprehensive dataset (Fig. 1). The resulting list is the set of choices each professor faces when making her location decision. With the threshold at 20 scholars, this choice set has 17 universities, all of which are Italian. The number of observations in the database decreased to 936, the percentage of multiple affiliations is now only 3.10%, and the maximum number of affiliations per scholar decreased to three. From here onwards, this is the subset for analysis.Footnote 13 Table 1 summarises the differences between the original dataset and the subset obtained after dropping universities with fewer than 20 scholars.

Table 1 Comparison between datasets
Fig. 1 Histogram of universities’ department size. The black line shows the threshold at 5 scholars and the red line shows the threshold at 20 scholars. The universities on the left are those excluded from the analysis at the respective threshold
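The thresholding step described in this subsection can be sketched as follows. This is only an illustration on made-up data: the university labels, the counts, and the function names `choice_set` and `restrict_sample` are hypothetical and are not taken from the dataset or from the author's code.

```python
from collections import Counter

def choice_set(affiliations, min_scholars=20):
    """Universities with at least `min_scholars` affiliated scholars."""
    counts = Counter(affiliations)
    return sorted(u for u, n in counts.items() if n >= min_scholars)

def restrict_sample(affiliations, min_scholars=20):
    """Keep only the observations whose university is in the choice set."""
    keep = set(choice_set(affiliations, min_scholars))
    return [u for u in affiliations if u in keep]

# Hypothetical toy data: one entry per scholar-affiliation observation.
toy = ["UNI_A"] * 25 + ["UNI_B"] * 19 + ["UNI_C"] * 20 + ["UNI_D"] * 5

main_set = choice_set(toy, min_scholars=20)   # stricter threshold
wide_set = choice_set(toy, min_scholars=5)    # more comprehensive threshold
main_sample = restrict_sample(toy, min_scholars=20)
```

In this toy case the stricter threshold keeps two universities and 45 observations, while the threshold at 5 keeps all four universities, which mirrors the trade-off between balanced choices and coverage discussed above.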

Data on locations

In my analysis, I study the distance a scholar is willing to travel to a given university to develop her career. I treat distance as an increasing cost for the individual. The further she is from her point of origin, the greater the distance and the higher the cost (Schwartz, 1976), also in terms of family attachment. I collect the birthplace for each observation, and treat this as an observable proxy of scholars’ usual life context. Using other locations, like residential locations, would involve some endogeneity issues, which are avoided by using birthplaces (Barbieri et al., 2011). Other variables are not observable—where academics’ families live is non-observable, as is the location of their partner’s employment.

CVs and personal webpages were the main sources for birthplaces, given that neither LinkedIn nor RePEc provides birthplace information.Footnote 14 Only about 30% of the sample indicated their place of birth somewhere in their public profile. Although information about living persons is abundant and often easy to access, bureaucratic and privacy authorizations, which are essential to protect personal information, slow the data collection process. Instead, I sent direct emails requesting this information, which increased the number of birthplaces collected by about 55%.Footnote 15 This gives me a known birthplace for 87.07% of the academics (815 observations).

I included in the dataset the location of the institution where each scholar obtained her lowest publicly stated degree of education. I consider this another proxy for birthplace, given that the two are likely to coincide or be reasonably close. To investigate this more closely, I study a sub-sample of 789 scholars in my database for whom I know both the birthplace and the place of education. I consider a scholar to be born and educated in the same cultural environment when the places of birth and education are no more than 60 kilometers apart.Footnote 16 Table 2 presents the percentage of scholars who studied, respectively, in the same place where they were born, within 30 kilometers of it, and within 60 kilometers of it. More than 60% of this sample was born and studied in the same socio-cultural context. This supports using the place of education as a proxy for the birthplace.Footnote 17 This measure increased the dataset to 904 observations, reaching coverage of 96.58%. For the majority of the sample, the lowest level of education is the bachelor’s degree, but some academics also mentioned their high school. For a few observations, the lowest education level available was the master’s degree, while for five scholars only information about the Ph.D. is known. Given their small number (5 out of 936 observations) and the location of their Ph.D.s (four received their doctorate from the university where they teach, or one close to it, and only one obtained the title abroad), an ad hoc robustness check was not necessary. For most observations I found educational information in CVs or LinkedIn profiles; where I could not find it online, I requested it by direct email.Footnote 18 However, education information is missing for 32 observations (3.42% of the sample), which are excluded. I run two separate regressions, one using birthplaces and one using the locations of the lowest level of education (see section “Main results”).
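As a quick consistency check, the coverage percentages quoted in this subsection can be recomputed from the reported counts. The counts come from the text; the variable and function names are illustrative only.

```python
TOTAL = 936            # observations in the subset used for analysis
WITH_BIRTHPLACE = 815  # observations with a known birthplace
WITH_LOCATION = 904    # observations with a birthplace and/or a place of education
MISSING = 32           # observations with no usable location

def share(part, total=TOTAL):
    """Percentage of the sample, rounded to two decimals as in the text."""
    return round(100 * part / total, 2)

birth_coverage = share(WITH_BIRTHPLACE)
location_coverage = share(WITH_LOCATION)
missing_share = share(MISSING)
```

The three shares round to 87.07, 96.58 and 3.42 respectively, and the covered and missing observations add up to the 936 observations in the subset.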

I match decimal coordinates to location data,Footnote 19 giving me a dataset with i observations associated with a geo-localized birth and/or education site, and k geo-localized universities.

Table 2 Descriptive statistics: scholars with both place of birth and education
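Once birthplaces, places of education and universities are geo-localized with decimal coordinates, distances between them can be computed. The sketch below uses the haversine (great-circle) formula, which is an assumption of this example: the text does not state which distance measure is used. The city coordinates are approximate and purely illustrative, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def same_context(birth, education, max_km=60.0):
    """True when birth and education sites are no more than `max_km` apart."""
    return haversine_km(*birth, *education) <= max_km

# Approximate (latitude, longitude) pairs, for illustration only.
MILAN = (45.4642, 9.1900)
ROME = (41.9028, 12.4964)
BRESCIA = (45.5416, 10.2118)
MONZA = (45.5845, 9.2744)

milan_rome = haversine_km(*MILAN, *ROME)
```

With these coordinates, a scholar born in Monza and educated in Milan would count as born and educated in the same context under the 60 km rule, while one born in Brescia and educated in Milan would not.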

Data on quality

For quality indicators I collect aggregate quality scores (notability) and individual bibliometric indexes (human capital). The former are at the university level, while the latter are associated with each scholar.

Notability indicators may suffer from endogeneity, because university scores are related to the quality and quantity of scholars. I address this by using past indicators. The average age for Italian academics is 48 years (Morana, 2020) and careers usually begin at around 30 years, so I look for quality indicators from 20 years earlier. The RePEc archives provide aggregate quality scores for top institutions, organised by country, going back as far as 2007. Prior to that, only simple/ordinal rankings are available, and there is no institutional score. I elected to consider scores from around 10 years ago, as of December 2010.Footnote 20 Scores for top universities were collected from country rankings. The scores in these rankings are weighted averages of the credit brought by each affiliated scholar: the highest portion (0.5) of affiliation is given to the scholar’s main university and the remainder is a weighted average of the other appointments [for the specific formula see sect. 6 in Zimmermann (2013)]. This can generate some biases, for example decreasing the relevance of the main affiliation as more associations are added, as pointed out by Seiler and Wohlrabe (2011).

It was possible to assign a quality score to all 17 universities in the sample. RePEc uses reversed indexes in which lower scores indicate higher quality; I convert them to have a direct relation between indexes and quality. The notability (\(\ln Q\)) linked to each university (\(k \in K\)) can be visualized in Figs. 2 and 3.

Fig. 2 Bubble plot on a map of Italy showing the notability index associated with each university: the higher the \(\ln Q\), the bigger the bubble and the lighter its colour. Each point represents the location of an institution \(k \in K\). Note: BICOCCA, BOCCONI and CATT are overwritten by UNIMI, which has the highest notability in Milan. UNIROMA1 and LUISS are overwritten by UNIROMA2, which has the highest notability in Rome

Fig. 3 Histogram of universities’ notability. Grey bars define public institutions, red bars private institutions. (Color figure online)

To compute human capital there are many individual bibliometric indicators to choose from. RePEc has the top authors per country ranking (i.e. “Top 25% Institutions and Economists in Italy”). These human capital scores are the harmonic mean of various rankings based on different factors [sect. 5 in Zimmermann (2013)] and more than 800 scholars are ranked. I use the December 2020 ranking (see below for missing data).Footnote 21

In the literature, academic quality is measured by indicators provided by Web of Science (WoS—with its three subject specific ISI citation databases; Yang & Meho 2006). The WoS social science indicator goes back to 1956.Footnote 22 For a long time it has been one of the few multidisciplinary databases to assign authors’ scores based on citations from an original set of sources (Jacso, 2005; Neuhaus & Daniel, 2008). The main issue with Web of Science measures is the relative coverage: only a fraction of sources are considered, although those that are considered (i.e., journal literature) are significant (Norris & Oppenheim, 2007). However, for economics and social science, this literature is not the main way that knowledge is disseminated (Neuhaus & Daniel, 2008).

Quality-evaluation possibilities are now augmented with the automated databases Scopus, from Elsevier, and Google Scholar. The former covers a wider range of sources than WoS: it starts with an Elsevier database and it goes back to 1996 for social scienceFootnote 23 (Norris & Oppenheim, 2007; Jacso, 2005; Yang & Meho, 2006). Google Scholar is a free Google database that uses a wide range of sources, but does not identify clearly what those sources are. This lowers its reliability and compounds its weak, imprecise performance, as pointed out by Neuhaus and Daniel (2008). However, because it is free and has some of the widest coverage among bibliographic indicators, Google Scholar still has value as a measure of quality (Neuhaus & Daniel, 2008).

I add to the comparison the WorldCat identities index. This database has measures for works (Worldcat Works) and library holdings (Worldcat Library) for each scholar (and organization) found in WorldCat.org and OCLC sources (OCLC Research, WorldCat identities).Footnote 24

Because no single indicator is perfect, I create a composite indicator of: RePEc score, Worldcat works and library holdings,Footnote 25 Google Scholar citations, H-index and i10-index,Footnote 26 WoS H-index,Footnote 27 and Scopus H-index.Footnote 28 To understand the information added by each indicator, I use a Principal Component Analysis (PCA) to reduce the number of variables without losing too much accuracy and information. Once the correlation between the variables is computed (Fig. 4) and the variables are standardized, the PCA compresses most of the information into the first principal components, which are new uncorrelated variables. For this research, I take only the first component into consideration, because its standard deviation is greater than one and the share of information it explains is sufficiently high (60.78% of the total; Table 3). Hence, by considering the first component, the analysis gains simplicity while losing only a small portion of its accuracy. The following equation shows the factor loadings. The constant is the minimum of the first component and normalizes it to avoid negative human capital indexes. I use this linear combination of weights to represent the new individual quality index:

$$\begin{aligned} q_{i} = {} & 4.47 - 0.30\ln(\text{RePEc score}) + 0.32\ln(\text{Worldcat works}) \\ & + 0.30\ln(\text{Worldcat library holdings}) + 0.37\ln(\text{Google Scholar citations}) \\ & + 0.39\ln(\text{Google Scholar H-index}) + 0.41\ln(\text{Google Scholar i10-index}) \\ & + 0.35\ln(\text{WoS H-index}) + 0.37\ln(\text{Scopus H-index}) \end{aligned}$$
Fig. 4 Correlation matrix plot showing the correlations between the eight bibliometric indicators included in the analysis

Table 3 Principal components table
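The construction of the composite index (standardize the log indicators, extract the first principal component of their correlation matrix, and shift the scores so that the minimum is zero) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the author's implementation: the data are a hypothetical toy sample with three indicators instead of eight, the leading eigenvector is obtained by power iteration rather than by a statistical package, and all function names are invented for the example.

```python
import math

def standardize(columns):
    """Return z-scored copies of each column (population standard deviation)."""
    out = []
    for col in columns:
        mean = sum(col) / len(col)
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / len(col))
        out.append([(x - mean) / sd for x in col])
    return out

def correlation_matrix(z):
    """Correlation matrix of already standardized columns."""
    n = len(z[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / n for zj in z] for zi in z]

def first_component(corr, iterations=500):
    """Leading eigenvector and eigenvalue of a symmetric matrix (power iteration)."""
    p = len(corr)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(corr[i][j] * v[j] for j in range(p))
                     for i in range(p))
    return v, eigenvalue

def composite_index(columns):
    """Non-negative first-component scores, loadings, and explained share."""
    z = standardize(columns)
    loadings, eigenvalue = first_component(correlation_matrix(z))
    scores = [sum(l * zc[i] for l, zc in zip(loadings, z))
              for i in range(len(z[0]))]
    shift = -min(scores)  # constant that moves the lowest score to zero
    return [s + shift for s in scores], loadings, eigenvalue / len(columns)

# Hypothetical log indicators for five scholars (one list per indicator).
toy = [
    [2.0, 3.5, 5.0, 6.5, 8.0],  # ln(citations)
    [0.7, 1.4, 1.9, 2.6, 3.1],  # ln(H-index)
    [1.0, 1.2, 2.5, 2.4, 3.6],  # ln(works)
]
index, loadings, share = composite_index(toy)
```

Here `share` is the proportion of total variance captured by the first component, the analogue of the 60.78% reported in Table 3, and the square root of the leading eigenvalue is the standard deviation that the text compares with one.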

Methodology

Main hypotheses

In section “Institutional context”, I described some interesting features of Italian universities. One of the main aims of this paper is to understand how these features have changed over time. To this end, I compare my results with those of de la Croix et al. (2022), who tested the following hypotheses for the period between 1000 CE and 1800 CE, while I study them for contemporary times. The role of the location of higher education institutions (Agasisti et al., 2019; Audretsch, 1998; Barra & Zotti, 2017; Cottini et al., 2019; Drucker & Goldstein, 2007) is considered to be exogenous.

The current project assumes the presence of strong complementarity in knowledge and skills (Easterly, 2001, chapter 8). This property leads to positive assortative matching, where better scholars work together with other high-quality academics. The returns from working with better-skilled colleagues are higher for better scholars than for their peers with a lower human capital index (Kremer, 1993). Complementarity in knowledge and positive assortative matching may turn into virtuous cycles where notable universities, with better scholars, attract more and more high-quality human capital (Easterly, 2001; Kremer, 1993).

Hypothesis 1

Agglomeration: scholars are attracted by universities with higher notability.

I expect to find agglomeration (Grogger & Hanson, 2015; Kerr et al., 2016, 2017) although the distance covered by academics could appear shorter than in the past, with a lower magnitude of the coefficient. In Italy, the local appointment of professors may have increased the probability of finding local excellence (Checchi & Verzillo, 2014)—and the importance of networks and nepotism (Durante et al., 2011). With this hypothesis I test for agglomeration forces, such as notability of the university, and the attractiveness of the city in which the institution is located, measured by the size of the population (istat.it) and the local disposable income of private households (finanze.gov.it).

Hypothesis 2

Positive sorting: scholars with higher human capital weigh the notability of universities higher than scholars with lower human capital do.

I hypothesise that better scholars have better career prospects, and their expected gains are higher in high-quality environments (Docquier & Marfouk, 2006; Grogger & Hanson, 2015). Thus better professors would assign higher weight to the notability of the university.

Hypothesis 3

Positive selection: scholars with higher human capital move over greater distances than scholars with lower human capital.

The literature shows that better-educated people are more mobile (Beine et al., 2011, 2021; Grogger & Hanson, 2011; Schiller & Cordes, 2016), hence my hypothesis that better professors travel further.

The model

I use a Random Utility Model (RUM), a gravity model widely used in migration analysis (Beine et al., 2016; Bertoli & Moraga, 2013; Bertoli & Rapoport, 2015; Grogger & Hanson, 2011; Ortega & Peri, 2013). It determines the individual utility of living in a certain region and compares it to the expected utility from moving to alternative locations (Ramos, 2016).

I implement a standard multinomial logit model (Akcigit et al., 2016; Ortega & Peri, 2013), which is a specification of the RUM and requires perfect elasticity of demand in the academic market, i.e., that there is a position available for every scholar. In Italian academia, there is a two-step hiring procedure: scholars are filtered at the national level and then at the local level. The assumption of perfectly elastic demand implies that each professor who succeeds at the national level will succeed in finding a chair that she prefers at the local level. This is a reasonable assumption, because the reforms of the university system (in 1998 and in 2010) simplified bureaucratic processes and increased the opening of vacancies (Checchi & Verzillo, 2014; Rossi, 2016). However, in practice only professors with higher individual quality can freely choose the location of their career. To account for this, I include the individual human capital score in the analysis. Keeping the perspective of partial equilibrium analyses, I introduce competition variables as demand-side factors, i.e., universities’ notability, desirability of the city and individual human capital.

A multinomial logit model allows us to compute the probability that a university k, belonging to the set of choices K, maximises scholar i’s utility, with error terms independent and identically distributed (McFadden, 1974). Technical details are given in Appendix 2.
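For orientation, the choice probability implied by the multinomial logit can be written as follows. This is the standard McFadden (1974) form, which holds when the i.i.d. error terms follow a type-I extreme value distribution. The symbols \(U_{ik}\), \(V_{ik}\) and \(\varepsilon_{ik}\) are notation introduced here for illustration only; the exact specification of the deterministic part is the one given in Appendix 2.

$$U_{ik} = V_{ik} + \varepsilon_{ik}, \qquad \Pr (i \text{ chooses } k) = \frac{\exp (V_{ik})}{\sum\nolimits_{j \in K} \exp (V_{ij})}$$

Only differences in \(V_{ik}\) across universities matter in this expression, so any term that is constant across alternatives for a given scholar cancels out of the probability.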

Main results

In this section, I use the multinomial logit model described above to estimate the main regression of the research. First, I consider scholars for whom the place of birth is known (815 observations, 87.07% of the sample). Second, as a robustness check, I use the site of their lowest level of education (904 observations, 96.58% of the sample). I link each site with its geographic coordinates and each academic with a unique individual quality index, computed with a PCA (see section “Data on quality”). I discarded universities with fewer than 20 professors from the database, assuming that they have minor relevance in the total set of choices.Footnote 29 The university set counts 17 geo-localized institutions linked to their RePEc quality score (see section “Data on quality”). Because I work in logarithms, the estimation does not allow for zero indexes at the aggregate or individual level. If a scholar does not have a positive score, I fill this gap with the lowest human capital index of the sample (794.82 for RePEc, 1 for all the other indicators). It is reasonable to assume that such a scholar does not publish as much as her peers with a positive score. However, it is possible that the sources used to compute bibliometric indicators do not accurately reflect her work, which is a known flaw in quality evaluations. I apply the same reasoning for universities with indexes at zero and link them with the lowest positive score of notability. Finally, I also take the logarithm of the measure of distance, which raises the issue of zero distances for scholars born in the same city where they teach. These academics bear the minimum cost of distance, which I assume to be the same as in de la Croix et al. (2022): 3.5 km, the walking distance from the Vatican City to the Colosseum, in the old city of Rome.
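The treatment of zeros described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `fill_nonpositive` and `log_distance` and the input values are hypothetical, and the fill value has to be chosen per indicator as described in the text.

```python
import numpy as np

MIN_DISTANCE_KM = 3.5  # value assigned to zero distances, as stated in the text


def fill_nonpositive(values, fill_value):
    """Replace missing or non-positive scores by fill_value so that logs are defined."""
    values = np.asarray(values, dtype=float)
    return np.where(values > 0, values, fill_value)


def log_distance(distance_km):
    """Log distance, with zero distances set to MIN_DISTANCE_KM."""
    d = np.asarray(distance_km, dtype=float)
    return np.log(np.where(d == 0, MIN_DISTANCE_KM, d))


# Hypothetical inputs: one scholar without citations, one teaching in her birth city.
citations = fill_nonpositive([120, 0, 35], fill_value=1.0)
ln_d = log_distance([0.0, 250.0, 480.0])
```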

In the following part of the section, I describe the results of the main regression which considers scholars’ locations of birth. I use the package called “mlogit”, written by Croissant (2020). I focus the evaluation on the sign and on the significance of the coefficients of distance, agglomeration, selection, and sorting effect. I control for unobserved characteristics of universities with fixed effects in each regression, except for when I introduce agglomeration effects. In this case, I include the variables which capture the observed characteristics of the city where the university is located (\(P_k\) and \(Y_k\)—see section “The model”) and represent the reputation of the institution (\(Q_k\)). Table 4 presents some descriptive statistics.

Table 4 Descriptive statistics

Table 5 shows the results of the multinomial logit estimations with known birthplaces. The dataset counts 815 scholars (87.07% of the sample) who choose among 17 universities, resulting in 13855 possible dyadic matches.

The first column contains the basic gravity equation and highlights the negative sign of the distance coefficient, \(\ln d\). This means that the greater the distance between the birthplace and the location of the university, the higher the costs and the lower the probability of finding a dyadic match. Distance coefficients remain highly significant in every specification. The magnitude is consistent with the contemporary migration literature; for example, in “Diasporas”, Beine et al. (2011) also find distance coefficients of around 0.7 in magnitude when migrants are not divided into low- and high-skilled categories. However, this coefficient is smaller in magnitude than in analyses of past periods (de la Croix et al., 2022).

I add a selection effect in the second column, defined by the interaction term between human capital and distance, \(\ln q \ln d\). As expected the sign is positive, which means that scholars with higher human capital are less affected by distance than scholars with lower human capital. The high significance of the coefficient (at 1%) confirms the third hypothesis of positive selection in every specification of the model.
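To see why a positive interaction coefficient means that distance matters less for better scholars, consider the part of utility that depends on distance. The coefficient symbols \(\beta_{d}\) and \(\beta_{qd}\) are labels introduced here for illustration; they are not necessarily the notation of Appendix 2.

$$\beta_{d}\ln d_{ik} + \beta_{qd}\ln q_{i}\ln d_{ik} = \left( \beta_{d} + \beta_{qd}\ln q_{i} \right)\ln d_{ik}$$

With \(\beta_{d} < 0\) and \(\beta_{qd} > 0\), the total effect of log distance, \(\beta_{d} + \beta_{qd}\ln q_{i}\), becomes less negative as \(\ln q_{i}\) increases.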

Column (3) shows the effect of sorting, through the interaction between individual human capital and university notability (\(\ln q \ln Q\)). The positive sign of the coefficient is evidence for positive sorting, as expected from the second hypothesis. However, the significance of sorting is weaker than that of selection. The sorting effect is non-significant when considered alone in column (3), but it becomes weakly significant (at 10%) in column (4) when I include selection. Sorting maintains the level of significance at 10% in the complete model [column (6)]. Finally, I compare log-likelihood (LL) values in order to compute a likelihood-ratio (LR) test: comparing column (4) with column (1), the null hypothesis of no selection and no sorting is rejected at any conventional significance level (p value < 0.001).
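The likelihood-ratio comparison can be sketched as below. The sketch assumes that the test involves exactly two restrictions (the selection and the sorting coefficients set to zero), which is how the comparison between columns (4) and (1) reads; the function name and the log-likelihood values are hypothetical and are not the figures reported in Table 5.

```python
import math


def lr_test_two_restrictions(ll_restricted, ll_unrestricted):
    """Likelihood-ratio statistic and p value for a test with 2 restrictions.

    With 2 degrees of freedom the chi-square survival function has the
    closed form exp(-x / 2), so no statistics library is needed.
    """
    # The statistic is bounded below by zero for nested models.
    lr = max(0.0, 2.0 * (ll_unrestricted - ll_restricted))
    p_value = math.exp(-lr / 2.0)
    return lr, p_value


# Hypothetical log-likelihoods of the restricted and unrestricted models.
lr, p = lr_test_two_restrictions(ll_restricted=-1500.0, ll_unrestricted=-1490.0)
```

For a different number of restrictions the closed form no longer applies and a chi-square routine from a statistics library would be needed.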

To investigate agglomeration, I exclude university fixed effects from the regression [columns (5) and (6)]; otherwise the effect of agglomeration variables cannot be identified (see section “The model”). Without fixed effects, I can study the relevance of the attractiveness of cities where universities are located. All three included variables are highly significant in column (5). The coefficient of the logarithm of population (\(\ln P_k\)) is negative, which points to dispersion: the probability that a scholar chooses university k decreases as the city size increases. The coefficient of the logarithm of disposable income (\(\ln Y_k\)) is positive, which implies that the variable has a strong attractive force: the richer the city, the greater the likelihood that a professor develops her career at that institution. The coefficient of the logarithm of university notability (\(\ln Q\)) is also significant at 1% and positive, which means that the better the university’s reputation, the higher the probability that a scholar moves there. However, when I consider all the coefficients together [column (6)], notability loses its significance, while the other variables retain their signs, significance levels, and magnitudes. The second and third hypotheses are therefore confirmed. The first hypothesis about agglomeration also holds: although \(\ln P_k\) indicates a tendency for dispersion (given its negative sign), it is more than compensated by the attractive force of city wealth (\(\ln Y_k\)). Nevertheless, these results show that agglomeration forces are driven by the income of the city and not by the university’s reputation (\(\ln Q\)). This result reveals room for public policies to improve the relevance of Italian universities’ quality in attracting human capital.

Table 5 Multinomial logit regressions: standard logit model—birthplaces analysis, threshold at 20

Robustness checks

In this section, I substitute the data on scholars’ birthplaces with data on their lowest level of education, considered a proxy, which covers 96.58% of the sample. Table 12 (Appendix) presents the results of multinomial logit estimations when I study this proxy. Now the dataset counts 15368 dyadic matches, which associate 904 observations with 17 universities.

Only the distance and agglomeration coefficients remain significant. The sign of the former is still negative and each specification confirms the magnitude of about 0.7, although it slightly decreases compared to the birthplaces analysis. From the models without fixed effects [columns (5) and (6)], agglomeration variables (\(\ln P_k\), \(\ln Y_k\), \(\ln Q\)) confirm again the first hypothesis, with the same signs as in the birthplaces analysis.

The coefficients of the selection effect (\(\ln q\ln d\)) are still positive, but no longer significant. I find similar evidence for sorting (\(\ln q\ln Q\)), which has positive but non-significant coefficients. These results indicate that the second and the third hypotheses are confirmed only when I take into account the actual location of birth; indeed the LR test between (4) and (1) now fails to reject the null hypothesis of no effects (p value = 0.144). On the other hand, for the standard effect of distance and also for the agglomeration effect, the results remain in line with the birthplace investigation. In this case the analysis focuses on features of the universities (reputation/quality) and cities (population size and income level), aspects that do not vary compared to the previous analysis. The change of dataset affects distance and the individual level of quality, which appear in selection and sorting effects.

Given the results of both regressions, I consider the birthplaces analysis more relevant for the project. I use this as the benchmark model in the following part of the paper, where I develop further analyses.

Because of space constraints, I develop three additional investigations in the Appendix. First, in Appendix 3, I correct the human capital index by scholars’ age: younger professors with bibliometric indicators similar to those of senior ones should receive more credit in the computation of their human capital index. Once I introduce age into the analysis, Table 16 confirms almost all the benchmark results, but the sorting effect is just below the threshold of significance. These findings indicate that the human capital index employed in the main regression was already able to capture age specificities of Italian scholars. Second, I check the overestimation of repeat movers with different strategies in section D. Finally, in section E, I test for gender differences in the effects found in the benchmark model, but no significant discrepancies between male and female professors are found, although women are about one-third of my sample (30.24%).

Private/public universities

As mentioned in the description of the sample (section “Professors and universities”), four of the universities originally considered are private: Bocconi University, Catholic University, Free University of Bozen and LUISS University. Private universities have more hiring autonomy and discretion around remuneration (Trivellato et al., 2016; Agasisti & Ricca, 2016), making them more attractive to better scholars. To understand how private institutions influence the benchmark estimation, I run additional regressions in the Appendix (one estimation excludes all of them, the others exclude Bocconi and Catholic University one at a time; see Appendix 7).

In this section, I develop a nested logit model to investigate further. I divide the set of universities by status s: private and public. The nested logit still rules out correlation of error terms between the two groups (private and public), but it allows error terms to be correlated within a nest (McFadden, 1978; Train, 2003). With this method, it is possible to test whether one type of university implies systematically higher utility. Technical details and general results are found in Appendix 6; here I focus on possible differences between private and public institutions.

As the nested logit model is consistent with random utility maximization (details in Appendix), I can define the expected gain each scholar obtains from choosing either a private or a public university. Given the lack of nest-specific variables, this utility is given only by the product \(\lambda _s I_{is}\) (explained in Appendix 6), which varies for every scholar. Among the 815 professors considered, 127 have greater expected utility (EU) from teaching in public universities than in private ones, while 688 realize higher expected gains by affiliating with private institutions. Comparing these two groups (Table 6), the mean individual quality of those who prefer public universities is lower (4.02) than that of those who prefer private institutions (4.76). The sorting effect is evident: better professors prefer more favourable environments. Private institutions, with more available resources, create better contexts to attract more relevant human capital.
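The comparison of expected utilities across nests can be sketched as follows. The sketch assumes the textbook definition of the inclusive value (Train, 2003), \(I_{is} = \ln \sum_{k \in s} \exp (V_{ik}/\lambda_s)\), which may differ in detail from the formulation in Appendix 6; the function name, utilities and dissimilarity parameters are hypothetical values chosen for illustration.

```python
import numpy as np


def nest_expected_utility(v, in_nest, lam):
    """Return lam * I, where I = log(sum over nest members of exp(v / lam)).

    v: deterministic utilities of one scholar for all universities.
    in_nest: boolean mask selecting the universities in the nest.
    lam: dissimilarity parameter of the nest (lambda_s in the text).
    """
    scaled = np.asarray(v, dtype=float)[np.asarray(in_nest, dtype=bool)] / lam
    m = scaled.max()  # subtract the maximum for numerical stability
    inclusive_value = m + np.log(np.exp(scaled - m).sum())
    return lam * inclusive_value


# Hypothetical utilities of one scholar for five universities; the first two are private.
v = np.array([1.2, 0.8, 0.5, 0.1, -0.3])
private = np.array([True, True, False, False, False])
eu_private = nest_expected_utility(v, private, lam=0.7)
eu_public = nest_expected_utility(v, ~private, lam=0.9)
prefers_private = eu_private > eu_public
```

Repeating this comparison for every scholar yields the two groups whose mean quality is then compared.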

Table 6 Descriptive statistics: groups of scholars preferring public or private universities

Comparison between the present and the past

It is interesting to compare features of the contemporaneous academic market in Italy with those of the past. I run the same logistic regression as before but I use a sample of professors who worked in Italy from 1000 to 1800,Footnote 30 the whole period considered by de la Croix et al. (2022). Agglomeration variables are not fully comparable; I cannot test the first hypothesis with the updated dataset of de la Croix et al. (2022) because the authors consider the level of city democracy instead of the average disposable income of the households (\(\ln Y_k\)).

In Table 7, column (1) summarises the other findings using present professors (i.e., professors who currently work in Italy), while column (2) involves past scholars (i.e., professors who worked in Italy between 1000 and 1800).

In both cases, I confirm standard results for distance of gravity models: the greater the distance, the lower the probability a scholar chooses to travel that route. From Table 7, the difference in magnitude between these coefficients is evident but both are still in line with the literature, which provides greater magnitude for past periods than for current times. Furthermore, the distance in column (1) is the Euclidean distance, while in de la Croix et al. (2022) it is the cost distance. However, the Euclidean distance increases linearly with the cost distance, which limits the relevance of this computational difference. The magnitude of selection effects halves in current times with respect to the past, due to changes in individual quality measures—the human quality indexes are both the result of a PCA but they consider different bibliometric indicators.Footnote 31 In the Appendix, Table 14 compares the total effect of distance now and in the past for different levels of human capital: the effect is almost the same for top scholars in both columns, which confirms the comparability of the results. There is another important difference when I consider sorting. To compute the notability of the university, shown in the second column, de la Croix et al. (2022) aggregate the 5 highest human capital indexes associated with scholars active in that institution during the preceding 25 years (for technical details see de la Croix and Stelter 2021). In the first column, I link the notability index to the RePEc score of each university as of 10 years ago (see section “Data on quality”). Finally, the significance of sorting coefficients in Table 7 is weaker for current times than for the past, when the same effect had a high relevance.

Table 7 Multinomial logit regressions: standard logit model, birthplaces analysis—comparison of results from the present and from the past without agglomeration variables

The time horizon shown in the second column of Table 7 is too broad to freely compare it with the shorter time-span of column (1). Instead I exploit the division in periods developed by de la Croix et al. (2022) to seize more directly possible changes and fluctuations of the cycle that occurred in past centuries. In de la Croix et al. (2022) there are eight time segments with a different number of observations available for each period. I follow this two-by-two partition and I group together: the 2nd and 3rd period (1348–1449/1450–1526), the 4th and 5th (1527–1617/1618–1685), and the 6th and 7th (1686–1733/1734–1800). I exclude the first two periods from 1000 to 1347, because there are too few observations, which leads to a negligible empirical relevance and less comparable results with respect to the other segments.

Table 8 shows the results. The distance coefficient is negative and highly significant in every specification. Its magnitude reflects the corresponding literature and historical period: it is higher in columns (2) and (3) than in column (1). This shift corresponds to the rise of national states and the increase in barriers and customs duties, leading to higher transportation costs. These burdens have decreased only in recent times with technological improvements and transport innovations. The selection effect is always present, with a positive sign and high relevance. Its magnitude drastically decreases in column (3) and lowers even more when the model involves current scholars.Footnote 32 Sorting appears to be the most fluctuating effect across the time horizon, as expected. It is positive and highly significant in the first time range, when most of the major universities are already established (i.e., Bologna, Rome [Sapienza], Florence [Studium generale]) and are among the European top five institutions (de la Croix et al., 2022). These features allow me to position the 1348–1526 Italian academic market in the upward part of the aforementioned fluctuating cycle. However, sorting totally disappears in the second period I consider. Its sign is negative in both columns (2) and (3), but it is not significant in either column. These results might be due to the characteristics of the Italian academic world: its decline starts after the sixteenth century, a time of strict censorship of revolutionary concepts by the Catholic Church (Blasutto & de la Croix, 2021). Notable scholars were strongly attracted to the high quality of the first universities, but the sorting effect was diluted with the flow of time and with other universities entering the academic market. This decline in the sorting effect locates the 1527–1800 Italian university system in the downward portion of the cycle. Sorting regains its positive sign only when the model considers current scholars. In column (4), positive sorting is slightly significant, which may signal a new momentum for current Italian universities. With the local recruitment of professors and the greater autonomy of each university, quality should gain attention and importance. However, the current analysis cannot detect these reforms with confidence; they are too recent and the influence of the previous seniority-based apparatus persists. This explains the weak sorting effect in the sample of contemporaneous Italian scholars. The same structural explanation applies to the low significance level of sorting in the past [columns (2) and (3)]: the strong control of the powers in charge (e.g., Catholic Church) limited the relevance of university quality while favouring more denominational sorting, which relies on membership and networks rather than on meritocracy (MacLeod & Urquiola, 2021).

Table 8 Multinomial logit regressions: standard logit model, birthplaces analysis—comparison of results from the present and from the past without agglomeration variables

To further emphasize the relevance of positive sorting in the functioning of the academic market, I estimate scholars’ choice probabilities and compute simulated outcomes with and without the sorting effect. Appendix 8 shows the estimated probabilities for three selected scholars, with different levels of human capital, but born in the same city. Constraining the sorting effect to zero drastically reduces the predicted probability of teaching at the best universities. The effect is stronger for better scholars than for professors with lower human capital indexes: for the best scholar (A) the probability of choosing the best university (UNIROMA2) halves without the sorting effect. This variation in the choice probabilities is visible for the first six institutions and lowers as I move down in the institutions’ ranking. These findings (Table 25, Appendix) demonstrate the importance of positive sorting in fostering high-quality university contexts: the effect is much larger when better professors match with better universities.
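The counterfactual exercise described above can be sketched in Python as follows. This is a simplified illustration, not the simulation reported in Appendix 8: the utility contains only distance, selection and sorting terms (no fixed effects or city variables), and the function name, coefficients and data are hypothetical rather than the estimates of Table 5.

```python
import numpy as np


def choice_probabilities(ln_d, ln_q, ln_Q, b_d, b_qd, b_qQ):
    """Multinomial logit probabilities of one scholar over all universities.

    ln_d: log distances from the scholar's birthplace to each university.
    ln_q: the scholar's log human capital index (scalar).
    ln_Q: log notability of each university.
    """
    v = b_d * ln_d + b_qd * ln_q * ln_d + b_qQ * ln_q * ln_Q
    expv = np.exp(v - v.max())  # subtracting the maximum leaves probabilities unchanged
    return expv / expv.sum()


# Hypothetical coefficients and data, NOT the estimates in Table 5.
b_d, b_qd, b_qQ = -0.9, 0.1, 0.3
ln_d = np.log(np.array([3.5, 200.0, 450.0]))
ln_Q = np.array([0.5, 1.0, 2.0])  # the third university is the most notable
ln_q = 1.5                        # a scholar with a high index

with_sorting = choice_probabilities(ln_d, ln_q, ln_Q, b_d, b_qd, b_qQ)
without_sorting = choice_probabilities(ln_d, ln_q, ln_Q, b_d, b_qd, 0.0)
```

With a positive sorting coefficient and a positive `ln_q`, setting the coefficient to zero lowers the probability assigned to the most notable university, and the change grows with `ln_q`.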

To clarify the importance of positive sorting in enhancing quality in academia, I also estimate the total academic output with and without the sorting effect. Appendix 8 presents the production function I use. I assume the elasticity of substitution between professors’ skills, \(\rho\), to be finite. This assumption is crucial because it implies complementarity between professors. As \(\rho\) falls, the gains from matching better scholars in the best institutions rise, improving the total output. Table 9 presents the results for two levels of \(\rho\): the left part assumes low complementarity (\(\rho = 3\)), while the right part assumes higher complementarity (\(\rho = 2.6\)). I estimate the total output by using both the benchmark [Table 5 column (6)] and the nested models [Table 20 column (6), in the Appendix].
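The role of \(\rho\) can be illustrated with a small Python sketch. The production function actually used is the one in Appendix 8 and is not reproduced here; the sketch assumes, purely for illustration, a standard CES aggregate of the quality indexes of the scholars matched to one university. The function name `ces_output` and the quality values are hypothetical.

```python
import numpy as np


def ces_output(q, rho):
    """CES aggregate of scholars' quality with elasticity of substitution rho (rho != 1).

    This functional form is an assumption made for illustration only.
    """
    q = np.asarray(q, dtype=float)
    s = (rho - 1.0) / rho
    return (q ** s).sum() ** (1.0 / s)


# Hypothetical quality indexes of the scholars matched to one university.
q = np.array([4.8, 4.1, 3.6, 2.9])
low_complementarity = ces_output(q, rho=3.0)
high_complementarity = ces_output(q, rho=2.6)
```

Under this assumed form, and for a fixed group of scholars, the aggregate is larger at \(\rho = 2.6\) than at \(\rho = 3\), which is the direction of the effect described above. The sketch does not reproduce the magnitudes in Table 9.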

Table 9 Academic market output—role of sorting

When I compare the academic output with and without sorting in Table 9, the effect is already clear: when I do not constrain sorting to be zero, the gains are always higher, and this variation is greater with the nested logit model. To fully capture the importance of positive sorting, I compute academic output with different levels of complementarity between scholars. When complementarity is higher, there is a significant increase in gains, with output that nearly doubles when the elasticity of substitution decreases by 0.4. This further underlines the relevance of positive sorting in creating a high-quality academic market. This also supports the background concept of this research: strong complementarity between scholars and positive assortative matching may lead to virtuous cycles by increasing the attraction of relevant human capital towards the best environments (Easterly, 2001; Kremer, 1993).

Conclusions

Using a new sample of contemporaneous scholars, this research confirms and discloses important features of the Italian academic market. Gravity highlights a recurrent effect widely explained in migration literature. Agglomeration forces of Italian universities are driven by the average disposable income of the city where the institution is located and not by universities’ notability. This shows room for policy improvements: the quality of institutions is a strong factor for attracting relevant human capital and must be better exploited. The selection effect is also remarkably strong in the benchmark model, which implies that contemporaneous professors travel longer distances when they have greater human capital indexes. Sorting is weaker in this specification, but still significant and positive, which means that notability is more valuable for scholars with a higher individual quality index. Although it is less clear than the others, this last effect might direct the position of Italy in the human capital accumulation cycle. The difference in current and past sorting represents an important initial step for Italian academia: implementing reforms may enhance Italian universities’ notability. Policies to improve the quality of higher-education institutions would stimulate excellence and, in turn, would increase the attractiveness of Italian universities. This would trigger a virtuous circle for the whole economy: improving the sorting effect will feed the system with more resources, attract more remarkable scholars and increase the likelihood of innovations and economic enhancements, as demonstrated by the predicted academic market output. The United States, which has the top universities and research centers, has reaped the benefits of these positive spillovers. Since the early 1900s, sorting has been much stronger in America than it has been in Europe, where centralized systems favored equal growth of higher-education institutions while simultaneously preventing the most promising ones from completely exploiting their potential (MacLeod & Urquiola, 2021). The recent reforms in the Italian system might be seen as a watershed moment: the positive achievements reached in terms of equality under a centralized system (Baldissera & Cornali, 2020; Barone & Guetto, 2016) can be bolstered by growing investments in excellence.

Future research can relax the assumption that demand in academia is totally elastic. This would necessitate the use of alternative gravity models, which do not impose the same stringent constraints as the multinomial logit. Gravity models that allow the consideration of both sides of the market can achieve a more complex general equilibrium analysis. Finally, the notability measure can be improved when using the conventional multinomial logit model. This could be accomplished by creating an index similar to de la Croix et al. (2022) to mitigate (if not eliminate) endogeneity issues with RePEc indicators.