The Role of Early-Career University Prestige Stratification on the Future Academic Performance of Scholars

This paper investigates the effect of university prestige stratification on scholars’ career achievements. We focus on 766 STEM PhD graduates hired by Mexican universities between 1992 and 2016. We rank universities according to their prestige based on the pairwise assessment of quality contained in the PhD hiring networks. Further, we use a quasi-experimental design matching pairs of individuals with the same characteristics, PhD training or first job experience. Our results challenge the positive association between prestige and academic performance as predicted by the ‘Matthew effect’. Scholars hired internally sustain higher performance over their careers in comparison to those who move up or down the prestige hierarchy. Further, we find a positive (negative) relation between downward (upward) prestige mobility and performance that relates to the “big-fish-little-pond” effect (BFLPE). The evidence of a BFLPE-like effect has policy implications because it hinders knowledge flows throughout the science system and individual achievements.


Financial support was provided through The National Council on Science and Technology (CONACYT) of Mexico. We gratefully acknowledge the comments and suggestions of Prof. Paula Stephan, Prof. Robin Cowan, Dr. Neil Foster-McGregor, and Beatriz Calzada Olvera.

Introduction
For universities, training and hiring PhD graduates is essential for their competitiveness, as young scholars will become lecturers, senior researchers and professors in their faculties. However, for early-career scholars, the transition from PhD to first job is competitive and stressful, and can have consequences for their future careers (Bazeley, 2003). At the same time, the academic labor market is well known for exhibiting a Matthew effect, where prestigious positions in hierarchical networks give advantages to early-career scholars across their careers (Bol et al., 2018; Teplitskiy et al., 2020; Horta et al., 2018).
The Matthew effect predicts that 'early career' gains in prestige confer on researchers advantages that over time yield higher research performance. The mechanism behind this success is a process of cumulative advantage in which early-career prestige attracts resources (Merton, 1968; Long, 1978; Merton, 1988), increases visibility (de Solla Price, 1965; Wang, 2014; Farys and Wolbring, 2021) and attracts collaborators (Perc, 2014). This positive feedback loop reinforces the prestige of the authors (as well as universities) and leads to higher research productivity (Allison and Stewart, 1974). This thesis has been tested extensively in sociology, economics, and studies of science and technology, with mixed results. One group of studies asserts that university prestige is positively associated with individual performance (Fox, 1983; Headworth and Freese, 2016; Su, 2011). For instance, Bedeian and Feild (1980) conclude that the institutional prestige of the PhD is more important than publication productivity prior to the first job appointment in the academic labor market. However, when the endogeneity of university prestige is accounted for, the literature shows mixed results (Allison and Long, 1987; Williamson and Cable, 2003; Bair, 2003; Miller et al., 2005; Laurance et al., 2013; Appelt et al., 2015), leaving several gaps that the present study aims to address.
The first gap that we address is the potential selection bias in the estimation of prestige and research performance. Most studies disregard that researchers are sorted into prestigious appointments by self-selection. To address this problem, we take advantage of a quasi-experimental design. Our treatment is the change in prestige that naturally occurs to scholars in the transition between PhD graduation and first job appointment. This change in prestige is a partially random assignment, given the exogenous choice of hiring committees. This shock allocates scholars into three groups depending on their change in prestige (treatment condition). One group exhibits a positive change, or upward prestige mobility, when moving from a less prestigious PhD university to a more prestigious first job appointment. Conversely, a second group displays a negative change in prestige (downward mobility), and the last group experiences no change in prestige because its members are hired by their own faculty. We address the selection bias by matching early-career researchers in these three groups, keeping constant their individual characteristics, PhD training or first job experience. Fixing these determinants of research performance allows us to observe the effect of early prestige over the short, medium and long run of a career.
A second gap in the literature is that results are typically biased towards top-tier institutions located mostly in North America and Western Europe (Clauset et al., 2015). Those universities usually have a long history and are well known and integrated internationally (Demeter and Toth, 2020). In contrast, less mature higher education systems are younger, less consolidated and operate under resource constraints. Therefore, this paper examines the subject in a large emerging economy, namely Mexico.
More importantly, the prestige of academic institutions in those contexts is less known and can be more volatile compared to more mature university systems. In developing university systems, prestige may change according to fashions, tastes and new information. For example, reputation can decrease if fraud or misconduct is uncovered within a department. In the same way, if a department wins a large national grant, its reputation will improve. Similarly, the academic mobility of star scholars may substantially change the prestige and output of an entire department. Indeed, most of the US-based literature on prestige and hiring has found that, at the departmental level, the correlation between hiring centrality and survey-based measures of prestige is higher than the correlation between bibliometric production and survey-based measures of prestige (Clauset et al., 2015; Burris, 2004). Additionally, in the case of an emerging economy, many institutions are not listed in formal university rankings and non-English literature is not present in bibliometric databases. Taken together, these reasons make survey-based or bibliometric-based measures not only unsuitable for measuring prestige in this context but also harder to use in practice. A third gap to address is therefore to develop a measurement of prestige that is suitable for less developed university systems, robust to prestige variation over time, and that takes into account job market dynamics (see Oyer, 2006).
To overcome the aforementioned problem of measurement error, we estimate institutional prestige using a ranking algorithm based on university hiring networks of PhDs. The university hiring network has been extensively studied and contains explicit hiring flows from one university to another (Barnett et al., 2010; Clauset et al., 2015; Lang et al., 2019). Past research on these flows indicates that university hiring networks contain an implicit hierarchical order of prestige among institutions (Barnett et al., 2010; Mai et al., 2015). This hierarchy of prestige comes into existence because hiring decisions in the academic labor market are pairwise evaluations of "quality" between candidates and universities. We use the information in each pairwise assessment of quality to measure university prestige, applying the algorithm developed in our previous study (Cowan and Rossello, 2018). We argue that this measurement of prestige, based on pairwise peer-review assessments, goes beyond university ranks based on bibliometric indicators and subjective surveys. In particular, in this paper, we propose a dynamic estimation of the algorithm that overcomes the potential volatility of prestige in less mature higher education systems.
Using our dynamic measurement of prestige, this paper asks how university prestige stratification affects academic performance and the labor market mobility of scholars in Mexico. We address several sources of endogeneity in the relation between university prestige and academic performance. To overcome the problem of reverse causality, we use longitudinal data to assess how early-career institutional prestige affects the future academic performance of scholars. In line with the literature, changes in prestige during the early career affect future research performance, given that initial conditions in academia confer cumulative advantages or disadvantages with long-run consequences (Bazeley, 2003; Bol et al., 2018; Lee, 2019). To overcome the problem of bias from omitted variables correlated with individual capabilities, we take advantage of a quasi-experimental design and a re-sampling technique. For our three treatment groups (Upward, Downward, or Unchanged prestige), we control for PhD training, postdoctoral experience and individual characteristics, because these variables also affect academic performance. Our re-sampling technique matches pairs of scholars with similar characteristics assigned to different treatments. Firstly, we compare pairs with similar characteristics who receive their training and prestige from the same university and graduation window. Secondly, we compare pairs with equivalent characteristics and with similar networks, experience and prestige from their first job appointment. Holding these initial conditions constant for similar individuals allows us to compare the effect of changing prestige (treatment) on future research performance.
Our results are based on a sample of 766 Mexican PhD graduates in STEM between 1992 and 2016 and show a highly stratified university system. At the macro level, a select group of 10 institutions graduates and hires the vast majority of early-career scholars. This stratification is consistent with more mature university systems (Burris, 2004; Barnett et al., 2010; Clauset et al., 2015). This centralization of research could promote higher levels of specialization and targeted allocation of resources. However, it can also reveal a structural problem of "lock-in" that hinders mobility and the flow of knowledge through the national science system.
Focusing on the individual level, our results reveal a more nuanced and complex pattern than the one predicted by the Matthew effect. We find that moving up the prestige ladder does not necessarily correlate with higher academic performance. Controlling for PhD training and the experience gained from the first job, our results confirm that scholars (with similar characteristics) hired internally exhibit on average higher levels of academic performance. Interestingly, early-career scholars with similar characteristics and initial conditions who experience downward prestige mobility perform on average better than scholars who exhibit upward prestige mobility. We explain our results using the analogy of the big-fish-little-pond effect (BFLPE), where individual performance relates to the average performance of her/his peers. The effect that prestige has on individual performance may relate to the "profile of prestige" an individual is accustomed to. Agents might perform "better" ("worse") in an environment with "lower" ("higher") competition, where the perceived average ability of peers is "lower" ("higher"). Testing these psychological mechanisms is beyond the scope of this work, but we highlight a non-linear association between prestige and performance that can be conducive to future research.

University Prestige
The main approaches for ranking universities by prestige are input-output (Debackere and Rappa, 1995; Chan et al., 2002; Kalaitzidakis et al., 2003; Oyer, 2008; Buela-Casal et al., 2012), survey (Abbott and Barlow, 1972; Cyrenne and Grant, 2009; Moodie, 2009; Olcay and Bulu, 2017), and network-based measures (Barnett et al., 2010; Cowan and Rossello, 2018; Zhu and Yan, 2017; Nevin, 2019). Some of the most popular input-output based measures use bibliometric indicators. For instance, Debackere and Rappa (1995) use bibliographic records to calculate the prestige rankings for universities in the top-twenty departments of neuroscience by adding their citations. Similarly, Oyer (2008) uses the methodology proposed by Kalaitzidakis et al. (2003) to rank universities based on their contribution to the top thirty journals in economics, selected by their normalized citation index. Chan et al. (2002) take a similar approach and use the number of pages produced by a university faculty in top journals to rank universities. However, bibliometric-based measures focus on research outputs, disregarding inputs such as the production of graduates and other relevant measurements of academic excellence. Buela-Casal et al. (2012) assess the higher education system in Spain using a mix of input-output indicators including journal publications, number of full-time researchers, number of R&D projects, PhD graduates, scholarships and patents.
Another way to rank universities is through surveys. This method aims at capturing peers' perception of academic prestige, including niches within a discipline. Abbott and Barlow (1972) use a survey of graduate faculty to rank universities in 29 disciplines, with an ordinal response scale of five levels. For Fogarty and Saftner (1993), the main drawback of survey methods is that they assume that scholars are not biased by their current and previous affiliations. An additional issue with survey measures is that scholars tend to have sticky and localized information about other universities: they know best the institutions of similar status, which are those in direct competition with them (Cowan and Rossello, 2018). For instance, institutions that are closer in rank are likely to compete for the same resources (projects and grants) and publish articles in similarly ranked academic journals.
Network-based measures are a new framework for ranking universities that examines a social process in which scholars recognize quality in each other's work. This evaluation is cross-validated by the interactions between institutions. Following these lines, the core idea of our ranking approach is that PhD hiring networks contain information about how scholars evaluate each other's quality (Clauset et al., 2015). Finally, the methodology of Cowan and Rossello (2018), applied to the university system of South Africa, exploits the information contained in movements in the labor market to approximate the distribution of prestige. The advantage of their method is that it directly uses pairwise assessments of quality between PhD graduates and hiring committees to rank universities. As a further expansion of their methodology, our ranking algorithm is dynamic across time. Thus, we allow the size of the job market to vary, which in turn changes the prestige of the institutions involved. A dynamic perspective on ranking universities is better suited to a large, developing setting where universities are young and the distribution of prestige is potentially more volatile.

The Link Between Mobility and Performance
The literature studies the relationship between university prestige and mobility to understand how the prestige of the PhD-granting institution affects placement and subsequent labor market outcomes. In the PhD job market, both individuals and universities have incentives to make accurate choices. On the one hand, for the job market candidate, the transition from the PhD to her/his first academic job might affect future career prospects (Laudel and Gläser, 2008). On the other hand, for the university, each new hire is a strategic asset that influences its human capital and competitiveness (Cowan and Rossello, 2018). However, hiring PhD graduates can be a difficult task, since early-career scientists often have a short research record. In such cases, universities have little information about the inherent ability of young candidates, especially when they graduated from other universities.
Related to this, Oyer (2006) examines the job market for economists in the US between 1979 and 2004 and finds that the job market conditions that influence the first job placement of scholars contain a large element of randomness. In particular, he finds that initial job market 'luck' (i.e. favorable market conditions) affects who obtains top positions in academia, and that these positions later drive research productivity between neighboring cohorts of graduates. Job market luck affects placement because of the asymmetry of information, which makes the prestige of the PhD-granting institution a signal for the unobserved skills and abilities of applicants. Indeed, past literature shows a positive relationship between the prestige of the PhD-granting institution and future employment (Crane, 1970; Debackere and Rappa, 1995; Oyer, 2006; Bedeian et al., 2010; Appelt et al., 2015; Pinheiro et al., 2017; Headworth and Freese, 2016). PhD graduates from prestigious institutions tend to get "better" jobs compared to graduates from lower-tier universities. Such results suggest that a prestigious PhD is a key mechanism for alleviating the asymmetry of information between candidates and hiring faculties. Moreover, since a "good" affiliation provides opportunities, networks, and resources, this first sorting of the job market has potential long-run consequences for career achievements (Oyer, 2008; Bedeian et al., 2010). Such an early-career advantage is potentially problematic when quality and prestige do not entirely overlap. In this respect, some studies argue that the prestige of the PhD-granting institution is more important than researchers' "quality" for obtaining a first academic job (Long et al., 1979; Allison and Long, 1990; Baldi, 1995; Gerhards et al., 2018).
Another group of studies pays attention to how changes in institutional prestige relate to the academic performance and achievement of scholars. Oyer (2008) uses a longitudinal sample to estimate how changes in institutional prestige affect the academic performance of economists. He shows that, even after controlling for proxies of individual-level ability, early academic prestige positively correlates with academic performance measured by publication productivity. Moreover, he shows that scholars generally move down the prestige ranking over their careers. He argues that this is because high-ranked universities produce a significant percentage of the total graduates, who later move to lower-ranked universities. Chan et al. (2002) examine the mobility of scholars publishing in 16 top financial journals. They find that upward ranking mobility is rare and that scholars who experience it produce twice as many publications as the average scholar at the destination university. They further show that, after controlling for ability using publication productivity, the rank of the PhD-granting institution predicts upward ranking mobility throughout academic careers. Azoulay et al. (2014) take an alternative approach, comparing the academic performance of scholars before and after upward mobility conferred by a prestigious academic recognition. They find that gains from upward ranking mobility have a lower effect on scholars who have above-average citations than on scholars with low or below-average citations. In general, these studies suggest that upward ranking mobility is associated with higher academic performance, but this is not always the case. Cowan and Rossello (2018) offer a closer look at how prestige differentials from the PhD to the first job relate to academic performance. They use a quasi-experimental methodology based on matching pairs and examine a sample of 1011 South African job market candidates in STEM. They show that scholars hired internally (maintaining their place in the prestige hierarchy) exhibit on average higher performance compared with scholars who move up in the prestige rank. Their work underlines that the link between prestige changes and performance might be complex.

The BFLPE and Similar Mechanisms
The complex relation between prestige changes and performance may stem from the relationship that individuals have with their new peers and working environment. When individuals change institution, they compare themselves with new peers; their view of themselves may therefore be distorted because they have little information on the new environment.
The psychology of education examines how social comparison affects individual performance by looking at how the average achievement of peers affects the individual academic self-concept (Marsh and Hau, 2003; Marsh et al., 2008). The main hypothesis in this literature is the big-fish-little-pond effect (BFLPE). The hypothesis suggests that a student will have a lower academic self-concept (and thus performance) in an academically selective school, where the average achievement of peers is high, than in a non-selective one (Astin, 1969; Marsh and Hau, 2003; Marsh et al., 2008; Salchegger, 2016; Rosman et al., 2020; Keyserlingk et al., 2020).
The empirical evidence on the BFLPE shows mixed results and focuses on children or adolescents of school age (Salchegger, 2016). Two recent contributions test the BFLPE by looking at first-year university students. Rosman et al. (2020) find no support for the BFLPE examining 115 first-year undergraduate psychology students at the Leibniz Institute in Germany. In contrast, in a larger and more representative study, Keyserlingk et al. (2020) find strong support for the BFLPE in a sample of German students in the transition from high school to university. However, in higher education, competition and the need to collaborate with peers are greater. In such circumstances, additional mechanisms of social comparison might influence beliefs and achievements. The additional mechanisms in the literature are peer effects and the "what does not kill me makes me stronger" effect.
In general terms, the literature on peer effects in academia studies whether social comparison generates learning that in turn affects performance. Most of this literature focuses on positive peer effects. For example, Slavova et al. (2016) study whether hiring a new scientist affects the scientific performance of the incumbents in the hiring department. They examine 94 U.S. chemical engineering departments, finding that a new hire generates positive peer effects on the performance of colleagues with recent tenure. As a limitation, their analysis does not consider the impact of changing department on the newcomer. Related to this, past research suggests that, depending on the level of competition, peer effects can also be negative (Stapel and Koomen, 2005). When resources are scarce, the level of competition to access them increases, and in extreme cases a newcomer can be perceived by the group as a threat. In this case, social comparison may negatively affect performance. Extreme cases are episodes of bullying or misbehavior in academia (McKay et al., 2008; Keashly and Neuman, 2010; Giorgi, 2012). The way in which people are integrated or promoted in a new workplace can affect individual self-esteem and academic self-concept as well. In general, the literature associates high self-esteem and self-concept with high career outcomes. However, there are cases where this association appears to be negative (Whelpley and McDaniel, 2016; Sherf and Morrison, 2019; Li et al., 2020; Weiss and Knight, 1980). For example, in a recent contribution, Wang et al. (2019) find support for the "what does not kill me makes me stronger" effect. They compare the publication and citation records of 561 narrow-win and 623 near-miss scientists who applied for an NIH grant. They find that, despite an early setback, individuals with near-miss proposals systematically outperform those with narrow wins in the longer run. This result could be consistent with a BFLPE in early-career scientists, where an initial promotion or confirmation of abilities may be counter-productive for future performance. The complex mechanisms associating individual performance with the comparison with peers motivate us to study how prestige changes affect research performance in the transition into the first academic job.

Data
The data originate from the Mexican National Council of Science and Technology (CONACYT). They were collected through the most extensive science policy of the country, the National System of Researchers (NSR), whose aim is to increase the productivity, quality and competitiveness of Mexican researchers (Gras, 2018). The NSR was implemented in 1985, when the primary motivation behind the policy was the rising concern about the technological capabilities and performance of the Mexican science system under the threat of inflation and budget cuts. Reyes and Suriñach (2015) describe how the policy evolved across the years, but in general, its structure is substantially unchanged.
We focus on STEM graduates, excluding from the analysis those in Social Sciences and Humanities to reduce the potential influence that schools of thought have on the PhD job market. Our sample spans 25 years and comprises 766 PhD job market candidates hired by a Mexican university between 1992 and 2016. We present the summary statistics of the panel in Table 1. We include 36 Mexican institutions (see notes 6 and 7) and, for each scholar, longitudinal records of academic performance (NSR rating) and individual-level controls such as gender, discipline, graduation year and evaluation year.

NSR Rating
Our dependent variable is the NSR rating, which measures the academic performance of researchers using 5 ordered categories. The general framework of the NSR rating process is summarized in Fig. 1 and works as follows. Each researcher applies to the NSR by submitting her/his curriculum vitae and publications. The submitted publications comprise not only scientific articles, but also books, chapters in books, patents, and technological developments and transfers. Each application is assigned to one of seven different research disciplines. Every three years, for each discipline, the NSR forms an evaluation commission comprising 14 prominent researchers called to rate the applications. Each member of the commission evaluates the performance of the applicants on the basis of all the submitted material, following a peer-review process that ends with a grade. The CONACYT authorities supervise the quality and independence of the evaluation commissions (more details are in "NSR Disciplines and Evaluation Procedure"; for more details on the NSR system see Gras, 2018). In contrast to bibliometric measures, a peer-review evaluation has the advantage of being holistic, taking into account the validation practices of each discipline within the country. In particular, such an assessment of research performance considers seniority, the quality of publications, the individual contribution to co-authored works, and above all (sub-)field differences. The evaluation process ends with a rating that systematizes the academic performance of researchers in 5 ordered categories. In the paper, we use those categories to measure the academic performance of individuals.
6 The list of institutions is presented in the Appendix "A. Faculty Hiring Matrix Names".
7 The sample comprises the 36 Mexican universities that provide doctoral education in STEM subjects. In Mexico, universities specialized in teaching do not grant doctorates. There are 1250 institutions of Higher Education in Mexico, including public universities, technological institutes, technological universities, private institutions, teacher training colleges, and other public institutions. Among those, 213 are universities. However, 50% of the research and 58% of the students are concentrated in only 45 public universities, which are mostly located near Mexico City and in other large cities.

Interactive Prestige Ranking
In this section, we describe the variation of the prestige ranking algorithm developed by Cowan and Rossello (2018) for a dynamic setting. The algorithm assumes that movements in the academic job market contain information about how universities and PhD graduates perceive each other's quality. The input of the algorithm is the hiring network, defined as $G = (V, E)$. The vertices $V$ of the network are the universities participating in the job market, and the edges $E$ represent movements of PhD graduates from one university to another. The network $G$ is represented by a weighted directed adjacency matrix $A$ that captures the flows of graduates from their PhD to their first job institution. The off-diagonal elements of $A$ ($a_{ij}$ with $i \neq j$) show the number of scholars who graduated from university $i$ and were hired by university $j$ within 5 years after doctoral graduation. Conversely, the diagonal elements ($a_{ii}$) are the PhD graduates hired internally (trained and hired by their own faculty).
Table 1 Summary statistics, the panel of 766 researchers
Gender 0 is for male researchers. Pr-job is the prestige rank of the university of the researcher's first job; the lower the score, the higher the prestige. Pr-PhD is the prestige rank of the university of the researcher's PhD; the lower the score, the higher the prestige. Δ Pr is the change in prestige from PhD to first job: the Down group is equal to one, the group who Stayed is two, and the Up group is three. Δ Pr* is the continuous change of prestige. The negative means signify that, on average, scholars move down, because the lower the rank, the higher the prestige. NSR rating is the dependent variable that scores research performance from one to five (the best performance).
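As an illustration of the hiring matrix $A$ defined in this section, the following minimal Python sketch builds it from a list of hires. The record layout `(phd_univ, job_univ, phd_year, job_year)`, the function name `hiring_matrix` and the toy data are assumptions made for this example and do not come from the NSR data.

```python
def hiring_matrix(hires, universities, max_gap_years=5):
    """Build the weighted directed adjacency matrix A of a hiring network.

    hires: iterable of (phd_univ, job_univ, phd_year, job_year) tuples
           (hypothetical record layout).
    A[i][j] counts graduates of universities[i] hired by universities[j]
    within max_gap_years of graduation; A[i][i] counts internal hires.
    """
    index = {u: k for k, u in enumerate(universities)}
    n = len(universities)
    A = [[0] * n for _ in range(n)]
    for phd, job, phd_year, job_year in hires:
        # Keep only first jobs obtained within the allowed gap after the PhD.
        if 0 <= job_year - phd_year <= max_gap_years:
            A[index[phd]][index[job]] += 1
    return A


# Toy data: the fourth hire happens 8 years after graduation and is dropped.
hires = [
    ("U1", "U2", 2000, 2001),
    ("U1", "U1", 2001, 2003),
    ("U2", "U3", 1999, 2002),
    ("U3", "U1", 2000, 2008),
    ("U1", "U2", 2004, 2005),
]
A = hiring_matrix(hires, ["U1", "U2", "U3"])
```

In this toy example the rows are PhD institutions and the columns are hiring institutions, so the diagonal entry for U1 records its single internal hire.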
The ranking algorithm is based on two key assumptions. The first concerns universities: they try to improve their status and quality, and in doing so they try to hire from universities "better" than themselves. The second concerns scholars: they want to be hired by the most prominent institution. If both academics and institutions satisfy those desires fully, there exists a unique order of university names (the row/column names of $A$) such that PhD graduates only move down the hierarchy. In other words, under these assumptions, the rows and columns of the adjacency matrix $A$ can be rearranged into an upper triangular matrix, in which all entries below the diagonal are equal to zero. We denote this unique order $o^*$, for which the sum of rows attains a global maximum score equal to $s^*$.
Geographic location and other recruiting criteria imply that the PhD job market often departs from these strict assumptions. Thus, empirically the order is not unique, and $A$ is not a perfect upper triangular matrix. However, since prestige is an important selection criterion for universities and scholars, we apply the heuristic algorithm proposed in Cowan and Rossello (2018) to find the set of orders that gets as close as possible to $s^*$ and has the minimum number of violations of an upper triangular configuration. To approach the underlying prestige hierarchy, the algorithm works as follows. The algorithm is run $k = 10000$ times. Each run starts by assigning to $A$ a random order of rows and computes the score $s_k$ of that order. It then tries 100 times to improve the score $s_k$ in the following way. At each iteration, two nodes (both rows and columns) are randomly selected and swapped. If the swap does not decrease the score, we keep it; otherwise we reject it. After these 100 attempts, the resulting order $o_k$ and score $s_k$ are recorded, yielding a set of orders $O = \{o_1, o_2, \dots, o_k\}$, each an $n$-tuple, and their associated scores $S = \{s_1, s_2, \dots, s_k\}$.
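The random-restart swap search described above can be sketched as follows. This is only an illustration under a stated assumption: the score of an order is taken to be the number of hires lying on or above the diagonal once $A$ is reordered, which is one natural way to measure closeness to an upper triangular matrix but is not necessarily the exact score used by Cowan and Rossello (2018). The function names and the toy matrix are hypothetical.

```python
import random


def score(A, order):
    """Assumed score: hires on or above the diagonal after reordering A.

    order[0] is the most prestigious university (row/column index in A).
    """
    n = len(order)
    return sum(A[order[i]][order[j]] for i in range(n) for j in range(i, n))


def local_search(A, n_swaps=100, rng=random):
    """One run: random initial order, then n_swaps attempted node swaps."""
    n = len(A)
    order = list(range(n))
    rng.shuffle(order)
    best = score(A, order)
    for _ in range(n_swaps):
        i, j = rng.sample(range(n), 2)
        order[i], order[j] = order[j], order[i]       # swap two nodes
        s = score(A, order)
        if s >= best:                                 # keep non-decreasing swaps
            best = s
        else:                                         # otherwise undo the swap
            order[i], order[j] = order[j], order[i]
    return tuple(order), best


def search_orders(A, restarts=10000, n_swaps=100, seed=0):
    """Repeat the local search and return the list of (order, score) pairs."""
    rng = random.Random(seed)
    return [local_search(A, n_swaps, rng) for _ in range(restarts)]


# Toy matrix with a perfect hierarchy: the identity order is the unique optimum.
A_demo = [[1, 2, 1], [0, 0, 1], [0, 0, 2]]
results = search_orders(A_demo, restarts=50, n_swaps=100, seed=1)
best_order, best_score = max(results, key=lambda r: r[1])
```

Because rows and columns are permuted together, internal hires stay on the diagonal under any order, so including them in the assumed score does not affect which order is preferred.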
From these two sets $O$ and $S$, the algorithm selects the set $Q = \{o_m \in O \mid s_m \approx s^*\}$ of orders that reach the highest score. Each $n$-tuple of the set $Q = \{q_1, \dots, q_m\}$ contains a possible rank for each university, so that a university $v$ has the set of ranks $R(v) = \{r_1, r_2, \dots, r_m\}$. Then, for each university, the algorithm computes its prestige score according to the formula
$$\mathrm{Pr}(v) = \frac{1}{m} \sum_{l=1}^{m} r_l, \qquad (1)$$
which is, in other words, the mean of its ranks in the set of orders with maximum score $Q$. The prestige score of each university provides a natural ordering or ranking of universities that is our measure of prestige. Further details of the procedure are described in the Appendix.
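The mean-rank step can be illustrated with a short Python sketch. It assumes that the search returns a list of `(order, score)` pairs, that $Q$ collects the distinct orders whose score is within a tolerance `tol` of the best score found, and that rank 1 denotes the most prestigious university; the function name and these conventions are assumptions made for the example.

```python
def prestige_scores(results, tol=0):
    """Mean rank of each university over the best-scoring orders.

    results: list of (order, score) pairs, where order[pos] is the index of
             the university placed at position pos (0 = most prestigious).
    tol:     hypothetical tolerance defining "approximately maximal" scores.
    """
    s_max = max(s for _, s in results)
    # Set Q: distinct orders whose score is (approximately) the maximum.
    best_orders = {tuple(o) for o, s in results if s >= s_max - tol}
    n = len(next(iter(best_orders)))
    totals = [0.0] * n
    for order in best_orders:
        for pos, univ in enumerate(order):
            totals[univ] += pos + 1          # rank 1 = most prestigious
    return [t / len(best_orders) for t in totals]


# Two orders tie for the best score; the third is discarded.
demo_results = [((0, 1, 2), 7), ((0, 2, 1), 7), ((2, 1, 0), 3)]
demo_scores = prestige_scores(demo_results)
```

In this toy case university 0 is ranked first in both retained orders, while universities 1 and 2 exchange the second and third positions and therefore share the same mean rank.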
A key assumption of this algorithm is that a single adjacency matrix A captures the underlying hierarchy of prestige. This implies that all universities (nodes) and scholars participate in the labor market and the size of the market is fixed. However, movements between universities over an interval of time can be constrained by various forces.
We relax the assumption that the hiring network is fixed over time by adopting a dynamic computation of the algorithm. The proposed variation iterates the previous algorithm over closed time intervals $t = [y - \Delta, y + \Delta]$ centered on the PhD graduation year $y$, with a fixed window of $\Delta = 3$ years. This implies that the hiring network, and the scholars and universities involved, differ for each time window. 10 However, not all institutions are present in every interval $t$; for instance, more recent universities are not listed in the early years of the sample. Hence, the final score of our Interactive Prestige Ranking is the average score of each university $i$ over the $t$ intervals of time.
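The windowing step can be illustrated as follows. This is a sketch under stated assumptions: hires are represented as hypothetical `(year, phd_university, job_university)` tuples, the per-window scores are taken as given (computed by whatever ranking routine is used), and a university is averaged only over the windows in which it appears, which is one reading of the text.

```python
from collections import defaultdict

def hires_in_window(hires, y, delta=3):
    """Keep the (phd, job) pairs whose year falls in the closed interval [y - delta, y + delta]."""
    return [(phd, job) for (year, phd, job) in hires if y - delta <= year <= y + delta]

def average_scores(scores_by_window):
    """Average each university's score over the windows in which it is present (assumption)."""
    totals, counts = defaultdict(float), defaultdict(int)
    for scores in scores_by_window.values():
        for university, score in scores.items():
            totals[university] += score
            counts[university] += 1
    return {u: totals[u] / counts[u] for u in totals}

# Hypothetical data: three hires and two windows with already-computed scores.
hires = [(1995, "A", "B"), (1999, "A", "C"), (2000, "B", "C")]
window_1996 = hires_in_window(hires, 1996)
final = average_scores({1996: {"A": 1, "B": 2}, 1997: {"A": 2, "B": 2, "C": 3}})
```

University "C" is absent from the first window, so its final score is based on the second window alone.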
After the computation of the ranking, we distinguish between three groups of scholars, Up, Down and Stay, who experienced different changes in prestige during the transition between PhD graduation and first job, in the following way. For each researcher in the sample, we calculate the difference between the prestige rank of the PhD institution and that of her/his first job institution. The difference is positive (negative) for the group of scholars who move Up (Down), that is, who experience upward (downward) prestige mobility: they are hired by a university more (less) prestigious than their PhD institution. The difference is equal to zero for scholars who Stay, experiencing internal hiring: those hired by their PhD institution.
Table 2 shows the ranking of Mexican universities using the dynamic ranking with a 3-year time window. The rank follows from the average of the university rank computed by our algorithm across all time windows. A lower average indicates that the university has occupied higher positions more often across time periods.
Table 3 shows the stratification of prestige in the Mexican university system. Looking at the movements in the prestige hierarchy, Table 3 highlights a high level of stratification in the Mexican university system. The 10 most prestigious Mexican universities produce 68% of PhD graduates, and nearly half of them are hired for their first job in those institutions. This stratification is also geographical, as almost all top universities are located near Mexico City. Thus, Mexican academia operates in a highly stratified system where public and private research funds are mostly centralized.
Table 4 presents the correlation between the dynamic (d) and static (s) prestige ranking (Pr) with individual-level variables. In addition to the measurement of productivity from the NSR, we compute bibliometric indicators using Science Citation Index data of Web of Science (WOS) for the period 1992-2016.
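The Up, Down and Stay grouping defined in this passage amounts to the sign of a rank difference. A minimal sketch, assuming ranks are numbers in which 1 denotes the most prestigious university (the function name is hypothetical):

```python
def prestige_move(rank_phd, rank_job):
    """Classify a PhD-to-first-job transition from the two prestige ranks.

    Ranks are assumed to run from 1 (most prestigious) upwards, so a positive
    difference rank_phd - rank_job means the first job is more prestigious.
    """
    diff = rank_phd - rank_job
    if diff > 0:
        return "Up"
    if diff < 0:
        return "Down"
    return "Stay"   # zero difference: hired by the PhD institution

moves = [prestige_move(5, 2), prestige_move(2, 5), prestige_move(3, 3)]
```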
First, we assess the stability of prestige over time by comparing the static and dynamic ranks using the Spearman correlation coefficient. The high correlations for the prestige of the PhD and of the first job, 0.64 and 0.83 respectively, indicate that prestige does vary over time, but not by much. The difference in prestige (ΔPr) is positively (0.182) and negatively (−0.649) correlated with the prestige of the PhD and of the first job, respectively. This pattern shows that moving up the ranking (positive difference) is associated with lower prestige of the PhD university (higher in rank). Conversely, moving down the ranking (negative difference) is associated with lower prestige of the first job.

Descriptive Statistics
Next, we draw attention to the peer-review research productivity variable from the NSR and the prestige variables. The results show that the static (s) and dynamic (d) measures of prestige are negatively correlated with productivity (NSR). This is expected, since a lower rank signifies higher university prestige. Interestingly, the correlation between ΔPr and all the measurements of productivity is close to zero, but positive and significant. A difference equal to zero indicates that scholars were hired by their PhD university after graduation (stayed). Lastly, we show the correlation between the NSR rating and other bibliometric variables. In general, the NSR rating is positively correlated with bibliometric measurements of productivity. However, this correlation is not higher than 0.40, which suggests that the NSR performance variable takes into account local research and other products of research such as patents (see "NSR Disciplines and Evaluation Procedure" and the previous section for a description of the NSR rating).

Quasi-Experimental Design
In this section, we examine how movements in the prestige hierarchy in the transition from the PhD to the first job affect scholars' academic performance. We take advantage of a quasi-experimental design that naturally occurs in the academic labor market of early-career scholars. After PhD graduation, early-career researchers self-select into academic positions. However, the choice of hiring committees is a quasi-random assignment that naturally clusters PhD graduates into three groups according to prestige differentials: Up, Down, and Stay. The allocation is a partially random assignment because early-career researchers typically have thin publication records after PhD graduation, so that ability and research skills are mostly unobserved. Given the asymmetric information in the labor market and its dynamics, the choice of hiring committees evaluating early-career researchers contains a large element of randomness (Oyer, 2006). In this setting, PhD prestige plays an important role as a signaling mechanism of candidates' quality.
To deal with the endogeneity of university prestige related to training, experience and individual characteristics, we use a bootstrap matched-pairs technique. To assess the effect of early-career university prestige on future academic performance, we compare the research performance of bootstrap-matched pairs of individuals with similar characteristics but different treatments (Up, Down, Stay). When individuals are paired holding constant their PhD institution, we test how changes in prestige relate to academic performance irrespective of training. Similarly, the comparison that matches scholars by their first job examines how prestige movements affect scholars' performance for agents with the same first academic job. The Matthew effect of the academic labor market may amplify the role of early-career prestige in long-run academic performance. Therefore, we replicate our analysis comparing the performance of matched pairs of scholars in the short (up to 2 years), medium (3-5 years) and long run (6-25 years) after PhD graduation.
Our quasi-experimental design follows Cowan and Rossello (2018) and Way et al. (2019). The basic idea is that career movements from the PhD to the first job are a quasi-random assignment made by the job market. Individuals with the same PhD training are placed in different institutions; likewise, individuals with different training (PhD) are placed in the same first job. Thus, our strategy compares matched couples that either received the same training (PhD) or are exposed to the same working environment (same first job) but experienced different prestige movements (treatment groups). The additional variables on which we match the pairs are gender, age, discipline, and graduation year. However, we should note two potential limitations and the scope for future research. The first limitation is that individuals with the same PhD training (or first job) might have slightly different productivity or performance (broadly defined). Unfortunately, we do not have prior data on students to control for that. Despite this limitation, we must highlight that our method implicitly, if partially, controls for performance. Each PhD program has rules and quality standards with minimum requirements, both in admission and in promotion decisions. Thus, all graduates from a PhD program met at least those minimum requirements. Internal rules at universities make the performance of individuals in the same PhD batch comparable. The same reasoning applies to the first job. Internal university rules make hiring committees accountable for their decisions. This implies that job candidates must meet minimum "quality" requirements to be considered for the job, which makes the new hires of a university similar in terms of prior performance. Thus, our strategy might mitigate this potential data limitation.
A second limitation is that our technique discretizes prestige movements into three treatment groups, rather than treating them as a numeric variable. Our method might therefore have the disadvantage of treating a movement from the first- to the second-ranked institution in the same way as a movement from the first- to the bottom-ranked one. Despite this potential drawback, the way in which we operationalized our ranking algorithm limits this possibility, and the situation in the example above is very unlikely. Most hiring patterns from one university to another are stratified and clustered around universities of comparable prestige. Very distant prestige movements represent a strong violation of the basic assumptions of our ranking algorithm, which therefore minimises that possibility. Even if it is still possible for an individual to move from the first- to the bottom-ranked institution, this situation is rare, since a university that has many individuals moving this way will be penalised by the ranking algorithm. The more students a university is able to place in better-ranked first jobs, the higher its rank.
For each comparison (Up vs. Stay, Down vs. Stay, and Up vs. Down), we generate $n = 10000$ bootstrap samples of the group on the left-hand side (the smaller one), each of the same size $s$ as that group. For each of the 10000 samples of size $s$, we create matched pairs of scholars, matching on gender, age, discipline, graduation year and PhD (or first job) university. To compare their performance, in each sample we estimate the proportion of pairs, $p^* = (p_1, p_2, \dots, p_n)$, in which one group $g$ has higher performance than the other group $g'$, where academic performance is the individual NSR research rating. 11 For each group in the comparisons Up vs. Stay, Down vs. Stay and Up vs. Down, we estimate the two $p^*$ (denoted $p_g$ for the pairs in which $g$ outperforms $g'$, and $p_{g'}$ for the reverse) and construct their cumulative empirical distribution functions (CEDF) $F(p^*)$. To assess the performance of one group relative to the other, we test for first-order stochastic dominance (Levy, 1992). This test implies higher performance of $g$ over $g'$ if $F(p_g) \le F(p_{g'})$ for all $p^*$. 12 We compare the two CEDFs by running a two-sided and a one-sided Kolmogorov-Smirnov test (KS test).
The null hypothesis of the two-sided test is $H_{01}: F(p_g) = F(p_{g'})$, that is, the two CEDFs are drawn from the same distribution. Rejecting $H_{01}$ implies that academic performance is statistically different between the two groups. The null hypothesis of the one-sided test is $H_{02}: F(p_g) \ge F(p_{g'})$. Rejecting it implies that $F(p_g)$ stochastically dominates $F(p_{g'})$; in other words, that the increase in academic performance associated with the change of prestige of group $g$ is statistically different from, and greater than, that of $g'$.
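The procedure above can be sketched with the Python standard library. This is an illustrative reading, not the authors' code: the record layout (dictionaries with an `"nsr"` field), the function names, and the rule of drawing one exact match at random for each resampled scholar are all assumptions, and only the KS statistic is computed (a full test with p-values would normally come from a statistics package).

```python
import bisect
import random

def bootstrap_proportions(small, large, key, n_boot=1000, seed=0):
    """For each bootstrap sample of the smaller group, pair every scholar with a
    randomly drawn exact match from the other group (assumed matching rule) and
    record the share of pairs won by each side."""
    rng = random.Random(seed)
    by_key = {}
    for person in large:
        by_key.setdefault(key(person), []).append(person)
    p_small, p_large = [], []
    for _ in range(n_boot):
        sample = rng.choices(small, k=len(small))   # resample with replacement
        wins_small = wins_large = pairs = 0
        for a in sample:
            candidates = by_key.get(key(a))
            if not candidates:                      # no exact match: skip
                continue
            b = rng.choice(candidates)
            pairs += 1
            wins_small += a["nsr"] > b["nsr"]
            wins_large += b["nsr"] > a["nsr"]
        if pairs:
            p_small.append(wins_small / pairs)
            p_large.append(wins_large / pairs)
    return p_small, p_large

def ks_statistic(x, y):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    xs, ys = sorted(x), sorted(y)
    gap = 0.0
    for v in xs + ys:
        fx = bisect.bisect_right(xs, v) / len(xs)
        fy = bisect.bisect_right(ys, v) / len(ys)
        gap = max(gap, abs(fx - fy))
    return gap

# Hypothetical records: every Stay scholar outranks every Up scholar.
up = [{"nsr": 1, "phd": "A"}] * 3
stay = [{"nsr": 2, "phd": "A"}] * 4
p_up, p_stay = bootstrap_proportions(up, stay, key=lambda r: r["phd"], n_boot=20)
d = ks_statistic(p_up, p_stay)
```

In practice the `key` function would return a tuple of all matching variables (gender, age, discipline, graduation year and PhD or first-job university). In this extreme toy case the CEDF of Stay > Up lies entirely to the right of that of Up > Stay, and the statistic takes its maximum value.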

Matched Pairs Results
In this section, we compare scholars' performance for Up vs. Stay, Down vs. Stay, and Up vs. Down in the short, medium and long run. In particular, we examine the CEDF of the proportion in which one group received a higher NSR rating than the other. Up, Down, and Stay represent prestige changes from the PhD to the first job, where prestige is measured with the dynamic ranking algorithm with a moving time window of 3 years. 13, 14 Results of the KS tests of H 01 and H 02 in Table 5 show, for every comparison, that the CEDFs are different and that one group stochastically dominates the other.
Figure 2 compares the NSR research performance of matched pairs of scholars who Stay and who move Up the hierarchy. In all figures, scholars match if they have the same gender, age, discipline, and graduation year. Additionally, figures on the left match scholars with the same PhD, while those on the right-hand side match those with the same first job institution. The matching procedure allows us to compare scholars with the same PhD (or first job) and characteristics but experiencing different prestige movements. In both cases, the CEDF of Stay > Up (solid lines) is located below that of Up > Stay (dashed lines), implying that the Stay group stochastically dominates the Up group. The implication of the results is the following. On the one hand, looking at scholars with the same (PhD) training (left plots), we find that those hired internally have on average a better research performance than those experiencing upward prestige mobility (hired into a university more prestigious than their PhD). On the other hand, comparing scholars with the same first job (right plots) but different training (PhD), we find that internal hires perform better than those coming from less prestigious PhDs (Up). These results suggest that scholars who manage to secure positions at their faculty after graduation demonstrate higher NSR levels of performance than those who migrate to higher-ranked institutions.
Results for the comparison between the Down and Stay groups are in Fig. 3. In this case, we compare scholars who take academic positions in their faculties after graduation with PhD graduates experiencing downward prestige mobility. Results are the same whether the pairs are matched on the PhD or on the first job institution: the CEDF of Stay > Down stochastically dominates that of Down > Stay. In line with our previous results, plots on the left-hand side show that scholars with the same PhD training who move Down to a less prestigious institution in their first job tend to have a lower NSR rating than those hired internally (Stay). Similarly, plots on the right-hand side compare scholars with the same first job and indicate that those hired internally (Stay) have a higher performance than those moving down the hierarchy.
The last comparison, in Fig. 4, examines performance differences between scholars who experience upward and downward prestige mobility. Results show that the CEDF of the proportion of pairs of scholars in which the performance of Down > Up stochastically dominates that of Up > Down. The stochastic dominance of one over the other implies that scholars who experience downward prestige mobility sustain higher performance over their career than those experiencing upward mobility in their early career. These results hold whether the pairs are matched keeping fixed the (PhD) training (left plots) or the first job institution (right plots). In the first case, comparing scholars with the same (PhD) training, we find that those moving down the hierarchy have higher performance than those moving up to more prestigious first job institutions. In the second, pairing scholars with the same first job but different PhD institutions, we find that those coming from more prestigious PhDs (Down) have a higher NSR rating on average than those moving up from less prestigious PhD institutions.
These results seem counter-intuitive at first glance, since most studies have associated upward ranking mobility with higher academic performance. 15 In particular, the first result, a negative impact of upward prestige mobility when comparing scholars with the same training, contradicts the previous results of Chan et al. (2002). However, their analysis is slightly different. They use a longitudinal analysis in one sub-field of economics, and their sample is limited to scholars with publications in 16 top journals in finance. In contrast to us, they find that scholars who experience upward prestige mobility publish twice as much as their colleagues. What is most interesting is that the pattern we find holds throughout the career of scholars and for both the dynamic (Fig. 4) and the static ranking estimations (Appendix Fig. 9). Nevertheless, these findings require further research, which we discuss in the "Discussion" section.

Robustness Checks
Our methodology might be prone to potential weaknesses that we discuss in this section, providing additional robustness checks.

Static and Dynamic Ranking Results Comparison
The first potential drawback lies in the different time frames of the categorization of prestige movements (Up, Down, Stay), which follows from the dynamic prestige ranking, and of the dependent variable, the NSR rating. This might mean that people move in the hierarchy in the PhD-to-first-job transition while, at the same time, universities change their ranking position relative to other institutions. A simple way to overcome this limitation is to run the same analysis using the static prestige ranking. In this case, university prestige is assumed to be constant over the period of analysis, and thus movements in prestige are computed on the basis of this aggregation of the data. Results using the static prestige ranking, in Appendix Figs. 7, 8 and 9, are consistent with the previous ones. Additionally, we should note that the issue above does not apply to the short-run estimation (top panels of Figures 1, 2 and 3), since the dynamic computation of the ranking uses time windows that overlap with the comparison of research performance. In particular, the evaluation of the NSR ratings (our dependent variable) overlaps with the period of the transition (up/down/stay) from the PhD to the first job. More generally, comparing the results obtained with the static and the dynamic ranking, the main results are consistent.
Table 6 Ordinal logistic regression: changes of prestige on academic performance

Regression Analysis
In this sub-section, we further assess the robustness of our previous results using an Ordinal Logistic Regression model. In Table 6 the dependent variable is the NSR rating of research performance with five ordered categories (See Section "Data"). We include the control variables that we use in the stochastic pair analysis.
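For reference, an ordinal logistic regression is commonly written in proportional-odds form. The display below is a generic textbook specification, not necessarily the exact estimating equation behind Table 6; the symbols $Y_i$ (rating of scholar $i$), $\alpha_j$ (category thresholds), $x_i$ (regressors) and $\beta$ (coefficients) are notation assumed here for illustration, with $J$ the number of ordered categories.

```latex
\operatorname{logit} \Pr(Y_i \le j) = \alpha_j - x_i^{\top}\beta,
\qquad j = 1, \dots, J - 1
```

Under this sign convention, a negative coefficient lowers the odds of reaching a higher rating category, and $\exp(\beta)$ is the multiplicative change in the odds of $Y_i > j$ for a one-unit increase in the corresponding regressor.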
Models 1 to 3 in Table 6 explore the effect of university prestige from the PhD, the first job and the change in prestige (Pr-PhD, Pr-Job, ΔPr). As expected, the models show negative coefficients on the prestige of the PhD and of the first job, given that a higher rank (lower prestige) decreases the likelihood of achieving higher academic performance. These results are also consistent with Model 4, which shows a similar effect of prestige from the PhD and the first job. Model 3 and Model 5 incorporate the continuous change in prestige (ΔPr), which is significant but close to zero. Similarly to the stochastic pair analysis, these results suggest that researchers who are hired by their faculty (ΔPr = 0) have higher odds of achieving higher research performance. To better understand the average change associated with moving Up, Down or Stay in the ranking, we estimate Models 6 and 7 using the categorical variable of change in prestige, as in the previous section. We argue that the discrete change in prestige is more suitable for analyzing the BFLPE, as it captures the shock in prestige internalized by researchers.
As shown in Table 1, scholars move on average only 6 places (ΔPr). Hence, it is more likely that a graduate moves up or down only a few positions in the rank. For example, a move from the most prestigious university in the country to the second most prestigious is more common than a move from the most prestigious university to the least prestigious for a first job.
However, both could be labelled by their peers as researchers moving down the ladder and, so, they can also internalize their experience psychologically, thinking that they have been "downgraded", and perceive a sense of failure (Waters and Leung, 2017).
Indeed, scholars are typically only aware of the universities and departments for which they work and with which they compete. Thus, they operate with asymmetric information and are myopic with respect to their precise place in the distribution of prestige. Hence, moving up/down within their league (6 places on average) has other potential costs and psychological effects (see the "Discussion" section). Moreover, the effect of changing prestige may be internalized irrespective of the number of places moved on the prestige ladder.
Model 6 estimates the average effect of moving Down and of Stay with respect to the baseline category Up. The results are consistent with our previous results: Model 6 shows that researchers who experience downward mobility have odds of achieving higher levels of academic performance that are 115% (exp(0.144)) of those of the Up group. In the same model, the group of researchers hired by their faculty after PhD graduation (Stay) reports 158% higher odds of achieving higher performance than the baseline Up group. Model 7 has the group Stay as baseline, and hence the log-odds of moving Up or Down are negative. Figures 5 and 6 report the predicted probability for each group; clearly, the probability of achieving higher levels of research performance is concentrated in the Stay group. The predicted probability of achieving an NSR level II or higher is approximately 10% larger for the group that stayed than for the group that went up. These results are in line with the previous section.
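The conversion from the reported log-odds coefficient to an odds ratio is a one-line calculation. The snippet below only reproduces the arithmetic for the coefficient quoted in the text (0.144 for Down relative to Up); the variable names are illustrative.

```python
import math

def odds_ratio(log_odds):
    """Exponentiate a logit coefficient to obtain an odds ratio."""
    return math.exp(log_odds)

beta_down_vs_up = 0.144                       # coefficient quoted in the text
or_down_vs_up = odds_ratio(beta_down_vs_up)   # about 1.155
pct_change = (or_down_vs_up - 1) * 100        # about 15.5% higher odds than Up

# Swapping the baseline flips the sign of the log-odds and inverts the ratio.
or_up_vs_down = odds_ratio(-beta_down_vs_up)
```

This is why the same contrast appears with a positive sign when Up is the baseline and a negative sign when the comparison is reversed.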

Discussion
Our main findings underline a more complex relation between prestige movements and performance than the common positive association. Our first result is that scholars experiencing downward prestige mobility show higher performance compared with colleagues with similar characteristics moving upward in the prestige hierarchy. This result challenges past studies that often associate upward prestige movements with above-average academic performance (Chan et al., 2002). We interpret this finding in the light of the literature on the BFLPE and related mechanisms presented in "The BFLPE and Similar Mechanisms" section.
Becoming a researcher in academia involves career phases where social comparison might matter. The BFLPE hypothesis in psychology claims that individual performance can be affected by how individuals perceive themselves in comparison with their peers. The transition from the PhD to the first academic job is a stressful event in a scholar's career, where social comparison might matter more. Indeed, the psychological literature predicts that the BFLPE mechanism takes place when individuals change their environment. 16 When individuals change institution, or advance in their career, their peers also change. For instance, after PhD graduation, mobile scholars change their place in the prestige hierarchy. A scholar moving down the hierarchy has higher prestige and academic self-concept relative to their incumbent peers with lower prestige. The higher prestige and academic self-concept translate into higher competitiveness, visibility and resources than those of scholars moving upward.
Following the BFLPE mechanism, we can interpret the results of the Mexican PhD job market as follows. On the one hand, scholars who move down might think they are "big shots" relative to their new peers, and this is beneficial to their performance. Or, in terms of positive peer effects, their new colleagues might think they have made "a catch" and give them more resources. In addition, an initial "failure" (moving down in the hierarchy) might lead to an effort to "regain the previous prestige". Our results might also relate to how co-workers see the new hire, and this might generate different peer effects. Depending on how a new researcher is integrated into the new department, the peer effect can positively or negatively affect individual performance.
The explanation mentioned above is consistent with the apparently paradoxical result in this paper. Accurately testing the BFLPE mechanism will require psychological tests on self-esteem or academic self-concept. However, this is beyond the scope of our work and should motivate further studies. Still, we consider our results to be relevant for the policy debate on prestige stratification and mobility in the academic market, especially in developing settings. In light of the consistency of our findings, further studies should operationalize a standardised psychological test aimed at measuring the change in academic self-concept.
Our second result suggests a positive association between internal hiring and research performance in comparison to scholars moving in the hierarchy. This result, in conjunction with our analysis of the stratification of the Mexican system, suggests that PhD graduates are not moving to the peripheral areas of Mexico. The market, however, is operating quite efficiently in identifying talented PhDs, who are hired internally. This also suggests a negative effect of mobility, as internal hiring is associated with higher research performance. The negative association between early-career mobility and performance is not surprising. Past literature on academic inbreeding 17 finds that its relationship with individual performance is ambiguous and depends on the sample, country, era, field, career stages, and measures used (Gorelova and Yudkevich, 2015; Capponi and Frenken, 2021). For example, Cruz-Castro and Sanz-Menéndez (2010), controlling for the type of mobility experienced by researchers, find that in Spain inbred scholars obtain tenure earlier and with a higher number of publications than mobile colleagues. By contrast, Horta et al. (2010), using Mexican data, find a negative relationship between inbreeding and the number of publications. However, our work differs from theirs in many respects. First, we use a more nuanced measure of performance, the NSR rating, which includes local knowledge production and the quality evaluation of independent scholars specialized in the fields. Second, we give a narrower definition of mobility, focusing on the transition from the PhD to the first job and excluding international mobility.
The positive relation between inbreeding and performance in the early career is not surprising in Mexico, where inbreeding is quite common. The Mexican university system is highly stratified and geographically centralized. The centralization of science and research is a well-known problem in Mexico, which is changing slowly through investment in research infrastructure in peripheral areas (Lopez-Olmedo et al., 2017). However, our results suggest that science policies in Mexico should continue working on increasing the flows of specialized human capital to aid the development of regional capabilities. Seen through this lens, our results might indicate that inbred scholars specialize in areas germane to their faculty. Such specialization might be beneficial for the career of individuals but not necessarily good for the system as a whole.
More generally, Oyer (2006) highlights that the dynamics of the job market might impact careers in the long run. Possible mechanisms are university-specific capabilities and norms, co-worker behavior, and university turnover. The latter in particular implies that the time of graduation and information about the job market might give individuals an advantage or disadvantage depending on the job market conditions. In a system with few resources and low mobility, two main mechanisms might explain why those internally hired are more successful than the others. On the one hand, resource constraints make the job market less predictable, because available positions at universities might be subject to economic fluctuations and budget cuts. On the other hand, low mobility levels make information about universities and their job market sticky and localized. In such a case, those able to secure their initial positions at universities will have better information about the job dynamics, as well as about norms and routines that might help them to become more germane to their institutions compared to those trained elsewhere.
We should note a limitation of our analysis: our work considers Mexico as a closed system. We do not have data on foreign PhD graduates returning to the country. However, given the ties between Mexico and North America, it is likely that many Mexicans trained abroad return to their country. For example, Finn (2010) estimates that only 40% of Mexican PhD graduates trained in the US are hired by a US university. 18 Along the same lines, Rivero and Peña (2020) show that repatriation policies have contributed to keeping the rate of return of Mexican researchers at 60% and 83% from Europe and the US, respectively. This suggests that a large proportion of PhD graduates return to Mexico. This sub-sample of foreign-trained PhDs might behave differently, following different career paths. However, despite this limitation, we concentrate on the Mexican university system, and we consider our findings relevant for science policy in emerging contexts.

Conclusion
This work is the first to study how prestige stratification affects the scientific performance of early-career scientists in the Mexican higher education system. The majority of comparable analyses have looked at university systems in developed economies, especially in North America, where mobility after the PhD tends to be high and systems are more integrated. In the U.S., for example, there are often hiring practices that prevent universities from hiring their own graduates immediately after the PhD. Studying these mechanisms in less mature settings with tighter resource constraints has policy implications, because the gap between prestige and performance can be larger than in other settings.
Our findings in general suggest that there is a negative relation between mobility during the early career and academic performance. Moreover, when we decompose mobility by looking at prestige differentials between the PhD and first job institutions, we find that scholars who Stay or move Down the hierarchy remain mostly in first-tier (top 10) institutions. 19 The results of our matched-pair analysis provide evidence of the same association between prestige movements and performance in the short, medium and long run. Further, comparing those moving up with those moving down the hierarchy, we find that those moving down have sustained higher performance than those moving up.
The reasons why promising scholars experiencing upward prestige mobility have lower performance than their colleagues (with the same PhD) who stay or move downwards require further investigation, as highlighted in the previous section. Similarly to the higher education system in the U.S. and other developed economies, we find a large stratification in the Mexican university system but low mobility (around 50% of PhD graduates are hired by their faculty), with a few prestigious institutions (around 10) producing the majority of PhD graduates, who are subsequently mostly hired in these same institutions. A high concentration of prominent scholars in a few academic institutions reveals large inequalities in the distribution of prestige. On the one hand, the stratification of higher education could promote higher levels of specialization with a targeted allocation of resources. On the other hand, it can also reveal a structural problem in the science system, a "lock-in", where researchers are trained and hired by elite institutions and flows of knowledge are reduced throughout the national science system. In the case of Mexico, a structural "lock-in" could be additionally reinforced by the negative association between mobility (upward prestige mobility in particular) and performance.

Levels of Rewards linked to the NSR ratings
- SNI Candidates 20: Granted for 3 years, with the possibility of a 2-year extension.
- Level I: Granted for 3 years the first time, and every 4 years in the following periods.
- Level II: Granted for 4 years the first time, and every 5 years in the following periods.
- Level III: Granted for 5 years the first and second time, and every 10 years in the following periods.
- Emeritus Professors: Candidates must be 65 years or older and have accumulated at least three periods of level III distinction (15 years) without interruption.

NSR Rating and Evaluation
We detail here the NSR rating process. The NSR rating is a peer-review evaluation of research performance. Scholars submit their CVs with their scientific production. The evaluation of the CVs considers primarily the research output and linkages with industry and the public sector and, to a lesser extent, the human capital formation of research groups 21 . The research output includes: Scientific Articles, Books, Book Chapters, Patents, Technological Developments, Innovations and Transfers of Technology. These research products comprise not only publications indexed in Web of Science or Scopus but also local research in Spanish that is relevant for the country.
The "Evaluation Commission" reviews the quantity and the quality of the aforementioned research products in each CV (application). All members of the commission participate in the review of applications, but the evaluation of each application is the direct responsibility of two members of the commission. The Evaluation Commission is composed of a heterogeneous group of 14 prominent scholars from different institutions, changed every 3 years 22 . The rotation of the members and the evaluation process are overseen by the "Council of Approval" and several CONACYT authorities. Their job is to eradicate any personal bias and discrimination to ensure a meritocratic evaluation. 23 There are multiple commissions for each of the corresponding research areas described in Table 7. The commission evaluates researchers and assigns each to one level of research performance. The levels of research performance (NSR ratings) are 'SNI Candidate', 'Level I', 'Level II' and 'Level III' (ordered from low to high performance). There is a special category called 'Emeritus', which we exclude from our analysis because this recognition is uncommon. An Emeritus recognition is an honorary recognition awarded to professors at the end of their careers. Every NSR rating has an associated economic reward that increases linearly with the rating level.

20 Applicants can only receive this distinction once.
21 See "Glosario de Términos" for more details. Available at https://www.conacyt.gob.mx/images/SNI/GLOSARIO_DE_TERMINOS_BASICOS_Y_RECOMENDACIONES_SNI.pdf (last accessed July 2021).
22 See "Miembros de Comisiones" for more details. Available at https://www.conacyt.gob.mx/Miembrosde-comisiones.htm (last accessed July 2021).
23 See "Lineamientos para el funcionamiento de las Comisiones Dictaminadoras" for more details. Available at https://www.conacyt.gob.mx/PDF/sni/lineamientos-comisiones-dictaminadoras-y-transversales-sept-2019.pdf.
Belonging to the NSR implies high recognition of the quality and academic prestige of the researcher, the result of a scientific production of considerable importance at the national level and, in some cases, also at the international level (Reyes and Suriñach, 2013). Thus, our dependent variable, NSR ratings, is not only a productivity measure; it measures research performance in a broader sense. In contrast with bibliometric measures, our measure of research performance accounts for the 'standard' of quality in the national context within each discipline.

The solid curves are CEDFs of the proportion of pairs in which R_Down > R_Stay. Dotted curves are CEDFs for R_Stay > R_Down. Pairs are matched by gender, age, discipline, graduation year, and the same PhD university (left) or the same first-job university (right). From top to bottom: short-run (up to 2 years), medium-run (3-5 years), and long-run (6-25 years) after PhD graduation.