Potential of European universities as Marie Curie grantee hosts

This study investigates the potential of European universities as hosts for Marie Skłodowska-Curie Actions (MSCA) grantees. Factors explaining both the probability of a university hosting an MSCA grantee and its extent are estimated using a zero-inflated negative binomial regression model. Results reveal that the probability of hosting MSCA grantees increases significantly with excellence (research performance), size and country group of the university. In addition, a deepening of excellence (citations), international orientation and the teaching burden (student-staff ratio) are significant predictors for the extent of grantees. Based on the estimates, the relative performance of a university is identified by use of a frontier production function. This reveals that some universities in the Northwest of Europe host more MSCA grantees than would have been expected given their attributes, and certain top universities host fewer. These results could be related to marketing and support activities that partially offset the importance of research performance or alternative models for financing.


Introduction
The Marie Skłodowska-Curie Action (MSCA) is a major grant established by the European Commission to support excellent research and knowledge transfer within its domain (European Commission 2018). Despite rising competition for academic talent (Edler et al. 2011; Wildavsky 2012; Stephan et al. 2015a) and presumptive benefits arising from knowledge transfer and cooperation (Ackers 2005, 2008; Auriol et al. 2013; Flanagan 2015), little research has examined the role of individual universities (see Reiner et al. 2017 for an exception). Availability of data on the MSCA grants allows this gap to be partly addressed.
The aim of this study is to advance the empirical insights into the potential of European universities as hosts for the Marie Skłodowska-Curie Action grantees, by using the information from the Cordis H2020 and the Times Higher Education Ranking (THE) databases. There are 390 European universities listed in the 2016/2017 ranking, of which 203 hosted MSCA fellows. Particular focus is put on the role of excellence (research performance and citations). A zero-inflated negative binomial (ZINB) regression model is used to investigate if and to what extent universities host MSCA grantees. Based on these estimates, a frontier production function approach is used to calculate the hypothetical capacity of the university as a grantee host.
The MSCA programme was established in 1996 and significantly expanded in the EU Horizon 2020 programme (€6 billion for the period 2014-2020). It aims to promote cross-border mobility, development and training of researchers at all stages of their careers (European Commission 2018) and to enable research-oriented organisations (universities, research centres and firms) to host talented foreign researchers. According to the founder, MSCA fellowships are among the most competitive and prestigious awards in Europe, aimed at supporting the best and the most promising scientists (European Commission 2018). There are indications that the MSCA fellows (grantees) perform better on average both before and after the grant (Jonkers et al. 2018).
To date, universities in the northwest of Europe are hosting the largest number of MSCA grantees, with the UK alone accommodating almost a third of the 5000 scholars awarded the grant between the years 2014 and 2017 (source: Cordis H2020 database). The University of Oxford, the University of Cambridge, Imperial College London and University College London (UCL) host the largest number of fellows. Danish and Irish universities welcome a disproportionately large number of grantees given their sizes, while the opposite situation is found for German and Italian universities. Universities in southern and eastern Europe are seldom MSCA grantee hosts, but a large proportion of the grantees receive their PhDs in these countries (Jonkers et al. 2018).
A main contribution of this study is the perspective of the analysis: the institution rather than the individual. Another novelty is the assessment of the actual MSCA performance of each university in relation to its potential and in comparison with other universities. In contrast to previous studies, which benchmark smaller groups of universities in terms of quality of research and teaching, this analysis is based on a broader set of features for a large representative group of European universities.
This study is structured as follows: the 'Conceptual background' section outlines the theoretical basis; the 'Empirical approach' section describes the approach; and the 'Data' section introduces the dataset and the descriptive statistics. The results are presented and discussed in the 'Empirical results' section and the 'Conclusions' section concludes.

Conceptual background
Even though benchmarking of universities is common (the already mentioned THE, for instance), less is known in a large international setting about their roles as hosts of specific grantees or research fellows. Cattaneo et al. (2017) investigate how Italian universities compete for students using a competing destinations model and find that the size of the university and internationalisation are of high importance. Several other studies analyse the overall research or teaching performance of universities (for instance, Worthington and Lee 2008; Thanassoulis et al. 2011). Typical output indicators in these studies are the number of undergraduates, graduates, postgraduates (including PhDs), industry grants or publications. Turner (2005) benchmarks universities based on their research and teaching quality as outputs and the student-staff ratio as an input. There is also a parallel literature that investigates the performance of cities as hosts of fairs and exhibitions, where the dimensions of size and internationalisation are of particular importance (Rubalcaba-Bermejo and Cuadrado-Roura 1995).
Another source that might influence the formulation of the potential as an MSCA host is the literature on mobility of students and researchers, although few explicitly model the determinants of the probability to move abroad after completion of a PhD. Exceptions to this are analyses by Cattaneo et al. (2019) and Reale et al. (2019), where the former find that the decision to move abroad depends on individual characteristics, features of the source (home) university and of the host country (R&D expenditures). The latter study demonstrates that individual characteristics and the total expenditures on R&D in the host country are important but not the quality of the source university (measured by the CWTS Leiden index).
There are also the so-called pull factors that may explain the wish for a certain host university. Such factors relate to the university itself (reputation and quality of the university, institution and its programmes, tuition costs and language, expertise of the staff and degree of innovativeness), to its surroundings (cost of living, culture, possibility of long-term employment in the host country) and to other desirable characteristics or amenities of the host city (Mazzarol and Soutar 2002; González et al. 2011; Perkins and Neumayer 2011; Van Bouwel and Veugelers 2013; Stephan et al. 2015a; Min and Falvey 2018; Bratti and Verzillo 2019).
The Marie Skłodowska-Curie Action offers individually tied scholarships following a bottom-up call for proposals where the selection criteria encompass measurement of research excellence, valuation of the quality of the research project and an assessment of the supervisor and the host institution (European Commission 2018). This means that top universities are expected to accommodate more MSCA grantees. Excellence of universities can be determined in several ways. The most common measure of research performance is the number of publications (Moed et al. 1985), usually adjusted for size of the department or university. An alternative indicator is the number of citations, which measures the relevance, dissemination and possibly also the impact of the research.
The main expectation of the study is that a bundle of factors affects the potential of the universities as MSCA grantee hosts, although universities with a relatively high number of publications and citations are envisaged to be more active in general. Another aspect that could be of importance is the possibility to offer an international orientation or collaborations, measured as the proportion of international students, international staff, international joint publications/research networks, international research grants or joint degree programmes, for instance (Gao 2018;Spencer-Oatey and Dauber 2019). The potential of universities as MSCA grantee hosts is also likely to increase with their size, a higher staff-student ratio and industry collaboration.
Since the ranking of the most experienced MSCA hosts does not fully coincide with that of the THE (Table 5 in the Appendix), there are, as discussed above, additional aspects that might be of importance for the potential, such as reputation, country group, appeal of the nearest city, language, infrastructure and cost of living. Some of these factors are difficult to measure or lie outside the domain on which the university may have an impact, but could possibly be considered in a broader description of its features. Unfortunately, detailed regional cost-of-living data across Europe are not available, but since the main stream of MSCA fellows goes to high-cost and high-wage countries, this may not be a major determinant. Language barriers could affect the potential, even if most institutions allow post-doctoral researchers to work in English. In addition, an international airport nearby would indicate the status of the infrastructure surrounding the university, and the size or kind of the city reveals something about its possible amenities and the price level.

Empirical approach
The empirical model is partly inspired by the literature on determinants of student (Van Bouwel and Veugelers 2013; Bratti and Verzillo 2019) and academic mobility (Stephan et al. 2015b; Janger and Nowotny 2016; Reale et al. 2019), although another aggregation level and perspective is used, partly motivated by destination competition (Rubalcaba-Bermejo and Cuadrado-Roura 1995; Cattaneo et al. 2019). This leads to a specification where the number of MSCA grantees a university hosts depends on a set of features:

$$\text{MSCA}_i = \beta_0 + X_i \beta + \varepsilon_i$$

Subscript $i$ denotes university, $X_i$ collects the explanatory variables, $\varepsilon_i$ reflects the error term, $\beta_0$ is the constant and ln() is the natural logarithm, which is applied to size and to the student-staff ratio. The continuous explanatory variables are all lagged 1 year, allowing for the time between application and acceptance. A list of the variables is found in Table 1 (and an in-depth explanation in the 'Data' section).

Table 1 List of variables
Research performance: Composite index on research performance (research volume, research income and reputation) (index 1-100).
Citations: Number of times university-published work is cited by scholars globally (index 1-100).
Industry income: Research income an institution earns from industry (index 1-100).
International outlook: Composite indicator of international-to-domestic-student ratio, international-to-domestic-staff ratio and international collaboration (index 1-100).
Ln (size): Number of full-time equivalent students (natural logarithm (ln)).
Ln (students/staff): Number of students per academic staff member (natural logarithm (ln)).

The dependent variable ranges from 0 to 38 grantees in 2017 (Cordis H2020), with a median of 1, a mean of 2.1 and a variance of 16.7 (Fig. 1 in the Appendix). Since the variance is much larger than the mean value, both OLS and standard count data models would lead to biased results. Instead, a count data model accounting for overdispersion is best suited for the data (Cameron and Trivedi 2013). The ZINB model allows zeros to be generated by two distinct processes: one for the probability (by logit or probit) of a university hosting an MSCA fellow and the other for the number of MSCA grantees (count data part). Thus, the probability distribution for the number of MSCA fellows $y_i$ in a given year is written as:

$$P(Y_i = y_i) = \begin{cases} \pi_i + (1 - \pi_i)\, g(y_i = 0) & \text{if } y_i = 0 \\ (1 - \pi_i)\, g(y_i) & \text{if } y_i > 0 \end{cases}$$

where $\pi_i$ denotes the probability of excess zeros (universities that host no grantees). Subsequently, the negative binomial distribution $g(y_i)$ is as follows:

$$g(y_i) = \frac{\Gamma(y_i + \alpha^{-1})}{\Gamma(\alpha^{-1})\, \Gamma(y_i + 1)} \left( \frac{1}{1 + \alpha \mu_i} \right)^{\alpha^{-1}} \left( \frac{\alpha \mu_i}{1 + \alpha \mu_i} \right)^{y_i}$$

with $\mu_i$ representing the expected mean of the non-zero response for the $i$th university, $\alpha$ the overdispersion parameter of the underlying distribution and $\Gamma(\cdot)$ the gamma function. The mean of the negative binomial component, $\mu_i$, can in turn be expressed as a function of explanatory covariates $X$ using a natural logarithm link function:

$$\ln(\mu_i) = \beta_0 + X_i \beta, \quad \text{that is,} \quad \mu_i = e^{\beta_0} e^{X_i \beta}$$

where $e^{\beta_0}$ is the intercept and $\beta$ represents the coefficients. The logistic link function for the zero-inflation part then follows from:

$$\ln\left( \frac{\pi_i}{1 - \pi_i} \right) = \gamma_0 + Z_i \gamma$$

where $Z_i$ denotes the covariates of the zero-inflation part and $\gamma_0$ and $\gamma$ the corresponding constant and coefficients.
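To make the two-part structure of the ZINB distribution concrete, the following minimal Python sketch evaluates the probability mass function numerically. It assumes the standard NB2 parameterisation with mean mu and overdispersion alpha; the function names and the parameter values are illustrative assumptions and are not estimates from this study.

```python
import math


def nb_pmf(y, mu, alpha):
    """Negative binomial (NB2) probability of count y, mean mu > 0, overdispersion alpha > 0."""
    inv_alpha = 1.0 / alpha
    log_p = (
        math.lgamma(y + inv_alpha)
        - math.lgamma(inv_alpha)
        - math.lgamma(y + 1)
        + inv_alpha * math.log(1.0 / (1.0 + alpha * mu))
        + y * math.log(alpha * mu / (1.0 + alpha * mu))
    )
    return math.exp(log_p)


def zinb_pmf(y, mu, alpha, pi):
    """Zero-inflated NB probability: excess zeros occur with probability pi."""
    base = (1.0 - pi) * nb_pmf(y, mu, alpha)
    return pi + base if y == 0 else base


# Illustrative (hypothetical) parameter values, not taken from the paper.
mu, alpha, pi = 3.0, 1.5, 0.4

# Summing far into the tail should recover total probability one,
# and the mean of the mixture should equal (1 - pi) * mu.
probs = [zinb_pmf(y, mu, alpha, pi) for y in range(2000)]
total = sum(probs)
mean = sum(y * p for y, p in enumerate(probs))
```

The zero-inflation part raises the probability of observing no grantees above what the count part alone would imply, which is why the model suits data with many non-hosting universities.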
Overdispersion appears if either π_i or α is larger than 0. A likelihood ratio test can be used to investigate whether the parameter α is significant and the Vuong test (Vuong 1989) reveals if the chosen model is more appropriate than the standard negative binomial model. Based on the estimates, the relative performance of a university may be identified by use of nonparametric methods and by stochastic frontier production functions. These methods are often employed for ranking or benchmarking, in the case of universities, for instance, with respect to their research performance, teaching quality or number of graduates (Turner 2005; Worthington and Lee 2008; Thanassoulis et al. 2011). Since the dependent count data variable has a skewed distribution, a simple deterministic frontier production function approach is employed, where the fit of the number of grantees predicted by the model is compared with actual numbers (Aigner and Chu 1968). Thus, by using this approach, the central research question is translated into efficiency terminology. Universities with the highest number of MSCA grantees given their characteristics are considered to be the most efficient ones.
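The comparison of actual with predicted grantee numbers can be sketched as follows. This is only one simple reading of a deterministic frontier, assuming that efficiency is the ratio of actual to predicted grantees rescaled so that the best-performing university defines the frontier; the exact computation used in the study may differ, and the university names and numbers are hypothetical.

```python
def frontier_efficiency(actual, predicted):
    """Score each university by actual/predicted grantees, scaled so the best ratio equals 1."""
    ratios = {name: actual[name] / predicted[name] for name in actual}
    best = max(ratios.values())
    return {name: ratio / best for name, ratio in ratios.items()}


# Hypothetical actual counts and model predictions for three universities.
actual = {"Univ A": 12, "Univ B": 5, "Univ C": 0}
predicted = {"Univ A": 6.0, "Univ B": 5.0, "Univ C": 2.5}

scores = frontier_efficiency(actual, predicted)
```

In this toy example the first university hosts twice as many grantees as predicted and therefore defines the frontier, while the others are scored relative to it.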

Data
Two major datasets are employed for the study: Cordis H2020, with information about the MSCA grantees and their host institutions, and the THE, including individual features of 390 European universities. The MSCA has four parts: host research, training and career development activities for young researchers with doctoral degrees; individual fellowships for experienced researchers (post-docs); RISE, the Research and Innovation Staff Exchange Scheme; and COFUND, co-funding of regional, national and international research programmes (European Commission 2018). In the H2020 period, the programme on individual MSCA Fellowships (MSCA-IF) for researchers moving within Europe is the largest one. The Cordis database contains information on the title of the MSCA-funded scholarship, objective (abstract), start and end dates, contribution of the European Commission, coordinating university or research institution and the name of the MSCA grant holder. Cañibano et al. (2011) suggest that academic mobility depends on the discipline, although presently there is no information about the field of the research in the proposal.
Information on individual grantees is aggregated at the host institution level. The number of MSCA-IF grants assigned during the period 2014 to 2017, based on acceptance year, ranges between 1200 and 1360 per year (source: Cordis H2020 database). In this study, the year 2017 is selected for the scholarship data, while information on university characteristics refers to 2016, to allow for the application and acceptance processes. For 2017, information on the 817 MSCA grantees can be linked to the 390 universities listed in the THE database.
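The aggregation step can be illustrated with a short Python sketch. The record layout, names and helper function are hypothetical; the sketch only assumes that each grant record names its host and that universities in the ranking without any grantee must appear with a count of zero.

```python
from collections import Counter


def grantees_per_host(grants, ranked_universities):
    """Count grantees per ranked university, keeping non-hosts with a count of zero."""
    ranked = set(ranked_universities)
    # Hosts outside the ranking (for example public research institutes) are dropped.
    counts = Counter(host for _fellow, host in grants if host in ranked)
    return {name: counts.get(name, 0) for name in ranked_universities}


# Hypothetical grant records: (fellow, host institution).
grants = [
    ("Fellow 1", "Univ A"),
    ("Fellow 2", "Univ A"),
    ("Fellow 3", "Univ B"),
    ("Fellow 4", "Institute X"),
]
ranked_universities = ["Univ A", "Univ B", "Univ C"]

hosted = grantees_per_host(grants, ranked_universities)
```

Keeping the zero counts matters here, because the non-hosting universities form the group modelled by the zero-inflation part.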
Universities with the highest number of MSCA grantees in 2014-2017 are often found in the UK, confirming that this is one of the leading scientific nations in Europe (Table 4, Appendix). The pattern does not change much over time, but the post-exit deal between the UK and the European Union may have an impact (Courtois and Veiga 2019). Data also show that German universities host a disproportionately small number of MSCA fellows given their sizes.
Excellence and other university features are available in the THE database, which includes a mix of quantitative and qualitative performance indicators on teaching (the learning environment), research (volume, income and reputation), citations (research influence), international outlook (staff, students and research) and industry income (knowledge transfer) (Marginson and Van der Wende 2007). All indicators used are measured as indexes and scaled between 1 and 100. Data are self-reported by the universities and the calculations undergo external audit. Universities are excluded from the ranking if they do not teach undergraduates or if their research output amounts to fewer than 1000 articles between 2011 and 2015 (with a minimum of 150 a year). This leaves a dataset of 390 research-oriented universities in Europe, out of between 3300 and 4000 in 2016. The THE listing, as well as the Shanghai university ranking, is commonly used to compare the performance of universities (Jöns and Hoyler 2013; Reiner et al. 2017) and in analyses of the economics of innovation (Siedschlag et al. 2013). Despite different methodologies, there are reasonable similarities between the various rankings of European universities (including the CWTS Leiden ranking; Waltman et al. 2012), particularly so for the measure of citations (Aguillo et al. 2010; Olcay and Bulu 2017). The rankings are also stable over time (Selten et al. 2019). Common criticism of the university rankings relates to non-transparent methods, coverage, bias towards research written in English and lower relevance in certain fields such as arts and humanities (Aguillo et al. 2010; Lim 2018). In this study, the THE database is used because it encompasses a broader set of features of the universities than the pure research bibliometrics, making the quantitative analysis possible and the comparisons consistent.
The ranking criteria, specifically excluding universities without research, also help to construct the control group of non-hosts, which otherwise would have been based on more arbitrary judgments.
From 2016 onwards, the THE database includes a richer set of universities, explaining why a panel data approach is not attempted. The list of MSCA grantees also includes public research institutions like the CNRS (Centre national de la recherche scientifique) in France and the Fraunhofer Institute in Germany. These institutions cannot be included in the empirical analysis as no compatible information on their research performance is available and they are also expected to operate under different conditions than universities.
The composite indicator of research performance encompasses research volume, research income and reputation. Research volume builds on the number of articles published in scientific journals indexed by the Elsevier Scopus database, per scientist, scaled for institutional size and normalised for subject. Number of publications is often used as an indicator of research performance (Jonkers and Cruz-Castro 2013). Information on research income is adjusted for the number of scientific employees, the Purchasing Power Parity (PPP) and the reputation of the university. Volume and income each account for 20% of the performance variable and the remaining 60% is allocated to reputation.
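The weighting described above can be written as a small Python sketch. It assumes that the three components are already expressed on a common index scale and are combined linearly, which is a simplification; the function name and the example scores are hypothetical, and the actual THE procedure standardises the components before weighting.

```python
def research_performance(volume, income, reputation):
    """Weighted composite: 20% volume, 20% income, remaining 60% reputation."""
    return 0.2 * volume + 0.2 * income + 0.6 * reputation


# Hypothetical component scores on a 1-100 index scale.
score = research_performance(volume=40.0, income=30.0, reputation=20.0)
```

With these illustrative inputs the composite is dominated by the reputation component, reflecting its larger weight.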
Citations are measured by the frequency with which the published work of a university is quoted by scientists worldwide. The database encompasses 23,000 scientific journals indexed by Scopus and their indexed publications between 2011 and 2015. All data are standardised to reflect differences in citation volume between disciplines. This means that institutions with extensive research activity in subjects with traditionally high citation numbers do not receive any unjustified advantages.
The international outlook (staff, students, research) indicator measures the ability of a university to attract students, postgraduates and lecturers from all over the world. This composite indicator also includes the proportion of research articles by the university with at least one international co-author. Industrial income is an alternative variable for knowledge transfer, capturing the revenue an institution generates from industry, adjusted for PPPs and weighted by the number of scientific employees.
Descriptive statistics reveal that the average number of MSCA fellows in 2017 is two and that the standardised number of publications and citations in the year before are 27 and 59 on average, respectively (Table 2). There are 22 students per university academic employee (natural logarithm value is 3.0) and the average number of students is 16,320 (ln value is 9.8). The research performance and number of citations of universities with at least one MSCA grantee are markedly higher than of those without grantees. MSCA grantee host universities are also more international and slightly larger.

Empirical results
Just as for individual flows of students and researchers, the marginal effects of the zero-inflated negative binomial model reveal that excellence (measured as research performance) is important for the potential of universities as MSCA grantee hosts, as are their size (number of students) and country group (Table 3 part (i)). The extent of grantees is related to similar factors, but also to a deepening of the excellence (number of citations), to the student-staff ratio and to the international outlook (Table 3 part (ii)). A larger number of students relative to the academic employees is associated with fewer MSCA grantees. This implies that the potential may improve with a lower teaching burden.
Non-significant variables are excluded in the final specification of the model. The Vuong test shows that the negative binomial model is rejected in favour of the zero-inflated negative binomial model (Table 3). The likelihood ratio test of the overdispersion parameter alpha (and whether it is significantly different from zero) rejects the zero-inflated Poisson model at the 1% level.
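The logic of the likelihood ratio test for the overdispersion parameter can be sketched in Python. The log-likelihood values below are hypothetical placeholders and not results from this study; the sketch assumes one restricted parameter, so the statistic is compared with a chi-square distribution with one degree of freedom, and it also reports the halved p-value that is often used because alpha = 0 lies on the boundary of the parameter space.

```python
import math


def lr_test_alpha(loglik_restricted, loglik_full):
    """Likelihood ratio test of alpha = 0 (e.g. ZIP as restricted model, ZINB as full model)."""
    stat = max(2.0 * (loglik_full - loglik_restricted), 0.0)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_chi2 = math.erfc(math.sqrt(stat / 2.0))
    # Boundary-corrected p-value, valid for stat > 0.
    p_boundary = 0.5 * p_chi2
    return stat, p_chi2, p_boundary


# Hypothetical log-likelihoods for the restricted and the full model.
stat, p_chi2, p_boundary = lr_test_alpha(-810.0, -800.0)
```

A small p-value indicates that the restriction alpha = 0 is rejected, which corresponds to preferring the negative binomial over the Poisson count part.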
Based on the predicted probabilities from the logit estimation, the extent to which the research performance needs to be increased in order to host at least one MSCA grantee may be calculated, given the size of the university. These calculations show that the research performance must increase by a factor of 2.1 (from 18 to 39) to attain the probability of hosting at least one MSCA grantee. This means that non-hosting universities such as the University of Trieste, the University of Valencia and Montpellier University, all with a research performance …
Since the variables are scaled differently, the effect of a one standard deviation change for each of them is calculated. Accordingly, a one standard deviation increase in the research performance variable leads to a rise in the number of MSCA fellows by 1.0 (0.061 × 17.4) and the corresponding relationship for citations is 0.70 (0.030 × 24.2). A similar exercise for size of the university is associated with an increase of 0.9, and a reduction in the student-staff ratio by one standard deviation is related to 0.6 more grantees. Thus, these calculations reveal that excellence of the university, measured as research performance, is the most important factor for the potential as an MSCA grantee host. The findings on the importance of research performance coincide with the parallel literature on student mobility (Van Bouwel and Veugelers 2013; Bratti and Verzillo 2019) and mobility of researchers (Janger and Nowotny 2016), where excellence is also of high importance.
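The standardised effects follow from multiplying each marginal effect by the standard deviation of its variable. The short Python sketch below reproduces this arithmetic using only the two pairs of numbers quoted in the text; the variable names are illustrative.

```python
# (marginal effect, standard deviation) as quoted in the text.
effects = {
    "research performance": (0.061, 17.4),
    "citations": (0.030, 24.2),
}

# Change in the number of MSCA fellows for a one standard deviation increase.
scaled = {name: round(me * sd, 2) for name, (me, sd) in effects.items()}
```

The products are about 1.06 and 0.73, in line with the rounded values of 1.0 and 0.70 given in the text.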
Based on the estimations of the ZINB model, the predicted number of MSCA grantees is calculated. This makes it possible to benchmark the actual and the potential performance of universities. University of Copenhagen is used as the benchmark because it hosts the largest number of MSCA fellows. The comparison reveals that a group of universities, mainly in the Northwest of Europe (University of Copenhagen, KU Leuven, University of Birmingham, University of Leeds, Aarhus University, University of Bristol, Eindhoven University of Technology, University of Antwerp, Aston University, University of Oslo, University of Warwick and Paris-Sorbonne University), but also Ca' Foscari University in Venice, Pompeu Fabra University and University of the Basque Country, accommodate far more fellows than would have been expected given their attributes, while Imperial College London, University College London, ETH Zürich and LMU Munich, for instance, host fewer (Table 5 in the Appendix). The results are in line with Rubalcaba-Bermejo and Cuadrado-Roura (1995), who find that cities (in their case) can perform well as hosts of events even if they are not the largest or the most international ones, depending on the trade-off among the factors of importance. One explanation behind a lower than potential performance by a top-ranked university could be that there are other person-bound grants available that reduce the pressure to attract MSCA funding and fellows (Jonkers et al. 2018). Garland (2020), for instance, concludes that United Kingdom universities established pre-1992 generally have a diversified financing portfolio and thus are not overly vulnerable to changes in the external environment. Alternatively, owing to the availability of staff, space or other reasons, the university may decide not to use its full potential.
Another factor that could shed light on the relative performance is the way in which universities market themselves and the MSCA scholarships or embed them in their regular activities. Although this information is difficult to obtain, there are indications that certain institutions, for instance KU Leuven and University of Copenhagen, are particularly keen on promoting MSCA grants and also offer assistance in the application process by organising master classes for prospective grantees. Ca' Foscari in Venice uses an unconventional approach and offers to prolong the MSCA scholarship with a third year financed by the university. Vidal et al. (2015) conclude that assistance of grant managers in the MSCA application process leads to a higher success rate for research institutions.
Several robustness checks are conducted. First, to rule out possible multicollinearity between the independent variables, their bivariate correlations are investigated (Table 6 in the Appendix). These are low or in the medium range. The variance inflation factor for the research and size variables is 1.0 and that for research and citations is 1.2, indicating that multicollinearity is not present. Second, results for simpler count data models such as the negative binomial (NB) model are reported (Table 3 part (iii)). Although this model is rejected in favour of the ZINB, the signs and the significance levels of the count part are similar.
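For a pair of regressors, the variance inflation factor is directly linked to their correlation. The Python sketch below assumes the two-variable case, where the factor equals 1 / (1 - r²); whether the study computes its factors pairwise in exactly this way is an assumption, and the function names are illustrative.

```python
import math


def vif_from_correlation(r):
    """Variance inflation factor implied by a bivariate correlation r."""
    return 1.0 / (1.0 - r * r)


def correlation_from_vif(vif):
    """Absolute bivariate correlation implied by a pairwise variance inflation factor."""
    return math.sqrt(1.0 - 1.0 / vif)


# A factor of 1.2, as reported for research and citations, implies a moderate correlation.
implied_r = correlation_from_vif(1.2)
```

Under this assumption a factor of 1.2 corresponds to a correlation of roughly 0.4, which is consistent with the description of the correlations as low or in the medium range.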
Third, the specification is extended and re-estimated including variables for capital city, infrastructure (airport nearby) and presence of a world heritage site. Information about capital cities originates from Wikipedia, data on commercial airports are available in the Eurostat Avia database and the world heritage sites are listed by UNESCO. These variables are meant to control for university features in the broader sense (accessibility, local price level and possible supply of amenities or culture), although they are not strong (significant) enough to offset the importance of excellence. However, it is possible that a combination of one of these, together with unmeasurable factors such as marketing activities, would be of importance.
Fourth, variables reflecting age of the university and a dummy for Nobel Prize winner are tested, but do not appear significant. An explanation for the latter could be that a Nobel laureate dummy partly coincides with the excellence variables or that most prize winners in recent years are based in the USA and thus have no impact on the potential of European universities as MSCA grantee hosts. Fifth, the empirical model is estimated using the number of MSCA grantees for the year 2016, to ensure that the results are not sensitive to the period chosen, which they are not. Unreported results show that the estimations and the ranking of predicted MSCA grantees are similar, except for the University of Venice, which hosts far fewer grantees in 2016.

Conclusions
The aim of this paper is to empirically investigate the potential of European universities as hosts for Marie Curie (MSCA) grantees. Besides offering a novel perspective and a large, international and representative dataset on 390 universities (Cordis H2020 and the THE), the study also advances knowledge by benchmarking the MSCA host potential of universities. A zero-inflated negative binomial regression model is used to extract estimates for this benchmarking.
Results reveal that excellence (measured as research performance) is important for hosting MSCA grantees, as are university size (number of students) and country group, in line with studies on individual flows of students, researchers and investments in R&D. The extent of grantees is related to similar factors, but also to a deepening of excellence (citations), international orientation and the student-staff ratio. Among the aspects on which the university may have a direct influence, research performance and size are the most important ones for the potential as an MSCA host. Findings also indicate that universities that do not host grantees may need to improve their research performance by a factor of more than two to change this, something that is difficult to achieve in the short term.
A comparison of the actual number of MSCA fellows with the predicted number shows that some universities have prospects for improvement, while others are already performing above their potential. Availability of other kinds of grants or financing models could explain why some top-ranked universities, although already doing well, do not use their full potential. Those performing well above their potential may be more active in marketing and support activities relating to the grants and thus possibly partly offset the importance of research performance.
A lower than predicted level of grantees hosted by the absolute top universities implies that the knowledge transfer from this group could become limited. However, at the same time, this opens a window for a broader group of universities to host more grantees than their potential would suggest, possibly through careful marketing or embedded scholarships. By doing this, a rise in excellence can be initiated, since the MSCA fellows are expected to perform better than the average scholar. Future studies are needed to investigate which of these factors are the strongest. If the post-Brexit collaboration between the European Union and the UK does not include European research programmes, there might be a significant shift in intra-European flows of students, post-doctoral fellows and MSCA grantees. How this unfolds could be a subject of importance for future analyses.
A main implication of the study for prospective grantees, universities and policymakers is that traditional strengths of universities are also valid for becoming MSCA grantee hosts, but these could be offset to some extent by, for instance, support in the application process or additional tailor-made research programmes. Given that MSCA grantees are expected to perform better than average, a university that hosts more grantees than its potential might indirectly find a faster track to improved performance. However, a main shift in the research performance of universities is a long-term strategic process that likely needs public or other financial support.
A main drawback of the study is that data characteristics only allow the use of a static model. Since the number of MSCA grantees shows a high level of persistency over time, future work would benefit from dynamic modelling of the relationship between university attributes and the number of MSCA grantees. Further, the annual variations in research performance are negligible, implying that consistent data for longer periods of time are required for such studies. A further limitation is the absent information on domain (science, engineering, social sciences or humanities). Future work should try to include such perspectives in the analysis.
Source: Cordis H2020 Database, the THE and own calculations. Asterisks ***, ** and * denote significance at the 1, 5 and 10% levels. The number of observations is 390.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.