Guide to the galaxy of EU regional funds recipients: evidence from new data

This study presents a new firm- and project-level dataset containing data on over two million projects co-funded by the EU structural and cohesion funds in 25 EU member states during the programming period 2007–2013. Information on individual beneficiary firms and institutions is linked with business data of Bureau van Dijk’s ORBIS database. Moreover, text mining techniques are applied to categorise the EU cohesion policy projects into fifteen thematic categories. Stylised facts reveal substantial regional heterogeneity in the distribution of funds to certain projects and beneficiaries (with respect to their size or industry). Furthermore, regional funds distribution differs across less developed and higher-income as well as urban and rural regions. In an econometric analysis, we control for project and firm characteristics that we expect to determine the single project’s value, which is confirmed by the results. Nevertheless, there remains unexplained variation in individual project volumes, which differs systematically across countries.


in order to foster regional development and cohesion. There is a broad literature that investigates different aspects of the effectiveness of those financial contributions, e.g., in terms of increasing income growth (see Dall'Erba and Fang 2017, for a survey). Due to a lack of data, research so far has mainly focused on the implementation of the policy and its effects at an aggregated, mostly regional, level. Nevertheless, the specific design of cohesion policy programmes remains a prominent theme in the academic and public debate, especially with respect to post-2020 EU cohesion policy. Apart from the allocation of funds to thematic priorities in a region, however, the intraregional distribution of funds to specific projects and beneficiaries has been a "black box" to researchers and European policy makers so far.
In principle, the European Regional Development Fund (ERDF), the European Social Fund (ESF) and the Cohesion Fund (CF), in this study together referred to as the EU's regional funds, co-finance projects that are part of operational programmes (OP) and pursue strategic priorities like strengthening the labour market, improving social infrastructure or building better traffic networks. 1 The projects are carried out by firms, institutions or other entities and are selected to be co-financed by the particular OP's managing authority (a public or private body nominated by the member state). 2 Therefore, next to the managing authorities' responsibility for choosing suitable and promising projects, the actual beneficiaries are accountable for the single projects' success. The appropriate fulfilment of these tasks (project selection and successful implementation) likely contributes to achieving the corresponding OP's target and, finally, the overall effectiveness of a region's EU cohesion policy implementation. 3 This study presents newly collected information on individual projects co-financed by regional funds during the multi-annual financial framework (MFF) 2007-2013 and explores how the regional funds committed to European NUTS-2 regions are distributed within the regions. 4 The resulting database contains data on over two million co-funded projects in 25 EU member states. Those projects are carried out by 1,076,097 beneficiaries, which we matched with the ORBIS business database by Bureau van Dijk in order to gain information on their business characteristics.
Using these data, this paper contributes to the literature by providing a comprehensive analysis of the projects selected by national or regional managing authorities during the MFF 2007-2013. The high level of granularity of the data allows us to investigate whether the distribution of regional funds differs across European regions with respect to project, geographical and firm-level characteristics of the funds' beneficiaries. 5 Thus, the availability of these micro data increases the transparency of policy implementation. For example, it shows whether the EU's funding priorities (e.g., the focus on small and medium-sized enterprises) are mirrored in cohesion policy implementation or if more guidance would be desirable. Moreover, the findings of this paper may be interesting for EU cohesion policy evaluators, as differences in the distribution of regional funds within regions may play a role in explaining differences in EU cohesion policy effectiveness across regions.
Footnote 1: We do not consider projects co-funded by the European Agricultural Fund for Rural Development (EAFRD) and the European Maritime and Fisheries Fund (EMFF), which are also EU cohesion policy instruments.
Footnote 2: Refer to European Council (2006b) for detailed information on cohesion policy implementation.
Footnote 3: The European Commission considers various indicators, e.g., the number of jobs created or the number of direct investment aid projects to small and medium-sized enterprises, for its evaluation of cohesion policy (see http://ec.europa.eu/regional_policy/en/policy/evaluations/ec/2007-2013/#1).
Footnote 4: Since the multi-annual financial framework 2007-2013, the managing authorities of the OPs are required to report the firms and institutions which receive financial support for carrying out projects that contribute to economic and social cohesion across European regions.
First, this study investigates the intraregional distribution of regional funds along different dimensions like the thematic priorities set by the OPs' managing authorities. This descriptive analysis distinguishes between different types of regions, i.e., less developed and richer, as well as urban and rural ones. One result is that a large share of regional funds in low-income regions is allocated to Transportation Infrastructure, Environment and Innovation and Research and Technological Development (RTD) projects. In the majority of the other regions, the largest project amounts are also assigned to the latter two themes, while, in addition, there is a stronger focus on labour market projects.
Second, since this is one of the few contributions so far that is able to analyse regional fund data at the level of projects and beneficiaries in an encompassing way, we also focus on individual projects. In particular, we document substantial variation in project volumes across different funding instruments (i.e., the ERDF, ESF or CF), the objective of the OP (i.e., Convergence or Regional competitiveness and employment), the regional focus on thematic priorities, and beneficiaries' characteristics like firm size or industry. Moreover, related to the projects' themes, we find that managing authorities in urban (as compared to rural) and less developed (as compared to higher-income) regions allocate the regional funds on average to relatively large projects.
Third, after controlling for the project and beneficiary characteristics in a regression analysis, results show residual variation in project volumes that varies across countries and regions. In general, there is little literature on whether the size of (cohesion policy) grants to firms is related to the effectiveness and efficiency of a policy. Locatelli et al. (2017) indicate that the tendering of larger projects favours corruption in the public procurement process. Therefore, we argue that future research should investigate whether regional residual variation in average project values is linked to institutional settings or historically grown funding traditions in the respective country. Furthermore, the link between the latter as well as regional funds distribution patterns and the effectiveness of policy implementation should be explored. 6
The remainder of this paper is structured as follows: Chapter 2 gives an overview of the EU cohesion policy design and places our analysis into the context of existing literature. Chapter 3 describes the content of the database in detail and compares it with aggregate official data. Chapter 4 provides stylised facts regarding the distribution of total project volumes across (different types of) regions, project and firm characteristics. Chapter 5 shows the results of the econometric analysis with a focus on the individual project volume and, finally, Chapter 6 concludes.
Footnote 5: This paper provides details on the creation of the dataset that may be interesting to researchers working with similar data. One important feature in this context is the publication of the R package (fastTextR) which we use for classifying projects into fifteen themes defined by the European Commission and which we made publicly available.
Footnote 6: E.g., for the field of industrial policy, Criscuolo et al. (2012) indicate that granting funds to smaller firms yields better results than funding large firms, as, among other reasons, the latter are more likely to replace their own investments with the grants. Bachtrögler et al. (2018) show for seven EU member states that, independently of their volume, projects carried out by manufacturing firms in regions with

The design of EU cohesion policy
In the multi-annual financial framework (MFF) 2007-2013, the regional funds amounting to EUR 348,865 million were the second largest item of the EU budget. 7 The cohesion policy's objective is to increase convergence across European regions by co-financing member states' initiatives targeted at specific priorities. First, a NUTS-2 region's eligibility in principle for funding is determined according to three main objectives, namely, (1) Convergence (former Objective 1), (2) Regional Competitiveness and Employment (former Objective 2), and (3) Territorial Cooperation (former Objective 3). 8 Second, for eligible areas, the funds are allocated to operational programmes which co-finance projects that are carried out by private or public firms or organizations. The new dataset presented in this study is based on lists of these beneficiary firms, institutions, non-governmental organizations or other types of entities (in the following we refer to all types of beneficiaries by the term firm) that have to be made public since the MFF 2007-2013 (Article 7 in European Commission 2006).

The distribution of EU regional funds
The EU's cohesion policy works under the principle of shared management (refer to Articles 14 and 15 in European Commission 2006). That is why multiple European and national (as well as sub-national) institutions are involved in the allocation process of regional funds.
Footnote 6 (continued): lower GDP per capita tend to be (partly statistically significantly) more effective in increasing beneficiary firms' employment and value added growth than similar projects run by firms in richer regions within the same country.
Footnote 7: Most expenditure, i.e., EUR 412,611 million, is allocated to the "natural resources" programme, which includes agricultural subsidies.
For each MFF, the European Council prepares a document with strategic guidelines on reducing economic, social and territorial disparities. These guidelines for 2007-2013 encompass three priorities (European Council 2006a): (1) improving transport infrastructure, environmental and energy issues, (2) creating more and better jobs, and (3) a focus on knowledge transfer and innovation. Regarding the latter, special emphasis is put on supporting small and medium-sized enterprises (SME) which "often represent the highest source of employment at the regional level" (European Council 2006a, p. 19). The strategic guidelines on cohesion serve as a basis for the so-called national strategic reference framework that needs to be provided by each member state. The national strategic reference framework provides an overview of fields for intervention of cohesion policy in the particular country and undergoes a review process by the European Commission. Besides a proposal for the annual allocation of regional funds across the period, the member states need to generate a list of operational programmes (per objective and fund) for the objectives Convergence and Regional Competitiveness and Employment. Each operational programme is prepared and implemented by a managing authority, a private or public body appointed by the member state (Article 59 of European Council 2006b). In most cases (and always in the case of a national OP), it has a specified thematic target, e.g., improving regional human capital or social infrastructure. In most cases, the OPs refer to particular NUTS-2 or NUTS-1 regions; however, there are also national (NUTS-0) programmes (see Appendix A.1 and Title III in European Council 2006b). The OPs must incorporate reasons for focusing on specific priority axes proposed by the Commission (see Annex IV of European Council 2006b). 9 From these priority axes, the European Commission derives fifteen so-called (priority) themes.
In the next step, the OPs' managing authorities select appropriate projects that are carried out by firms, i.e., the beneficiaries. 10 According to Article 2 of Council Regulation (EC) No. 1083/2006, a beneficiary is defined as "an operator, body or firm, whether public or private, responsible for initiating and implementing operations". An operation is referred to as "a project or groups of projects selected by the managing authority of the operational programme [...] allowing achievement of the goals of the priority axis to which it relates" (European Council 2006b). The amount of EU co-funding for each project depends on the eligible expenditure and the designated co-financing rate. 11 The structure of the new database builds on these regulations. The dataset includes the OP to which each observation (project) is assigned, the corresponding fund, the objective as well as the theme. Thus, we are able to check the validity of our
Footnote 9: See also "ERDF/ESF/CF Priority theme overview 2007-2013" at http://ec.europa.eu/regional_policy/en/policy/evaluations/data-for-research/.
Footnote 10: Note that data on applications for cohesion policy projects are not available. Therefore, we cannot control for the self-selection of firms.
Footnote 11: Annex III of European Council (2006b) reports the ceilings for co-financing rates, i.e., the maximum percentage of eligible expenditure that is financed by a regional fund. E.g., the maximum co-financing rate for Spain amounts to 80% for the Convergence and to 50% for the Regional Competitiveness and Employment objective. The detailed regulation on the eligibility of expenditure can be found in Council Regulation (EC)
While the next section covers the literature that evaluates the effects of cohesion policy on regions or firms, there is also a literature that discusses the allocation process and the factors that potentially influence this bargaining.
Bachtler and Mendez (2007) discuss the history of the allocation process up to the beginning of the MFF 2007-2013. They narratively highlight the negotiation process between member states and the European Commission as well as the ongoing spatial concentration of the majority of funds. A more quantitative line of research (e.g., Bouvet and Dall'Erba 2010; Dellmuth 2011; Dellmuth and Stoffel 2012; Tosun 2014) finds that, indeed, factors such as the political situation, the type of governance, previous success of implementation or even the degree of Euroscepticism drive part of the allocation process.

Literature review
Most studies that analyse cohesion policy in a pan-European setting focus on the regional (in many cases NUTS-2) level (Hagen and Mohl 2009; Pienkowski and Berkowitz 2016). Most recently, Dall'Erba and Fang (2017) provide a meta-analysis of econometric studies on the evaluation of cohesion policy effects.
Generally speaking, there is no consensus in the literature regarding the outcome of cohesion policy. The effectiveness is measured, e.g., as a positive effect on gross domestic product (GDP) per capita growth (e.g., Pellegrini et al. 2013), investments per capita (Becker et al. 2013) or regional research and development activity (Ferrara et al. 2016). Most studies find a conditional positive effect of regional funds assignment (e.g., Becker et al. 2013; Cappelen et al. 2003; Ferrara et al. 2016), while others provide results that even suggest a negative impact (Breidenbach et al. 2016). In recent years, the potential reasons for heterogeneous (conditional) cohesion policy effects have gained major attention.
First, Rodríguez-Pose and Fratesi (2004) move the focus to expenditure categories of the main funding instruments. They find that investments in infrastructure or agriculture do not have sustainable effects on regional growth (see also Puga 2002; Dall'Erba and Le Gallo 2008), whereas projects that foster human capital lead to sustainable positive effects on economic cohesion. This and other studies use data on the distribution of expenditure across NUTS-2 regions and themes (e.g., Dall'Erba and Le Gallo 2007; Percoco 2013; Ferrara et al. 2016). Second, regional heterogeneity as a determinant of policy effectiveness has become a topic of interest and is often modelled by a region's capacity to take advantage of regional funds. Becker et al. (2013) indicate that human capital and institutional quality matter for the effectiveness of Objective 1 funds in terms of their effect on GDP per capita growth and investment. Institutions are confirmed as an influencing factor for the success of cohesion policy by other authors as well (e.g., Cappelen et al. 2003; Bachtler et al. 2014; Rodríguez-Pose 2013). Recently, Gagliardi and Percoco (2017) show that European cohesion policy is most effective in rural regions that are located close to cities. Third, Becker et al. (2012) take the amount of regional funds expenditure spent in a region into account (instead of treatment dummies) and conclude that there is a maximum efficient level of funds beyond which additional payments do not increase the effectiveness any further (see also Kyriacou and Roca-Sagalés 2012; Rodríguez-Pose and Garcilazo 2015).
Fourth, Becker et al. (2018) and Bachtrögler (2016) analyse the effects of structural funds (on income growth) in lagging regions over time and in the context of the economic and financial crisis starting in 2007. The latter finds that the effectiveness of cohesion policy in terms of increasing GDP per capita growth appears to decrease in the crisis compared to former periods when controlling for regional structural characteristics. Barone et al. (2016) also take the time dimension into account and show that cohesion policy effects are not persistent over time.
Finally, spatial heterogeneity and spillovers are considered to play a role for cohesion policy effectiveness (Le Gallo et al. 2011; Breidenbach et al. 2016; Maynou et al. 2016). This strand of the literature widely confirms some small positive effects on regional growth or convergence for a number of regions but no general overall effect.
The majority of studies named above are based on data at the regional or local level to study growth or convergence effects. One major drawback of this type of analysis is the potential endogeneity of structural funds. As they are especially granted to lower-income regions, cohesion policy is most likely not exogenous with respect to regional growth. Attempts to overcome this statistical problem include the use of time lags (Rodríguez-Pose and Fratesi 2004), different instruments (Dall'Erba and Le Gallo 2007, 2008), generalised method of moments (GMM) estimators (Breidenbach et al. 2016) or a computable general equilibrium approach (Horridge and Rokicki 2018). Another way to possibly identify causal effects is the use of microeconometric methods with regional or micro data.
Turning to the beneficiaries as the unit of observation, De Zwaan and Merlevede (2013) evaluate the effects of Objective 1 and Objective 2 payments in the programming period 2000-2006 in 25 EU member states on productivity and employment growth of firms in treated and non-treated regions. However, they do not use actual recipients of regional funds but compare all manufacturing firms (available in the ORBIS database) located in treated regions with the manufacturing firms in non-treated regions. Additional firm-level analyses are available for sub-national geographical units (Bernini and Pellegrini 2011) and certain types of regional funds (Hartsenko and Sauga 2012).
As we overcome this lack of data with the new database, this paper contributes to the literature by giving the first detailed insights on actual beneficiaries (and projects) of cohesion policy in 25 EU member states between 2007 and 2013. A combined analysis of the projects' theme, firm-level characteristics of corresponding beneficiaries and the size of the projects' expenditure may help to explain heterogeneous effects of regional funds allocation found in the literature and thereby lead to important policy implications. Moreover, it allows us to identify regional funds allocation patterns across regions, e.g., less developed and others as well as urban and rural regions, and countries (see Sects. 4 and 5).

Content of the dataset
The European Commission's Directorate-General for Regional and Urban Policy (DG REGIO) provides a collection of links to national or regional websites that make lists of beneficiaries available. 12 Unfortunately, the degree of detail of these lists' content as well as their structure vary significantly across countries, regions and even operational programmes. 13 Moreover, most documents are provided in national languages, different data formats and using non-standardised definitions.
Besides collecting and processing all information, we extend the data on beneficiaries by matching it with the ORBIS business database by Bureau van Dijk.
The resulting set of variables can be grouped into three blocks, namely, (1) project information, (2) funding (co-financing) information, and (3) business characteristics of the beneficiary retrieved from ORBIS. First, the project information includes the country and NUTS region in which the project is carried out according to the OP and list of beneficiaries, respectively. 14 Moreover, it covers the co-financing fund, the objective and the corresponding OP to which the project is assigned. As already noted, the dataset includes projects co-funded by the ERDF, the ESF and the CF, under the objectives of Convergence and Regional Competitiveness and Employment. Next, a project name or description, the start and end date as well as the theme of the project are specified. The theme is not reported by all managing authorities, which is why we classify the remaining projects according to available project information using supervised text classification (see the section on missing themes).
The second group of variables describes the funding structure. It contains the committed co-financing amounts by the EU (C_EU) and the national public funding (C_NAT; including co-funding of the recipient regions) as defined in the beginning of the MFF 2007-2013. 15 If the amount borne by the firm itself (ineligible cost, Inelig 16) is reported, it is added to the sum of co-funding commitments in order to calculate a total value for project i:
$$\text{Total value}_i = \text{C\_EU}_i + \text{C\_NAT}_i + \text{Inelig}_i \qquad (1)$$
Footnote 12: See http://ec.europa.eu/regional_policy/en/atlas/beneficiaries/.
Footnote 13: In Appendix A.1, we provide an overview of all OPs together with information on each one covered in our database.
Footnote 14: If that information is not evident in the managing authority's report, we consider the NUTS-2 region in which the beneficiary is located according to ORBIS, if available.
Footnote 15: Appendix A.1 reports the degree of detail in which the financial information on projects is provided in the lists of beneficiaries.
Footnote 16: Article 56 in European Council (2006b) states the definition of project expenditure that is eligible for co-funding.
In addition to the commitments, project values that were actually paid out by European (Paid_EU) or national (Paid_NAT) public funds are available for a subset of observations. If only the actually paid-out value is declared, the total project value represents the sum of EU (ERDF, ESF or CF) and national payments (including those from regional governments):
$$\text{Total value}_i = \text{Paid\_EU}_i + \text{Paid\_NAT}_i \qquad (2)$$
Furthermore, the declaration date refers to the time of reporting of the respective list of beneficiaries. In case it is not noted, we use the date of download. 17
The third information block relates to the beneficiary. This data is produced by a matching exercise (using the name of the firm and its home country) with the ORBIS business database. We are aware of several shortcomings of this database (see, e.g., Kalemli-Ozcan et al. 2015); however, it represents the most comprehensive and accessible international business database. 18 The resulting dataset contains the firm's name in ORBIS and its location, its founding year and information on the industry in which it operates (NACE Rev. 2 industry and four-digit code), the firm's number of employees and sales volume. Moreover, there is a size classification by ORBIS that is based on at least one of the following variables: the firm's number of employees, total assets, operating revenue and whether it is listed at the stock exchange. 19 Furthermore, the database contains information on whether a firm belongs to a corporate group and, if so, on the number of entities in this group. Table 1 shows all variables and their coverage in the database, i.e., the share of all observations for which each variable is available. The OP, its location, the funding instrument, the objective as well as the name of the project and the beneficiary are available for each project. Moreover, the dataset provides at least a total project value for each observation. In total, 39% of the observations could be matched with ORBIS.
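The rules for the total project value described above can be illustrated with a minimal Python sketch. The function name `total_project_value` and the dictionary keys are hypothetical and only mirror the variable names used in the text (C_EU, C_NAT, Inelig, Paid_EU, Paid_NAT). The sketch assumes that missing amounts are stored as `None` and that paid-out amounts are used only when no commitment is reported.

```python
from typing import Optional


def total_project_value(record: dict) -> Optional[float]:
    """Return the total value of one project, or None if no amount is reported.

    Assumption: `record` maps the (hypothetical) keys C_EU, C_NAT, Inelig,
    Paid_EU and Paid_NAT to a number, or to None when the amount is missing.
    """
    committed = [record.get("C_EU"), record.get("C_NAT")]
    if any(v is not None for v in committed):
        # Commitments are reported: add the ineligible cost borne by the
        # beneficiary if it is known.
        parts = committed + [record.get("Inelig")]
        return sum(v for v in parts if v is not None)
    paid = [record.get("Paid_EU"), record.get("Paid_NAT")]
    if any(v is not None for v in paid):
        # Only paid-out amounts are declared.
        return sum(v for v in paid if v is not None)
    return None


# Made-up amounts for illustration only.
example = {"C_EU": 80_000.0, "C_NAT": 15_000.0, "Inelig": 5_000.0}
print(total_project_value(example))
```

Returning `None` rather than zero for records without any amount keeps missing information distinguishable from a genuinely reported value of zero.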
To analyse whether the distribution of regional funds shows different patterns in less developed and more developed regions as well as in urban and rural regions, we need to gather further regional characteristics. Less developed regions are defined as NUTS-2 regions which are eligible for funds under the Convergence objective (former Objective 1), i.e., whose income per capita is lower than 75% of the EU-25 average (in 2000-2002) (European Council 2006b). According to the share of the population living in urban areas, DG REGIO classifies European NUTS-3 regions into predominantly urban, predominantly rural and intermediate ones (see Dijkstra and Poelman 2011). In this study, we distinguish between predominantly urban and other regions. As one can see in Table 1, due to the low coverage of the more disaggregated NUTS-3 locational information in ORBIS, this variable is known for only a third of all observations.

Missing themes
As described in Sect. 2, the projects can be categorised into fifteen themes as defined by DG REGIO. Since not all managing authorities publish these themes in their lists of beneficiaries but most of them provide a project description and a project name, we employ supervised text classification to predict the missing project themes.
In order to train the classification algorithm (classifier) we use theme labels reported by some managing authorities, augmented with manually assigned theme labels. Since some of the project descriptions are given in a language other than English, we first use the Google Cloud Translation API to translate the project descriptions and project names into English. Although we cannot quantify how many errors have been introduced during the translation process, we report the overall accuracy of the classification, where some part of the error is attributable to translation errors. The records which cannot be translated are left unchanged. Then we remove those observations where the project name and description together have fewer than 30 characters. Thus, overall we use 1,698,191 projects (82.62% of all observations), of which 588,713 are labelled projects (28.64% of all observations), to train and evaluate the classifier.
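The length filter and the separation into labelled and unlabelled projects can be sketched as follows. This is only an illustration: the record layout, the function name `usable_for_classification` and the example projects are hypothetical, and counting characters after stripping surrounding whitespace is an assumption, since the text does not specify how the 30 characters are counted.

```python
MIN_CHARS = 30  # threshold stated in the text


def usable_for_classification(name: str, description: str) -> bool:
    """Keep a project only if name and description together have at least
    MIN_CHARS characters (assumption: surrounding whitespace is ignored)."""
    return len(name.strip()) + len(description.strip()) >= MIN_CHARS


# Made-up example records; `theme` is None when no label is reported.
projects = [
    {"name": "Road upgrade", "description": "Reconstruction of regional road section", "theme": "Road"},
    {"name": "Training", "description": "", "theme": None},
    {"name": "Broadband rollout", "description": "Fibre network for rural municipalities", "theme": None},
]

kept = [p for p in projects if usable_for_classification(p["name"], p["description"])]
labelled = [p for p in kept if p["theme"] is not None]    # used for training and evaluation
unlabelled = [p for p in kept if p["theme"] is None]      # themes to be predicted
print(len(kept), len(labelled), len(unlabelled))
```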
Choosing an appropriate classifier is a non-trivial task. We find the recently published fastText 20 library (Joulin et al. 2016) to perform well on our dataset in terms of the performance metrics precision, recall and accuracy. 21 In text classification, it is often desirable not to use the words of a text directly for estimation but to first map the text into a vector space with a much lower dimension. The fastText library, which uses a neural network with a single hidden layer, can be applied both to classify texts and to learn vector representations of words.
The basic idea of the model is to proceed in two steps: first, the data is mapped into a low dimensional vector space (i.e., each sentence is mapped into a numeric vector) in such a way that similar texts have similar vector representations. Second, multinomial logistic regression is used to predict the labels.
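The two steps can be illustrated with a toy forward pass in plain Python. This is not the fastText implementation: the word vectors, the class weights and the two theme labels below are made-up numbers chosen for illustration, whereas the library learns both sets of parameters from the training data.

```python
import math

# Hypothetical 2-dimensional word vectors (learned from data in practice).
WORD_VECTORS = {
    "road": [1.0, 0.0],
    "bridge": [0.9, 0.1],
    "training": [0.0, 1.0],
    "unemployed": [0.1, 0.9],
}
# Hypothetical weights of the multinomial logistic regression, one row per theme.
THEMES = ["Road", "Labour market"]
WEIGHTS = [[2.0, -2.0], [-2.0, 2.0]]


def embed(text: str) -> list:
    """Step 1: map a text to the average of its known word vectors."""
    vectors = [WORD_VECTORS[w] for w in text.lower().split() if w in WORD_VECTORS]
    if not vectors:
        return [0.0, 0.0]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def predict_proba(text: str) -> dict:
    """Step 2: a softmax over linear scores gives one probability per theme."""
    x = embed(text)
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in WEIGHTS]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # subtract the maximum for numerical stability
    total = sum(exps)
    return {theme: e / total for theme, e in zip(THEMES, exps)}


probs = predict_proba("New road bridge")
print(max(probs, key=probs.get))
```

Because similar texts are mapped to nearby vectors in the first step, the linear model in the second step can separate themes even when two descriptions share few identical words.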
Table 1 Variables and their coverage in the database (share of all observations, in %)
Variable                                                               Coverage (%)
Project information
  Name of the beneficiary                                              100
Funding information
  Currency                                                             100
  Amount of EU support, committed                                      15
  Amount of national support (co-financing), committed                 5
  Non-eligible cost paid by beneficiary                                2
  Total project value                                                  100
  Amount of EU support, paid out                                       15
  Amount of national support (co-financing), paid out                  3
  Declaration date                                                     100
Business information (ORBIS)
  Name of the firm                                                     39
  Number of companies in corporate group                               17
Regional characteristics
  Less developed region or not (classification by NUTS-2 region)       82
  Location (NUTS-3)                                                    33
  Predominantly urban region or not (classification by NUTS-3 region)  33

For the evaluation of the classification we use the tenfold cross-validation method (Stone 1974). In k-fold cross-validation the data is randomly split into k parts, where k − 1 parts are used for training the model and the remaining part is used for model evaluation. To test the model on all available data, this process is typically repeated k times. In Fig. 1 we report the confusion matrix (a special type of contingency table). The rectangles in Fig. 1 visualise the row and column percentages of the confusion matrix. The width of the rectangles represents the row percentages and the height of the rectangles the column percentages. Therefore, the width of the rectangles of the diagonal corresponds to the recall, the height of the rectangles of the diagonal to the precision and the areas of the rectangles of the diagonal to the squared G-measure, where $\text{G-measure}_i = \sqrt{\text{precision}_i \times \text{recall}_i}$. In the confusion matrix, we see that, given the true theme Other Transport, in 105 cases the model is able to predict the true label and in seven cases the model predicts the theme Road. Overall, we obtain an average classification accuracy of 0.94. However, for completeness we note that there are duplicates in the training and test sets. Accounting for this, the average accuracy without duplicates is 0.90. In order to make the classification results easily reproducible, we assemble the R (R Core Team 2016) package fastTextR that contains an interface to the fastText library and is available at CRAN. 22
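The evaluation procedure described in this section can be sketched in Python as follows. The helper names `k_fold_indices` and `per_class_metrics` are hypothetical, the classifier itself is left out, and the 2 x 2 confusion matrix is a toy example: only the counts 105 and 7 are taken from the text, the second row is made up.

```python
import math
import random


def k_fold_indices(n: int, k: int, seed: int = 0) -> list:
    """Randomly split the indices 0..n-1 into k folds of nearly equal size."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def per_class_metrics(confusion: list) -> list:
    """Return (recall, precision, G-measure) for each class.

    confusion[i][j] counts observations with true class i predicted as class j.
    """
    n = len(confusion)
    metrics = []
    for i in range(n):
        row_total = sum(confusion[i])                       # all cases with true class i
        col_total = sum(confusion[r][i] for r in range(n))  # all cases predicted as class i
        recall = confusion[i][i] / row_total if row_total else 0.0
        precision = confusion[i][i] / col_total if col_total else 0.0
        metrics.append((recall, precision, math.sqrt(recall * precision)))
    return metrics


folds = k_fold_indices(n=20, k=10)
for held_out in folds:
    # A classifier would be trained on `train` and evaluated on `held_out` here.
    train = [i for fold in folds if fold is not held_out for i in fold]

# Toy confusion matrix (rows: true theme, columns: predicted theme).
toy = [[105, 7], [3, 85]]
for recall, precision, g_measure in per_class_metrics(toy):
    print(round(recall, 3), round(precision, 3), round(g_measure, 3))
```

Each observation is held out exactly once across the k rounds, which is why the resulting confusion matrix covers all labelled data.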

Comparison of the dataset with official data
We assess the validity of the assembled data by checking for outliers and plausibility and by comparing its dimension with official data on regional funds assignment (equivalent to C_EU_i in Eq. 1) published by DG REGIO. Table 2 shows the sum of total project values in the database (Total value_i in Eq. 1) per country and objective, excluding projects which cannot be assigned to a specific objective. The total values in the database in general do not only consist of values committed by the EU. Multiplying the total project values by the maximum co-financing rate per country and objective (see Sect. 2.1) yields the highest amount the EU should provide. If the official committed value, given in the last column of Table 2, is lower than or equal to this maximum EU co-funding, we consider the sum of total project values in our database to be plausible. This is true for the large majority of member states. For Bulgaria, lists of beneficiaries are available for less than a third of the operational programmes. The Estonian source is an online database which might not yet contain all projects. For Denmark, the gap may arise because the total project value in the database sums up paid-out rather than committed amounts.
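The plausibility criterion can be written down as a short Python sketch. The function names and the amounts are hypothetical; only the 80% ceiling (the maximum co-financing rate reported for Spain under the Convergence objective) is taken from the text.

```python
def max_eu_cofunding(total_value: float, max_rate: float) -> float:
    """Upper bound on the EU contribution implied by the total project values."""
    return total_value * max_rate


def is_plausible(total_value: float, max_rate: float, official_commitment: float) -> bool:
    """The database total is considered plausible if the official EU commitment
    does not exceed the implied upper bound."""
    return official_commitment <= max_eu_cofunding(total_value, max_rate)


# Made-up amounts in EUR million for one country and objective.
print(is_plausible(total_value=1000.0, max_rate=0.80, official_commitment=750.0))
print(is_plausible(total_value=1000.0, max_rate=0.80, official_commitment=900.0))
```

A failed check does not by itself indicate an error in the data; as in the cases of Bulgaria, Estonia and Denmark, it can reflect incomplete lists of beneficiaries or totals that are based on paid-out amounts.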
Next, we compare the distribution of regional funds among co-financing instruments and themes as reported in the database and by DG REGIO. First, 55% of the sum of total project values correspond to projects co-funded by the ERDF (unequivocal classification), while according to data by DG REGIO a similar share (58%) of structural funds and the Cohesion Fund is transferred via the ERDF. For the ESF, the share of the total project values amounts to 22%, which is exactly the same share as in official data. As several operational programmes are co-funded by the ERDF and the CF and no more detailed information is reported in the lists of beneficiaries, we are able to attribute only about 4% of total project values in the database to the Cohesion Fund. For about 20%, we cannot clearly say which of the funds is the supporting one. According to DG REGIO data, about 20% of structural and Cohesion funds commitments are settled via the Cohesion Fund, i.e., it is likely that the major part of the not uniquely assigned total values in the lists of beneficiaries can be attributed to the Cohesion Fund.
Regarding the distribution of funds and total project values across the fifteen themes (project categories), our database proves to be consistent with official data (Fig. 2). The highest project expenditure is dedicated to Innovation & RTD and Environment according to the dataset and as also reported by DG REGIO.

Stylised facts: the intraregional distribution of regional funds
In cohesion policy regulations, the EU institutions specify priority themes to be targeted by operational programmes in a programming period but do not preset any detailed requirements regarding the size or other characteristics of the projects or beneficiaries to be selected. 23 Therefore, we expect that there is no uniform strategy for distributing regional funds at the project level across European regions. The selection of projects by managing authorities is likely to depend on the accordance with an OP's underlying priority themes and its main objective, as well as the assessment of the capability of potential beneficiaries to carry out the project (with a certain volume).
One of the features of the data that has not been analysed in previous literature is the variation in average project values shown in the top right-hand side of Fig. 3. This variable ranges between EUR 15,520 per project in Marche, Italy, and EUR 57,814,068 in Ireland. While it is comparatively small in Central European countries, the average project value is relatively high in some North European regions (in the UK, Denmark or Belgium) as well as in the member states that joined the EU in 2004 and later. While there are additional reasons for the regional variation in the average total values, such as project and beneficiary characteristics, this suggests that regions with fewer projects have, on average, higher total values per project.
The upper left part of Fig. 3 presents the number of projects per region, which varies between 32 in South-East England, UK, and over 85,000 in Puglia, Italy. Regions with many projects, like Puglia, Italy, or North Rhine-Westphalia, Germany, are typically characterised by a high number of projects related to the themes Labour Market or Human Capital. Both themes are associated with relatively low project amounts. 24 By contrast, the regions in which the largest share of funds is allocated to Transportation Infrastructure projects (in Poland and Croatia) [...] intermediate beneficiaries like public institutions on a municipal level. Those bodies apply for the funds but redistribute them further or use them to carry out projects for smaller entities. In those cases, the ultimate beneficiaries are not known publicly. Finally, part of the remaining variation in the number of projects may be explained by poor data availability. In Croatia, not all projects could be assigned to a NUTS-2 region given the report by the managing authority. For Bulgaria, lists of beneficiaries are only available for two out of nine operational programmes.

The dataset consists of over two million projects granted to approximately one million individual beneficiaries. That means, on average, every beneficiary receives co-financing for two projects. Only 17% of beneficiaries carry out more than one project, only 3% have more than five and only 1% more than ten. The beneficiary with the most co-financed projects (more than 18,000) is the Spanish ICEX España Exportación e Inversiones, a governmental institution that promotes (foreign) investments in Spain. The second most (more than 11,000) are carried out in the city of Florence, Italy, the third most (more than 10,000) by the governmental training and orientation section of the region of Tuscany, Italy.
Besides project and beneficiary characteristics, one stylised fact arising from the data confirms the focus of EU cohesion policy on providing most financial support to less developed regions: one main objective of the EU's cohesion policy is to support the catching-up of those regions, i.e., NUTS-2 regions with a GDP per capita below 75% of the EU average, in economic and social terms. As one can see in Fig. 3 (lower left map), in Southern Italy, Portugal and Spain as well as Poland, Romania and Bulgaria, the Baltics or Slovenia, the total values of co-financed projects relative to the respective regional income tend to be larger than in other regions. For example, in Lithuania and Poland, project values account for up to eight percent of regional GDP. 25 By contrast, in the so-called "Blue Banana" 26 the sum of total project values lies below 0.1% of regional GDP. In those regions, projects with the highest expenditure tend to be carried out by small and medium-sized firms in the education sector, in public administration as well as in professional, scientific and technical activities. Those projects are mostly aimed at Regional Competitiveness and Employment and the themes Innovation & RTD as well as Human Capital. The sum of project values in Scandinavian, French, Northern Italian, North-Eastern Spanish regions as well as Scotland and Northern Ireland, UK, ranges between 0.1% and 0.5% of regional GDP. Project values per capita follow a similar overall distribution. The amount in most regions falls below EUR 500. The region with the highest value per capita is Bratislava, Slovakia, with EUR 12,575 per inhabitant over the course of seven years.

25 All of Estonia, Latvia, Lithuania, Poland, Romania, Bulgaria and Slovenia are eligible for Convergence (former Objective 1) funding (European Council 2006b). The region with the highest project values as a percentage of its regional GDP is Podkarpackie in Southeast Poland with 8.45% or EUR 8.2 billion. According to DG REGIO, over EUR 67 billion have been committed to Poland in 2007-2013. Bulgaria and Romania joined the cohesion policy programme 2007-2013 with a lag and therefore received little funding relative to their GDP.

26 Populous and usually rich regions that range from the UK, Belgium and the Netherlands, parts of Germany, Austria to Northern Italy.

Table 3 presents summary statistics for the average project values in less developed versus developed regions, and confirms this finding. Projects carried out by beneficiaries in less developed regions are larger on average. A t-test on the mean differences shows that they are statistically significant at the 1% level. Most of the projects there correspond to the Road, Environment but also Innovation & RTD themes and are carried out by very large firms that operate in public administration and the manufacturing industry. By contrast, large shares of regional funds in higher-income regions go to Innovation & RTD, Environment and Labour Market projects, while less money is dedicated to transportation infrastructure.

Gagliardi and Percoco (2017) indicate that the effectiveness of EU cohesion policy varies across urban and rural (NUTS-3) regions; thus, we are also interested in potential differences in the usage of regional funds in such areas. The summary statistics of project values shown in Table 3 reveal statistically significantly higher average single project values in (predominantly) urban NUTS-3 regions. The following section provides more details on the differences in regional funds distribution within urban versus within rural regions.
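A comparison of mean project values between two groups of regions, as reported in Table 3, can be sketched with a two-sample t statistic. The text does not state which variant of the t-test is used, so the following Python sketch assumes Welch's unequal-variance version; the function name `welch_t` and the sample values are hypothetical.

```python
import math
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard errors of the two sample means (sample variance / n).
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t_stat = (mean(sample_a) - mean(sample_b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t_stat, df

# Hypothetical log project values for two groups of regions.
less_developed = [11.2, 12.5, 13.1, 10.8, 12.9]
higher_income = [9.4, 10.1, 9.9, 10.6, 9.7]
t_stat, df = welch_t(less_developed, higher_income)
```

The statistic would then be compared with the critical value of the t distribution with `df` degrees of freedom at the chosen significance level.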

Type of fund, objectives and themes
The allocation of regional funds to OPs co-financed by different types of funds and under different main objectives is closely related to previous findings regarding the development status of regions and the choice of thematic priorities. 27 The total values of projects subsidised by the ERDF sum to roughly EUR 270 billion compared to EUR 107 billion from the ESF and EUR 18 billion from the CF. In addition, projects accounting for EUR 100 billion are most likely funded by the CF. Typically, the Cohesion Fund (CF), for which only regions in countries with a gross national income below 90% of the EU average are eligible, is targeted at co-financing (large) infrastructure projects and the accessibility of lagging-behind regions. A major part of the European Social Fund (ESF) is allocated to projects with relatively small project values fostering, e.g., human capital, while the ERDF co-finances a broad spectrum of project types. These funding priorities are mirrored in the volume of corresponding projects: the median CF project value amounts to about EUR 100,000 compared to EUR 37,000 for the ERDF and EUR 3200 for the ESF.
Co-financing by different funds additionally varies across types of regions (Table 4). By construction, the CF is more important in less developed regions, while the ESF accounts for only around 10% of project volumes in those poorest regions. It is interesting that the CF plays a larger role for co-funded project volumes in (predominantly) urban regions as compared to rural ones. According to the data, this may be related to the fact that there are more relatively large Road projects funded in areas with higher population density.
The overall objective of the OP that each project is part of is closely related to the development status of the regions, as, apart from several exceptions, only less developed regions are eligible for Convergence funds. While around half of the projects (by number) in our sample are aimed at Convergence, their value is three times as large as that of Regional Competitiveness and Employment projects. As expected, operational programmes in less developed regions have a focus on Convergence, while OPs in richer regions receive more to boost Regional Competitiveness and Employment.
For the appropriate targeting of regional funds, the European Commission defines fifteen priority themes (see Sect. 3.2) to classify projects. In total, the largest sums are committed to projects in categories Innovation & RTD (EUR 70 billion), Environment (EUR 65 billion) and Other SME and Business Support (EUR 46 billion), followed by Labour Market, Road and Human Capital projects. The number of observations per theme ranges from 1058 Rail projects to over 600,000 projects related to the Labour Market. There is also considerable variation of individual project values across themes. Projects related to Transportation (EUR 1-3 million), Urban and Territorial Dimension as well as Culture Heritage and Tourism (EUR 200,000 each) are the largest. Human Capital (EUR 6000), Labour Market (EUR 2200) and Energy (EUR 1250) projects are the smallest.
Additionally, the lower right part of Fig. 3 shows the theme at which the maximum sum of project values in a region is targeted. For better readability, the map shows five groups of themes instead of all fifteen. 28 In total, the largest sums in Poland and Croatia are related to Transportation projects. Energy and Environment projects are most important in Latvia, parts of the Czech Republic, two French regions, three Spanish regions and a Polish region. Contributions to Culture, Tourism and Social Infrastructure are most pronounced in some Czech regions and East Slovakia. Scotland, South Sweden, almost all of Italy and parts of Germany and France have their largest project sums related to Human Capital, the Labour Market [...]

Comparing all less developed to all higher-income regions, differences in the average distribution of total project values across priority themes become apparent (see Table 4). For example, Road accounts for 17% of the sum of project values in less developed regions and only for 1% in higher-income regions. Moreover, the share targeted at Labour Market projects in richer regions exceeds that in poorer ones by 11 percentage points.

Industrial structure and firm size
The matched data from ORBIS allow us to go into more detail with respect to the characteristics of the beneficiaries. Figure 4 shows the distribution of funds across NACE Rev. 2 industries. The left-hand side illustrates total sums while the right-hand side shows the distribution of individual project values. The matching process and the coverage in ORBIS enable us to assign an industry to one third of the observations, i.e., almost 700,000 projects.
Overall, Fig. 4 (left part) shows that projects with the highest sum of total values are carried out by firms or institutions operating in public administration, defence and social security (EUR 80 billion), transportation and manufacturing (EUR 30 billion each) and education (EUR 20 billion). The right-hand side of Fig. 4 shows that the single project volumes also vary across beneficiaries in different industries. While the median value of projects carried out by public beneficiaries (NACE Rev. 2 industry O) lies clearly below EUR 100,000, firms operating in the energy and water sector are responsible for the projects with the highest median values (over EUR 200,000). Most of the latter projects concern electricity production, air conditioning, water supply as well as waste collection and treatment.
Beneficiary firms or institutions also differ in their size. Table 4 shows the distribution of regional funds across four categories of firm size defined by ORBIS. 29 In terms of overall project sums, small firms make up the largest share (EUR 120 billion), followed by very large firms (EUR 80 billion) and medium-sized as well as large firms (EUR 50 billion each). Note that there are fifteen times more small beneficiaries of structural funds than very large ones. They are especially strongly represented in projects co-financed by the ESF. In total, 85% of recipients in the database are small or medium-sized (SME) companies, which reflects the priorities set out in the Community's strategic guidelines on cohesion. Moreover, this may indicate that the projects, which smaller and medium-sized firms submitted in order to apply for co-funding and which were selected, are smaller in terms of single project values than the projects carried out by large firms.
Interestingly, Table 4 shows that while managing authorities in regions with an income above 75% of the EU average (developed ones) select mostly projects carried out by small firms, in less developed regions the volume of projects carried out by very large companies is higher. Also, the share of project values administered by SMEs in less developed regions falls behind the share in higher-income regions by 17 percentage points. There is a similar heterogeneity when comparing projects in rural and urban regions: the share of the value of projects carried out by very large firms in urban regions exceeds the one in rural areas by more than 30 percentage points. This pattern may be driven by the fact that cities implement relatively many large projects.

29 ORBIS considers companies to be small if they could not be classified, which could possibly inflate the number of small companies due to missing data. However, labelling the observations based on the number of employees, we find a similar distribution of labels.

Focus on the single beneficiary
Given the evidence on substantial variation in the total value of single projects, we investigate the potential determinants of this variable for the projects in our database. In particular we hypothesise that the project size is closely related to the following characteristics: (1) the funding instrument (ERDF, ESF or CF), as certain types of funds typically co-finance different kinds of projects, (2) the objective of the corresponding OP (Convergence or Regional Competitiveness and Employment), which mirrors the development status of the region, (3) the managing authorities' focus on thematic priorities (themes), e.g., a project which consists of building road infrastructure will have a higher value than a specific employee's training, and (4) beneficiaries' characteristics like size or industry. Regarding firm size, our hypothesis is that, on average, larger firms are capable of carrying out larger projects than smaller firms.
To this end, we estimate the following equation:

$$\ln(TV_i) = \alpha + R_i + F_i + O_i + T_i + I_i + S_i + \varepsilon_i \qquad (4)$$

where $\ln(TV_i)$ represents the logarithm of the total project value of project $i$, $R_i$ the region in which project $i$ is located (a sort of regional fixed effect), $F_i$ the type of fund which supports project $i$, $O_i$ the objective of the funding for project $i$, $T_i$ the theme under which project $i$ is supported, $I_i$ the industry of the beneficiary and $S_i$ the size of the beneficiary of project $i$; $\alpha$ and $\varepsilon_i$ denote the constant and the error term. The variables $R_i$, $F_i$, $O_i$, $T_i$, $I_i$ and $S_i$ are factor variables, and in each case the category with the median coefficient is the excluded one (in the following tables). Thus, the resulting coefficients should be interpreted relative to the project whose coefficient lies in the middle of the (conditional) distribution of this variable, which is always shown in the results tables with a coefficient of zero. Significance levels are not shown, as they depend entirely on the chosen reference category and have no general meaning in such a context. 30 We do not include projects with total values that are zero or negative and end up with 482,040 observations for which Eq. (4) is estimated. 31

In this way, we are able to shed light on conditional differences between the volume of projects supported by different funds with different objectives and themes as well as with beneficiaries in different industries and of different size. Moreover, we are able to analyse whether unexplained residual variation in project volumes (in countries and regions) exists. We think that this could entail interesting policy implications, as Locatelli et al. (2017) show for public procurement processes that funding larger projects prepares the ground for more corruption than would be the case when more and smaller projects are tendered. Other analyses show that firm subsidies are not equally effective when granted to firms of different size (Criscuolo et al. 2012) and to firms located in different regions (Bachtrögler et al. 2018).
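The interpretation of the factor coefficients in Eq. (4) relative to the median category can be illustrated with a minimal Python sketch. Differences between dummy coefficients do not depend on the excluded category, so subtracting the median coefficient re-expresses all coefficients relative to the median category. The function names, the use of the lower median for an even number of categories and the coefficient values are assumptions for illustration. Whether the results tables report such transformed percentages is also an assumption, although exp(b) - 1 is the usual conversion in a log-linear model.

```python
import math
from statistics import median_low

def recenter_on_median(coefficients):
    """Re-express dummy coefficients relative to the median category.

    coefficients: {category: coefficient in log points}, estimated relative to
    an arbitrary excluded category (which itself carries a coefficient of 0.0).
    """
    # median_low returns an actual data value, so one category ends up at zero.
    reference = median_low(coefficients.values())
    return {category: b - reference for category, b in coefficients.items()}

def log_points_to_percent(b):
    """Percentage difference in project value implied by a log-point coefficient."""
    return (math.exp(b) - 1.0) * 100.0

# Hypothetical fund-type coefficients with the ESF as the excluded category.
raw = {"ESF": 0.0, "ERDF": 1.5, "CF": 1.9}
recentered = recenter_on_median(raw)
percent = {fund: log_points_to_percent(b) for fund, b in recentered.items()}
```

In this hypothetical example the ERDF becomes the reference category with a coefficient of zero, and the ESF coefficient of -1.5 log points translates into a project value roughly 78% below the reference.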
In order to take the heterogeneity of funding principles into account, we analyse total values of single projects not only for the complete sample but additionally estimate Eq. (4) separately, first, for projects co-funded by (1) the ERDF and the ESF (structural funds) and (2) the CF. Second, we split the sample into projects carried out by (1) public and (2) private beneficiaries. 32 Finally, in a robustness check in which we model firm size by the number of employees and the volume of sales and add firm age as a further control variable, we run regressions using sub-samples of projects carried out in (1) less developed and (2) developed regions, as well as (3) urban and (4) rural regions.

Type of fund, objective and themes
Controlling for the other variables included in Eq. (4), the projects with the smallest total value can be identified as co-funded by the ESF (left panel of Table 5). They are more than three-quarters smaller than those co-funded by the ERDF and (on average) by the CF (if the projects that cannot be assigned clearly to the ERDF or the CF are counted as CF co-funded ones). This corresponds to the goals of the various funds, as the ESF mostly funds smaller projects related to the inclusion, training and adaptability of workers in the labour market and employment, while the ERDF and the CF are aimed at improving the economic structure and fundamentals of regions (European Commission 2017). Splitting the sample by broad industries reveals that CF projects carried out by public entities are on average twice as large in terms of their project value as ESF projects. Likewise, relative to ESF project values, CF projects are (on average) largest in the case of beneficiaries in the private sector.

30 While a coefficient might be highly significant with respect to reference category A, it could be not significant with respect to category B. Thus, one might create any significance level by choosing a respective reference category. Only the coefficient keeps its meaning. Wherever there is a specific meaning to the significance level, such as for revenues or the number of employees of a beneficiary, we also report the significance levels.

31 Only observations matched with ORBIS data can be considered for the econometric exercises in this section.

32 Here, we consider beneficiaries in the NACE Rev. 2 sector "O-Public administration and defense; compulsory social security" (according to ORBIS data) as public firms, whereas all other sectors are categorised as private.
Furthermore, controlling for everything else, Table 5 shows that projects corresponding to OPs with the Convergence objective are larger than projects under the Regional Competitiveness and Employment objective (right panel of Table 5).
Turning to another aspect, large differences in conditional project values also arise with respect to the projects' themes. 42% of the largest projects in the database (with a volume above EUR 50 million) are associated with the three transport-related themes (Rail, Road, Other Transport). As also found in the unconditional analysis in Sect. 4, it might not be surprising that the projects with themes related to Labour Market, Human Capital and SME are among the projects with the lowest conditional project values, but the Energy theme would probably be expected to be among those with the highest project values (see Table 6). Regarding the latter theme, there is large variation across subsets and it turns out that especially Energy projects co-financed by the CF and carried out by private-sector (but not public-sector) beneficiaries are rather small.
In general, the largest projects co-financed across all subsets are targeted at Road and Other Transportation infrastructure (and fostering the Urban and Territorial Dimension in the case of the CF). Interestingly, relatively bigger investment projects in Rail systems appear to be co-financed by structural funds (and not the CF) and implemented by private entities (not public ones). The same is true for projects with a relatively high conditional value in the fields of Social Infrastructure, Environment and Innovation & RTD. 33 Moreover, conditional differences in project values across themes do not necessarily correspond to the differences across industries. An interesting observation is that the lowest project values can be observed for the Energy theme, but the highest conditional project values for the energy industry (Sect. 5.2). The maximum project value within the Energy theme as well as the energy industry is found for the same project which amounts to EUR 906 million. However, the Energy theme includes 7715 projects with a project value of EUR 1250 each and 64% of the projects have a value below EUR 5000. Those projects below EUR 5000 are not conducted by firms in the energy industry but are observed mainly in "Real Estate activities" (59%), "Manufacturing" (11%) and "Wholesale/retail; repair vehicles" (8%).

Industry and beneficiaries' characteristics
The highest conditional project values are attributed to beneficiaries in the Energy and Water industries (Table 7). The largest three project values of firms in the Energy industry are EUR 906 million, EUR 103 million and EUR 100 million and only 37% of the projects have a total value below EUR 100,000. Within the Wholesale industry, on the other hand, only 12% of the projects have total values larger than EUR 100,000.
As expected, firm size also plays a role for the level of the total project value (Table 8). Overall, we see that the larger the beneficiary, the larger the total value. However, the difference in project volumes is surprisingly small: the volume of a project of a very large company is not even twice that of a medium-sized company, although the average revenue of the very large companies in our sample (EUR 653 million) is approximately 130 times the average revenue of the medium-sized companies (EUR 5 million). The pattern is similar when considering the samples split according to the co-financing type of fund and the sector the beneficiary operates in.

Residual variation in single project values
Comparing the total value of single projects across NUTS-2 regions, while controlling for the type of fund, objective, theme, industry and size of the beneficiary, reveals differences of more than plus and minus 400%. The twenty top and bottom regions are presented in Table 9. The lowest conditional amounts per project can be found in Austria, Spain, Estonia, Germany and Belgium. The highest conditional values per project can be observed in the UK, the Netherlands, Finland, Malta and Luxembourg. 34 Beneficiaries which are similar in size, are in the same industry, receive money from the same fund and within the same theme, but are located in Lower Austria (NUTS-2 region AT11), have on average about 8.5 times lower project values than beneficiaries located in the East of England (NUTS-1 region UKH). Apparently, in Lower Austria, 90% of projects are smaller than EUR 10,000 and 4688 projects are smaller than EUR 1000, whereas in the East of England the smallest project is already EUR 133,168 in size.

Table 10 shows regional effects based on estimating Eq. (4) for subsets related to project and beneficiary characteristics. First, the upper panel shows the ten regions in which, conditional on the other control variables, the biggest and smallest projects co-funded by structural funds, on the one hand, and the Cohesion Fund, on the other hand, were carried out. While the results for the first case broadly reflect the regional effects identified for the complete sample, the Cohesion Fund sample only includes a limited set of countries. From the regions located in those lagging-behind countries, projects in Estonia, the Czech Republic as well as some parts of Spain are smallest. The lower panel of Table 10 indicates that there may be differences in selecting private and public beneficiaries across countries. While Austrian regions and Estonia form part of the Bottom 10 group only in the case of private firms, the smallest conditional project values for beneficiaries in the public sector are present in Spanish, one Italian and two German regions. Interestingly, next to British and Dutch regions, projects by public institutions are relatively big in size in Helsinki-Uusimaa (FI1B), Finland, and those of firms in the private sector in Malta.

Table 6 Regression coefficients with respect to theme. Coefficients of regressions based on Eq. (4). White-robust standard errors. Dependent variable: logarithm of the total project value. Other control variables (firm size, industry, region, fund type and objective) are also included but reported in other tables. Coefficients of respective dummy variable relative to the median category. "Type of fund" is excluded as control in columns 3 and 4. "NACE industry" is excluded as control in columns 5 and 6.

The estimation of Eq. (4) controls for regional fixed effects; thus, any regional influence on the remaining differences between total project values of single projects should have been removed. Figure 5 shows the average residuals of a regression similar to Eq. (4), but excluding the regional control variables ($R_i$). Hence, the figure shows the (average) unexplained part of the project value, split by country, when controlling for fund type, objective, theme, industry and firm size. It confirms that, among projects which are similar in many dimensions except the region, those with the lowest total values can be found in Austria, Estonia and Spain and the ones with the highest values in Luxembourg, the Netherlands, Denmark and the UK.
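The aggregation behind a figure of this kind, i.e., averaging regression residuals by country, can be sketched in a few lines of Python. The function name, the record layout and the numbers are hypothetical; the sketch only assumes that observed and fitted log project values are available for each project.

```python
from collections import defaultdict
from statistics import mean

def mean_residual_by_country(records):
    """Average residual (observed minus fitted log project value) per country.

    records: iterable of (country, observed_log_value, fitted_log_value).
    """
    grouped = defaultdict(list)
    for country, observed, fitted in records:
        grouped[country].append(observed - fitted)
    return {country: mean(residuals) for country, residuals in grouped.items()}

# Hypothetical projects; fitted values would come from the regression without R_i.
records = [
    ("AT", 8.0, 9.0),
    ("AT", 7.0, 9.0),
    ("LU", 13.0, 12.0),
]
average_residuals = mean_residual_by_country(records)
```

A negative average indicates that projects in a country are smaller than the model predicts given their fund type, objective, theme, industry and firm size, while a positive average indicates larger projects.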
Due to the findings in Sect. 4 and as Luxembourg, the Netherlands, the UK and Denmark are relatively densely populated, one could suggest that population density determines that finding. However, the relationship is not that clear. The residual variation for Finland, which is the least densely populated country in Europe, is also relatively high, while Germany, which is relatively densely populated, shows a downward bias with regard to conditional average project volumes. Referring to Locatelli et al. (2017), the countries with the largest residuals are characterised by relatively low levels of corruption and good institutions at the regional level (Charron et al. 2015). Therefore, we suspect that the variation may be determined by historically grown funding strategies and further national institutional settings. Verifying this assumption remains for future research.

Table 9 Regression coefficients with respect to NUTS-2 region. Coefficients of regressions based on Eq. (4) (482,040 degrees of freedom, adj. $R^2$ of 0.49). White-robust standard errors. Dependent variable: logarithm of the total project value. Other control variables (firm size, industry, fund type and objective, theme) are also included but reported in subsequent tables. Coefficients are relative to the median NUTS-2 region, which is FR25.


Conclusion
The novel database introduced in this study contains detailed information on over two million projects co-financed by the European Regional Development Fund, the European Social Fund and the Cohesion Fund in 25 EU member states in the programming period 2007-2013. In addition to project information such as the total value of each project and a project category (theme), the beneficiaries are matched with the ORBIS business database. This study shows that there are different patterns in the intraregional funds distribution across and within countries, both in terms of project and beneficiary characteristics. Moreover, the analysis points to the fact that managing authorities select different kinds of projects, with significantly different single project volumes, in urban and rural regions. The same turns out to be true for less developed and other NUTS-2 regions, which seems to be linked to the priorities and regulations of certain types of funds, e.g., the CF, and main objectives.
In addition, we find that most regional funds are dedicated to transportation infrastructure in less developed regions, whereas in higher-income regions a larger focus is put on fostering Labour market and Social inclusion. In all regions, Innovation and RTD as well as Environment projects form a large share of the sum of project values. Regarding beneficiaries' characteristics, the largest share (on average around 40%) of the (unconditional) sum is allocated to projects carried out by small firms, although this does not hold true for less developed and urban regions.
In the econometric analysis we test for the importance of certain project and beneficiary characteristics in determining a single project's size. The largest single projects in terms of their total value are co-funded by the ERDF and the CF (as compared to the ESF), and under the Convergence objective (as compared to Regional Competitiveness and Employment). In line with the priorities of the different funding instruments, the largest projects are transportation infrastructure projects (Road, Rail and Other Transport). Regarding the beneficiary, larger firms (with higher revenues and more employees) carry out projects with higher total value; however, the average single project value of (very) large firms is only about twice as high as that of small entities. Having controlled for all characteristics, some variation in single project values remains unexplained, and this residual variation appears to differ across countries. From that we conclude that national institutional settings or traditional funding procedures may play a role.
We contribute to the academic and political debate by making visible a dimension of EU cohesion policy implementation that has not received much attention until now. The possibility to compare individual projects' and beneficiaries' characteristics across heterogeneous regions and countries opens a new strand of research questions, e.g., whether projects in a region are carried out by firms located in the same region and what determines this. Moreover, the data could feed into dynamic stochastic general equilibrium or other forecasting models that simulate potential policy outcomes under different scenarios. Finally, the analysis of this dataset may yield interesting conclusions on more or less effective ways of distributing regional funds within European regions. In this respect, we also think that future research could explore how the considerable residual variation in the size of projects found in this paper is related to institutional settings in the respective countries as well as to the efficiency and effectiveness of EU cohesion policy.

[...] each OP, i.e., the reporting date of the list of beneficiaries or, if not provided by the managing authority, the date when we downloaded the list. Finally, Table 11 indicates the degree of detail of reported project sums: from only committed co-financing values (C) to EU and national public co-funding and private ineligible expenditure data (C_EU + C_NAT + I), and whether we know the value actually paid out.

Robustness check
For a smaller sample of projects, additional information from ORBIS is available. Table 12 presents the regression results including these additional data, i.e., from estimating an extended version of Eq. (4) in which U_i is the sales volume of the beneficiary of project i, E_i the number of employees of the beneficiary (sales and number of employees correspond to the last available observation in ORBIS) and Y_i the founding year of the beneficiary (in five brackets: before 1950, between 1950 and 1980, between 1980 and 2000, between 2000 and 2010 and after 2010). The regression contains all control variables included in Eq. (4), apart from the size of the beneficiary S_i, as this ORBIS variable depends on the number of employees of the beneficiary E_i and its revenue U_i. In order to exploit the variation in the relationships between project and beneficiary characteristics across regions, the regressions are run not only for the whole sample for which all of these variables are available, but also for subsamples: (1) rural versus (2) urban regions as well as (3) less developed versus (4) developed regions (those not classified as less developed by the 75% threshold applied to regional GDP relative to the EU average). In these estimations, in contrast to the previous ones, country-fixed effects are included instead of region-fixed effects.
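The extended specification described here could be written along the following lines. This is only a sketch under stated assumptions: the symbols V_i (total project value), X_i (the remaining Eq. (4) controls without S_i), the fixed-effect term and all coefficient names are hypothetical notation introduced for illustration, and the logarithms on U_i and E_i are inferred from the "(log)" labels of the corresponding Table 12 rows. Only U_i, E_i, Y_i and S_i are taken from the text.

```latex
% Sketch only: V_i, X_i, \mu_{c(i)} and the coefficient names are assumed notation.
\ln V_i \;=\; \alpha
  \;+\; \beta_U \ln U_i
  \;+\; \beta_E \ln E_i
  \;+\; \sum_{k} \gamma_k \,\mathbf{1}\!\left[\,Y_i \in \text{bracket } k\,\right]
  \;+\; X_i'\delta
  \;+\; \mu_{c(i)}
  \;+\; \varepsilon_i
```

Here the sum runs over the founding-year brackets other than an omitted reference bracket, and \mu_{c(i)} stands for the fixed effect of the country in which project i is located.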
Estimation results confirm that the size of the beneficiary matters for the value of a single project. Controlling for all other variables, higher revenues and more employees are associated with significantly higher single project values. Since the number of employees and the project value both enter in logarithms, the coefficients can be read as elasticities: when the number of employees of a beneficiary firm increases by 1%, the value of the project it carries out is about 0.12% larger. Splitting the sample by location, this figure amounts to 0.14% in rural and 0.17% in urban regions. A similar magnitude is found in less developed regions. However, when considering only developed regions, a one-percent rise in employment is associated with only a slightly higher total project value (by 0.02%).
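Read as log-log elasticities, the employee coefficients in Table 12 translate into percentage changes in project value as in the following minimal sketch. The function name and the sample labels attached to the five coefficients are assumptions made for illustration; the coefficient values are those in the "No. of employees (log)" row.

```python
def pct_change_in_value(beta, pct_change_in_regressor=1.0):
    """Percentage change in y implied by a log-log coefficient beta
    when the regressor changes by the given percentage."""
    ratio = (1.0 + pct_change_in_regressor / 100.0) ** beta
    return (ratio - 1.0) * 100.0


# Employee coefficients from Table 12; the sample labels are assumed
# from the order in which the text discusses the subsamples.
employee_elasticities = {
    "whole sample": 0.12,
    "rural": 0.14,
    "urban": 0.17,
    "less developed": 0.15,
    "developed": 0.02,
}

# Implied percentage change in project value for a 1% rise in employees.
implied_for_one_percent = {
    sample: pct_change_in_value(beta)
    for sample, beta in employee_elasticities.items()
}

# Implied percentage change when the number of employees doubles (+100%).
implied_for_doubling = {
    sample: pct_change_in_value(beta, 100.0)
    for sample, beta in employee_elasticities.items()
}
```

For small changes the implied percentage is close to the coefficient itself, which is why a coefficient of 0.12 corresponds to roughly 0.12% per 1% more employees; for large changes such as a doubling, the exact formula matters.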
Furthermore, the age of the firm or institution receiving funds matters. Especially in less developed (as compared to other) and rural (as compared to urban) regions, younger beneficiaries which were founded after 2000 and after 2010 have projects with significantly higher total values than companies which were founded before that. Beneficiaries incorporated before 1980 carry out projects with the smallest total values.
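The five founding-year brackets used for the age variable could be assigned as in the sketch below. The text does not say to which bracket the boundary years 1950, 1980, 2000 and 2010 belong, so the cut-offs here are an assumption, and the function name and labels are hypothetical.

```python
def founding_bracket(year):
    """Map a founding year to one of five brackets.

    Boundary handling is an assumption: each boundary year opens the
    next bracket, except 2010, which is kept in the 2000-2010 bracket.
    """
    if year < 1950:
        return "before 1950"
    if year < 1980:
        return "1950-1980"
    if year < 2000:
        return "1980-2000"
    if year <= 2010:
        return "2000-2010"
    return "after 2010"


# Illustrative founding years, not taken from the dataset.
examples = {year: founding_bracket(year) for year in (1923, 1965, 1999, 2005, 2012)}
```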
Single projects' values in various industries (NACE Rev. 2 sectors) also differ between urban and rural as well as less developed and other regions. In some cases the difference might be due to location decisions of the companies and the regional sectoral structure. For example, the projects in Financial and insurance activities or Other service activities are larger in urban and developed regions. Moreover, projects with beneficiaries in the Water supply, Education and Energy industries are larger in rural than in urban regions and, except for the Energy industry, where they are slightly smaller, larger in less developed than in higher-income (developed) regions.

This table lists the operational programmes and the information found in the official lists of beneficiaries provided by regional managing authorities or other regional or national authorities. The underlying list of operational programmes, including their names, can be downloaded at http://ec.europa.eu/regional_policy/en/policy/evaluations/data-for-research/ in the section "EU Budget commitments by fund by year and by programme" when selecting the programming period 2007-2013.
a Projects of 2007NL162PO002 and 2007NL162PO004 are part of the list of beneficiaries for OP 2007NL162PO001.
b Unequivocal assignment of ERDF projects to one of the ERDF OPs is not possible.
C_EU stands for committed EU co-financing, C_NAT for committed national co-financing, and I signifies ineligible cost, i.e., the cost carried by the beneficiary. Hence, C_EU + C_NAT + I means that the structure of the total project value is known. C means that a committed value, without a declared partition across national and EU budgets, is reported. A "Yes" in column Paid-out means that we have information on paid-out values. "Yes (2)" denotes that we know the partition of the paid-out amount into EU and national public co-financing. If the fifth and subsequent columns do not contain any information, the dataset does not cover them, as we have not found beneficiary lists provided by the respective authorities. When the NUTS dimension given is "ORBIS", we have NUTS-2 information for those beneficiaries of the respective OP that we could find and match with the ORBIS database. Objective 1 refers to the Convergence objective, Objective 2 to Regional competitiveness and employment.

Table 12 (excerpt), coefficients by sample (whole sample; rural; urban; less developed; developed):
No. of employees (log): 0.12***; 0.14***; 0.17***; 0.15***; 0.02***
Revenues (log): 0.08***; 0.10***; 0.03***; 0.10***; 0.11***

When comparing the influence of theme on project values, it becomes apparent that project size varies considerably between urban and rural (NUTS-3) regions for some of the themes. Projects corresponding to the Rail, Other Transport, Environment, Urban and Territorial Dimension and Energy themes are larger in urban (as compared to rural) and developed (as compared to less developed) regions. Road projects are also larger in urban regions, which may be partly due to higher construction costs for infrastructure projects in urban areas. All other themes, especially those targeted at Social Inclusion, Social Infrastructure, Labour Market and Other SME and Business Support, are larger in terms of conditional project values in rural regions. With regard to the development status of the NUTS-2 regions, results suggest that, relative to IT Services and Infrastructure projects, only Road and Social Infrastructure projects are considerably larger in less developed than in higher-income regions.
Finally, the plot of the residual variation by country, i.e., the variation not explained by project and firm characteristics, shows a picture very similar to that of the baseline regression. The graph is provided in the Online Appendix of this paper.
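A by-country summary of the unexplained variation could be computed along the following lines. This is a minimal sketch: the text does not state which statistic is plotted, so taking the mean residual per country is an assumption, and the function name and the example residuals are hypothetical values chosen for illustration only.

```python
from collections import defaultdict
from statistics import mean


def mean_residual_by_country(records):
    """Average regression residuals per country.

    `records` is an iterable of (country_code, residual) pairs, one per
    project, where the residual comes from a fitted project-value model.
    """
    grouped = defaultdict(list)
    for country, residual in records:
        grouped[country].append(residual)
    return {country: mean(values) for country, values in grouped.items()}


# Made-up residuals for illustration; these are not results from the paper.
example_records = [("FI", 0.4), ("FI", 0.6), ("DE", -0.3), ("DE", -0.1)]
summary = mean_residual_by_country(example_records)
```

A positive country mean would indicate projects that are larger than their observed characteristics predict, and a negative mean the opposite.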