1 Introduction

Under the cohesion policy framework, the European Union (EU) committed over EUR 348 billion from 2007 to 2013 (over EUR 371 billion in the multi-annual financial framework 2014–2020) to the re-distribution of funds among European regions in order to foster regional development and cohesion. There is a broad literature that investigates different aspects of the effectiveness of those financial contributions, e.g., in terms of increasing income growth (see Dall’Erba and Fang 2017, for a survey). Due to a lack of data, research so far has mainly focused on the implementation of the policy and its effects at an aggregated, mostly regional, level. Nevertheless, the specific design of cohesion policy programmes remains a prominent theme in the academic and public debate, especially with respect to post-2020 EU cohesion policy. Apart from the allocation of funds to thematic priorities in a region, however, the intraregional distribution of funds to specific projects and beneficiaries has been a “black box” to researchers and European policy makers so far.

In principle, the European Regional Development Fund (ERDF), the European Social Fund (ESF) and the Cohesion Fund (CF), together referred to in this study as the EU’s regional funds, co-finance projects that are part of operational programmes (OP) and pursue strategic priorities such as strengthening the labour market, improving social infrastructure or building better transport networks.Footnote 1 The projects are carried out by firms, institutions or other entities and are selected for co-financing by the respective OP’s managing authority (a public or private body nominated by the member state).Footnote 2 Therefore, in addition to the managing authorities, which are responsible for choosing suitable and promising projects, the actual beneficiaries are accountable for the success of individual projects. The appropriate fulfilment of these tasks (project selection and successful implementation) is likely to contribute to achieving the corresponding OP’s targets and, ultimately, to the overall effectiveness of a region’s EU cohesion policy implementation.Footnote 3

This study presents newly collected information on individual projects co-financed by regional funds during the multi-annual financial framework (MFF) 2007–2013 and explores how the regional funds committed to European NUTS-2 regions are distributed within these regions.Footnote 4 The resulting database contains data on over two million co-funded projects in 25 EU member states. These projects are carried out by 1,076,097 beneficiaries, which we matched with the ORBIS business database by Bureau van Dijk in order to obtain information on their business characteristics.

Using these data, this paper contributes to the literature by providing a comprehensive analysis of the projects selected by national or regional managing authorities during the MFF 2007–2013. The high granularity of the data allows us to investigate whether the distribution of regional funds differs across European regions with respect to project, geographical and firm-level characteristics of the funds’ beneficiaries.Footnote 5 The availability of these micro data thus increases the transparency of policy implementation. For example, it shows whether the EU’s funding priorities (e.g., the focus on small and medium-sized enterprises) are mirrored in cohesion policy implementation or whether more guidance would be desirable. Moreover, the findings of this paper may be of interest to EU cohesion policy evaluators, as differences in the distribution of regional funds within regions may help to explain differences in EU cohesion policy effectiveness across regions.

First, this study investigates the intraregional distribution of regional funds along different dimensions, such as the thematic priorities set by the OPs’ managing authorities. This descriptive analysis distinguishes between different types of regions, i.e., less developed and richer regions as well as urban and rural ones. One result is that a large share of regional funds in low-income regions is allocated to projects in the themes Transportation Infrastructure, Environment, and Innovation & Research and Technological Development (RTD). In most other regions, the largest project amounts are also assigned to the latter two themes, but there is an additional, stronger focus on labour market projects.

Second, since ours is one of the few contributions so far that can analyse regional fund data comprehensively at the level of projects and beneficiaries, we also focus on individual projects. In particular, we document substantial variation in project volumes across funding instruments (i.e., the ERDF, ESF or CF), the objective of the OP (i.e., Convergence or Regional Competitiveness and Employment), the regional focus on thematic priorities, and beneficiaries’ characteristics such as firm size or industry. Moreover, related to the projects’ themes, we find that managing authorities in urban (as compared to rural) and less developed (as compared to higher-income) regions allocate regional funds, on average, to relatively large projects.

Third, after controlling for project and beneficiary characteristics in a regression analysis, the results show residual variation in project volumes across countries and regions. In general, there is little literature on whether the size of (cohesion policy) grants to firms is related to the effectiveness and efficiency of a policy. Locatelli et al. (2017) indicate that the tendering of larger projects favours corruption in the public procurement process. We therefore argue that future research should investigate whether regional residual variation in average project values is linked to institutional settings or historically grown funding traditions in the respective country. Furthermore, the link between these factors and regional funds distribution patterns, on the one hand, and the effectiveness of policy implementation, on the other, should be explored.Footnote 6

The remainder of this paper is structured as follows: Section 2 gives an overview of the EU cohesion policy design and places our analysis in the context of the existing literature. Section 3 describes the content of the database in detail and compares it with aggregate official data. Section 4 provides stylised facts on the distribution of total project volumes across (different types of) regions and by project and firm characteristics. Section 5 shows the results of the econometric analysis, with a focus on individual project volumes, and Section 6 concludes.

2 The design of EU cohesion policy

In the multi-annual financial framework (MFF) 2007–2013, the regional funds, amounting to EUR 348,865 million, were the second largest item of the EU budget.Footnote 7 The cohesion policy’s objective is to increase convergence across European regions by co-financing member states’ initiatives targeted at specific priorities. First, a NUTS-2 region’s principal eligibility for funding is determined according to three main objectives, namely, (1) Convergence (former Objective 1), (2) Regional Competitiveness and Employment (former Objective 2), and (3) Territorial Cooperation (former Objective 3).Footnote 8 Second, for eligible areas, the funds are allocated to operational programmes, which co-finance projects carried out by private or public firms or organisations. The new dataset presented in this study is based on lists of these beneficiary firms, institutions, non-governmental organisations or other types of entities (in the following, we refer to all types of beneficiaries by the term firm), which have had to be made public since the MFF 2007–2013 (Article 7 in European Commission 2006).

2.1 The distribution of EU regional funds

The EU’s cohesion policy works under the principle of shared management (see Articles 14 and 15 in European Commission 2006). This is why multiple European and national (as well as sub-national) institutions are involved in the allocation process of regional funds.

For each MFF, the European Council prepares a document with strategic guidelines on reducing economic, social and territorial disparities. The guidelines for 2007–2013 encompass three priorities (European Council 2006a): (1) improving transport infrastructure and addressing environmental and energy issues, (2) creating more and better jobs, and (3) fostering knowledge transfer and innovation. Regarding the latter, special emphasis is put on supporting small and medium-sized enterprises (SME), which “often represent the highest source of employment at the regional level” (European Council 2006a, p. 19). The strategic guidelines on cohesion serve as the basis for the so-called national strategic reference framework that each member state must provide. The national strategic reference framework gives an overview of the fields of cohesion policy intervention in the particular country and is reviewed by the European Commission. In addition to a proposal for the annual allocation of regional funds over the period, the member states must draw up a list of operational programmes (per objective and fund) for the objectives Convergence and Regional Competitiveness and Employment. Each operational programme is prepared and implemented by a managing authority, a private or public body appointed by the member state (Article 59 of European Council 2006b). In most cases (and always in the case of a national OP), an OP has a specified thematic target, e.g., improving regional human capital or social infrastructure. Most OPs refer to particular NUTS-2 or NUTS-1 regions; however, there are also national (NUTS-0) programmes (see Appendix A.1 and Title III in European Council 2006b). The OPs must give reasons for focusing on specific priority axes proposed by the Commission (see Annex IV of European Council 2006b).Footnote 9 From these priority axes, the European Commission derives fifteen so-called (priority) themes.

In the next step, the OPs’ managing authorities select appropriate projects that are carried out by firms, i.e., the beneficiaries.Footnote 10 According to Article 2 of Council Regulation (EC) No. 1083/2006, a beneficiary is defined as “an operator, body or firm, whether public or private, responsible for initiating and implementing operations”. An operation is referred to as “a project or groups of projects selected by the managing authority of the operational programme [...] allowing achievement of the goals of the priority axis to which it relates” (European Council 2006b). The amount of EU co-funding for each project depends on the eligible expenditure and the designated co-financing rate.Footnote 11

The structure of the new database builds on these regulations. For each observation (project), the dataset includes the OP to which it is assigned, the corresponding fund and objective as well as the theme. We are thus able to check the validity of our data by comparing it along different dimensions with more aggregate official figures published by DG REGIO (see Sect. 3.3).

While the next section covers the literature that evaluates the effects of cohesion policy on regions or firms, there is also a literature that discusses the allocation process and the factors that potentially influence this bargaining process. Bachtler and Mendez (2007) discuss the history of the allocation process up to the beginning of the MFF 2007–2013. In a narrative analysis, they highlight the negotiation process between member states and the European Commission as well as the ongoing spatial concentration of the majority of funds. A more quantitative line of research (e.g., Bouvet and Dall’Erba 2010; Dellmuth 2011; Dellmuth and Stoffel 2012; Tosun 2014) finds that factors such as the political situation, the type of governance, previous implementation success or even the degree of Euroscepticism indeed drive part of the allocation process.

2.2 Literature review

Most studies that analyse cohesion policy in a pan-European setting focus on the regional (in many cases NUTS-2) level (Hagen and Mohl 2009; Pienkowski and Berkowitz 2016). Most recently, Dall’Erba and Fang (2017) provide a meta-analysis of econometric studies on the evaluation of cohesion policy effects.

Generally speaking, there is no consensus in the literature regarding the outcome of cohesion policy. The effectiveness is measured, e.g., as a positive effect on gross domestic product (GDP) per capita growth (e.g., Pellegrini et al. 2013), investments per capita (Becker et al. 2013) or regional research and development activity (Ferrara et al. 2016). Most studies find a conditional positive effect of regional funds assignment (e.g., Becker et al. 2013; Cappelen et al. 2003; Ferrara et al. 2016), while others provide results that even suggest a negative impact (Breidenbach et al. 2016). In recent years, the potential reasons for heterogeneous (conditional) cohesion policy effects have gained major attention.

First, Rodríguez-Pose and Fratesi (2004) shift the focus to the expenditure categories of the main funding instruments. They find that investments in infrastructure or agriculture do not have sustainable effects on regional growth (see also Puga 2002; Dall’Erba and Le Gallo 2008), whereas projects that foster human capital lead to sustainable positive effects on economic cohesion. This and other studies use data on the distribution of expenditure across NUTS-2 regions and themes (e.g., Dall’Erba and Le Gallo 2007; Percoco 2013; Ferrara et al. 2016).

Second, regional heterogeneity as a determinant of policy effectiveness has become a topic of interest and is often modelled by a region’s capacity to take advantage of regional funds. Becker et al. (2013) indicate that human capital and institutional quality matter for the effectiveness of Objective 1 funds in terms of their effect on GDP per capita growth and investment. Other authors also confirm institutions as a factor influencing the success of cohesion policy (e.g., Cappelen et al. 2003; Bachtler et al. 2014; Rodríguez-Pose 2013). Recently, Gagliardi and Percoco (2017) show that European cohesion policy is most effective in rural regions located close to cities.

Third, Becker et al. (2012) take into account the amount of regional funds spent in a region (instead of treatment dummies) and conclude that there is a maximum efficient level of funds beyond which additional transfers no longer increase effectiveness (see also Kyriacou and Roca-Sagalés 2012; Rodríguez-Pose and Garcilazo 2015).

Fourth, Becker et al. (2018) and Bachtrögler (2016) analyse the effects of structural funds (on income growth) in lagging regions over time and in the context of the economic and financial crisis starting in 2007. The latter finds that the effectiveness of cohesion policy in terms of increasing GDP per capita growth appears to decrease in the crisis compared to former periods when controlling for regional structural characteristics. Barone et al. (2016) also take the time dimension into account and show that cohesion policy effects are not persistent over time.

Finally, spatial heterogeneity and spillovers are considered to play a role for cohesion policy effectiveness (Le Gallo et al. 2011; Breidenbach et al. 2016; Maynou et al. 2016). This strand of the literature widely confirms some small positive effects on regional growth or convergence for a number of regions but no general overall effect.

The majority of studies named above use data at the regional or local level to study growth or convergence effects. One major drawback of this type of analysis is the potential endogeneity of structural funds. As they are granted especially to lower-income regions, cohesion policy is most likely not exogenous with respect to regional growth. Attempts to overcome this statistical problem include the use of time lags (Rodríguez-Pose and Fratesi 2004), different instrumental variables (Dall’Erba and Le Gallo 2007, 2008), generalised method of moments (GMM) estimators (Breidenbach et al. 2016) or a computable general equilibrium approach (Horridge and Rokicki 2018). Another possible way to identify causal effects is to use microeconometric methods with regional or micro data.

Turning to the beneficiaries as unit of observation, De Zwaan and Merlevede (2013) evaluate the effects of Objective 1 and Objective 2 payments in the programming period 2000–2006 in 25 EU member states on productivity and employment growth of firms in treated and non-treated regions. However, they do not use actual recipients of regional funds but compare all manufacturing firms (available in the ORBIS database) located in treated regions with the manufacturing firms in non-treated regions. Additional firm-level analyses are available for sub-national geographical units (Bernini and Pellegrini 2011) and certain types of regional funds (Hartsenko and Sauga 2012).

As the new database overcomes this lack of data, this paper contributes to the literature by providing the first detailed insights into the actual beneficiaries (and projects) of cohesion policy in 25 EU member states between 2007 and 2013. A combined analysis of the projects’ themes, the firm-level characteristics of the corresponding beneficiaries and the size of project expenditure may help to explain the heterogeneous effects of regional funds allocation found in the literature and, thereby, lead to important policy implications. Moreover, it allows us to identify regional funds allocation patterns across regions (e.g., less developed vs. other regions and urban vs. rural regions) and countries (see Sects. 4 and 5).

3 A novel dataset

3.1 Content of the dataset

The European Commission’s Directorate-General for Regional and Urban Policy (DG REGIO) provides a collection of links to national or regional websites that make lists of beneficiaries available.Footnote 12 Unfortunately, the degree of detail of these lists’ content as well as their structure vary significantly across countries, regions and even operational programmes.Footnote 13 Moreover, most documents are provided in national languages, different data formats and using non-standardised definitions. Besides collecting and processing all information, we extend the data on beneficiaries by matching it with the ORBIS business database by Bureau van Dijk.

The resulting set of variables can be grouped into three blocks, namely, (1) project information, (2) funding (co-financing) information, and (3) business characteristics of the beneficiary retrieved from ORBIS. First, the project information includes the country and NUTS region in which the project is carried out according to the OP and list of beneficiaries, respectively.Footnote 14 Moreover, it covers the co-financing fund, the objective and corresponding OP to which the project is assigned. As already noted, the dataset includes projects co-funded by the ERDF, the ESF and the CF, under the objectives of Convergence and Regional Competitiveness and Employment. Next, a project name or description, the start and end date as well as the theme of the project are specified. The theme is not reported by all managing authorities, which is why we classify the remaining projects according to available project information using supervised text classification (see Sect. 3.2 for a detailed description). The fifteen themes are: (1) Capacity Building, (2) Culture, Heritage and Tourism, (3) Energy, (4) Environment, (5) Human Capital, (6) Innovation & Research and Technological Development (RTD), (7) IT Services and Infrastructure, (8) Labour Market, (9) Other SME and Business Support, (10) Other Transport, (11) Rail, (12) Road, (13) Social Inclusion, (14) Social Infrastructure, and (15) Urban and Territorial Dimension.

The second group of variables describes the funding structure. It contains the co-financing amounts committed by the EU (C_EU) and by national public funding (C_NAT; including co-funding by the recipient regions) as defined at the beginning of the MFF 2007–2013.Footnote 15 If the amount borne by the firm itself (ineligible cost, Inelig)Footnote 16 is reported, it is added to the sum of co-funding commitments in order to calculate the total value for project i:

$$\text{Total value}_i = \text{C}\_\text{EU}_i + \text{C}\_\text{NAT}_i + \text{Inelig}_i \tag{1}$$

In addition to the commitments, project values that were actually paid out by European (Paid_EU) or national (Paid_NAT) public funds are available for a subset of observations. If only the actually paid-out value is declared, the total project value represents the sum of EU (ERDF, ESF or CF) and national payments (including those from regional governments):

$$\text{Total value}_i = \text{Paid}\_\text{EU}_i + \text{Paid}\_\text{NAT}_i \tag{2}$$
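The choice between Eq. (1) and Eq. (2) for a single project record can be sketched in Python as follows. This is a minimal illustration, not the original data pipeline: the function name `total_value` and the convention that unreported amounts are passed as `None` (with an unreported ineligible cost counted as zero) are our assumptions.

```python
from typing import Optional


def total_value(c_eu: Optional[float] = None,
                c_nat: Optional[float] = None,
                inelig: Optional[float] = None,
                paid_eu: Optional[float] = None,
                paid_nat: Optional[float] = None) -> float:
    """Total value of one project following Eq. (1), else Eq. (2).

    Assumption: unreported amounts are None; an unreported ineligible
    cost counts as zero in Eq. (1).
    """
    if c_eu is not None or c_nat is not None:
        # Eq. (1): EU + national commitments (+ own contribution if reported)
        return (c_eu or 0.0) + (c_nat or 0.0) + (inelig or 0.0)
    if paid_eu is not None or paid_nat is not None:
        # Eq. (2): only paid-out EU and national amounts are reported
        return (paid_eu or 0.0) + (paid_nat or 0.0)
    raise ValueError("project reports neither commitments nor payments")


print(total_value(c_eu=100_000, c_nat=40_000, inelig=10_000))
print(total_value(paid_eu=75_000, paid_nat=25_000))
```

Committed amounts take precedence here because the text applies Eq. (2) only when the paid-out value is the sole information available.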

Furthermore, the declaration date refers to the time of reporting of the respective list of beneficiaries. If it is not reported, we use the date of download.Footnote 17

The third information block relates to the beneficiary. These data are produced by matching the beneficiaries (using the firm name and its home country) with the ORBIS business database. We are aware of several shortcomings of this database (see, e.g., Kalemli-Ozcan et al. 2015); however, it is the most comprehensive and accessible international business database.Footnote 18 The resulting dataset contains the firm’s name in ORBIS and its location, its founding year and information on the industry in which it operates (NACE Rev. 2 industry and four-digit code), the firm’s number of employees and its sales volume. Moreover, there is a size classification by ORBIS that is based on at least one of the following variables: the firm’s number of employees, total assets, operating revenue and whether it is listed on a stock exchange.Footnote 19 Furthermore, the database contains information on whether a firm belongs to a corporate group and, if so, on the number of entities in this group.

Table 1 shows all variables and their coverage in the database, i.e., the share of all observations for which each variable is available. The OP, its location, the funding instrument, the objective as well as the names of the project and the beneficiary are available for each project. Moreover, the dataset provides at least a total project value for each observation. Overall, 39% of the observations could be matched with ORBIS.

To analyse whether the distribution of regional funds shows different patterns in less developed and other regions as well as in urban and rural regions, we need further regional characteristics. Less developed regions are defined as NUTS-2 regions that are eligible for funds under the Convergence objective (former Objective 1), i.e., whose income per capita is lower than 75% of the EU-25 average (in 2000–2002) (European Council 2006b). Based on the share of the population living in urban areas, DG REGIO classifies European NUTS-3 regions into predominantly urban, predominantly rural and intermediate ones (see Dijkstra and Poelman 2011). In this study, we distinguish between predominantly urban and other regions. As Table 1 shows, this variable is available for only one third of all observations because of the low coverage of the more disaggregated NUTS-3 location information in ORBIS.
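The two regional classifications can be sketched as simple indicator functions. This is a simplified illustration under stated assumptions: `is_less_developed` and `urban_dummy` are hypothetical helper names, the income series are plain numbers supplied by the caller, and the official eligibility rule in European Council (2006b) (based on GDP per capita in purchasing power standards) is more detailed than this sketch.

```python
from statistics import mean
from typing import Optional, Sequence


def is_less_developed(region_income_pc: Sequence[float],
                      eu25_income_pc: Sequence[float]) -> bool:
    """Simplified Convergence criterion: income per capita, averaged over
    2000-2002, below 75% of the EU-25 average."""
    return mean(region_income_pc) < 0.75 * mean(eu25_income_pc)


def urban_dummy(nuts3_type: Optional[str]) -> Optional[int]:
    """1 for a predominantly urban NUTS-3 region, 0 for intermediate or
    predominantly rural ones, None if the NUTS-3 region is unknown."""
    if nuts3_type is None:
        return None
    return int(nuts3_type == "predominantly urban")


# Made-up income figures for 2000, 2001 and 2002
print(is_less_developed([15_000, 16_000, 17_000], [22_000, 23_000, 24_000]))
print(urban_dummy("intermediate"))
```

Returning `None` for an unknown NUTS-3 location keeps the missing-data case (about two thirds of the observations) distinct from a genuine "other region" classification.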

Table 1 List of variables and their coverage in the dataset

3.2 Missing themes

As described in Sect. 2, the projects can be categorised into fifteen themes as defined by DG REGIO. Since not all managing authorities publish these themes in their lists of beneficiaries but most of them provide a project description and a project name, we employ supervised text classification to predict the missing project themes.

To train the classification algorithm (classifier), we use theme labels reported by some managing authorities, augmented with manually assigned theme labels. Since some of the project descriptions are given in a language other than English, we first use the Google Cloud Translation API to translate the project descriptions and project names into English. Although we cannot quantify how many errors were introduced during translation, the overall classification accuracy we report partly reflects such translation errors. Records that cannot be translated are left unchanged. We then remove observations whose project name and description together have fewer than 30 characters. Overall, we thus use 1,698,191 projects (82.62% of all observations) and 588,713 labelled projects (28.64% of all observations) to train and evaluate the classifier.
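The length filter and the split into labelled and unlabelled projects can be sketched as follows, assuming translation has already taken place (the call to the Google Cloud Translation API is not reproduced). `Project` and `prepare_corpus` are hypothetical names; only the 30-character threshold is taken from the text.

```python
from typing import List, NamedTuple, Optional, Tuple

MIN_CHARS = 30  # minimum combined length of name and description (from the text)


class Project(NamedTuple):
    name: str
    description: str
    theme: Optional[str] = None  # None if no theme was reported or labelled


def prepare_corpus(projects: List[Project]) -> Tuple[List[Project], List[Project]]:
    """Return the usable projects and the labelled subset used for training."""
    usable = [p for p in projects
              if len(p.name.strip()) + len(p.description.strip()) >= MIN_CHARS]
    labelled = [p for p in usable if p.theme is not None]
    return usable, labelled


projects = [
    Project("Bypass", "Road"),  # too short, dropped
    Project("Modernisation of the regional rail line", "", "Rail"),
    Project("Training courses for long-term unemployed", ""),
]
usable, labelled = prepare_corpus(projects)
print(len(usable), len(labelled))
```

The unlabelled usable projects are the ones whose themes the trained classifier later predicts.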

Choosing an appropriate classifier is a non-trivial task. We find that the recently published fastText library (Joulin et al. 2016)Footnote 20 performs well on our dataset in terms of the performance metrics precision, recall and accuracy.Footnote 21 In text classification, it is often desirable not to use the words of a text directly for estimation but to first map the text into a vector space of much lower dimension. The fastText library, which uses a neural network with a single hidden layer, can be applied both to text classification and to learning vector representations of words.

The basic idea of the model is to proceed in two steps: first, the data is mapped into a low dimensional vector space (i.e., each sentence is mapped into a numeric vector) in such a way that similar texts have similar vector representations. Second, multinomial logistic regression is used to predict the labels.
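To make the two-step idea concrete, the following NumPy sketch implements a heavily simplified fastText-style classifier: the word vectors of a text are averaged into one text vector (step 1), which feeds a softmax, i.e., multinomial logistic, layer (step 2), and both parts are trained jointly by stochastic gradient descent. `TinyFastText`, its default parameters and the toy data are hypothetical. The sketch omits n-gram features, hierarchical softmax and the other optimisations of the actual fastText library and is not the classifier used in this study.

```python
import numpy as np


class TinyFastText:
    """Hypothetical fastText-style classifier: averaged word vectors + softmax.

    Simplified sketch: no n-grams, no hierarchical softmax, plain SGD.
    It is not the fastText library used in the study.
    """

    def __init__(self, dim=10, lr=0.2, epochs=200, seed=0):
        self.dim, self.lr, self.epochs = dim, lr, epochs
        self.rng = np.random.default_rng(seed)

    def _ids(self, text):
        return [self.vocab[w] for w in text.lower().split() if w in self.vocab]

    def _hidden(self, ids):
        # Step 1: map a text to the mean of its word vectors
        return self.E[ids].mean(axis=0) if ids else np.zeros(self.dim)

    def fit(self, texts, labels):
        words = sorted({w for t in texts for w in t.lower().split()})
        self.vocab = {w: i for i, w in enumerate(words)}
        self.classes = sorted(set(labels))
        self.E = self.rng.normal(0.0, 0.1, (len(words), self.dim))  # word vectors
        self.W = np.zeros((self.dim, len(self.classes)))            # softmax weights
        self.b = np.zeros(len(self.classes))
        targets = [self.classes.index(y) for y in labels]
        for _ in range(self.epochs):
            for text, y in zip(texts, targets):
                ids = self._ids(text)
                if not ids:
                    continue
                h = self._hidden(ids)
                z = h @ self.W + self.b
                p = np.exp(z - z.max())
                p /= p.sum()                  # Step 2: softmax class probabilities
                g = p.copy()
                g[y] -= 1.0                   # cross-entropy gradient w.r.t. z
                grad_h = self.W @ g           # computed before W is updated
                self.W -= self.lr * np.outer(h, g)
                self.b -= self.lr * g
                np.subtract.at(self.E, ids, self.lr * grad_h / len(ids))
        return self

    def predict(self, text):
        z = self._hidden(self._ids(text)) @ self.W + self.b
        return self.classes[int(np.argmax(z))]


texts = ["rail track renewal", "railway station upgrade",
         "road bypass construction", "motorway junction repair",
         "training for unemployed workers", "job placement support"]
labels = ["Rail", "Rail", "Road", "Road", "Labour Market", "Labour Market"]
model = TinyFastText().fit(texts, labels)
print(model.predict("railway station upgrade"))
```

`np.subtract.at` is used so that the embedding update is applied once per occurrence of a word index, which plain fancy-index assignment would not guarantee.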

Fig. 1

Confusion matrix of the theme prediction. Notes: The numbers in the figure show the results of the classification of projects. The width of the green rectangles on the diagonal represents the recall and their height the precision. Since all diagonal rectangles almost fill their cells, the confusion matrix shows that the classification works well.

To evaluate the classification, we use tenfold cross-validation (Stone 1974). In k-fold cross-validation, the data are randomly split into k parts, of which \(k-1\) parts are used to train the model and the remaining part to evaluate it. To test the model on all available data, this process is repeated k times so that each part serves exactly once as the evaluation set. In Fig. 1, we report the confusion matrix (a special type of contingency table) for one of the ten folds; specifically, we report the results for the fold with the lowest accuracy. The rows indicate the true known themes and the columns the predicted themes. Let X be the confusion matrix; then precision, recall and accuracy are defined as follows:

$$\text{precision}_j = \frac{X_{jj}}{\sum_{i} X_{ij}}, \quad \text{recall}_i = \frac{X_{ii}}{\sum_{j} X_{ij}}, \quad \text{accuracy} = \frac{\sum_{i} X_{ii}}{\sum_{i,j} X_{ij}}. \tag{3}$$

Hence, precision, recall and accuracy are equal to one if all predicted themes are correct (the confusion matrix has entries only on the main diagonal) and zero if all predicted themes are wrong (the confusion matrix has no entries on the main diagonal). The rectangles in Fig. 1 visualise the row and column percentages of the confusion matrix: the width of each rectangle represents the row percentage and its height the column percentage. Therefore, the width of the diagonal rectangles corresponds to the recall, their height to the precision and their area to the squared G-measure, where \(\text{G-measure}_i = \sqrt{\text{precision}_i \times \text{recall}_i}\).
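Eq. (3) and the G-measure can be computed directly from a confusion matrix. The sketch below assumes NumPy and the row/column convention of Fig. 1 (rows are true themes, columns predicted themes); `theme_metrics` is a hypothetical helper and the 2×2 matrix is made up for illustration, not taken from Fig. 1. A theme that is never predicted would produce a zero column sum and hence an undefined (NaN) precision.

```python
import numpy as np


def theme_metrics(X):
    """Per-theme precision, recall and G-measure plus overall accuracy, Eq. (3).

    Rows of X are the true themes, columns the predicted themes.
    """
    X = np.asarray(X, dtype=float)
    correct = np.diag(X)
    precision = correct / X.sum(axis=0)   # divide by column sums
    recall = correct / X.sum(axis=1)      # divide by row sums
    accuracy = correct.sum() / X.sum()
    g_measure = np.sqrt(precision * recall)
    return precision, recall, g_measure, accuracy


# Hypothetical 2x2 confusion matrix
X = [[8, 2],
     [1, 9]]
precision, recall, g_measure, accuracy = theme_metrics(X)
print(precision.round(3), recall.round(3), g_measure.round(3), accuracy)
```

Because the diagonal rectangles in Fig. 1 have width equal to recall and height equal to precision, `g_measure**2` reproduces their area.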

In the confusion matrix, we see that, for the true theme Other Transport, the model predicts the true label in 105 cases and the theme Road in seven cases. Overall, we obtain an average classification accuracy of 0.94. For completeness, however, we note that there are duplicates across the training and test sets. Accounting for this, the average accuracy without duplicates is 0.90. To make the classification results easily reproducible, we provide the R (R Core Team 2016) package fastTextR, which contains an interface to the fastText library and is available on CRAN.Footnote 22
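The evaluation design, tenfold cross-validation plus an accuracy computed after removing test items that also occur in the training data, might be sketched as follows. The helpers `kfold_splits` and `drop_duplicates` are hypothetical, and treating two projects as duplicates when their (translated) texts are identical is our assumption about what counts as a duplicate.

```python
import random
from typing import Iterator, List, Sequence, Tuple


def kfold_splits(n: int, k: int = 10, seed: int = 1) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists; every observation is tested exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)          # random assignment to folds
    folds = [idx[f::k] for f in range(k)]     # k parts of (almost) equal size
    for f in range(k):
        train = [i for g, fold in enumerate(folds) if g != f for i in fold]
        yield train, folds[f]


def drop_duplicates(texts: Sequence[str], train: List[int], test: List[int]) -> List[int]:
    """Keep only test items whose text does not also occur in the training part."""
    seen = {texts[i] for i in train}
    return [i for i in test if texts[i] not in seen]


# Usage: evaluate each fold once on `test` and once on the de-duplicated subset
texts = ["rail"] * 3 + [f"project {i}" for i in range(17)]
for train, test in kfold_splits(len(texts)):
    clean_test = drop_duplicates(texts, train, test)
```

Averaging the fold accuracies over `test` and over `clean_test` yields the two figures distinguished in the text (with and without duplicates).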

3.3 Comparison of the dataset with official data

Table 2 Comparison with official DG REGIO data on regional funds (in million Euros)

We assess the validity of the assembled data by checking for outliers and plausibility and by comparing its magnitude with official data on regional funds assignment (equivalent to \(\text {C}\_\text {EU}_i\) in Eq. 1) published by DG REGIO. Table 2 shows the sum of total project values in the database (\(\text {Total value}_i\) in Eq. 1) per country and objective, excluding projects that cannot be assigned to a specific objective. In general, the total values in the database do not consist only of EU commitments. Multiplying the total project values by the maximum co-financing rate per country and objective (see Sect. 2.1) yields the highest amount the EU should provide. If the official committed value, given in the last column of Table 2, is lower than or equal to this maximum EU co-funding, we consider the sum of total project values in our database plausible. This is the case for the large majority of member states. For Bulgaria, lists of beneficiaries are available for fewer than a third of its operational programmes. The Estonian source is an online database that might not yet contain all projects. For Denmark, the gap may arise because the total project value in the database sums up paid-out rather than committed amounts.
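The plausibility check amounts to one comparison per country and objective: the official EU commitment should not exceed the sum of total project values multiplied by the maximum EU co-financing rate. The function `plausibility`, the dictionary layout and the example figures for the placeholder countries `XX` and `YY` are hypothetical and only illustrate the logic.

```python
from typing import Dict, Tuple

Key = Tuple[str, str]  # (country, objective)


def plausibility(total_values: Dict[Key, float],
                 max_rates: Dict[Key, float],
                 official_eu: Dict[Key, float]) -> Dict[Key, bool]:
    """True where the official EU commitment is at most the maximum EU
    co-funding implied by the database totals."""
    return {key: official_eu[key] <= total_values[key] * max_rates[key]
            for key in official_eu}


total = {("XX", "Convergence"): 1_000.0, ("YY", "Convergence"): 500.0}
rates = {("XX", "Convergence"): 0.85, ("YY", "Convergence"): 0.85}
official = {("XX", "Convergence"): 800.0, ("YY", "Convergence"): 600.0}
print(plausibility(total, rates, official))
```

A `False` entry does not prove an error in the database; as the Bulgarian, Estonian and Danish cases show, incomplete lists or paid-out instead of committed amounts can also explain a gap.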

Fig. 2

Comparison of distribution of committed regional funds with DG REGIO data: shares (%) of sum of total project values. Notes: Source DG REGIO: “ERDF/ESF/CF Priority theme overview 2007–2013” downloadable at http://ec.europa.eu/regional_policy/en/policy/evaluations/data-for-research/

Next, we compare the distribution of regional funds across co-financing instruments and themes in the database with that reported by DG REGIO. First, 55% of the sum of total project values corresponds to projects unambiguously co-funded by the ERDF, while according to DG REGIO data a similar share (58%) of structural funds and Cohesion Fund resources is transferred via the ERDF. For the ESF, the share of total project values amounts to 22%, which is exactly the share in official data. As several operational programmes are co-funded by both the ERDF and the CF and the lists of beneficiaries do not report more detailed information, we can attribute only about 4% of total project values in the database to the Cohesion Fund. For about 20%, we cannot clearly determine which fund provides the support. According to DG REGIO data, about 20% of structural and Cohesion Fund commitments are settled via the Cohesion Fund, i.e., it is likely that the major part of the total values not uniquely assigned in the lists of beneficiaries can be attributed to the Cohesion Fund.
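The fund shares compared above are simple aggregates of total project values by fund label. In this minimal sketch, `fund_shares` and the label `"ERDF or CF"` for projects whose fund cannot be identified are illustrative assumptions, and the example values are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def fund_shares(projects: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Share (%) of the sum of total project values by co-financing fund label."""
    sums: Dict[str, float] = defaultdict(float)
    for fund, value in projects:
        sums[fund] += value
    grand_total = sum(sums.values())
    return {fund: 100 * s / grand_total for fund, s in sums.items()}


shares = fund_shares([("ERDF", 300.0), ("ESF", 100.0), ("ERDF or CF", 100.0)])
print(shares)
```

Keeping the ambiguous category as its own label, rather than splitting it by an assumed ratio, mirrors the comparison in the text, where the unassigned share is only interpreted afterwards against the official Cohesion Fund share.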

Regarding the distribution of funds and total project values across the fifteen themes (project categories), our database proves to be consistent with official data (Fig. 2). The highest project expenditure is dedicated to Innovation & RTD and Environment according to the dataset and as also reported by DG REGIO.

4 Descriptive statistics

4.1 Stylised facts: the intraregional distribution of regional funds

In cohesion policy regulations, the EU institutions specify priority themes to be targeted by operational programmes in a programming period but do not preset any detailed requirements regarding the size or other characteristics of the projects or beneficiaries to be selected.Footnote 23 Therefore, we expect that there is no uniform strategy for distributing regional funds at the project level across European regions. The selection of projects by managing authorities is likely to depend on the accordance with an OP’s underlying priority themes and its main objective, as well as the assessment of the capability of potential beneficiaries to carry out the project (with a certain volume).

One feature of the data that has not been analysed in previous literature is the variation in average project values, shown in the upper right map of Fig. 3. This variable ranges from EUR 15,520 per project in Marche, Italy, to EUR 57,814,068 in Ireland. While it is comparatively small in Central European countries, the average project value is relatively high in some Northern European regions (in the UK, Denmark or Belgium) as well as in the member states that joined the EU in 2004 and later. Although project and beneficiary characteristics also contribute to the regional variation in average total values, this pattern suggests that regions with fewer projects tend, on average, to have higher total values per project.

The upper left map of Fig. 3 presents the number of projects per region, which ranges from 32 in South-East England, UK, to over 85,000 in Puglia, Italy. Regions with many projects, such as Puglia, Italy, or North Rhine-Westphalia, Germany, are typically characterised by a high number of projects related to the Labour Market or Human Capital themes, both of which are associated with relatively low project values.Footnote 24 By contrast, the regions in which the largest share of funds is allocated to Transportation Infrastructure projects (in Poland and Croatia) are characterised by relatively large project values. Other regions, such as South-East England, UK, or Vienna, Austria, also have few but on average large projects. One underlying reason may be that some regions tend to report relatively many intermediate beneficiaries, such as public institutions at the municipal level. These bodies apply for the funds but redistribute them further or use them to carry out projects for smaller entities; in such cases, the ultimate beneficiaries are not publicly known. Finally, part of the remaining variation in the number of projects is likely due to poor data availability. In Croatia, not all projects could be assigned to a NUTS-2 region based on the managing authority's report, and for Bulgaria, lists of beneficiaries are available for only two out of nine operational programmes.

Fig. 3

Regional distribution. Notes: Number of projects (observations) per region: Min.: 32, 1st Qu.: 569, Median: 3220, Mean: 6526, 3rd Qu.: 8326, Max.: 85,420. Average value per project (in million euros): Min.: 0.02, 1st Qu.: 0.22, Median: 0.44, Mean: 1.18, 3rd Qu.: 1.10, Max.: 14.10. Sum of regional project values as a share of regional GDP (sum over 2007–2013, in %): Min.: 0.01, 1st Qu.: 0.17, Median: 0.35, Mean: 0.91, 3rd Qu.: 1.10, Max.: 8.45. Theme group with the maximum sum of total project values by region: transportation refers to Road, Rail and Other transport; Social inclusion is part of the human capital group; business services are Innovation & RTD, Other SME and business support, Capacity building, and IT services and infrastructure. Total project values are defined in Sect. 3. While the lists of beneficiaries published by the managing authorities do not permit assigning projects to the region of Helsinki, Finland, or to Dutch regions, the figure shows observations for Dutch regions that have been matched with ORBIS, which contains information on the location of firms.

The dataset consists of over two million projects granted to approximately one million individual beneficiaries. On average, therefore, each beneficiary receives co-financing for two projects; this average is driven by a few very active beneficiaries, since only 17% of beneficiaries carry out more than one project, only 3% more than five and only 1% more than ten. The beneficiary with the most co-financed projects (more than 18,000) is the Spanish ICEX España Exportación e Inversiones, a governmental institution that promotes (foreign) investment in Spain. The second-largest number of projects (more than 11,000) is carried out by the city of Florence, Italy, and the third-largest (more than 10,000) by the governmental training and orientation section of the region of Tuscany, Italy.
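The gap between a mean of two projects per beneficiary and the small shares of beneficiaries with several projects reflects a highly skewed distribution. The following sketch computes such concentration statistics from a hypothetical list holding one beneficiary identifier per project; the toy data are invented for illustration and are not drawn from the database.

```python
from collections import Counter


def beneficiary_concentration(beneficiary_ids, thresholds=(1, 5, 10)):
    """Summarise how projects are spread across beneficiaries.

    `beneficiary_ids` holds one (hypothetical) beneficiary identifier per
    project. Returns the mean number of projects per beneficiary, the share
    of beneficiaries with more than k projects for each threshold k, and
    the beneficiary with the most projects as an (id, count) pair.
    """
    counts = Counter(beneficiary_ids)
    n_beneficiaries = len(counts)
    mean_projects = len(beneficiary_ids) / n_beneficiaries
    shares = {
        k: sum(1 for c in counts.values() if c > k) / n_beneficiaries
        for k in thresholds
    }
    return mean_projects, shares, counts.most_common(1)[0]


# Toy example: beneficiary "A" has 12 projects, "B" has 3, "C" to "J" one each.
ids = ["A"] * 12 + ["B"] * 3 + list("CDEFGHIJ")
mean_projects, shares, top = beneficiary_concentration(ids)
```

In the toy example, the mean is 2.3 projects per beneficiary although only 20% of beneficiaries have more than one project, mirroring the skew described above.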

Beyond project and beneficiary characteristics, one stylised fact arising from the data confirms that EU cohesion policy provides most financial support to less developed regions. One main objective of the EU's cohesion policy is to support the economic and social catching-up of these regions, i.e., NUTS-2 regions with a GDP per capita below 75% of the EU average. As the lower left map of Fig. 3 shows, the total values of co-financed projects relative to regional income tend to be larger in Southern Italy, Portugal and Spain as well as in Poland, Romania, Bulgaria, the Baltic states and Slovenia than in other regions. In Lithuania and Poland, for example, project values account for up to eight percent of regional GDP.Footnote 25

By contrast, in the so-called “Blue Banana”Footnote 26 the sum of total project values lies below 0.1% of regional GDP. In these regions, projects with the highest expenditure tend to be carried out by small and medium-sized firms in education, public administration and professional, scientific and technical activities. These projects mostly fall under the Regional Competitiveness and Employment objective and the Innovation & RTD and Human Capital themes. The sum of project values in Scandinavian, French, Northern Italian and North-Eastern Spanish regions as well as in Scotland and Northern Ireland, UK, ranges between 0.1% and 0.5% of regional GDP. Project values per capita follow a similar overall distribution and fall below EUR 500 in most regions. The region with the highest value per capita is Bratislava, Slovakia, with EUR 12,575 per inhabitant over the seven-year period.

Table 3 presents summary statistics for the average project values in less developed versus developed regions and confirms this finding: projects carried out by beneficiaries in less developed regions are larger on average, and a t-test shows that the difference in means is statistically significant at the 1% level. Most projects in these regions fall under the Road and Environment themes, but also Innovation & RTD, and are carried out by very large firms operating in public administration and manufacturing. By contrast, large shares of regional funds in higher-income regions go to Innovation & RTD, Environment and Labour Market projects, while less money is dedicated to transportation infrastructure.
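The text does not state which variant of the t-test is used. As an assumption, the sketch below implements Welch's unequal-variance t-test, a common choice when two groups differ in size and dispersion, and approximates the p-value with the standard normal distribution, which is reasonable for samples as large as those summarised in Table 3 (the Python standard library offers no Student t distribution function).

```python
import math
from statistics import NormalDist, mean, variance


def welch_t_test(x, y):
    """Welch's two-sample t-test for a difference in means.

    Returns the t statistic, the Welch-Satterthwaite degrees of freedom and
    a two-sided p-value from the normal approximation (an assumption of this
    sketch that is adequate only for large samples).
    """
    n1, n2 = len(x), len(y)
    v1, v2 = variance(x) / n1, variance(y) / n2  # squared standard errors
    t = (mean(x) - mean(y)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, df, p


# Hypothetical project values (EUR 1000) in two groups of regions.
t, df, p = welch_t_test([10, 12, 14], [1, 2, 3])
```

A p-value below 0.01 corresponds to significance at the 1% level; for small samples like the toy data, an exact t distribution should be used instead of the normal approximation.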

Table 3 Summary statistics in EUR 1000

Since Gagliardi and Percoco (2017) indicate that the effectiveness of EU cohesion policy varies between urban and rural (NUTS-3) regions, we also examine potential differences in the use of regional funds between such areas. The summary statistics in Table 3 reveal statistically significantly higher average single project values in (predominantly) urban NUTS-3 regions. The following section provides more detail on differences in the distribution of regional funds between urban and rural regions.

4.2 The intraregional distribution of funds with respect to project and firm characteristics

Table 4 Summary statistics by type of fund, theme and beneficiary size

Type of fund, objectives and themes

The allocation of regional funds to OPs co-financed by different types of funds and under different main objectives is closely related to the previous findings regarding the development status of regions and the choice of thematic priorities.Footnote 27 The total values of projects subsidised by the ERDF sum to roughly EUR 270 billion, compared to EUR 107 billion for the ESF and EUR 18 billion for the CF. In addition, projects totalling EUR 100 billion are most likely funded by the CF. The CF, for which only member states with a gross national income per capita below 90% of the EU average are eligible, typically co-finances (large) infrastructure projects and aims to improve the accessibility of lagging-behind regions. A major part of the ESF is allocated to projects with relatively small values that foster, e.g., human capital, while the ERDF co-finances a broad spectrum of project types. These funding priorities are mirrored in project volumes: the median CF project value amounts to about EUR 100,000, compared to EUR 37,000 for the ERDF and EUR 3200 for the ESF.

Co-financing by the different funds also varies across types of regions (Table 4). By construction, the CF is more important in less developed regions, while the ESF accounts for only around 10% of project volumes in these poorest regions. Interestingly, the CF plays a larger role for co-funded project volumes in (predominantly) urban regions than in rural ones. The data suggest that this may be related to the larger number of relatively large Road projects funded in more densely populated areas.

The overall objective of the OP to which each project belongs is closely related to the development status of the region since, with several exemptions, only less developed regions are eligible for Convergence funds. While around half of the projects in our sample (by number) are aimed at Convergence, their total value is three times that of Regional Competitiveness and Employment projects. As expected, operational programmes in less developed regions focus on Convergence, while OPs in richer regions receive more funds to boost Regional Competitiveness and Employment.

To target regional funds appropriately, the European Commission defines fifteen priority themes (see Sect. 3.2) to classify projects. In total, the largest sums are committed to projects in the categories Innovation & RTD (EUR 70 billion), Environment (EUR 65 billion) and Other SME and Business Support (EUR 46 billion), followed by Labour Market, Road and Human Capital projects. The number of observations per theme ranges from 1058 Rail projects to over 600,000 Labour Market projects. Individual project values also vary considerably across themes: projects related to Transportation (EUR 1–3 million), Urban and Territorial Dimension as well as Culture Heritage and Tourism (EUR 200,000 each) are the largest, while Human Capital (EUR 6000), Labour Market (EUR 2200) and Energy (EUR 1250) projects are the smallest.

Additionally, the lower right map of Fig. 3 shows the theme group that receives the largest sum of project values in each region. For better readability, the map shows five groups of themes instead of all fifteen.Footnote 28 The largest sums in Poland and Croatia are related to Transportation projects. Energy and Environment projects are most important in Latvia, parts of the Czech Republic, two French regions, three Spanish regions and one Polish region. Contributions to Culture, Tourism and Social Infrastructure are most pronounced in some Czech regions and East Slovakia. Scotland, South Sweden, almost all of Italy and parts of Germany and France have their largest project sums related to Human Capital, the Labour Market and Social Inclusion. In the rest of Europe, the largest project sums are associated with the fifth group, which includes SME and Business Services, IT Services and Infrastructure, Capacity Building, Innovation & RTD as well as Urban and Territorial Dimension.

Comparing all less developed regions with all higher-income regions reveals differences in the average distribution of total project values across priority themes (see Table 4). For example, Road accounts for 17% of the sum of project values in less developed regions but only for 1% in higher-income regions. Moreover, the share targeted at Labour Market projects in richer regions exceeds that in poorer ones by 11 percentage points.

Industrial structure and firm size

The data matched from ORBIS allow us to examine the characteristics of the beneficiaries in more detail. Figure 4 shows the distribution of funds across NACE Rev. 2 industries: the left-hand panel illustrates total sums, while the right-hand panel shows the distribution of individual project values. The matching process and the coverage of ORBIS enable us to assign an industry to one third of observations, i.e., almost 700,000 projects.

Overall, Fig. 4 (left panel) shows that the projects with the highest sum of total values are carried out by firms or institutions operating in public administration, defence and social security (EUR 80 billion), transportation and manufacturing (EUR 30 billion each) and education (EUR 20 billion). The right-hand panel of Fig. 4 shows that single project volumes also vary across beneficiaries in different industries. While the median value of projects carried out by public beneficiaries (NACE Rev. 2 industry O) lies clearly below EUR 100,000, firms operating in the energy and water sectors carry out the projects with the highest median values (over EUR 200,000). Most of the latter projects concern electricity production, air conditioning, water supply and waste collection and treatment.

Fig. 4

Sum of total project values and single project values by NACE Rev. 2 industry. Notes: Left: sum of total project values by NACE Rev. 2 industry. Right: distribution of values per project; the dark horizontal line marks the median. A: agriculture, forestry, fishing; B: mining, quarrying; C: manufacturing; D: energy; E: water supply, sewerage, waste management; F: construction; G: wholesale and retail trade; H: transportation; I: accommodation and food services; J: information and communication; K: financial services; L: real estate services; M: professional, scientific and technical activities; N: administration and support activities; O: public administration, defence, social security; P: education; Q: human health and social work; R: arts, entertainment, recreation; S: other services; T: household services; U: activities of extraterritorial organisations. The figure covers 33% of observations and 51% of the sum of total values.

Beneficiary firms and institutions also differ in size. Table 4 shows the distribution of regional funds across four firm-size categories defined by ORBIS.Footnote 29 In terms of overall project sums, small firms account for the largest share (EUR 120 billion), followed by very large firms (EUR 80 billion) and medium-sized and large firms (EUR 50 billion each). Note that there are fifteen times as many small beneficiaries of structural funds as very large ones; small beneficiaries are especially strongly represented in projects co-financed by the ESF. In total, 85% of recipients in the database are small or medium-sized enterprises (SMEs), which reflects the priorities set out in the Community's strategic guidelines on cohesion. Since small firms outnumber very large ones fifteen-fold but receive only about 1.5 times their project sum, this may also indicate that the selected projects submitted by small and medium-sized firms are smaller in terms of single project values than those carried out by large firms.

Interestingly, Table 4 shows that while managing authorities in regions with an income above 75% of the EU average (developed regions) mostly select projects carried out by small firms, in less developed regions the volume of projects carried out by very large companies is higher. Moreover, the share of project values administered by SMEs in less developed regions is 17 percentage points lower than in higher-income regions. A similar heterogeneity emerges when comparing rural and urban regions: the share of project value carried out by very large firms in urban regions exceeds that in rural areas by more than 30 percentage points. This pattern may be driven by the fact that cities implement relatively many large projects.

5 Focus on the single beneficiary

Given the evidence of substantial variation in the total value of single projects, we investigate the potential determinants of this variable for the projects in our database. In particular, we hypothesise that project size is closely related to the following characteristics: (1) the funding instrument (ERDF, ESF or CF), as different funds typically co-finance different kinds of projects; (2) the objective of the corresponding OP (Convergence or Regional Competitiveness and Employment), which mirrors the development status of the region; (3) the managing authorities' focus on thematic priorities (themes), e.g., a road construction project will typically have a higher value than a training course for an individual employee; and (4) beneficiaries' characteristics such as size or industry. Regarding firm size, our hypothesis is that, on average, larger firms are capable of carrying out larger projects than smaller firms.

To this end, we estimate the following equation:

$$\begin{aligned} \ln(TV_i) = \alpha + \psi R_i + \gamma F_i + \delta O_i + \rho T_i + \phi I_i + \omega S_i + \varepsilon_i \end{aligned}$$
(4)

where \(\ln(TV_i)\) represents the logarithm of the total value of project i, \(R_i\) the region in which project i is located (i.e., a regional fixed effect), \(F_i\) the type of fund supporting project i, \(O_i\) the objective of the funding for project i, \(T_i\) the theme under which project i is supported, \(I_i\) the industry of the beneficiary and \(S_i\) the size of the beneficiary of project i. The variables \(R_i\), \(F_i\), \(O_i\), \(T_i\), \(I_i\) and \(S_i\) are factor variables; for each of them, the category with the median coefficient is the excluded reference category in the following tables. The coefficients should therefore be interpreted relative to the category in the middle of the (conditional) distribution of the respective variable, which is always shown in the results tables with a coefficient of zero. Significance levels are not shown, as they depend entirely on the chosen reference category and have no general meaning in this context.Footnote 30 We exclude projects with zero or negative total values, for which the logarithm is undefined, which leaves 482,040 observations for the estimation of Eq. (4).Footnote 31
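Choosing the median category as reference does not change the fitted model: with dummy coding, switching the reference category of a factor shifts all of its coefficients by the same constant (absorbed by the intercept α), while their standard errors, and hence significance levels, do change. A minimal sketch of this re-normalisation, assuming the coefficients of one factor, estimated against an arbitrary baseline, are given as a dictionary; the numbers are hypothetical and not taken from the results tables.

```python
from statistics import median_low


def recentre_on_median(coefficients):
    """Express factor-level coefficients relative to the median category.

    `coefficients` maps each category of one factor variable to its
    coefficient against an arbitrary baseline (the baseline maps to 0.0).
    Subtracting the (lower) median coefficient turns the median category
    into the reference, which then has a coefficient of exactly zero.
    Pairwise differences between categories are unchanged.
    """
    reference = median_low(coefficients.values())
    return {category: beta - reference for category, beta in coefficients.items()}


# Hypothetical fund coefficients estimated with ESF as the baseline.
fund_coefficients = {"ESF": 0.0, "ERDF": 1.6, "CF": 1.9}
recentred = recentre_on_median(fund_coefficients)
```

For an even number of categories, this sketch picks the lower of the two middle coefficients, which is an arbitrary choice.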

In this way, we are able to shed light on conditional differences in the volume of projects supported by different funds, under different objectives and themes, and with beneficiaries in different industries and of different sizes. Moreover, we can analyse whether unexplained residual variation in project volumes exists across countries and regions. This could have interesting policy implications: Locatelli et al. (2017) show for public procurement that funding larger projects creates more scope for corruption than tendering a larger number of smaller projects. Other analyses show that firm subsidies are not equally effective when granted to firms of different size (Criscuolo et al. 2012) and to firms located in different regions (Bachtrögler et al. 2018).

To take the heterogeneity of funding principles into account, we estimate Eq. (4) not only for the complete sample but also separately for, first, projects co-funded by (1) the ERDF and the ESF (structural funds) and (2) the CF. Second, we split the sample into projects carried out by (1) public and (2) private beneficiaries.Footnote 32 Finally, in a robustness check in which we measure firm size by the number of employees and sales volume and add firm age as a further control variable, we run regressions using sub-samples of projects carried out in (1) less developed and (2) developed regions, as well as (3) urban and (4) rural regions.

5.1 Type of fund, objective and themes

Controlling for the other variables included in Eq. (4), the projects with the smallest total value are those co-funded by the ESF (left panel of Table 5). They are more than 75% smaller than those co-funded by the ERDF and (on average) by the CF, if the projects that cannot be clearly assigned to the ERDF or the CF are counted as CF co-funded ones. This corresponds to the goals of the respective funds: the ESF mostly funds smaller projects related to the inclusion, training and adaptability of workers in the labour market and employment, while the ERDF and the CF aim at improving the economic structure and fundamentals of regions (European Commission 2017). Splitting the sample by sector reveals that CF projects carried out by public entities are on average twice as large as ESF projects. Relative to ESF project values, CF projects are (on average) even larger in the case of private-sector beneficiaries.
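Because the dependent variable in Eq. (4) is in logarithms, a coefficient gap β between two categories corresponds to a ratio of exp(β) in conditional project values, i.e., a difference of 100·(exp(β) - 1)%. Statements such as "more than 75% smaller" or "twice as large" can be translated into coefficient gaps with a small helper. This is a sketch of the standard log-linear interpretation (ignoring retransformation corrections), and the function names are hypothetical.

```python
import math


def pct_difference(beta):
    """Percentage difference in conditional project value for a log gap beta."""
    return 100.0 * (math.exp(beta) - 1.0)


def log_gap(ratio):
    """Coefficient gap (in log points) implying a given ratio of project values."""
    return math.log(ratio)


# "More than 75% smaller": value ratio below 0.25, i.e. a gap below about -1.386.
smaller_threshold = log_gap(0.25)
# "Twice as large": value ratio of 2, i.e. a gap of about 0.693.
twice_as_large_gap = log_gap(2.0)
```

Read the other way round, a coefficient of -1.5 relative to another category implies a value about 78% lower, since exp(-1.5) ≈ 0.22.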

Table 5 Regression coefficients with respect to type of fund and objective

Furthermore, controlling for everything else, Table 5 shows that projects corresponding to OPs with the Convergence objective are larger than projects under the Regional Competitiveness and Employment objective (right panel of Table 5).

Large differences in conditional project values also arise across project themes. Of the largest projects in the database (with a volume above EUR 50 million), 42% are associated with the three transport-related themes (Rail, Road, Other Transport). Consistent with the unconditional analysis in Sect. 4, it is not surprising that projects under the Labour Market, Human Capital and SME themes are among those with the lowest conditional project values; the Energy theme, however, which one might expect to be among those with the highest values, is also among the lowest (see Table 6). For this theme, there is large variation across subsets, and it turns out that Energy projects co-financed by the CF and carried out by private-sector (but not public-sector) beneficiaries are particularly small.

Table 6 Regression coefficients with respect to theme

Across all subsets, the largest co-financed projects are targeted at Road and Other Transportation infrastructure (and, in the case of the CF, at fostering the Urban and Territorial Dimension). Interestingly, relatively large investment projects in Rail systems appear to be co-financed by structural funds (not the CF) and implemented by private entities (not public ones). The same is true for projects with a relatively high conditional value in the fields of Social Infrastructure, Environment and Innovation & RTD.Footnote 33

Moreover, conditional differences in project values across themes do not necessarily correspond to those across industries. Notably, the lowest project values are observed for the Energy theme, whereas the highest conditional project values are observed for the energy industry (Sect. 5.2). The largest project in both the Energy theme and the energy industry is the same project, worth EUR 906 million. However, the Energy theme includes 7715 projects with a value of EUR 1250 each, and 64% of its projects have a value below EUR 5000. These projects below EUR 5000 are not carried out by firms in the energy industry but mainly by beneficiaries in “Real Estate activities” (59%), “Manufacturing” (11%) and “Wholesale/retail; repair vehicles” (8%).

5.2 Industry and beneficiaries’ characteristics

The highest conditional project values are attributed to beneficiaries in the Energy and Water industries (Table 7). The three largest project values of firms in the Energy industry are EUR 906 million, EUR 103 million and EUR 100 million, and only 37% of their projects have a total value below EUR 100,000. In the Wholesale industry, by contrast, only 12% of the projects have total values above EUR 100,000.

Table 7 Regression coefficients with respect to NACE-industry
Table 8 Regression coefficients with respect to the beneficiaries’ size

As expected, firm size also plays a role for the total project value (Table 8): overall, the larger the beneficiary, the larger the total value. However, the difference in project volumes is surprisingly small. The volume of a project carried out by a very large company is not even twice that of a medium-sized company, although the average revenue of the very large companies in our sample (EUR 653 million) is approximately 130 times that of the medium-sized companies (EUR 5 million). The pattern is similar when the sample is split by type of co-financing fund and by the sector in which the beneficiary operates.

5.3 Residual variation in single project values

Comparing the total value of single projects across NUTS-2 regions while controlling for the type of fund, objective, theme, industry and size of the beneficiary reveals differences of more than ±400%. The top and bottom twenty regions are presented in Table 9. The lowest conditional values per project are found in Austria, Spain, Estonia, Germany and Belgium, the highest in the UK, the Netherlands, Finland, Malta and Luxembourg.Footnote 34 Beneficiaries that are similar in size, operate in the same industry and receive money from the same fund under the same theme, but are located in Lower Austria (NUTS-2 region AT11), have on average project values about 8.5 times lower than beneficiaries located in the East of England (NUTS-1 region UKH). Indeed, in Lower Austria, 90% of projects are smaller than EUR 10,000 and 4688 projects are smaller than EUR 1000, whereas in the East of England the smallest project is already EUR 133,168 in size.

Table 9 Regression coefficients with respect to NUTS-2 region
Table 10 NUTS-2 regression coefficients for subsamples

Table 10 shows regional effects obtained by estimating Eq. (4) for subsets defined by project and beneficiary characteristics. The upper panel shows, conditional on the other control variables, the ten regions with the biggest and the ten with the smallest projects, separately for projects co-funded by the structural funds and by the Cohesion Fund. While the results for the structural funds broadly reflect the regional effects identified for the complete sample, the Cohesion Fund sample includes only a limited set of countries. Among the regions located in these lagging-behind countries, projects in Estonia, the Czech Republic and some parts of Spain are the smallest.

The lower panel of Table 10 indicates that the selection of private and public beneficiaries may differ across countries. Austrian regions and Estonia are in the bottom-ten group only for private firms, whereas the smallest conditional project values for public-sector beneficiaries are found in Spanish regions, one Italian region and two German regions. Interestingly, besides British and Dutch regions, projects by public institutions are relatively large in Helsinki-Uusimaa (FI1B), Finland, and those by private-sector firms in Malta.

Because Eq. (4) includes regional fixed effects, any regional influence on the remaining differences in single project values should be removed by construction. Figure 5 therefore shows the average residuals of a regression similar to Eq. (4) but excluding the regional control variables (\(R_i\)). Hence, the figure shows, by country, the average unexplained part of the project value after controlling for fund type, objective, theme, industry and firm size. It confirms that, among projects that are similar in all these dimensions except location, those with the lowest total values are found in Austria, Estonia and Spain, and those with the highest in Luxembourg, the Netherlands, Denmark and the UK.
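The logic behind Fig. 5 can be illustrated in a deliberately simplified setting. Assuming a single categorical control (here the theme), an OLS regression of ln(TV) on a full set of its dummies fits each category's mean, so the residuals are simply deviations from the theme mean; averaging them by country gives the country-level unexplained component. The regression in the text controls for fund type, objective, theme, industry and firm size jointly, which this sketch does not reproduce, and the data below are hypothetical.

```python
import math
from collections import defaultdict
from statistics import mean


def average_residuals_by_country(projects):
    """Average residual log project value by country, controlling for theme.

    `projects` holds (country, theme, total_value) tuples with positive
    values. With a single factor as the only regressor, OLS fitted values
    equal the theme means of ln(TV), so residuals are deviations from them.
    """
    logs_by_theme = defaultdict(list)
    for _, theme, value in projects:
        logs_by_theme[theme].append(math.log(value))
    theme_mean = {theme: mean(logs) for theme, logs in logs_by_theme.items()}

    residuals_by_country = defaultdict(list)
    for country, theme, value in projects:
        residuals_by_country[country].append(math.log(value) - theme_mean[theme])
    return {country: mean(res) for country, res in residuals_by_country.items()}


# Hypothetical projects: within each theme, UK values are ten times AT values.
projects = [
    ("AT", "Road", 1e5), ("UK", "Road", 1e6),
    ("AT", "Human Capital", 1e3), ("UK", "Human Capital", 1e4),
]
country_residuals = average_residuals_by_country(projects)
```

Here the AT projects are on average ln(10)/2 ≈ 1.15 log points below the theme means and the UK projects the same amount above, even though the themes themselves differ greatly in typical project size.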

Given the findings in Sect. 4 and since Luxembourg, the Netherlands, the UK and Denmark are relatively densely populated, one might suspect that population density drives this result. However, the relationship is not that clear. The residual for Finland, the least densely populated country in the EU, is also relatively high, while Germany, which is relatively densely populated, shows negative average residuals, i.e., below-average conditional project volumes. In light of Locatelli et al. (2017), it is worth noting that the countries with the largest residuals are characterised by relatively low levels of corruption and good institutions at the regional level (Charron et al. 2015). We therefore suspect that the variation may reflect historically grown funding strategies and other national institutional settings. Verifying this conjecture is left for future research.

Fig. 5

Comparison of residuals of regression of Eq. (4) by country

6 Conclusion

The novel database introduced in this study contains detailed information on over two million projects co-financed by the European Regional Development Fund, the European Social Fund and the Cohesion Fund in 25 EU member states in the programming period 2007–2013. In addition to project information such as the total value of each project and its project category (theme), the database matches the beneficiaries with the ORBIS business database.

This study shows that the intraregional distribution of funds follows different patterns across and within countries, both in terms of project and beneficiary characteristics. Moreover, the analysis indicates that managing authorities select different kinds of projects, with significantly different single project volumes, in urban and rural regions. The same holds for less developed versus other NUTS-2 regions, which appears to be linked to the priorities and regulations of certain funds (e.g., the CF) and of the main objectives.

In addition, we find that a much larger share of regional funds is dedicated to transportation infrastructure in less developed regions, whereas higher-income regions put a greater focus on fostering the Labour Market and Social Inclusion. In all regions, Innovation & RTD and Environment projects form a large share of the sum of project values. Regarding beneficiaries' characteristics, the largest share of the (unconditional) sum, on average around 40%, is allocated to projects carried out by small firms; this does not hold for less developed and urban regions.

In the econometric analysis, we examine the importance of certain project and beneficiary characteristics in determining a single project's size. The largest single projects in terms of total value are co-funded by the ERDF and the CF (as compared to the ESF) and fall under the Convergence objective (as compared to Regional Competitiveness and Employment). In line with the priorities of the different funding instruments, the largest projects are attributed to transportation infrastructure (Road, Rail and Other Transport). Regarding the beneficiary, larger firms (with higher revenues and more employees) carry out projects with higher total values; however, the average single project value of (very) large firms is only about twice as high as that of small entities. Even after controlling for all these characteristics, some variation in single project values remains unexplained, and this residual variation appears to differ across countries. We conclude that national institutional settings or traditional funding procedures may play a role.

We contribute to the academic and political debate by making visible a dimension of EU cohesion policy implementation that has so far received little attention. The possibility to compare individual projects' and beneficiaries' characteristics across heterogeneous regions and countries opens up new research questions, e.g., whether projects in a region are carried out by firms located in the same region and what determines this. Moreover, the data could feed into dynamic stochastic general equilibrium or other forecasting models that simulate potential policy outcomes under different scenarios. Finally, the analysis of this dataset may yield insights into more and less effective ways of distributing regional funds within European regions. In this respect, future research could also explore how the considerable residual variation in project size found in this paper is related to institutional settings in the respective countries as well as to the efficiency and effectiveness of EU cohesion policy.