ABSTRACT
This research presents several machine learning algorithms and prediction models to anticipate the European Structural and Investment Funds (ESIF) application in different European Union (EU) countries. These analyses start with data training from 2014 to 2020 ESIF, to test and predict the application of the future ESI Funds for 2021–2027. We deliver an analysis focused on the priorities of each fund, highlighting the differences between the programs in different time periods. In the framework of the European Regional Development Fund (ERDF), we will specifically address the assessment of the following themes: support innovation of small and medium-sized businesses, to greener, low-carbon, and resilient projects with enhanced mobility. In what concerns the European Social Fund (ESF), we will evaluate projects that promote and increase the EU’s employment, social, education, and skills policies, including structural reforms in these areas. Regarding the cohesion funds (CF), we will be targeting the improvements between the two ESIFs, looking at projects in the field of environment and trans-European networks in the area of transport infrastructure (TEN-T). In summary, we will be looking at the future of ESIF through the glasses of artificial intelligence.
You have full access to this open access chapter, Download conference paper PDF
Similar content being viewed by others
Keywords
1 Introduction
The studies available on EU funds have mainly been focused on their implementation by country (Nigohosyan & Vutsova, 2017), by region (Iribas & Pavia, 2010), or even by program (Andrade, 2016). The previous studies focus on the different levels of investments made or on their percentage use in local projects (De Iuliis, 2016). Despite the prolific methods available to perform the evaluation of ESIF, predictive analytics has not yet been applied in this context.
In this framework, this work aims at proposing a broader and higher perspective regarding the use of predictive analysis. Towards this end, the analysis performed herein is done by trying to predict the total investment that will occur during the implementation of the next programming period of 2021–2027, according to the data available for the previous programming period of 2014–2020. Therefore, the purpose is to predict the total amount that will be devoted to each fund based on the difference between the total investment funds spent and the total amount planned, for each fund, i.e., ERDF, ESF, Youth Employment Initiative (YEI), CF, European Agricultural Fund for Rural Development (EAFRD), and European Maritime and Fisheries Fund (EMFF).
An artificial intelligence tool has been used, specifically the Rapidminer, to test the best prediction model for each ESIF. In this way, it will be possible to anticipate the amount that should be assigned, thus decreasing the difference between the amount spent and planned.
This paper is structured as follows. Section 2 delivers the literature review on the subject. Section 3 briefly goes through the main underpinning assumptions of the methodology employed. Section 4 provides the description of the implementation of the models. The discussion of results is given in Sect. 5. Finally, conclusions are drawn, highlighting the main limitations found and future work developments.
2 Literature Review
Several authors underline the relevance and contribution of big data analytics and the use of machine learning in predictive analytics, reinforcing the role and contribution of decision-making based on business environments (Lismont et al., 2017; Meyer et al., 2019; Psarras, et al., 2020). Quite important also is the methodology and all the processes to achieve the results. Ge et al. (2017), describe all the required processes for extracting the dataset from the database, namely the analysis of the metadata for data preparation and exploration. Only after the dataset identification and preparation, the regression model is selected to perform the analysis. Several hypotheses could be extracted from the application of the different regression methods. In this context, Linear Regression, Support Vector Regression (SVR), Artificial Neural Network (ANN), k-Nearest, Neighbors (k-NN), and M5 model tree are some of the regression models used according to the main goal, which could be employed to model the relationship between the dependent and independent variables. Each model has its own merits and demerits. If the main concern is to maintain the error framed to a short interval, the SVR should be used (Hotzlast, 2022). Although the CRISP-DM is not new, it is a model quite tested and serves as the main structure for the data science process.
3 Methodology
The predictive analytics process described uses the CRISP-DM process model is the most used and common data science process. The step-by-step analysis of the CRISP-DM focuses its attention on the different predictive models. In many cases, it will be the user, not the data scientist, who will carry out the deployment steps. He/she will test the model application for his/her business values (i.e., model hyperparameters). This means that the model should be generic enough for the adaptation to different business variables.
3.1 Rapidminer Automation Procedure
To test the most used predictive methods, the Rapidminer was used as a data science platform that allows data engineering, model building, and machine learning operation, among others. It allows the application of the CRISP-DM model. Therefore, it was used to do the prediction analyses and simulations for each of the ESIFs. This tool has a two-phase automation procedure: the TurboPrep for data preparation and the AutoModel, to test and simulate the different prediction models. For the first 3 phases of the CRISP-DM model, business understanding, data understanding, and data preparation were applied for each of the EU funds, i.e., ERDF, CF, ESF, EAFRD, and EMFF.
Then, after the dataset preparation, the prediction models are simulated and tested using the AutoModel, in order to fulfill all the process modeling phases.
4 Implementation Models
4.1 Data Preparation of EU Funds Using Rapidminer TurboPrep
The first step was to collect each dataset for each fund. The data source was directly obtained from the European Commission data center. The preparation was initiated by identifying every attribute and its meaning in the dataset. This corresponds to step two of the CRISP-DM model, as given below:
Ms—country initials.
Programme Title—program name.
TO_short—main program thematic objective.
National_Amount_planned—investment planned for each country.
Total_Amount_planned—investment planned with all the contributions.
Year—year id of the fund.
EU_co_financing—percentage of European financing over total investment.
Total_eligible_cost_decided_(selected)—total investment costs of all programs for each country.
Total_eligible_spending—after eligible costs, what was spent.
Reference Date—when the investment was available.
Some attributes were eliminated because of their redundancy, like the country’s name and the program’s acronym. After that, the metadata analysis table was obtained—see Fig. 1.
From Fig. 2 it can be seen that a new attribute was created for this search, calculated by the difference between the total amount spent and the total amount planned. This attribute “Implemented—Planned 20,142,020” was defined for all the countries during all years between 2014 and 2020.
A summary of the distribution of the new variable can be shown in Fig. 3 and details on the statistics of central tendency are obtainable as well.
4.2 Data Modeling of EU Funds Using RapidMiner AutoModel
The next phase of the CRISP-DM involves modeling and simulation for the ERDF. In this framework, the Auto model selects and executes all the predictive models available in its library. The dataset of the ERDF with the new attribute was tested with all possible predictive models as shown in Figs. 4, 5 and 6.
The same procedure was done for the other five EU funds applying all the steps of the CRISP-DM model with the TurboPrep for data preparation and exploitation, followed by the modeling analysis on the Automodel.
4.3 Simulation
The Automodel enables choosing the best prediction model, with the best results and lower relative error, by simulating, for each model, the best value for the implementation, which is the difference between the total amount spent and the total amount planned. For the ERDF, the following values were achieved to consider the dependent variables according to Figs. 7, 8 and 9.
All the phases of the CRISP-DM model were repeated for the other five EU funds. In the case of the ESF, the YEI, and CF the simulator presented the best prediction model, which was the Generalized Linear Model, while for the remaining funds the best prediction model was the decision tree. From the values previously achieved it is possible to build a prediction table with all the funds, the best prediction algorithm, and their results—see Table 1.
The values presented above (Table 1), validate that there are some funds whose execution differs quite significantly from the amount planned (i.e., present high negative values). Special attention should be given particularly to the execution of EMFF. Although some priorities for the 2021–2027 ESIF are different and have been reduced from eleven to five objectives, this research proposes a method to reduce the total amounts planned.
5 Conclusions and Further Research
By using the CRISP-DM and an artificial intelligence tool like Rapidminer it is possible to achieve some predictions for the next ESIF 2021–2027. Further and deeper research should be taken for each country following the new priorities and using machine learning algorithms for better predictions. A quite interesting study could be developed, in the future with the introduction of new variables, accounting for the impact of the war in Ukraine and the inflation due to the lack of materials, mainly electronic components. Nowadays, these two facts are a concern and will affect the first five priorities and project goals of ESIF. Future work should also involve a similar analysis per each thematic objective.
References
Andrade, P. (2016). Financiación de proyectos culturales con fondos de la Política de Cohesión Europea: Análisis y experiencias en Andalucía 2007–2013.
De Iuliis, C. (2016). The European territorial cooperation. Analysis of results in the seven-year programming period 2007–2013 and the next new programming strategies. Juridical Current, 19(4), 85–92. 8p.
Ge, Z., Song, Z., Ding, S. X., & Huang, B. (2017). Data mining and analytics in the process industry: The role of machine learning. IEEE Access 520590-20616 8051033, https://doi.org/10.1109/access.2017.2756872
Hotzlast, N. (2022). What is CRISP DM Life Cycle. Retrieved June 29, 2022, from https://www.datascience-pm.com/crisp-dm-2/
Iribas, B., & Pavia, J. (2010). Classifying regions for European development funding. European Urban and Regional Studies, EURO-COMMENTARY.
Lismont, J., Vanthienen, J., Baesens, B., & Lemahieu, W. (2017). Defining analytics maturity indicators: A survey approach. International Journal of Information Management, 37(3). S0268401216305655, https://doi.org/10.1016/j.ijinfomgt.2016.12.003.
Meyer, H., Reudenbach, C., Wöllauer, S., & Nauss, T. (2019). Importance of spatial predictor variable selection in machine learning applications–Moving from data reproduction to spatial prediction. Ecological Modelling, 411108815-S0304380019303230 108815. https://doi.org/10.1016/j.ecolmodel.2019.108815
Nigohosyan, D., & Vutsova, A. (2017). The 2014–2020 European Regional Development Fund Indicators: The Incomplete Evolution. Springer.
Psarras, A., et al. (2020). Applying the Balanced Scorecard and Predictive Analytics in the Administration of a European Funding Program. Administrative Science, MDPI.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Copyright information
© 2023 The Author(s)
About this paper
Cite this paper
Santos, V. (2023). European Structural and Investment Funds 2021–2027: Prediction Analysis Based on Machine Learning Models. In: Henriques, C., Viseu, C. (eds) EU Cohesion Policy Implementation - Evaluation Challenges and Opportunities. EvEUCoP 2022. Springer Proceedings in Political Science and International Relations. Springer, Cham. https://doi.org/10.1007/978-3-031-18161-0_11
Download citation
DOI: https://doi.org/10.1007/978-3-031-18161-0_11
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-18160-3
Online ISBN: 978-3-031-18161-0
eBook Packages: Political Science and International StudiesPolitical Science and International Studies (R0)