1 Introduction

The United Nations Climate Conference of 2015 put forward the Paris Climate Agreement, which came into force in 2016. Under this agreement, 194 countries committed to reducing greenhouse gas emissions in order to limit the rise in global temperature to below two degrees Celsius (United Nations Climate Action 2023). The European Union (EU) ratified the Paris Climate Agreement in 2016. The need to cut CO2 emissions has gained importance among nations, in particular in Europe, which has been a forerunner, committing to cut greenhouse gas emissions by at least 55% by 2030 and to make the EU climate neutral by 2050. To achieve these goals, the EU has, among other measures, put forward green finance standards, strengthened the EU emissions trading system and promoted climate-friendly innovation (European Council 2023).

The importance of well-developed institutions for achieving climate targets has been highlighted by international bodies such as the United Nations Climate Change Programme (2020); by representatives such as Janine Felson, Deputy Permanent Representative of the Permanent Mission of Belize to the United Nations, and Brice Böhmer of Transparency International (United Nations Climate Change Programme 2020); and by academics (Acemoglu et al. 2003; Rodrik et al. 2004), among others. All emphasize that, in the absence of strong institutions, climate goals would be neither realized nor sustained. This brings us to the question of how institutions affect climate change.

Therefore, employing a Stochastic Impacts by Regression on Population, Affluence and Technology (STIRPAT) model, we extend the environmental Kuznets curve (EKC) hypothesis using institutional and other variables, in an attempt to understand the nexus between institutions and carbon emissions. The STIRPAT model was put forward to overcome the shortcomings of the IPAT model, which was introduced in an attempt to understand the influence of changes in population (P), affluence (A) and technology (T) on environmental impacts (I) (Ehrlich and Holdren 1971; Dietz and Rosa 1994). The STIRPAT model has the advantage of identifying nonlinear forces that drive the environment and can also be extended to incorporate other variables. We apply the ridge and lasso (least absolute shrinkage and selection operator) regression methods, which are based on machine learning, to estimate the extended STIRPAT model, as these methods provide more effective results than standard regression methods. Ridge and lasso regressions are two widely used shrinkage methods (Hoerl 1962; Tibshirani 1996). Both methods help to address multicollinearity, deal with overfitting and increase the accuracy of predictions in the statistical model.

In summary, the present study aims to contribute to the literature in two ways. (1) We employ an extended STIRPAT model incorporating the environmental Kuznets curve (EKC + STIRPAT) with the panel fixed effects ridge and within-effects ridge (Shehata 2013a, b) and the fixed effects lasso estimation approaches (Ahrens et al. 2020). Most ridge studies employ time series analysis, whereas we work with panel data. The majority of studies which investigate the relation between institutions and environmental pollution examine the role of institutions within an environmental Kuznets curve (EKC) framework (Leitão 2010; Abid 2017; Sinha et al. 2019; Hassan et al. 2020), or alternatively, they investigate the role of institutions among a number of other variables, or the interaction of institutions with other variables (Lau et al. 2014; Ibrahim and Law 2016; Lægreid and Povitkina 2018; Arminen and Menegaki 2019). Few studies, and only those in an older literature (Halkos and Tzeremes 2013; Goel et al. 2013), examine the direct effect of institutions on environmental pollution. The discussion on climate change has become increasingly important in recent years, in particular after the Paris Climate Agreement, necessitating a study on the relation between institutions and climate change.

(2) We carry out the study in the context of the EU. As mentioned above, the EU has been progressive in taking measures to deal with climate change. Given the heterogeneity in this group of countries, we divide the 22 EU countries into two groups (Panels A and B) according to their institutional structures (those above and below the mean in the sample of countries examined). This division allows us to investigate whether the impact on the environment is limited in countries with weaker institutions.

There are several channels through which strong institutions can lead to lower carbon emissions: One is through the distribution of resources, better budgeting and investment in climate-related goals (Acemoglu et al. 2004; World Bank 2021). Countries with strong institutions will channel resources to fight climate change. Two, nations with strong institutions will hold their governments accountable in meeting climate goals (World Bank 2021). Three, countries with stronger institutions will have better coordinated and consistent environmental policies that are aligned with climate targets. Four, stronger institutions will ensure greater transparency in project approvals and the channeling of funds. Five, countries with strong institutions will have oversight committees and legal frameworks to promote climate action (United Nations Climate Change Programme 2020).

The findings of our study indicate that the factors influencing CO2 emissions differ between the two groups of EU countries, that is, those with stronger and weaker institutions. Our results suggest that variables such as primary energy consumption, population density and urbanization are more important determinants of CO2 emissions in the EU compared to the institutional variables.

The rest of this paper is structured as follows. Section 2 discusses the literature. Section 3 presents the data, model and methodology. Section 4 evaluates the empirical results, and Section 5 discusses the empirical results and concludes.

2 The literature

As stated above, many studies examine the relation between institutions and the environment within a Kuznets curve framework. Tamazian and Rao (2010), employing the generalized method of moments (GMM) in a study of the relation between economic development and environmental quality that incorporates the financial sector and institutional quality for 24 transition nations, find support for the EKC hypothesis. They note that both institutions and financial sector development are important for improving environmental quality. Leitão (2010) examines the effects of corruption on the income level at the turning point of the relation between sulfur emissions and income, employing panel data and instrumental variable methods. The author notes that the degree of corruption in a country is positively related to the threshold level of income beyond which emissions fall. Higher levels of corruption are found to delay the government's imposition of environmental laws. Abid (2017) tests the environmental Kuznets curve (EKC) hypothesis for a group of 58 Middle East and African and 41 European countries, using the generalized method of moments (GMM) over the 1990 to 2011 period. Improving the quality of domestic institutions is found to have direct and indirect effects on reducing CO2 emissions in the countries investigated. Examining the effects of corruption on carbon emissions within an environmental Kuznets curve framework for Brazil, Russia, India, China and South Africa (BRICS) and the Next 11 countries over the period 1990 to 2017, Sinha et al. (2019) note that corruption increases environmental degradation. They employ the time series methodologies of FMOLS and the Westerlund and Edgerton (2007) cointegration tests, together with GMM estimation. Similarly, Hassan et al. (2020) explore the relation between economic growth and carbon dioxide (CO2) emissions, taking into account the impact of governance on environmental quality within an environmental Kuznets curve framework for the BRICS nations over 1996 to 2017. Employing Westerlund panel cointegration tests and Driscoll–Kraay panel regression estimation, they find that good governance plays an important role in reducing carbon emissions.

Employing spatial econometrics, Hosseini and Kaneko (2013) investigate whether the environmental performance of nations spreads spatially to neighboring countries via the spillover of institutional quality. Employing data for 129 countries over the 1980–2007 period, they find evidence of an institutional spatial spillover effect. In a study of the relationship between CO2 emissions, institutional quality, economic growth and exports in Malaysia over the 1984–2008 period using Granger causality tests, Lau et al. (2014) observe the existence of a long-run relationship between the variables. Strong institutions are found to be important for reducing CO2 emissions directly and indirectly. Employing GMM estimation, Ibrahim and Law (2016) investigate the association between institutional quality, trade and their interactions in explaining CO2 emissions in a group of 40 sub-Saharan African nations. They argue that enhancements in institutional quality lead to environmental improvements. Dincer and Fredriksson (2018), using system GMM methods, find that the level of trust plays an important role in the influence of corruption on public policy outcomes, in particular environmental policies. Using the dynamic common correlated effect (DCCE) model of Chudik and Pesaran (2015), Lægreid and Povitkina (2018) examine the association between GDP per capita and carbon emissions, taking into account political institutions as a moderating factor. They find little support for the moderating effect of political institutions on the environment. Arminen and Menegaki (2019) explore the causal relationships between economic growth, carbon dioxide emissions and energy consumption in high- and upper-middle-income countries, employing a simultaneous equations model and data over 1985 to 2011. They find that variations in institutional quality have little influence on energy and environmental policies. Employing a cross-sectional augmented autoregressive distributed lag approach, Khan et al. (2021) examine the effect of fiscal decentralization on carbon emissions in seven OECD countries over 1990 to 2018. They also examine the role of institutions and human capital in the impact of fiscal decentralization on carbon emissions. They note that the association between fiscal decentralization and environmental quality strengthens in the presence of strong institutions and developments in human capital.

Only the studies of Halkos and Tzeremes (2013) and Goel et al. (2013) investigate the direct effects of institutions on CO2 emissions. Employing nonparametric methods to explore the carbon dioxide emissions–governance nexus for the G-20 over 1996–2010, Halkos and Tzeremes (2013) find a nonlinear relationship between CO2 emissions and the governance measures of the countries under investigation. They observe differences across countries in the governance–CO2 emissions relation and argue that it depends on the level of development of the country and country-specific factors. Governance factors do not necessarily lead to lower CO2 levels according to them. Making use of data for over 100 countries over the period 2004–2007, Goel et al. (2013) investigate the impact of institutional quality on environmental pollution in the MENA countries, paying attention to the influence of corruption and the shadow economy. The results suggest that countries that are more corrupt and countries with large shadow economies have fewer recorded emissions. This is an older literature which does not cover the more recent period during which there has been an increasing focus on the environment. A recent study by Oyewo et al. (2024) examines the effect of country governance systems on carbon emissions performance in 36 top multinational corporations over 15 years. At the macro-level, the results indicate that the control of corruption and voice and accountability are significantly and negatively related to carbon emissions while political stability and government effectiveness have a significant positive impact on the carbon emissions rate. Regulatory quality and the rule of law are found to be insignificant.

They argue that the effect of country governance on the carbon emissions performance of MNEs depends on the country, jurisdiction and geographical regions.

Hence, we extend these studies by examining the direct association between institutions and CO2 emissions in the EU nations, dividing the countries into two groups, those with high and those with low institutional quality. In contrast to other studies, we further employ the STIRPAT model and novel machine learning ridge and lasso regression algorithms to estimate our model.

3 Data, model and methodology

This section discusses the variables, the model and the econometric methodology that we use in our study. The study is carried out on a sample of 22 EU countries. We divide the 22 EU countries into two groups (Panels A and B) according to their institutional scores, that is, those above and below the mean level in the sample. Panel A comprises the 10 countries above the mean score: Austria, Belgium, Denmark, Finland, France, Germany, the Netherlands, Sweden, Ireland and Estonia, which are referred to in the study as those with high institutional quality. Panel B comprises the 12 countries below the mean score: Croatia, Cyprus, Czech Republic, Greece, Hungary, Italy, Lithuania, Poland, Portugal, Slovakia, Slovenia and Spain, which are referred to as those with low institutional quality.

3.1 Data and variables

The data are annual, covering the years 2002–2020. Table 1 presents variable definitions, units, data sources, descriptive statistics and unit root results. While determining the data range, we consider the longest possible timeframe for creating a balanced panel.

Table 1 Data, descriptive statistics and unit root tests

Although many indicators (sulfur emissions, carbon intensity, ecological footprint, environmental performance score, greenhouse gas emissions, etc.) have been employed to measure environmental degradation, the most commonly used is carbon dioxide emissions (Tamazian and Rao 2010; Halkos and Tzeremes 2013; Goel et al. 2013; Abid 2016), which is hence our preferred measure of the environment and our dependent variable. We also use greenhouse gas emissions as an alternative dependent variable.

Our main explanatory variables of interest are the World Bank governance indicators developed by Kaufmann and Kraay (2023), which we use to capture institutions. The governance indicators can be divided into three groups: (i) economic governance (government effectiveness and regulatory quality), (ii) political governance (political stability, voice and accountability) and (iii) institutional governance (control of corruption and the rule of law). These six dimensions of governance are based on over 30 underlying data sources, which are rescaled and combined to create the six aggregate indicators mentioned (Kaufmann and Kraay 2023). The indicators range from −2.5 (low governance) to +2.5 (high governance).

These variables can affect the environment in several ways. First, corruption can affect CO2 emissions through false or incomplete reporting. Second, through the relaxation of environmental controls (Goel et al. 2013; UNEP 2013), corruption can lead to a fall in the quality of the environment and the overuse of resources (Halkos and Tzeremes 2014; Krishnan et al. 2013). Third, corruption could delay the implementation of environmental policies and regulations. Finally, it could lead to a misallocation of funds for environmental management (Lv and Gao, 2021).

Government effectiveness can lead to policies that are independent of political pressure (Abid 2017). Government effectiveness also involves an effective public service and bureaucratic structure which can reduce CO2 emissions (Khan et al. 2021).

Regulatory quality can similarly influence the environment (Gani 2012) through government policies that permit and encourage private-sector development and have beneficial effects on an economy (Halkos and Tzeremes 2013). If market entry and exit are not guided by hidden fees, arbitrary taxation or unnecessary laws, regulatory quality can lead to environmental benefits (Gani 2012).

A strong rule of law will ensure secure property rights (Lau et al. 2014), and an effective legal mechanism can minimize the effect of market failures (Liu et al. 2020) and provide an effective framework for protecting the environment (Danish et al. 2019; Hassan et al. 2020). Economic agents may not comply with the provisions of an environmental contract if no enforcement mechanism is in place (Gunningham 2011; Gani 2012; Abid 2016).

In nations with greater voice and accountability, the public will hold the government responsible for effective environmental management, the enforcement of strong policies and regulations, while political stability is important for committing to and meeting sustainable goals (Deng et al. 2024). Thus, we expect government effectiveness, regulatory quality, the rule of law, the control of corruption, voice and accountability and political stability to reduce carbon dioxide emissions (Khan et al. 2021; Liu and Dong 2021).

Per capita GDP (PP) is used in the estimation as a measure of economic prosperity, and it is likely to be associated with more emissions (Goel et al. 2013; Krishnan et al. 2013). Economic growth is closely linked to production and consumption, and hence to energy use. Alternatively, countries with higher incomes could enforce better measures to control environmental pollution. The environmental Kuznets curve (EKC) hypothesis of Grossman and Krueger (1991, 1995) helps explain the correlation between GDP and pollution. This theory, which was modified by the authors, is crucial for environmental analysis, as it demonstrates the connection between environmental performance and income. Grossman and Krueger's (1991, 1995) research showed an inverted U-shaped relationship between per capita income and air pollution. This evidence of nonlinearity supports the EKC hypothesis, which posits that environmental improvements will occur after a certain level of economic growth has been reached. According to the EKC, the GDP per capita coefficient is expected to be positive, while the coefficient on the square of GDP per capita is expected to be negative. Following Grossman and Krueger (1991, 1995), Thio et al. (2022), Huang et al. (2021), Nan et al. (2022), Gani (2021) and Xing et al. (2023), the affluence variable is measured as per capita GDP. In line with the EKC theory, a quadratic term in GDP per capita is also included in the empirical analysis.
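The sign expectations of the EKC can be made concrete with a small calculation. With a positive coefficient on income and a negative coefficient on squared income, fitted emissions peak where the derivative with respect to income equals zero. The sketch below is illustrative only: the function name and the coefficient values are hypothetical and are not estimates from this study.

```python
def ekc_turning_point(beta_income: float, beta_income_sq: float) -> float:
    """Income level at which fitted emissions peak for
    emissions = b0 + beta_income * y + beta_income_sq * y**2.

    An inverted U requires beta_income > 0 and beta_income_sq < 0.
    """
    if not (beta_income > 0 and beta_income_sq < 0):
        raise ValueError("coefficients do not describe an inverted U")
    # Set the derivative beta_income + 2 * beta_income_sq * y to zero.
    return -beta_income / (2.0 * beta_income_sq)


# Hypothetical coefficients, chosen only to illustrate the arithmetic.
turning_point = ekc_turning_point(beta_income=4.0, beta_income_sq=-0.5)
print(turning_point)
```

If the income variable enters in logarithms, the same expression gives the turning point in log units, and its exponential gives the income level.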

Urbanization (Urb) and population (Pop) are employed as they have contributed to a significant increase in the demand for energy (environmental degradation increases due to increased energy demand) (Sinha et al. 2019; Gani, 2021). Energy consumption (EN) is still largely met by fossil sources, and moving away from clean energy sources has resulted in increased pollution (Goel et al. 2013; Krishnan et al. 2013). Therefore, energy consumption is used as an explanatory variable in our model. Industrialization (Ind) affects the level of emissions through three different mechanisms. The first is the scale effect (depending on the intensity of the industry), the second is the composition effect (depending on industry effects), and the third is the technique effect (depending on technological developments). The differences between these three conditions can positively or negatively affect carbon emissions (Hosseini and Kaneko 2013). Therefore, industrialization is used as a control variable in our empirical analysis.

Value added in industry and manufacturing may lead to higher pollutant emissions (Krishnan et al. 2013; Lv and Gao, 2021). This is captured by manufacturing value added (Man). International trade can affect emissions through its influence on domestic economic activity, especially when greater trade results in emission-related production (Tamazian and Rao 2010; Abid 2017; Goel et al. 2013). This is captured by trade as a percentage of GDP (Op).

Studies have argued that there may be a direct relationship between public expenditures and environmental pollution (Özmen et al. 2022). For example, clean energy incentives can reduce emissions and thereby lower pollution (Halkos and Paizanos 2017). Alternatively, government expenditure on energy-intensive activities can increase environmental pollution. Therefore, we use government expenditure as a percentage of GDP (GX) to capture this. Foreign direct investment (FDI) can have one of two effects on environmental performance: under the pollution haven effect, a country attracts foreign polluting industries, whereas under the pollution halo effect, FDI brings efficient technologies that yield environmental benefits (Zakaria and Bibi 2019; Sabir et al. 2020). This is captured by FDI as a percentage of GDP.

3.2 Standard STIRPAT model

CO2 emissions may vary according to the level of technology in a country, welfare, energy structure, economic structure and population. Many factors affect CO2, but modeling all of these variables can be difficult. Ehrlich and Holdren (1971) put forward the IPAT model which can be expressed as follows:

$$I={\text{PAT}}$$
(1)

where I represents the environmental pressure index (such as CO2, greenhouse gases and ecological footprint), P is the size of the population, A is affluence and T is technological progress or the effect per unit of economic activity. According to this model, environmental impacts are the product of population, affluence and technological progress. However, this model was difficult to use for hypothesis testing (Fan et al. 2006). To overcome these limitations, Dietz and Rosa (1994) and York et al. (2003) extended the IPAT model to create the STIRPAT model (Gani 2021).

Dietz and Rosa (1994) and York et al. (2003) introduced STIRPAT (for Stochastic Impacts by Regression on Population, Affluence and Technology) to explain the factors affecting environmental pollution. The specification of the standard STIRPAT model is:

$${I}_{i}=a{P}_{i}^{b}{A}_{i}^{c}{T}_{i}^{d}{\varepsilon }_{i}$$
(2)

The model keeps the multiplicative logic of the equation I = PAT, treating population (P), affluence (A) and technology (T) as the determinants of environmental change (I), where a is a parameter to be estimated; b, c and d indicate the coefficients of population, affluence and technology, respectively; and \(\varepsilon \) is the error term.
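Taking logarithms of Eq. (2) gives ln I = ln a + b ln P + c ln A + d ln T + ln ε, which is linear in the parameters, so b, c and d can be estimated by regression and read as elasticities. The following minimal sketch illustrates this on synthetic data; the parameter values, sample size and variable ranges are assumptions made purely for illustration and are unrelated to the data used in this study.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Synthetic population, affluence and technology (illustrative only).
P = rng.uniform(1.0, 10.0, n)
A = rng.uniform(1.0, 10.0, n)
T = rng.uniform(1.0, 10.0, n)

# Hypothetical parameters a, b, c, d of Eq. (2) and a multiplicative error.
a, b, c, d = 2.0, 1.0, 0.5, 0.3
eps = np.exp(rng.normal(0.0, 0.01, n))
I = a * P**b * A**c * T**d * eps

# Log-linear form: ln I = ln a + b ln P + c ln A + d ln T + ln eps.
X = np.column_stack([np.ones(n), np.log(P), np.log(A), np.log(T)])
coef, *_ = np.linalg.lstsq(X, np.log(I), rcond=None)
ln_a_hat, b_hat, c_hat, d_hat = coef
print(np.exp(ln_a_hat), b_hat, c_hat, d_hat)
```

The recovered exponents are close to the values used to generate the data, which is what makes the log-linear form convenient for hypothesis testing.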

3.3 An extended STIRPAT (EKC + STIRPAT) hybrid model

York et al. (2003) pointed out that several factors (e.g., political regime, culture) could be added to the standard STIRPAT model as long as they were theoretically appropriate. In recent years, the STIRPAT model has been widely used by researchers. Some researchers have applied the STIRPAT model to study the relationships between CO2 emissions and various variables, extending the model to incorporate additional variables. For instance, researchers have included variables such as urbanization, age structure and education level (Wen and Zhang 2020); foreign direct investment and energy structure (Wen and Shao 2019); and trade openness, environmental regulation, fixed assets investment, primary industry proportion, fossil fuel electricity production, forest cover, institutional quality, agricultural activity and industrial activity (Gani 2021).

Starting from the standard STIRPAT model, we proceed with the following functional definition of the EKC + STIRPAT model to investigate the effects of institution structures on CO2 emissions in the 22 EU countries:

$${\text{CO}}_{2}=f\left({\text{PP}},{\text{PP}}^{2},{\text{DEI}},{\text{EN}},{\text{Op}},{\text{EA}},{\text{GX}},{\text{FDI}},{\text{IQ}}\right)$$
(3)

CO2 represents carbon dioxide emissions from fossil fuels and industry. PP is per capita gross domestic product, PP2 is square of PP, DEI is demographic influence (urban population (Urb) and population density (Pop)), EN is primary energy consumption per capita, Op is trade, EA is industry value added (Ind) and manufacturing value added (Man), GX is government expenditure, and FDI is foreign direct investment net inflows. IQ is institutional quality. IQ consists of the control of corruption (CC), government effectiveness (GE), rule of law (RL), regulatory quality (RQ), political stability (PS) and voice and accountability (VA).

When these individual variables are included in Eq. 2, the expanded EKC + STIRPAT form represented by Eq. (4) is:

$$\begin{aligned} {\text{CO}}_{2i,t} &= a\cdot {\text{PP}}_{i,t}^{\sigma_{1}} \cdot \left({\text{PP}}_{i,t}^{2}\right)^{\sigma_{2}} \cdot {\text{Urb}}_{i,t}^{\sigma_{3}} \cdot {\text{Pop}}_{i,t}^{\sigma_{4}} \cdot {\text{EN}}_{i,t}^{\sigma_{5}} \cdot {\text{Ind}}_{i,t}^{\sigma_{6}} \cdot {\text{Man}}_{i,t}^{\sigma_{7}} \cdot {\text{GX}}_{i,t}^{\sigma_{8}} \cdot {\text{Op}}_{i,t}^{\sigma_{9}} \cdot {\text{FDI}}_{i,t}^{\sigma_{10}}\\&\quad \cdot {\text{CC}}_{i,t}^{\sigma_{11}} \cdot {\text{GE}}_{i,t}^{\sigma_{12}} \cdot {\text{RL}}_{i,t}^{\sigma_{13}} \cdot {\text{RQ}}_{i,t}^{\sigma_{14}} \cdot {\text{PS}}_{i,t}^{\sigma_{15}} \cdot {\text{VA}}_{i,t}^{\sigma_{16}} \cdot \varepsilon_{i,t}\end{aligned} $$
(4)

The extended model based on the standard EKC + STIRPAT model is:

$$ \begin{aligned} {\text{CO}}_{2i,t} & = \sigma_{0} + \underbrace {{\sigma_{1} {\text{PP}}_{i,t} + \sigma_{2} {\text{PP}}^{2}_{i,t} }}_{{{\text{Affluence}}\;{\text{Effect}}}} + \underbrace {{\sigma_{3} {\text{Urb}}_{i,t} + \sigma_{4} {\text{Pop}}_{i,t} }}_{{{\text{Population}}\;{\text{Effect}}}} \\ & \quad + \underbrace {{\sigma_{5} {\text{EN}}_{i,t} + \sigma_{6} {\text{Ind}}_{i,t} + \sigma_{7} {\text{Man}}_{i,t} }}_{{{\text{Technology}}\;{\text{Effect}}}} + \underbrace {{\sigma_{8} {\text{GX}}_{i,t} + \sigma_{9} {\text{Op}}_{i,t} + \sigma_{10} {\text{FDI}}_{i,t} }}_{{{\text{Related}}\;{\text{Literature}}}} \\ & \quad + \underbrace {{\sigma_{11} {\text{CC}}_{i,t} + \sigma_{12} {\text{GE}}_{i,t} + \sigma_{13} {\text{RL}}_{i,t} + \sigma_{14} {\text{RQ}}_{i,t} + \sigma_{15} {\text{PS}}_{i,t} + \sigma_{16} {\text{VA}}_{i,t} }}_{{{\text{Institution}}\;{\text{Effect}}}} + \varepsilon_{i,t} \\ \end{aligned} $$
(5)

where \({\sigma }_{1}\dots {\sigma }_{16}\) are the coefficients of per capita carbon dioxide emissions with respect to the explanatory variables.

3.4 Econometric methodology and flowchart

While investigating the effects of institutional quality and GX on CO2, we proceed on the assumption that the variables are endogenous. We employ two shrinkage estimators and their tools in the context of fixed effects panel data models based on machine learning (ML), namely the ridge regression of Hoerl and Kennard (1970a, b) and the lasso regression developed by Tibshirani (1996). Ridge and lasso regressions are two widely used shrinkage methods which reduce (or shrink) the values of the coefficients compared with ordinary least squares. The advantage of these methods is that the estimated models have lower variance than least squares estimates. The lasso regression shrinks some estimates to exactly zero, enabling model selection. The lasso regression, therefore, carries out variable selection and parameter estimation at the same time (Chan-Lau 2017). The flowchart (Fig. 1) depicts the empirical components of this study.

Fig. 1 Flowchart

3.4.1 Ridge regression

Hoerl and Kennard (1970a, b) developed the ridge regression technique, a shrinkage method also used in machine learning, to solve multicollinearity problems. Ridge regression is one of the most efficient estimators for solving multicollinearity problems (Solarin and Bello, 2019; Wen and Shao, 2019; Yu et al. 2023).

When the regressors are correlated and the columns of the design matrix \(A\) are approximately linearly dependent, the matrix \({A}^{T}A\) is close to singular, so its inverse \({\left({A}^{T}A\right)}^{-1}\) is numerically unstable. Ridge regression addresses multicollinearity using the following equation:

$$\widehat{\delta }={\left({A}^{T}A+k\tau \right)}^{-1}{A}^{T}y, k>0$$
(6)

where \(k\) is a biasing parameter or ridge parameter, \(A\) is the matrix of explanatory variables, \(\tau \) is the identity matrix and \(y\) is the vector of the dependent variable. The \(k\) values range from zero to one. In the econometric literature, it is established that there exists a value of \(k\) for which the mean squared error (MSE) of the ridge regression estimator is less than that of the OLS estimator. As the ridge parameter (\(k\)) governs the bias of the ridge regression, the smallest suitable value of \(k\) is desirable (Solarin and Bello, 2019; Wen and Shao, 2019). We employ the panel FE ridge and within-effects ridge regressions adapted by Shehata (2013a, b) in our estimation.
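The estimator in Eq. (6) can be computed directly. The sketch below is a generic NumPy illustration on synthetic, nearly collinear regressors; it is not the Stata routine used in this study, and the data, the function name and the value of \(k\) are assumptions chosen for illustration. Setting \(k=0\) is included only as the OLS benchmark.

```python
import numpy as np


def ridge_estimate(A: np.ndarray, y: np.ndarray, k: float) -> np.ndarray:
    """Closed-form ridge estimator (A'A + k I)^(-1) A'y from Eq. (6)."""
    n_features = A.shape[1]
    identity = np.eye(n_features)
    # Solve the linear system rather than forming the inverse explicitly.
    return np.linalg.solve(A.T @ A + k * identity, A.T @ y)


rng = np.random.default_rng(1)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.01, size=100)   # nearly collinear with x1
A = np.column_stack([x1, x2])
y = x1 + x2 + rng.normal(scale=0.1, size=100)

delta_ols = ridge_estimate(A, y, k=0.0)      # k = 0 reproduces OLS
delta_ridge = ridge_estimate(A, y, k=0.1)
print(delta_ols, delta_ridge)
```

With nearly collinear columns the OLS coefficients tend to be large and unstable, while the ridge coefficients are pulled toward smaller, more stable values at the cost of some bias.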

Fixed effects estimators rely only on the variation within individual countries. A fixed effects approach, which we use as a more robust method of panel data analysis, reduces the impact of confounding factors by accounting for individual time-invariant measured and unmeasured characteristics (Greene 2007). We use ridge fixed effects regression, adapted by Shehata (2013a) for Stata as xtregfem, to capture the characteristics of each country under multicollinearity. We also employ the ridge within-effects (RWE) regression, which is based on within-group variation (i.e., variation around group means), adapted by Shehata (2013b) for Stata as xtregwem. We use RWE regression, which does not include a constant term for each country but considers variability over time, to check the robustness of the fixed effects findings. Within-effects regression examines time series changes within individual countries, rather than controlling for differences between individual countries (Greene 2007). The ridge fixed effects regression models allow us to control for time-invariant individual country-level characteristics; the results are given in Tables 4 and 5.
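The idea of relying only on within-country variation can be illustrated by demeaning each variable by its country mean before estimation, which removes any time-invariant country effect. The sketch below is a generic NumPy illustration on a toy panel and is not the xtregfem or xtregwem implementation; the function name, the three hypothetical countries and all numerical values are assumptions.

```python
import numpy as np


def within_transform(values: np.ndarray, groups: np.ndarray) -> np.ndarray:
    """Subtract each group's mean from its rows (the within transformation)."""
    values = np.asarray(values, dtype=float)
    demeaned = values.copy()
    for g in np.unique(groups):
        mask = groups == g
        demeaned[mask] -= values[mask].mean(axis=0)
    return demeaned


# Toy panel: 3 hypothetical countries observed for 4 years each.
rng = np.random.default_rng(2)
groups = np.repeat(np.array([0, 1, 2]), 4)
country_effect = np.array([5.0, -3.0, 10.0])[groups]   # time-invariant
x = rng.normal(size=(12, 1))
y = 2.0 * x[:, 0] + country_effect                      # true slope is 2

x_w = within_transform(x, groups)
y_w = within_transform(y, groups)

k = 0.0  # ridge parameter; 0 gives the plain within (fixed effects) estimator
slope = np.linalg.solve(x_w.T @ x_w + k * np.eye(1), x_w.T @ y_w)
print(slope)
```

Because the country effects are constant over time, they vanish after demeaning, and the slope is recovered without estimating a separate intercept for each country. A positive `k` would add ridge shrinkage to the same within-transformed data.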

3.4.2 Robustness checks

To check the robustness of the ridge regression coefficient estimation findings, we employ another machine learning (ML)-based efficient estimator, namely the least absolute shrinkage and selection operator, lasso, introduced by Tibshirani (1996).

Although ridge regression is a continuous process that shrinks the OLS coefficients toward zero to reduce the variance, it does not set any coefficients exactly to zero, which makes the model difficult to interpret (Tarkhamtham et al. 2021). To address this issue, we employ the lasso method. Lasso is an estimator that penalizes the absolute size of the coefficients and thereby performs subset selection (Shi et al. 2020). Like ridge regression, lasso shrinks the coefficients, but it also sets some of them exactly to zero.

Lasso regression minimizes the mean squared error subject to a penalty on the absolute size of the coefficient estimates. The estimator equation is:

$${\widehat{\beta }}_{{\text{Lasso}}}=\underset{\beta }{\arg \min }\sum_{t=1}^{T}{\left({y}_{t}-{\beta }_{0}-\sum_{j=1}^{p}{\beta }_{j}{x}_{tj}\right)}^{2},$$
(7)

subject to \({\Vert \beta \Vert }_{1}\le s,\) where \(s\ge 0\) is a tuning parameter, \({y}_{t}\) and \({x}_{t}\) are the dependent and explanatory variables, respectively, and \(t\) is time. The parameter \({\beta }_{j},j=1,\dots ,p,\) indicates the effect size of the regressor \(j\) on the dependent variable (Ahrens et al. 2021).
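The way the lasso sets coefficients exactly to zero can be seen in a minimal coordinate descent sketch. It solves the penalized (Lagrangian) form, (1/(2n))·||y − Xβ||² + λ||β||₁, which corresponds to the constrained problem in Eq. (7) for a suitable pairing of the penalty and the bound. This is an illustration on synthetic, centered data and not the algorithm implemented in lassopack; the function names, the penalty value `lam` and the data are assumptions.

```python
import numpy as np


def soft_threshold(z: float, gamma: float) -> float:
    """Shrink z toward zero by gamma and clip at zero."""
    return np.sign(z) * max(abs(z) - gamma, 0.0)


def lasso_cd(X: np.ndarray, y: np.ndarray, lam: float, n_iter: int = 200) -> np.ndarray:
    """Minimize (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by coordinate descent."""
    n, p = X.shape
    beta = np.zeros(p)
    col_scale = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual that excludes regressor j's current contribution.
            r_j = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ r_j / n
            beta[j] = soft_threshold(rho, lam) / col_scale[j]
    return beta


rng = np.random.default_rng(3)
X = rng.normal(size=(200, 5))
X -= X.mean(axis=0)                       # centered, so no intercept is needed
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)
y -= y.mean()

beta_hat = lasso_cd(X, y, lam=0.2)
print(beta_hat)
```

The soft-thresholding step is what produces exact zeros: any regressor whose correlation with the partial residual is smaller than the penalty is dropped, while the retained coefficients are shrunk toward zero.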

The lasso method has several tools for effective estimation using the panel data approach. We employ two tools. The first is the square-root lasso, and the second is the post-estimation OLS. The square-root lasso equation can be expressed as follows:

$${\widehat{\beta }}_{\sqrt{{\text{Lasso}}}}=\underset{\beta }{\arg \min }\sqrt{\frac{1}{n}\sum_{i=1}^{n}{\left({y}_{i}-{x}_{i}^{\prime}\beta \right)}^{2}}+\frac{\gamma }{n}\sum_{j=1}^{p}{\vartheta }_{j}\left|{\beta }_{j}\right|$$
(9)

The square-root lasso is a modification of the lasso that minimizes the root-mean-squared error (RMSE) while also imposing an \({{\ell}}_{1}\) penalty. The square-root lasso has several advantages over the standard lasso. Its main advantage becomes apparent when theoretically grounded, data-driven penalization is used: the score vector, and thus the optimal penalty level, is independent of the unknown error variance under homoscedasticity, which makes the selection of the penalty level simpler (Ahrens et al. 2020).

The post-estimation OLS equation is:

$$\widehat{\beta}_{\text{Post}}=\underset{\beta}{\arg\min}\frac{1}{n}\sum_{i=1}^{n}\left(y_{i}-x_{i}^{\prime}\beta\right)^{2}\quad\text{subject to }\beta_{j}=0\text{ if }\widetilde{\beta}_{j}=0$$
(10)

where \({\widetilde{\beta }}_{j}\) is the sparse first-step estimator such as the lasso. Post-estimation OLS treats the first-step estimator as a genuine model selection technique (see Ahrens et al. 2020).
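The two-step logic can be sketched as follows. This is an illustrative implementation under stated assumptions, not the lassopack routine: the first-step coefficients `beta_tilde` and the data are hypothetical, and the refit is plain OLS with an intercept solved through the normal equations.

```python
def solve(A, b):
    """Solve the linear system A z = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    M = [A[i][:] + [b[i]] for i in range(m)]
    for c in range(m):
        piv = max(range(c, m), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, m):
            f = M[r][c] / M[c][c]
            for k in range(c, m + 1):
                M[r][k] -= f * M[c][k]
    z = [0.0] * m
    for r in range(m - 1, -1, -1):
        z[r] = (M[r][m] - sum(M[r][k] * z[k] for k in range(r + 1, m))) / M[r][r]
    return z


def post_lasso_ols(X, y, beta_tilde):
    """Refit OLS on the regressors with non-zero first-step coefficients.

    Coefficients of regressors dropped in the first step stay at exactly zero.
    """
    n, p = len(X), len(X[0])
    support = [j for j in range(p) if beta_tilde[j] != 0.0]
    # Design matrix: intercept column plus the selected regressors only.
    Z = [[1.0] + [row[j] for j in support] for row in X]
    q = len(Z[0])
    ZtZ = [[sum(Z[i][a] * Z[i][c] for i in range(n)) for c in range(q)] for a in range(q)]
    Zty = [sum(Z[i][a] * y[i] for i in range(n)) for a in range(q)]
    coef = solve(ZtZ, Zty)
    beta = [0.0] * p
    for pos, j in enumerate(support):
        beta[j] = coef[pos + 1]
    return coef[0], beta


# Hypothetical data and a hypothetical first-step (lasso) estimate that
# kept the first regressor (shrunken to 1.5) and dropped the second.
X = [[-1.0, -1.0], [-1.0, 1.0], [1.0, -1.0], [1.0, 1.0]]
y = [-1.1, -0.9, 2.9, 3.1]
beta_tilde = [1.5, 0.0]

intercept_post, beta_post = post_lasso_ols(X, y, beta_tilde)
```

In this toy example the refit removes the shrinkage on the retained coefficient (it moves from 1.5 back to its unpenalized value of about 2) while the dropped coefficient remains zero, which is why the first step acts purely as a selection device.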

In brief, (i) lasso is designed as a model selection tool and is obtained by solving a constrained minimization problem that yields vertex solutions by setting some coefficients to exactly zero (Balima and Sokolava 2021); (ii) it automates model selection in linear regression because of the nature of the \({{\ell}}_{1}\) penalty, which is sensitive to multicollinearity (Khattak et al. 2021); and (iii) when a regression contains many variables, only some of which capture its main features, lasso automates their detection (Maruejols et al. 2022). We employ the FE version of the lasso implemented by Ahrens et al. (2020) in lassopack.

4 Empirical findings and discussion

Multicollinearity is a statistical issue that occurs when two or more independent variables in a regression model are highly correlated, resulting in imprecise, high-variance estimates of the coefficients of the variables. When the design matrix (X) is close to rank deficient, the XˊX matrix approaches singularity and its inverse (XˊX)−1 becomes unstable, which is the source of the multicollinearity problem (Ardakani and Seyedaliakbar 2019). Ridge regression has been proposed as an estimation method that addresses multicollinearity better than traditional OLS. However, we first need to check for multicollinearity in our EKC + STIRPAT model (Eq. 5); the results are reported in Table 2.

Table 2 Pooled OLS and multicollinearity test results

Our primary aim is to check whether multicollinearity is present among the explanatory variables in the model. One method for detecting multicollinearity is to calculate the variance inflation factor (VIF). It is worth noting that our methodology does not entail conducting a hypothesis test for VIF values. Instead, we concentrate on the magnitude of the VIF scores, treating values above a conventional threshold (a common rule of thumb is 10) as indicative of possible multicollinearity concerns. The mean VIF values for the full sample, Panel A and Panel B are 351.61, 914.21 and 651.59, respectively, indicating severe multicollinearity in our model.
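As a rough illustration of why an EKC specification tends to produce very large VIF values, the sketch below computes the VIF for a pair of regressors. It relies on the simplification that, with only two regressors, the auxiliary R² equals the squared Pearson correlation, so VIF = 1/(1 − r²). The income values are hypothetical and are not taken from our data.

```python
import math


def pearson_r(a, b):
    """Sample Pearson correlation between two equally long lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)


def vif_two_regressors(a, b):
    """VIF = 1 / (1 - R^2); with two regressors, R^2 is the squared correlation."""
    r = pearson_r(a, b)
    return 1.0 / (1.0 - r * r)


# Hypothetical log-income values and their squares (not the paper's data).
ln_pp = [9.0, 9.5, 10.0, 10.5, 11.0]
ln_pp2 = [v ** 2 for v in ln_pp]
vif_income = vif_two_regressors(ln_pp, ln_pp2)

# Two orthogonal regressors for comparison: no variance inflation.
vif_orthogonal = vif_two_regressors([-1.0, -1.0, 1.0, 1.0], [-1.0, 1.0, -1.0, 1.0])
```

Because a variable and its square are almost perfectly correlated over a narrow range, `vif_income` lies far above the rule-of-thumb threshold, whereas the orthogonal pair yields a VIF of one.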

The second important point is the choice between fixed effects (FE) and random effects (RE) models, both of which are widely used in panel econometrics. The Hausman test is a useful tool for determining which of the two is more appropriate.

The Hausman test shows whether the individual characteristics are correlated with the regressors. The null hypothesis is that they are not, in which case the RE model is appropriate. If the p-value (Prob > chi2) is below 0.05, we reject the null hypothesis and use fixed effects. The results in Table 3 show that the Hausman test statistics are statistically significant, indicating that the FE model is more suitable for estimation. The rest of this section discusses the empirical findings based on ridge regression and lasso regression.
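For reference, the Hausman statistic referred to above takes the following standard form. The notation is assumed for illustration: \(\widehat{\beta}_{\text{FE}}\) and \(\widehat{\beta}_{\text{RE}}\) are the FE and RE estimates of the \(k\) time-varying coefficients, and \(\widehat{V}(\cdot)\) denotes their estimated covariance matrices.

$$H=\left(\widehat{\beta}_{\text{FE}}-\widehat{\beta}_{\text{RE}}\right)^{\prime}\left[\widehat{V}\left(\widehat{\beta}_{\text{FE}}\right)-\widehat{V}\left(\widehat{\beta}_{\text{RE}}\right)\right]^{-1}\left(\widehat{\beta}_{\text{FE}}-\widehat{\beta}_{\text{RE}}\right)$$

Under the null hypothesis, \(H\) is asymptotically \(\chi^{2}\) distributed with \(k\) degrees of freedom, so a large value of \(H\) (a small p-value) indicates that the two sets of estimates differ systematically and favors the FE model.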

Table 3 Hausman test results

4.1 Ridge findings

Firstly, we present full sample findings. Table 4 presents the findings from the benchmark linear ridge regression model for the full sample, that is, 22 EU countries. We perform the panel FE, panel FE generalized ridge regression and panel within-effects generalized ridge regression.

Table 4 Full sample ridge results

For Eq. 5, the adjusted R2 is close to one, and the F statistic is also significant; thus, the model fits the data well. The ridge parameter (k) is small, indicating confidence in the ridge regression.
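The role of the ridge parameter can be illustrated with a minimal sketch of a within-effects ridge estimator for a single regressor. This is ordinary ridge applied to unit-demeaned data, not the generalized ridge estimator of Shehata (2013a, b); the country labels, data and values of `k` are hypothetical.

```python
def within_transform(values, groups):
    """Subtract each unit's own mean (the fixed-effects 'within' transformation)."""
    totals, counts = {}, {}
    for v, g in zip(values, groups):
        totals[g] = totals.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return [v - totals[g] / counts[g] for v, g in zip(values, groups)]


def within_ridge_slope(x, y, groups, k):
    """Ridge slope for one regressor after demeaning: sum(x*y) / (sum(x^2) + k)."""
    xd = within_transform(x, groups)
    yd = within_transform(y, groups)
    sxy = sum(a * b for a, b in zip(xd, yd))
    sxx = sum(a * a for a in xd)
    return sxy / (sxx + k)


# Two hypothetical countries with different intercepts but a common slope of 2.
groups = ["A", "A", "A", "B", "B", "B"]
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [12.0, 14.0, 16.0, 58.0, 60.0, 62.0]

slope_fe = within_ridge_slope(x, y, groups, k=0.0)     # k = 0: plain FE estimator
slope_small = within_ridge_slope(x, y, groups, k=1.0)  # mild shrinkage
slope_large = within_ridge_slope(x, y, groups, k=4.0)  # stronger shrinkage
```

Setting `k` to zero recovers the FE (within) estimate, and larger values of `k` pull the slope toward zero; a small ridge parameter therefore means that the ridge estimates stay close to their unpenalized counterparts.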

The generalized ridge FE (RFE) regression findings show that per capita income (lnPP) and per capita income square (lnPP2), industry value added (lnInd), manufacturing value added (lnMan), trade (lnOp), foreign direct investment (FDI), regulatory quality (RQ), rule of law (RL), political stability (PS) and voice and accountability (VA) do not have a statistically significant effect on ln CO2 for the 22 EU countries. Primary energy consumption per capita (lnEN) and government effectiveness (GE) are statistically significant and increase ln CO2. In contrast, urban population (lnUrb), population density (lnPop), government expenditure (lnGx) and control of corruption (CC) are statistically significant and reduce ln CO2. Accordingly, while economic governance (government effectiveness) and institutional governance (control of corruption) have an effect on ln CO2 in the full sample of 22 countries, political governance (political stability, voice and accountability) has no effect. The within-effects ridge (RWE) estimates also confirm these findings.

Second, as mentioned above, we focus on the effects of governance variables on CO2 for the two samples based on institutional quality: high and low (above and below the mean value). Table 5 presents ridge regression findings for the Panel A and Panel B samples. In these samples, the adjusted R2 is close to one, and the F statistic is also significant, suggesting a good fit. Both the Panel A and Panel B samples have a small ridge parameter (k), indicating the reliability of the estimator.

Table 5 Ridge results for Panel A and Panel B countries

We start with Panel A. For the Panel A group, per capita income (lnPP) and per capita income square (lnPP2), population density (lnPop), industry value added (lnInd), manufacturing value added (lnMan), CC, RQ, RL and PS are statistically insignificant. Urban population (lnUrb), energy consumption per capita (lnEN), trade (lnOp), government expenditure (lnGx), FDI, GE and VA are statistically significant. Government expenditure (lnGx), FDI and VA have a negative and statistically significant effect on ln CO2, suggesting that government expenditure, foreign investment, and voice and accountability lead to a fall in CO2 emissions. Urban population (lnUrb), energy consumption per capita (lnEN), trade (lnOp) and government effectiveness (GE) have an increasing and statistically significant effect on ln CO2. Accordingly, while economic governance (government effectiveness) has an increasing effect and political governance (voice and accountability) has a decreasing impact on ln CO2 in the strong institution group, institutional governance (control of corruption and rule of law) has no effect. Contrary to expectations, the effect of GE on ln CO2 is positive. In addition, there is no evidence supporting the EKC hypothesis for the Panel A group. The RWE estimates also confirm these findings.

VA covers governance in environmental management. Voice refers to the level of citizen participation in selecting government officials, while accountability pertains to the public's capacity to express concerns, provide feedback and propose solutions on environmental issues. In regions where press freedom is upheld, citizens can actively engage in shaping environmental governance (Oyewo et al. 2024). Moreover, private organizations may face public backlash, especially in areas with strong opinions and freedom of expression, which may require them to take deliberate measures to reduce carbon emissions to satisfy public sentiment. It is important to recognize that VA has a substantial negative impact on carbon emission rates. The relationship between GE and carbon emissions is positive, suggesting that implementing effective government policies may not necessarily lead to lower carbon emissions. Even the most effective policies may prove insufficient to mitigate the climate crisis and reduce emissions, particularly where such initiatives do not fully address the root cause of the issue and climate issues are still being debated.

The ridge regression findings for the Panel B countries are fundamentally different from those for Panel A. The generalized RFE findings show that per capita income (lnPP) and per capita income square (lnPP2), industry value added (lnInd), manufacturing value added (lnMan), trade (lnOp), FDI, GE, RQ, RL and VA do not have a statistically significant effect on ln CO2. Additionally, lnUrb, lnPop, lnGx, CC and PS are statistically significant and lead to a fall in CO2 emissions. Energy consumption per capita (lnEN) has an increasing and statistically significant effect on ln CO2. Economic governance (government effectiveness and regulatory quality) has no impact on ln CO2, while institutional governance (control of corruption) and political governance (political stability) affect CO2 emissions in the low-institutional-quality group. CC and PS lead to a fall in ln CO2 emissions, as theoretically expected. We find no evidence of an EKC for the high-institutional-quality group; however, evidence of an EKC is confirmed for Panel B countries.

Political stability creates a conducive environment for businesses to flourish and operate in. When society provides a favorable environment for businesses, organizations feel obligated to address pressing environmental issues, such as controlling carbon emissions (Oyewo et al. 2024). Political stability can also motivate companies to engage in ethical practices, reducing corruption.

4.2 Robustness tests with lasso

Table 6 presents the findings from the benchmark Eq. 5 estimated using lasso regression-based ML for the full sample. We perform the square-root lasso and the post-estimation OLS estimations. Lasso selects and reports only the variables that are important for explaining the dependent variable and drops the remaining variables from the model. The variables that are dropped are denoted by Na (not available). To test the robustness of our findings, we also replace CO2 with greenhouse gas emissions, lnGhe, as the dependent variable (Model 2).

Table 6 Robustness test employing lasso

The lasso findings show that lnPP, lnInd, lnOp, FDI, PS and VA do not have a statistically significant effect on ln CO2. Additionally, lnEN, lnMan, GE, RQ and RL have a statistically significant increasing effect on ln CO2, whereas lnPP2, lnUrb, lnPop, lnGx and CC have a statistically significant decreasing effect on ln CO2. The square-root lasso and post-estimation findings confirm the statistical significance of lnUrb, lnPop, lnEN, lnGx, CC and GE found in the ridge regression estimates for the full EU sample. Accordingly, our findings for lnUrb, lnPop, lnEN, lnGx, CC and GE are stronger than those for the other variables because all three estimators (RFE, RWE and lasso) yield similar findings for the full sample.

We now present the lasso findings for the two groups of countries in Table 6.

For Panel A, the countries with high institutional quality, lnPP2, lnPop, RQ and RL are statistically insignificant. The findings show that lnPP, lnGx, FDI, CC and VA have a negative and statistically significant effect on ln CO2, while the effects of lnUrb, lnEN, lnInd, lnMan, lnOp, GE and PS are positive. The results are consistent with the post-estimation results. Accordingly, economic governance (government effectiveness), institutional governance (control of corruption) and political governance (political stability and voice and accountability) have a significant effect on ln CO2. However, contrary to expectations, GE and PS increase ln CO2. The ridge and lasso findings are consistent for CC, GE and VA. In addition, the findings for lnUrb, lnEN, lnGx and FDI under the ridge regression for the high-institutional-quality countries are confirmed by the square-root lasso and post-estimation results. Here, all estimators (RFE, RWE and lasso) confirm the results for lnEN, lnUrb, FDI, lnGx, lnOp, GE and VA.

The results of Model 1 for the Panel B group of countries show that lnUrb, lnPop, lnEN, lnOp, lnGx, CC, RQ, RL, PS and VA are statistically significant. Of these, lnEN, RQ, RL and VA increase CO2 emissions, whereas lnUrb, lnPop, lnOp, lnGx, CC and PS reduce ln CO2. For the low-institutional-quality countries, economic governance, institutional governance and political governance all affect ln CO2. CC and PS reduce ln CO2, as expected theoretically, whereas RQ, RL and VA increase ln CO2. However, the strongest findings among these are for CC and PS, as all estimators (RFE, RWE and lasso) show consistent findings for these two variables. In addition, the lasso findings are consistent with the ridge findings for lnUrb, lnPop, lnEN and lnGx.

Table 6 shows the findings of the model that employs logged per capita greenhouse gas emissions (lnGhe) as the dependent variable under the lasso method, while Table 7 shows the ridge regression findings. Comparing Models 1 and 2 for the Panel A nations shows that the coefficients on lnPP, lnEN and GE are consistent with previous results, suggesting that this group of nations is far from achieving the greenhouse gas emissions goal of tackling the climate crisis. On the other hand, when comparing Models 1 and 2 for the Panel B group, the findings are consistent, implying that lnEN appears to play a dominant role in tackling the climate crisis. In addition, lnUrb, lnPop, lnGx, CC and PS appear to reduce greenhouse gas emissions in EU countries with relatively weak institutional structures.

Table 7 Alternative dependent variable (lnGhe)

Figures 2, 3 and 4 present a graphical representation of the coefficients for the full EU sample, Panel A and Panel B countries, confirmed by three estimators (RFE, RWE and lasso).

Fig. 2 Coefficient path for the full sample. Notes: We plotted the L1 norms of the findings for which all three estimators were consistent

Fig. 3 Coefficient path for Panel A countries. Notes: We plotted the L1 norms of the findings for which all three estimators were consistent

Fig. 4 Coefficient path for Panel B countries. Notes: We plotted the L1 norms of the findings for which all three estimators were consistent

The magnitude of the coefficients is shown on the y-axis and the L1 norm is shown on the x-axis. For the full sample, all the selected predictors shown in Fig. 2 (on the left side) have a significant impact on ln CO2. Figures 3 and 4 show the selected predictors against the L1 norm for the Panel A and Panel B countries, respectively. Differences in the signs and sizes of the coefficients can be observed more easily in the figures. Once again, the findings differ between the Panel A and Panel B nations. Among the findings in these figures, the strongest are for Panel B, because the coefficients have the same signs across all estimations.
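How a coefficient path of this kind is constructed can be sketched in a simplified setting. The example assumes an orthonormal design (XˊX = I), in which case the lasso solution is the soft-thresholded OLS coefficient; the OLS values and the penalty grid are hypothetical and do not correspond to the estimates plotted in Figs. 2, 3 and 4.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam, returning exactly zero inside [-lam, lam]."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_path_orthonormal(b_ols, lambdas):
    """Return (L1 norm, coefficients) for each penalty, assuming X'X = I."""
    path = []
    for lam in lambdas:
        beta = [soft_threshold(b, lam) for b in b_ols]
        l1_norm = sum(abs(v) for v in beta)
        path.append((l1_norm, beta))
    return path


# Hypothetical OLS coefficients and a decreasing grid of penalties.
b_ols = [2.0, -0.5, 0.1]
path = lasso_path_orthonormal(b_ols, [2.0, 1.0, 0.5, 0.1, 0.0])

# Each tuple gives the x-axis value (L1 norm) and the y-axis values (coefficients).
l1_norms = [point[0] for point in path]
```

As the penalty falls, the L1 norm grows from zero and predictors enter the model one at a time, the strongest first; plotting each coefficient against the L1 norm gives the kind of path shown in the figures.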

5 Discussion and conclusion

This study investigates the impact of institutions on CO2 emissions in the EU. The study also divides the EU nations into two groups: those with high and those with low institutional quality. Employing a STIRPAT model and ridge and lasso machine learning techniques, the results suggest that control of corruption and government effectiveness are statistically significant for the full sample, whereas the rule of law is not. The rule of law may not have a significant effect on reducing carbon emissions if urbanization and primary energy consumption are high, as evidenced by our results (Khan et al. 2023). It is possible that energy consumption and urbanization are rising faster than laws on controlling environmental pollution can keep up.

The results indicate that in the high-institutional-quality EU nations, under both the ridge and lasso regression methods, government effectiveness, contrary to expectations, leads to an increase in CO2 emissions. It is possible that greater government effectiveness leads to higher costs and prices, causing firms to postpone transitioning to environmentally friendly methods (Dechezleprêtre and Sato 2017). Voice and accountability lead to a fall in emissions in this group of countries.

The results also indicate that in group B, political stability and control of corruption reduce CO2 emissions. These observations are supported by Leitao (2010), Abid (2017), Sinha et al. (2019) and Oyewo et al. (2024).

There are issues that still hinder the progress of climate change policies in the EU. According to Grabbe and Lehne (2022), some EU members still rely heavily on coal, and industrial lobby groups have raised fears about international competitiveness and employment, which have limited the successful implementation of these policies. At the 2019 European Council meeting, Hungary, the Czech Republic, Estonia and Poland did not sign up to a long-term target of achieving climate neutrality by 2050, with Poland requiring financial aid from the EU for those affected by the transition to greener energy.

In addition, in group A, energy consumption, urbanization, population density and industry value added also lead to an increase in CO2 emissions. This is plausible given that higher population density and industrial growth lead to higher emission levels. Similarly, energy consumption increases CO2 emissions in group B. This is reasonable, given that many countries in this group still rely on fossil fuels.

Per capita income is found to reduce CO2 emissions in group A. The high-income EU nations have taken a strong stance on reaching carbon neutrality by 2050, which perhaps explains this. Manufacturing output leads to a fall in CO2 emissions in group B. Many countries have replaced polluting sources of energy in industry with other sources of energy such as wind and solar, which is probably the reason for this.

Comparing our results with the literature, to the best of our knowledge, only the study of Abid (2017) has focused on the EU countries. However, Abid (2017) classifies the 41 countries in the study geographically rather than politically, and the model is heterogeneous. Our findings differ from those of Abid (2017) when we divide the sample into countries with high and low institutional quality according to different political economy characteristics. In the group with high institutional quality, government effectiveness has detrimental effects on the environment, whereas in the low-institutional-quality group, the opposite holds. Our findings imply that variables such as primary energy consumption, industry value added, population density and urbanization are more important determinants of CO2 emissions in the EU than are variables such as institutional governance (corruption and rule of law). Our results suggest that changes in institutional quality such as the rule of law and regulatory quality have a limited impact on CO2 emissions. These findings are consistent with those reported by Arminen and Menegaki (2019) and Oyewo et al. (2024).

In addition, our findings for group B are consistent with the results of Khan et al. (2021). However, Khan et al. (2021) do not differentiate between the different institutional quality variables. Halkos and Tzeremes (2013) differentiate between the institutional quality variables and find that they have different effects on CO2 emissions depending on country characteristics and the level of development. They conclude, however, that improving the quality of countries’ governance factors does not always result in lower carbon dioxide emission levels. Oyewo et al. (2024) similarly argue that the effect of country governance on the carbon emissions performance of MNEs depends on the country, jurisdiction and geographical region. Our findings are consistent with those of Halkos and Tzeremes (2013) and Oyewo et al. (2024).

Our results imply that the EU needs to continue to channel resources to fight climate change and to hold governments accountable, with group A taking into account factors such as urbanization, population density and industry value added, and group B taking into account energy consumption. The factors that affect CO2 emissions differ between the two groups of countries. The fact that some countries have more ambitious policies and others less stringent ones makes this task more difficult. This suggests that the EU as a whole needs to implement regulations and environmental policies in sectors including industry, population, energy consumption and urbanization that are coordinated, consistent and aligned with climate targets to achieve the best outcomes for the group as a whole.