1 Introduction

Global climate change is one of the most pressing issues of our day, and emissions from human industry are a major contributor to it. As the atmospheric concentration of carbon dioxide (CO2) continues to rise, a workable solution to environmental challenges is clearly essential, even though energy use and industrial expansion are crucial engines of prosperity. The literature offers several theories and methodologies for explaining the connection between economic expansion and pollution. It also shows that the level of per capita income affects environmental quality, leading to shifts in environmental policy and lending credence to the idea that higher incomes are associated with more rapid environmental decline.

Some academics have argued that economic growth has become a dilemma for advanced economies: most countries pursue growth and ignore its impact on the environment. Economic expansion has been acknowledged as contributing to rising living standards in industrialized countries (Jhingan, as cited in Osadume and University, 2021). Schumpeter (1934, as cited in Osadume and University, 2021) defines growth as a rise in both the rate of savings and the size of the population over time. Thus, an economics that focuses on the nature of change and its causes, rather than on the type of economy, is central to understanding growth. The world must therefore pursue economic growth in a way that preserves the environment, and it must use its great potential to achieve this.

The tension between economic growth and environmental preservation refutes the argument that the difficulties associated with economic expansion are exclusive to developed countries. Developing countries share responsibility with developed countries for climate change, and economies in transition, such as those in Africa, are also expanding. According to Jhingan, as cited in Osadume and University (2021), both economic and non-economic variables contribute to growth. Economic variables include money, business initiatives, technological advancements, and natural and human resources, whereas non-economic elements include social institutions, the political climate, and moral ideals. A country’s economy relies on the innate potential of its human capital and on state-of-the-art technology.

The global discourse surrounding climate change and its multifaceted implications has become increasingly urgent in recent decades. Central to this discourse is the pivotal role of CO2 emissions, a prominent driver of anthropogenic climate change. Understanding the intricate relationship between CO2 emissions and macroeconomics is paramount as the world strives to balance economic expansion and environmental sustainability. This relationship becomes even more intriguing when examined across low- and high-income countries, each marked by unique socio-economic contexts and developmental trajectories.

Traditionally, studies of the interplay between CO2 emissions and macroeconomics have relied on conventional economic models and statistical analyses. However, the complexity and non-linear dynamics of this relationship call for methodologies that can capture intricate patterns and predict future trends. Machine learning (ML) techniques use data-driven insights to uncover hidden relationships, identify influential variables, and generate predictive models that overcome some limitations of traditional approaches. This investigation therefore differs from prior research by using ML algorithms to identify the factors influencing carbon emissions in both low- and high-income countries. The approach seeks to formulate a unified strategy for addressing environmental challenges, departing from earlier studies that examined the issue for individual countries or separate groups. Consequently, this paper is one of the pioneering contributions in treating environmental challenges as shared issues requiring collective action by all countries. Notably, low-income countries accounted for about 8.5% of the world population in 2020, while high-income countries accounted for about 15.8%. Although the population of low-income countries is about half that of high-income countries, the high-income countries’ share of total global CO2 emissions reached 32.3%, compared with only about 0.5% for low-income countries (World Bank, dataset).
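To make this disproportion concrete, the quoted shares can be turned into a simple emissions-to-population ratio. This is a back-of-the-envelope sketch that uses only the World Bank percentages cited above; the helper name `emission_intensity` is hypothetical.

```python
# Shares of world totals quoted in the text (percent).
shares = {
    "low_income": {"population": 8.5, "co2": 0.5},
    "high_income": {"population": 15.8, "co2": 32.3},
}

def emission_intensity(group):
    """Ratio of a group's share of global CO2 to its share of world population."""
    s = shares[group]
    return s["co2"] / s["population"]

low = emission_intensity("low_income")    # about 0.06
high = emission_intensity("high_income")  # about 2.04
print(f"low-income:  {low:.2f}")
print(f"high-income: {high:.2f}")
print(f"ratio (high/low): {high / low:.1f}")
```

On these figures, the average resident of a high-income country accounts for roughly 35 times the share of global CO2 emissions attributable to the average resident of a low-income country.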

In this study, we examine the association between CO2 emissions and macroeconomic indicators using ML. We compare low- and high-income countries to uncover context-specific drivers and constraints. By applying ML algorithms, we aim to reveal nuanced patterns that show how economic factors interact with CO2 emissions. Such insights contribute to the academic discourse and give policymakers knowledge for devising targeted strategies that reconcile economic aspirations with environmental responsibilities. The research addresses three questions: Can ML algorithms effectively identify the factors influencing carbon emissions in both low- and high-income countries? Do these factors differ between low-income and high-income countries? Is it feasible to propose recommendations for mitigating carbon emissions by uncovering and understanding their determinants?

Through this exploration, we aim to contribute to a more comprehensive understanding of the relationship between CO2 emissions and population size, gross domestic product growth (GDPg), the value added of the agricultural, industrial, and services sectors, and foreign direct investment inflow (FDI), while also highlighting the potential of ML for addressing complex global challenges. The paper is structured as follows: Sect. 2 reviews the existing literature. Section 3 presents the empirical framework. Section 4 outlines the methodology, detailing data collection, preprocessing, feature selection, and the selection of machine learning algorithms. Section 5 presents the results, discusses the implications of the findings, and provides insights into the importance of the various determinants. Lastly, Sect. 6 concludes the study by summarizing the key takeaways, discussing the limitations, and suggesting directions for future research.

2 Literature review

Environmental degradation has been recognized as a serious hazard to the natural world and human civilization (Aye & Edoja, 2017). The rate at which a country develops depends on various factors, including population size, economic uncertainty, and the availability of natural resources. Economic growth aims to raise the standard of living of all people and the wealth of countries. However, growth in particular areas can give rise to pollution, overexploitation, degradation, loss of species, and climate change (Phimphanthavang, 2013). Accordingly, most papers that studied the connection between GDP growth and CO2 pollution were framed around identifying a “correct degree of growth” compatible with the goal of lowering CO2 emissions. Treating CO2 emissions as a proxy for environmental degradation, Azam et al. (2016) conclude that CO2 emissions contribute to the booming economies of China, Japan, and the United States. Pao and Tsai (2010) and Li et al. (2022) show that energy use has a positive long-run impact on CO2 emissions in BRICS countries. Several studies have examined the relationship between CO2 emissions and economic expansion at the country level. For example, Yousefi-Sahzabi et al. (2011) studied Iran and identified a statistically significant link between CO2 emissions and economic growth, and Bouznit and Pablo-Romero (2016) confirm these findings for Algeria. Osadume and University (2021) examined how economic development in various African countries affected their CO2 emissions, using Simon-Steinmann’s economic growth model as the underlying theoretical framework. Their results reveal a high degree of short-run cointegration, with the independent variable (CO2) positively affecting the dependent variable (GDP growth) across all samples combined.
Adu and Denkyirah (2017) regard industrialization and globalization as catalysts for economic expansion. According to Pettitte (1987), cited in Adu and Denkyirah (2017), industrialization is “the economic engine for expansion and prosperity.” Aye and Edoja (2017) argued that rising GDP might serve as a reliable predictor of future CO2 emissions. Because of their greenhouse effect, CO2 emissions are thought to contribute to global warming and, by extension, environmental deterioration. The authors argue that rising incomes will lead to more CO2 being released unless steps are taken to reduce CO2 footprints.

The related literature shows that models using the panel cointegration approach typically include only two variables, CO2 emissions and GDP per capita (Zhang et al., 2021; Kapusuzoglu, 2014; Arouri et al., 2012; Lean & Smyth, 2010; Martinez-Zarzoso & Bengochea-Morancho, 2004). GDP growth has thus been included in the evaluation as a highly significant, closely followed economic variable and the traditional focal point of economic study.

Energy is a major driver of progress and economic growth and influences our well-being (Mendonça et al., 2020). Public policy agendas and economic development are therefore essential to consolidating environmental sustainability and managing climatic stress. Although energy is a key driver of economic expansion, its negative impact on well-being can be mitigated by encouraging the right kind of growth, which depends on both economic activity and technology. Bilan et al. (2019) analyze how renewable energy sources (RES) affect CO2 emissions and GDP. The results confirm a connection between RES, CO2 emissions, and GDP: RES affect the GDP of EU member states through personnel and capital resources, and the results also show reverse causality, with economic growth driving a rise in renewable energy use. According to Toumi and Toumi (2019) and Cosmas et al. (2019), renewable energy production should be encouraged in countries that are, or may become, candidates for EU membership. These findings align with several studies that found that mandates to increase the use of RES reduced CO2 emissions. More recent research has found that energy helps drive economic expansion (Zhang et al., 2021; Baz et al., 2021; Magazzino, 2015; Azam et al., 2021; Shahbaz et al., 2013), whereas other studies show that energy use harms economic expansion (Garcia et al., 2020). Although the existing literature suggests that growth in and of itself may ease adaptation to climate change and that economic vulnerability to disasters lessens with rising income, it also highlights that the amount of CO2 we emit is directly proportional to our level of wealth. This study’s direction was informed by a review of the existing literature, which revealed a need for more studies focusing on the EU’s country profile.
This research gap motivated the authors to explore a crucial issue associated with CO2: the correlation between CO2 emissions and real GDP in EU countries. Kadanali and Yalcinkaya (2020) employ linear and nonlinear approaches within a cross-sectional dependence framework for modern panel data analysis. According to their research, global warming has a negative and statistically significant impact on economic expansion, regardless of the etiological framework employed to explain the climatic regime.

Some studies have tested Kuznets’ hypothesis, for example in West Africa, whose countries are predicted under the environmental Kuznets curve (EKC) to experience environmental degradation as soon as economic development begins (Grossman & Krueger, 1995). Youmani (2017) stated that the rise in economic activity is to blame for the deterioration of the environment in West African countries. Technology and rising energy use facilitate industrialization and globalization. Brock and Taylor (2005) argued that economic activities have less environmental impact when manufacturers use less polluting technologies in their processes. More recent studies have also tested Kuznets’ hypothesis. Naqvi et al. (2023) examine 87 middle-income countries from 1990 to 2017 to see how FDI, GDP, natural resource depletion, urbanization, biomass energy consumption, and environmental impact are connected. Using the Augmented Mean Group estimator and the Dumitrescu-Hurlin causality test, they examine the EKC, the Renewable Energy-EKC, and the Pollution Haven Hypothesis (PHH). The empirical results support the EKC hypothesis, revealing an inverted U-shaped relationship between economic growth and ecological footprint. Because the growing ecological footprint in middle-income countries is driven by rising foreign direct investment, the empirical data lend credence to the PHH. The extraction of natural resources and the trend of urbanization severely threaten the long-term sustainability of the environment. In addition, the results support the Renewable Energy-EKC by showing that biomass energy use is associated with economic growth in an inverted U-shaped relationship, and that urbanization and FDI are two major factors influencing the use of biomass energy.
To achieve the SDGs and ensure the long-term viability of the ecosystem, these findings highlight the urgency for governments to propose all-encompassing energy and economic policies. Following discussions at the 26th United Nations Climate Change Conference, Yang et al. (2023) examine the potential consequences of reaching carbon neutrality by 2050 or 2060. The study examines how the top ten manufacturing countries use renewable energy and technology to improve energy efficiency and achieve carbon neutrality, emphasizing Sustainable Development Goal 7’s objective of affordable and clean energy. Using the Method of Moments Quantile Regression (MMQR), the study finds that renewable energy and energy efficiency had a major impact on lowering these nations’ greenhouse gas emissions from 1990 to 2020. The manufacturing sector, by contrast, increases emissions. Efficient energy use in manufacturing reduces emissions at all quantiles, suggesting that efficiency initiatives can offset the negative effects of industrial expansion on emissions. In addition, technological advancements have a notable effect on decreasing greenhouse gas emissions. Policymakers in the top ten manufacturing nations can use the study’s confirmation of the EKC hypothesis as a compass to guide their efforts toward carbon neutrality. Jahanger et al. (2023) delve into the ongoing debate surrounding the EKC hypothesis, particularly its shape and its implications for the relationship between environmental sustainability and economic activity. Focusing on the top nuclear energy-producing countries, the research examines whether an N-shaped EKC exists among these countries, challenging the common belief in an inverted U-shaped EKC.
Using annual time-series data from 1990 to 2018 and the Dynamic Common Correlated Effects approach, the findings confirm the validity of the N-shaped EKC hypothesis in these countries’ context. This suggests that, despite economic advancements, the energy sector driving growth in these countries remains heavily reliant on fossil-based sources. The study also highlights the positive impact of nuclear energy generation on environmental quality and identifies negative associations of military spending and human capital with the ecological footprint, emphasizing the potential roles of national security and education in enhancing environmental sustainability.
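The inverted U-shaped and N-shaped EKC hypotheses discussed above are usually tested by regressing an environmental indicator on polynomial terms of income. The following minimal sketch uses synthetic data (not data from any cited study) and plain least-squares fits via NumPy’s `polyfit`, not the panel estimators (AMG, DCCE) used in those studies. The sign conditions checked are the ones commonly used in the EKC literature: an inverted U needs a positive linear and a negative quadratic term; an N shape needs positive, negative, and positive linear, quadratic, and cubic terms.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic, illustrative data: log income and an emissions index that rises,
# peaks at income = 9, and then falls (an inverted U), plus small noise.
income = np.linspace(6.0, 11.0, 200)
emissions = -0.5 * (income - 9.0) ** 2 + 10.0 + rng.normal(0.0, 0.1, income.size)

# Quadratic EKC: emissions = b0 + b1*income + b2*income**2.
# np.polyfit returns coefficients from the highest degree down.
b2, b1, b0 = np.polyfit(income, emissions, deg=2)
inverted_u = bool(b1 > 0 and b2 < 0)
turning_point = -b1 / (2.0 * b2)  # income level at which emissions peak

# Cubic specification used to test for an N-shaped EKC:
# the usual sign pattern is c1 > 0, c2 < 0, c3 > 0.
c3, c2, c1, c0 = np.polyfit(income, emissions, deg=3)
n_shaped = bool(c1 > 0 and c2 < 0 and c3 > 0)

print(f"b1 = {b1:.2f}, b2 = {b2:.3f}, inverted U: {inverted_u}")
print(f"turning point: income = {turning_point:.2f}")
print(f"cubic sign pattern N-shaped: {n_shaped}")
```

For the quadratic form, emissions peak at \(y^* = -b_1/(2b_2)\); on this synthetic data the estimate lands close to 9, the peak built into the generating function.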

Some studies have focused on groups of countries, analyzing instruments for lowering CO2 emissions while allowing EU economies to expand. Recent research (Khan et al., 2022; Fávero et al., 2022) confirms a worldwide correlation between GDP expansion and CO2 emissions. In the same context, Martinez-Zarzoso and Bengochea-Morancho (2004) examine the connection between GDP growth and CO2 emissions for ten EU countries (1981–1995) and conclude that the approaches to reducing emissions differ significantly, pointing to the need to reduce emissions while considering each EU country’s economic situation. Onofrei et al. (2022) use the same study scope: with a panel data set from 2000 to 2017, they analyze the changing relationship between the economic growth of EU countries and CO2 emissions. Both versions of their estimators show a positive effect of economic growth on CO2 emissions, with a 1% rise in GDP resulting in a 0.072% rise in CO2 emissions. The research also shows that people with more disposable income are more interested in protecting the environment, highlighting the importance of developing environmental regulations to curb emissions even as economies expand. In contrast, Acaravci and Ozturk (2010) apply the autoregressive distributed lag (ARDL) bounds testing approach to cointegration for 19 European countries and find a causal relationship between CO2 emissions, energy use, and economic development in only seven of them.

Because energy is one of the causes of carbon emissions, it has been the subject of many studies. Energy consumption has risen due to several factors, including industrialization, globalization, population growth, and lifestyle changes. CO2 emissions rise in tandem with energy consumption as economies pursue GDP growth (Apergis & Ozturk, 2015). According to Youmani (2017), the composition effect determines whether economic expansion has a favorable or negative effect on the environment. Pata et al. (2022) analyze the effect of financial development on renewable energy use in the US from 1980 to 2019, with urbanization, economic structure, and economic growth as control variables. The study adds to the current body of knowledge by examining how six sub-indicators (the depth, access, and efficiency of financial markets and of financial institutions) influence renewable energy use, applying the wavelet-transformed Fourier quantile causality test. According to the empirical results, financial development encourages high quantiles of renewable energy consumption in both the medium and long run. Depth and access to financial markets are the two most significant criteria for promoting renewable energy consumption.

Yang et al. (2023) examine how China’s low-carbon city pilot program (LCCP) affects pollution levels, emphasizing the policy’s impact on businesses. Based on integrated data from the China Enterprise Pollution Emissions database and the Chinese Industrial Enterprise database, the study uses a time-varying difference-in-differences technique and draws several important conclusions: (1) LCCP promotes environmentally friendly technology and innovation capacity, which greatly reduces pollution in enterprises. (2) LCCP supports healthy urbanization and restores the urban ecosystem by reducing emission intensities. (3) LCCP’s effectiveness varies with the ownership, type, and location of firms: it is more effective for foreign or private-owned, capital- and technology-intensive, high-polluting firms in the eastern and western zones. Multiple sensitivity analyses corroborate the robustness of the results. The research highlights the LCCP’s theoretical and practical backing in China. Businesses in low-carbon pilot cities can benefit by shifting their manufacturing practices and implementing green transformation initiatives backed by government incentives and preferences. The authors conclude with policy recommendations for advancing China’s carbon-neutral city revitalization plan. Moreover, Zahra et al. (2023) explore the effects of the green-blue revolution on Pakistan’s environmental sustainability from 1976 to 2020, focusing on production-based carbon emissions and the agro-environmental footprint. Using Johansen cointegration tests, the study shows that the country’s green-blue revolution was associated with environmental degradation over the long run. Median quantile regression analysis reveals that agricultural machinery, insecticides, and aquaculture output contribute to production-based carbon emissions.
Conversely, fertilizers, agricultural finance, and high-yield variety seeds do not significantly affect production-based carbon emissions. In addition, fertilizers and high-yield variety seeds are found to improve Pakistan’s agro-environmental footprint. Notably, the effects of the external factors on the environmental proxies differ across quantiles. These results have substantial implications for developing effective policies for a sustainable environment. Yang et al. (2023) investigate the impact of China’s LCCP intervention on the electric energy consumption intensity (EECI) of enterprises, addressing a gap in existing research: although the LCCP initiative, which targets emissions reduction, has succeeded in major Chinese cities, its specific influence on enterprises’ EECI had not been explored. Using a time-varying difference-in-differences (DID) model and matched data from 2007 to 2014, the findings reveal that: (1) LCCP significantly reduces firms’ EECI by 3.14% with a 1% point increase. (2) Heterogeneity exists based on firm ownership, location, and input characteristics: effects are insignificant for state-owned, labor-driven firms in the West, while foreign, private-owned, and capital-driven firms in non-resource-based cities experience substantial EECI reductions. (3) LCCP promotes long-term energy conservation attitudes and efficiency. The study concludes by discussing policy implications aligned with its findings.

This paper differs from previous literature in that, in addition to the impact of population size, GDP growth, and FDI inflow on carbon emissions, it also considers the value added of the agriculture, industry, and services sectors. Moreover, most of the previous literature was conducted on individual countries or a single group of countries, whereas this study covers low- and high-income countries together, which makes its results more comprehensive and allows a joint strategy to be formulated to confront environmental challenges. Finally, the study relies on machine learning algorithms, which can capture nonlinear relationships and often achieve higher predictive accuracy than the traditional econometric tools used in most previous literature.

3 Empirical framework

In this manuscript, we examine the impact of population size, GDP growth, the value added of the agricultural, industrial, and service sectors, and FDI inflow on CO2 emissions in low- and high-income countries. The following equation expresses the empirical framework, which is estimated separately for each of the two country groups:

$$\mathrm{CO_2} = f\left(\mathrm{Pn}, \mathrm{GDPg}, \mathrm{FDI}, \mathrm{AGV}, \mathrm{INV}, \mathrm{SRV}\right)$$

where:

CO2 = carbon dioxide emissions.

Pn = population size.

GDPg = gross domestic product growth.

FDI = foreign direct investment inflow.

AGV = agriculture value added.

INV = industry value added.

SRV = services value added.

4 Data and methodology

4.1 Data

The variables listed in Sect. 3 were collected from the World Bank database. The sample period differs between the two country groups because of differences in data availability: 1990–2020 for low-income countries and 1997–2020 for high-income countries. Table 1 describes the variables and reports their descriptive statistics:

Table 1 Description and descriptive statistics of all indicators

4.2 Methodology

This study employs two ML techniques: gradient boosting (GB) and random forest (RF). Both are supervised learning models: they learn from existing labeled data and use what they have learned to make predictions on new data. All ML techniques in this work are implemented with the Scikit-Learn Python module (Abd El-Aal, 2023).
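As a rough illustration of this setup, the sketch below fits scikit-learn’s `GradientBoostingRegressor` and `RandomForestRegressor` to one country group. It assumes synthetic data: the column names reuse the abbreviations of Sect. 3, but the values, the generating formula, and the default hyperparameters are invented and are not the study’s data or tuned settings.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.model_selection import train_test_split

# Hypothetical synthetic data: columns reuse the variable names of Sect. 3,
# but the values and the generating formula below are invented for illustration.
rng = np.random.default_rng(42)
n = 120
X = pd.DataFrame({
    "Pn": rng.uniform(1, 10, n),
    "GDPg": rng.normal(3, 2, n),
    "FDI": rng.uniform(0, 5, n),
    "AGV": rng.uniform(5, 40, n),
    "INV": rng.uniform(15, 35, n),
    "SRV": rng.uniform(40, 75, n),
})
y = 2.0 * X["Pn"] + 0.05 * X["INV"] ** 2 - 0.1 * X["AGV"] + rng.normal(0, 1, n)

# Hold out the last 20% of rows without shuffling.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)

models = {
    "GB": GradientBoostingRegressor(random_state=0),
    "RF": RandomForestRegressor(random_state=0),
}
r2 = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    r2[name] = model.score(X_test, y_test)  # coefficient of determination R^2
    # Impurity-based importances: one way to rank the determinants of CO2.
    ranking = sorted(zip(X.columns, model.feature_importances_), key=lambda t: -t[1])
    print(f"{name}: test R^2 = {r2[name]:.3f}, top predictors = {ranking[:2]}")
```

The `feature_importances_` attribute is one common way to rank determinants with tree ensembles; whether the study used it, or another importance measure, is not stated here.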

4.2.1 The ML algorithms

4.2.1.1 The GB algorithm

Friedman (2001) introduced the gradient boosting (GB) algorithm, a type of ensemble ML method. In the GB paradigm, multiple weak learners are combined into one strong learner.

The GB model starts from a single leaf, that is, a constant prediction, and then adds regression trees. Regression trees, a special case of decision trees, estimate continuous real-valued functions rather than class labels. As a regression tree is constructed, the data are repeatedly partitioned into increasingly smaller subsets. Initially, all observations are placed in one node. Every possible split on each predictor is then evaluated, the residual error of each candidate is quantified with the Friedman MSE criterion (Friedman, 2001), and the tree splits on the predictor that best divides the observations into two groups.
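The exhaustive split search described above can be sketched as follows. For simplicity, this hypothetical `best_split` helper scores candidates by the plain sum of squared errors of the two child nodes, not by the Friedman MSE improvement score that scikit-learn’s GB trees use by default; both criteria aim to find the split that best separates the observations.

```python
import numpy as np

def best_split(X, y):
    """Search every predictor and threshold for the split that minimizes
    the total squared error of the two child nodes."""
    best = (None, None, np.inf)  # (feature index, threshold, SSE)
    for j in range(X.shape[1]):
        values = np.unique(X[:, j])
        # Candidate thresholds: midpoints between consecutive distinct values,
        # so both child nodes are always non-empty.
        for t in (values[:-1] + values[1:]) / 2.0:
            left, right = y[X[:, j] <= t], y[X[:, j] > t]
            sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
            if sse < best[2]:
                best = (j, t, sse)
    return best

# Tiny example: two predictors, four observations.
X = np.array([[1.0, 5.0], [2.0, 3.0], [3.0, 4.0], [4.0, 1.0]])
y = np.array([1.0, 1.2, 3.0, 3.1])
feature, threshold, sse = best_split(X, y)
print(feature, threshold, round(sse, 3))
```

Here the first predictor at threshold 2.5 separates the low responses (1.0, 1.2) from the high ones (3.0, 3.1), leaving a total squared error of only 0.025.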

At each iteration, the GB method fits a new tree to the errors left by the current ensemble of trees. The contribution of each new tree is scaled by the learning rate, which helps prevent overfitting. This process repeats until the target number of trees is reached or the model can no longer be improved.

Following Friedman (2001), given the input data \(\{(x_i, y_i)\}_{i=1}^{n}\) and a differentiable loss function \(L(y_i, F(x))\), the GB algorithm proceeds in the following stages.

Step 1

Initialize the model with a constant value:

$$F_{0}(x) = \arg\min_{\gamma} \sum_{i=1}^{n} L\left(y_i, \gamma\right)$$

where \(y_i\) is the observed value and \(\gamma\) is a constant prediction. For the squared-error loss, \(F_0(x)\) is the average of the observed values.

Step 2

For \(m = 1\) to \(M\):

  • Compute the pseudo-residuals:

$$r_{im} = -\left[\frac{\partial L\left(y_i, F(x_i)\right)}{\partial F(x_i)}\right]_{F(x)=F_{m-1}(x)} \quad \text{for } i = 1, \dots, n$$

  • Fit a regression tree to the \(r_{im}\) values and create terminal regions \(R_{jm}\) for \(j = 1, \dots, J_m\).

  • For \(j = 1, \dots, J_m\), compute:

$$\gamma_{jm} = \arg\min_{\gamma} \sum_{x_i \in R_{jm}} L\left(y_i, F_{m-1}(x_i) + \gamma\right)$$

  • Update the model:

$$F_{m}(x) = F_{m-1}(x) + \alpha \sum_{j=1}^{J_m} \gamma_{jm}\, I\left(x \in R_{jm}\right)$$

where \(\alpha\) is the learning rate and \(I(\cdot)\) is the indicator function.

The learning rate \(\alpha\) shrinks the contribution of each new tree. A smaller learning rate requires more iterations but typically reduces overfitting, because the model improves in many small steps rather than a few large ones (Hastie et al., 2009).

Step 3: Output

$$\hat{F}(x) = F_{M}(x)$$

Once all M iterations have been completed, the final model \(\hat{F}(x) = F_M(x)\) approximates the relationship between the independent variables and the dependent variable.
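For the squared-error loss \(L = (y - F)^2/2\), the steps above reduce to repeatedly fitting trees to residuals. The sketch below follows Steps 1 to 3 under that assumption, using scikit-learn’s `DecisionTreeRegressor` as the base learner. The function name `gradient_boost` and its default settings are hypothetical; it is an illustration of the algorithm, not the estimator configuration used in this study.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost(X, y, n_trees=100, learning_rate=0.1, max_depth=2):
    """Gradient boosting for the squared-error loss L = (y - F)^2 / 2."""
    # Step 1: the constant that minimizes squared loss is the mean of y.
    f0 = y.mean()
    F = np.full(y.shape, f0)
    trees = []
    for m in range(n_trees):
        # Step 2a: pseudo-residuals; for squared loss, -dL/dF = y - F.
        residuals = y - F
        # Step 2b: fit a regression tree to the residuals.
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=m)
        tree.fit(X, residuals)
        # Step 2c: for squared loss, the optimal leaf value gamma_jm is the mean
        # residual in each leaf, which is exactly what the fitted tree predicts.
        # Step 2d: update the model, shrinking the step by the learning rate.
        F = F + learning_rate * tree.predict(X)
        trees.append(tree)

    def predict(X_new):
        # Step 3: F_M(x) = F_0 + alpha * (sum of all tree outputs).
        return f0 + learning_rate * sum(t.predict(X_new) for t in trees)

    return predict

# Usage on a synthetic one-predictor problem.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, 200)
model = gradient_boost(X, y)
mse = np.mean((model(X) - y) ** 2)
print(f"training MSE: {mse:.4f}, variance of y: {np.var(y):.4f}")
```

Lowering `learning_rate` while raising `n_trees` trades computation for the smoother, less overfit fit described above.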

4.2.1.2 The random forest algorithm (RF)

The RF method, introduced by Breiman (2001), is another ensemble technique; it is related to, but distinct from, boosting. According to Dietterich (2000), RF is a highly significant and powerful ensemble algorithm for ML. Like the GB model, the RF technique uses regression trees. However, whereas GB builds its trees sequentially, RF trains each regression tree independently and averages the resulting predictions.

The fundamental steps of the RF model are:

Step 1. For m = 1 to M:

  • Draw a bootstrap sample Z of size N from the training data.

  • Grow a random-forest tree \(T_m\) on the bootstrapped data by recursively repeating the following steps for each terminal node until the minimum node size \(n_{min}\) is reached:

    • Randomly select x of the p variables.

    • Choose the best variable and split point among these x variables.

    • Split the node into two daughter nodes. The split is chosen to minimize the MSE, calculated as follows:

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^{2}$$

where \(y_i\) is the actual value and \(\hat{y}_i\) is the predicted value.

In addition to bootstrapping the observations, RF randomly selects a subset of variables to consider when splitting each node. This extra randomization greatly reduces the correlation between trees and improves robustness to overfitting. Overfitting occurs when a fully grown tree fits the training data too closely; a model whose trees fit the training data almost perfectly may then be useless for making accurate predictions on new data. To avoid this issue, an RF model can limit tree complexity, for example by pruning nodes or removing some trees (AbdElminaam et al., 2023).

Step 2: Output the ensemble of trees \(\{T_m\}_{m=1}^{M}\). The prediction at a new point x is

$$\hat{F}_{rf}^{M}(x) = \frac{1}{M}\sum_{m=1}^{M} T_{m}(x).$$

The final output, \(\hat{F}_{rf}^{M}(x)\), is obtained by averaging the predictions of the individual trees. Averaging many forecasts stabilizes predictive performance and reduces variance.
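Steps 1 and 2 can be sketched by hand with scikit-learn’s `DecisionTreeRegressor` as the tree learner: bootstrap the rows, restrict each split to a random subset of predictors through `max_features`, and average the trees. The function name `random_forest` and its defaults are hypothetical, and `min_samples_leaf` only approximates the minimum node size \(n_{min}\); scikit-learn’s `RandomForestRegressor` bundles the same ideas in a single estimator.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def random_forest(X, y, n_trees=100, max_features=2, min_samples_leaf=5, seed=0):
    """Hypothetical hand-rolled random forest for regression."""
    rng = np.random.default_rng(seed)
    n = len(y)
    trees = []
    for _ in range(n_trees):
        # Step 1a: bootstrap sample Z of size N, drawn with replacement.
        idx = rng.integers(0, n, size=n)
        # Step 1b: grow a tree that considers only `max_features` randomly chosen
        # predictors at each split; min_samples_leaf stands in for n_min.
        tree = DecisionTreeRegressor(
            max_features=max_features,
            min_samples_leaf=min_samples_leaf,
            random_state=int(rng.integers(0, 2**31 - 1)),
        )
        tree.fit(X[idx], y[idx])
        trees.append(tree)

    def predict(X_new):
        # Step 2: average the M tree predictions.
        return np.mean([tree.predict(X_new) for tree in trees], axis=0)

    return predict

# Usage on synthetic data with four predictors, two of which are pure noise.
rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(300, 4))
y = 3 * X[:, 0] + np.sin(6 * X[:, 1]) + rng.normal(0, 0.1, 300)
forest = random_forest(X, y)

X_new = rng.uniform(0, 1, size=(100, 4))
y_new = 3 * X_new[:, 0] + np.sin(6 * X_new[:, 1])
mse = np.mean((forest(X_new) - y_new) ** 2)
print(f"test MSE: {mse:.4f}, variance of target: {np.var(y_new):.4f}")
```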

4.2.2 Cross-validation

The ML algorithms in this study have several hyperparameters, which are tuned with k-fold cross-validation, a procedure commonly employed in research like this one. K-fold cross-validation divides the training set into k equal parts (folds) and evaluates how well the algorithm fits the data by repeatedly training on k − 1 folds and validating on the remaining one. Because the data may change over time, the training and validation sets should cover different time periods: for a forecasting model to be accurate, it should not contain any knowledge of events that occur after the events used to fit the model (Tashman, 2000). Following prior works such as Molinaro et al. (2005), we divide the training data into ten folds (k = 10).
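The paragraph above combines plain k-fold splitting with Tashman’s (2000) requirement that no information from later periods leak into model fitting. Which splitter the study used is not stated, so the sketch below simply contrasts scikit-learn’s `KFold` with `TimeSeriesSplit` on a hypothetical annual index of 31 years, the length of the low-income sample.

```python
import numpy as np
from sklearn.model_selection import KFold, TimeSeriesSplit

years = np.arange(1990, 2021)   # 31 annual observations, like the low-income sample
X_years = years.reshape(-1, 1)  # scikit-learn splitters expect (n_samples, n_features)

kfold = KFold(n_splits=10)           # plain 10-fold CV (no shuffling)
tscv = TimeSeriesSplit(n_splits=10)  # expanding window; validation always comes later

for name, splitter in [("KFold", kfold), ("TimeSeriesSplit", tscv)]:
    train_idx, val_idx = next(splitter.split(X_years))
    print(f"{name}: train {years[train_idx].min()}-{years[train_idx].max()}, "
          f"validate {years[val_idx].min()}-{years[val_idx].max()}")
```

In its first split, `KFold` trains on 1994–2020 and validates on 1990–1993, i.e., on years that precede the training data; `TimeSeriesSplit` trains on 1990–2000 and validates on 2001–2002, so every validation block lies after its training window.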

Cross-validation has been criticized as unnecessary for RF models: because RF trees are built with the bagging strategy, the out-of-bag (OOB) procedure closely resembles cross-validation, so separate cross-validation may be redundant. However, one of the main goals of our study is to compare the performance of the GB and RF models, so the RF model was also subjected to cross-validation to make the comparison as objective as possible. The GB and RF models were evaluated on the same out-of-sample data to guarantee a fair comparison (Probst et al., 2019).
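The redundancy argument can be checked empirically: scikit-learn’s `RandomForestRegressor` reports an OOB \(R^2\) when `oob_score=True`, which can be set beside a 10-fold cross-validated \(R^2\) for the same configuration. A minimal sketch on synthetic data (the data, forest size, and seeds are assumptions for illustration):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
X = rng.uniform(0, 1, size=(200, 6))
y = 4 * X[:, 0] + 2 * X[:, 1] ** 2 + rng.normal(0, 0.2, 200)

# OOB estimate: each observation is scored only by trees whose bootstrap
# sample did not include it.
rf = RandomForestRegressor(n_estimators=200, oob_score=True, random_state=0)
rf.fit(X, y)
print("OOB R^2:", round(rf.oob_score_, 3))

# 10-fold cross-validated estimate for the same forest configuration.
cv_r2 = cross_val_score(
    RandomForestRegressor(n_estimators=200, random_state=0), X, y, cv=10, scoring="r2"
)
print("10-fold CV R^2:", round(cv_r2.mean(), 3))
```

On data like these the two estimates typically land close together, which is why OOB is often considered sufficient for RF alone; cross-validation is still needed when RF must be compared with GB on identical folds.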

Cross-validation selects the hyperparameter values that yield the lowest mean squared error (MSE) across the ten test folds, and the selected hyperparameters are then applied to the test dataset for prediction. This study uses a grid search to find the best hyperparameter values. Both the GB and RF models consider all predictors, and the depth of the trees is controlled by changing the number of splits (Probst et al., 2019).
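
A minimal sketch of this grid search, assuming scikit-learn's `GridSearchCV` with 10 folds and an MSE criterion. The grid values and the synthetic data are hypothetical, because the study's actual grid is not given here. scikit-learn maximizes scores, so MSE is passed as `"neg_mean_squared_error"` and the sign is flipped to recover the cross-validated MSE.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, KFold

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = X[:, 0] + 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.2, size=200)

# Hypothetical grid: tree depth (number of successive splits) and ensemble size.
param_grid = {"max_depth": [1, 2, 3], "n_estimators": [50, 100]}

search = GridSearchCV(
    GradientBoostingRegressor(random_state=0),  # considers all predictors by default
    param_grid,
    scoring="neg_mean_squared_error",  # negated MSE, because scores are maximized
    cv=KFold(n_splits=10, shuffle=True, random_state=0),
)
search.fit(X, y)
best_cv_mse = -search.best_score_  # mean MSE over the ten test folds
print(search.best_params_, round(best_cv_mse, 3))
```

After fitting, `search.best_estimator_` has been refit on all of the training data with the selected values and can be used to predict the held-out test set.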

5 Empirical results

5.1 Model evaluation

Accuracy metrics are crucial for evaluating the efficiency of ML algorithms, particularly in classification tasks. These metrics show how well a model is performing and provide a comprehensive understanding of its strengths and weaknesses. This study uses four metrics: precision, recall, F1 score, and AUC-ROC (Powers, 2020).

  • Precision: Precision, or positive predictive value, is the ratio of true positives to the total number of predicted positives (true positives plus false positives). It measures how many of the model’s positive predictions are correct. Formula:

Precision = TP / (TP + FP).

  • Recall: Recall, or sensitivity, is the proportion of all actual positive instances (true positives plus false negatives) that the model correctly predicts as positive (true positives). Recall is especially important when the goal is to capture as many positive instances as possible. Formula:

Recall = TP / (TP + FN).

  • F1 Score: The F1 score is the harmonic mean of precision and recall. Because it balances the two, it accounts for both false positives and false negatives. The F1 score is especially useful when the classes are imbalanced. Formula:

F1 Score = 2 * (Precision * Recall) / (Precision + Recall).

  • AUC-ROC: The receiver operating characteristic (ROC) curve plots the true positive rate (recall) against the false positive rate at various classification thresholds, and the Area Under the ROC Curve (AUC-ROC) statistic quantifies the area under this curve. It is especially helpful for comparing the performance of a model across different threshold values, as it provides an overall measure of the model’s ability to discriminate between classes.

AUC-ROC ranges from 0 to 1. A value of 0.5 corresponds to random guessing, and values below 0.5 are worse than chance, whereas a value closer to 1 indicates excellent discriminatory power.

These metrics assess different aspects of a model’s performance, and the metric to prioritize depends on the specific problem and the nature of the data. For instance, recall may be more important in medical diagnosis, where missing positive cases (e.g., in disease detection) must be avoided, whereas precision may be the key focus when false positives are costly. The F1 score balances the two, and AUC-ROC summarizes overall performance across different decision thresholds.
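
The four metrics can be computed by hand from the formulas above and cross-checked against scikit-learn. The labels and scores below are a toy example rather than the study's data, and the 0.5 threshold is an arbitrary choice. AUC-ROC is computed through its rank interpretation: it equals the probability that a randomly chosen positive receives a higher score than a randomly chosen negative.

```python
import numpy as np
from sklearn.metrics import f1_score, precision_score, recall_score, roc_auc_score

# Toy labels and predicted probabilities (illustrative only, not the study's data).
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_score = np.array([0.9, 0.8, 0.4, 0.35, 0.7, 0.3, 0.2, 0.2, 0.1, 0.05])
y_pred = (y_score >= 0.5).astype(int)  # one particular decision threshold

# Confusion-matrix counts at that threshold.
tp = int(np.sum((y_pred == 1) & (y_true == 1)))
fp = int(np.sum((y_pred == 1) & (y_true == 0)))
fn = int(np.sum((y_pred == 0) & (y_true == 1)))

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * (precision * recall) / (precision + recall)

# AUC-ROC: share of (positive, negative) pairs ranked correctly, ties counting 1/2.
pos, neg = y_score[y_true == 1], y_score[y_true == 0]
diff = pos[:, None] - neg[None, :]
auc = (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size

# Cross-check against scikit-learn's implementations.
assert np.isclose(precision, precision_score(y_true, y_pred))
assert np.isclose(recall, recall_score(y_true, y_pred))
assert np.isclose(f1, f1_score(y_true, y_pred))
assert np.isclose(auc, roc_auc_score(y_true, y_score))
print(round(precision, 3), recall, round(f1, 3), round(auc, 3))
```

In this example, precision is 2/3, recall is 1/2, F1 is 4/7, and AUC-ROC is 22/24. The AUC is high even though the 0.5 threshold misclassifies three observations, because AUC-ROC does not depend on any single threshold.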

5.2 Algorithms’ accuracy and performance

Table 2 displays the cross-validated accuracy scores of the employed algorithms, computed in Python.

Table 2 Cross-validated accuracy of the ML methods in forecasting CO2

Table 2 shows that the GB algorithm predicts CO2 emissions and their determinants more accurately than the RF algorithm in low-income countries, with an AUC of 78% for GB versus 74% for RF. In high-income countries, by contrast, RF outperforms GB, reaching an AUC of 95% compared with only 50% for GB, which is no better than random guessing. We therefore rely on the GB feature selection for low-income countries and the RF feature selection for high-income countries, as presented in Section 5.3.
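
A comparison of this kind can be sketched with scikit-learn's `cross_val_score` using the `"roc_auc"` scorer on the same stratified 10-fold split for both models. The binary target below ("emissions above the sample median") and the data are hypothetical, because the target definition used for Table 2 is not given in this section. The sketch therefore shows the procedure and does not reproduce the reported figures.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 6))
# Hypothetical binary target: "CO2 emissions above the sample median".
latent = X[:, 0] - 0.7 * X[:, 1] + rng.normal(scale=0.5, size=300)
y = (latent > np.median(latent)).astype(int)

# The same folds for both models keep the comparison fair.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
models = {
    "GB": GradientBoostingClassifier(random_state=0),
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
}
auc = {name: cross_val_score(model, X, y, cv=cv, scoring="roc_auc").mean()
       for name, model in models.items()}
print({name: round(score, 3) for name, score in auc.items()})
```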

5.3 GB and RF algorithms feature importance

Feature importance is the key to unlocking the black box of ML models. It evaluates and quantifies the influence each input variable or feature has on the model’s predictions or classifications. By assigning importance scores to these features, data scientists gain critical insights into the inner workings of their models. This knowledge empowers them to refine their models, enhance performance, and make more accurate and interpretable decisions.
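
As one illustration of how scores such as those in Table 3 can be obtained, the sketch below reads the impurity-based `feature_importances_` attribute of scikit-learn's GB and RF regressors and expresses the scores as percentages. These scores are normalized to sum to 1. The data are synthetic. AGV, INV, SRV, and Pn follow the paper's abbreviations, while "GDPg" and "FDI" are hypothetical labels for GDP growth and FDI inflow. The importance method actually used in the study is not stated here, so this is an assumption.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor

# AGV, INV, SRV, Pn follow the paper; "GDPg" and "FDI" are placeholder labels.
features = ["AGV", "INV", "SRV", "Pn", "GDPg", "FDI"]

rng = np.random.default_rng(0)
X = rng.normal(size=(400, len(features)))
y = 3.0 * X[:, 0] + 1.5 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(scale=0.3, size=400)


def importance_table(model):
    """Fit the model and return its impurity-based importances in percent."""
    model.fit(X, y)
    return {f: round(100 * w, 1) for f, w in zip(features, model.feature_importances_)}


gb_pct = importance_table(GradientBoostingRegressor(random_state=0))
rf_pct = importance_table(RandomForestRegressor(n_estimators=200, random_state=0))
print(gb_pct)
print(rf_pct)
```

Impurity-based importances indicate how much each variable reduces the loss across splits, but not the direction of its effect. For that reason, a separate correlation analysis is needed to establish the signs.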

Table 3 shows the feature importance scores of the GB and RF algorithms.

Table 3 shows that, according to the GB feature selection, the variables with the greatest influence on CO2 emissions in low-income countries are the agriculture sector (49.9%), the industry sector (17.%), the services sector (10.4%), population size (9.8%), GDP growth (7%), and FDI inflow (5.3%). According to the RF feature selection, the variables with the greatest effect on CO2 emissions in high-income countries are the services sector (30.8%), the agriculture sector (27.1%), the industry sector (21.5%), population size (19%), FDI inflow (1.2%), and GDP growth (0.4%). The RF tree in Fig. 1 illustrates the relative importance of the services, agricultural, and industrial sectors in their impact on CO2 compared with the other variables in high-income countries.

Table 3 Feature importance indicators of the GB and RF algorithms
Fig. 1

Source: Compiled by the author

RF tree for high-income countries.

The contribution of FDI inflow to CO2 emissions is much greater in low-income countries than in high-income countries, which suggests that high-income countries encourage environmentally friendly investments. Likewise, GDP growth contributes very little to CO2 emissions in high-income countries compared with low-income countries, which suggests that the former rely more on environmentally friendly sectors.

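Feature importance measures how strongly each variable influences the predictions, but not whether its effect is positive or negative. To assess the direction of each relationship, Table 4 reports the correlations between the dependent variable (CO2 emissions) and the independent variables.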

Table 4 Correlation between CO2 emissions and the independent variables

Table 4 shows that CO2 emissions in low-income countries are positively correlated with the growth of industrial value added, FDI inflow, GDP growth, and population size, and negatively correlated with the growth of the agricultural and service sectors. In high-income countries, CO2 emissions are positively correlated with the growth of the industrial and agricultural sectors, GDP growth, and FDI growth, and negatively correlated with the growth of the service sector and population size. The sieve diagrams in Figs. 2 and 3 show the strength of the relationship between CO2 and the study variables in low-income and high-income countries, respectively.
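
The sign of each entry in a table like Table 4 can be read directly from a correlation matrix. The sketch below uses pandas `DataFrame.corr`, which computes Pearson correlation by default. Pearson correlation is our assumption, because the coefficient used in the study is not stated here. The CO2 series is synthetic and is constructed to rise with INV and fall with AGV.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({"INV": rng.normal(size=n), "AGV": rng.normal(size=n)})
# Synthetic CO2 series built to rise with INV and fall with AGV (illustrative only).
df["CO2"] = 2.0 * df["INV"] - 1.0 * df["AGV"] + rng.normal(scale=0.1, size=n)

corr_with_co2 = df.corr()["CO2"].drop("CO2")  # Pearson correlation by default
direction = np.sign(corr_with_co2).map({1.0: "positive", -1.0: "negative"})
print(corr_with_co2.round(2).to_dict(), direction.to_dict())
```

A correlation coefficient captures only the direction and strength of a linear association. It does not imply that emissions are proportional to a variable or caused by it.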

Fig. 2

Source: Compiled by the author

Sieve diagram for low-income countries.

Fig. 3

Source: Compiled by the author

Sieve diagram for high-income countries.

In Figs. 2 and 3, the darker the blue squares, the stronger the relationship between the two variables. In low-income countries, CO2 is strongly related to AGV and INV. In high-income countries, CO2 is strongly related to INV, SRV, AGV, and Pn.
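
A sieve diagram compares the observed counts in a contingency table of two discretized variables with the counts expected if the variables were independent. Cells that are denser than expected point to an association. The sketch below computes these quantities for one synthetic pair, assuming three quantile bins per variable, which is an illustrative choice rather than the binning used for Figs. 2 and 3.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 300
agv = rng.normal(size=n)
co2 = 1.5 * agv + rng.normal(scale=0.5, size=n)  # synthetic, strongly related pair

# Discretize each variable into three quantile bins (0 = low, 1 = mid, 2 = high).
observed = pd.crosstab(pd.qcut(co2, 3, labels=False),
                       pd.qcut(agv, 3, labels=False)).to_numpy()

# Expected counts under independence: row total * column total / N.
expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / observed.sum()
# Standardized (Pearson) residuals: positive cells are denser than independence predicts.
residuals = (observed - expected) / np.sqrt(expected)
print(np.round(residuals, 1))
```

For a strongly related pair such as this one, the large positive residuals fall on the diagonal (low with low, high with high). These are the cells that a sieve diagram typically shades most intensely.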

The research results can be summarized as follows:

  • The GB algorithm predicts CO2 emissions and their determinants more accurately than the RF algorithm in low-income countries (AUC of 78% for GB versus 74% for RF). In high-income countries, by contrast, RF outperforms GB, with an AUC of 95% versus 50%.

  • CO2 emissions in low-income countries are positively correlated with the growth of industrial value added, FDI inflow, GDP growth, and population size, and negatively correlated with the growth of the agricultural and service sectors.

  • CO2 emissions in high-income countries are positively correlated with the growth of the industrial and agricultural sectors, GDP growth, and FDI growth, and negatively correlated with the growth of the service sector and population size.

  • In low-income countries, there is a strong relationship between CO2 and both AGV and INV.

  • In high-income countries, there is a strong relationship between CO2 and INV, SRV, AGV, and Pn.

6 Conclusion and policy recommendation

6.1 Conclusion

This study uses machine-learning methods to examine the connection between CO2 emissions and macroeconomic indicators in low- and high-income countries and yields several results with important implications. The findings show that ML algorithms can accurately model and forecast CO2 emissions from a variety of macroeconomic parameters. This demonstrates the potential importance of advanced computational tools in understanding and addressing environmental concerns, especially climate change adaptation.

The study reveals significant differences between low- and high-income countries in the relationship between CO2 emissions and macroeconomic indicators. In low-income countries, CO2 emissions are positively correlated with industrial sector growth, FDI inflow, GDP growth, and population size, and negatively correlated with the growth of the agricultural and service sectors. In high-income countries, by contrast, CO2 emissions are positively correlated with industrial and agricultural sector growth, GDP growth, and FDI inflow, and negatively correlated with service sector growth and population size. This divergence underscores the need to tailor climate policies and strategies to the distinct economic, social, and environmental contexts of different countries, and policymakers should adapt their approaches to the challenges and opportunities of each income group. In conclusion, the findings emphasize the critical role of ML in understanding the intricate interplay between CO2 emissions and macroeconomic conditions. They also stress the need for tailored and comprehensive climate policies that recognize the divergence between low- and high-income countries, with the ultimate goal of fostering sustainable development while reducing the negative impacts of climate change. This research contributes valuable insights that can inform future policymaking and international efforts to combat climate change on a global scale.

6.2 Policy implications and recommendations

Based on the research findings, the following recommendations are formulated:

  • For Low-Income Countries: Policymakers should design targeted policies for low-income countries, where the agriculture sector significantly influences carbon dioxide emissions. These policies should promote sustainable practices and integrate environmentally friendly technologies within the agricultural domain.

  • For High-Income Countries: A balanced approach is imperative in high-income countries, where the services sector is a major contributor to emissions. Policymakers should adopt comprehensive strategies that address both the services and agriculture sectors to reduce their environmental footprint effectively.

  • Population-Centric Interventions: Recognizing the substantial impact of population size on emissions in both low- and high-income countries, interventions should prioritize population-centric strategies. This includes implementing sustainable urban planning and resource management practices to curb emissions.

  • Incentivizing Green Investments: Given the notable role of foreign direct investment in emissions, especially in low-income countries, governments should actively incentivize green investments and adopt eco-friendly technologies to ensure sustainable development.

  • Integrated Approaches for Low-Income Countries: For low-income countries, an integrated approach is crucial, considering the interconnected effects of agriculture, industry, and services sectors on emissions. Policymakers should focus on developing holistic strategies rather than relying solely on sector-specific measures.

  • Adaptive Strategies for High-Income Countries: High-income countries should adopt adaptive strategies that account for the unique outcomes of the random forest algorithm’s feature selection. Policymakers must tailor interventions based on the specific drivers identified for their context, ensuring effective and context-specific environmental policies.

  • Monitoring and Mitigating Industrial Growth: Given the positive correlation between industrial sector growth and emissions in both income groups, vigilant monitoring and stringent measures should be implemented to mitigate the environmental impact of industrial expansion.

  • Encouraging Sustainable Practices in the Service Sector: Acknowledging the negative correlation between emissions and service sector growth in both income groups, policies should actively encourage sustainable practices within the service sector. This involves promoting a shift towards cleaner and greener services to contribute to overall emissions reduction.

6.3 Limitations

Despite the recommendations above, this study has some limitations. First, foreign direct investment is important for both low-income and high-income countries. If these countries implement policies to preserve the environment, will they adhere to them even when those policies reduce the growth of FDI? Second, the agricultural sector in low-income countries is the largest contributor to their carbon emissions. Which policies will allow them to curb these emissions while preserving the sector's growth, given that it is their source of food and the funder of their traditional industries? Third, the current study paves the way for researchers to view the elimination of carbon emissions as a goal shared by developed and developing countries. Instead of blaming one group of countries over another, joint policies must be formulated to confront environmental challenges. The study also highlights the need to take into account the economic sectors and the relative importance of each, which should receive attention from researchers in the coming period, as well as the need to track foreign direct investment, shed light on the areas it targets, and direct it toward supporting the green economy worldwide.