Introduction

The third industrial revolution has severely disrupted the Earth’s original ecology by bringing about massive consumption of fossil energy and triggering successive problems such as the greenhouse effect and global warming (Yang et al. 2022; Chai et al. 2022; Cheng et al. 2022). IPCC reports reveal that global average temperatures will be 1.5 to 4.8 degrees warmer than pre-industrial levels by 2100 and that the impact of climate change on humankind’s socioeconomic situation is expected to worsen if existing climate trends are not controlled (Mondal et al. 2022). The 2021 IPCC Sixth Assessment Report indicates that extreme weather is increasing globally and that global warming is irreversible for hundreds, if not thousands, of years (Pielke et al. 2022). Climate risk is a spatial and temporal externality that warrants joint efforts by the world’s economies (Wang et al. 2020; Razzaq et al. 2022). The Kyoto Protocol, the first document in human history to limit greenhouse gas emissions through legislation, officially entered into force in 2005, kicking off a global focus on carbon reduction (Freedman and Jaggi 2011). The greenhouse effect is gaining attention from international parties: more than 170 economies worldwide signed the Paris Agreement in 2016 to jointly address global climate change after 2020 (Yan et al. 2021). As one of the world’s highest energy-consuming and carbon-emitting economies, China has consistently adhered to resource conservation and environmental protection to drive its low-carbon economic development (Irfan et al. 2022; Hao et al. 2021a, b). In 2015, the Chinese government submitted the document “Intensifying Actions to Address Climate Change” to the United Nations, stating that carbon emission intensity should be reduced by 60–65% from 2005 levels by 2030. Moreover, President Xi Jinping announced the ambitious goal that carbon emissions should strive to peak by 2030 and that China would work toward carbon neutrality by 2060 (Tang et al. 2022; Sun et al. 2022). To meet these carbon reduction goals, China has taken a series of measures, including promoting electric and clean energy to replace fossil energy, establishing carbon trading pilots, and using market mechanisms and economic instruments. Although China’s carbon reduction governance has yielded positive achievements, carbon emissions are still increasing, and carbon reduction efforts remain under tremendous pressure (Razzaq et al. 2021; Li et al. 2021a, b; Shi and Xu 2022).

Relevant statistics demonstrate that China has the most diverse range of industrial categories in the world (Yang et al. 2021; Zhao et al. 2022; Guo 2022). According to the IPCC report, the energy sector accounted for 35% of GHG (greenhouse gas) emissions in 2010, industry for 21%, transport for 14%, and construction for 6.4%; when indirect emissions are taken into account, industry accounted for 31% of GHG emissions and construction for 19% (Pielke et al. 2022). The report therefore states that industry and construction are crucial to achieving the target of a 45% reduction in carbon emissions by 2030 compared to 2010 (Wu et al. 2020; Shi et al. 2022a, b). As a large industrial country, China has seen the scale of its industry expand rapidly with economic development, with the value added of the sector increasing from 12 billion yuan in 1952 to 31.3 trillion yuan in 2020. However, because China industrialized late and its internal industrial structure remains irrational, enterprises characterized by high energy consumption, high emissions, and low value added still dominate the current industrial system (Shi et al. 2022a, b). Since industry is the largest energy-consuming and carbon-emitting sector among carbon emission sources, industrial carbon peaking and emission reduction have become the cornerstone of China’s mitigation efforts (Sun et al. 2021; Su et al. 2021). It is therefore essential to improve the measurement and prediction of industrial carbon emissions, which helps advance the path of ecological civilization and serves as a critical guide for further strengthening carbon emission reduction policies and optimizing the environmental protection policy system. So, what is industry’s path to achieving carbon peaking and reduction? What factors affect the industrial sector’s carbon peaking? The answers to these questions have significant and positive implications for the carbon peaking and carbon neutrality goals.

The main contributions of this paper are as follows. On the one hand, we use the random forest algorithm, a machine learning method, to screen the factors affecting industrial carbon emissions. These factors are complicated, and the nexus between each influencing factor and industrial carbon emissions is not linear. In previous research, the influencing factors of carbon emissions have mainly been determined by simple qualitative division, and the prediction models applied have mostly been simple linear models, such as logistic, ARIMA (autoregressive integrated moving average), and ridge regression models. However, such models struggle to capture the relationship between the influencing factors and industrial carbon emissions and cannot accurately predict future emission trends. The random forest algorithm introduces randomness, is not prone to overfitting, has strong anti-noise capability, and screens the influencing factors of industrial carbon emissions with high accuracy. On the other hand, the BP neural network and support vector machine are trained repeatedly to learn the pattern between the influencing factors and industrial carbon emissions, which allows the carbon emission trend to be predicted more accurately and aims to provide direction for energy saving and emission reduction efforts.

Literature review

As it stands, there are two main areas of interest in the study of industrial carbon emissions: the factors that influence industrial carbon emissions, and the development of forecasting methods together with the examination of peak scenario outcomes.

The scholarly consensus is that carbon emissions have a substantial relationship with economic growth and energy use. According to the environmental Kuznets curve (EKC) theory, the relationship between environmental pollution and economic development follows an inverted U shape. Economic development is usually accompanied by environmental pollution in its early stages; when economic development reaches a certain level, an environmentally sustainable state emerges owing to low-carbon technology and environmental policies (Kaika and Zervas 2013). Meng et al. (2022) identified the factor bias of technological progress using data from 34 industries in China from 2000 to 2015. They found that, between capital input and energy input, China initially tended to consume energy in pursuit of capital, but shifted toward saving energy after the Eleventh Five-Year Plan. Moreover, environmental change is closely related to the level of economic development. At present, the environmental quality of China’s eastern provinces is higher than that of the central and western provinces, and the gap between regions is increasing (Guo 2022). At the same time, advances in digital technology provide more powerful technical support for academic research, which can effectively integrate all kinds of information and use different data (Xiao and Liu 2022). Because industry is the pillar of a country’s (region’s) economy, enterprises’ carbon dioxide emissions have a crucial influence on regional carbon emissions. Studies on the influencing factors of carbon emissions show that factors such as industrial scale, industrial structure, technology, and environmental regulation significantly affect carbon emissions. Huang et al. (2019) argued that carbon emissions are significantly positively associated with the industrial scale of the ferrous metal industry.

In contrast, Appiah et al. (2019) argued that energy intensity is the main factor affecting industrial carbon emissions. Zhu et al. (2012), on the other hand, verified the influence mechanism on industrial carbon emissions from the perspective of industrial structure. Du et al. (2019) further demonstrated the inhibitory effect of green technology on industrial carbon emissions, and carbon capture, utilization, and storage (CCUS) technology, as an important carbon sink approach, has also received increasing attention from industrial enterprises in recent years (Gür 2022). It has also become a consensus among scholars that industrial carbon emissions can be mitigated both by command-based environmental regulations, mainly through administrative orders, and by market-based environmental regulations, mainly through carbon taxes and carbon markets (Wesseh et al. 2017; Zhang et al. 2020; Yang et al. 2020; Y. Wang et al. 2018).

When investigating carbon peak prediction, the academic community has mainly relied on the LMDI model, the Kaya identity, the IPAT model, the STIRPAT model, and machine learning models. Ehrlich and Holdren (1971) first proposed the LMDI model to differentiate each variable separately while keeping other factor variables constant to determine the degree of influence of each factor change on the target quantity. Chen et al. (2020) and other scholars used this model to predict the peak CO2 emissions of China’s industrial and agricultural sectors. The IPAT model considers the growth in population size, the increase in material living standards, and the increase in resource exploitation as the root causes of deteriorating resource and environmental problems. Wang et al. (2021a, 2021b) used this model to predict the peak path of carbon emissions in China’s Bohai Sea Rim region. STIRPAT is an enhanced version of the IPAT model that analyzes the connection between human population, material wealth, technological advancement, and ecological conditions. Dalton et al. (2008) used the STIRPAT model combined with scenario analysis to predict carbon emissions in the USA. The essence of the Kaya identity is to explore the drivers of carbon emission change, i.e., the percentage contribution of each driver, and it has the advantages of a simple mathematical form, decomposition without residuals, and reliable explanatory power for the drivers of carbon emission change (Lu and Jiahua 2013). With the development of information technology applications in environmental governance, the role of the Internet in improving energy efficiency and releasing energy-saving potential (ESP) has attracted growing attention (Ren et al. 2022). Wu et al. (2022) measured the carbon emissions peak using the Kaya identity approach. A machine learning model is trained on a set of data with an algorithm that learns patterns from these data; once trained, the model can make inferences and predictions on previously unseen data (Wang et al. 2009). Some scholars believe that machine learning models have better prediction accuracy than multiple linear regression models, and they have been applied to analyze and predict industrial carbon emissions in different regions and industries (Leerbeck et al. 2020; Ağbulut 2022). Chen et al. (2022) present two approaches to forecast parameters in the SABR model: the first is the vector auto-regressive moving-average model (VARMA) for the time series of the in-sample calibrated parameters, and the second is based on a machine learning technique called epsilon-support vector regression. In addition, other scholars have used CGE, logistic, and system dynamics models to predict the peak carbon emissions of different regions and industries and have analyzed different scenarios (Meng and Niu 2011; Mirzaei and Bekri 2017; Beauséjour et al. 1995). Cheng et al. (2022) found that the carbon trading pilot policy can promote the upgrading of industrial structure in pilot regions and ultimately reduce industrial carbon emissions.

The accuracy of scenario analysis methods relies heavily on subjective human judgment. A large body of literature has examined industrial carbon emissions and carbon peaking and has produced relatively rich results, but shortcomings remain. On the one hand, existing carbon emission measurement methods are not yet unified, and mainstream prediction methods require data that are extensive and difficult to obtain. On the other hand, existing carbon emission prediction models are either traditional regression-based models with low prediction accuracy or models with high requirements for prior data; the influencing factors of industrial carbon emissions still need to be clarified, and sample sizes are small. Therefore, this paper applies the energy consumption method to measure industrial carbon emissions from direct and indirect perspectives, screens out the factors that have a large impact on carbon emissions through the random forest method, and then predicts carbon emissions through a machine learning model. As an ensemble algorithm, random forest achieves better accuracy than most individual algorithms, performs well on the test set, and does not easily fall into overfitting. For industrial research, random forest also has a certain anti-noise capability, which gives it an advantage over other algorithms. In general, there is relatively little literature on the use of machine learning to study carbon peaking, and even less on industrial carbon peaking. Therefore, by using machine learning methods, this paper provides a reference for achieving China’s industrial carbon peaking goal and fills some research gaps.

Study design

Research strategies

Machine learning algorithms are trained iteratively on historical data to find patterns in the data. Machine learning is widely used for prediction and classification problems, where input variables are linked to output variables by specific functions. It aims to learn tasks from the training data so that the resulting model is well suited to making predictions on new data (Mason et al. 2018). In this study, the software Matlab 2020 is used to implement the BP neural network and the support vector machine.

BP neural network prediction

Neural networks

A neural network is a parallel distributed processor built from individual neurons, each of which is a relatively independent learning unit. A neural network is similar to the human brain in that it can be trained repeatedly to gain experience and to preserve the learned knowledge. Neural networks process data through neurons, and their principle can be described as follows.

$${u}_{k}={\sum\nolimits }_{j=1}^{m}{w}_{kj}{x}_{j}$$
(1)
$${y}_{k}=\varphi \left({u}_{k}+{b}_{k}\right)$$
(2)

where \({x}_{j}\) denotes the input information, \({u}_{k}\) denotes the weighted sum of the inputs, \({w}_{kj}\) denotes the weights of the individual neurons, and \({b}_{k}\) is the offset term. Including the offset term \({b}_{k}\) allows the neuron output to be adjusted.

Then, the resultant data can be expressed as follows:

$${v}_{k}={u}_{k}+{b}_{k}$$
(3)

The above three equations can be combined to obtain

$${v}_{k}={\sum\nolimits }_{j=0}^{m}{w}_{kj}{x}_{j}$$
(4)
$${y}_{k}=\varphi \left({v}_{k}\right)$$
(5)
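To make the neuron model concrete, the following minimal Python sketch (illustrative only; the study’s models are implemented in Matlab, and all numerical values here are hypothetical) evaluates Eqs. (1)–(5) for a single neuron with a sigmoid excitation function.

```python
import numpy as np

def neuron_output(x, w, b, phi=lambda v: 1.0 / (1.0 + np.exp(-v))):
    """Single neuron: weighted sum (Eq. 1), bias term (Eq. 3), activation (Eqs. 2/5)."""
    u = np.dot(w, x)   # Eq. (1): u_k = sum_j w_kj * x_j
    v = u + b          # Eq. (3): v_k = u_k + b_k
    return phi(v)      # Eq. (5): y_k = phi(v_k)

# Hypothetical input, weights, and bias
x = np.array([0.2, 0.5, 0.3])
w = np.array([0.4, -0.1, 0.7])
print(neuron_output(x, w, b=0.05))
```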

BP algorithm

The error backpropagation algorithm is the basis of the BP neural network, which is a multi-layer feed-forward neural network built on this algorithm (Muruganandam et al. 2023). The BP neural network is trained on the input data set; its outputs are compared with the actual results, and the difference constitutes the output error (Kosarac et al. 2022). The output error is backpropagated, and the weights and thresholds of each neuron are corrected based on this error to obtain the optimal result (Bai et al. 2023). The gradient descent method is the core of the error backpropagation algorithm. Following Kusumadewi et al. (2023), the gradient descent method allows the BP neural network to minimize the mean square error of its predictions. Three layers make up the BP neural network model: the input, implicit (hidden), and output layers. An excitation function is used to reflect the mapping relationship between the input layer and the output. The model is trained with an empirical risk minimization strategy, and the error backpropagation algorithm continuously corrects the neuron weights and thresholds until the error value satisfies the desired error (Sonia et al. 2023). The BP neural network is an efficient and widely used intelligent algorithm in the field of classification and prediction, suitable for solving uncertainty problems with multiple influencing factors and constraints (Liang et al. 2023). The model features distributed storage, self-learning and adaptation, large-scale parallel processing, strong learning ability, and generalization ability, and it has become a popular prediction method in many disciplines. The training process of the BP neural network is as follows.

Let the input data set be \(N=\left({X}_{1},{X}_{2}\dots {X}_{m}\right)\). This sequence is mapped from the input layer to the implicit layer; after computation in the implicit layer, it becomes \(\theta \left(w\right)\). Then, \(\theta \left(w\right)\) is mapped from the implicit layer to the output layer, where the computation is completed and the output value is returned.

The output value is used to calculate the error, and the neuron weights and thresholds are continuously corrected by backpropagation of the output error. The expressions for the weight updates are as follows.

Implicit layer:

$$\Delta {w}_{ij}=-\gamma \frac{\partial E}{\partial {w}_{ij}}=-\gamma \frac{\partial E}{\partial ne{t}_{i}}\frac{\partial ne{t}_{i}}{\partial {w}_{ij}}=-\gamma \frac{\partial E}{\partial {y}_{i}}\frac{\partial {y}_{i}}{\partial ne{t}_{i}}\frac{\partial ne{t}_{i}}{\partial {w}_{ij}}$$
(6)

Output layer:

$$\Delta {w}_{ki}=-\gamma \frac{\partial E}{\partial {w}_{ki}}=-\gamma \frac{\partial E}{\partial ne{t}_{k}}\frac{\partial ne{t}_{k}}{\partial {w}_{ki}}=-\gamma \frac{\partial E}{\partial {y}_{k}}\frac{\partial {y}_{k}}{\partial ne{t}_{k}}\frac{\partial ne{t}_{k}}{\partial {w}_{ki}}$$
(7)

Rearranging gives:

$$\Delta {w}_{ki}=-\gamma {\sum\nolimits }_{p=1}^{P}{\sum\nolimits }_{n=1}^{N}\left({T}_{n}^{p}-{O}_{n}^{p}\right)\cdot {\varphi }^{\prime}\left(\mathrm{net}_{n}\right)\cdot {y}_{i}$$
(8)
$$\Delta {w}_{ij}=-\gamma {\sum\nolimits }_{p=1}^{P}{\sum\nolimits }_{n=1}^{N}\left({T}_{n}^{p}-{O}_{n}^{p}\right)\cdot {\psi }^{\prime}\left(\mathrm{net}_{n}\right)\cdot {w}_{ji}\cdot {\varphi }^{\prime}\left(\mathrm{net}_{i}\right)\cdot {x}_{j}$$
(9)

\(k\) is an output layer node, \(j\) is an input layer node, and \(i\) is an implicit layer node. \({w}_{ij}\) is the weight between implicit layer node \(i\) and input layer node \(j\ (i=1,2,3,\dots p,\ j=1,2,3,\dots n)\). \({w}_{ki}\) is the weight between implicit layer node \(i\) and output layer node \(k\ (i=1,2,3,\dots p,\ k=1,2,3,\dots n)\). \({x}_{j}\) is the input data value of node \(j\) in the input layer. \(\varphi\) is the excitation function of the implicit layer, \(\psi\) is the excitation function of the output layer, and \({O}_{k}\) is the output value of node \(k\) in the output layer.

Finally, the mean square error of the model output is calculated as follows.

$${E}_{k}=\frac{1}{2}{\sum\nolimits }_{j=1}^{n}{\left({Y}_{j}^{n}-{y}_{j}^{n}\right)}^{2}$$
(10)

The weights and thresholds are corrected according to the output error until the error meets the requirements.
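As an illustration of the backpropagation updates in Eqs. (6)–(10), the following Python sketch trains a small three-layer network by gradient descent on synthetic data; the network size, learning rate, and data are hypothetical stand-ins rather than the study’s Matlab configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def tanh(v):  return np.tanh(v)
def dtanh(v): return 1.0 - np.tanh(v) ** 2

# Hypothetical standardized data: 24 samples, 5 input features, 1 output
X = rng.random((24, 5))
Y = rng.random((24, 1))

n_in, n_hid, n_out = 5, 8, 1
W1 = rng.normal(scale=0.5, size=(n_in, n_hid));  b1 = np.zeros(n_hid)
W2 = rng.normal(scale=0.5, size=(n_hid, n_out)); b2 = np.zeros(n_out)
lr = 0.1

for epoch in range(1000):
    # Forward pass: input -> implicit (hidden) layer -> linear output layer
    net_h = X @ W1 + b1
    y_h = tanh(net_h)
    y_o = y_h @ W2 + b2
    err = y_o - Y                                        # output error
    mse = 0.5 * np.mean(np.sum(err ** 2, axis=1))        # Eq. (10)

    # Backpropagation of the output error (Eqs. 6-9)
    delta_o = err                                        # dE/dnet for linear output
    delta_h = (delta_o @ W2.T) * dtanh(net_h)
    W2 -= lr * y_h.T @ delta_o / len(X);  b2 -= lr * delta_o.mean(axis=0)
    W1 -= lr * X.T @ delta_h / len(X);    b1 -= lr * delta_h.mean(axis=0)

print("final training MSE:", mse)
```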

Support vector machine prediction

Kernel function

Kernel function theory predates the support vector machine, and scholars have since widely applied it to support vector machines. The principle of the support vector machine is to map data from a low-dimensional space to a high-dimensional space. If the data are high-dimensional, the conversion from low to high dimensions may lead to the “curse of dimensionality.” This problem can be mitigated by using kernel functions. If there exists a function \(K\left(x,{x}_{i}\right)\) such that all samples \(\left\{\left({x}_{1},{y}_{1}\right),\left({x}_{2},{y}_{2}\right),\cdots \left({x}_{n},{y}_{n}\right)\right\},x,y\in R\) satisfy:

$$K\left(x,{x}_{i}\right)=\phi \left(x\right)\cdot \phi \left({x}_{i}\right)$$
(11)

If the conditions of Mercer’s theorem are met, the function is a kernel function. There are four common types of kernel functions as follows.

Linear kernel function \((linear)\)

$$K\left(x,{x}_{i}\right)=x\cdot {x}_{i}$$
(12)

Linear kernel functions have fewer parameters and are often used to solve linearly separable problems.

Polynomial kernel function \((poly)\)

$$\begin{array}{cc}K\left(x,{x}_{i}\right)={\left[\left(x\cdot {x}_{i}\right)+1\right]}^{q}& (\mathrm{q}=\mathrm{1,2},3\dots )\end{array}$$
(13)

Polynomial kernel functions are widely used for nonlinear problems; they contain more parameters, and these parameters have a stronger influence on the model. This type of kernel function is less suitable for large data sets.

Radial basis kernel functions

$$K\left(x,{x}_{i}\right)=\mathrm{exp}\left[-\frac{{\Vert x-{x}_{i}\Vert }^{2}}{2{\sigma }^{2}}\right]$$
(14)

Gaussian (radial basis) kernel functions involve only one parameter, are often used to study nonlinear problems, and can map data to an infinite-dimensional space, but they are computationally slow. The linear kernel can be regarded as a special case of the radial basis kernel.

Sigmoid kernel function

$$K\left(x,{x}_{i}\right)=\mathrm{tan}h\left(\gamma \left(x\cdot {x}_{i}\right)+c\right)$$
(15)

Since the values of the parameters \(\gamma\) and \(c\) do not always meet Mercer’s theorem, such kernel functions are less frequently used.
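The four kernel functions in Eqs. (12)–(15) can be written compactly as in the following minimal Python illustration; the parameter values (q, σ, γ, c) and the sample vectors are arbitrary examples.

```python
import numpy as np

def linear_kernel(x, xi):
    return np.dot(x, xi)                                              # Eq. (12)

def poly_kernel(x, xi, q=2):
    return (np.dot(x, xi) + 1) ** q                                   # Eq. (13)

def rbf_kernel(x, xi, sigma=1.0):
    return np.exp(-np.linalg.norm(x - xi) ** 2 / (2 * sigma ** 2))    # Eq. (14)

def sigmoid_kernel(x, xi, gamma=0.1, c=0.0):
    return np.tanh(gamma * np.dot(x, xi) + c)                         # Eq. (15)

x, xi = np.array([1.0, 2.0, 0.5]), np.array([0.5, 1.5, 1.0])
for k in (linear_kernel, poly_kernel, rbf_kernel, sigmoid_kernel):
    print(k.__name__, k(x, xi))
```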

Support vector machine

In the high-dimensional space, there exists a hyperplane such that all sample points lie as close as possible to it; this plane serves as the regression function. Suppose there are \(n\) samples and the original input data are \(\left\{\left({x}_{1},{y}_{1}\right),\left({x}_{2},{y}_{2}\right),\cdots \left({x}_{n},{y}_{n}\right)\right\},x,y\in R\). The expression of linear regression in the high-dimensional space is

$$f\left(x\right)=w\cdot \phi \left(x\right)+b$$
(16)

\(w\) denotes the weight vector and \(b\) denotes the bias term.

Given the error tolerance \(e\), the model prediction \(f\left(\overrightarrow{x}\right)\), and the actual value \({y}_{i}\): when \(\left|f\left(\overrightarrow{x}\right)-{y}_{i}\right|\le e\), no loss is incurred; otherwise, a loss is counted, and the optimization problem becomes

$${min}_{\overrightarrow{w},b}\left(\frac{1}{2}\| \overrightarrow{w}{\| }^{2}+C{\sum\nolimits }_{i=1}^{N}{L}_{\varepsilon }\left(f\left(\overrightarrow{x}\right)-{y}_{i}\right)\right)$$
(17)

where \(C>0\) is the penalty parameter. \({L}_{\varepsilon }\) denotes the loss function:

$${L}_{\varepsilon }\left(Z\right)=\left\{\begin{array}{c}0,\hspace{0.33em}\hspace{0.33em}\left|Z\right|\le \varepsilon \\ \left|Z\right|-\varepsilon ,\ \text{otherwise}\end{array}\right.$$
(18)

For the optimization problem, slack variables \({\xi }_{i}\) and \({\widehat{\xi }}_{i}\) are introduced:

$$\left\{\begin{array}{c}\mathrm{min}\ \frac{1}{2}\| w{\| }^{2}+C{\sum }_{i=1}^{m}\left({\xi }_{i}+{\widehat{\xi }}_{i}\right)\\ {y}_{i}-\left(w\cdot \varphi \left({x}_{i}\right)+b\right)\le \varepsilon +{\xi }_{i}\\ \left(w\cdot \varphi \left({x}_{i}\right)+b\right)-{y}_{i}\le \varepsilon +{\widehat{\xi }}_{i}\\ {\xi }_{i},{\widehat{\xi }}_{i}\ge 0\end{array}\right.$$
(19)

The Lagrange function is introduced to solve for the above function.

$$L\left(w,\xi ,\widehat{\xi }\right)=\frac{1}{2}\| w{\| }^{2}+C{\sum }_{i=1}^{M}\left({\xi }_{i}+{\widehat{\xi }}_{i}\right)-{\sum }_{i=1}^{M}{\alpha }_{i}\left({\xi }_{i}+\varepsilon +{y}_{i}-\langle w,{x}_{i}\rangle -b\right)$$
(20)
$$-{\sum }_{i=1}^{M}{\widehat{\alpha }}_{i}\left({\widehat{\xi }}_{i}+\varepsilon -{y}_{i}+\langle w,{x}_{i}\rangle +b\right)-{\sum }_{i=1}^{M}\left({\eta }_{i}{\xi }_{i}+{\widehat{\eta }}_{i}{\widehat{\xi }}_{i}\right)$$
(21)

where \(\langle w,{x}_{i}\rangle\) denotes the inner product, and \({\alpha }_{i}\), \({\widehat{\alpha }}_{i}\), \({\eta }_{i}\), and \({\widehat{\eta }}_{i}\) are the Lagrange multipliers. From the KKT conditions:

$$\left\{\begin{array}{c}{\alpha }_{i}\left(f\left({\overrightarrow{x}}_{i}\right)-{y}_{i}-\varepsilon -{\xi }_{i}\right)=0\\ {\widehat{\alpha }}_{i}\left({y}_{i}-f\left({\overrightarrow{x}}_{i}\right)-\varepsilon -{\widehat{\xi }}_{i}\right)=0\\ {\alpha }_{i}{\widehat{\alpha }}_{i}=0,\ {\xi }_{i}{\widehat{\xi }}_{i}=0\\ \left(C-{\alpha }_{i}\right){\xi }_{i}=0,\ \left(C-{\widehat{\alpha }}_{i}\right){\widehat{\xi }}_{i}=0\end{array}\right.$$
(22)

Substituting the above equation into the kernel function:

$$f\left(\overrightarrow{x}\right)={\sum }_{i=1}^{n}\left({\widehat{\alpha }}_{i}-{\alpha }_{i}\right)k\left(\overrightarrow{{x}_{l}},\overrightarrow{x}\right)+b$$
(23)

where \(k\left({\overrightarrow{x}}_{i},\overrightarrow{x}\right)=\phi {\left({\overrightarrow{x}}_{i}\right)}^{T}\phi \left(\overrightarrow{x}\right)\) is the kernel function.
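As a check on Eq. (23), the following sketch fits an ε-support vector regression with scikit-learn (a Python stand-in for the study’s Matlab implementation, on synthetic data with arbitrary hyperparameters) and verifies that evaluating Eq. (23) with the fitted dual coefficients, support vectors, and intercept reproduces the library’s predictions.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(1)
X = rng.random((30, 5))     # hypothetical standardized inputs
y = rng.random(30)          # hypothetical target values

svr = SVR(kernel="rbf", C=10.0, epsilon=0.01, gamma=0.5).fit(X, y)

# Manual evaluation of Eq. (23): f(x) = sum_i (alpha_hat_i - alpha_i) k(x_i, x) + b
K = rbf_kernel(svr.support_vectors_, X, gamma=0.5)   # k(x_i, x) over support vectors
f_manual = svr.dual_coef_ @ K + svr.intercept_

print(np.allclose(f_manual.ravel(), svr.predict(X)))  # True: matches Eq. (23)
```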

Variable selection

Industrial carbon emission measurement

Industrial carbon emissions comprise direct and indirect emissions. Direct emissions come directly from factories; carbon emissions from fossil energy sources such as raw coal, coke, crude oil, gasoline, kerosene, and diesel are their primary contributors. Indirect emissions mainly arise from the fossil fuels burned to provide the electricity and heat consumed by factories (Hao et al. 2021a, b). The following model is used to quantify industrial carbon emissions.

$$E={E}_{\text{dir }}+{E}_{\text{ind }}=\frac{44}{12}\sum {C}_{i}\times {\alpha }_{i}+\frac{44}{12}\sum {M}_{j}\times {\beta }_{j}$$
(24)

In the above equation, E denotes total CO2 emissions; \({E}_{\text{dir}}\) denotes direct emissions; \({E}_{\text{ind}}\) denotes indirect emissions; C and M denote direct and indirect energy consumption, respectively; and α and β are the carbon emission coefficients of the corresponding energy sources. Considering industrial energy consumption and data availability, the industrial consumption of raw coal, coke, crude oil, gasoline, kerosene, diesel, fuel oil, liquefied petroleum gas, refinery dry gas, and natural gas is used to represent direct industrial energy consumption, and industrial heat and electricity consumption are used to measure indirect industrial energy consumption. The specific carbon emission factors for each energy source are listed in Table 1.
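A minimal numerical sketch of Eq. (24) is given below; the consumption figures and carbon emission coefficients are hypothetical placeholders, whereas the study’s actual coefficients are taken from Table 1 and Eq. (25).

```python
CO2_PER_C = 44.0 / 12.0   # mass conversion from carbon to CO2, as in Eq. (24)

# Hypothetical consumption and carbon coefficients (tonnes of carbon per unit fuel)
direct = {"raw coal": (1000.0, 0.52), "coke": (200.0, 0.78), "diesel": (150.0, 0.86)}
indirect = {"electricity": (5000.0, 0.20), "heat": (300.0, 0.03)}

E_dir = CO2_PER_C * sum(c * alpha for c, alpha in direct.values())
E_ind = CO2_PER_C * sum(m * beta for m, beta in indirect.values())
print("total CO2 emissions:", E_dir + E_ind)   # Eq. (24): E = E_dir + E_ind
```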

Table 1 Carbon emission factors for various types of energy

In Table 1, the unit of average lower heating value is kJ/kg (kJ/m3 for natural gas); the unit of the standard coal conversion factor is kgce/kg (kgce/m3 for natural gas, kgce/MJ for heat, and kgce/kWh for electricity); the unit of carbon content per unit calorific value is tonnes of carbon/TJ; and the unit of the carbon emission factor is kg CO2/kg (kg CO2/m3 for natural gas and t CO2/GJ for heat). As the form of power generation varies from region to region, the carbon emission factors for electricity also vary by region, as shown in Table 2.

Table 2 reveals that the carbon emission factors of electricity in each region are in units of kg-CO2/kW-h. The data in Table 1 and Table 2 are from the National Development and Reform Commission, General Rules for Calculating Comprehensive Energy Consumption (GB/T 2589-2020), and Guidelines for Preparing Provincial Greenhouse Gas Inventories (NDRC Climate [2011] No. 1041), and the carbon emission coefficients in Table 1 are calculated as follows.

Table 2 Carbon emission factors of different regions
$${e}_{i}={10}^{-9}\cdot {g}_{i}\times {u}_{i}\times {o}_{i}\times {10}^{3}\times \frac{44}{12}$$
(25)

In the above equation, \(e\) denotes the carbon emission factor, \(g\) denotes the average lower heating value, \(u\) denotes the carbon content per unit calorific value, and \(o\) denotes the carbon oxidation rate. Using the carbon emission calculation model above and the carbon emission coefficients in Tables 1 and 2, China’s industrial carbon emissions for the period from 1991 to 2020 are measured, with the results shown below:

Table 3 Industrial carbon emissions by year

Figure 1 illustrates that the trend of China’s industrial carbon emissions can be divided into four main stages. From 1991 to 2000, industrial carbon emissions grew slowly and leveled off. From 2001 to 2011, the growth of industrial carbon emissions accelerated, increasing from 254.99 million tons in 2001 to 660.456 million tons in 2011, roughly 2.6 times the 2001 level. After China’s accession to the WTO in 2001, light industry and processing manufacturing developed rapidly and China’s share of global manufacturing rose quickly, but this also increased the consumption of coal, oil, and other energy sources, leading to a rapid increase in total carbon emissions. From 2012 to 2016, China’s industrial carbon emissions rose slowly and then trended downward: with economic development and frequent global climate extremes, addressing climate change became a global consensus, China strengthened its management of domestic greenhouse gas emissions, and the growth of total carbon emissions slowed. From 2017 to 2020, industrial carbon emissions rose rapidly. In recent years, affected by the Covid-19 epidemic, many of the world’s economies came to a halt, while China, as a major manufacturing country, saw its manufacturing capacity rise rather than fall, supplying the world with a large volume of goods and raw materials and thus experiencing a rapid increase in the growth rate of carbon emissions.

Fig. 1
figure 1

Industrial carbon emission trends

Influencing factors of industrial carbon emissions

Different scholars use different terms to describe the variables that influence industrial carbon emissions. Screening out these influencing factors is therefore the key to accurately predicting industrial carbon emissions. Drawing on existing studies, the factors influencing industrial carbon emissions are first identified; the features are then screened with the random forest model, the importance of each variable is ranked, and the influencing variables applicable to this study are ultimately selected. The scientific identification and appropriate screening of these influencing factors is crucial, as it determines the validity of the results. In this study, the initial selection of influencing variables is based on the IPAT model (Kim et al. 2020; Hussain et al. 2022; Zhang et al. 2022), and 12 indicators that may influence industrial carbon emissions are identified from three aspects: demographic, economic, and technological. The specific indicators are as follows.

Random forest (RF) is a classifier containing multiple decision trees, proposed by Breiman (2001). When random forest is used for feature screening, the training set of any decision tree is about two-thirds of the entire training set, and the remaining one-third of the data set constitutes the out-of-bag data. The out-of-bag data are computationally critical when ranking feature importance (Altmann et al. 2010). At each node of a decision tree, \(\sqrt{n}\) features are randomly drawn from the full set of \(n\) features; one of these features is then selected based on the Gini gain maximization principle, and the data at the parent node \({n}_{p}\) are divided into a right child node \({n}_{r}\) and a left child node \({n}_{l}\). The \(Gini\) gain maximization principle is as follows.

$$\Delta {I}_{G}={I}_{G}\left({n}_{p}\right)-{p}_{l}{I}_{G}\left({n}_{l}\right)-{p}_{r}{I}_{G}\left({n}_{r}\right)$$
(26)

The \(Gini\) gain is maximized when the above equation is maximized. \({I}_{G}\left(n\right)=1-{\sum }_{c=1}^{2}{{p}_{c}}^{2}\) is the \(Gini\) index of node \(n\), \({p}_{c}\) is the proportion of class-\(c\) samples at node \(n\), \({p}_{l}\) is the proportion of the data at \({n}_{p}\) distributed to \({n}_{l}\), and \({p}_{r}\) is the proportion of the data at \({n}_{p}\) distributed to \({n}_{r}\).

There are usually two measures for random forest-based feature screening: one is the Gini index method, and the other is based on the correct classification rate of the out-of-bag data. In this study, the \(Gini\) index method is used for feature importance screening, and the change in the Gini index is calculated as follows.

$$IM{P}_{\text{in }}^{Gini }={I}_{G}\left(n\right)-{I}_{G}\left({n}_{l}\right)-{I}_{G}\left({n}_{r}\right)$$
(27)

\(IM{P}_{\text{in }}^{Gini}\) is the importance of feature \({x}_{i}\) at a node, i.e., the amount of change in the \(Gini\) index before and after partitioning the data to the left and right children at this node. When feature \({x}_{i}\) is partitioned in the \({m}_{th}\) decision tree, the set of partitioned nodes is \(N\). Then, the importance of \({x}_{i}\) on decision tree m is.

$$IM{P}_{{i}_{-}m}^{Gini}={\sum }_{n\in N}IM{P}_{in}^{Gini}$$
(28)

If there are a total of \(M\) decision trees in the entire random forest, the importance of feature \({x}_{i}\) is

$$IM{P}_{i}^{Gini}=\frac{1}{M}{\sum }_{m=1}^{M}IM{P}_{{i}_{-}m}^{Gini}$$
(29)

Based on the \(Gini\) index method for feature importance screening, this study builds a random forest for the 12 variables affecting industrial carbon emissions listed in Table 4 using R software. The importance of each variable is shown in Fig. 2 below.

Table 4 Industrial carbon emissions influence factor selection
Fig. 2
figure 2

The importance of each influencing factor

Figure 2 explains the importance ranking of each influencing factor as follows: energy emission intensity > industrial employees > urbanization rate > total population > the number of industrial technical papers > gross domestic product > industrial GDP > economic development level > industrial secondary energy consumption > industrial primary energy consumption > industrial structure > industrial pull rate. To avoid redundancy of input variables, the top five influencing factors of importance are selected as input variables in this study to construct the industrial carbon emission prediction model.
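The screening step can be illustrated with the following sketch, which mirrors the study’s R-based random forest in Python on placeholder data; scikit-learn’s impurity-based importance uses variance reduction for regression rather than the Gini index, but the ranking logic is analogous to Eqs. (27)–(29), and the indicator names merely echo Table 4.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(2)

# Placeholder stand-ins for the 12 candidate indicators (30 years of observations)
features = ["energy_emission_intensity", "industrial_employees", "urbanization_rate",
            "total_population", "industrial_technical_papers", "gdp", "industrial_gdp",
            "economic_development_level", "secondary_energy_consumption",
            "primary_energy_consumption", "industrial_structure", "industrial_pull_rate"]
X = pd.DataFrame(rng.random((30, len(features))), columns=features)
y = rng.random(30)   # placeholder for measured industrial carbon emissions

# Impurity-based importances with sqrt(n) features per split, as in Eqs. (27)-(29)
rf = RandomForestRegressor(n_estimators=500, max_features="sqrt",
                           oob_score=True, random_state=0).fit(X, y)
ranking = pd.Series(rf.feature_importances_, index=features).sort_values(ascending=False)
print(ranking.head(5))   # the top five factors become the prediction inputs
```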

Data sources

The National Bureau of Statistics, the China Statistical Yearbook, and the China Energy Statistical Yearbook provide information on industrial energy usage from 1991 to 2020.

Results and discussion

Industrial carbon emission prediction result based on BP neural network

The industrial carbon emission prediction model is constructed from the influencing factors screened by random forest, i.e., the five variables of energy emission intensity, industrial employees, urbanization rate, total population, and the number of industrial technical papers are used as input variables of the BP neural network model, with data from 1991 to 2014 as the training set and data from 2015 to 2020 as the test set to predict industrial carbon emissions.

Selection of critical functions and parameters

The key to the BP neural network prediction model is selecting the activation function, the training function, and other vital parameters. In this paper, the hyperbolic tangent S-type function is chosen as the transfer function mapping the input layer to the hidden layer, the linear function is chosen as the activation function mapping the hidden layer to the output layer, and a corresponding training function is chosen for the model.

The BP neural network’s learning rate affects the model’s training effect. When the learning rate is low, the model trains slowly but with reasonable accuracy; when the learning rate is large, the model trains quickly but converges poorly. The parameters used in this analysis are a learning rate of 0.1, a maximum of 1000 training iterations, and a target training error of 0.001.
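A minimal sketch of such a configuration is shown below using scikit-learn’s MLPRegressor as a Python stand-in for the Matlab network; the hidden-layer size and the synthetic data are assumptions, while the tanh hidden activation, linear output, learning rate of 0.1, 1000-iteration cap, and 0.001 tolerance mirror the settings described above.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(3)
X_train, y_train = rng.random((24, 5)), rng.random(24)   # placeholder 1991-2014 data
X_test = rng.random((6, 5))                              # placeholder 2015-2020 data

# tanh hidden activation, linear output, lr=0.1, max 1000 iterations, tol=0.001;
# the hidden-layer size (10 neurons) is an assumption for illustration only.
bp_model = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(10,), activation="tanh", solver="sgd",
                 learning_rate_init=0.1, max_iter=1000, tol=1e-3, random_state=0),
)
bp_model.fit(X_train, y_train)
print(bp_model.predict(X_test))
```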

The absolute error is defined as follows:

$$APE=\left|{Y}_{i}-{\widehat{Y}}_{i}\right|$$
(30)

The relative error is defined as follows:

$$RPE=\left|\frac{{Y}_{i}-{\widehat{Y}}_{i}}{{Y}_{i}}\right|\times 100\%$$
(31)

The mean square error is defined as follows:

$$MSE=\frac{1}{N}{\sum }_{i=1}^{N}{\left|\frac{{Y}_{i}-{\widehat{Y}}_{i}}{{Y}_{i}}\right|}^{2}$$
(32)

where \({Y}_{i}\) is the actual value and \({\widehat{Y}}_{i}\) is the predicted value. The fitted curves of the prediction results are as follows.
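Before presenting the fitted curves, note that the three error measures in Eqs. (30)–(32) can be computed as in the following sketch; the actual and predicted values shown are hypothetical.

```python
import numpy as np

def prediction_errors(y_true, y_pred):
    """Error measures used to evaluate the prediction models (Eqs. 30-32)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ape = np.abs(y_true - y_pred)                        # Eq. (30): absolute error
    rpe = np.abs((y_true - y_pred) / y_true) * 100       # Eq. (31): relative error, %
    mse = np.mean(((y_true - y_pred) / y_true) ** 2)     # Eq. (32): mean square error
    return ape, rpe, mse

# Hypothetical actual vs. predicted emissions over a 6-year test window
ape, rpe, mse = prediction_errors([660, 672, 690, 705, 720, 735],
                                  [655, 668, 700, 690, 710, 750])
print(rpe.mean(), mse)
```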

The prediction results and error analysis are shown in Table 5. Figure 3 and Table 5 reveal that, for the industrial carbon emission data from 2015 to 2020, the prediction error is relatively small in the first 2 years, with an average relative error within 5%, indicating that the model captures the relationship between the influencing factors and industrial carbon emissions reasonably well in this period. The prediction results in the last 4 years are poor, with the error gradually growing larger and the average relative error exceeding 15%; in 2019, the relative error of the predicted value reached 18.35%, deviating significantly from the actual value. This indicates that although the BP neural network can reflect the pattern between the individual influencing factors and industrial carbon emissions, its prediction results are not satisfactory in actual forecasting. Overall, the average relative error of the BP neural network model for industrial carbon dioxide emissions is 12.02%, which indicates that the predicted values returned by the BP neural network deviate from the actual values and the prediction accuracy is low. In addition, some of the initial parameters of the BP neural network are random, which causes the prediction results to vary between runs.

Table 5 BP neural network prediction errors
Fig. 3
figure 3

Prediction fitting curve

Industrial carbon emission prediction based on a grid-optimized support vector machine

Firstly, the input variables are selected. The five influencing factors screened by random forest as having the greatest impact on industrial carbon emissions, i.e., energy emission intensity, industrial employees, urbanization rate, total population, and the number of industrial technical papers, are used as the input variables of the support vector machine model. Secondly, different kernel functions significantly affect the learning performance of the support vector machine, so it is essential to choose an appropriate kernel function for the actual problem. This study uses a Gaussian kernel function, which has few parameters and strong adaptability. The two main parameters when using a Gaussian kernel function are the penalty factor and the kernel parameter g. This study uses the grid optimization method to select the optimal parameters automatically. The grid optimization method is an exhaustive search over candidate parameter values combined with cross-validation: the possible parameter values are arranged into a “grid” of all combinations, cross-validation is used to evaluate the performance of each combination, and the optimal combination of parameters is selected automatically. The fitted regression equation is then used to predict the carbon dioxide emissions of the test set.

When building the SVM, data from 1991 to 2014 are used for training and data from 2015 to 2020 for testing, with energy emission intensity, industrial employees, urbanization rate, total population, and the number of industrial technical papers as the input variables and industrial carbon emissions as the output variable. The fitted curves of the prediction results are shown in Fig. 4.
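A sketch of this grid-optimized support vector regression is given below, using scikit-learn as a Python stand-in for the study’s Matlab implementation; the candidate C and g values, the time-series cross-validation scheme, and the placeholder data are assumptions rather than the study’s exact settings.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

rng = np.random.default_rng(4)
X_train, y_train = rng.random((24, 5)), rng.random(24)   # placeholder 1991-2014 data
X_test = rng.random((6, 5))                              # placeholder 2015-2020 data

pipe = Pipeline([("scale", StandardScaler()), ("svr", SVR(kernel="rbf"))])
grid = {"svr__C": [0.1, 1, 10, 100],        # candidate penalty factors
        "svr__gamma": [0.01, 0.1, 1, 10]}   # candidate RBF kernel parameters g
search = GridSearchCV(pipe, grid, cv=TimeSeriesSplit(n_splits=4),
                      scoring="neg_mean_squared_error").fit(X_train, y_train)

print(search.best_params_)                        # selected parameter combination
print(search.best_estimator_.predict(X_test))     # predicted test-set emissions
```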

Fig. 4
figure 4

Support vector machine industrial carbon emissions fitting curve

Table 6 reveals that the absolute errors of the support vector machine model’s predictions of industrial carbon emissions are below 7%, and the relative error of the prediction for 2017 is only 1.40%, the smallest prediction error. The support vector machine faithfully portrays the association between the influencing variables and industrial carbon emissions, which shows the reasonableness of using the five variables of energy emission intensity, industrial employees, urbanization rate, total population, and the number of industrial technical papers as model inputs. The overall average error of the prediction results on the test set is 3.11%, indicating that the model’s prediction accuracy is very high and its predicted values are very close to the actual values.

Table 6 Support vector machine prediction errors

Model prediction results in comparison

In this study, the five influencing factors screened by the random forest model are used as input variables, and the measured industrial carbon emission data are used as output variables to construct a BP neural network prediction model and a support vector machine model, each predicting industrial carbon emissions from 2015 to 2020. To evaluate the prediction results of the two models more comprehensively, this study compares four measures: the mean absolute error, mean relative error, mean square error, and R2 of the model predictions. The mean absolute error represents the actual prediction error of the model, the mean relative error characterizes the accuracy and credibility of the model, the mean square error evaluates the deviation between the predicted and actual values, and R2 measures the goodness of fit of the model. The specific comparison results are as follows.

According to the findings shown in Table 7 and Fig. 5, the absolute and relative errors of the support vector machine model for the 2015–2020 industrial carbon emission forecasts are much lower than those of the BP neural network. The mean square error and average relative error are likewise substantially smaller than those of the BP neural network, showing that the predicted values of the support vector machine deviate from the actual values far less than those of the BP neural network. The R2 value of the support vector machine is closer to 1, indicating that the support vector machine model fits better.

Table 7 Comparison of model errors
Fig. 5
figure 5

Comparison of relative model errors

When the accuracy of the two models’ predictions is compared, the support vector machine emerges as the clear winner over the BP neural network. The support vector machine accurately reflects the relationship between industrial carbon emissions and their influencing factors. For small-sample nonlinear data sets, the support vector machine returns more accurate prediction results, which also exposes the shortcomings of the BP neural network in small-sample prediction problems. The support vector machine is thus more suitable for predicting industrial carbon emissions and is therefore chosen as the model for further industrial carbon emission prediction.

Industrial carbon emission prediction based on support vector machine decision model

This study uses random forest to screen the factors influencing industrial carbon emissions and uses the five factors with the greatest impact as input variables to construct BP neural network and support vector machine models to forecast industrial carbon dioxide emissions for 2015–2020. The GM (1,1) approach is then used to forecast the factors influencing industrial carbon dioxide emissions for the period 2021–2040. On this basis, this study predicts industrial carbon emissions from 2021 to 2040 using the support vector machine model. The forecasts of energy emission intensity, industrial employees, urbanization rate, total population, and the number of industrial technical papers passed the test and satisfied the prediction requirements. The predicted values of these five influencing factors were then used as the input variables of the support vector machine prediction model to estimate industrial carbon dioxide emissions from 2021 to 2040. Based on the predicted values output by the model, the effect of industrial carbon emission reduction in China is analyzed in terms of both industrial carbon dioxide emissions and emission intensity; the specific trends are as follows.
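A minimal sketch of the GM (1,1) forecasting step is given below; the series values are hypothetical, and each of the five input variables would be extrapolated in the same way before being fed to the support vector machine.

```python
import numpy as np

def gm11_forecast(x0, horizon):
    """Minimal GM(1,1) grey forecast of a positive, roughly exponential series."""
    x0 = np.asarray(x0, float)
    x1 = np.cumsum(x0)                                   # accumulated generating sequence
    z1 = 0.5 * (x1[1:] + x1[:-1])                        # background values
    B = np.column_stack((-z1, np.ones(len(z1))))
    a, b = np.linalg.lstsq(B, x0[1:], rcond=None)[0]     # develop coefficient a, grey input b
    n = len(x0)
    k = np.arange(n + horizon)
    x1_hat = (x0[0] - b / a) * np.exp(-a * k) + b / a    # time response function
    x0_hat = np.diff(x1_hat, prepend=x1_hat[0])          # restore to the original series
    x0_hat[0] = x0[0]
    return x0_hat[n:]                                    # out-of-sample forecasts

# Hypothetical urbanization-rate series (percent)
urbanization = [50.0, 51.3, 52.6, 53.7, 54.8, 56.1, 57.3, 58.5, 59.6, 60.6]
print(gm11_forecast(urbanization, horizon=5))
```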

Figure 6 shows actual industrial carbon emissions for 1991–2020 and predicted industrial carbon emissions for 2021–2040. Industrial carbon emissions from 2021 to 2040 show a trend of growth followed by decline, peaking in 2030 and decreasing yearly from 2031 onward. The prediction results indicate that China’s industrial carbon emissions will peak in 2030, which coincides with the milestone of achieving an overall “carbon peak” by 2030. To achieve the “carbon peak” target and actively respond to global warming, industry, as the critical area of carbon emissions in China, should actively adjust its development strategy, accelerate the adjustment of its energy structure, and contribute to the national emission reduction efforts.

Fig. 6
figure 6

Industrial carbon emission prediction

Conclusion and policy implications

Industry is a significant contributor to global warming and regional environmental degradation, and the measurement and prediction of industrial carbon emissions are important for policy formulation and adjustment as well as for energy conservation and emission reduction efforts. This study defines the direct and indirect sources of industrial carbon emissions and measures industrial carbon emissions. The factors that affect industrial carbon emissions are selected by random forest feature screening, yielding five factors with a large impact on industrial carbon emissions. Using these influencing factors as input variables, machine learning-based BP neural network and support vector machine prediction models are constructed to predict industrial carbon emissions from 2015 to 2020. The GM (1,1) gray prediction method is applied to forecast each input variable, and industrial carbon emissions from 2021 to 2040 are then predicted with the support vector machine model. The main findings are as follows. Through random forest feature screening, energy emission intensity, industrial employees, urbanization rate, total population, and industrial technology innovation are identified as the five factors that most strongly influence industrial carbon emissions. The average relative error of the BP neural network for industrial carbon emission prediction reaches 12.02%, much larger than that of the support vector machine; the predicted values of the support vector machine model are closer to the actual values, with an average relative error of only 3.11%. Industrial carbon emissions are projected to peak in 2030 and decline thereafter; as an essential component of carbon emissions, industry is thus in line with the goal of achieving an overall “carbon peak” by 2030. Based on the influencing factors of industrial carbon emissions and the predicted emission data, this study proposes the following implications:

  1.

    Policymakers should seriously consider climate change and prioritize carbon emission reduction while developing the economy. Moreover, policymakers should increase their policy efforts and use strict environmental regulations to urge industrial enterprises to pay attention to energy conservation and emission reduction, pursue a new type of industrialization, promote the transformation of the economic structure toward low pollution, and realize the shift of the economy to green and low-carbon development. When formulating and enforcing policy, decision-makers must take into account not just economic but also regional demographic factors. The government and industrial enterprises should also strengthen ties and cooperation to promote the people-oriented, low-carbon development concept and implement corresponding incentives and policy support for relevant energy conservation and emission reduction efforts.

  2.

    Policymakers should also carry out publicity activities advocating green and low-carbon living to raise residents' awareness and understanding of energy conservation and environmental protection, thereby increasing the proportion of green energy in residents' consumption and promoting the transformation of the energy consumption structure toward cleaner sources. Policymakers should use various information channels, such as official government websites and the media, to deepen public understanding of low-carbon concepts and rely on public supervision as the main force of oversight to achieve a people-oriented approach. Additionally, policymakers should improve the monitoring mechanism for low-carbon development, incorporate low-carbon indicators into the work assessment of relevant government departments, and make comprehensive evaluations based on indicators such as energy and resource consumption and urban environmental improvement. Furthermore, policymakers should implement a development strategy that prioritizes new-energy transportation and shared transportation and advocate low-carbon travel to reduce motorized travel and carbon emissions.

  3.

    Policymakers should reduce the use of coal, develop and utilize renewable and clean energy, and promote the transformation of the energy structure toward green and low-carbon sources. Next, policymakers should actively adjust the industrial layout and gradually replace high-energy-consumption, high-pollution industries with low-energy-consumption, high-efficiency ones. Simultaneously, policymakers should actively formulate policies conducive to industrial transformation and use laws and taxes to guide and constrain the development direction of industrial enterprises, upgrading the industrial structure from low-end manufacturing to high-tech industries. Finally, to effectively reduce carbon emissions, policymakers should encourage the development of low-carbon, energy-saving technologies through investment in innovation, enhance the efficiency of energy utilization technologies, and construct a clean, low-carbon, safe, and efficient energy system.