1 Introduction

Global energy consumption has grown explosively over the last several decades. Most energy is still extracted from fossil fuels, which pollutes the environment and has a significant impact on economically and socially sustainable development worldwide (Lin et al., 2020). For a long time, China’s energy structure has been dominated by coal, and energy derived from fossil fuels accounts for 90% of its energy consumption. In 2017, China ranked first in the world in total primary energy consumption. With its increasing consumption, the problem of environmental pollution and the goal of sustainable development have assumed greater importance, and the transformation and upgrade of the country’s energy structure has become an urgent issue (Kongkuah et al., 2021). If a balance among energy consumption, economic growth, and environmental protection cannot be achieved, sustainable development will be significantly hindered (Liu & Li, 2011; Jiang et al., 2020b).

To realize sustainable development and improve green economic development, the Chinese government has proposed the dual control of its targets for energy consumption and energy intensity, and this has been incorporated into the assessment of local governments (Li et al., 2018). The government requires an accurate forecast of the trends of energy consumption and its intensity of use in each region of China for setting reasonable dual control targets. However, because of wide disparities across the country in terms of regional social and economic development, China’s regional energy consumption is highly unbalanced (Crompton & Wu, 2005). This makes the setting and achieving of dual control targets challenging. To improve the dual control system, the relevant departments have proposed some measures such as optimizing the allocation of energy indicators, implementing energy indicator trading, and budget management of energy use.

Predicting energy consumption is the basis of energy planning and management (Wang et al., 2018). The precise prediction of energy consumption in China is thus important to the government for setting appropriate dual control targets, optimizing the industrial layout, balancing the supply and the demand for energy, and coordinating the energy planning of the central and local governments. However, most researchers have focused on predicting China’s total energy consumption (Yuan et al., 2016; Hu, 2017b; Xiao et al., 2020b), and few have considered predicting the energy consumption of provincial-level regions. Although a few scholars have proposed prediction models for specific regions (Chen et al., 2019; Wu et al., 2018), no study to date has verified the applicability of these models to other regions of China. In this study, we propose a non-homogeneous, discrete multivariate grey model with adjacent accumulation, named ANDGM(1,N), to accurately predict the energy consumption in 30 provincial-level regions of China.

The remainder of this paper is arranged as follows: Section 2 reviews the literature, including an analysis of the factors affecting energy consumption and a review of the application of grey models to energy forecasting. Section 3 introduces the research methods used in this paper. Section 4 details the application and analysis of the model for predicting energy consumption in China, and Sect. 5 summarizes the conclusions of this study.

2 Literature review

2.1 Factors influencing energy consumption

The close relationship between energy consumption and economic development has been widely demonstrated (Soytas & Sari, 2003). Zhang and Cheng (2009) conducted the Granger causality test among China’s energy consumption, economic growth, and carbon emissions. The results showed that there is a one-way Granger causality between the GDP and energy consumption. Duran Toksarı (2007) proposed an energy demand model that uses the GDP, population, imports, and exports as independent variables to predict Turkey’s energy demand. Wang (2015) used an optimized grey dynamic model to predict China’s clean energy consumption by using the GDP and the cost of pollution as independent variables. Zhao et al. (2020) showed through empirical research that financial development and per capita income have increased energy demand, and opening itself to international trade has increased China’s consumption of non-renewable energy.

The rapid progress of China’s economy in recent decades has been closely linked to its industrialization and urbanization. Sadorsky (2014) studied the impact of industrialization and urbanization on energy consumption in emerging economies such as China using econometric models. The results showed that both urbanization and industrialization have had a significant impact on energy consumption. Yuan et al. (2010) used grey relational analysis to examine the relationship between economic development and energy consumption in different periods in China. The results showed that China’s total energy consumption had been closely related to its GDP and secondary industry added value after 2001. Liu (2009) found a stable relationship among the GDP, population, urbanization, and total energy consumption.

Population significantly affects energy consumption. Islam et al. (2013) found that Malaysia’s population, GDP, and financial development have had significant impacts on its energy consumption. Wu et al. (2018) found, by using the grey convex correlation model, that population is the most important factor affecting power consumption in China’s Shandong Province, and established a fractional GM(1,N) model to predict the consumption of electricity. Cheng et al. (2020) applied an improved GM(1,N) model for predicting clean energy consumption while considering the GDP and population.

In summary, the main factors affecting energy consumption are the economy, urbanization, industrialization, and population, as shown in Table 1.

Table 1 Main factors affecting energy consumption

Although the above factors impact energy consumption, their exact relationship with each other remains unclear (Zhao et al., 2020). Predicting energy consumption can be regarded as a problem concerning an uncertain system that can be solved by a grey system (Liu, 2010).

2.2 Grey prediction for energy consumption

Grey system theory was developed to handle uncertain systems with limited information. Grey forecasting models are an important research tool that arose from grey system theory, and have demonstrated high applicability in many fields, such as energy consumption prediction (the focus of this study), tourism demand (Hu et al., 2021), CO2 emission (Jiang et al., 2021), and wastewater management (Guo et al., 2022). Grey forecasting models can be divided into multivariable models (such as GM(1,N)) and univariate models (such as GM(1,1)) (Xie & Wang, 2017).

For energy consumption forecasting, Katani (2019) introduced the grey Verhulst model and the GM(1,1) model to forecast Ghana’s energy consumption, and found that both models can generate accurate predictions. To avoid estimation errors generated by the ordinary least-squares (OLS) method, Hu (2017c) introduced a GM(1,1) model in which a neural network, instead of OLS, is used to calculate the control variable and developing coefficient. The proposed model delivered good performance in terms of predicting electricity consumption. Xiao et al. (2020a) introduced a modified grey model using the Riccati–Bernoulli sub-ODE method to forecast the consumption of clean energy in China and India. Their findings suggest that the model is more accurate than 12 prevalent models. Given that coal energy is the most important source of energy in China, Jia et al. (2020) applied a residual GM(1,1) model to forecast Gansu’s energy consumption. The results showed that residual correction by Markov chains can improve predictive accuracy. Jiang et al. (2020a) proposed an interval forecasting model that uses a neural network and grey model to predict electricity consumption. The results showed that it can outperform other interval models. Chen et al. (2021) applied a fractional GM(1,1) model to forecast and analyze the flexibility in the trend of energy consumption in the Beijing–Tianjin–Hebei region.

The above studies have mainly used the univariate grey GM(1,1) model or related modified forms to predict energy consumption. However, such univariate prediction models cannot reflect the influence of environmental changes on the system, and energy consumption is affected by a variety of external factors, as described in Sect. 2.1. When the behavioral characteristics of the prediction sequence are affected by such exogenous variables, multivariate grey forecasting models can yield better results. Some scholars have applied multivariate grey models to predict energy consumption. Lao et al. (2021) introduced a GM(1,N) model to forecast China’s energy consumption by using optimized background values that outperformed the traditional GM(1,N) model. Wang and Cao (2021) proposed SMGM(1,m) and BMGM(1,m) models to predict Chinese economic growth, energy consumption, and urbanization. Some scholars have sought to increase the accuracy of forecasts of energy consumption by improving the structure of the traditional grey multivariate model. Cheng et al. (2020) modified the traditional multivariate grey forecasting model by adding a constant term to the whitening differential equation to predict China’s consumption of clean energy. Wang and Hao (2016) used the improved multivariable GMC(1,N) model with optimized parameters to predict industrial energy consumption in China. Xie et al. (2021a) proposed a robust reweighted multivariate grey model to predict the greenhouse gas emissions in European Union member countries. Zhang et al. (2022) developed a new flexible grey multivariate model for energy consumption forecasting. Wang et al. (2022) proposed an AGMC(1,N) model to predict energy consumption in 7 regions of China. To increase the number of degrees of freedom of the relevant models, some scholars have introduced the fractional order to the accumulated generation operator of the grey prediction model. Wu et al. (2018) proposed a fractional GMC(1,N) model to predict Shandong’s electricity consumption, and Ma et al. (2019) verified the superiority of the fractional-discrete multivariate grey model in terms of predicting China’s clean energy consumption.

The above shows that although some researchers have applied the multivariate grey model to energy forecasting, few have explored its applicability to regions in China. We introduce an adjacent accumulation operator to develop a discrete multivariate grey model to predict energy consumption in 30 province-level regions of China. The operator can reveal the internal relationship between old and new information. Liu and Wu (2021) have shown that the univariate discrete grey model with the adjacent accumulation operator can stably predict non-renewable energy consumption. This paper extends the adjacent accumulation operator to the multivariate grey prediction model and verifies its effectiveness at predicting energy consumption in 30 province-level regions of China.

3 Proposed prediction model

We propose a non-homogeneous, discrete multivariate grey prediction model with adjacent accumulation. We select the relevant variables using grey relational analysis (GRA), and then apply the grey wolf optimizer algorithm to find the optimal parameters for the proposed model.

3.1 Steps of computation

The proposed ANDGM(1,N) is a modified form of the DGM(1,N) combined with an adjacent accumulator. The steps for constructing it are as follows:

Step 1: Assume a dependent variable sequence \({X}_{1}^{\left(0\right)}=({x}_{1}^{\left(0\right)}\left(1\right),{x}_{1}^{\left(0\right)}\left(2\right),\ldots,{x}_{1}^{\left(0\right)}\left(n\right))\), and \(m-1\) sequences of explanatory variables \({X}_{i}^{\left(0\right)}=({x}_{i}^{\left(0\right)}\left(1\right),{x}_{i}^{\left(0\right)}\left(2\right),\ldots,{x}_{i}^{\left(0\right)}\left(n\right))\), where \(i=2,3,\ldots,m\).

Step 2: Obtain the \(r\)-order adjacent accumulation sequence (r-AAGO) of each \({X}_{i}^{\left(0\right)}\). The accumulated generated sequence, \({X}_{i}^{\left(r\right)}\), is the \(r\)-AAGO of \({X}_{i}^{\left(0\right)}\), with elements defined as follows (the variable subscript is omitted for brevity):

$${x}^{\left(r\right)}\left(k\right)=\frac{1}{r}\times {x}^{\left(0\right)}\left(k\right)+\frac{r-1}{r}\times \sum _{i=2}^{k}{x}^{\left(0\right)}\left(i\right)$$
(1)

where \(k=2,\ldots,n\), and \(r\) is the adjacent accumulation parameter that represents the internal relationship between old and new information. The initial value of the sequence is \({x}^{\left(r\right)}\left(1\right)={x}^{\left(0\right)}\left(1\right)\). Therefore, the inverse \(r\)-AAGO can be calculated as:

$${x}^{\left(0\right)}\left(k\right)=r\times {x}^{\left(r\right)}\left(k\right)-(r-1)\times \sum _{i=2}^{k}{x}^{\left(0\right)}\left(i\right),\quad k=2,\ldots,n$$
(2)
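The accumulation and its inverse can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation: the function names `aago` and `inverse_aago` are hypothetical, and the running sum is assumed to cover only the terms before the current one, which is the indexing used in the constraint set of Eq. (17) and makes the inverse an explicit recursion.

```python
def aago(x0, r):
    """r-order adjacent accumulation of a raw series x0 (list of floats).

    Assumption: the running sum covers the terms before the current one,
    as in the constraints of Eq. (17). r must be non-zero.
    """
    xr = [x0[0]]                      # x^(r)(1) = x^(0)(1)
    running = 0.0
    for k in range(1, len(x0)):
        running += x0[k - 1]          # sum of all earlier raw values
        xr.append(x0[k] / r + (r - 1) / r * running)
    return xr


def inverse_aago(xr, r):
    """Recover the raw series from its r-order adjacent accumulation."""
    x0 = [xr[0]]
    running = 0.0
    for k in range(1, len(xr)):
        running += x0[k - 1]          # sum of already restored raw values
        x0.append(r * xr[k] - (r - 1) * running)
    return x0


if __name__ == "__main__":
    raw = [10.0, 12.0, 15.0, 19.0]
    acc = aago(raw, 0.7)
    print(acc)
    print(inverse_aago(acc, 0.7))     # should reproduce raw up to rounding
```

With \(r=1\) the weight on the running sum vanishes and the accumulated series equals the raw series, which is a convenient sanity check.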

Step 3: The non-homogeneous multivariate discrete grey model is as follows:

$${x}_{1}^{\left(r\right)}\left(k\right)={\beta }_{1}{x}_{1}^{\left(r\right)}\left(k-1\right)+\sum _{i=2}^{m}{\beta }_{i}{x}_{i}^{\left(r\right)}\left(k\right)+v\cdot \left(k-1\right)+u$$
(3)

where \(k=2,\ldots,n\). The linear parameters \(\alpha ={\left[{\beta }_{1},{\beta }_{2},\ldots,{\beta }_{m},v,u\right]}^{T}\) can be estimated by the least-squares method:

$${\left[{\widehat{\beta }}_{1},{\widehat{\beta }}_{2},{\cdots ,\widehat{\beta }}_{\text{m}},\widehat{v},\widehat{u}\right]}^{\text{T}}= \widehat{\alpha }={(B}^{T}B{)}^{-1} {B}^{T}Y$$
(4)

where B and Y are defined as

$$B=\left[\begin{array}{cccccc}{X}_{1}^{\left(r\right)}\left(1\right)& {X}_{2}^{\left(r\right)}\left(2\right)& \cdots & {X}_{m}^{\left(r\right)}\left(2\right)& 1& 1\\ {X}_{1}^{\left(r\right)}\left(2\right)& {X}_{2}^{\left(r\right)}\left(3\right)& \cdots & {X}_{m}^{\left(r\right)}\left(3\right)& 2& 1\\ \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\ {X}_{1}^{\left(r\right)}\left(n-1\right)& {X}_{2}^{\left(r\right)}\left(n\right)& \cdots & {X}_{m}^{\left(r\right)}\left(n\right)& n-1& 1\end{array}\right], \quad Y=\left[\begin{array}{c}{X}_{1}^{\left(r\right)}\left(2\right)\\ {X}_{1}^{\left(r\right)}\left(3\right)\\ \vdots \\ {X}_{1}^{\left(r\right)}\left(n\right)\end{array}\right]$$
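To make the estimation step concrete, the following sketch builds \(B\) and \(Y\) row by row from already accumulated series and solves the normal equations of Eq. (4) by Gauss-Jordan elimination. It is a minimal illustration under stated assumptions: the helper names `build_B_Y` and `solve_normal_equations` are hypothetical, the inputs are assumed to be accumulated beforehand, and a production implementation would more likely call a numerical library's least-squares routine.

```python
def build_B_Y(x1r, xir):
    """Build B and Y from accumulated series.

    x1r: accumulated dependent series; xir: list of accumulated
    explanatory series, all of the same length n.
    """
    B, Y = [], []
    for k in range(1, len(x1r)):      # 0-based k is paper index k + 1 = 2..n
        row = [x1r[k - 1]] + [x[k] for x in xir] + [float(k), 1.0]
        B.append(row)
        Y.append(x1r[k])
    return B, Y


def solve_normal_equations(B, Y):
    """Return (B^T B)^{-1} B^T Y via Gauss-Jordan elimination."""
    p = len(B[0])
    # Augmented matrix [B^T B | B^T Y]
    A = [[sum(row[i] * row[j] for row in B) for j in range(p)]
         + [sum(row[i] * y for row, y in zip(B, Y))] for i in range(p)]
    for c in range(p):
        pivot = max(range(c, p), key=lambda i: abs(A[i][c]))  # partial pivoting
        A[c], A[pivot] = A[pivot], A[c]
        for i in range(p):
            if i != c:
                f = A[i][c] / A[c][c]
                A[i] = [a - f * b for a, b in zip(A[i], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]


if __name__ == "__main__":
    # Synthetic series generated from known parameters (illustrative only).
    x2 = [2.0, 5.0, 3.0, 8.0, 4.0, 9.0, 6.0, 11.0]
    x1 = [5.0]
    for k in range(1, len(x2)):
        x1.append(0.5 * x1[-1] + 0.3 * x2[k] + 0.2 * k + 1.0)
    B, Y = build_B_Y(x1, [x2])
    print(solve_normal_equations(B, Y))  # estimates of beta1, beta2, v, u
```

The returned list is ordered as \([{\beta }_{1},{\beta }_{2},\ldots,{\beta }_{m},v,u]\), matching the column order of \(B\).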

Step 4: Present the non-recursive response function. Use the parameters \({\widehat{\beta }}_{1},{\widehat{\beta }}_{2},{\cdots ,\widehat{\beta }}_{\text{m}},\widehat{v},\) and \(\widehat{u}\), to calculate the simulated value of \({\widehat{X}}_{1}^{\left(r\right)}\) as:

$${\widehat{\text{X}}}_{1}^{\left(r\right)}\left(k\right)={X}_{1}^{\left(0\right)}\left(1\right)\cdot {\beta }_{1}^{k-1}+{\sum }_{\tau =2}^{k}(g\left(\tau \right)\cdot {\beta }_{1}^{k-\tau })$$
(5)

where the function g() is obtained as

$$g\left(k\right)={\sum }_{i=2}^{m}{\beta }_{i}{X}_{i}^{\left(r\right)}\left(k\right)+ v\cdot \left(k-1\right)+u$$
(6)

where \(k=2,\ldots,n\).

Step 5: Accordingly, by using inverse r-AAGO, the final forecasted values, \({\widehat{X}}_{1}^{\left(0\right)}\), can be calculated as follows:

$${\widehat{X}}_{1}^{\left(0\right)}\left(k\right)=r\times {\widehat{X}}_{1}^{\left(r\right)}\left(k\right)-(r-1)\times {\sum }_{i=2}^{k}{\widehat{X}}_{1}^{\left(0\right)}\left(i\right), \text{k}=2,\ldots,\text{n}$$
(7)

where \({\widehat{x}}_{1}^{\left(r\right)}\left(1\right)={x}_{1}^{\left(0\right)}\left(1\right)\).
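The response function of Eqs. (5) and (6) can be evaluated directly once the parameters are known. The sketch below is illustrative only: the names `g` and `response` and the argument layout are assumptions, and the explanatory series are assumed to be accumulated already. The closed form is equivalent to iterating \(\widehat{x}(k)={\beta }_{1}\widehat{x}(k-1)+g(k)\) from the first observation.

```python
def g(k, betas, xir, v, u):
    """Eq. (6) for the 1-based index k >= 2.

    betas: coefficients of the explanatory variables;
    xir: list of accumulated explanatory series (0-based lists).
    """
    return sum(b * x[k - 1] for b, x in zip(betas, xir)) + v * (k - 1) + u


def response(x1_first, beta1, betas, xir, v, u):
    """Eq. (5): simulated accumulated values for k = 1..n."""
    n = len(xir[0])
    out = [x1_first]
    for k in range(2, n + 1):
        value = x1_first * beta1 ** (k - 1)
        value += sum(g(t, betas, xir, v, u) * beta1 ** (k - t)
                     for t in range(2, k + 1))
        out.append(value)
    return out


if __name__ == "__main__":
    x2 = [2.0, 5.0, 3.0, 8.0, 4.0]     # illustrative accumulated regressor
    print(response(5.0, 0.5, [0.3], [x2], 0.2, 1.0))
```

The simulated accumulated values are then mapped back to the original scale by the inverse r-AAGO of Step 5.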

3.2 Variable selection using grey relational analysis

Because the choice of variables may influence the performance of the grey model, we use GRA to select them. GRA determines the extent of influence between sequences according to the similarity of the geometric shapes of their curves. As a branch of grey system theory, GRA is suitable for solving problems of sequence correlation in grey systems (Liu, 2010). Therefore, we apply GRA to screen the sequences of factors closely related to the energy consumption of the province-level regions of China.

Set grey system sequences, \({\text{X}}_{\text{i}}=({x}_{\text{i}}\left(1\right),{x}_{\text{i}}\left(2\right),\ldots,{x}_{\text{i}}\left(n\right))\), where \(i=0, 1, 2,\ldots,m\).

Normalize each sequence by its initial value:

$${\text{X}}_{\text{i}}^{{\prime}}=\frac{{X}_{i}}{{x}_{i}\left(1\right)}=\left({x}_{\text{i}}^{{\prime}}\left(1\right),{x}_{\text{i}}^{{\prime}}\left(2\right),\ldots,{x}_{\text{i}}^{{\prime}}\left(n\right)\right), \quad i=0,1,2, \ldots ,m$$
(8)

For the distinguishing coefficient \({\upxi }\in \left(0,1\right)\), let

$${\upgamma }\left({x}_{0}\left(k\right),{x}_{i}\left(k\right)\right)= \frac{{{\Delta }}_{\text{min}}+{\upxi }{{\Delta }}_{\text{max}}}{{{\Delta }}_{i}\left(k\right)+{\upxi }{{\Delta }}_{\text{max}}}$$
(9)
$${\upgamma }\left({X}_{0},{X}_{i}\right)= \frac{1}{n}\sum _{k=1}^{n}{\upgamma }\left({x}_{0}\left(k\right),{x}_{i}\left(k\right)\right)$$
(10)

and

$${{\Delta }}_{i}\left(k\right)=|{x}_{0}^{{\prime}}\left(k\right)-{x}_{i}^{{\prime}}\left(k\right)|$$
(11)
$${{\Delta }}_{\text{max}}= \underset{i}{\text{max}}\underset{k}{\text{max}}{{\Delta }}_{i}\left(k\right)$$
(12)
$${{\Delta }}_{\text{min}}= \underset{i}{\text{min}}\underset{k}{\text{min}}{{\Delta }}_{i}\left(k\right)$$
(13)

\({\upgamma }\left({X}_{0},{X}_{i}\right)\) is the grey relational grade (GRG) between \({X}_{0}\) and \({X}_{i}\). A larger value of \({\upgamma }\) means a stronger relation between \({X}_{0}\) and \({X}_{i}\).

The GRG values of the variables are used to determine the independent variable during modeling. Some scholars select independent variables by manually setting a threshold for the GRG (Zeng et al., 2020), others select the optimal threshold by using a heuristic algorithm to automatically screen the variables (Hu, 2020), and still others choose the variable with the maximum GRG as the independent variable (Wu et al., 2018). We adopt the method of Wu et al. (2018) and select the variable with the maximum GRG as the independent variable.
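The screening procedure can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name `grey_relational_grades` is hypothetical, the default \({\upxi }=0.5\) is an assumed (commonly used) value, and the relational coefficient uses the widely used form with \({\upxi }{{\Delta }}_{\text{max}}\) in both numerator and denominator.

```python
def grey_relational_grades(x0, factors, xi=0.5):
    """Grey relational grade of each factor series with respect to x0.

    x0: reference series; factors: list of candidate series of the same
    length; xi: distinguishing coefficient in (0, 1), assumed 0.5 here.
    """
    ref = [v / x0[0] for v in x0]                        # initial-value normalization
    normed = [[v / f[0] for v in f] for f in factors]
    deltas = [[abs(a - b) for a, b in zip(ref, f)] for f in normed]
    d_min = min(min(d) for d in deltas)
    d_max = max(max(d) for d in deltas)
    if d_max == 0.0:                                     # all series coincide
        return [1.0] * len(factors)
    grades = []
    for d in deltas:
        coeffs = [(d_min + xi * d_max) / (dk + xi * d_max) for dk in d]
        grades.append(sum(coeffs) / len(coeffs))         # average over k
    return grades


if __name__ == "__main__":
    energy = [1.0, 2.0, 3.0, 4.0]                        # illustrative data only
    candidates = [[2.0, 4.0, 6.0, 8.0], [1.0, 1.0, 1.0, 1.0]]
    grades = grey_relational_grades(energy, candidates)
    best = max(range(len(grades)), key=grades.__getitem__)
    print(grades, "selected factor index:", best)
```

Selecting the index with the largest grade mirrors the maximum-GRG rule adopted here; a threshold-based rule would instead keep every factor whose grade exceeds a chosen cut-off.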

3.3 Measuring forecasting accuracy

Each method of measuring error has certain shortcomings and applicable conditions (Vandeput, 2021), so relying on only one method is risky. We thus used two error measures, the root mean-squared percentage error (RMSPE) (Tien, 2012) and the mean absolute percentage error (MAPE), to evaluate the performance of the proposed ANDGM(1,N). They are defined as follows:

$$\text{RMSPE}=100{\%}\times \sqrt{\frac{1}{m}\times {\sum }_{t=1}^{m}{\left(\frac{{\widehat{x}}_{1}^{\left(0\right)}\left(t\right)-{x}_{1}^{\left(0\right)}\left(t\right)}{{x}_{1}^{\left(0\right)}\left(t\right)}\right)}^{2}}$$
(14)
$$\text{MAPE}=\frac{100{\%}}{m}\times \sum _{t=1}^{m}\frac{\left|{\widehat{x}}_{1}^{\left(0\right)}\left(t\right)-{x}_{1}^{\left(0\right)}\left(t\right)\right|}{{x}_{1}^{\left(0\right)}\left(t\right)}$$
(15)

where \(m\) is the number of forecasting periods. The model with the smallest values of both measures delivers the best performance.
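Both measures are straightforward to compute. The sketch below is a direct transcription of Eqs. (14) and (15); the function names are illustrative, and the actual values are assumed to be strictly positive, as is the case for energy consumption data.

```python
import math


def rmspe(actual, predicted):
    """Root mean-squared percentage error, in percent (Eq. 14)."""
    m = len(actual)
    total = sum(((p - a) / a) ** 2 for a, p in zip(actual, predicted))
    return 100.0 * math.sqrt(total / m)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (Eq. 15)."""
    m = len(actual)
    return 100.0 / m * sum(abs(p - a) / a for a, p in zip(actual, predicted))


if __name__ == "__main__":
    actual = [100.0, 100.0]            # illustrative values only
    predicted = [100.0, 120.0]
    print(rmspe(actual, predicted), mape(actual, predicted))
```

Because the RMSPE squares each relative error before averaging, it penalizes a single large miss more heavily than the MAPE does, which is one reason the two measures can rank models differently.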

3.4 Parameter optimization

The performance of the proposed ANDGM(1,N) model depends on the parameter \(r\). Therefore, to obtain the optimal ANDGM(1,N), following Wu et al. (2018a), we formulate an optimization problem with the RMSPE as the objective function and solve for \(r\) as follows:

$$\text{min } \text{RMSPE}=100{\%}\times \sqrt{\frac{1}{m-1}{\sum }_{t=2}^{m}{\left(\frac{{(\widehat{x}}_{1}^{\left(0\right)}\left(t\right)-{x}_{1}^{\left(0\right)}\left(t\right))}{{x}_{1}^{\left(0\right)}\left(t\right)}\right)}^{2}}$$
(16)
$$s.t.\left\{\begin{array}{l}{\left[{\widehat{\beta }}_{1},{\widehat{\beta }}_{2},{\cdots ,\widehat{\beta }}_{\text{m}},\widehat{v},\widehat{u}\right]}^{\text{T}}={(B}^{T}B{)}^{-1} {B}^{T}Y\\ {X}^{\left(r\right)}\left(1\right)={X}^{\left(0\right)}\left(1\right)\\ {x}^{\left(r\right)}(k+1)=\frac{1}{r}\times {x}^{\left(0\right)}(k+1)+\frac{r-1}{r}\times {\sum }_{i=1}^{k}{x}^{\left(0\right)}\left(i\right)\\ {\widehat{x}}_{1}^{\left(r\right)}\left(k\right)={x}_{1}^{\left(0\right)}\left(1\right)\times {\beta }_{1}^{k-1}+{\sum }_{\tau =2}^{k}(g\left(\tau \right)\bullet {\beta }_{1}^{k-\tau })\\ g\left(k\right)={\sum }_{i=2}^{m}{\beta }_{i}\cdot {x}_{i}^{\left(r\right)}\left(k\right)+v\cdot \left(k-1\right)+u\\ {\widehat{x}}^{\left(0\right)}\left(k+1\right)=r\times {\widehat{x}}^{\left(r\right)}\left(k+1\right)-(r-1){\sum }_{i=1}^{k}{\widehat{x}}^{\left(0\right)}\left(i\right)\end{array}\right.$$
(17)

The above problem is a nonlinear optimization problem that can be solved by a meta-heuristic algorithm, which is a method to find approximate solutions to complex problems. Commonly used meta-heuristic algorithms for parameter optimization include the grey wolf optimizer (GWO), genetic algorithm, whale optimization algorithm, and particle swarm optimization (Hu et al., 2020; Liu & Wu, 2021; Xie et al., 2021b; Xie et al., 2021c).

We apply the GWO algorithm to optimize the parameters of the ANDGM(1,N) model. We use the GWO because of its accuracy in solving the objective function and its speed of convergence. It is also simple, flexible, extensible, and easy to implement (Mirjalili et al., 2014; Faris et al., 2018). Past studies have shown the capability of the GWO for optimizing problems of energy prediction and optimizing the parameters of the grey model (Ghalambaz et al., 2021; Kong & Ma, 2018; Tian et al., 2020; Xie et al., 2021c). Thus, it is suitable for optimizing the parameters of the proposed ANDGM(1,N) model.

We used the EvoloPy framework, written in Python and proposed by Faris et al. (2018), to implement the GWO algorithm. The pseudocode for constructing the ANDGM(1,N) model is as follows:

Algorithm for constructing ANDGM(1,N)

  Input: Original series X(0).

  Output: Best value of the parameter \(r\).

  1: Set the number of iterative steps, T, number of search agents, S, lower bound, LB, and upper bound, UB.

  2: Initialize the grey wolf population.

  3: Initialize \(\overrightarrow{a}\), which linearly decreases from 2 to 0.

  4: Initialize \(\overrightarrow{A}\) and \(\overrightarrow{C}\) by \(\overrightarrow{a}\).

  5: Initialize the best, second, and third agents.

  6: for \(t\leftarrow 1\); \(t<T\); \(t\leftarrow t+1\) do

       for each sub-agent in agents do

         Update the position of the sub-agent.

         Update \(\overrightarrow{A}\), \(\overrightarrow{C}\), and \(\overrightarrow{a}\).

         Use the parameter \(r\) of the sub-agent in Eq. (5) to get \({\widehat{X}}^{\left(r\right)}\left(k\right)\).

         Compute \({\widehat{x}}^{\left(0\right)}\) using Eq. (7).

         Substitute \({\widehat{x}}^{\left(0\right)}\) into Eq. (16) to get the fitness.

       end for

       Update the best, second, and third agents.

     end for

  return the best value of \(r\).
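For readers who want to see the search loop in executable form, the following is a compact one-dimensional grey wolf optimizer written from scratch. It is a simplified sketch and not the EvoloPy implementation: the function name `gwo_minimize`, its arguments, and the way the three leaders are kept (best three positions seen so far) are assumptions made for illustration. The `fitness` argument stands in for the RMSPE of Eq. (16) evaluated at a candidate \(r\).

```python
import random


def gwo_minimize(fitness, lb, ub, agents=20, iters=50, seed=0):
    """Minimize a scalar function of one parameter on [lb, ub].

    Requires agents >= 3 so that three leaders always exist.
    """
    rng = random.Random(seed)
    wolves = [rng.uniform(lb, ub) for _ in range(agents)]
    leaders = []
    for t in range(iters):
        # Keep the three best positions found so far (alpha, beta, delta).
        leaders = sorted(leaders + wolves, key=fitness)[:3]
        a = 2.0 - 2.0 * t / iters            # decreases linearly from 2 toward 0
        moved = []
        for x in wolves:
            steps = []
            for leader in leaders:
                A = 2.0 * a * rng.random() - a
                C = 2.0 * rng.random()
                steps.append(leader - A * abs(C * leader - x))
            new_x = sum(steps) / 3.0         # average of the three guided moves
            moved.append(min(max(new_x, lb), ub))  # clip to the search bounds
        wolves = moved
    return min(leaders + wolves, key=fitness)


if __name__ == "__main__":
    # Toy objective with a known minimum at 0.7 (illustrative only).
    print(gwo_minimize(lambda r: (r - 0.7) ** 2, -2.0, 2.0))
```

In an actual model fit, the fitness function would accumulate the series with the candidate \(r\), estimate the linear parameters, restore the forecasts, and return the resulting RMSPE; candidates at or very near \(r=0\) would need to be excluded because the accumulation divides by \(r\).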

4 Simulations

4.1 Forecasting electricity consumption (Case I)

The data, taken from Wu et al. (2018), are listed in Table 2. The first 10 data items were used for model fitting and the last three for ex-post testing. Since no single set of hyperparameters is best for all models, we followed the suggestions of related studies (Hu, 2021; Victoria & Maragatham, 2021). Referring to work by Ma et al. (2019), we set the hyperparameters of the GWO as follows: number of search agents = 100, maximum number of iterative steps = 100, and a search range, defined by the LB and UB, of [−2, 2]. After the GWO completed learning, the optimal parameter r of ANDGM(1,N) was 0.687331, that of FDGM(1,N) was −0.00082, and that of FGMC(1,N) was 0.71. The results of ex-post testing (Fig. 1) show that the RMSPE = 5.73% and MAPE = 4.18% for ANDGM(1,N), lower than those of the other four models considered. The worst RMSPE was delivered by DGM(1,N) (6.883%), and the worst MAPE by FDGM(1,N) (5.4%). Because the proposed ANDGM(1,N) delivered the best RMSPE and MAPE, it was the best in terms of accuracy and stability in Case I.

Table 2 Raw data of Shandong’s electricity consumption (\({X}_{1}^{\left(0\right)})\) and total population (\({X}_{2}^{\left(0\right)}\))
Fig. 1 Predictive accuracies of models in Case I

4.2 Forecasting clean energy consumption (Case II)

The data, taken from Ma et al. (2019), are listed in Table 3. The first 17 data items were used for model fitting and the last 4 for ex-post testing. The hyperparameters of the GWO were the same as in Case I. After it had completed learning, the optimal values of the parameter r for ANDGM(1,N), FDGM(1,N), and FGMC(1,N) were \(0.883894\), \(-0.48485\), and \(1.24092\), respectively. The results of ex-post testing (Fig. 2) show that the RMSPE = 2.199% and MAPE = 1.638% for ANDGM(1,N), lower than those of the other four models. FGMC(1,N) delivered the worst performance in terms of both the RMSPE (27.211%) and the MAPE (25.492%). Because the proposed ANDGM(1,N) delivered the best performance in terms of the RMSPE and MAPE, it is the most suitable for Case II.

Table 3 Raw data on clean energy consumption (\({X}_{1}^{\left(0\right)})\), GDP (\({X}_{2}^{\left(0\right)}\)), and effluent charge (\({X}_{3}^{\left(0\right)}\)) in China
Fig. 2 Predictive accuracies of models in Case II

4.3 Forecasting energy consumption (Case III)

4.3.1 Data description and variable selection

The dataset used here covered 30 province-level regions in China, excluding Tibet. Energy consumption was defined as the total amount of energy consumed by the economy and households over a certain period in each region. We selected the GDP, secondary industry added value, ratio of urban population, total population, and volumes of exports and imports as factors, as described in Sect. 2.1.

China’s energy policy has changed significantly since 2004. The government has established a dedicated energy leadership group and has proposed the “Medium- and Long-term Energy Development Plan (2004–2020)” and the “12th Five-Year Plan for Energy Development” (Kong et al., 2020). Therefore, the dataset covers the period from 2004 to 2019. We used roughly 80% of the data (from 2004 to 2016) for model fitting and the remaining 20% (from 2017 to 2019) for ex-post testing. All the variables, summarized in Table 4, were obtained from the National Bureau of Statistics of China (http://www.stats.gov.cn/tjsj/ndsj/).

Table 4 Descriptive statistics of the variables considered

We applied the GRA method to measure the relationship between energy consumption and the related factors from 2004 to 2016, and the results are shown in Table 5.

Table 5 Relationship between energy consumption and related variables

A larger \({\gamma }_{i}\) means a greater degree of influence of factor \(i\) on energy consumption. GDP had the greatest influence on energy consumption. As described in Sect. 3.2, we selected it as the independent variable to predict energy consumption.

4.3.2 Results

We used the ANDGM(1,N) model to forecast energy consumption in 30 province-level regions of China, and compared it with four commonly used grey multivariate models: DGM(1,N), GMC(1,N), FDGM(1,N), and FGMC(1,N). GDP was selected as the independent variable. The first 13 data items were used for model fitting and the last 3 for ex-post testing. The hyperparameters of the GWO were the same as in Case I. The results of the forecasts in terms of the RMSPE are shown in Table 6, with the minimum RMSPE in each row highlighted in bold. None of the five models recorded the best performance in all regions, but the proposed ANDGM(1,N) had the lowest RMSPE in a majority of regions. Figure 3 shows the average performance of all models across the 30 regions. The mean RMSPE of ANDGM(1,N) was 11.154%, better than those of the other models. GMC(1,N) delivered the worst performance.

Table 6 Results of ex-post testing for energy consumption forecasts in RMSPE
Fig. 3 Average predictive accuracies in RMSPE

The above analysis shows that the proposed ANDGM(1,N) was superior to the other models in terms of average performance in all the regions considered. To verify its effectiveness, we used the left-tailed Wilcoxon signed-rank test, a non-parametric test that is widely applied to compare differences between forecasting models (Wilcoxon, 1945; Zhao & Wu, 2020). \({\text{W}}^{-}\) represents the rank sum of the absolute values of the negative differences between the ANDGM(1,N) model and each compared model. The null hypothesis was that the accuracy of ANDGM(1,N) was not higher than that of the compared model, and it was rejected when \(p < 0.05\). The results (Table 7) show that the \(p\) values for DGM(1,N), GMC(1,N), and FGMC(1,N) were all less than 0.05; thus, the null hypothesis was rejected, which means that ANDGM(1,N) significantly outperformed these models.
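The rank sums used by the test can be computed as in the sketch below, which pairs the per-region errors of two models. It is a simplified illustration under stated assumptions: the function name `wilcoxon_left_tailed` is hypothetical, zero differences are discarded, tied absolute differences receive average ranks, and the one-sided \(p\) value uses the normal approximation without continuity or tie correction, so it may differ slightly from the exact values a statistics package would report.

```python
import math


def wilcoxon_left_tailed(errors_a, errors_b):
    """Return (W_minus, W_plus, p) for paired error samples.

    Differences are errors_a - errors_b; a small p suggests that model A
    has systematically lower errors than model B. Assumes at least one
    non-zero difference.
    """
    d = [a - b for a, b in zip(errors_a, errors_b) if a != b]
    n = len(d)
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        avg = (i + j) / 2.0 + 1.0            # average rank for a run of ties
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    w_plus = sum(r for r, v in zip(ranks, d) if v > 0)
    w_minus = sum(r for r, v in zip(ranks, d) if v < 0)
    mean = n * (n + 1) / 4.0
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    z = (w_plus - mean) / sd
    p = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))   # P(W_plus <= observed)
    return w_minus, w_plus, p


if __name__ == "__main__":
    model_a = [float(k) for k in range(1, 11)]       # illustrative errors only
    model_b = [2.0 * k for k in range(1, 11)]
    print(wilcoxon_left_tailed(model_a, model_b))
```

With 30 regions the normal approximation is usually adequate; for very small samples an exact test would be the safer choice.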

Table 7 Results of Wilcoxon signed-rank test in terms of RMSPE

We also used the MAPE to verify the stability of the proposed model. The results are shown in Table 8, with the minimum MAPE in each row highlighted in bold. Comparing the results in terms of the RMSPE (Table 6) and the MAPE (Table 8), we found that the top-ranked model in each region was the same under both measures, except in Jiangsu and Anhui. ANDGM(1,N) and DGM(1,N) delivered the best performance in terms of the RMSPE and MAPE, respectively, for Anhui, and ANDGM(1,N) and FGMC(1,N) yielded the best results in terms of the RMSPE and MAPE, respectively, for Jiangsu. This might have occurred because the objective function for parameter selection was to minimize the RMSPE. Therefore, although ANDGM(1,N) had the minimum RMSPE for Jiangsu and Anhui, its MAPE was not the lowest there.

Table 8 Results of ex-post testing of energy consumption forecasts in terms of the MAPE

To further examine the prediction results in terms of the MAPE, Fig. 4 illustrates the average MAPE of the five models across all regions. The Wilcoxon signed-rank test was used to analyze the differences among them (Table 9). The proposed ANDGM(1,N) was still better than the other models in terms of average accuracy and outperformed DGM(1,N), GMC(1,N), and FGMC(1,N) at a significance level of 0.05.

Overall, the proposed ANDGM(1,N) outperformed the other four models in predicting China’s regional energy consumption in terms of the RMSPE and MAPE.

Fig. 4 Average predictive accuracies in MAPE

Table 9 Results of Wilcoxon signed-rank test in terms of the MAPE

5 Conclusions

Accurately predicting province-level regional energy consumption is crucial for China’s energy policies. This study introduced the adjacent accumulation operator to the grey multivariate model to construct a non-homogeneous, discrete multivariate prediction model based on adjacent accumulation. We analyzed the factors influencing regional energy consumption and selected the GDP as the independent variable for multivariate prediction using the GRA method. The results showed that the proposed method outperformed four commonly used multivariate grey prediction models—DGM(1,N), GMC(1,N), FDGM(1,N), and FGMC(1,N)—in terms of predictive accuracy and stability. The contributions of this paper are as follows:

(1) An adjacent accumulation operator was introduced to the multivariate grey model to construct a non-homogeneous, discrete multivariate grey prediction model with adjacent accumulation.

(2) We analyzed the factors influencing regional energy consumption in China and calculated correlations between them and energy consumption using GRA. We found that GDP and energy consumption of each region in China remain closely related. The results were used as the basis to choose variables for the prediction model.

(3) We predicted the energy consumption of 30 province-level regions in China by using ANDGM(1,N) and verified its superiority over four commonly used grey multivariate models in terms of two measures of precision and the Wilcoxon signed-rank test. The results of this study can provide a basis for “dual-control” energy management in each region of China.
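The grey relational analysis mentioned in contribution (2) can be sketched as follows. This is a minimal version of the classical grey relational grade with initial-value normalization and a distinguishing coefficient of 0.5. Both choices are assumptions of this sketch, and the GRA variant used in this paper may differ. The function name, factor names, and data are hypothetical.

```python
def grey_relational_grades(reference, factors, rho=0.5):
    """Grey relational grade of each factor sequence w.r.t. the reference.

    reference: list of values (e.g., energy consumption over time).
    factors:   dict mapping a factor name to a list of the same length.
    rho:       distinguishing coefficient (0.5 is a common default).
    """
    def normalize(seq):
        # Initial-value normalization: divide by the first observation.
        return [v / seq[0] for v in seq]

    ref = normalize(reference)
    # Absolute differences between the reference and each factor.
    deltas = {
        name: [abs(r - f) for r, f in zip(ref, normalize(seq))]
        for name, seq in factors.items()
    }
    all_deltas = [d for ds in deltas.values() for d in ds]
    d_min, d_max = min(all_deltas), max(all_deltas)
    if d_max == 0:
        # Every factor coincides with the reference after normalization.
        return {name: 1.0 for name in factors}

    # Grade = mean grey relational coefficient over all time points.
    return {
        name: sum((d_min + rho * d_max) / (d + rho * d_max) for d in ds) / len(ds)
        for name, ds in deltas.items()
    }


# Hypothetical data: energy consumption and two candidate factors.
energy = [10.0, 12.0, 15.0, 18.0]
candidates = {
    "gdp": [100.0, 120.0, 150.0, 180.0],
    "population": [50.0, 51.0, 52.0, 53.0],
}
grades = grey_relational_grades(energy, candidates)
best_factor = max(grades, key=grades.get)
```

A factor with a higher grade follows the reference sequence more closely, so it is a natural candidate for an independent variable in a multivariate model.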

Because official data on regional energy consumption are available only until 2019, the range of prediction in this paper was set from 2004 to 2019. However, the official GDP data have been updated to the third quarter of 2021. Although the outbreak of COVID-19 has had a significant impact on China’s economy, the GDP of each region in 2020, except Hubei, continued to increase. The average GDP growth of each region in the first three quarters of 2021 was above 9% (National Statistical Bureau, 2021). Section 4.3.1 showed a strong relationship between the GDP and energy consumption, so economic development and energy consumption in each region are still intimately related. To maintain economic growth in the post-pandemic period, most regional governments are likely to adopt loose policies for dual control management, because of which total energy consumption will continue to grow. Suitable dual control targets and management measures should be formulated according to the characteristics of each region, especially the western regions that have undergone a significant growth in energy consumption in recent years: Ningxia, Neimenggu, Gansu, and Xinjiang. These regions are rich in clean energy resources but poor in energy technology. Regional governments should control the growth in energy consumption and increase investment in the high-energy technology industry (Chen et al., 2010). Central hinterland areas, such as Shanxi, Hubei, Hebei, and Henan, have large resources of fossil energy, and their industrial structure is dominated by energy-intensive enterprises. Regional governments there should optimize the industrial structure, reduce energy consumption and the share of highly polluting industries, and increase the share of high value-added industries and the efficiency of energy use (Wang et al., 2016; Yang & Wang, 2013). Developed eastern regions such as Beijing, Tianjin, Shanghai, and Zhejiang are economically strong, highly efficient in terms of energy utilization, and technologically advanced. However, they have insufficient energy resources. Energy supply and demand should be balanced to prevent energy shortages (Hou et al., 2021).

We used only one meta-heuristic algorithm, the GWO algorithm, to optimize the parameters of ANDGM(1,N) without comparing it with other optimization algorithms. This is because we focused on establishing a prediction model suitable for regional energy consumption, instead of comparing the influence of the optimization algorithm on the results of prediction. The goal of a meta-heuristic algorithm is to find a feasible solution within an acceptable time rather than the best solution (Kunche & Reddy, 2016). As we described in Sect. 3.4, commonly used meta-heuristic algorithms are suitable for optimizing the parameters of grey prediction models, and the GWO algorithm has advantages in terms of attaining the global optimal solution, stability, and speed of convergence. Therefore, we used it to optimize the parameters of the proposed ANDGM(1,N). In future work, we intend to compare the influence of different parameter optimization algorithms on the results of prediction.

Although the proposed ANDGM(1,N) outperformed the other models considered, its predictive accuracy for some regions has room for improvement. A feasible way to improve its accuracy is to apply residual correction (Hu, 2017a). Thus, constructing a residual ANDGM(1,N) may be an interesting direction in future research. Many studies have indicated that model combination is conducive to improving the accuracy of predictive models (Li et al., 2019). Therefore, predicting energy consumption using combined models will be a focus of our future work. Another important indicator of China’s dual control targets is energy intensity, and a model to accurately predict energy intensity should be considered in future work in the area.