1 Introduction

The manufacturing industry directly reflects a country’s productivity level, and energy is an important material basis for human survival and development. According to Ref National bureau of statistics (2002), over the past decade the total energy consumption of the manufacturing industry has followed a steady trend, accounting for about 57% of the total energy consumption of the whole country; however, the gross domestic product (GDP) of the manufacturing industry accounts for only about 31% of the total, and this share shows a sharp downward trend. This shows that the energy consumed by the manufacturing industry is not proportional to its contribution to the national GDP, and that the energy consumption structure of the manufacturing industry needs to be further improved.

Natural gas is a high-quality, efficient, and clean low-carbon energy source. With the reform of natural gas prices and the vigorous promotion of natural gas development in the 13th Five-Year Plan, the development of natural gas will usher in historic opportunities. According to Ref National bureau of statistics (2002), over the past decade the consumption of natural gas in the manufacturing industry has been on an upward trend, accounting for about 40% of the national consumption of natural gas. This indicates that the proposal of the 18th National Congress of the Communist Party of China to vigorously promote the construction of ecological civilization has played a positive role in promoting the use of natural gas. However, the consumption of natural gas in the manufacturing industry accounts for only about 0.34% of the total energy consumption of the manufacturing industry, and the natural gas consumption of the manufacturing industry accounts for less than 0.4% of manufacturing GDP. This shows that the government’s actions on the development of natural gas in the manufacturing industry still need to be accelerated. Moreover, China is at the stage of reforming its energy consumption system, and the market and economic situation is changing fast, which brings more uncertainty to the energy consumption system of China. Under such circumstances, often only the few most recent data points are suitable for accurate forecasting of energy consumption. Thus, a tool that deals efficiently with uncertainty in small samples is needed.

Grey system theory, proposed by Deng, is such a tool for the problems described above (Ref Julong (1986)), and the grey models play a key role in it. Unlike the white box models, such as the differential equations in Refs Wang et al. (2019), or the black box models, like the machine learning models in Refs Yang et al. (2019), Pei et al. (2019), Fan et al. (2019), the grey models essentially try to combine the merits of these models in order to make the most of the available information. Moreover, they were shown to be very efficient in small sample modeling for time series forecasting in Ref Lifeng et al. (2013). With such advantages, the grey models have been applied in a wide variety of fields in recent years, such as dollar to euro price forecasting in Ref Kayacan et al. (2010), passenger demand growth forecasting in the air transportation industry in Ref Benítez et al. (2013), forecasting of the actual cost and the cost at completion of a project in Ref San Cristóbal et al. (2015), scrapped vehicle forecasting in Ref Ene and Öztürk (2017), short-term freeway traffic parameter prediction in Ref Bezuglov and Comert (2016), e-waste forecasting in Washington in Ref Duman et al. (2019), total natural gas consumption forecasting in Ref Zeng et al. (2020), pollutant forecasting in Ref Xiong et al. (2020), traffic flow prediction in Ref Xiao et al. (2020), etc. However, it was also pointed out by Wu that the conventional grey models based on first-order accumulation are not flexible enough to deal with more complex sequences; thus the fractional order accumulation was introduced for grey models in Ref Lifeng et al. (2013). A series of theoretical analyses was also provided in the subsequent research, such as the sensitivity to the initial condition in Ref Lifeng et al. (2015) and the ability to mine new information in Ref Lifeng and Bin (2017).
With such advantages, the fractional grey models soon became popular and have been applied in many new fields in recent years, such as transaction count forecasting in Ref Gatabazi et al. (2019), and even forecasting of new coronavirus (COVID-19) cases in Ref Utkucan and Tezcan (2020).

On the other hand, the fractional order grey models are also suitable for energy forecasting owing to their high flexibility and effectiveness in small sample modeling. Wu et al. proposed the FGM(1,1) and made a more accurate prediction of the coal mine drainage volume in Ref Lifeng et al. (2014). Shaikh et al. constructed a forecasting model for China’s natural gas consumption by utilizing two optimized nonlinear grey models, the Grey Verhulst Model and the Nonlinear Grey Bernoulli Model, in Ref Shaikh et al. (2017). Wang et al. established a novel hybrid forecasting model based on an improved grey forecasting model optimized by a multi-objective ant lion optimization algorithm, and addressed the accuracy and stability of forecasts for annual power consumption data in Ref Wang et al. (2018). Wu et al. used the GM(1,1) model with the fractional order accumulation (FGM(1,1)) to predict the future trend of air quality, and the results can be directly exploited in the decision-making processes for air quality management in Ref Lifeng et al. (2018). Moonchai et al. proposed a novel method based on a modification of the multivariate grey forecasting model and applied it to the consumption forecast of renewable energy in Ref Moonchai and Chutsagulprom (2020). Based on the new information priority principle and combined with grey buffer operator technology, Zeng B. realized the scientific forecast of shale gas production in China in Ref Zeng et al. (2020). Utkucan S. built a fractional nonlinear grey Bernoulli model, abbreviated as FANGBM(1,1), to forecast Turkey’s total renewable and hydro energy in Ref Şahin (2020).

However, it should be noticed that the existing fractional order grey models only use a unified fractional order. As will be discussed in this work, such an operation limits the advantages of the fractional order accumulation, leading to less flexibility of the fractional grey models. To present a more flexible modeling formulation, a time-delayed fractional discrete grey model with multiple fractional orders (TDF-\(\hbox {DGM}_M\)) is established in this work. The particle swarm optimization (PSO) is used to calculate the optimal orders \(r_1\) and \(r_2\) of the TDF-\(\hbox {DGM}_M\) model. Four cases are used to verify the validity and accuracy of the model. Finally, the TDF-\(\hbox {DGM}_M\) model is used to predict the gas consumption of China’s manufacturing industry.

The rest of this paper is organized as follows: a brief overview of the background is given in Sect. 2; a brief introduction of the fractional grey model (FGM) is presented in Sect. 3; the representation and modeling procedures of the TDF-\(\hbox {DGM}_M\) are described in Sect. 4; the relationship and differences between the TDF-\(\hbox {DGM}_M\) and the FTDGM model are analyzed in Sect. 5; the PSO for optimizing the proposed model is presented in Sect. 6; four case studies to verify the validity of the model are shown in Sect. 7; the case study of forecasting the natural gas consumption in China’s manufacturing industry is shown in Sect. 8, and the conclusions are drawn in Sect. 9.

2 Brief overview of background

By consulting relevant data in Ref National bureau of statistics (2002), we have collected the development trend of the GDP of various industries, as shown in Fig. 1.

Fig. 1 Trend of the gross domestic product of each industry nationwide

Fig. 2 Trends in the energy consumption of each industry nationwide

As can be seen from Fig. 1, the manufacturing industry accounts for a large proportion of GDP in comparison with other industries, at about 31%; its GDP has been rising steadily, but its proportion is declining. Given the great contribution of the manufacturing industry to GDP, the development trends of the national total energy consumption and of the energy consumption of individual industries are shown in Fig. 2. Compared with other industries, the manufacturing industry accounts for a very large proportion of the total energy consumption in China, which is maintained at about 57%, and its energy consumption shows a sharp rise. This shows that the energy consumed by the manufacturing industry is not proportional to its contribution to the national GDP, and that the energy consumption structure of the manufacturing industry needs to be further improved.

The ratio of GDP to energy consumption for the various domestic industries and its development trend are shown in Fig. 3. It can be seen from Fig. 3 that both the national ratio and the manufacturing ratio are relatively low. It is precisely because the GDP of the manufacturing industry, the mining industry, and other industries is not in direct proportion to their energy consumption that the national GDP is not in direct proportion to the total energy consumption. In short, although manufacturing contributes a lot to GDP, its energy consumption is even larger. Thus its GDP contribution is relatively inefficient in terms of energy use.

Fig. 3 Trends in the ratio of GDP to energy consumption across the country and in various industries

Since the manufacturing industry accounts for a very large proportion of both the gross domestic product and the energy consumption in China, the trends of the various kinds of energy consumption in the manufacturing industry and its energy consumption structure are further considered, as shown in Fig. 4. Here, “other energy” includes most of the polluting energy sources such as coal. As can be seen from Fig. 4, in the energy consumption structure of the manufacturing industry, unclean energy sources such as coal (within other energy), coke, and crude oil account for a large proportion, while clean energy sources such as natural gas account for a small proportion. Therefore, the energy consumption structure of the manufacturing industry is not in an optimal state.

Fig. 4 Various energy consumption trends in the manufacturing industry

3 The grey model with fractional order accumulation

3.1 Definitions of the fractional order accumulation

Definition 1

(See Ref Lifeng et al. (2013)) Let \(X^{(0)}=\Big ( x^{(0)}(1),x^{(0)}(2),...,x^{(0)}(n)\Big ) \) be an original sequence. The corresponding r-order fractional order accumulation (FOA) \(X^{(r)}=\Big ( x^{(r)}(1),x^{(r)}(2),...,x^{(r)}(n)\Big ) \) is defined as

$$\begin{aligned} x^{(r)}(k)=\sum _{i=1}^{k}{\left( {\begin{array}{c}k-i+r-1\\ k-i\end{array}}\right) }x^{(0)}(i),k=1,2,...,n, \end{aligned}$$
(1)

where

$$\begin{aligned} \left( {\begin{array}{c}k-i+r-1\\ k-i\end{array}}\right) =\frac{(k-i+r-1)(k-i+r-2)...(r+1)r}{(k-i)!} \end{aligned}$$

is the general Newton binomial coefficient, and r is the order of the FOA, which is often a non-negative real number. Particularly, \(\left( {\begin{array}{c}r-1\\ 0\end{array}}\right) =1,\left( {\begin{array}{c}k-1\\ k\end{array}}\right) =0,k=1,2,...,n\).
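As a minimal illustration of Definition 1, the following Python sketch evaluates Eq. (1) directly. The function names `gen_binom` and `foa` are hypothetical and chosen here only for illustration. The coefficient is built from the product form given above, so that \(r=0\) returns the original sequence and \(r=1\) returns the ordinary cumulative sum.

```python
def gen_binom(r: float, m: int) -> float:
    """Coefficient C(m + r - 1, m) = r (r + 1) ... (r + m - 1) / m!; equals 1 when m = 0."""
    coeff = 1.0
    for j in range(m):
        coeff *= (r + j) / (j + 1)
    return coeff


def foa(x: list[float], r: float) -> list[float]:
    """r-order fractional accumulation of x following Eq. (1) (0-based indices)."""
    return [
        sum(gen_binom(r, k - i) * x[i] for i in range(k + 1))
        for k in range(len(x))
    ]


series = [1.0, 2.0, 3.0, 4.0]
print(foa(series, 0.0))  # order 0 leaves the sequence unchanged
print(foa(series, 1.0))  # order 1 is the ordinary cumulative sum
print(foa(series, 0.5))  # a fractional order lies between the two
```

Because the coefficient only depends on the lag \(k-i\), each accumulated value is a weighted sum of the current and all earlier observations, with weights controlled by r.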

Definition 2

(See Ref Lifeng et al. (2013)) Let \(X^{(0)}=\Big ( x^{(0)}(1),x^{(0)}(2),...,x^{(0)}(n)\Big ) \) be an original sequence, where \(x^{(0)}(k)\) is the value at time k. Then the corresponding r-order inverse fractional order accumulation (IFOA) \(X^{(-r)}=\Big ( x^{(-r)}(1),x^{(-r)}(2),...,x^{(-r)}(n)\Big ) \) is defined as

$$\begin{aligned} x^{(-r)}(k)=\sum _{i=1}^{k}{\left( {\begin{array}{c}k-i-r-1\\ k-i\end{array}}\right) }x^{(0)}(i),k=1,2,...,n, \end{aligned}$$
(2)

where

$$\begin{aligned} \left( {\begin{array}{c}k-i-r-1\\ k-i\end{array}}\right) =\frac{(k-i-r-1)(k-i-r-2)...(-r+1)(-r)}{(k-i)!}. \end{aligned}$$

Particularly, \(\left( {\begin{array}{c}-r-1\\ 0\end{array}}\right) =1, \left( {\begin{array}{c}k-1\\ k\end{array}}\right) =0, k=1,2,...,n.\)
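Under the product forms above, the coefficient in Eq. (2) is the coefficient in Eq. (1) with r replaced by \(-r\), so a single routine can serve both operators. The Python sketch below (the names `accumulate` and `round_trip_error` are hypothetical) checks numerically that the r-order IFOA undoes the r-order FOA.

```python
def accumulate(x: list[float], r: float) -> list[float]:
    """Fractional accumulation of order r; a negative r gives the inverse operator."""
    out = []
    for k in range(len(x)):
        total, coeff = 0.0, 1.0  # coeff = C(m + r - 1, m), starting at lag m = 0
        for m in range(k + 1):
            total += coeff * x[k - m]
            coeff *= (r + m) / (m + 1)  # move from lag m to lag m + 1
        out.append(total)
    return out


def round_trip_error(x: list[float], r: float) -> float:
    """Largest absolute gap between x and IFOA(FOA(x, r), r)."""
    restored = accumulate(accumulate(x, r), -r)
    return max(abs(a - b) for a, b in zip(x, restored))


sample = [2.0, 3.5, 3.1, 4.8, 5.2]
print(round_trip_error(sample, 0.5))    # of the order of rounding error
print(accumulate([1.0, 3.0, 6.0], -1.0))  # order -1 is the first difference
```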

3.2 The fractional order grey model

Let the r-order accumulation sequence of the non-negative sequence \(X^{(0)}=\Big ( x^{(0)}(1),x^{(0)}(2),...,x^{(0)}(n) \Big ) \) be \(X^{(r)}=\Big ( x^{(r)}(1),x^{(r)}(2),...,x^{(r)}(n)\Big )\). In Ref Lifeng et al. (2014), the fractional order additive grey model is represented as the following differential equation:

$$\begin{aligned} \frac{dx^{(r)}(t)}{dt}+ax^{(r)}(t)=b, \end{aligned}$$
(3)

which is often called the whitening equation of the FGM. The discrete form is often represented as the following difference equation:

$$\begin{aligned} x^{(r)}(k)-x^{(r)}(k-1)+az^{(r)}(k)=b, \end{aligned}$$
(4)

where

$$\begin{aligned} z^{(r)}(k)=\frac{1}{2}[x^{(r)}(k)+x^{(r)}(k-1)],k=2,3,...,n \end{aligned}$$

is called the background value.

Once given the fractional order r, the linear parameters a, b of the FGM are often estimated by the least squares method as

$$\begin{aligned}{}[a,b]^{T}=(B^{T}B)^{-1}B^{T}Y, \end{aligned}$$
(5)

where

$$\begin{aligned} B=\left[ \begin{array}{ccc} -z^{(r)}(2) &{} 1 \\ -z^{(r)}(3) &{} 1 \\ \vdots &{} \vdots \\ -z^{(r)}(n) &{} 1 \end{array} \right] , Y=\left[ \begin{array}{ccc} x^{(r)}(2)-x^{(r)}(1) \\ x^{(r)}(3)-x^{(r)}(2)\\ \vdots \\ x^{(r)}(n)-x^{(r)}(n-1) \end{array} \right] . \end{aligned}$$

Setting \(\hat{x}^{(r)}(1)=x^{(0)}(1)\), the time response function obtained from the whitening equation (3) is given by

$$\begin{aligned} {\hat{x}^{(r)}}(k)=[x^{(0)}(1)-\frac{b}{a}]e^{-a(k-1)}+\frac{b}{a},k=1,2,...,n. \end{aligned}$$
(6)

The restored values \(\hat{x}^{(0)}\) can be obtained by the r-order IFOA as

$$\begin{aligned} {\hat{x}^{(0)}}(k)=\sum _{i=1}^{k}\left( {\begin{array}{c}k-i-r-1\\ k-i\end{array}}\right) {\hat{x}^{(r)}}(i). \end{aligned}$$
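The FGM procedure of this section can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation (the paper's computations were done in Matlab): the names `accumulate`, `fit_fgm`, and `predict_fgm` are hypothetical, the \(2\times 2\) normal equations of Eq. (5) are solved in closed form, and the response function is assumed to be indexed so that it returns \(x^{(0)}(1)\) at \(k=1\).

```python
import math


def accumulate(x: list[float], r: float) -> list[float]:
    """Fractional accumulation of order r; a negative r gives the inverse operator."""
    out = []
    for k in range(len(x)):
        total, coeff = 0.0, 1.0
        for m in range(k + 1):
            total += coeff * x[k - m]
            coeff *= (r + m) / (m + 1)
        out.append(total)
    return out


def fit_fgm(x: list[float], r: float) -> tuple[float, float]:
    """Least squares estimate of (a, b) in Eq. (4), with rows (-z, 1) and targets y."""
    xr = accumulate(x, r)
    z = [0.5 * (xr[k] + xr[k - 1]) for k in range(1, len(xr))]  # background values
    y = [xr[k] - xr[k - 1] for k in range(1, len(xr))]
    m = len(z)
    s_z, s_zz = sum(z), sum(v * v for v in z)
    s_y, s_zy = sum(y), sum(v * w for v, w in zip(z, y))
    det = m * s_zz - s_z * s_z
    a = (s_z * s_y - m * s_zy) / det
    b = (s_zz * s_y - s_z * s_zy) / det
    return a, b


def predict_fgm(x: list[float], r: float, steps: int) -> list[float]:
    """Fitted and forecast values of the original series (requires a != 0)."""
    a, b = fit_fgm(x, r)
    # Assumption: exponent a * (k - 1) in 1-based indexing, so the first value is x[0].
    xr_hat = [(x[0] - b / a) * math.exp(-a * k) + b / a for k in range(steps)]
    return accumulate(xr_hat, -r)  # restore with the r-order IFOA


data = [100 * 1.1 ** k for k in range(8)]  # illustrative data, not from the paper
fitted = predict_fgm(data, r=1.0, steps=10)
mape = 100 * sum(abs(f - d) / d for f, d in zip(fitted, data)) / len(data)
print(round(mape, 4))
```

With \(r=1\) the sketch reduces to the classical GM(1,1); other values of r can be tried by changing the `r` argument.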

4 Time-delayed fractional discrete grey model with multiple fractional order

Let the \(r_1\)-order accumulation sequence of the non-negative sequence \(X^{(0)}=\Big ( x^{(0)}(1),x^{(0)}(2),...,x^{(0)}(n)\Big )\) be \(X^{(r_1)}=\Big (x^{(r_1)}(1),x^{(r_1)}(2),...,x^{(r_1)}(n)\Big )\). Let the \(r_2\)-order cumulative sequence of the sequence \(N^{(0)}=(1,2,...,n)\) be \(N^{(r_2)}=\Big ( 1^{(r_2)},2^{(r_2)},...,n^{(r_2)}\Big )\).

Considering the fractional time-delayed effect, the Eq. (3) can be extended to

$$\begin{aligned} \frac{dx^{(r_1)}(t)}{dt}+ax^{(r_1)}(t)=bt^{(r_2)}+c. \end{aligned}$$
(7)

The derivative in Eq. (7) can be approximated by

$$\begin{aligned} \left. \frac{d x^{(r_1)}(t)}{d t}\right| _{t=k} \approx \left. \lim _{\Delta t\rightarrow 1}\frac{\Delta x^{(r_1)}(t)}{\Delta t}\right| _{t=k} = \frac{x^{(r_1)}(k+1)-x^{(r_1)}(k)}{(k+1)-k}=x^{(r_1)}(k+1)-x^{(r_1)}(k). \end{aligned}$$
(8)

Substituting Eq. (8) into Eq. (7), we have

$$\begin{aligned} x^{(r_1)}(k+1)-x^{(r_1)}(k)+ax^{(r_1)}(k)=bk^{(r_2)}+c, \end{aligned}$$

that is

$$\begin{aligned} x^{(r_1)}(k+1)=(1+a)x^{(r_1)}(k)+bk^{(r_2)}+c. \end{aligned}$$

Let \(\beta _1=1+a, \beta _2=b, \beta _3=c\); then we get the basic form of TDF-\(\hbox {DGM}_M\) as

$$\begin{aligned} x^{(r_1)}(k+1)=\beta _1x^{(r_1)}(k)+\beta _2k^{(r_2)}+\beta _3,k=1,2,...,n-1. \end{aligned}$$
(9)

Once given the fractional order \(r_1\) and \(r_2\), the linear parameters \(\beta _1, \beta _2, \beta _3\) of the TDF-\(\hbox {DGM}_M\) can be estimated by the least squares method as

$$\begin{aligned} {[}\beta _1,\beta _2,\beta _3]^{T}=(B^{T}B)^{-1}B^{T}Y, \end{aligned}$$
(10)

where

$$\begin{aligned} B=\left[ \begin{array}{ccc} x^{(r_1)}(1) &{} 1^{(r_2)} &{} 1 \\ x^{(r_1)}(2) &{} 2^{(r_2)} &{} 1\\ \vdots &{} \vdots &{} \vdots \\ x^{(r_1)}(n-1) &{} (n-1)^{(r_2)} &{} 1 \end{array} \right] , Y=\left[ \begin{array}{ccc} x^{(r_1)}(2) \\ x^{(r_1)}(3)\\ \vdots \\ x^{(r_1)}(n) \end{array} \right] . \end{aligned}$$

Set \(\hat{x}^{(r_1)}(1)=x^{(0)}(1)\); by recursively solving the Eq. (9), the discrete response function of TDF-\(\hbox {DGM}_M\) can be obtained as

$$\begin{aligned} {\hat{x}^{(r_1)}}(k+1)={\hat{\beta }_1^k}x^{(0)}(1)+\hat{\beta }_2\sum _{i=1}^{k}{\hat{\beta }_1^{k-i}i^{(r_2)}}+\frac{1-\hat{\beta }_1^k}{1-\hat{\beta }_1}\hat{\beta }_3,k=1,2,...,n-1. \end{aligned}$$
(11)

The restored values \(\hat{x}^{(0)}(k)\) can be obtained using the \(r_1\)-order IFOA as

$$\begin{aligned} {\hat{x}^{(0)}}(k)=\sum _{i=1}^{k}\left( {\begin{array}{c}k-i-r_1-1\\ k-i\end{array}}\right) {\hat{x}^{(r_1)}}(i). \end{aligned}$$
(12)

The detailed computational processes are summarized in Algorithm 1.

Algorithm 1
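The computational steps of this section can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' code: all function names are hypothetical, the normal equations of Eq. (10) are solved by a small Gaussian elimination routine, and the response is evaluated by iterating Eq. (9), which gives the same values as Eq. (11). The demonstration data are synthetic.

```python
def accumulate(x, r):
    """Fractional accumulation of order r; a negative r gives the inverse operator."""
    out = []
    for k in range(len(x)):
        total, coeff = 0.0, 1.0
        for m in range(k + 1):
            total += coeff * x[k - m]
            coeff *= (r + m) / (m + 1)
        out.append(total)
    return out


def solve_linear(a, b):
    """Solve a small dense system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            f = m[row][col] / m[col][col]
            for j in range(col, n + 1):
                m[row][j] -= f * m[col][j]
    sol = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * sol[j] for j in range(i + 1, n))
        sol[i] = (m[i][n] - s) / m[i][i]
    return sol


def fit_tdf_dgm(x, r1, r2):
    """Estimate (beta1, beta2, beta3) from Eq. (10) for given orders r1 and r2."""
    n = len(x)
    xr = accumulate(x, r1)
    tr = accumulate([float(k) for k in range(1, n + 1)], r2)  # k^(r2)
    rows = [[xr[k], tr[k], 1.0] for k in range(n - 1)]        # matrix B
    y = xr[1:]                                                # vector Y
    btb = [[sum(row[i] * row[j] for row in rows) for j in range(3)] for i in range(3)]
    bty = [sum(row[i] * yk for row, yk in zip(rows, y)) for i in range(3)]
    return solve_linear(btb, bty)


def predict_tdf_dgm(x1, betas, r1, r2, steps):
    """Iterate Eq. (9) from x1, then restore the series with the r1-order IFOA."""
    b1, b2, b3 = betas
    tr = accumulate([float(k) for k in range(1, steps + 1)], r2)
    xr_hat = [x1]
    for k in range(steps - 1):
        xr_hat.append(b1 * xr_hat[-1] + b2 * tr[k] + b3)
    return accumulate(xr_hat, -r1)


# Synthetic check: data generated by the model itself should be reproduced.
true_betas, r1, r2 = (0.9, 0.5, 1.0), 0.6, 1.3
ideal = predict_tdf_dgm(2.0, true_betas, r1, r2, steps=10)
betas = fit_tdf_dgm(ideal[:6], r1, r2)          # first six points for modeling
forecast = predict_tdf_dgm(ideal[0], betas, r1, r2, steps=10)
print(betas)
print(max(abs(p - q) for p, q in zip(forecast, ideal)))
```

In practice the orders \(r_1\) and \(r_2\) are not known in advance; here they are fixed only to keep the sketch short.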

5 Relationship and difference between the TDF-\(\hbox {DGM}_M\) and FTDGM model

As described above, the proposed TDF-\(\hbox {DGM}_M\) is derived from a whitening equation of a grey system using the discrete modeling technique. To further analyze the properties of this model, another similar time-delayed model FTDGM in Ref Ma et al. (2019) is used for theoretical comparison, including the modeling mechanism, unbiasedness, and flexibility.

5.1 Difference in modeling mechanism

For convenience, the modeling details of the FTDGM in Ref Ma et al. (2019) and the proposed TDF-\(\hbox {DGM}_M\) are summarized in Table 1.

First, it can be noticed that the TDF-\(\hbox {DGM}_M\) is essentially a more general formulation of FTDGM as it can yield FTDGM when \(r_1 =r_2\). And this generality will make it more flexible which will be discussed in the last subsection in this section.

Second, the basic form of the FTDGM is obtained by integrating both sides of its whitening equation and then discretizing them, whereas the basic form of the TDF-\(\hbox {DGM}_M\) is obtained by discretizing the derivative in its whitening equation. This makes the modeling procedures of the TDF-\(\hbox {DGM}_M\) easier to implement. As shown in the second-to-last row of Table 1, the solution of the FTDGM is obtained by solving the whitening equation through the general solution formula of the ordinary differential equation, and its discrete-time response function is obtained from this solution by a numerical formula. In contrast, the solution of the TDF-\(\hbox {DGM}_M\) is obtained by directly iterating its basic form, making it more convenient for practical application.

Table 1 Comparison of the modeling procedures of the TDF-\(\hbox {DGM}_M\) and the FTDGM

5.2 Difference in unbiasedness

Actually, a general analysis of the unbiasedness of the fractional discrete multivariate grey model has been given in Ref Ma et al. (2019). The same analysis method can be used in this work, as the proposed TDF-\(\hbox {DGM}_M\) uses a similar methodology, namely, the discrete modeling technique and fractional order accumulation.

According to Ref Ma et al. (2019), an unbiased grey model should satisfy the condition that its response function satisfies its discrete formulation. For the FTDGM, equality should therefore hold when its discrete response function is substituted into its basic form. However, the left-hand side of the FTDGM basic form is actually

$$\begin{aligned} \begin{aligned} L_C(k)&=x^{(r)}(k+1)-x^{(r)}(k)+az^{(r)}(k)\\&=x^{(r)}(k+1)-x^{(r)}(k)+\frac{a}{2}[x^{(r)}(k+1)+x^{(r)}(k)]\\&=\frac{a+2}{2}\Big (x^{(0)}(1)e^{-ka}+\sum ^{k}_{s=1}{\frac{1}{2}[f(s+1)+f(s)]e^{a(s-k-\frac{1}{2})}}\Big )\\&\quad +\frac{a-2}{2}\Big (x^{(0)}(1)e^{a(1-k)}+\sum ^{k-1}_{s=1}{\frac{1}{2}[f(s+1)+f(s)]e^{a(s-k+\frac{1}{2})}}\Big )\\&=\frac{a+2}{2}\Big (x^{(0)}(1)e^{-ka}+\sum ^{k-1}_{s=1}{\frac{1}{2}[b(s+1)^{(r)}+bs^{(r)}+2c]e^{a(s-k-\frac{1}{2})}}\\&\quad +\frac{1}{2}[b(k+1)^{(r)}+bk^{(r)}+2c]e^{-\frac{1}{2}a}\Big )\\&\quad +\frac{a-2}{2}\Big (x^{(0)}(1)e^{-ka}e^{a}+\sum ^{k-1}_{s=1}{\frac{1}{2}[b(s+1)^{(r)}+bs^{(r)}+2c]e^{a(s-k-\frac{1}{2})}e^{a}}\Big )\\&=[\frac{a+2}{2}+\frac{a-2}{2}e^a]\Big (x^{(0)}(1)e^{-ka}+\sum ^{k-1}_{s=1}{\frac{1}{2}[b(s+1)^{(r)}+bs^{(r)}+2c]e^{a(s-k-\frac{1}{2})}}\Big )\\&\quad +\frac{a+2}{4}[b(k+1)^{(r)}+bk^{(r)}+2c]e^{-\frac{a}{2}} \end{aligned} \end{aligned}$$

And the right-side of the FTDGM basic form in Table 1 is

$$\begin{aligned} \begin{aligned} R_C(k)&=bm^{(r)}(k)+c\\&=\frac{b}{2}[k^{(r)}+(k+1)^{(r)}]+c\\&=\frac{1}{2}[bk^{(r)}+b(k+1)^{(r)}+2c] \end{aligned} \end{aligned}$$

Obviously, we have

$$\begin{aligned} L_C(k)\ne R_C(k). \end{aligned}$$

When |a| is small, the discrete response function of the FTDGM approximates its basic form. On the contrary, when |a| is larger, the discrete response function differs more from its basic form, which leads to a larger error of the FTDGM. Thus the FTDGM is a biased model.

Similarly, we can also check the unbiasedness of the proposed TDF-\(\hbox {DGM}_M\). Substituting discrete solution into the left-side of the TDF-\(\hbox {DGM}_M\) basic form in Table 1, there is

$$\begin{aligned} \begin{aligned} L_D(k)&=x^{(r_1)}(k+1)-\beta _1x^{(r_1)}(k)\\&=\beta _1^kx^{(0)}(1)+\beta _2\sum _{i=1}^{k}{\beta _1^{k-i}i^{(r_2)}}+\frac{1-\beta _1^k}{1-\beta _1}\beta _3\\&\quad -\beta _1\Big [\beta _1^{k-1}x^{(0)}(1)+\beta _2\sum _{i=1}^{k-1}{\beta _1^{k-i-1}i^{(r_2)}}+\frac{1-\beta _1^{k-1}}{1-\beta _1}\beta _3\Big ]\\&=\beta _2\sum _{i=1}^{k}{\beta _1^{k-i}i^{(r_2)}}-\beta _2\sum _{i=1}^{k-1}{\beta _1^{k-i}i^{(r_2)}}+\frac{1-\beta _1^k}{1-\beta _1}\beta _3-\frac{\beta _1-\beta _1^k}{1-\beta _1}\beta _3\\&=\beta _2k^{(r_2)}+\beta _3\\&=R_D(k) \end{aligned} \end{aligned}$$

In short we have:

$$\begin{aligned} L_D(k)=R_D(k). \end{aligned}$$

The above discussions mean that the solution and the basic form of the TDF-\(\hbox {DGM}_M\) are equivalent. Thus the TDF-\(\hbox {DGM}_M\) is an unbiased model.
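The equivalence can also be checked numerically. The Python sketch below compares the values produced by iterating the basic form with those given by the closed-form discrete response function, for arbitrary illustrative parameters. The function names and parameter values are hypothetical, and \(\beta _1\ne 1\) is assumed so that the geometric sum can be written as a fraction.

```python
def delay_terms(n: int, r2: float) -> list[float]:
    """k^(r2) for k = 1..n: the r2-order accumulation of the sequence 1, 2, ..., n."""
    out = []
    for k in range(n):
        total, coeff = 0.0, 1.0
        for m in range(k + 1):
            total += coeff * (k - m + 1)
            coeff *= (r2 + m) / (m + 1)
        out.append(total)
    return out


def by_recursion(x1, betas, r2, n):
    """Accumulated series obtained by iterating the basic form from x1."""
    b1, b2, b3 = betas
    t = delay_terms(n, r2)
    xr = [x1]
    for k in range(1, n):
        xr.append(b1 * xr[-1] + b2 * t[k - 1] + b3)
    return xr


def by_closed_form(x1, betas, r2, n):
    """Accumulated series from the discrete response function (assumes b1 != 1)."""
    b1, b2, b3 = betas
    t = delay_terms(n, r2)
    xr = [x1]
    for k in range(1, n):
        conv = sum(b1 ** (k - i) * t[i - 1] for i in range(1, k + 1))
        xr.append(b1 ** k * x1 + b2 * conv + (1 - b1 ** k) / (1 - b1) * b3)
    return xr


params = (1.3, 0.4, 2.0)  # illustrative values only
rec = by_recursion(0.5, params, r2=0.8, n=8)
closed = by_closed_form(0.5, params, r2=0.8, n=8)
print(max(abs(p - q) for p, q in zip(rec, closed)))  # rounding-level difference
```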

For a better explanation, several numerical tests are presented to show the unbiasedness of these two models. For the FTDGM, the original series \(X^{(0)}\) is generated using its response function in Table 1 as the ideal data. The parameter a is varied over the interval [-2,2] with a step of 0.01, and r is varied over the interval [0.01,2] with a step of 0.01. The other parameters b and c are randomly generated from the uniform distribution on the interval (0,5), and the initial point \(x^{(0)}(1)\) is randomly generated from the uniform distribution on the interval (0,1). Ten points are generated for each series, of which the first six points are used for modeling and the remaining four points are used for testing. Then the FTDGM models are established based on these ideal data, and the mean absolute percentage errors (MAPEs) for testing are shown in Fig. 5.

Fig. 5 Testing MAPEs of FTDGM with different values of r and a

Figure 5 clearly illustrates the biasedness of the FTDGM. It is also clear that the errors of the FTDGM are smaller when |a| is small and become larger as |a| increases.

Similar to the above experiment for the FTDGM, the series \(X^{(0)}\) is generated by the discrete solution of the TDF-\(\hbox {DGM}_M\) as ideal data. To make the verification results comparable and more intuitive, the fractional orders are set to be equal, \(r=r_1=r_2\). Then the parameter \(\beta _1\) is varied over the interval [-2,2] with a step of 0.01, and r is varied over the interval [0.01,2] with a step of 0.01. The other parameters \(\beta _2\) and \(\beta _3\) are randomly generated from the uniform distribution on the interval (0,5), and the initial point \(x^{(0)}(1)\) is randomly generated from the uniform distribution on the interval (0,1). The data size and the division into modeling and testing points are the same as in the above experiment. Then the TDF-\(\hbox {DGM}_M\) models (\(r=r_1=r_2\)) are established for these ideal data, and the MAPEs for testing are shown in Fig. 6. It can be clearly seen that all the MAPEs of the TDF-\(\hbox {DGM}_M\) are smaller than \(10^{-8}\), which are only rounding errors caused by finite machine precision. The parameters do not affect the accuracy of the TDF-\(\hbox {DGM}_M\).

Fig. 6 Testing MAPEs by TDF-\(\hbox {DGM}_M\) with different values of \(r(=r_1=r_2)\) and \(\beta _1\)

5.3 Difference in flexibility

As mentioned above, the multiple fractional orders make the TDF-\(\hbox {DGM}_M\) more flexible. This subsection uses the unified fractional order of the FTDGM as a reference to illustrate the flexibility brought by the multiple fractional orders of the TDF-\(\hbox {DGM}_M\). Recall the analysis in Ref Ma et al. (2019) that the fractional time-delayed term is actually a function more general than integer-order polynomials, i.e.,

$$\begin{aligned} k^{(r)}=\left\{ \begin{array}{cc} k &{}\quad r=0 \\ \frac{1}{2} k(k+1) &{}\quad r=1 \\ \frac{1}{6} k(k+1)(k+2) &{}\quad r=2 \end{array}\right. \end{aligned}$$
(13)
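The integer-order cases in Eq. (13) can be verified numerically, and a fractional order can be evaluated in the same way. The following Python sketch is illustrative only; the function name `delay_term` is hypothetical, and the term \(k^{(r)}\) is computed as the r-order accumulation of the sequence \(1,2,...,k\).

```python
def delay_term(k: int, r: float) -> float:
    """k^(r): the k-th entry of the r-order accumulation of 1, 2, ..., k."""
    total, coeff = 0.0, 1.0  # coeff = C(m + r - 1, m), starting at lag m = 0
    for m in range(k):
        total += coeff * (k - m)
        coeff *= (r + m) / (m + 1)
    return total


for k in range(1, 6):
    print(
        k,
        delay_term(k, 0.0),   # k
        delay_term(k, 1.0),   # k (k + 1) / 2
        delay_term(k, 2.0),   # k (k + 1) (k + 2) / 6
        delay_term(k, 0.5),   # a fractional order, between the first two columns
    )
```

For non-integer r the term is no longer an integer-order polynomial in k, which is the source of the additional flexibility discussed in this subsection.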

It can be noticed that if r is completely free, then the form of \(k^{(r)}\) can be richer, which makes the model more flexible. However, the FTDGM uses a unified fractional order for the time-delayed term \(k^{(r)}\) and \(x^{(r)}\); this prevents the time-delayed term from varying freely and further limits the flexibility of the FTDGM.

For a more intuitive analysis, a simple example is given to show the flexibility of these two models with multiple fractional orders. Figure 7 plots the outputs of these two models with a unified fractional order and with multiple fractional orders. Out of interest, we also tried the FTDGM with multiple orders in this case. It is clear that if the order \(r_2\) of the time-delayed term changes, the curves produced by both the FTDGM and the TDF-\(\hbox {DGM}_M\) take more shapes, and this property provides more possibilities for the models to better fit the sample data. Further, it can be easily deduced that this property makes the models more flexible and capable of dealing with more complex time series.

Fig. 7 Output series by FTDGM and TDF-\(\hbox {DGM}_M\) when \(r_2=r_1\) and \(r_2\ne r_1\)

6 Optimization of the fractional order \(r_1\) and \(r_2\) based on particle swarm optimization

6.1 Formulating the nonlinear optimization problem for \(r_1\) and \(r_2\)

The main idea of finding the optimal values of \(r_1\) and \(r_2\) is to minimize the errors of the TDF-\(\hbox {DGM}_M\) with independent fractional orders. Generally, we use the MAPE as the main criterion, and the optimization problem for finding the optimal \(r_1\) and \(r_2\) can then be formulated as

$$\begin{aligned}&\min J(r_1,r_2)=\frac{1}{V} \sum _{k=1}^{V}\left| \frac{{\hat{x}^{(0)}}(k)-x^{(0)}(k)}{x^{(0)}(k)}\right| \times 100\%\nonumber \\&\quad s.\ t. \left\{ \begin{array}{l} {[\beta _1,\beta _2,\beta _3]^{T}=(B^{T}B)^{-1}B^{T}Y} \\ {B=\left[ \begin{array}{cccc} x^{(r_1)}(1) &{} x^{(r_1)}(2) &{} \cdots &{} x^{(r_1)}(n-1) \\ 1^{(r_2)} &{} 2^{(r_2)} &{} \cdots &{}(n-1)^{(r_2)} \\ 1 &{} 1 &{} \cdots &{} 1 \end{array} \right] ^T} \\ {Y=[x^{(r_1)}(2),x^{(r_1)}(3),\ldots ,x^{(r_1)}(n)]^T} \\ {x^{(r_1)}(k+1)=\beta _1x^{(r_1)}(k)+\beta _2k^{(r_2)}+\beta _3,k=1,2,...,n-1} \\ {{\hat{x}^{(r_1)}}(k+1)={\hat{\beta }_1^k}x^{(0)}(1) + \hat{\beta }_2 \sum _{i=1}^{k}{\hat{\beta }_1^{k-i}}i^{(r_2)} + \frac{1-\hat{\beta }_1^k}{1-\hat{\beta }_1}\hat{\beta }_3,k=1,2,...,n-1} \\ {{\hat{x}^{(0)}}(k)=\sum _{i=1}^{k}\left( {\begin{array}{c}k-i-r_1-1\\ k-i\end{array}}\right) {\hat{x}^{(r_1)}}(i),k=1,2,...,n} \end{array} \right. , \end{aligned}$$
(14)

where V represents the number of data points used for modeling, that is, for estimating the parameters \(\beta _1, \beta _2\), and \(\beta _3\).

It can be seen that the objective function is a nonlinear function of \(r_1\) and \(r_2\), and there exist several nonlinear constraints; thus this optimization problem is essentially nonlinear programming. The explicit expression of the objective function and the constraints are very complex, and thus it cannot be solved analytically. It should be noticed that such formulation is often used for the optimization of the existing nonlinear grey models in Ref Pei et al. (2018), Zheng-Xin (2014).

Fig. 8 The flowchart of TDF-\(\hbox {DGM}_M\) model based on the PSO

6.2 Solving the nonlinear programming using particle swarm optimization

Particle swarm optimization is a population-based stochastic optimization technique developed by Eberhart and Kennedy in 1995, inspired by the social behavior of bird flocking and fish schooling. In the past several years, the PSO has been successfully applied in many research and application areas, where it has often been reported to obtain good results quickly and at low computational cost compared with other methods. Another reason for choosing the PSO is that it has few parameters to adjust; one version with slight variations works well in a wide variety of applications.

In each iteration, the particle updates its speed and position through the individual extremum and the group extremum. The change of them is defined as

$$\begin{aligned} \left\{ \begin{array}{l} V^{k+1}_{id}=\omega V^{k}_{id}+c_1r_1(P^{k}_{id}-X^{k}_{id})+c_2r_2(P^{k}_{gd}-X^{k}_{id})\\ X^{k+1}_{id}=X^{k}_{id}+V^{k+1}_{id} \end{array}\right. \end{aligned}$$
(15)

where \(\omega \) is the inertia weight, \(d=1,2,...,D\), \(i=1,2,...,n\), k is the current iteration number, \(V_{id}\) is the speed of the particle, \(X_{id}\) is its position, \(P_{id}\) and \(P_{gd}\) are the individual extremum and the group extremum, \(c_1\) and \(c_2\) are non-negative constants called the acceleration factors, normally set to \(c_1=c_2=2\), and \(r_1\) and \(r_2\) are random numbers distributed in the interval [-2,2]; in Eq. (15) these two symbols do not denote the fractional orders of the model. To prevent the blind searching of particles, it is generally recommended to limit their position and speed to certain ranges \([-X_{max},X_{max}]\) and \([-V_{max}, V_{max}]\).
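A minimal Python sketch of a particle swarm optimizer with an inertia weight is given below for illustration. It is not the authors' Matlab implementation, and several choices are assumptions made here: the function name `pso`, the inertia weight 0.4, the velocity limit of 20% of each search range, and random factors drawn uniformly from [0, 1], which is the common PSO convention. The toy objective merely stands in for the MAPE \(J(r_1,r_2)\), and the random factors are named `u1` and `u2` to avoid confusion with the fractional orders.

```python
import random


def pso(objective, bounds, n_particles=30, n_iter=100, w=0.4, c1=2.0, c2=2.0, seed=0):
    """Minimize objective over box bounds; returns (best_position, best_value)."""
    rng = random.Random(seed)
    dim = len(bounds)
    vmax = [0.2 * (hi - lo) for lo, hi in bounds]  # assumed velocity limit
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                    # individual extrema
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]   # group extremum
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                u1, u2 = rng.random(), rng.random()  # assumption: uniform on [0, 1]
                v = (w * vel[i][d]
                     + c1 * u1 * (pbest[i][d] - pos[i][d])
                     + c2 * u2 * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-vmax[d], min(vmax[d], v))
                lo, hi = bounds[d]
                pos[i][d] = max(lo, min(hi, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def toy_objective(p):
    """Stand-in for J(r1, r2): a smooth bowl with its minimum at (0.7, 1.2)."""
    return (p[0] - 0.7) ** 2 + (p[1] - 1.2) ** 2


best, best_val = pso(toy_objective, bounds=[(0.0, 2.0), (0.0, 2.0)])
print(best, best_val)
```

To optimize the model orders, the toy objective would be replaced by a function that fits the TDF-\(\hbox {DGM}_M\) for a candidate pair of orders and returns the fitting MAPE.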

The overall calculation steps of the TDF-\(\hbox {DGM}_M\) model based on the PSO can be briefly summarized in Fig. 8.

7 Validation

In this paper, the parameter optimization problem changes from the one-dimensional nonlinear programming problem of the FGM to the multi-dimensional nonlinear programming problem of the TDF-\(\hbox {DGM}_M\), which makes parameter optimization more difficult. We therefore use four cases to verify the validity and accuracy of the TDF-\(\hbox {DGM}_M\). For all cases, the PSO is compared with the Grey Wolf Optimizer (GWO) and the Genetic Algorithm (GA), and the TDF-\(\hbox {DGM}_M\) is compared with other grey models, including the FTDGM, the time-delayed fractional discrete grey model with unique fractional order (\(\hbox {TDFDGM}_U\)), the fractional nonhomogeneous discrete grey model (FNDGM), the fractional discrete grey model (FDGM), the GM(1,1), the nonhomogeneous discrete grey model (NDGM), and the discrete grey model (DGM). The data sources of the four cases are shown in Table 2. The population size of the PSO is set to 30, the maximum number of iterations is used as the stopping condition and set to 100, and each experiment is repeated 100 times. The GWO and GA parameter settings are the same as those of the PSO. All the calculations have been done in Matlab 2015a.

Table 2 Raw data and relevant information of the four validation cases
Table 3 For the four validation cases, the minimum MAPE, the corresponding \(r_1\) and \(r_2\), and the mean time over the 100 experiments of TDF-\(\hbox {DGM}_M\) with PSO, GWO, and GA
Fig. 9 The optimal parameters and MAPE of TDF-\(\hbox {DGM}_M\) in each trial of Case 1

Fig. 10 The optimal parameters and MAPE of TDF-\(\hbox {DGM}_M\) in each trial of Case 2

Fig. 11 The optimal parameters and MAPE of TDF-\(\hbox {DGM}_M\) in each trial of Case 3

Fig. 12 The optimal parameters and MAPE of TDF-\(\hbox {DGM}_M\) in each trial of Case 4

Fig. 13 The change of MAPE of TDF-\(\hbox {DGM}_M\) in the process of PSO, GWO, and GA

Fig. 14 The change of \(r_1\) and \(r_2\) of TDF-\(\hbox {DGM}_M\) in the process of PSO

To compare the performance of the PSO, GWO, and GA, the minimum MAPE and the corresponding \(r_1\) and \(r_2\) of the TDF-\(\hbox {DGM}_M\) among the 100 trials obtained by the three algorithms are presented in Table 3. Meanwhile, the average time of 100 experiments is also given in the table. Besides, the optimal parameters and MAPE in each trial are shown in Figs. 9, 10, 11, and 12.

From Table 3, it can be seen that PSO achieves the smallest objective function value over the same 100 trials, which shows that it converges to better solutions than GWO and GA. In the four cases, the average running time of PSO is also shorter, which shows that it converges faster than GWO and GA. Figs. 10, 11, and 12 show that PSO is slightly more stable than GWO and GA in Case 2, Case 3, and Case 4. In Fig. 9, the fluctuations of MAPE for PSO, GWO, and GA are similar, but PSO is more stable and reaches smaller MAPE values.

In addition, one set of optimization results from the 100 trials is shown in Fig. 13. According to Fig. 13, PSO converges faster and to a better value than GWO and GA. This shows that PSO needs fewer iterations to converge to the optimal value.

Based on the above, PSO is selected to optimize \(r_1\) and \(r_2\) in the four cases. To make the comparison among the grey models fair, PSO is also used to optimize the parameters of the eight grey models. The initial population size is set to 30, the stopping criterion is set to \(10^{-6}\), and the maximum number of iterations is set to 500. The evolution of \(r_1\) and \(r_2\) during the PSO optimization of the TDF-\(\hbox {DGM}_M\) is shown in Fig. 14. The fitting and prediction MAPEs of the eight models are then calculated for each case, as shown in Table 4.
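For readers who want to reproduce this setting, a basic PSO loop with a population of 30, a tolerance of \(10^{-6}\), and at most 500 iterations might look like the sketch below. It is written in Python rather than Matlab and rests on several assumptions: the inertia and acceleration coefficients, the search bounds, and the reading of the stopping criterion as a threshold on the best objective value are all illustrative choices that the text does not specify, and the function name `pso_minimize` is hypothetical.

```python
import numpy as np


def pso_minimize(objective, bounds, pop_size=30, max_iter=500, tol=1e-6,
                 inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal global-best PSO (illustrative; coefficients are assumed values)."""
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(bounds, dtype=float).T   # lower and upper bound vectors
    dim = lo.size
    pos = rng.uniform(lo, hi, size=(pop_size, dim))
    vel = np.zeros_like(pos)
    pbest_pos = pos.copy()
    pbest_val = np.array([objective(p) for p in pos])
    g = int(np.argmin(pbest_val))
    gbest_pos, gbest_val = pbest_pos[g].copy(), float(pbest_val[g])
    it = 0
    for it in range(1, max_iter + 1):
        r1 = rng.random((pop_size, dim))
        r2 = rng.random((pop_size, dim))
        # Velocity update: inertia + cognitive pull + social pull.
        vel = inertia * vel + c1 * r1 * (pbest_pos - pos) + c2 * r2 * (gbest_pos - pos)
        pos = np.clip(pos + vel, lo, hi)
        vals = np.array([objective(p) for p in pos])
        improved = vals < pbest_val
        pbest_pos[improved] = pos[improved]
        pbest_val[improved] = vals[improved]
        g = int(np.argmin(pbest_val))
        gbest_pos, gbest_val = pbest_pos[g].copy(), float(pbest_val[g])
        # Assumed stopping rule: best objective value has dropped below tol.
        if gbest_val < tol:
            break
    return gbest_pos, gbest_val, it
```

In the setting of this paper, `objective` would map a pair \((r_1, r_2)\) to the MAPE of the fitted model; here any callable on a parameter vector can be used, for example `pso_minimize(lambda p: (p[0] - 0.5) ** 2 + (p[1] + 0.8) ** 2, [(-2, 2), (-2, 2)])`.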

According to Table 4, except for the fitting error of Case 4, the fitting and prediction MAPEs of the discrete grey model with the independent fractional time-delayed term are lower than those of the other models in the four cases. This model is therefore more appropriate for the four cases.
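The two error measures compared in Table 4 can be computed as in the following Python sketch. It assumes the usual definition of MAPE, the mean of \(|\hat{x}(k) - x(k)| / x(k)\) expressed in percent, and splits a series into a fitting part and a prediction part at a hypothetical index `n_fit`; the function names are introduced only for illustration.

```python
import numpy as np


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs((predicted - actual) / actual)) * 100.0)


def fit_and_forecast_mape(actual, modeled, n_fit):
    """Return (fitting MAPE, prediction MAPE).

    The first n_fit points are treated as the in-sample (fitting) part,
    the remaining points as the out-of-sample (prediction) part.
    """
    fitting = mape(actual[:n_fit], modeled[:n_fit])
    prediction = mape(actual[n_fit:], modeled[n_fit:])
    return fitting, prediction
```

Reporting the two values separately matters because a model can fit the sample almost exactly and still predict poorly.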

8 Application in forecasting natural gas consumption of the manufacturing industry of China

The raw data are collected from the statistics of energy consumption by industry in China’s statistical yearbook in the range of 2007–2017 (http://www.stats.gov.cn/tjsj/ndsj/).

Table 5 indicates that the consumption of natural gas in the national manufacturing industry is increasing year by year. The data from 2006 to 2010 will be used to build the models, and the data from 2011 to 2015 will be used to test their out-of-sample performance. The minimum MAPE and the corresponding \(r_1\) and \(r_2\) of the TDF-\(\hbox {DGM}_M\) among the 100 trials obtained by the three algorithms are presented in Table 6. Also, the average time of 100 experiments is given in the table.

It can be seen in Table 6 that PSO is more accurate than GWO and GA and converges faster. The parameters optimized by PSO are \(r_1=2\) and \(r_2=-0.8679\) for the TDF-\(\hbox {DGM}_M\), \(r=-0.1156\) for the FTDGM, \(r=-0.3203\) for the \(\hbox {TDFDGM}_U\), \(r=-2\) for the FNDGM, and \(r=0.8565\) for the FDGM. The fitting and prediction results of the eight models are shown in Table 7, and the absolute values of their fitting and prediction errors are shown in Fig. 15.
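Negative and non-integer orders such as those listed above are meaningful because fractional accumulation is defined for any real order. The Python sketch below assumes the standard fractional-order accumulation used in fractional grey models, \(x^{(r)}(k)=\sum_{i=1}^{k}\binom{k-i+r-1}{k-i}x^{(0)}(i)\); the exact operator of this paper is defined in its earlier sections and may differ in detail. The function name is hypothetical.

```python
import numpy as np


def fractional_accumulation(x, r):
    """r-order accumulation of a sequence x for any real order r (assumed standard form).

    The weights c_j = C(j + r - 1, j) are built with the recurrence
    c_0 = 1, c_j = c_{j-1} * (r + j - 1) / j, which avoids gamma-function
    poles at negative integer orders such as r = -2.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    coeff = np.empty(n)
    coeff[0] = 1.0
    for j in range(1, n):
        coeff[j] = coeff[j - 1] * (r + j - 1) / j
    # x_r[k] = sum_{i=0..k} coeff[k - i] * x[i]
    return np.array([np.dot(coeff[:k + 1][::-1], x[:k + 1]) for k in range(n)])
```

With this definition, \(r=1\) gives the ordinary cumulative sum, \(r=0\) leaves the sequence unchanged, and \(r=-1\) gives first differences, so accumulating with order \(-r\) undoes accumulation with order \(r\).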

Fig. 15 The absolute value of fitting error and prediction error of the eight models

Table 7 shows that the MAPE of the TDF-\(\hbox {DGM}_M\) is smaller than that of the other models, and Fig. 15 shows that its errors are smaller than those of the other models. The TDF-\(\hbox {DGM}_M\) is therefore more appropriate for forecasting the national manufacturing gas consumption. The fitting and prediction results of the eight models are plotted in Fig. 16.

Fig. 16 Plots of fitting and forecasting values of the natural gas consumption of the manufacturing industry of China by the eight models

Figure 16 further illustrates the fitting and prediction performance of these models. The predicted values of the TDF-\(\hbox {DGM}_M\) are much closer to the raw data, while most other models fail to capture the overall trend of the testing values. Interestingly, all these models perform quite well in fitting; in particular, the fitting errors of the \(\hbox {TDFDGM}_U\) are smaller than \(10^{-2}\)%. This indicates that these models have over-fitted the sample data. In contrast, the TDF-\(\hbox {DGM}_M\) generalizes better to the out-of-sample data.

9 Conclusions

In this paper, a novel time-delayed fractional grey model with multiple fractional orders, abbreviated as TDF-\(\hbox {DGM}_M\), was proposed, and the PSO algorithm was employed to select the optimal values of its two independent fractional orders. Numerical validation with four real-world data sets showed the effectiveness of PSO and the superiority of the TDF-\(\hbox {DGM}_M\) over the seven existing grey models.

Table 4 The MAPEs obtained by fitting and predicting four validation cases with eight models, respectively
Table 5 The raw data of national manufacturing gas consumption (Billion cubic meters)
Table 6 For the natural gas in the national manufacturing industry of China, the minimum MAPE, the corresponding \(r_1\) and \(r_2\), and the mean time over the 100 experiments of TDF-\(\hbox {DGM}_M\) with PSO, GWO, and GA
Table 7 The fitting and prediction results of forecasting the natural gas consumption of the manufacturing industry of China by the eight models

The model was then applied to forecasting the natural gas consumption of the manufacturing industry of China with real-world data. The results showed that the proposed TDF-\(\hbox {DGM}_M\) was more accurate than the other seven existing models. Notably, the TDF-\(\hbox {DGM}_M\) was also more effective than its special form \(\hbox {TDFDGM}_U\) with a unified fractional order. The results obtained in this paper further illustrate that the TDF-\(\hbox {DGM}_M\) is suitable for forecasting the natural gas consumption of the manufacturing industry of China.

Moreover, the methodology used to build the TDF-\(\hbox {DGM}_M\) can be regarded as a new approach to fractional grey modeling, which may be used to build more fractional grey models with higher accuracy in the future.