Wind speed forecast based on combined theory, multi-objective optimisation, and sub-model selection

Fu, Tonglin; Zhang, Shenghui

doi:10.1007/s00500-022-07334-y

Application of soft computing
Published: 23 August 2022

Volume 26, pages 13615–13638, (2022)

Abstract

Wind energy is a primary energy source for a sustainable and pollution-free global power supply. However, wind speed is irregular, nonlinear, non-stationary, random, and intermittent, and previous studies have focused on either forecast stability or forecast accuracy alone, so the forecast performance of their models was poor. Moreover, previous research did not consider how the sub-models of a combined model should be selected, which weakened generalisability. Therefore, to improve both the accuracy and the stability of wind speed forecasting, and to solve the problem of sub-model selection in the combined model, this study developed a wind speed forecasting model using data pre-processing, a multi-objective optimisation algorithm, and sub-model selection for the combined model. Simulation experiments showed that the combined model not only improved forecasting accuracy and stability but also chose different sub-models and different combination weights for different data, which improved generalisability. Specifically, the MAPEs of the model are less than 4.96%, 4.60%, and 5.25% in one-, two-, and three-step forecasts, respectively. Thus, the proposed combined model is demonstrated to be an effective tool for grid dispatching.


Data availability

Data not available due to legal and commercial restrictions.

References

Abdollahzade M, Miranian A, Hassani H, Iranmanesh H (2015) A new hybrid enhanced local linear neuro-fuzzy model based on the optimized singular spectrum analysis and its application for nonlinear and chaotic time series forecasting. Inf Sci 295:107–125
Aguilar Vargas S, Telles Esteves GR, Medina Maçaira P, Quaresma Bastos B, Cyrino Oliveira FL, Castro Souza R (2019) Wind power generation: a review and a research agenda. J Clean Prod 218:850–870
Bates JM, Granger CWJ (2001) The combination of forecasts. In: Essays in econometrics. Cambridge University Press, Cambridge, pp 451–468
Brown BG, Katz RW, Murphy AH (1984) Time series models to simulate and forecast wind speed and wind power. J Appl Meteorol 23:1184–1195
Bruninx K, Bergh KVD, Delarue E et al (2016) Optimization and allocation of spinning reserves in a low-carbon framework, IEEE Power and Energy Society General Meeting (PESGM). IEEE Trans Power Syst 31(2):872–882
Chang W-Y (2014) A literature review of wind forecasting methods. Power Energy Eng 2:161–168
Contreras J, Espinola R, Nogales F, Conejo A (2003) ARIMA models to predict next-day electricity prices. IEEE Trans Power Syst 18(3):1014–1020
Damousis IG, Alexiadis MC, Theocharis JB, Dokopoulos PS (2004) A fuzzy model for wind speed prediction and power generation in wind parks using spatial correlation. IEEE Trans Energy Convers 19(2):352–361
Deb K, Jain H (2014) An evolutionary many-objective optimization algorithm using reference-point based nondominated sorting approach, part I: solving problems with box constraints. IEEE Trans Evol Comput 18(4):577–601
Diebold FX, Mariano R (1995) Comparing predictive accuracy. J Bus Econ Stat 20(1):134–144
Dorvlo AS, Jervase JA, Al-Lawati A (2002) Solar radiation estimation using artificial neural networks. Appl Energy 71(4):307–319
Egrioglu E, Aladag CH, Günay S (2008) A new model selection strategy in artificial neural networks. Appl Math Comput 195:591–597
Elman JL (1990) Finding structure in time. Cogn Sci 14:179–211
Enrique R, David B, Jorge-Juan B, Ana P (2019) Review of wind energy technology and associated market and economic conditions in Spain. Renew Sustain Energy Rev 101:415–427
Fried L, Qiao L, Sawyer S (2021) Global wind report, global wind energy council. https://gwec.net/members-area-market-intelligence/reports/
Fu T, Zhang S, Wang C (2020) Application and research for electricity price forecasting system based on multi-objective optimization and sub-models selection strategy. Soft Comput 24(20):15611–15637
Gers FA, Schmidhuber J (2000) Recurrent nets that time and count. Ieee-Inns-Enns Int Jt Conf Neural Netw 3:189–194
Global Wind Energy Council. Global wind statistics (2019), p. 2019 www.gwec.net/wpcontent/uploads/vip/GWEC_PRstats2018_EN_WEB.pdf.
Greff K, Srivastava RK, Koutnik J, Steunebrink BR, Schmidhuber J (2017) LSTM: a search space odyssey. IEEE Trans Neural Netw Learn Syst 28:2222–2232
Grigonyte E, Butkeviciute E (2016) Short-term wind speed forecasting using ARIMA model. Energetika 62(1–2):45–55
Guo ZH, Wu J, Lu HY, Wang JZ (2011) A case study on a hybrid wind speed forecasting method using BP neural network. Knowl Based Syst 24:1048–1056
Heng J, Hong Y, Hu J, Wang S (2022) Probabilistic and deterministic wind speed forecasting based on non-parametric approaches and wind characteristics information. Appl Energy 306:118029
Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9:1735–1780
Huang GB, Zhu QY, Siew CK (2006) Extreme learning machine: theory and applications. Neurocomputing 70(1–3):489–501
İnan G, Göktepe AB, Ramyar K et al (2007) Prediction of sulfate expansion of PC mortar using adaptive neuro-fuzzy methodology. Build Environ 42(3):1264–1269
Jang J-SR (1993) ANFIS: adaptive-network-based fuzzy inference system. IEEE Trans Syst Man Cybern 23(3):665–685
Laumanns M, Thiele L, Zitzler E (2006) An efficient, adaptive parameter variation scheme for metaheuristics based on the epsilon-constraint method. Eur J Oper Res 169(3):932–942
Lei M, Shiyan L, Chuanwen J, Hongling L, Yan Z (2009) A review on the forecasting of wind speed and generated power. Renew Sustain Energy Rev 13(4):915–920
Li G, Shi J (2010) On comparing three artificial neural networks for wind speed forecasting. Appl Energy 87(7):2313–2320
Liu M, Ling YY (2003) Using fuzzy neural network approach to estimate contractors’ markup. Build Environ 38(11):1303–1308
Liu Z, Jiang P, Wang J, Zhang L (2022) Ensemble system for short term carbon dioxide emissions forecasting based on multi-objective tangent search algorithm. J Environ Manag 302:113951
Meng K, Yang H, Dong ZY, Guo W, Wen F, Xu Z (2016) Flexible operational planning framework considering multiple wind energy forecasting service providers. IEEE Trans Sustain Energy 7(2):708–717
Neshat M, Adeli A, Sepidnam G (2012) Predication of concrete mix design using adaptive neural fuzzy inference systems and fuzzy inference systems. Int J Adv Manuf Technol 63(1–4):373–390
Niu T, Wang J, Zhang K, Du P (2018) Multi-step-ahead wind speed forecasting based on optimal feature selection and a modified bat algorithm with the cognition strategy. Renew Energy 118:213–229
Riahy G, Abedi M (2008) Short term wind speed forecasting for wind turbine applications using linear prediction method. Renew Energy 33:35–41
Schwenker F, Kestler HA, Palm G (2001) Three learning phases for radial-basis-function networks. Neural Netw 14(4–5):439–458
Sfetsos A (2000) A comparison of various forecasting techniques applied to mean hourly wind speed time series. Renew Energy 21(1):23–35
Shamshad A, Bawadi M, Hussin WW, Majid T, Sanusi S (2005) First and second order Markov chain models for synthetic generation of wind speed time series. Energy 30(5):693–708
Smith DA, Mehta KC (1993) Investigation of stationary and nonstationary wind data using classical Box-Jenkins models. J Wind Eng Indus Aerodyn 49:319–328
Soman SS, Zareipour H, Malik O et al (2010) A review of wind power and wind speed forecasting methods with different time horizons. In: North American Power Symposium (NAPS). IEEE, 2010. 1–8.
Specht DF (1991) A general regression neural network. IEEE Trans Neural Netw 2(6):568–576
Torres JL, García A, Blas MD, DeFrancisco A (2005) Forecast of hourly average wind speed with ARMA models in Navarre (Spain). Sol Energy 79(1):65–77
Vapnik V (1997) The nature of statistical learning theory. Springer, Berlin
Wang X, Sideratos G, Hatziargyriou N et al (2004) Wind speed forecasting for power system operational planning. In: International conference on probabilistic methods applied to power systems. IEEE, pp 470–474
Wang J, Zhang W, Wang J et al (2014) A novel hybrid approach for wind speed prediction. Inf Sci 273:304–318
Wang S, Zhang N, Wu L et al (2016) Wind speed forecasting based on the hybrid ensemble empirical mode decomposition and GA-BP neural network method. Renew Energy 94:629–636
Wang JZ, Yang WD, Du P, Niu T (2018) A novel hybrid forecasting system of wind speed based on a newly developed multi-objective sine cosine algorithm. Energy Convers Manag 163:134–150
Wang C, Zhang S, Xiao L, Fu T (2021) Wind speed forecasting based on multi-objective grey wolf optimisation algorithm, weighted information criterion, and wind energy conversion system: a case study in Eastern China. Energy Convers Manage 243:114402
Wu J, Hsu C, Chen H (2009) An expert system of price forecasting for used cars using adaptive neuro-fuzzy inference. Expert Syst Appl 36(4):7809–7817
Xiao L, Wang J, Dong Y, Wu J (2015) Combined forecasting models for wind energy forecasting: a case study in China. Renew Sustain Energy Rev 44:271–288. https://doi.org/10.1016/j.rser.2014.12.012
Xiao L, Dong Y, Dong Y (2018) An improved combination approach based on Adaboost algorithm for wind speed time series forecasting. Energy Convers Manage 160:273–288
Yan J, Li F, Liu Y, Gu C (2017) Novel cost model for balancing wind power forecasting uncertainty. IEEE Trans Energy Convers 32(1):318–329
Yang Y, Chen Y, Wang Y, Li C, Li L (2016) Modelling a combined method based on ANFIS and neural network improved by DE algorithm: a case study for short-term electricity demand forecasting. Appl Soft Comput. https://doi.org/10.1016/j.asoc.2016.07.053
Yu C, Li Y, Zhang M (2017) An improved wavelet transform using singular spectrum analysis for wind speed forecasting based on elman neural network. Energy Convers Manag 148:895–904
Zhang W, Qu Z, Zhang K, Mao W, Ma Y, Fan X (2017) A combined model based on CEEMDAN and modified flower pollination algorithm for wind speed forecasting. Energy Convers Manag 136:439–451
Zhang S, Wang J, Guo Z (2018) Research on combined model based on multi-objective optimization and application in time series forecast. Soft Comput. https://doi.org/10.1007/s00500-018-03690-w
Zhang S, Wang C, Liao P, Xiao L, Fu T (2022) Wind speed forecasting based on model selection, fuzzy cluster, and multi-objective algorithm and wind energy simulation by Betz's theory. Expert Syst Appl 116509

Acknowledgements

This work was supported by the Western Project of the National Social Science Foundation of China (Grant No. 18XTJ003).

Funding

The work is funded by the Western Project of the National Social Science Foundation of China (Grant No. 18XTJ003).

Author information

Authors and Affiliations

School of Mathematics and Statistics, Longdong University, Qingyang, Gansu, China
Tonglin Fu
State Key Laboratory of Internet of Things for Smart City, Department of Computer and Information Science Organization, University of Macau, Macau, China
Shenghui Zhang

Authors

Tonglin Fu
Shenghui Zhang

Corresponding author

Correspondence to Shenghui Zhang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Only with a good understanding of the features of the data can we select suitable models for the forecasting work. Generally speaking, a linear model fits linear data well, and a nonlinear model fits nonlinear data well.

Real data are rarely purely linear or purely nonlinear; they usually have both kinds of structure. It was therefore necessary to test the linearity and nonlinearity of the data used in this paper, which is the purpose of the experiments reported in Tables 7 and 8.

According to the hypothesis tests in Tables 7 and 8, the wind speed data have both linear and nonlinear characteristics. Including both linear and nonlinear models in the proposed forecasting model is therefore justified and necessary.

Table 7 Testing wind speed data by adjusting to linear functions or nonlinear functions

Table 8 The explanations of the test parameters

1.1 Basic methods and theories

1.1.1 Support vector machines

The support vector machine (SVM) is a machine learning method proposed by Vapnik (1997). For regression, the nonlinear function can be written as follows:

$$ f \left( x \right) = \omega^{T} \phi \left( x \right) + b $$

(8)

where $\phi \left( x \right)$ is the nonlinear mapping that takes the data from the low-dimensional input space to a high-dimensional feature space, and $\omega$ and $b$ are the coefficient vector and threshold, respectively.

The best regression function is found by minimizing the following regularized loss:

$$ \min \frac{1}{2}\omega^{T} \omega + C\mathop \sum \limits_{i = 1}^{n} \left( {\xi_{i} + \hat{\xi }_{i} } \right) $$

(9)

Constraints:

$$ \left\{ \begin{array}{l} y_{i} - \omega^{T} \phi \left( x_{i} \right) - b \le \xi_{i} + \varepsilon \\ \omega^{T} \phi \left( x_{i} \right) + b - y_{i} \le \hat{\xi }_{i} + \varepsilon \\ \xi_{i} \ge 0,\; \hat{\xi }_{i} \ge 0 \end{array} \right. $$

(10)

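where $C$ is a penalty factor; $\xi_{i}$ and $\hat{\xi }_{i}$ are slack variables; $\varepsilon$ in the constraints is the width of the insensitive zone within which deviations are not penalized; $b$ is the offset; and $x_{i}$ and $y_{i}$ are the input and output, respectively.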

After solving the dual problem, the regression function can be obtained as follows:

$$ f\left( x \right) = \mathop \sum \limits_{i = 1}^{n} \left( {\beta_{i} - \alpha_{i} } \right)K\left( {x, x_{i} } \right) + b $$

(11)

where $\alpha_{i}$ and $\beta_{i}$ are the Lagrange multipliers and $K\left( {x, x_{i} } \right)$ is the kernel function of the SVM.
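The regression function in Eq. (11) is a kernel-weighted sum over the training inputs. The following minimal sketch evaluates it for scalar inputs. The Gaussian kernel, the value of `gamma`, and the multiplier differences in `coef` are assumptions chosen for illustration; they are not fitted values from the paper.

```python
import math

def gaussian_kernel(x, xi, gamma=0.5):
    """Assumed kernel: K(x, x_i) = exp(-gamma * (x - x_i)^2) for scalar inputs."""
    return math.exp(-gamma * (x - xi) ** 2)

def svr_predict(x, support_x, coef, b, kernel=gaussian_kernel):
    """Eq. (11): f(x) = sum_i coef_i * K(x, x_i) + b, with coef_i = beta_i - alpha_i."""
    return sum(c * kernel(x, xi) for c, xi in zip(coef, support_x)) + b

def eps_insensitive_loss(y, y_hat, eps):
    """Deviations smaller than eps (the tube in Eq. 10) are not penalized."""
    return max(0.0, abs(y - y_hat) - eps)

# Hypothetical training inputs and multiplier differences (illustration only).
support_x = [3.0, 5.0]
coef = [0.8, -0.3]
forecast = svr_predict(4.0, support_x, coef, b=4.0)
print(forecast, eps_insensitive_loss(4.5, forecast, eps=0.1))
```

Only training points with a non-zero coefficient contribute to the sum, which is why they are called support vectors.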

1.1.2 Long short-term memory network

The long short-term memory (LSTM) network is a recurrent neural network (RNN) that was proposed by Hochreiter and Schmidhuber (1997) and can be written as follows (Greff et al. 2017):

$$ z^{t} = g(\omega_{z} x^{t} + r_{z} y^{t - 1} + b_{z} ) $$

(12)

$$ i^{t} = \sigma (\omega_{i} x^{t} + r_{i} y^{t - 1} + p_{i} c^{t - 1} + b_{i} ) $$

(13)

$$ f ^{t} = \sigma (\omega_{f} x^{t} + r_{f} y^{t - 1} + p_{f} c^{t - 1} + b_{f} ) $$

(14)

$$ c^{t} = z^{t} i^{t} + c^{t - 1} f^{t} $$

(15)

$$ o^{t} = \sigma (\omega_{o} x^{t} + r_{o} y^{t - 1} + p_{o} c^{t} + b_{o} ) $$

(16)

$$ y^{t} = h(c^{t} )o^{t} $$

(17)

$$ \sigma \left( x \right) = \frac{1}{{1 + e^{ - x} }} $$

(18)

$$ g\left( x \right) = h\left( x \right) = \tanh (x) $$

(19)

where t is the time step; N is the number of LSTM cells and M is the number of inputs; $\omega_{z} , \omega_{i} , \omega_{f} , \omega_{o} \in R^{N \times M}$ are the input weights; $r_{z} , r_{i} , r_{f} , r_{o} \in R^{N \times N}$ are the recurrent weights; $p_{i} , p_{f} , p_{o} \in R^{N}$ are the peephole weights (Gers and Schmidhuber 2000); $b_{z} , b_{i} , b_{f} , b_{o} \in R^{N}$ are the biases; $g\left( x \right)$, $h\left( x \right)$, and $\sigma \left( x \right)$ are activation functions; $z^{t}$ is the activation of the input block; $i^{t}$ is the activation of the input gate; $f^{t}$ is the activation of the forget gate; $c^{t}$ is the cell state at time t; $o^{t}$ is the activation of the output gate; and $y^{t}$ is the output of the cell at time t. The products between vectors, including the peephole terms, are element-wise.
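To make the order of the computations in Eqs. (12) to (17) concrete, here is a minimal sketch of one time step for the scalar case N = M = 1. The parameter dictionary `params` and its key names are hypothetical, and the numerical values are arbitrary rather than trained weights.

```python
import math

def sigmoid(x):
    """Eq. (18)."""
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, y_prev, c_prev, p):
    """One scalar LSTM step with peephole connections, following Eqs. (12)-(17)."""
    z = math.tanh(p["wz"] * x + p["rz"] * y_prev + p["bz"])                    # block input
    i = sigmoid(p["wi"] * x + p["ri"] * y_prev + p["pi"] * c_prev + p["bi"])   # input gate
    f = sigmoid(p["wf"] * x + p["rf"] * y_prev + p["pf"] * c_prev + p["bf"])   # forget gate
    c = z * i + c_prev * f                                                     # new cell state
    o = sigmoid(p["wo"] * x + p["ro"] * y_prev + p["po"] * c + p["bo"])        # output gate
    y = math.tanh(c) * o                                                       # cell output
    return y, c

# Arbitrary illustrative parameters: every weight 0.1, every bias 0.0.
keys = ["wz", "rz", "bz", "wi", "ri", "pi", "bi",
        "wf", "rf", "pf", "bf", "wo", "ro", "po", "bo"]
params = {k: (0.0 if k.startswith("b") else 0.1) for k in keys}

y, c = 0.0, 0.0
for speed in [5.2, 5.6, 6.1]:      # a short, made-up input sequence
    y, c = lstm_step(speed, y, c, params)
print(y, c)
```

Note that the output gate uses the updated cell state $c^{t}$, whereas the input and forget gates use $c^{t-1}$, exactly as in Eqs. (13), (14), and (16).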

1.1.3 Autoregressive integrated moving average

The autoregressive integrated moving average (ARIMA) model is one of the most popular forecasting models in the wind speed forecasting field (Contreras et al. 2003) and can be written as follows:

$$ \phi \left( B \right)\left( {1 - B} \right)^{d} X_{t} = \theta (B)\varepsilon_{t} $$

(20)

$$ \phi \left( B \right) = 1 - \phi_{1} B - \phi_{2} B^{2} - \cdots - \phi_{p} B^{p} $$

(21)

$$ \theta \left( B \right) = 1 - \theta_{1} B - \theta_{2} B^{2} - \cdots - \theta_{q} B^{q} $$

(22)

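Together, these formulas define the ARIMA(p, d, q) model, where p is the autoregressive order, d is the degree of differencing, q is the moving-average order, B is the backward shift operator with $B^{k} X_{t} = X_{t - k}$, $X_{t}$ is the time series at time t, and $\varepsilon_{t}$ is the random error at time t.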
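The role of the differencing factor $(1 - B)^{d}$ can be illustrated with a small sketch. It differences a series and then produces a one-step forecast for the special case ARIMA(1, 1, 0). The coefficient `phi1` and the sample series are assumptions for illustration; in practice the coefficients are estimated from data.

```python
def difference(series, d=1):
    """Apply (1 - B)^d by differencing the series d times."""
    out = list(series)
    for _ in range(d):
        out = [b - a for a, b in zip(out, out[1:])]
    return out

def arima_110_forecast(series, phi1):
    """One-step forecast for ARIMA(1, 1, 0): (1 - phi1*B)(1 - B) X_t = eps_t."""
    w = difference(series, d=1)    # w_t = X_t - X_{t-1}
    w_next = phi1 * w[-1]          # AR(1) forecast of the next difference (E[eps] = 0)
    return series[-1] + w_next     # undo the differencing

wind = [5.0, 6.0, 8.0]             # made-up wind speeds
print(difference(wind, d=1), arima_110_forecast(wind, phi1=0.5))
```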

1.1.4 Back-propagation neural network

The back-propagation neural network (BPNN) is a widely used multi-layer feedforward neural network trained by a gradient descent method that minimizes the sum of the squared errors between the actual output value and the expected output value. The output of its activation function lies between 0 and 1, so the network can realize a continuous non-linear mapping from input to output (Guo et al. 2011).

Before training, the data are normalized to the range [-1, 1]:

$$ X^{\prime } = \left\{ {X_{i}^{\prime } } \right\} = 2\frac{{X_{i} - X_{\min } }}{{X_{\max } - X_{\min } }} - 1, \quad i = 1, 2, \ldots , n, \quad X_{i}^{\prime } \in [ - 1, 1] $$

(23)

where $X_{\min }$ and $X_{\max }$ are the minimum and maximum values of the input or output vectors, $X_{i}$ denotes the real value, and $X_{i}^{\prime }$ denotes the normalized value. The network is then computed in three steps.

Step 1. Calculate the outputs of all hidden layer nodes:

$$ y_{j} = f (\mathop \sum \limits_{i} w_{ji} x_{i} + b_{j} ) = f({\text{net}}_{j} ) (i = 1, \ldots , n; j = 1, \ldots , 2n) $$

(24)

$$ {\text{net}}_{j} = \mathop \sum \limits_{i} w_{ji} x_{i} + b_{j} , ( j = 1, \ldots , 2n) $$

(25)

where ${\text{net}}_{j}$ is the activation value of node j, $w_{ji}$ represents the connection weight from input node i to hidden node j, $b_{j}$ represents the bias of neuron j, $y_{j}$ represents the output of hidden layer node j, and f is the activation function of a node, which is usually a sigmoid function.

Step 2. Calculate the output data of the neural network:

$$ O_{1} = f_{0} \left( {\mathop \sum \limits_{j} w_{0j} y_{j} + b_{0} } \right), (j = 1, \ldots , 2n) $$

(26)

where $w_{0j}$ represents the connection weight from hidden node j to the output node, $b_{0}$ represents the bias of the output neuron, $O_{1}$ represents the output data of the network, and $f_{0}$ is the activation function of the output layer node.

Step 3. Minimize the global error via the training algorithm:

$$ {\text{Mean}}\,{\text{Square}}\,{\text{Error}} = \frac{1}{m}\sum (O_{1} - Z)^{2} $$

(27)

where Z represents the real data vector of the output, and m represents the number of outputs.
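The normalization in Eq. (23) and the forward computation in Steps 1 to 3 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the output activation $f_{0}$ is assumed to be the identity, the weights are arbitrary, and the training loop that would adjust them by gradient descent is omitted.

```python
import math

def normalise(values):
    """Eq. (23): map values linearly onto [-1, 1] (assumes max > min)."""
    lo, hi = min(values), max(values)
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Eqs. (24)-(26): sigmoid hidden layer, then a linear output node (assumed f_0)."""
    y = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(w_hidden, b_hidden)]
    return sum(w * yj for w, yj in zip(w_out, y)) + b_out

def mse(outputs, targets):
    """Eq. (27): mean square error over m outputs."""
    return sum((o - z) ** 2 for o, z in zip(outputs, targets)) / len(targets)

x = normalise([4.0, 5.0, 7.0])                        # made-up wind speeds
w_hidden = [[0.2, -0.1, 0.4], [0.3, 0.3, -0.2]]       # arbitrary weights, 2 hidden nodes
out = forward(x, w_hidden, [0.0, 0.1], [0.5, -0.5], 0.05)
print(out, mse([out], [0.2]))
```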

1.1.5 Generalized regression neural network

Generalized regression neural networks (GRNN) were proposed by Specht (1991); their theoretical basis is nonlinear kernel regression analysis. The network consists of four layers of neurons: the input layer, pattern layer, summation layer, and output layer.

Definition 1

Let the joint probability density function of the random variables x and y be f (x, y), and let the observed value of x be X. The estimate of y given X is then as follows:

$$ \hat{Y} = E\left[ {\left. y \right|X} \right] = \mathop \int \limits_{ - \infty }^{ + \infty } y f (X, y){\text{d}}y/\mathop \int \limits_{ - \infty }^{ + \infty } f (X, y){\text{d}}y $$

(28)

Definition 2

The probability density function f (x, y) is unknown, but it can be obtained by nonparametric estimation from the sample observation values of x and y:

$$ \hat{f} (X, Y) = \frac{1}{{n\delta^{p + 1} (2\pi )^{(p + 1)/2} }}\mathop \sum \limits_{t = 1}^{n} \exp \left[ { - \frac{{(X - X_{t} )^{T} (X - X_{t} )}}{{2\delta^{2} }}} \right]\exp \left[ { - \frac{{(Y - Y_{t} )^{2} }}{{2\delta^{2} }}} \right] $$

(29)

where $X_{t}$ and $Y_{t}$ are the sample observation values of x and y, respectively; $\delta$ is the smoothing parameter; n is the number of samples; and p is the dimensionality of the random vector x. Here, we can calculate $\hat{Y}$ using $\hat{f} (X, y)$ instead of $f (X, y)$ in formula (28). Finally,

$$ \hat{Y}(X) = \mathop \sum \limits_{t = 1}^{n} Y_{t} \exp \left[ { - \frac{{(X - X_{t} )^{T} (X - X_{t} )}}{{2\delta^{2} }}} \right]/\mathop \sum \limits_{t = 1}^{n} \exp \left[ { - \frac{{(X - X_{t} )^{T} (X - X_{t} )}}{{2\delta^{2} }}} \right] $$

(30)

where $\hat{Y}(X)$ is the weighted average of all sample observations $Y_{t}$, and the weight of each $Y_{t}$ is the exponential of the negative squared Euclidean distance between the corresponding sample $X_{t}$ and X, scaled by $2\delta^{2}$.
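Because Eq. (30) has no trainable weights, the whole predictor fits in a few lines. The sketch below is a direct transcription of that formula; the training samples and the smoothing parameter `delta` are made-up values for illustration.

```python
import math

def grnn_predict(x, train_x, train_y, delta):
    """Eq. (30): kernel-weighted average of the training targets Y_t."""
    weights = [
        math.exp(-sum((a - b) ** 2 for a, b in zip(x, xt)) / (2.0 * delta ** 2))
        for xt in train_x
    ]
    return sum(w * yt for w, yt in zip(weights, train_y)) / sum(weights)

# Each input is a vector of two lagged wind speeds (hypothetical data).
train_x = [[5.0, 5.5], [6.0, 6.2], [7.0, 6.8]]
train_y = [5.8, 6.5, 6.9]
print(grnn_predict([6.1, 6.0], train_x, train_y, delta=0.5))
```

A small `delta` makes the prediction follow the nearest training sample, while a large `delta` pulls it towards the mean of all targets, so the smoothing parameter is the only quantity that needs tuning.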

1.1.6 Radial basis function neural network

The radial basis function neural network (RBFNN) (Schwenker et al. 2001) is an efficient feedforward neural network that exhibits better approximation performance and global optimization ability than other feedforward networks. Its structure is simple and its training is fast. An RBFNN has three layers: an input layer, a hidden layer with a nonlinear activation function (the radial basis function), and an output layer.

Definition 1

The input is a real-valued vector $x \in {\mathbb{R}}^{n}$. The output of the neural network is then a scalar function of the input vector, $\varphi :{\mathbb{R}}^{n} \to {\mathbb{R}}$, given by

$$ \varphi (x) = \mathop \sum \limits_{i = 1}^{N} a_{i} \rho (\left\| {x - c_{i} } \right\|), $$

(31)

where N represents the total number of neurons in the hidden layer, $c_{i}$ is the centre vector of the ith neuron, $a_{i}$ is the weight of neuron i in the linear output neuron, and $\rho$ is the radial basis function.

Definition 2

A radial basis function is a radially symmetric scalar function, defined as a monotone function of the Euclidean distance between any point x in space and a centre vector $c_{i}$. The most commonly used radial basis function is the Gaussian kernel function, given as follows:

$$ \rho \left( {\left\| {x - c_{i} } \right\|} \right) = \exp \left\{ { - \frac{{\left\| {x - c_{i} } \right\|^{2} }}{{2\sigma^{2} }}} \right\} $$

(32)

where $c_{i}$ is a centre vector and $\sigma$ is a width parameter that controls the radial scope of the function.
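A minimal sketch of the network output in Eq. (31) follows. It assumes the usual decaying Gaussian basis function, exp(-r^2 / (2 sigma^2)), where r is the distance to the centre; the centres, output weights, and width are hypothetical values rather than trained parameters.

```python
import math

def gaussian_rbf(r, sigma):
    """Assumed Gaussian basis: decays with the distance r from the centre."""
    return math.exp(-r ** 2 / (2.0 * sigma ** 2))

def rbf_output(x, centres, weights, sigma):
    """Eq. (31): phi(x) = sum_i a_i * rho(||x - c_i||)."""
    total = 0.0
    for c, a in zip(centres, weights):
        r = math.sqrt(sum((xi - ci) ** 2 for xi, ci in zip(x, c)))
        total += a * gaussian_rbf(r, sigma)
    return total

centres = [[5.0, 5.0], [7.0, 7.0]]   # hypothetical centre vectors c_i
weights = [5.5, 7.2]                 # hypothetical output weights a_i
print(rbf_output([5.2, 5.1], centres, weights, sigma=1.0))
```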

1.1.7 Adaptive network-based fuzzy inference system

Jang (1993) combined the best features of fuzzy systems and neural networks to construct the adaptive network-based fuzzy inference system (ANFIS). ANFIS integrates the human inference style of a fuzzy inference system (FIS) by using input–output sets and a set of if–then fuzzy rules. An FIS (Neshat et al. 2012) has structured knowledge, in which each fuzzy rule describes the current behaviour of the system; however, it lacks adaptability to changes in the external environment. Therefore, neural network learning and the FIS are combined in ANFIS (İnan et al. 2007). ANFIS uses neural network learning and fuzzy inference to simulate complex nonlinear mappings, and it can deal with uncertain, noisy, and imprecise environments (Liu and Ling 2003). ANFIS uses the training process of the neural network to adjust the membership functions and the related parameters so that the output approaches the expected data set (Wu et al. 2009).

Layer 1: This layer is the input layer, which is responsible for fuzzifying the input signal. Each node i is an adaptive node (drawn as a square) with the node function:

$$ O_{i}^{1} = \mu_{{A_{i} }} \left( x \right), i = 1,2\quad {\text{or}}\quad O_{i}^{1} = \mu_{{B_{i} }} \left( y \right), i = 1,2 $$

(33)

where x (or y) is the input of node i, $A_{i}$ and $B_{i}$ are fuzzy sets, and $O_{i}^{1}$ is the membership function value of $A_{i}$ or $B_{i}$, indicating the degree to which x and y belong to $A_{i}$ and $B_{i}$. Usually, $\mu_{{A_{i} }}$ and $\mu_{{B_{i} }}$ are chosen as bell-shaped functions or Gaussian functions. The parameters of the membership functions are called premise parameters.

Layer 2: The nodes in this layer are responsible for multiplying the input signals and calculating the firing strength of each rule. The output is

$$ O_{i}^{2} = \omega_{i} = \mu_{{A_{i} }} \left( x \right) \times \mu_{{B_{i} }} \left( y \right), i = 1,2 $$

(34)

where the output of each node represents the credibility of the rule.

Layer 3: This layer normalizes the firing strengths; each node is labelled N. The ith node calculates the ratio of the ith rule's firing strength to the sum of all rules' firing strengths:

$$ O_{i}^{3} = \overline{\omega }_{i} = \omega_{i} /\left( {\omega_{1} + \omega_{2} } \right) i = 1,2. $$

(35)

Layer 4: This layer calculates the output of each fuzzy rule:

$$ O_{i}^{4} = \overline{\omega }_{i} f_{i} = \overline{\omega }_{i} (p_{i} x + q_{i} y + r_{i} ), \quad i = 1,2 $$

(36)

where $\overline{\omega }_{i}$ is the output of layer 3 and {p_i, q_i, r_i} is the parameter set (consequent parameters).

Layer 5: The single node of this layer is a fixed node that calculates the overall output as the sum of all incoming signals:

$$ O^{5} = \mathop \sum \limits_{i} \overline{\omega }_{i} f_{i} . $$

(37)
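The five layers can be traced in a short forward pass for the two-rule case. This is a sketch under stated assumptions: the membership functions are taken to be Gaussian, each rule output $f_{i} = p_{i} x + q_{i} y + r_{i}$ is weighted by its normalized firing strength in layer 4, and all premise and consequent parameters are made up. The hybrid learning procedure that tunes these parameters is not shown.

```python
import math

def gauss_mf(v, centre, width):
    """Gaussian membership function; (centre, width) are premise parameters."""
    return math.exp(-((v - centre) ** 2) / (2.0 * width ** 2))

def anfis_forward(x, y, premise_a, premise_b, consequent):
    """Two-rule first-order ANFIS forward pass, layers 1 to 5 (Eqs. 33-37)."""
    mu_a = [gauss_mf(x, c, s) for c, s in premise_a]        # layer 1: fuzzify x
    mu_b = [gauss_mf(y, c, s) for c, s in premise_b]        # layer 1: fuzzify y
    w = [ma * mb for ma, mb in zip(mu_a, mu_b)]             # layer 2: firing strengths
    w_bar = [wi / sum(w) for wi in w]                       # layer 3: normalization
    f = [p * x + q * y + r for p, q, r in consequent]       # rule outputs f_i
    return sum(wb * fi for wb, fi in zip(w_bar, f))         # layers 4 and 5

premise_a = [(4.0, 1.5), (8.0, 1.5)]             # hypothetical (centre, width) for A_1, A_2
premise_b = [(4.0, 1.5), (8.0, 1.5)]             # hypothetical (centre, width) for B_1, B_2
consequent = [(0.5, 0.4, 0.3), (0.6, 0.5, -0.8)] # hypothetical (p_i, q_i, r_i)
print(anfis_forward(5.0, 5.5, premise_a, premise_b, consequent))
```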

1.1.8 Extreme learning machine

Extreme learning machine (ELM) is a machine learning algorithm proposed by Huang et al. (2006) for single-hidden-layer feedforward neural networks (SLFNNs). The main feature of ELM is that the parameters of the hidden layer nodes can be randomly generated without adjustment, so the learning process only needs to calculate the output weights.

Definition 3

An SLFNN consists of three parts: an input layer, a hidden layer, and an output layer. The output function of the network is given as follows:

$$ f_{L} (x) = \mathop \sum \limits_{i = 1}^{L} \beta_{i} h_{i} (x) = h(x) \beta , $$

(38)

where x is the input vector, L is the number of hidden nodes, $\beta_{i}$ is the output weight of the ith hidden node, $\beta = [\beta_{1} , \ldots , \beta_{L} ]^{T}$, and $h(x) = [h_{1} (x), \ldots , h_{L} (x)]$ is the hidden layer output mapping. The output of the ith hidden node is defined by the activation function G as follows:

$$ h_{i} (x) = G(a_{i} , b_{i} , x). $$

(39)

Here, $a_{i}$ is the input weight and $b_{i}$ is the parameter of the feature mapping (also called the node parameter). In the calculation, these hidden node parameters are randomly initialized and are not adjusted. Hence, the feature mapping of the ELM is also random.
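The idea that only the output weights are learned can be sketched for a network with two hidden nodes and scalar input. The assumptions here are a sigmoid activation G, hidden parameters drawn uniformly from [-1, 1], and output weights obtained by least squares through the normal equations, which coincides with the pseudo-inverse solution when the hidden output matrix has full column rank. The function names and data are hypothetical.

```python
import math
import random

def hidden(x, a, b):
    """h_i(x) = G(a_i, b_i, x), here a sigmoid of a_i * x + b_i (assumed G)."""
    return [1.0 / (1.0 + math.exp(-(ai * x + bi))) for ai, bi in zip(a, b)]

def train_elm(xs, ys, seed=0):
    """Fit a 2-node ELM: random (a, b) stay fixed, beta comes from least squares."""
    rng = random.Random(seed)
    a = [rng.uniform(-1.0, 1.0) for _ in range(2)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(2)]
    H = [hidden(x, a, b) for x in xs]
    # Normal equations (H^T H) beta = H^T y, solved in closed form for the 2 x 2 case.
    s11 = sum(h[0] * h[0] for h in H)
    s12 = sum(h[0] * h[1] for h in H)
    s22 = sum(h[1] * h[1] for h in H)
    t1 = sum(h[0] * y for h, y in zip(H, ys))
    t2 = sum(h[1] * y for h, y in zip(H, ys))
    det = s11 * s22 - s12 * s12
    beta = [(s22 * t1 - s12 * t2) / det, (s11 * t2 - s12 * t1) / det]
    return a, b, beta

def predict(x, a, b, beta):
    """Eq. (38): f_L(x) = h(x) beta."""
    return sum(bi * hi for bi, hi in zip(beta, hidden(x, a, b)))

xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [0.1, 0.3, 0.5, 0.8, 0.9]       # made-up targets
a, b, beta = train_elm(xs, ys)
print(predict(0.5, a, b, beta))
```

No iterative weight updates are involved, which is the reason ELM training is fast compared with back-propagation.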

1.1.9 Elman neural network

The Elman neural network is a recurrent neural network proposed by Elman (1990). In addition to the input layer, the hidden layer, and the output layer, it has a special context unit. The context unit memorizes the previous output value of the hidden layer unit and can be regarded as a delay operator. The connection weights of the feedforward part can be corrected during learning, whilst the recurrent part is fixed, that is, it is not corrected by learning. The mathematical model of the Elman network can be written as follows (Elman 1990):

$$ x\left( k \right) = f \left( W^{ I1} x_{c} \left( k \right) + W^{ I2} u\left( {k - 1} \right) \right) $$

(40)

$$ x_{c} \left( k \right) = \alpha x_{c} \left( {k - 1} \right) + x (k - 1) $$

(41)

$$ y\left( k \right) = W^{ I3} x(k) $$

(42)

where f is a sigmoid function and $0 \le \alpha < 1$ is the self-connected feedback gain. If $\alpha = 0$, the network is a standard Elman neural network; if $\alpha \ne 0$, it is a modified Elman neural network. u is the r-dimensional input vector, x is the n-dimensional hidden layer output, $x_{c}$ is the n-dimensional context unit output, y is the m-dimensional network output, and $W^{I1}$, $W^{I2}$, and $W^{I3}$ are connection weight matrices of dimensions $n \times n$, $n \times r$, and $m \times n$, respectively.
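The recurrence in Eqs. (40) to (42) is easiest to follow in the scalar case, where every layer has a single unit. The sketch below performs the forward computation only; the weights `w1`, `w2`, `w3` and the input sequence are arbitrary illustrative values, and the gradient-based weight updates are not included.

```python
import math

def elman_step(u_prev, x_prev, xc_prev, w1, w2, w3, alpha=0.0):
    """One scalar Elman step following Eqs. (40)-(42)."""
    xc = alpha * xc_prev + x_prev                          # Eq. (41): context unit
    x = 1.0 / (1.0 + math.exp(-(w1 * xc + w2 * u_prev)))   # Eq. (40): hidden unit
    y = w3 * x                                             # Eq. (42): output
    return y, x, xc

x, xc = 0.0, 0.0
outputs = []
for u in [0.2, 0.5, 0.4]:          # made-up, already normalized inputs u(k - 1)
    y, x, xc = elman_step(u, x, xc, w1=0.8, w2=1.2, w3=2.0, alpha=0.3)
    outputs.append(y)
print(outputs)
```

With `alpha=0.0` the context unit simply stores the previous hidden output, which corresponds to the standard Elman network described above.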

Let the target output of the system at step k be $y_{d} (k)$, and define the error function as $E\left( k \right) = \frac{1}{2}(y_{d} (k) - y(k))^{T} (y_{d} (k) - y(k))$. Taking the derivative of E with respect to the connection weights $W^{I1}$, $W^{I2}$, and $W^{I3}$, the learning algorithm of an Elman network can be obtained using the gradient descent method:

$$ \Delta w_{ij}^{I3} = \eta_{3} \delta_{i}^{0} x_{j} \left( k \right), i = 1, 2, \ldots , m; j = 1, 2, \ldots , n $$

(43)

$$ \Delta w_{jq}^{I2} = \eta_{2} \delta_{j}^{h} u_{q} \left( {k - 1} \right), j = 1, 2, \ldots , n; q = 1, 2, \ldots , r $$

(44)

$$ \Delta w_{jl}^{I1} = \eta_{1} \mathop \sum \limits_{i = 1}^{m} (\delta_{i}^{0} w_{ij}^{I3} )\frac{{\partial x_{j} (k)}}{{\partial w_{jl}^{I1} }}, j = 1, 2, \ldots , n; l = 1, 2, \ldots , n $$

(45)

where r is the number of nodes in the input layer, n is the number of nodes in the hidden layer and in the context layer, and m is the number of nodes in the output layer. $\eta_{1}$, $\eta_{2}$, and $\eta_{3}$ are the learning steps of $W^{I1}$, $W^{I2}$, and $W^{I3}$, respectively.

$$ \delta_{i}^{0} = (y_{d,i} \left( k \right) - y_{i} (k)) $$

(46)

$$ \delta_{j}^{h} = \mathop \sum \limits_{i = 1}^{m} (\delta_{i}^{0} w_{ij}^{I3} ) f_{j}^{\prime } ( \cdot ) $$

(47)

$$ \frac{{\partial x_{j} (k)}}{{\partial w_{jl}^{I1} }} = f_{j}^{{\prime }} ( \cdot )x_{l} \left( {k - 1} \right) + \alpha \frac{{\partial x_{j} (k - 1)}}{{\partial w_{jl}^{I1} }} $$

(48)

1.1.10 Weighted information criterion (WIC)

The weighted information criterion (WIC) was initially proposed to find the best ANN model (Egrioglu et al. 2008). In this paper, WIC was applied to four real time series datasets in order to measure the forecasting performance of the examined sub-model architectures and to decide which sub-models enter the combined model. The number of input nodes depends on the number of sub-models of the combined model. The mean absolute percentage error (MAPE), root mean square error (RMSE), Akaike information criterion (AIC), Bayesian information criterion (BIC), and direction accuracy (DA) were chosen as the model selection criteria. The MAPE and RMSE measure deviations between the actual values and the forecast values. The AIC and BIC penalize large models. The DA measures the accuracy of the forecast direction. Although the MAPE or the RMSE can also be used individually to select a model, a selection based on a single criterion tends to be insufficiently comprehensive. These criteria are calculated as follows:

$$ {\text{MAPE}} = \frac{1}{T}\mathop \sum \limits_{i = 1}^{T} \left| {\frac{{y_{i} - \hat{y}_{i} }}{{y_{i} }}} \right| $$

(49)

$$ {\text{RMSE}} = \sqrt {\frac{1}{T}\mathop \sum \limits_{i = 1}^{T} \left( {y_{i} - \hat{y}_{i} } \right)^{2} } $$

(50)

$$ {\text{AIC}} = \log \left( {\frac{{\mathop \sum \nolimits_{i = 1}^{T} \left( {y_{i} - \hat{y}_{i} } \right)^{2} }}{T}} \right) + \frac{2m}{T} $$

(51)

$$ {\text{BIC}} = \log \left( {\frac{{\mathop \sum \nolimits_{i = 1}^{T} \left( {y_{i} - \hat{y}_{i} } \right)^{2} }}{T}} \right) + \frac{m\log (T)}{T} $$

(52)

$$ {\text{DA}} = \frac{1}{T}\mathop \sum \limits_{i = 1}^{T} a_{i} , a_{i} = \left\{ {\begin{array}{*{20}l} 1 \hfill & {if\quad \left( {y_{i + 1} - \hat{y}_{i} } \right)\left( {\hat{y}_{i + 1} - y_{i} } \right) > 0} \hfill \\ 0 \hfill & {\text{otherwise }} \hfill \\ \end{array} } \right. $$

(53)

where y is the actual value, $\hat{y}$ is the forecasted value, T is the total number of data items, and m is the number of ANN weights.
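The error and information criteria in Eqs. (49)–(52) can be sketched in a few lines of Python. This is an illustrative sketch rather than the implementation used in the study: the function names are hypothetical, and the natural logarithm is assumed for the log terms. DA is left out because Eq. (53) indexes $y_{i+1}$ up to i = T, and the handling of that last term is not stated.

```python
import math


def mape(y, yhat):
    # Eq. (49); undefined if any actual value is zero
    return sum(abs((a - f) / a) for a, f in zip(y, yhat)) / len(y)


def rmse(y, yhat):
    # Eq. (50)
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(y, yhat)) / len(y))


def _mean_squared_error(y, yhat):
    return sum((a - f) ** 2 for a, f in zip(y, yhat)) / len(y)


def aic(y, yhat, m):
    # Eq. (51); m is the number of ANN weights, natural log assumed
    return math.log(_mean_squared_error(y, yhat)) + 2 * m / len(y)


def bic(y, yhat, m):
    # Eq. (52); the penalty grows with log(T) instead of the constant 2
    T = len(y)
    return math.log(_mean_squared_error(y, yhat)) + m * math.log(T) / T
```

Both information criteria share the same goodness-of-fit term and differ only in the penalty, so for T > e² ≈ 7.4 the BIC penalizes each additional weight more heavily than the AIC.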

In this paper, we also used the modified direction accuracy (MDA) criterion, which is derived from the direction accuracy criterion. The MDA is calculated as follows:

$$ A_{j} = \left\{ {\begin{array}{*{20}l} 1 \hfill & {y_{j + 1} - y_{j} \le 0} \hfill \\ 0 \hfill & {y_{j + 1} - y_{j} > 0} \hfill \\ \end{array} } \right. $$

(54)

$$ F_{j} = \left\{ {\begin{array}{*{20}l} 1 \hfill & {\hat{y}_{j + 1} - \hat{y}_{j} \le 0} \hfill \\ 0 \hfill & {\hat{y}_{j + 1} - \hat{y}_{j} > 0} \hfill \\ \end{array} } \right. $$

(55)

$$ D_{j} = (A_{j} - F_{j} )^{2} $$

(56)

$$ {\text{MDA}} = \frac{{\mathop \sum \nolimits_{j = 1}^{T - 1} D_{j} }}{T - 1}. $$

(57)
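Eqs. (54)–(57) translate directly into code. In the sketch below (the function name `mda` is an assumption), $A_{j}$ and $F_{j}$ flag a non-increasing step in the actual and forecast series, respectively, so $D_{j}$ equals 1 exactly when the two directions disagree. The MDA is therefore the share of direction mismatches, and smaller values indicate better forecasts.

```python
def mda(y, yhat):
    """Modified direction accuracy, Eqs. (54)-(57)."""
    T = len(y)

    def non_increasing(series, j):
        # Indicator used for both A_j (actual) and F_j (forecast)
        return 1 if series[j + 1] - series[j] <= 0 else 0

    # Eq. (56): D_j = (A_j - F_j)^2 is 1 on a mismatch and 0 otherwise
    d = [(non_increasing(y, j) - non_increasing(yhat, j)) ** 2
         for j in range(T - 1)]
    # Eq. (57): average over the T - 1 consecutive pairs
    return sum(d) / (T - 1)
```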

The model selection strategy based on WIC proceeds as follows:

(1) All candidate structures of the sub-models that make up the combined model are determined. For instance, we have five input layer nodes, one output layer node, and 15 hidden layer nodes. Thus, the total number of possible structures is 18.

(2) For each structure, the best network weights are determined using the training data, and the AIC, BIC, RMSE, MAPE, DA, and MDA are calculated with the training data.

(3) The five criteria must be standardized across the candidate neural network structures.

(4) Then, the WIC is calculated as follows:

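$$ {\text{WIC}} = 0.2 \times \left( {{\text{MAPE}} + {\text{RMSE}}} \right) + 0.1 \times \left( {{\text{AIC}} + {\text{BIC}}} \right) + 0.2 \times \left( {{\text{MDA}} + \left( {1 - {\text{DA}}} \right)} \right) $$

(5) The architecture with the minimum WIC is chosen.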
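Steps (4) and (5) can be sketched as follows. The sketch assumes the criteria have already been computed and standardized as in steps (2) and (3); the function names, the dictionary layout, and the numeric values are hypothetical and serve only to illustrate the weighting.

```python
def wic(c):
    """Weighted information criterion from step (4).

    c maps criterion names to already standardized values.
    The six weights (0.2, 0.2, 0.1, 0.1, 0.2, 0.2) sum to 1.
    """
    return (0.2 * (c["MAPE"] + c["RMSE"])
            + 0.1 * (c["AIC"] + c["BIC"])
            + 0.2 * (c["MDA"] + (1 - c["DA"])))


def select_min_wic(candidates):
    # Step (5): return the name of the structure with the smallest WIC
    return min(candidates, key=lambda name: wic(candidates[name]))


# Hypothetical standardized criteria for two candidate structures
candidates = {
    "structure_A": {"MAPE": 0.1, "RMSE": 0.2, "AIC": 0.3,
                    "BIC": 0.4, "MDA": 0.2, "DA": 0.8},
    "structure_B": {"MAPE": 0.3, "RMSE": 0.3, "AIC": 0.1,
                    "BIC": 0.1, "MDA": 0.4, "DA": 0.6},
}
best = select_min_wic(candidates)
```

DA enters as 1 − DA because a higher direction accuracy is better, whereas every other term is an error or penalty for which lower is better.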

1.1.11 The theory of the combined model

Combination forecasting theory (Bates and Granger 2001) indicates that if M forecasting models can each solve a given forecasting problem, then, with appropriately selected weight coefficients, the weighted results of the M models can be added to obtain a new model, called the combined model, which is expected to outperform each of the M individual models. Let $y_{t}$ denote the actual time series value at time t, L the number of sample points, $\hat{y}_{it}$ the forecast value obtained by the ith forecasting model, $e_{it}$ the corresponding forecast error, and $\omega_{i}$ the weight coefficient of the ith forecasting model. The general combined forecasting model can then be expressed as follows:

$$ y_{t} = \mathop \sum \limits_{i = 1}^{M} \omega_{i} \left( {\hat{y}_{it} + e_{it} } \right),\quad t = 1, 2, \ldots , L, $$

(58)

$$ \hat{y}_{t} = \mathop \sum \limits_{i = 1}^{M} \hat{\omega }_{i} \hat{y}_{it} ,\quad t = 1, 2, \ldots , L, $$

(59)

where $\hat{\omega }_{i}$ is the estimated value of $\omega_{i}$ and represents the weight of each single model, and $\hat{y}_{t}$ is the forecast value of the combined model. Determining the weight coefficient of each model is the key step in establishing a combined forecasting model. The optimal combined model is obtained by solving the following optimization problem:

$$ {\text{Min}}\mathop \sum \limits_{t = 1}^{L} \left| {y_{t} - \hat{y}_{t} } \right|, \quad {\text{s.t}}{. }\mathop \sum \limits_{i = 1}^{M} \omega_{i} = 1, 0 \le \omega_{i} \le 1, i = 1, 2, \ldots ,M. $$

(60)

The optimization process stops when the predefined absolute error or the maximum number of iterations is reached.
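To make the constraints in Eq. (60) concrete, the sketch below searches a coarse grid of weight vectors that are non-negative and sum to one, and keeps the vector with the smallest total absolute error. Exhaustive grid search is used here only because it is easy to follow; it is not the optimizer used in this study, and the function names and the `steps` parameter are assumptions.

```python
from itertools import product


def abs_error(weights, y, forecasts):
    # Objective of Eq. (60): sum_t |y_t - sum_i w_i * yhat_it|
    total = 0.0
    for t in range(len(y)):
        combined = sum(w * f[t] for w, f in zip(weights, forecasts))
        total += abs(y[t] - combined)
    return total


def grid_search_weights(y, forecasts, steps=10):
    """Brute-force search over weights on a grid with spacing 1/steps.

    forecasts[i][t] is the forecast of sub-model i at time t.
    Every candidate satisfies 0 <= w_i <= 1 and sum_i w_i = 1.
    """
    M = len(forecasts)
    best_w, best_err = None, float("inf")
    for combo in product(range(steps + 1), repeat=M - 1):
        if sum(combo) > steps:
            continue  # the last weight would be negative
        w = [c / steps for c in combo] + [(steps - sum(combo)) / steps]
        err = abs_error(w, y, forecasts)
        if err < best_err:
            best_w, best_err = w, err
    return best_w, best_err
```

For example, if one sub-model always forecasts one unit too low and another one unit too high, the search returns equal weights, because the two errors cancel in the combination. The number of grid points grows quickly with M, so this approach is only practical for a handful of sub-models.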

1.1.12 Multi-objective optimization theory

Optimization problems whose objective function comprises multiple measurement indexes over a domain of definition can be solved by multi-objective optimization. By assigning weights to each measurement index, the objective function balances the shortcomings of the forecasting model in several respects, which improves both forecasting accuracy and stability. Generally, multi-objective optimization problems (MOPs) can be divided into two categories: constrained problems and unconstrained problems. A constrained problem with J inequality constraints and K equality constraints can be expressed as (Laumanns et al. 2006):

$$ \begin{aligned} & {\text{Minimize}}\quad F\left( x \right) = \left( { f_{1} \left( x \right), f_{2} \left( x \right), \ldots , f_{M} \left( x \right) } \right)^{T} \\ & \quad {\text{s.t}}{.}\quad g_{j} \left( x \right) \ge 0, j = 1, 2, \ldots , J,\quad x \in \Omega \\ & \quad \quad h_{k} \left( x \right) = 0, k = 1,2, \ldots , K, \\ \end{aligned} $$

(61)

where M is the number of objectives, $x = (x_{1} , x_{2} , \ldots , x_{n} )^{T}$ is the decision vector, and n is the number of decision variables. In (61), $\Omega = \mathop \prod \nolimits_{i = 1}^{n} \left[ {x_{i}^{L} , x_{i}^{U} } \right] \subseteq R^{n}$ is called the decision space, where $x_{i}^{L}$ and $x_{i}^{U}$ are the lower and upper limits of the decision variables, respectively.

When the inequality and equality constraints in (61) are omitted, an unconstrained multi-objective problem is obtained, which is expressed as follows:

$$ \begin{aligned} & {\text{Minimize}}\quad F\left( x \right) = \left( { f_{1} \left( x \right), f_{2} \left( x \right), \ldots , f_{M} \left( x \right) } \right)^{T} \\ & \quad {\text{s.t}}{.}\quad x \in \Omega . \\ \end{aligned} $$

(62)

This method has been widely used in many fields, such as carbon dioxide emissions forecasting (Liu et al. 2022), electricity price forecasting (Fu et al. 2020), and electricity demand forecasting (Zhang et al. 2018).
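The structure of problems (61) and (62) can be illustrated with a small sketch that checks whether a decision vector is feasible and evaluates the objective vector F(x). The function names, the numerical tolerance `tol` for the equality constraints, and the toy objectives are all assumptions made for the example.

```python
def in_decision_space(x, lower, upper):
    # x must lie in Omega, the product of the intervals [x_i^L, x_i^U]
    return all(lo <= xi <= hi for xi, lo, hi in zip(x, lower, upper))


def is_feasible(x, lower, upper, ineq=(), eq=(), tol=1e-9):
    """Constraints of (61): g_j(x) >= 0, h_k(x) = 0, and x in Omega.

    With ineq and eq left empty this reduces to the constraint of (62).
    """
    return (in_decision_space(x, lower, upper)
            and all(g(x) >= 0 for g in ineq)
            and all(abs(h(x)) <= tol for h in eq))


def evaluate(x, objectives):
    # F(x) = (f_1(x), ..., f_M(x)); all objectives are to be minimized
    return tuple(f(x) for f in objectives)


# Toy example: two weights in [0, 1] that must sum to one
def weights_sum_to_one(x):
    return x[0] + x[1] - 1.0


x = [0.25, 0.75]
feasible = is_feasible(x, [0.0, 0.0], [1.0, 1.0], eq=[weights_sum_to_one])
values = evaluate(x, (lambda v: v[0] ** 2, lambda v: (v[1] - 1.0) ** 2))
```

Because F(x) is a vector, two feasible points generally cannot be ranked by a single number, which is what distinguishes (61) and (62) from the single-objective problem in (60).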

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Fu, T., Zhang, S. Wind speed forecast based on combined theory, multi-objective optimisation, and sub-model selection. Soft Comput 26, 13615–13638 (2022). https://doi.org/10.1007/s00500-022-07334-y

Accepted: 22 June 2022
Published: 23 August 2022
Issue Date: December 2022
DOI: https://doi.org/10.1007/s00500-022-07334-y
