Wind power forecasting in distribution networks using non-parametric models and regression trees

Nikolaidis, Pavlos

doi:10.1007/s43937-022-00011-z


Research
Open access
Published: 14 October 2022

Volume 2, article number 6, (2022)

Discover Energy


Pavlos Nikolaidis¹


Abstract

Renewable resources provide viable and advantageous solutions up to a certain integration share. At higher penetration levels, they violate the conventional generation constraints, leading to decentralized uncertainty with respect to bi-directional power flows. This generates an increasing need for smart tools able to predict their output with high accuracy, based on easily accessible input data. Based on actual data for load demand and wind power generation, this work presents a realization of decision trees that target a continuous response, also known as regression trees. Utilizing the wind speed and direction, the ambient temperature, relative humidity, renewable capacity and renewable energy source curtailment as predictors in distribution networks of different regions, the proposed configuration is able to predict the generated power with high accuracy. According to the results obtained under distinct scenarios, the inclusion of temperature and humidity in the predictor list greatly improves the accuracy in terms of mean square error, root mean square error and mean absolute range normalized error, whereas the renewable availability offers no relevant changes. However, in the forthcoming de-carbonized power systems, the impact of curtailed energy will play an important role in expert forecasting systems where the input/output association must be modelled with high resolution.


1 Introduction

Decreasing fossil-fuel resources and the need for climate friendliness impose the utilization of renewable energy resources (RES) in diverse fields of daily life [1]. Towards decarbonisation, several energy sectors are led to electrification in a global attempt to increase the renewable electricity integration in existing power networks. However, the intermittency and uncertainty with respect to RES potential cause serious challenges to modern power systems, their planners and operators. By lowering the contribution of controllable generation, the capability of adjusting production to consumption (via frequency regulation) is decreased unevenly. Hence, the addition of renewable portions in total generation calls for enhanced flexibility levels (translated as spinning reserve margins) in order to retain system stability and reliability, which in turn raises the total generation cost [2].

Alongside, the sustainable targets have already been set and, in order to ameliorate their expensive requirements in spinning reserve, some efforts have been made towards the determination of the optimal RES share per case. Up to a certain share, renewable resources provide viable and advantageous solutions, while at considerably higher levels they become infeasible. To this end, various electricity storage (ES) technologies are examined, taking into account the annual income, emission avoidance and RES increase against the cost [3]. Different scientific opinions on this aspect do exist, stating that any attempt to determine the optimal ES system size that does not explore the sustainable view of a probable application and future raw-material reserves cannot be taken into account [4]. The uncertainty worsens in 100% RES paradigms, where hydrogen from renewable production processes has to replace hydrocarbon fuels. Issues like feedstock cost and content variation, security of supply, transportation and domestic storage and utilization need to be studied in depth [5].

On a radically different axis, the stochastic and variable impact of RES on their potential availability can be greatly mitigated through prediction. Utilizing artificial intelligence tools in recent years, forecasting has gained a relevant accuracy level in the field of RES as well. The general mechanism finds ready application in the prediction of solar irradiation, wind speed, river stage and flood level, sea-wave height, tides and currents, geothermal wells gushing water and biomass yield. Based on the multiple stages followed to result in electricity and its global potential, wind ranks among the most promising renewable resources worth investigating [6]. According to [7], traditional renewables including hydro, wind, solar, biomass and geothermal represent ~ 2000 GW (or 30% of worldwide installed capacity), supplying nearly 25% of total electricity during 2017. Wind power contributed 90% onshore and 10% offshore energy to one quarter of RES capacity, following hydro, which constitutes ~ 16% of renewable power or 4% of total electricity production. In this context, wind can be regarded as a major resource towards a carbon-free and sustainable economy, with massive expansion expectations as high as 977 GW in 2030 (905 GW onshore and 72 GW offshore wind power).

Numerical assessments on wind potential and installation arrangements in urbanized areas reveal that a considerably high share will take place in distribution networks [8]. As a result, turbulence intensity between existing obstacles, ground geometry and presence of upstream objects, must be taken into account. The intermittency issues are also examined in Cai and Bréon [9]. The great temporal variability of the wind power in higher penetration levels is proposed to be mitigated through spatial aggregation techniques. These techniques must consider multi-input prediction tools with the consolidation of both frequent and infrequent events at regional and national scale. In general, wind power forecasting models can be divided per applied methodology into persistence, physical, statistical and hybrid [10]. Persistence and physical methods do not need to be trained with historical data. Instead, they depend on highly correlated or physical data and, thus, they become favourable for short-term forecasting [11]. On the contrary, statistical approaches are based on developing the non-linear and linear relationships between historical data (such as wind speed, wind direction and temperature, etc.) and the generated power, to train a forecast model. A comprehensive review on the multi-objective optimization technologies in wind prediction is made in Liu et al. [12], with a reference to four critical data pre-processing techniques.

The development of artificial intelligence and machine learning techniques certainly benefits the advancement of energy prediction. Hybridizing the artificial neural networks and regression techniques, the non-linearity in the wind-speed time series can be addressed [13]. Auto-regressive, moving average, least absolute shrinkage selector operator, k nearest neighbour, gradient boosting decision tree, random forest, Gaussian process, quantile and support vector are some of the most relevant regression algorithms available for combination with the family of artificial neural networks (ANNs) (including perceptron, feed-forward, multi-layer perceptron, radial based, convolutional, recurrent, long short-term memory, deep belief and auto-encoder) [13,14,15,16].

A further, very common family that operates satisfactorily in forecasting tasks is known as fuzzy systems. Fuzzy logic models are suitable for load prognosis when the historical data are expressed by linguistic terms [17]. They are introduced to accelerate the speed of convergence and lower the computational times usually presented in ANNs. Specifically, rather than making use of individual variants and targets, they can exploit single-valued ensembles and evaluate their membership under the concept of multiple linear regression to construct a final prediction [18]. Otherwise, the non-linear stochastic relationship problem can be handled by utilizing weighted fuzzy time series and combined linear and non-parametric algorithms [19]. Exploiting the merits of ANNs and fuzzy systems, the hybrid approaches offer impressive trade-offs between accuracy and convergence time. However, in multi-predictor prognosis tasks they are comparable with another family, named regression trees.

Regression trees constitute a method based on the use of a decision tree as a predictive model, particularly used in data mining and machine learning. The authors in [20] presented four regression-tree methods, namely normal, pruned, boosted and bagged, for 1–6 h global horizontal irradiation prediction. A 30-min ahead wave-height forecasting for electricity generation was carried out in [21], by making use of hybridized multiple regression algorithms including regression trees. The employed regression trees were constructed on a binary decision structure to establish the association among predictors and response variables. An innovative home-energy consumption predictive model is demonstrated in Nie et al. [22]. Based on experimental evaluations, the energy consumption is predicted with greater accuracy by the gradient boosting regression tree algorithm than by alternative approaches, in terms of performance indices like Root Mean Square Error (RMSE), Mean Absolute Error (MAE) and Mean Absolute Percentage Error (MAPE).

Table 1 lists some representative models for wind power prediction. Although extensive research has been carried out on forecasting, regression trees have been under-explored in the direction of wind power prediction. A limited number of studies investigated the performance of regression trees in wind-speed forecasting for very short-term problems [23] and in probabilistic wind-direction forecasting [24]. Based on the existing literature, most studies focus on the prognosis of wind based on weather conditions, with the aim of providing the wind power output relying only on the manufacturer’s power curve. This may offer acceptable results when only one turbine is considered at a time. The spatial correlation is not taken into account, though it is one of the critical parameters in wind farms and urban generating complexes. For example, the impact of a probable turbulence at specific wind directions must be consolidated if the prognosis target concerns an aggregated wind-power output. A relevant work by Barber and Nordborg [25] studied these incoherent phenomena and concluded that both wind turbulence intensity and shear, along with site-specific tip deflection, are becoming quite important as wind-turbine blades get longer and more flexible. Although the total wind energy production of peninsular Spain is predicted in Torres-Barrán et al. [26], the forecasts depend on input patterns given for several weather variables at each one of the points that cover the areas under study. However, given the increased occurrence of wind turbines in the forthcoming distribution networks, the RES producers must be facilitated in making day-ahead forecasts using open-access data available online.

Table 1 Representative wind-power prediction models


Towards this direction, in this work a new formulation is developed to extract the most critical parameters affecting the aggregated contribution of wind parks. To associate the actual aggregated wind penetration with the predictors, the curtailment of RES is also considered. The rest of the paper is organized as follows. In Sect. 2, emphasis is given on the formulation of the problem to determine the most critical parameters towards model training. In Sect. 3, the minimum required input variables for forecasting are indicated and explained based on mathematical framework. The extensive experimental evaluations and obtained results are analysed and discussed in Sect. 4, providing the input/output characteristics of different geographical areas. Finally, the conclusions are drawn in Sect. 5.

2 Mathematical framework

The prospects of increasing the contribution of wind in distributed generation are continuously growing in recent years. Moving from the centralized generational schemes to dispersed production patterns, there is a need for developing expert systems able to accurately predict and properly manage the power flows in a bi-directional manner. In this smart-grid procedure, forecasting holds a major role which is often decisive for the maximum penetration of wind in the energy mixture.

It is well known that the power extracted by a specific wind turbine (P_wt) is proportional to the cube of the wind speed (v), the area swept by the rotor blades (A) and the air density (ρ_air). Since the kinetic energy of the air cannot be entirely captured, the power coefficient C_P, bounded by the Betz limit of ~ 59.26%, is applied to form the following equation [27]:

$$P_{wt} = \frac{1}{2}C_{P} \rho_{air} Av^{3}$$

(1)
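As a quick numerical illustration of Eq. 1, the following minimal sketch evaluates the extracted power using the standard cubic dependence on wind speed. The function name, the rotor radius of 40 m and the sea-level density of 1.225 kg/m³ are illustrative assumptions, not values taken from the case studies.

```python
import math

BETZ_LIMIT = 16 / 27  # about 0.5926, the theoretical maximum power coefficient


def turbine_power_w(wind_speed, rotor_radius, air_density=1.225, cp=BETZ_LIMIT):
    """Power in watts for an ideal rotor: 0.5 * Cp * rho * A * v^3."""
    swept_area = math.pi * rotor_radius ** 2  # A, in m^2
    return 0.5 * cp * air_density * swept_area * wind_speed ** 3


# Hypothetical rotor of 40 m radius at two wind speeds (m/s).
p_low = turbine_power_w(wind_speed=5.0, rotor_radius=40.0)
p_high = turbine_power_w(wind_speed=10.0, rotor_radius=40.0)
print(f"{p_low / 1e3:.0f} kW at 5 m/s, {p_high / 1e3:.0f} kW at 10 m/s")
```

Because of the cubic term, doubling the wind speed multiplies the power by eight, which is why small errors in predicted wind speed translate into large errors in predicted power.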

According to the type and scale of the wind turbine, most manufacturers define the kinetic energy converted into electricity as a function of the rated power (P_r) and wind speed (v_r) with respect to the cut-in (v_i) and cut-out (v_o) speed. In this regard, the following formulation can be used [28]:

$$P_{wt} \left( t \right) = \left\{ {\begin{array}{*{20}c} {P_{r} ,} & { v_{r} \le v\left( t \right) \le v_{o} } \\ {P_{r} \frac{{v\left( t \right) - v_{i} }}{{v_{r} - v_{i} }},} & {v_{i} < v\left( t \right) < v_{r} } \\ {0,} & {v\left( t \right) \le v_{i} \;or\;v\left( t \right) > v_{o} } \\ \end{array} } \right.$$

(2)

This way, the total wind power (P_w) obtained by a wind park or a complex of wind parks consisting of N wind turbines can be calculated by Eq. 3, only if the wind speed v(t) is known (or predicted) for each individual wind turbine.

$$P_{w} \left( t \right) = \mathop \sum \limits_{i = 1}^{N} P_{wt} \left( t \right)$$

(3)
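Equations 2 and 3 can be sketched in a few lines. The snippet below is a minimal illustration assuming a hypothetical 2 MW turbine with cut-in, rated and cut-out speeds of 3, 12 and 25 m/s; these numbers and the function names are placeholders rather than data from the studied wind farms.

```python
def power_curve(v, p_rated, v_cut_in, v_rated, v_cut_out):
    """Eq. 2: piecewise-linear manufacturer power curve for one turbine."""
    if v <= v_cut_in or v > v_cut_out:
        return 0.0  # below cut-in or above cut-out: no production
    if v < v_rated:
        # linear ramp between cut-in and rated speed
        return p_rated * (v - v_cut_in) / (v_rated - v_cut_in)
    return p_rated  # v_rated <= v <= v_cut_out


def park_power(speeds, **turbine):
    """Eq. 3: sum of individual outputs, one wind speed per turbine."""
    return sum(power_curve(v, **turbine) for v in speeds)


# Hypothetical turbine parameters (MW and m/s), for illustration only.
turbine = dict(p_rated=2.0, v_cut_in=3.0, v_rated=12.0, v_cut_out=25.0)
total_mw = park_power([2.0, 7.5, 12.0, 26.0], **turbine)
print(total_mw)
```

The four turbines contribute 0, 1.0, 2.0 and 0 MW, respectively. The example also shows why the aggregate needs one wind speed per turbine: the same park-average speed can yield very different totals.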

At the desired height, the parameter of air density plays a critical role on the estimation of wind power in aggregated generation plants. To examine how the latter is affected by weather and site-specific conditions, the following analysis takes place [29].

Air density is defined as the mass per unit volume of the Earth's atmosphere. Like pressure (p), it decreases with increasing altitude. It also changes with fluctuating temperature (T) or relative humidity (RH). At sea level and 15 °C, air has a density approximately equal to 1.225 kg/m³. At the point of connection between the wind turbine shaft and blades, its value decreases according to the prevailing weather conditions. The density of dry air can be evaluated considering the ideal gas law, expressed as a function of T and p.

$$\rho = \frac{p}{{R_{s} T}}$$

(4)

Measuring the absolute pressure in Pascal (Pa) and temperature in Kelvin degrees (K), the specific gas constant of dry air (R_s) is 287.052874 J/(kg K) assuming a mean molar mass for dry air of 28.964917 g/mol. Hence, this quantity can slightly vary depending on the molecular composition of air at a particular location.

Adding water vapour to the air (making the air moist), decreases its density, which may seem counterintuitive at first. This is because the molecular mass of water (18 g/mol) is less than the molecular mass of dry air (~ 29 g/mol). For any gas, at a given temperature and pressure, the number of molecules present is constant for a given volume according to Avogadro’s Law. Thus, when water molecules (water vapour) are added to a given volume of air, the dry air molecules must decrease by the same number, in order to prevent the pressure or temperature from rising. Hence, the mass per unit volume of the gas decreases. In other words, the density of moist air can be calculated as a mixture of ideal gases. In this case, the partial pressure of water vapor (p_v) is known as vapor pressure and, using this method, the deviation error in density becomes less than 0.2% in the range from − 10 °C to 50 °C. The density of humid air (ρ_ha) is found by Eq. 5.

$$\rho_{ha} = \frac{{p_{d} }}{{R_{d} T}} + \frac{{p_{v} }}{{R_{v} T}} = \frac{{p_{d} M_{d} + p_{v} M_{v} }}{RT},$$

(5)

where p_d is the partial pressure of dry air, p_v is the vapour pressure, T is the ambient temperature in K, R_d and R_v are the specific gas constants equal to 287.053 J/(kg K) and 461.495 J/(kg K) for dry air and water vapour, respectively, M_d and M_v correspond to the molar masses of dry air (28.964 g/mol) and water vapour (18.016 g/mol), and R constitutes the universal gas constant of 8.314 J/(K mol).

The vapor pressure of water can be estimated from the saturation vapor pressure (p_sat) and relative humidity as $p_{v} = RH \cdot p_{sat}$. In turn, the saturation vapor pressure at any given temperature is the vapor pressure when RH equals 100% [30]. The formula used to define p_sat as a function of the temperature in °C (T_c) is:

$$p_{sat} = 6.1078 \cdot 10^{{\frac{{7.5T_{c} }}{{T_{c} + 237.3}}}}$$

(6)

It is noted that the extracted result is given in hPa (1 hPa is equivalent to 100 Pa or 1 mbar). Then, the partial pressure of dry air can be estimated by making use of the vapour and absolute observed pressures as $p_{d} = p - p_{v}$. At this point, the altitude (h) can be consolidated, introducing additional parameters and constants with reference to sea level. These account for the sea-level atmospheric pressure p_o (101.325 kPa), the sea-level standard temperature T_o (288.15 K), earth-surface gravitational acceleration g (9.81 m/s²) and temperature lapse rate L (0.0065 K/m). Consequently, the temperature and pressure can be determined based on the following equations.

$$T = T_{o} - L \cdot h$$

(7)

$$p = p_{o} \left( {1 - \frac{Lh}{{T_{o} }}} \right)^{{\frac{{gM_{d} }}{RL}}}$$

(8)

The air density can be calculated by the molar form according to the ideal gas law as:

$$\rho_{air} = \frac{pM}{{RT}},$$

(9)

where M is the actual molar mass of air (found by R/R_s) and p is measured in Pa. Combining the aforementioned formulation, the final form of the air density as a function of ambient temperature in °C (T_c), relative humidity (RH, expressed as a fraction) and altitude (h, expressed in km so that L/T_o ≈ 0.0226 km⁻¹) is expressed as follows:

$$\rho_{air} = \frac{1}{{287.05\left( {T_{c} + 273.15} \right)}}\left[ {101325\left( {1 - 0.0226h} \right)^{5.256} - 230.87484RH \cdot 10^{{\frac{{7.5T_{c} }}{{237.3 + T_{c} }}}} } \right]$$

(10)
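The chain of steps leading to Eq. 10 can be checked numerically. The sketch below is an illustration under stated assumptions: relative humidity is passed as a fraction between 0 and 1, altitude is passed in metres (so the lapse term is L·h/T_o), and the saturation vapour pressure uses the same exponent that appears inside Eq. 10. All function and constant names are hypothetical.

```python
R_D = 287.053     # specific gas constant of dry air, J/(kg K)
R_V = 461.495     # specific gas constant of water vapour, J/(kg K)
P_0 = 101325.0    # sea-level atmospheric pressure, Pa
T_0 = 288.15      # sea-level standard temperature, K
LAPSE = 0.0065    # temperature lapse rate, K/m
EXPONENT = 5.256  # g * M_d / (R * L), as it appears in Eq. 10


def saturation_vapour_pressure_pa(t_c):
    """Saturation vapour pressure in Pa for a temperature in deg C."""
    return 100.0 * 6.1078 * 10 ** (7.5 * t_c / (237.3 + t_c))


def air_density(t_c, rh, h_m):
    """Humid-air density in kg/m^3.

    t_c: ambient temperature in deg C
    rh:  relative humidity as a fraction (0 to 1), an assumption of this sketch
    h_m: altitude in metres
    """
    t_k = t_c + 273.15
    p = P_0 * (1.0 - LAPSE * h_m / T_0) ** EXPONENT  # pressure at altitude
    p_v = rh * saturation_vapour_pressure_pa(t_c)    # vapour pressure
    p_d = p - p_v                                    # dry-air partial pressure
    return p_d / (R_D * t_k) + p_v / (R_V * t_k)     # mixture of ideal gases


rho_dry = air_density(15.0, 0.0, 0.0)       # dry air at sea level
rho_humid = air_density(15.0, 0.8, 0.0)     # 80% relative humidity
rho_high = air_density(15.0, 0.0, 1000.0)   # 1000 m altitude
print(round(rho_dry, 3), round(rho_humid, 3), round(rho_high, 3))
```

The dry sea-level case reproduces the reference density of about 1.225 kg/m³ at 15 °C, while both added humidity and higher altitude lower the density, in line with the discussion above.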

Referring back to the definition of wind power, one can observe that the wind direction does not affect the power output from a turbine. Nonetheless, it is very important for the spatial association between the allocation of turbines and the aggregated power. In the wake of a wind turbine, each region is independently affected based on the wind inflow. This generates the requirement to study the power generation considering distinct turbulent locations based on the varying wind direction [31]. Finally, the total contribution of wind is limited by another factor, imposed by the distribution system operators (DSOs). This factor stems from the overall system inertia for frequency regulation and depends on the optimal unit commitment schedule. It constitutes a national percentage for RES curtailment, giving a set-point to producers for the maximum accepted power output, regardless of whether a unit (or a whole park) is fully available for electricity provision or not.

3 Data harvesting and processing

Up to this point, the most critical parameters that can change the production rate of wind have been taken into account, excluding the planned outages and unexpected interruptions of any kind (e.g. for maintenance, upgrade, failure, etc.) at a single wind turbine or a group of wind turbines. To construct an appropriate list of the minimum predictors needed to make forecasts, the formulation of wind production (Eq. 1) is revisited. From this definition, the required predictors are all parameters that cannot be controlled by the wind turbine, its manufacturer or the owner.

Although the energy extracted from the wind can be regulated via the pitch angle (β), the yaw angle (γ) and the gear-box ratio (λ), these parameters are controlled by the wind turbine mechanisms. The area swept by the blades is a constant feature based on the manufacturer’s blade length, and the height and exact location of installation are up to the plant owner. To this end, the uncontrollable variables that need to be collected are the wind speed, wind direction and air density. Wind direction is measured in degrees (°) in the range of 0–360°, defining a wind blowing from the north as 0° (or 360°) and one blowing from the east as 90°. Turning back to Eq. 10, one can see that air density depends on the controllable parameter of altitude, as an extension of the turbine’s height, and on the uncontrollable variables of temperature and humidity, which form the remaining two predictors.

Since a reduction in aggregated generation, as a consequence of a partial outage of a single unit or a group of units, is very difficult to determine, a time-varying system-capacity factor (c_sys) can be used as a predictor towards accurate forecasts. Together with the RES curtailment factor (f_RES), it can define the total available capacity of wind generation under consideration. These factors are defined as:

$$c_{sys} \left( t \right) = \frac{{P_{wind}^{actual} \left( t \right)}}{{P_{wind}^{cap} \left( t \right)}}$$

(11)

$$f_{RES} \left( t \right) = \frac{{P_{RES}^{cut} \left( t \right)}}{{P_{RES}^{cap} \left( t \right)}},$$

(12)

where $P_{wind}^{actual}$ denotes the current availability of wind systems, $P_{wind}^{cap}$ the capacity of wind systems in total, $P_{RES}^{cut}$ the curtailed RES and $P_{RES}^{cap}$ the total capability of RES.
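The two factors of Eqs. 11 and 12 are plain ratios computed per time step. A minimal sketch follows; the function names and the MW values are hypothetical and serve only to show the expected ranges.

```python
def system_capacity_factor(p_wind_actual, p_wind_cap):
    """Eq. 11: available wind capacity relative to total wind capacity."""
    if p_wind_cap <= 0:
        raise ValueError("total wind capacity must be positive")
    return p_wind_actual / p_wind_cap


def res_curtailment_factor(p_res_cut, p_res_cap):
    """Eq. 12: curtailed RES power relative to the total RES capability."""
    if p_res_cap <= 0:
        raise ValueError("total RES capability must be positive")
    return p_res_cut / p_res_cap


# Illustrative values in MW for a single hour (not measured data).
c_sys = system_capacity_factor(p_wind_actual=120.0, p_wind_cap=150.0)
f_res = res_curtailment_factor(p_res_cut=15.0, p_res_cap=300.0)
print(c_sys, f_res)
```

Both predictors are dimensionless and lie between 0 and 1, so they can be fed to a tree-based model alongside the weather variables without rescaling.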

To gain a broader overview with respect to the magnitude of RES curtailment, the term of spinning reserve (SR) has to be defined. Spinning reserve refers to the available power capacity that can be directly attributed to the system from the committed (online) conventional generating units. Due to some operational constraints, such as minimum elapsed times between consecutive generator start-ups or shut-downs, rate of change (upward and downward) of their power output, maximum achievable number of status change at a time and uninterruptible operation of some must-run units, this amount of power must be guaranteed in order to retain system stability and reliability levels. Thus, the excess power from RES is curtailed in an attempt to sustain the required SR, without violating the imposed constraints [32, 33].

SR consists of a constant, a bi-basic and a variable part. The constant value of SR accounts for an unexpected failure of a large generator that is online. The bi-basic quantity takes a high value during peak hours and a lower one during off-peak hours, whereas a time-dependent portion is used to include the uncertainty due to RES contribution. In any case, this contribution is restricted by the minimum capacity (P^min) of the must-run units M and hence, the excess RES is curtailed to avoid their de-commitment. This can be formulated so that [34]:

$$P_{RES}^{cut} \left( t \right) + SR\left( t \right) \ge P_{load} \left( t \right) - \left[ {\mathop \sum \limits_{m = 1}^{M} P_{m}^{\min } \left( t \right) + \mathop \sum \limits_{i = 1}^{G} P_{i} \left( t \right) + P_{RES}^{cap} \left( t \right)} \right],\;\forall t \in T,$$

(13)

where P_i expresses the actual power by each generator i and must-run generator m, while P_load represents the actual load demand during the interval t. Identifying that the aggregated wind power is expressed by Eq. 14, the last predictor needed is the actual deviation between real generating power and minimum capacity of the must-run units, when the rest of conventional generators are off-line.

$$P_{w} \left( t \right) = c_{sys} f_{RES} \mathop \sum \limits_{i = 1}^{N} P_{wt} \left( t \right)$$

(14)

4 Experimental evaluation

In order to evaluate the performance of the proposed approach, the implementation of regression trees is carried out considering three case studies for the paradigm of Cyprus. The case studies take place under three geographical paradigms to represent the distributed-generation diversity of the island. The input/output characteristics are explained along with the forecaster development. The obtained models are compared based on the metrics of MAE, RMSE and MARNE.

To justify the selection of the most appropriate input features, the Pearson method and mutual information are utilized to estimate the correlation coefficients and the mutual dependency between the input features and the wind-power output. The correlation coefficient $\rho \in \left[ { - 1, + 1} \right]$ between the predictors x_i and the output Y_i, with mean values $\overline{x}$ and $\overline{Y}$, respectively, is defined by Eq. 15 as follows [35]:

$$\rho = \frac{{\sum \left( {x_{i} - \overline{x}} \right)\left( {Y_{i} - \overline{Y}} \right)}}{{\sqrt {\sum \left( {x_{i} - \overline{x}} \right)^{2} \sum \left( {Y_{i} - \overline{Y}} \right)^{2} } }}.$$

(15)
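Eq. 15 translates directly into code. The following minimal sketch uses only the standard library; the sample wind-speed and wind-power values are invented for illustration.

```python
import math


def pearson(x, y):
    """Eq. 15: Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Illustrative samples: wind speed (m/s) against wind power (MW).
wind_speed = [3.0, 5.0, 7.0, 9.0, 11.0]
wind_power = [0.1, 0.5, 1.0, 1.6, 2.0]
rho = pearson(wind_speed, wind_power)
print(round(rho, 3))
```

Note that the coefficient captures only linear association, which is one reason the mutual information is evaluated as a complementary measure.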

Treating the predictor/target pairs as random variables (X,Y), their mutual dependencies can be obtained based on Eq. 16, taking into account the number of their respective states S_n and S_m, their joint P(x_n,Y_m) and marginal probabilities P(x_n) and P(Y_m). The mutual information $I \in \left[ {0, + \infty } \right)$ is determined as [36]:

$$I\left( {X;Y} \right) = \mathop \sum \limits_{n = 1}^{{S_{n} }} \mathop \sum \limits_{m = 1}^{{S_{m} }} P\left( {x_{n} ,Y_{m} } \right)\log \frac{{P\left( {x_{n} ,Y_{m} } \right)}}{{P\left( {x_{n} } \right)P\left( {Y_{m} } \right)}}$$

(16)
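Eq. 16 operates on discrete states, so continuous measurements first have to be binned. The sketch below assumes the binning has already been done and estimates the probabilities as relative frequencies; the natural logarithm is an assumption, since the base is not specified above, and the state labels are hypothetical.

```python
import math
from collections import Counter


def mutual_information(x_states, y_states):
    """Eq. 16 with probabilities estimated as relative frequencies (in nats)."""
    n = len(x_states)
    if n == 0 or n != len(y_states):
        raise ValueError("need two non-empty sequences of equal length")
    joint = Counter(zip(x_states, y_states))  # counts of (x_n, Y_m) pairs
    count_x = Counter(x_states)               # marginal counts of x_n
    count_y = Counter(y_states)               # marginal counts of Y_m
    mi = 0.0
    for (xs, ys), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log(p_xy / ((count_x[xs] / n) * (count_y[ys] / n)))
    return mi


# Hypothetical binned observations.
speed_bin = ["low", "low", "high", "high"]
power_bin = ["low", "low", "high", "high"]   # fully determined by speed
other_bin = ["a", "b", "a", "b"]             # independent of speed
mi_dependent = mutual_information(speed_bin, power_bin)
mi_independent = mutual_information(speed_bin, other_bin)
print(round(mi_dependent, 3), round(mi_independent, 3))
```

Pairs that never occur are skipped because they contribute nothing to the sum. The estimate depends on the chosen number of bins, so thresholds such as those quoted in this section are only comparable under the same discretization.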

Among the input features of wind speed, wind direction, generator rotor speed, pitch angle, yaw angle, ambient temperature, relative humidity, air pressure, gear ratio and turbine-hub height, those that exhibit the highest correlation ($\left| \rho \right| \ge 0.20$) and dependency ($I \ge 0.25$) with the wind power are only wind speed and air pressure.

4.1 Input/output variables

Actual data for the wind power generated from three different wind farms were obtained from the Cyprus energy regulatory authority (CERA). The wind farms are located east, west and in the mountains in the centre of the country. Hence, they are connected to independent distribution networks which are under the administrative and operational ownership of different districts. Distinguishing the plants by location to eastern, western and central mountainous, their respective power output during the year of 2020 can be seen in Fig. 1. As can be observed, there does not exist a relationship between the produced power and seasons, months, day-type or time.

To depict the inputs, Fig. 2 shows the respective histograms of wind direction to the mentioned sites, while the average monthly wind speeds are included in Fig. 3. It is obvious that they can vary according to the altitude and morphology of the area. In a similar way, the ambient temperature varies with location and season, as illustrated in Fig. 4. The inherent association between temperature, wind speed and therefore the wind power, cannot be graphically represented with ease. The same occurs with humidity which provides an extremely non-linear relationship with air density based on Eq. 10. However, as in the case of temperature, it follows a pretty steady seasonal pattern that can be represented by Fig. 5 for spring, autumn, winter and summer.

While one can state that wind speed and, in turn, wind power demonstrate a stochastic behaviour that is difficult to predict, the aspect of RES curtailment is equally uncertain. To verify this randomness, the definition of Eq. 13 can be used. The clear dependence of curtailment on load demand is essentially the main reason for the added uncertainty. Load demand is strictly correlated with the day-type, day length, season, electricity price and fuel price, daily activities and weather. This way, it deteriorates the satisfaction of conventional generating constraints and the consequent RES contribution. To offer a graphical explanation of this phenomenon, Fig. 6 contains the minimum capacity limits of the must-run units, violated by the actual demand during a week in April 2020. As a result, the integrated RES across the curtailed RES is included in Fig. 7, for the same period.

4.2 Training approach

In general, analytical models are computationally intensive, leading to infeasible exploration in most cases. Therefore, more generic and simple training approaches are required in order to forecast the key targets, namely the output wind power, based on non-exhaustive data-driven models. By identifying the trends in data, these techniques are capable of capturing the underlying physical behaviour of the system, avoiding the need for pre-defined, detailed information about its specific features. As a modern machine learning algorithm, regression trees offer excellent solutions in predictive modelling problems.

To generate a trained model able to predict new datasets in a repetitive manner, the regression tree makes use of logical rules to split a complex problem into several simpler sub-problems, easier to be interpreted. Specifically, it makes decisions based on conditions that are hierarchically applied from “root” to “leaf” of a tree [37]. A simple configuration of a decision tree considering two predictors (i.e. wind speed and wind direction) and one target (wind power) is provided in Fig. 8. From the root node, a splitting process is repeated until the maximum depth of the tree is achieved (restricted to 3 in this case). Each leaf induces a simple regression model which can be pruned to reduce complexity and improve the capability of the tree model.

Denoting the recursively grown tree with ${\mathcal{T}}$, the multivariate ${\boldsymbol{x}} \in {\mathcal{X}}$ can be mapped to the target variable ${\mathcal{Y}}$ based on a tree-node membership m. ${\boldsymbol{x}}$ expresses a d-dimensional vector of real-valued variables (predictors) so that ${\boldsymbol{x}} = \left( {x_{1} , \ldots ,x_{d} } \right) \in {\mathbb{R}}^{d}$. Considering the learning data ${\mathcal{L}} = \left\{ {\left( {Y_{i} ,x_{i} } \right):i = 1, \ldots ,n} \right\}$, the regression tree is grown from ${\mathcal{L}}$ according to recursive, conditional splits (in the form of x_i ≤ c and x_i > c for binary decisions) chosen from the observed values of x_i [38]. Hence, for M terminal nodes the appropriate mapping function ${\mathcal{T}}:{\mathcal{X}} \to \left\{ {1, \ldots ,M} \right\}$ can be defined as:

$${\mathcal{T}}\left( x \right) = \mathop \sum \limits_{m = 1}^{M} mB_{m} \left( x \right),$$

(17)

where B_m denotes a product spline as a basis function of the number of splits (length) L_m, the splitting variable x_l,m and the splitting value c_l,m so that:

$$B_{m} \left( x \right) = \mathop \prod \limits_{l = 1}^{{L_{m} }} \left( {x_{l,m} - c_{l,m} } \right)$$

(18)
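To make the recursive splitting concrete, the following is a minimal CART-style sketch in plain Python, not the implementation used in this work. It assumes that each split minimizes the summed squared error of the two children, that thresholds are taken from observed predictor values, and that each leaf predicts the mean target of its samples; the toy data (wind speed in m/s, wind direction in degrees, power in MW) are invented.

```python
def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(rows, targets):
    """Return (feature index, threshold) minimising the children's SSE, or None."""
    best, best_cost = None, sse(targets)
    for j in range(len(rows[0])):
        # Candidate thresholds are observed values; the largest is skipped
        # so that the right child is never empty.
        for c in sorted({r[j] for r in rows})[:-1]:
            left = [y for r, y in zip(rows, targets) if r[j] <= c]
            right = [y for r, y in zip(rows, targets) if r[j] > c]
            cost = sse(left) + sse(right)
            if cost < best_cost - 1e-12:
                best, best_cost = (j, c), cost
    return best


def grow(rows, targets, depth=0, max_depth=3):
    """Recursively grow the tree; a leaf stores the mean target of its samples."""
    split = best_split(rows, targets) if depth < max_depth else None
    if split is None:
        return sum(targets) / len(targets)
    j, c = split
    left = [(r, y) for r, y in zip(rows, targets) if r[j] <= c]
    right = [(r, y) for r, y in zip(rows, targets) if r[j] > c]
    return (j, c,
            grow([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
            grow([r for r, _ in right], [y for _, y in right], depth + 1, max_depth))


def predict(node, x):
    """Follow the conditions x_j <= c from the root down to a leaf."""
    while isinstance(node, tuple):
        j, c, left, right = node
        node = left if x[j] <= c else right
    return node


rows = [(3.0, 90.0), (4.0, 270.0), (6.0, 90.0), (7.0, 270.0),
        (10.0, 90.0), (11.0, 270.0), (13.0, 90.0), (14.0, 270.0)]
targets = [0.0, 0.1, 0.6, 0.4, 1.5, 1.1, 2.0, 2.0]
tree = grow(rows, targets, max_depth=3)
print(predict(tree, (3.0, 90.0)), predict(tree, (12.0, 180.0)))
```

On this toy dataset the root split falls on wind speed, reflecting that speed explains far more of the variance in power than direction does. Limiting `max_depth` plays the same role as the depth restriction mentioned for Fig. 8.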

In our realization, a more complex and deeper regression tree is actually used, able to interpret up to six predicting features, namely wind speed, wind direction, temperature, humidity, system capacity factor and RES curtailment factor, over one response, the target of wind power production. With a view to averting overfitting and facilitating the learning of dependencies and dataset outliers, some hyper-parameters like turbine-hub height, turbine-blade radius and spatial distribution could be included and optimized (e.g. with the gradient boosting optimization method) towards accuracy improvement and algorithm generalization during the model validation stage.

To compensate for the uncertainties in observed values caused by steep fluctuations of the wind speed, the dissimilarities between different time-series datasets are measured by grouping them into clusters with common features. Accordingly, the final stage of forecasting consists of a post-processing correction procedure, explained as follows. First, the daily observed wind power is categorized into clusters so as to minimize the variance, which is defined as the within-cluster sum of squares of Eq. 19.

$$\mathop {{\text{argmin}}}\limits_{S} \mathop \sum \limits_{i = 1}^{k} \mathop \sum \limits_{{Y \in S_{i} }} \left\| {Y - \mu_{i} } \right\|^{2}$$

(19)

It is noted that Y represents a dataset of n observed d-dimensional (d = 24) vectors (Y₁, Y₂,…, Y_n) that are to be partitioned into k ≤ n clusters S = {S₁, S₂,…, S_k}, where μ_i is the mean of the points in S_i. A linear regression approach is then applied to each cluster in order to determine the fitting (or corrective) coefficients b and a so that [39]:

$$P_{predicted} = bP_{observed} + a$$

(20)
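The two post-processing steps can be sketched as follows. This is an illustration only: it assumes Lloyd's k-means iteration with a deterministic start from the first k profiles and an ordinary least-squares fit for b and a, neither of which is specified in the text. The toy profiles use 4 values per day instead of 24 to keep the example short, and all names and numbers are hypothetical.

```python
from statistics import mean

def sq_dist(u, v):
    """Squared Euclidean distance between two profiles."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def centroid(vectors):
    """Component-wise mean of a list of profiles (mu_i in Eq. 19)."""
    return [mean(col) for col in zip(*vectors)]

def kmeans(vectors, k, iters=50):
    """Lloyd's algorithm; the first k vectors serve as initial centres."""
    centres = [list(v) for v in vectors[:k]]
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda i: sq_dist(v, centres[i])) for v in vectors]
        if new_labels == labels:          # assignments stable: converged
            break
        labels = new_labels
        for i in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == i]
            if members:                   # keep the old centre if a cluster empties
                centres[i] = centroid(members)
    return labels, centres

def wcss(vectors, labels, centres):
    """Within-cluster sum of squares, the objective of Eq. 19."""
    return sum(sq_dist(v, centres[lab]) for v, lab in zip(vectors, labels))

def fit_line(observed, predicted):
    """Least-squares b and a in predicted = b * observed + a (Eq. 20)."""
    mx, my = mean(observed), mean(predicted)
    sxx = sum((x - mx) ** 2 for x in observed)
    sxy = sum((x - mx) * (y - my) for x, y in zip(observed, predicted))
    b = sxy / sxx
    return b, my - b * mx

# Hypothetical daily profiles (4 samples per day here; the paper uses d = 24).
days = [[1, 2, 2, 1], [9, 10, 11, 9], [1, 1, 2, 2], [10, 9, 10, 11]]
labels, centres = kmeans(days, k=2)
objective = wcss(days, labels, centres)

# Hypothetical observed/predicted pairs belonging to one cluster.
b, a = fit_line([1.0, 2.0, 3.0, 4.0], [1.1, 2.0, 2.9, 3.8])
```

In practice one pair (b, a) would be fitted per cluster, using only the observations assigned to that cluster.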

4.3 Model performance

To properly determine the model performance, three scenarios are utilized by increasing the dimensionality in terms of consolidated variables. Specifically, the first scenario makes use of historical data with respect to wind speed and wind direction at the western coast, eastern coast and central mountainous terrains. The data were obtained from CERA and cover the years 2020 and 2021. For training purposes, the dataset spans 2 years at a 60 min sampling rate. For validating the proposed model, hourly granularity is considered over 6 months. In the second scenario, the variables of ambient temperature and relative humidity are also consolidated, while the third scenario also includes the predictors of RES capacity and RES curtailment.
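The three scenarios differ only in which predictors are passed to the model, which can be written down as a simple configuration. The field names and the sample record below are hypothetical and chosen for illustration; they are not taken from the dataset.

```python
# Predictor sets per scenario, as described in the text (names are hypothetical).
SCENARIOS = {
    1: ["wind_speed", "wind_direction"],
    2: ["wind_speed", "wind_direction", "temperature", "humidity"],
    3: ["wind_speed", "wind_direction", "temperature", "humidity",
        "res_capacity", "res_curtailment"],
}

def select_predictors(record, scenario):
    """Return the predictor vector of one hourly record under the given scenario."""
    return [record[name] for name in SCENARIOS[scenario]]

# One made-up hourly record.
record = {
    "wind_speed": 7.2, "wind_direction": 255.0,
    "temperature": 18.5, "humidity": 61.0,
    "res_capacity": 0.95, "res_curtailment": 0.0,
}

x1 = select_predictors(record, 1)
x2 = select_predictors(record, 2)
x3 = select_predictors(record, 3)
```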

To assess the scenarios at each location, the metrics of mean absolute error (MAE), root mean squared error (RMSE) and mean absolute range normalized error (MARNE) are taken into account exploiting Eqs. 21–23.

$$MAE = \frac{1}{\tau }\mathop \sum \limits_{t = 1}^{\tau } \left| {P_{a} \left( t \right) - P_{p} \left( t \right)} \right|$$

(21)

$$RMSE = \frac{1}{\tau }\sqrt {\mathop \sum \limits_{t = 1}^{\tau } \left( {P_{a} \left( t \right) - P_{p} \left( t \right)} \right)^{2} }$$

(22)

$$MARNE = \frac{1}{\tau }\mathop \sum \limits_{t = 1}^{\tau } \frac{{\left| {P_{a} \left( t \right) - P_{p} \left( t \right)} \right|}}{{\mathop {\max }\limits_{t} P_{a} \left( t \right)}} \times 100$$

(23)
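Eqs. 21–23 translate directly into a few lines of code. The sketch below follows the equations as printed; note that Eq. 22 places the factor 1/τ outside the square root, which differs from the more common definition (square root of the mean squared error) by a factor of √τ, so both variants are shown for comparison. Function names and the four sample values are hypothetical.

```python
from math import sqrt

def mae(actual, predicted):
    """Eq. 21: mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse_as_printed(actual, predicted):
    """Eq. 22 as printed: the 1/tau factor sits outside the square root."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted))) / len(actual)

def rmse_conventional(actual, predicted):
    """Common alternative: square root of the mean squared error."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def marne(actual, predicted):
    """Eq. 23: absolute error normalised by the peak actual value, in percent."""
    return 100.0 * mae(actual, predicted) / max(actual)

# Made-up actual and predicted power values (tau = 4).
actual = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 36.0]

scores = {
    "MAE": mae(actual, predicted),
    "RMSE (Eq. 22)": rmse_as_printed(actual, predicted),
    "RMSE (conventional)": rmse_conventional(actual, predicted),
    "MARNE": marne(actual, predicted),
}
```

Since the normalising maximum in Eq. 23 does not depend on the summation index, MARNE reduces to 100 × MAE divided by the peak actual power, which is how `marne` is written.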

An example of wind power forecasting for the western region is depicted in Fig. 9. According to the simulations performed in MATLAB (MathWorks) R2020b, the performance of the underlying regression trees improves as predictors are added to the model training process. In scenario 1, the wind power accuracy in the western area is rated at 6.1306, 0.0734 and 7.6357 in terms of MAE, RMSE and MARNE, respectively. These values improve considerably in scenario 2, where they amount to 5.1920, 0.0649 and 6.4667, respectively. For the aggregated 41 wind turbines of 82 MW total capacity, inserting RES availability and RES curtailment as predictors does not add considerable improvement.
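The change between scenarios 1 and 2 for the western region can also be expressed as a relative improvement. The sketch below uses the values reported above and assumes the usual definition (before − after) / before; the text does not state how its percentage improvements are computed, so this definition is an assumption.

```python
# Western-region metrics as reported in the text.
scenario_1 = {"MAE": 6.1306, "RMSE": 0.0734, "MARNE": 7.6357}
scenario_2 = {"MAE": 5.1920, "RMSE": 0.0649, "MARNE": 6.4667}

def relative_improvement(before, after):
    """Percentage reduction of an error metric (assumed definition)."""
    return 100.0 * (before - after) / before

improvement = {
    name: relative_improvement(scenario_1[name], scenario_2[name])
    for name in scenario_1
}

for name, value in improvement.items():
    print(f"{name}: {value:.2f}% lower in scenario 2")
```

Under this definition the reductions come out at roughly 15% for MAE and MARNE and roughly 12% for RMSE.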

On the eastern coast, a wind farm with 21 wind turbines and 32.5 MW capacity shows improved predictive accuracy. Considering only the speed and direction of wind, the power was forecasted with 2.4175 MAE, 0.0292 RMSE and 7.2084 MARNE. It is worth noting that further improvements were observed when the predictors of ambient temperature and relative humidity were taken into account: the accuracy improved by 29.95%, 20.66% and 29.94%, respectively. In this case too, no appreciable change in the results was observed in scenario 3. The forecasting performance is shown in Fig. 10, where the actual versus the predicted wind power is illustrated.

The central mountainous area comprises three wind farms with a total installed capacity rated at 40 MW. A daily forecast utilizing the proposed model configuration is shown in Fig. 11. The MAE, RMSE and MARNE metrics amount to 4.1147, 0.0466 and 9.8884 in scenario 1 and take the respective values of 4.4821, 0.0416 and 8.3681 in scenarios 2 and 3. For completeness, the experimental evaluation of the proposed model is listed by comparative metric and location in Table 2. Based on this analysis, the predictive accuracy greatly improves by adding the predictors of ambient temperature and humidity to the vectorised wind speed and wind direction matrix. In the current form of the isolated power network, the available RES capacity and RES curtailment do not appear to have a significant impact on predictions. However, at higher levels of wind penetration, their impact can impose extreme deviations from actual profiles, allowing regression trees to achieve greater improvements when they are consolidated as predictors.

Table 2 Comparative performance metrics by location


5 Conclusions

In this work, the performance of regression trees has been assessed for wind power forecasting in distribution networks. The configuration was tested for the case of Cyprus, dividing the system by region into the western coast, the eastern coast and the central mountainous area. Five wind farms in total, accounting for around 82 MW, 31.5 MW and 44 MW per region, were taken into consideration.

Following a non-parametric mathematical formulation of air density, the minimum required predictors were identified successfully. The performance of the proposed configuration is investigated in terms of mean absolute error, root mean square error and mean absolute range normalized error. Specifically, experimental evaluations are performed considering three scenarios per location. The first scenario takes into account only the wind speed and wind direction data as predictors for the years of 2020 and 2021. In the second scenario, the variables of ambient temperature and relative humidity are also included. The final scenario (scenario 3) accounts for the availability of renewable resources systems and renewable energy curtailment, in an attempt to increase the object fitting performance.

According to the experimental simulations, improvements can be obtained by including temperature and humidity as predictors, independent of location. The improvements in predictive error are in the order of 29.95%, 20.66% and 29.94% for the respective performance metrics. When the wind power forecast was applied with six predictors (wind speed, wind direction, ambient temperature, relative humidity, renewable capacity availability and renewable energy curtailment), no appreciable further changes were observed. In the case of the eastern coast in particular, no improvement was indicated, since the aggregated power concerned only one wind farm with 31.5 MW total installed capacity. Nevertheless, the results for the mountainous region, where the wind power comes from three different wind farms of 44 MW total capacity, reveal that the increasing contribution of wind in forthcoming distribution networks will undeniably require more predictors. In this case, the renewable capacity availability and renewable curtailment, based on spinning reserve provision, play a critical role in wind power forecast modelling and performance.

Data availability

The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.

References

1. Nikolaidis P. Sustainable routes for renewable energy carriers in modern energy systems. In: Bioenergy research: commercial opportunities & challenges. Springer: Singapore; 2021. p. 239–65.
2. Nikolaidis P, Fotiou S, Kasparis T, Poullikkas A. Dynamic analysis of high-response storage systems to minimize the generation ramping requirements. IET Conf Publ. 2020;2020(CP780):398–403. https://doi.org/10.1049/icp.2021.1215.
3. Nikolaidis P, Poullikkas A. A thorough emission-cost analysis of the gradual replacement of carbon-rich fuels with carbon-free energy carriers in modern power plants: the case of Cyprus. Sustainability. 2022;14(17):10800. https://doi.org/10.3390/su141710800.
4. Nikolaidis P, Chatzis S, Poullikkas A. Life cycle cost analysis of electricity storage facilities in flexible power systems. Int J Sustain Energy. 2019;38(8):752–72. https://doi.org/10.1080/14786451.2019.1579815.
5. Nikolaidis P, Poullikkas A. A comparative review of electrical energy storage systems for better sustainability. J Power Technol. 2017;97(3):220–45.
6. Hong T, Pinson P, Wang Y, Weron R, Yang D, Zareipour H. Energy forecasting: a review and outlook. IEEE Open Access J Power Energy. 2020;7(October):376–88. https://doi.org/10.1109/OAJPE.2020.3029979.
7. Bandoc G, Prăvălie R, Patriche C, Degeratu M. Spatial assessment of wind power potential at global scale. A geographical approach. J Clean Prod. 2018;200:1065–86. https://doi.org/10.1016/j.jclepro.2018.07.288.
8. Juan YH, Wen CY, Chen WY, Yang AS. Numerical assessments of wind power potential and installation arrangements in realistic highly urbanized areas. Renew Sustain Energy Rev. 2021;135:110165. https://doi.org/10.1016/j.rser.2020.110165.
9. Cai Y, Bréon FM. Wind power potential and intermittency issues in the context of climate change. Energy Convers Manag. 2021. https://doi.org/10.1016/j.enconman.2021.114276.
10. Chang W-Y. A literature review of wind forecasting methods. J Power Energy Eng. 2014;02(04):161–8. https://doi.org/10.4236/jpee.2014.24023.
11. Hanifi S, Liu X, Lin Z, Lotfian S. A critical review of wind power forecasting. Energies. 2020;13(15):1–24.
12. Liu H, Li Y, Duan Z, Chen C. A review on multi-objective optimization framework in wind energy forecasting techniques and applications. Energy Convers Manag. 2020. https://doi.org/10.1016/j.enconman.2020.113324.
13. Dhiman HS, Deb D, Guerrero JM. Hybrid machine intelligent SVR variants for wind forecasting and ramp events. Renew Sustain Energy Rev. 2019;108:369–79. https://doi.org/10.1016/j.rser.2019.04.002.
14. Wang Y, Zou R, Liu F, Zhang L, Liu Q. A review of wind speed and wind power forecasting with deep neural networks. Appl Energy. 2021;304:117766. https://doi.org/10.1016/j.apenergy.2021.117766.
15. Jiang P, Liu Z, Niu X, Zhang L. A combined forecasting system based on statistical method, artificial neural networks, and deep learning methods for short-term wind speed forecasting. Energy. 2021;217:119361. https://doi.org/10.1016/j.energy.2020.119361.
16. Jiajun H, Chuanjin Y, Yongle L, Huoyue X. Ultra-short term wind prediction with wavelet transform, deep belief network and ensemble learning. Energy Convers Manag. 2020;205(2019):112418. https://doi.org/10.1016/j.enconman.2019.112418.
17. Pai PF. Hybrid ellipsoidal fuzzy systems in forecasting regional electricity loads. Energy Convers Manag. 2006;47(15–16):2283–9. https://doi.org/10.1016/j.enconman.2005.11.017.
18. Zhao J, Guo ZH, Su ZY, Zhao ZY, Xiao X, Liu F. An improved multi-step forecasting model based on WRF ensembles and creative fuzzy systems for wind speed. Appl Energy. 2016;162:808–26. https://doi.org/10.1016/j.apenergy.2015.10.145.
19. Sulandari W, Subanar, Lee MH, Rodrigues PC. Indonesian electricity load forecasting using singular spectrum analysis, fuzzy systems and neural networks. Energy. 2020;190:116408. https://doi.org/10.1016/j.energy.2019.116408.
20. Voyant C, Motte F, Notton G, Fouilloy A, Nivet ML, Duchaud JL. Prediction intervals for global solar irradiation forecasting using regression trees methods. Renew Energy. 2018;126:332–40. https://doi.org/10.1016/j.renene.2018.03.055.
21. Ali M, Prasad R, Xiang Y, Deo RC. Near real-time significant wave height forecasting with hybridized multiple linear regression algorithms. Renew Sustain Energy Rev. 2020;132:110003. https://doi.org/10.1016/j.rser.2020.110003.
22. Nie P, Roccotelli M, Fanti MP, Ming Z, Li Z. Prediction of home energy consumption based on gradient boosting regression tree. Energy Rep. 2021;7:1246–55. https://doi.org/10.1016/j.egyr.2021.02.006.
23. Troncoso A, Salcedo-Sanz S, Casanova-Mateo C, Riquelme JC, Prieto L. Local models-based regression trees for very short-term wind speed prediction. Renew Energy. 2015;81:589–98. https://doi.org/10.1016/j.renene.2015.03.071.
24. Lang MN, Schlosser L, Hothorn T, Mayr GJ, Stauffer R, Zeileis A. Circular regression trees and forests with an application to probabilistic wind direction forecasting. J R Stat Soc Ser C Appl Stat. 2020;69(5):1357–74. https://doi.org/10.1111/rssc.12437.
25. Barber S, Nordborg H. Improving site-dependent power curve prediction accuracy using regression trees. J Phys Conf Ser. 2020. https://doi.org/10.1088/1742-6596/1618/6/062003.
26. Torres-Barrán A, Alonso Á, Dorronsoro JR. Regression tree ensembles for wind energy and solar radiation prediction. Neurocomputing. 2019;326–327:151–60. https://doi.org/10.1016/j.neucom.2017.05.104.
27. Brauns J, Turek T. Alkaline water electrolysis powered by renewable energy: a review. Processes. 2020. https://doi.org/10.3390/pr8020248.
28. Maleki A, Rosen MA, Pourfayaz F. Optimal operation of a grid-connected hybrid renewable energy system for residential applications. Sustain. 2017. https://doi.org/10.3390/su9081314.
29. Nikolaidis P, Partaourides H. A model predictive control for the dynamical forecast of operating reserves in frequency regulation services. Forecasting. 2021;3(1):228–41. https://doi.org/10.3390/forecast3010014.
30. Michaelides S, Lane J, Kasparis T. Effect of vertical air motion on disdrometer derived Z-R coefficients. Atmosphere. 2019. https://doi.org/10.3390/atmos10020077.
31. Neunaber I, Hölling M, Stevens RJAM, Schepers G, Peinke J. Distinct turbulent regions in the wake of a wind turbine and their inflow-dependent locations: the creation of a wake map. Energies. 2020;13(20):1–20. https://doi.org/10.3390/en13205392.
32. Nikolaidis P, Poullikkas A. Co-optimization of active power curtailment, load shedding and spinning reserve deficits through hybrid approach: comparison of electrochemical storage technologies. IET Renew Power Gener. 2022;16(1):92–104. https://doi.org/10.1049/rpg2.12339.
33. Nikolaidis P, Poullikkas A. A novel cluster-based spinning reserve dynamic model for wind and PV power reinforcement. Energy. 2021;234:121270. https://doi.org/10.1016/j.energy.2021.121270.
34. Nikolaidis P, Poullikkas A. Evolutionary priority-based dynamic programming for the adaptive integration of intermittent distributed energy resources in low-inertia power systems. Eng. 2021;2(4):643–60. https://doi.org/10.3390/eng2040041.
35. Livera A, Paphitis G, Theristis M, Lopez-Lorente J, Makrides G, Georghiou GE. Photovoltaic system health-state architecture for data-driven failure detection. Solar. 2022;2(1):81–98. https://doi.org/10.3390/solar2010006.
36. Kraskov A, Stögbauer H, Grassberger P. Estimating mutual information. Phys Rev E Stat Phys Plasmas Fluids Relat Interdiscip Top. 2004;69(6):16. https://doi.org/10.1103/PhysRevE.69.066138.
37. Ahmad MW, Reynolds J, Rezgui Y. Predictive modelling for solar thermal energy systems: a comparison of support vector regression, random forest, extra trees and regression trees. J Clean Prod. 2018;203:810–21. https://doi.org/10.1016/j.jclepro.2018.08.207.
38. Ishwaran H. Variable importance in binary regression trees and forests. Electron J Stat. 2007;1:519–37. https://doi.org/10.1214/07-EJS039.
39. Theocharides S, Makrides G, Livera A, Theristis M, Kaimakis P, Georghiou GE. Day-ahead photovoltaic power production forecasting methodology based on machine learning and statistical post-processing. Appl Energy. 2020;268:1–22. https://doi.org/10.1016/j.apenergy.2020.115023.
40. Gallego C, Pinson P, Madsen H, Costa A, Cuerva A. Influence of local wind speed and direction on wind power dynamics—application to offshore very short-term forecasting. Appl Energy. 2011;88(11):4087–96. https://doi.org/10.1016/j.apenergy.2011.04.051.
41. Bilal B, et al. Wind turbine power output prediction model design based on artificial neural networks and climatic spatiotemporal data. In: Proceedings of 2018 IEEE international conference on industrial technology (ICIT). 2018;2018:1085–92. https://doi.org/10.1109/ICIT.2018.8352329.
42. Sideratos G, Hatziargyriou ND. An advanced statistical method for wind power forecasting. IEEE Trans Power Syst. 2007;22(1):258–65. https://doi.org/10.1109/TPWRS.2006.889078.
43. Abhinav R, Pindoriya NM, Wu J, Long C. Short-term wind power forecasting using wavelet-based neural network. Energy Procedia. 2017;142:455–60. https://doi.org/10.1016/j.egypro.2017.12.071.
44. Zhao Y, Ye L, Li Z, Song X, Lang Y, Su J. A novel bidirectional mechanism based on time series model for wind power forecasting. Appl Energy. 2016;177:793–803. https://doi.org/10.1016/j.apenergy.2016.03.096.
45. De Giorgi MG, Ficarella A, Tarantino M. Assessment of the benefits of numerical weather predictions in wind power forecasting based on statistical methods. Energy. 2011;36(7):3968–78. https://doi.org/10.1016/j.energy.2011.05.006.
46. Qu Z, Mao W, Zhang K, Zhang W, Li Z. Multi-step wind speed forecasting based on a hybrid decomposition technique and an improved back-propagation neural network. Renew Energy. 2019;133:919–29. https://doi.org/10.1016/j.renene.2018.10.043.
47. Carolin Mabel M, Fernandez E. Analysis of wind power generation and prediction using ANN: a case study. Renew Energy. 2008;33(5):986–92. https://doi.org/10.1016/j.renene.2007.06.013.
48. Badari Narayana P, Manjunatha R, Hemachandra Reddy K. Wind energy forecasting using radial basis function neural networks. Int J Res Eng Technol. 2015;04(12):274–9. https://doi.org/10.15623/ijret.2015.0412054.
49. He B, et al. A combined model for short-term wind power forecasting based on the analysis of numerical weather prediction data. Energy Rep. 2022;8:929–39. https://doi.org/10.1016/j.egyr.2021.10.102.
50. Liu J, Wang X, Lu Y. A novel hybrid methodology for short-term wind power forecasting based on adaptive neuro-fuzzy inference system. Renew Energy. 2017;103:620–9. https://doi.org/10.1016/j.renene.2016.10.074.
51. Hong YY, Rioflorido CLPP. A hybrid deep learning-based neural network for 24-h ahead wind power forecasting. Appl Energy. 2019;250(May):530–9. https://doi.org/10.1016/j.apenergy.2019.05.044.
52. Zhang J, Yan J, Infield D, Liu Y, Sang Lien F. Short-term forecasting and uncertainty analysis of wind turbine power based on long short-term memory network and Gaussian mixture model. Appl Energy. 2019;241:229–44. https://doi.org/10.1016/j.apenergy.2019.03.044.
53. Lin Z, Liu X, Collu M. Wind power prediction based on high-frequency SCADA data along with isolation forest and deep learning neural networks. Int J Electr Power Energy Syst. 2020;118:105835. https://doi.org/10.1016/j.ijepes.2020.105835.


Author information

Authors and Affiliations

Department of Electrical Engineering, Cyprus University of Technology, P. O. Box 50329, 3603, Limassol, Cyprus
Pavlos Nikolaidis

Contributions

PN wrote the main manuscript text and prepared all figures. The author reviewed the manuscript. The author read and approved the final manuscript.

Corresponding author

Correspondence to Pavlos Nikolaidis.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


About this article

Cite this article

Nikolaidis, P. Wind power forecasting in distribution networks using non-parametric models and regression trees. Discov Energy 2, 6 (2022). https://doi.org/10.1007/s43937-022-00011-z


Received: 15 September 2022
Accepted: 08 October 2022
Published: 14 October 2022
DOI: https://doi.org/10.1007/s43937-022-00011-z
