Short-term load forecasting of power systems by gene expression programming
DOI: 10.1007/s00521-010-0444-y
- Cite this article as:
- Sadat Hosseini, S.S. & Gandomi, A.H. Neural Comput & Applic (2012) 21: 377. doi:10.1007/s00521-010-0444-y
Abstract
Short-term load forecasting is a popular topic in the electric power industry because it is essential to energy system planning and operation. Load forecasting is important in deregulated power systems, since an improvement of a few percent in prediction accuracy brings benefits worth millions of dollars. In this study, a promising variant of genetic programming, namely gene expression programming (GEP), is utilized to improve the accuracy and enhance the robustness of load forecasting results. With the GEP technique, accurate relationships were obtained to correlate the peak and total loads to the average, maximum and lowest temperatures of the day. The presented model is applied to forecast short-term load using actual data from a North American electric utility. A multiple least squares regression analysis was performed using the same variables and the same data sets to benchmark the GEP models. For further verification, a subsequent parametric study was also carried out. The observed agreement between the predicted and measured peak and total load values indicates that the proposed correlations are capable of effectively forecasting the short-term load. The GEP-based formulas are relatively short and simple, and they are particularly valuable as an analysis tool accessible to practicing engineers.
Keywords
Short-term load forecasting, Gene expression programming, Formulation
1 Introduction
Short-term load forecasting (STLF) is a vital instrument in power system planning, operation and control. Overestimating electricity demand leads to overly conservative operation, with too many units started up or excessive energy purchased. Conversely, underestimation may result in overly risky operation: spinning reserve is insufficiently prepared, and the system operates in a region that is vulnerable to disturbances. In the deregulated electricity market, an improvement of a few percent in load forecasting accuracy would bring benefits worth millions of dollars [1]. With the rise of deregulation and free competition in the electric power industry in many countries, load forecasting is becoming more important. Load forecasts are critical to energy transactions in competitive electricity markets [2]. Forecast errors have considerable implications for profits, market shares and, ultimately, shareholder value. System operators have to use data that are as reliable as possible, notably load forecast results, bearing in mind that uncertainty is a key issue in most decisions [3]. Besides, the load forecast is often a key input for electricity price forecasting [4]. However, the electric load is becoming increasingly difficult to forecast because of the variability and nonstationarity of load series, especially in electricity markets [5].
STLF is aimed at predicting a system load over an interval of 1 day or 1 week. Various methods for power system load forecasting have been proposed in the last few decades. Early methods included exponential smoothing [6], regression [7], Box–Jenkins models [8], Kalman filter [9], state space model [10], time series methods [10, 11] and then modified support vector machines [12].
With advances in computational software and hardware, several alternative computer-aided data mining approaches have been developed. The idea is that a pattern recognition system learns adaptively from experience and extracts various discriminators, each appropriate for its purpose. Recently, much research has been carried out on the application of artificial intelligence (AI) techniques to the load forecasting problem. AI-based techniques such as pattern recognition [13], expert systems [14–16], fuzzy expert systems [17], fuzzy time series [18], neural networks (NN) [10, 19, 20] and fuzzy-NNs [21, 22] have been proposed for load forecasting. Despite their acceptable performance, the AI techniques have some fundamental disadvantages that limit their use. They usually do not give an explicit function for calculating the outcome from the input values. Hence, they do not provide a better understanding of the nature of the derived relationship between the interrelated input and output data. Most AI techniques are appropriate for use as part of a computer program and are not suitable for practical hand calculations.
Another alternative AI-based approach, which relies on the data alone to determine the structure and parameters of the model, is known as genetic programming (GP) [23, 24]. GP may generally be defined as a supervised machine learning technique that searches a program space instead of a data space [24]. Many researchers have employed GP and its variants to discover complex relationships in experimental data [24–27]. Gene expression programming (GEP) [28] is a recent extension to GP that evolves computer programs of different sizes and shapes encoded in linear chromosomes of fixed length. The GEP chromosomes are composed of multiple genes, each gene encoding a smaller subprogram. Based on numerical experiments, the GEP approach is able to significantly outperform similar techniques and can be utilized as an efficient alternative to traditional GP [28, 29].
In this paper, the GEP approach is proposed for STLF. Generalized relationships were obtained to correlate the peak and total loads to the average, peak and lowest temperatures of the day. GEP is able to perform nonlinear modeling and adaptation. It does not require any functional relationship between load and weather variables to be assumed in advance. Applications of the GEP technique to STLF are very rare in the literature. A linear regression analysis was performed to benchmark the GEP-based correlations. The formulas evolved by GEP can reliably be employed in STLF by other researchers.
2 Genetic programming
2.1 Gene expression programming
An expression tree (ET) can inversely be converted into a K-expression by recording the nodes from left to right in each layer of the ET, from the root layer down to the deepest one, to form the string. As previously mentioned, GEP genes have a fixed length, which is predetermined for a given problem. Thus, what varies in GEP is not the length of the genes but the size of the corresponding ETs. This means that a gene may contain a certain number of redundant elements that are not used in the genome mapping. Hence, the valid length of a K-expression may be equal to or less than the length of the GEP gene. To guarantee the validity of a randomly selected genome, GEP employs a head–tail method. Each GEP gene is composed of a head and a tail. The head may contain both function and terminal symbols, whereas the tail may contain terminal symbols only [28].
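The mapping between a gene, its K-expression and the expression tree can be illustrated with a minimal sketch. The symbol set below (with `Q` for square root), the helper names and the example gene are illustrative assumptions and are not taken from the paper. The tail length follows the usual GEP rule t = h(n_max − 1) + 1, where h is the head size and n_max the largest function arity, which is what guarantees that every gene decodes to a valid tree.

```python
import math

# Illustrative symbol table (an assumption, not the paper's encoding):
# function symbols map to their arity; any other symbol is a terminal.
ARITY = {"+": 2, "-": 2, "*": 2, "/": 2, "Q": 1}  # "Q" stands for square root


def tail_length(head_size, max_arity=2):
    """Tail length that guarantees every gene decodes to a valid tree."""
    return head_size * (max_arity - 1) + 1


def decode(gene):
    """Read a gene as a K-expression and build its expression tree.

    Symbols are consumed left to right and attached level by level
    (breadth-first). Returns the root node and the number of symbols used,
    i.e. the valid length of the K-expression.
    """
    nodes = [(symbol, []) for symbol in gene]
    used = 1       # the root consumes the first symbol
    position = 0   # index of the node whose children are filled next
    while position < used:
        symbol, children = nodes[position]
        for _ in range(ARITY.get(symbol, 0)):
            children.append(nodes[used])
            used += 1
        position += 1
    return nodes[0], used


def evaluate(node, values):
    """Evaluate an expression tree for the given terminal values."""
    symbol, children = node
    if symbol not in ARITY:
        return values[symbol]
    args = [evaluate(child, values) for child in children]
    if symbol == "+":
        return args[0] + args[1]
    if symbol == "-":
        return args[0] - args[1]
    if symbol == "*":
        return args[0] * args[1]
    if symbol == "/":
        return args[0] / args[1]
    return math.sqrt(args[0])  # "Q"


# Head of 5 symbols followed by a tail of tail_length(5) == 6 terminals.
gene = "Q*+-a" + "bcdaab"
root, used = decode(gene)
print(used)  # 8: only "Q*+-abcd" is expressed, the last 3 symbols are redundant
print(evaluate(root, {"a": 3.0, "b": 1.0, "c": 6.0, "d": 2.0}))  # 4.0
```

The gene has 11 symbols, but only the first 8 are expressed as sqrt((a + b) × (c − d)). This is the sense in which the valid length of a K-expression can be shorter than the gene.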
3 Short-term load forecasting models
3.1 Model construction using GEP
- k
Day of predicted load
- PL (MW)
Peak load at day k \( \left( \max \{ L(1, k), \ldots, L(24, k) \} \right) \)
- TL (MW)
Total load at day k \( \left( \sum\nolimits_{h = 1}^{24} L(h, k) \right) \)
- T_{1} (°F)
Average temperature at day k
- T_{2} (°F)
Peak temperature at day k
- T_{3} (°F)
Lowest temperature at day k
- L(h, k)
Load at hour h on day k.
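As a small sketch of the two output definitions above, the peak load PL and the total load TL of a day can be derived from its 24 hourly loads L(h, k). The function name and the sample values are hypothetical and serve only as an illustration.

```python
def daily_peak_and_total(hourly_load):
    """Return (PL, TL) for one day from its 24 hourly loads L(1, k)..L(24, k)."""
    if len(hourly_load) != 24:
        raise ValueError("expected exactly 24 hourly load values")
    peak_load = max(hourly_load)   # PL = max over h of L(h, k)
    total_load = sum(hourly_load)  # TL = sum over h of L(h, k)
    return peak_load, total_load


# Made-up hourly loads in MW, not taken from the utility data set.
sample_day = [1800.0] * 8 + [2400.0] * 10 + [2100.0] * 6
pl, tl = daily_peak_and_total(sample_day)
print(pl, tl)
```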
Parameter settings for the GEP algorithm
Parameters | Settings |
---|---|
Number of generations | 20,000–120,000 |
Number of chromosomes | 100 |
Number of genes | 1, 2, 3 |
Head size | 3, 5, 8 |
Linking function | ×, + |
Fitness function error type | MAE |
Mutation rate | 0.044 |
Inversion rate | 0.1 |
One-point recombination rate | 0.3, 0.5 |
Two-point recombination rate | 0.3, 0.5 |
Gene recombination rate | 0.1 |
Gene transposition rate | 0.1 |
Function set | +, −, ×, /, √, log, sin, cos, tan, exp |
- (i)
Providing the best fitness value on the training set of data
- (ii)
Providing the best fitness value on a test set of unseen data.
3.2 Model construction using regression analysis
In the conventional modeling process, regression analysis is an important tool for building a model. In this study, a multivariable least squares regression (MLSR) [34] analysis was performed to gauge the predictive power of the GEP technique in comparison with a classical statistical approach. The least squares regression (LSR) method is extensively used in regression analysis, largely because of its appealing properties. LSR minimizes the sum of squared residuals for each equation, accounting for any cross-equation restrictions on the parameters of the system. If there are no such restrictions, this technique is identical to estimating each equation using single-equation ordinary least squares. The LSR models were developed using the same input variables as GEP. The EViews software package [35] was used to perform the regression analysis.
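For the unrestricted single-equation case, the least squares estimate can be sketched in a few lines. This is a minimal illustration that solves the normal equations (XᵀX)b = Xᵀy directly. It is not the EViews procedure used in the study, and the function name, the temperature triples and the coefficients in the example are invented for demonstration.

```python
def fit_least_squares(rows, targets):
    """Ordinary least squares with an intercept.

    rows    : list of input tuples, e.g. (T1, T2, T3)
    targets : list of outputs, e.g. peak loads
    Returns [b0, b1, ..., bp] minimizing the sum of squared residuals.
    """
    design = [[1.0] + [float(v) for v in row] for row in rows]
    p = len(design[0])
    # Augmented normal equations [X^T X | X^T y].
    aug = [
        [sum(x[i] * x[j] for x in design) for j in range(p)]
        + [sum(x[i] * y for x, y in zip(design, targets))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("inputs are collinear; no unique solution")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(p):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][p] / aug[i][i] for i in range(p)]


# Synthetic example: loads generated from an invented linear rule.
temps = [(50, 60, 40), (55, 70, 45), (60, 65, 50),
         (45, 50, 42), (70, 85, 60), (35, 40, 30)]
loads = [3000 - 10 * t1 + 2 * t2 - 5 * t3 for t1, t2, t3 in temps]
print(fit_least_squares(temps, loads))  # approximately [3000, -10, 2, -5]
```

Solving the normal equations directly keeps the sketch dependency-free. For ill-conditioned inputs, a QR- or SVD-based solver from a numerical library would be the more robust choice.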
3.3 Model development using GRNN
Artificial neural networks (ANNs) emerged from attempts to simulate the biological nervous system. The ANN method was founded in the early 1940s by McCulloch and co-workers [36]. Early research focused on building simple neural networks to model simple logic functions. The generalized regression neural network (GRNN) is a class of ANN architectures proposed by Specht [37]. GRNN is a variant of the radial basis function (RBF) network.
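The core of a GRNN prediction can be sketched as a kernel-weighted average of the training targets, with one Gaussian radial basis unit per training sample. The smoothing parameter `sigma`, the function name and the toy data below are assumptions made for illustration. The paper's actual GRNN settings are not given in this section.

```python
import math


def grnn_predict(x, train_x, train_y, sigma):
    """Predict the output at x as a Gaussian-weighted mean of training targets.

    x       : input tuple, e.g. (T1, T2, T3)
    train_x : list of training input tuples
    train_y : list of training outputs
    sigma   : smoothing parameter (kernel width), assumed > 0
    """
    weights = []
    for sample in train_x:
        squared_distance = sum((a - b) ** 2 for a, b in zip(x, sample))
        weights.append(math.exp(-squared_distance / (2.0 * sigma ** 2)))
    total_weight = sum(weights)
    if total_weight == 0.0:
        raise ValueError("x is too far from all training samples for this sigma")
    return sum(w * y for w, y in zip(weights, train_y)) / total_weight


# Toy one-input example with invented values.
train_x = [(40.0,), (60.0,)]
train_y = [2000.0, 3000.0]
print(grnn_predict((50.0,), train_x, train_y, sigma=5.0))  # midway between targets
```

A small `sigma` makes the prediction follow the nearest training samples closely, while a large `sigma` smooths it toward the overall mean of the targets.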
3.4 Performance measures
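As a sketch of the three measures reported for the models (R, MSE and MAE), the following assumes the conventional definitions: R is the Pearson correlation coefficient between measured and predicted values, MSE is the mean squared error and MAE is the mean absolute error. The function name and the sample values are hypothetical.

```python
import math


def performance(measured, predicted):
    """Return (R, MSE, MAE) for paired measured and predicted values."""
    n = len(measured)
    if n == 0 or n != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    covariance = sum((m - mean_m) * (p - mean_p)
                     for m, p in zip(measured, predicted))
    spread_m = sum((m - mean_m) ** 2 for m in measured)
    spread_p = sum((p - mean_p) ** 2 for p in predicted)
    r = covariance / math.sqrt(spread_m * spread_p)
    mse = sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n
    mae = sum(abs(m - p) for m, p in zip(measured, predicted)) / n
    return r, mse, mae


# Invented peak loads in MW, for illustration only.
r, mse, mae = performance([2000.0, 2500.0, 3000.0], [2100.0, 2400.0, 3100.0])
print(r, mse, mae)
```

Note that MSE is expressed in squared load units (MW²), whereas MAE stays in MW, so the two are not directly comparable in magnitude.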
3.5 Database
As in other research areas, it is important in STLF to allow one’s results to be reproduced. The only way to do so is to use public domain data sets. Our test case includes hourly load and temperature data from a North American electric utility in 1988, which can be found at https://www.ee.washington.edu/class/559/00_archived/. These data sets were also considered in [39]. Typically, the electric load behaves differently on each day of the week. On weekdays, the behavior tends to be similar, while on Saturdays and Sundays, it is quite different. When the day of the week is used as an input to a forecasting model, it may influence the results. Thus, our focus is on normal weekdays (i.e., no holidays or weekends).
Descriptive statistics of the variables used in the model development
Parameter | T_{1} (°F), input | T_{2} (°F), input | T_{3} (°F), input | PL (MW), output | TL (MW), output |
---|---|---|---|---|---|
Mean | 53.26 | 61.72 | 45.96 | 2409.67 | 46321.04 |
Standard error | 0.62 | 0.80 | 0.50 | 27.56 | 477.65 |
Median | 53.5 | 61 | 47 | 2372 | 43755 |
Standard deviation | 9.83 | 12.73 | 7.95 | 436.59 | 7567.45 |
Sample variance | 96.61 | 162.10 | 63.13 | 190610.60 | 57266312.64 |
Kurtosis | −0.51 | −0.71 | −0.24 | −0.82 | −0.32 |
Skewness | −0.09 | 0.18 | −0.55 | 0.53 | 0.87 |
Range | 45.79 | 56 | 37 | 1680 | 28868 |
Minimum | 30.79 | 36 | 25 | 1858 | 37499 |
Maximum | 76.58 | 92 | 62 | 3538 | 66367 |
Sum | 13368.17 | 15491 | 11536 | 604828 | 11626580 |
4 Results
4.1 GEP-based formulation for peak and total loads
4.2 MLSR-based formulations for peak load
4.3 GRNN-based model for peak and total loads
5 Discussion
Overall performance of the proposed models for peak and total load forecasting
Model | Train R | Train MSE | Train MAE | Test R | Test MSE | Test MAE |
---|---|---|---|---|---|---|
GEP (8) (PL) | 0.942877 | 23553.12874 | 125.075788 | 0.946252 | 22664.06939 | 111.562959 |
GRNN (PL) | 0.94447 | 1289428.997 | 128.9428997 | 0.94939 | 1251980.342 | 125.1980342 |
MLSR (10) (PL) | 0.934286 | 23990.19471 | 126.3462496 | 0.94939 | 18929.99523 | 110.3804682 |
GEP (9) (TL) | 0.944943 | 6226994.492 | 2046.788505 | 0.964702 | 4153219.58 | 1583.486015 |
GRNN (TL) | 0.969267 | 3427741.533 | 1343.240354 | 0.968649 | 3796184.405 | 1513.190926 |
MLSR (11) (TL) | 0.910624 | 9656423.898 | 2593.723413 | 0.93256 | 7771130.247 | 2113.406441 |
6 Parametric analysis
7 Conclusions
- (i)
It was observed that the GEP-based correlations are capable of predicting the peak and total loads with high accuracy. The proposed nonlinear GEP models produced considerably better outcomes than the linear regression-based models.
- (ii)
The proposed models simultaneously take into account the role of several important factors representing the load behavior.
- (iii)
Unlike the traditional methods, GEP does not require any simplifying assumptions in developing the models.
- (iv)
In addition to the acceptable accuracy, the proposed GEP-based formulas are relatively short and simple.
- (v)
The sensitivity of the proposed correlations to the variation of the influencing parameters was evaluated through a parametric study. In most cases, the peak and total loads decrease as the average, maximum and lowest temperatures of the day increase.
- (vi)
The proposed GEP-based correlations give the user an insight into the relationship between input and output data. An interesting feature of the GEP approach is the possibility of obtaining more than one correlation for a complex phenomenon by varying the parameters and function sets involved in its predictive algorithm.
- (vii)
As more data become available, including those for other years or conditions, the proposed models can be improved to make more accurate predictions for a wider range.
The above conclusions confirm the efficiency of the developed models and their reliable application to short-term load forecasting. GEP is quite robust in modeling nonlinear relationships. However, the underlying assumption that the input parameters are reliable does not always hold. Since fuzzy logic provides a systematic method for dealing with imprecise and incomplete information, the development of a hybrid fuzzy–GEP model for such problems is a suitable topic for further study.
Acknowledgments
The authors are thankful to Prof. Otávio A.S. Carpinteiro (Federal University of Itajubá) for his support and for providing the database.