Neural Computing and Applications, Volume 21, Issue 2, pp 377–389

Short-term load forecasting of power systems by gene expression programming


  • Seyyed Soheil Sadat Hosseini
    • Tafresh University
  • Amir H. Gandomi
    • Tafresh University
Original Article

DOI: 10.1007/s00521-010-0444-y

Cite this article as:
Sadat Hosseini, S.S. & Gandomi, A.H. Neural Comput & Applic (2012) 21: 377. doi:10.1007/s00521-010-0444-y


Short-term load forecasting is a popular topic in the electric power industry due to its essential role in energy system planning and operation. Load forecasting is important in deregulated power systems, since an improvement of a few percent in prediction accuracy brings benefits worth millions of dollars. In this study, a promising variant of genetic programming, namely gene expression programming (GEP), is utilized to improve the accuracy and enhance the robustness of load forecasting results. Using the GEP technique, accurate relationships were obtained to correlate the peak and total loads to the average, maximum and minimum daily temperatures. The presented model is applied to forecast short-term load using actual data from a North American electric utility. A multiple least squares regression analysis was performed using the same variables and the same data sets to benchmark the GEP models. For further verification, a subsequent parametric study was also carried out. The observed agreement between the predicted and measured peak and total load values indicates that the proposed correlations are capable of effectively forecasting the short-term load. The GEP-based formulas are relatively short, simple and particularly valuable for providing an analysis tool accessible to practicing engineers.


Keywords: Short-term load forecasting · Gene expression programming · Formulation

1 Introduction

Short-term load forecasting (STLF) is a vital instrument in power system planning, operation and control. Overestimation of electricity demand causes overly conservative operation, leading to the start-up of too many units or excessive energy purchases. Conversely, underestimation may result in overly risky operation, with insufficient preparation of spinning reserve, and causes the system to operate in a region vulnerable to disturbances. In the deregulated electricity market, an improvement of a few percent in load forecasting accuracy would bring benefits worth millions of dollars [1]. With the rise of deregulation and free competition in the electric power industry in many countries, load forecasting has become even more important. Load forecasts are critical to energy transactions in competitive electricity markets [2]. Forecast errors have considerable implications for profits, market shares and, ultimately, shareholder value. System operators have to use data that are as reliable as possible, in particular load forecast results, keeping in mind that uncertainty is a key issue in most decisions [3]. Besides, the load forecast is often a key input to electricity price forecasting [4]. However, the electric load is becoming increasingly difficult to forecast due to the variability and nonstationarity of load series, especially in electricity markets [5].

STLF aims at predicting the system load over an interval of one day to one week. Various methods for power system load forecasting have been proposed in the last few decades. Early methods included exponential smoothing [6], regression [7], Box–Jenkins models [8], the Kalman filter [9], state space models [10] and time series methods [10, 11], followed more recently by modified support vector machines [12].

By extending developments in computational software and hardware, several alternative computer-aided data mining approaches have been developed. The idea is that a pattern recognition system learns adaptively from experience and extracts various discriminators, each appropriate for its purpose. Recently, much research has been carried out on the application of artificial intelligence (AI) techniques to the load forecasting problem. AI-based techniques such as pattern recognition [13], expert systems [14–16], fuzzy expert systems [17], fuzzy time series [18], neural networks (NN) [10, 19, 20] and fuzzy-NNs [21, 22] have been proposed for load forecasting. Despite their acceptable performance, the AI techniques have some fundamental disadvantages that limit their use. They usually do not provide an explicit function for calculating the outcome from the input values. Hence, they do not offer a better understanding of the nature of the derived relationship between the interrelated input and output data. Most of the AI techniques are appropriate for use as part of a computer program and are not suitable for practical hand calculations.

Another alternative AI-based approach, which relies on the data alone to determine both the structure and the parameters of the model, is genetic programming (GP) [23, 24]. GP may generally be defined as a supervised machine learning technique that searches a program space instead of a data space [24]. Many researchers have employed GP and its variants to find complex relationships within experimental data [24–27]. Gene expression programming (GEP) [28] is a recent extension to GP that evolves computer programs of different sizes and shapes encoded in linear chromosomes of fixed length. The GEP chromosomes are composed of multiple genes, each gene encoding a smaller subprogram. Numerical experiments indicate that the GEP approach can significantly outperform similar techniques and can be utilized as an efficient alternative to traditional GP [28, 29].

In this paper, the GEP approach is proposed for STLF. Generalized relationships were obtained to correlate the peak and total loads to the average, peak and lowest temperatures. GEP is able to perform nonlinear modeling and adaptation and does not require any functional relationship between load and weather variables to be assumed in advance. Applications of the GEP technique to STLF are conspicuously near absent from the literature. A linear regression analysis was performed to benchmark the GEP-based correlations. The formulas evolved by GEP can reliably be employed for STLF by other researchers.

2 Genetic programming

GP is a symbolic optimization technique that creates computer programs to solve a problem using the principle of Darwinian natural selection. GP was introduced by Koza [23] as an extension of genetic algorithms (GAs). In GP, a random population of individuals (trees) is created to achieve high diversity. While common optimization techniques represent the potential solutions as numbers (vectors of real numbers), the symbolic optimization algorithms present the potential solutions by structural ordering of several symbols. A population member in GP is a hierarchically structured tree comprising functions and terminals. The functions and terminals are selected from a set of functions and a set of terminals. For example, function set F can contain the basic arithmetic operations (+, −, ×, /, etc.), Boolean logic functions (AND, OR, NOT, etc.) or any other mathematical functions. The terminal set T contains the arguments for the functions and can consist of numerical constants, logical constants, variables, etc. The functions and terminals are chosen at random and constructed together to form a computer model in a tree-like structure with a root point with branches extending from each function and ending in a terminal. An example of a simple tree representation of a GP model is illustrated in Fig. 1.
Fig. 1

The tree representation of a GP model, (X1 + 3/X2)²

The creation of the initial population is a blind random search for solutions in the large space of possible solutions. Once a population of models has been created at random, the GP algorithm evaluates the individuals, selects individuals for reproduction, generates new individuals by mutation, crossover and direct reproduction and finally creates the new generation in all iterations [23]. During the crossover procedure, a point on a branch of each solution (program) is selected at random and the set of terminals and/or functions from each program are then swapped to create two new programs as can be seen in Fig. 2. The evolutionary process continues by evaluating the fitness of the new population and starting a new round of reproduction and crossover. During this process, the GP algorithm occasionally selects a function or terminal from a model at random and mutates it (see Fig. 3). GEP is a linear variant of GP. The linear variants of GP make a clear distinction between the genotype and the phenotype of an individual. Thus, the individuals are represented as linear strings that are decoded and expressed like nonlinear entities (trees) [24, 30].
Fig. 2

Typical crossover operation in genetic programming
Fig. 3

Typical mutation operation in genetic programming
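The tree of Fig. 1 can be sketched in a few lines of Python (an illustrative sketch, not the authors' implementation; node and function names are chosen here for clarity):

```python
# A minimal GP-style expression tree. Internal nodes are
# (function, children) tuples; leaves are variable names or constants.

def evaluate(node, env):
    """Recursively evaluate a tree against a variable environment."""
    if isinstance(node, (int, float)):
        return float(node)
    if isinstance(node, str):
        return env[node]
    func, children = node
    args = [evaluate(c, env) for c in children]
    ops = {"+": lambda a, b: a + b,
           "-": lambda a, b: a - b,
           "*": lambda a, b: a * b,
           "/": lambda a, b: a / b,
           "pow": lambda a, b: a ** b}
    return ops[func](*args)

# The model of Fig. 1, read here as (X1 + 3/X2)^2.
tree = ("pow", [("+", ["X1", ("/", [3, "X2"])]), 2])

print(evaluate(tree, {"X1": 1.0, "X2": 3.0}))  # (1 + 3/3)^2 = 4.0
```

Crossover and mutation (Figs. 2, 3) then amount to swapping or replacing subtrees in such structures.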

2.1 Gene expression programming

GEP is a natural development of GP first invented by Ferreira [29]. Most of the genetic operators used in GAs can also be implemented in GEP with minor changes. GEP consists of five main components: the function set, terminal set, fitness function, control parameters and termination condition. Unlike the parse-tree representation in conventional GP, GEP uses fixed-length character strings to represent solutions to the problems, which are afterward expressed as parse trees of different sizes and shapes. These trees are called GEP expression trees (ETs). One advantage of the GEP technique is that the creation of genetic diversity is extremely simplified, as the genetic operators work at the chromosome level. Another strength of GEP is its unique, multigenic nature, which allows the evolution of more complex programs composed of several subprograms. Each GEP gene contains a fixed-length list of symbols that can be any element from a function set such as {+, −, ×, /, Sqrt} and a terminal set such as {x1, x2, x3, 2}. The function set and terminal set must have the closure property: each function must be able to take any value of the data type that can be returned by a function or assumed by a terminal. A typical GEP gene with the given function and terminal sets can be as follows:
$$ \underline{{ + . \times . + .\,x_{1} . - .}} + . + . \times .\,x_{2} .\,x_{1} .\,x_{3} .\,3.\,x_{2} .\,x_{3} $$
where x1, x2 and x3 are variables and 3 is a constant; ‘‘.’’ is element separator for easy reading. The above expression is termed as Karva notation or K-expression [28, 31]. A K-expression can be represented by a diagram which is an ET. For example, the above sample gene can be expressed as Fig. 4.
Fig. 4

Example of expression trees (ETs)

The conversion starts from the first position in the K-expression, which corresponds to the root of the ET, and reads through the string one by one. The above GEP gene can also be expressed in a mathematical form as:
$$ x_{1} \times ((x_{1} + 3) - (x_{2} \times x_{3} )) + (x_{2} + x_{1} ). $$

An ET can inversely be converted into a K-expression by recording the nodes from left to right in each layer of the ET, from the root layer down to the deepest one, to form the string. As previously mentioned, GEP genes have a fixed length, which is predetermined for a given problem. Thus, what varies in GEP is not the length of the genes but the size of the corresponding ETs. This means that there may exist a certain number of redundant elements that are not used in the genome mapping. Hence, the valid length of a K-expression may be equal to or less than the length of the GEP gene. To guarantee the validity of a randomly selected genome, GEP employs a head–tail method: each GEP gene is composed of a head, which may contain both function and terminal symbols, and a tail, which may contain terminal symbols only [28].

The GEP algorithm begins with the random generation of the fixed-length chromosome of each individual for the initial population. Afterward, the chromosomes are expressed, and the fitness of each individual is evaluated. The individuals are then selected according to their fitness to reproduce with modification. The individuals of this new generation are subjected to the same developmental process: expression of the genomes, confrontation of the selection environment and reproduction with modification. The above process is repeated for a definite number of generations or until a solution has been found [28]. A basic representation of the GEP algorithm is presented in Fig. 5. In GEP, the individuals are selected and copied into the next generation according to the fitness by roulette-wheel sampling with elitism. This guarantees the survival and cloning of the best individual to the next generation. Variation in the population is introduced by conducting single or several genetic operators on selected chromosomes, which include crossover, mutation and rotation. The rotation operator is used to rotate two subparts of element sequence in a genome with respect to a randomly chosen point. It can also drastically reshape the ETs. As an example, the following gene
$$ + . + . \times .\,x_{2} .\,x_{1} .\,x_{3} .\,3.\,x_{2} .\,x_{3} .\underline{{ + . \times . + .\,x_{1} . - }} $$
rotates the first five elements of gene (4) to the end. Only the first seven elements are used to construct the solution function (x2 + x1) + (x3 × 3), with the corresponding expression shown in Fig. 6.
Fig. 5

A basic representation of the GEP algorithm
Fig. 6

Example of expression trees (ETs)
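The rotation example above can be checked with a short sketch (illustrative only, not the authors' code; `*` stands for ×). Decoding the rotated gene breadth-first consumes only its valid prefix and reproduces the solution (x2 + x1) + (x3 × 3):

```python
# Breadth-first decoding of a Karva string into an expression tree,
# plus the rotation operator described above.
ARITY = {"+": 2, "-": 2, "*": 2}  # terminals have arity 0

def decode(gene):
    """Decode a K-expression layer by layer; only the valid prefix
    of the gene is consumed (redundant tail elements are ignored)."""
    root = [gene[0], []]
    queue, pos = [root], 1
    while queue:
        parent = queue.pop(0)
        for _ in range(ARITY.get(parent[0], 0)):
            child = [gene[pos], []]
            pos += 1
            parent[1].append(child)
            queue.append(child)
    return root

def to_infix(node):
    sym, children = node
    if not children:
        return sym
    return "(" + to_infix(children[0]) + sym + to_infix(children[1]) + ")"

gene = ["+", "*", "+", "x1", "-", "+", "+", "*",
        "x2", "x1", "x3", "3", "x2", "x3"]
rotated = gene[5:] + gene[:5]      # rotate the first five elements to the end
print(to_infix(decode(rotated)))   # ((x2+x1)+(x3*3))
```

Only the first seven symbols of the rotated gene are consumed, matching the text's observation that the remaining elements are redundant.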

3 Short-term load forecasting models

3.1 Model construction using GEP

In this study, two STLF models are established by GEP in order to improve the automation, intelligence and precision of load forecasting. A large number of influencing factors can be considered in STLF, such as meteorological, climatic and seasonal factors. However, not all of these factors are necessary, because some are relevant and others are not; a conclusion can be deduced from only a few factors, and considering all of them would compromise the accuracy of the prediction scheme. In the STLF problem, it is not simple to identify a relationship between the parameters, or the problem may be too complex to be described by a single mathematical function. GEP has the ability to generate the best computer program to describe the relationship between the input and output. Consequently, the peak and total load formulations were considered to be as follows:
$$ {\text{PL}},\;{\text{TL}} = f(T_{1} ,\;T_{2} ,\;T_{3} ) $$

where:

k: Day of predicted load

PL: Peak load at day k (Max{L(1, k),…, L(24, k)})

TL: Total load at day k \( \left( {\sum\nolimits_{h = 1}^{24} {L(h,\,k)} } \right) \)

T1 (°F): Average temperature at day k

T2 (°F): Peak temperature at day k

T3 (°F): Lowest temperature at day k

L(h, k): Load at hour h on day k.

Two different GEP-based models were developed for PL and TL. In order to find the optimal STLF model, appropriate parameters of the GEP evolution must be selected. The various parameters involved in the GEP predictive algorithm are shown in Table 1. The parameter selection affects the generalization capability of the GEP models. The parameters were selected based on previously suggested values [24] and refined through a trial-and-error approach.
Table 1

Parameter settings for the GEP algorithm

Number of generations

Number of chromosomes

Number of genes: 1, 2, 3

Head size: 3, 5, 8

Linking function: ×, +

Fitness function error type

Mutation rate

Inversion rate

One-point recombination rate: 0.3, 0.5

Two-point recombination rate: 0.3, 0.5

Gene recombination rate

Gene transposition rate

Function set: +, −, ×, /, √, log, sin, cos, tan, exp
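The function set above includes operators (/, √, log) that are undefined for some inputs. The closure property mentioned in Sect. 2.1 is commonly satisfied through protected operators; the sketch below shows one common convention, not necessarily the exact scheme used by GeneXproTools:

```python
import math

# Protected variants of partial functions from the Table 1 function
# set: each returns a finite value for any real argument.
def pdiv(a, b):
    """Protected division: return 1.0 when the divisor is ~zero."""
    return a / b if abs(b) > 1e-12 else 1.0

def psqrt(a):
    """Protected square root: operate on the absolute value."""
    return math.sqrt(abs(a))

def plog(a):
    """Protected natural log: |a|, with 0.0 for a ~zero argument."""
    return math.log(abs(a)) if abs(a) > 1e-12 else 0.0

print(pdiv(1.0, 0.0), psqrt(-4.0), plog(0.0))  # 1.0 2.0 0.0
```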

For developing the GEP-based empirical models, a computer software called GeneXproTools [32, 33] was used. The best GEP models were chosen on the basis of a multi-objective strategy as below:

  (i) Providing the best fitness value on the training set of data

  (ii) Providing the best fitness value on a test set of unseen data.


3.2 Model construction using regression analysis

In a conventional modeling process, regression analysis is an important tool for building a model. In this study, a multivariable least squares regression (MLSR) [34] analysis was performed to gauge the predictive power of the GEP technique in comparison with a classical statistical approach. The LSR method is extensively used in regression analysis, primarily because of its simplicity and well-understood properties. LSR minimizes the sum of squared residuals for each equation, accounting for any cross-equation restrictions on the parameters of the system. If there are no such restrictions, this technique is identical to estimating each equation by single-equation ordinary least squares. The LSR models were developed using the same input variables as GEP. The Eviews software package [35] was used to perform the regression analysis.
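The MLSR fit can be reproduced in a few lines with a least-squares solve. The data below are synthetic placeholders standing in for the 251 daily utility records, and the "true" coefficients are assumed for the demonstration only:

```python
import numpy as np

# Fit PL = a*T1 + b*T2 + c*T3 + d by ordinary least squares.
rng = np.random.default_rng(0)
T = rng.uniform(20.0, 90.0, size=(251, 3))          # T1, T2, T3 in deg F
pl = 4700.0 - 13.0 * T[:, 0] - 0.5 * T[:, 1] - 35.0 * T[:, 2] \
     + rng.normal(0.0, 10.0, 251)                   # noisy linear target

X = np.column_stack([T, np.ones(len(T))])           # append intercept column
coef, *_ = np.linalg.lstsq(X, pl, rcond=None)
print(coef)  # close to [-13.0, -0.5, -35.0, 4700.0]
```

With real data, the same call applied to (T1, T2, T3, PL) records yields coefficients of the form reported in Sect. 4.2.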

3.3 Model development using GRNN

ANNs emerged from attempts to simulate the biological nervous system. The ANN method was founded in the early 1940s by McCulloch and co-workers [36]. Early research focused on building simple neural networks to model simple logic functions. The generalized regression neural network (GRNN) is a class of ANN architectures proposed by Specht [37]; it is a variant of the radial basis function (RBF) network.

The available database was used for developing the GRNN prediction model. For the development of GRNN model, a script was written in the MATLAB environment using Neural Network Toolbox 5.1 [38]. For the GRNN analysis, the available data sets were randomly divided into learning, validation and testing subsets. The learning data were used for the training of the algorithm. The validation data were used to specify the generalization capability of the obtained models on data they did not train on (model selection). In other words, the learning and validation data sets were used to select the best models and were included in the training process. Thus, they were categorized into one group referred to as training data. The testing data were finally used to measure the performance of the optimal model obtained by GRNN on data that played no role in building the model. In order to avoid overtraining, the spread of the radial basis functions (spread constant) was changed in a way that error of the validation data became close to error of the learning data sets. If a smaller spread constant was selected, the output of models would completely fit on the learning data, but the generalization ability of the models might be decreased. This procedure is shown in Figs. 7 and 8 for the peak and total loads. The training and testing data sets were the same as those used for developing the other models. The optimal GRNN model had three layers: three input units in the input layer, a hidden layer with 50 neurons (equal to the number of the learning data) and an output layer.
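A GRNN is a normalized radial-basis estimator whose only free parameter is the spread of the basis functions, so the validation procedure above reduces to a one-dimensional search. The sketch below uses synthetic stand-ins for the learning/validation subsets (it is illustrative, not the MATLAB model of the paper):

```python
import numpy as np

# Minimal GRNN (Specht): kernel-weighted average of training targets.
def grnn_predict(X_train, y_train, X_query, spread):
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(-1)
    w = np.exp(-d2 / (2.0 * spread ** 2))
    return (w @ y_train) / w.sum(axis=1)

rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(60, 3))
y = X.sum(axis=1) + rng.normal(0.0, 0.05, 60)
X_learn, y_learn = X[:50], y[:50]       # 50 learning points (cf. Sect. 3.3)
X_val, y_val = X[50:], y[50:]           # held-out validation points

def mse(s, Xq, yq):
    return np.mean((grnn_predict(X_learn, y_learn, Xq, s) - yq) ** 2)

# Pick the spread whose validation error is closest to the learning
# error, mirroring the overtraining guard described above.
best = min(np.linspace(0.05, 1.0, 20),
           key=lambda s: abs(mse(s, X_val, y_val) - mse(s, X_learn, y_learn)))
print(best)
```

A very small spread drives the learning error toward zero while the validation error grows, which is exactly the overfitting behavior Figs. 7 and 8 illustrate.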
Fig. 7

Results of peak load prediction using GRNN for different spread of radial basis functions
Fig. 8

Results of total load prediction using GRNN for different spread of radial basis functions

3.4 Performance measures

The correlation coefficient (R), mean square error (MSE) and mean absolute error (MAE) were used to evaluate the performance of the proposed models. R, MSE and MAE are given by:
$$ R = \frac{\sum\nolimits_{i = 1}^{n} (h_{i} - \bar{h})(t_{i} - \bar{t})}{\sqrt{\sum\nolimits_{i = 1}^{n} (h_{i} - \bar{h})^{2} \sum\nolimits_{i = 1}^{n} (t_{i} - \bar{t})^{2}}} $$
$$ {\text{MSE}} = \frac{\sum\nolimits_{i = 1}^{n} (h_{i} - t_{i})^{2}}{n} $$
$$ {\text{MAE}} = \frac{\sum\nolimits_{i = 1}^{n} \left| h_{i} - t_{i} \right|}{n} $$
where hi and ti are, respectively, the actual and calculated outputs for the ith sample, \( \bar{h} \) and \( \bar{t} \) are the averages of the actual and calculated outputs, and n is the number of samples.
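The three measures can be written out directly (h for actual values, t for predictions, with MSE and MAE taken over the deviations between actual and predicted values):

```python
import math

# Performance measures of Sect. 3.4.
def r_mse_mae(h, t):
    n = len(h)
    hbar, tbar = sum(h) / n, sum(t) / n
    num = sum((hi - hbar) * (ti - tbar) for hi, ti in zip(h, t))
    den = math.sqrt(sum((hi - hbar) ** 2 for hi in h)
                    * sum((ti - tbar) ** 2 for ti in t))
    r = num / den                                    # correlation coefficient
    mse = sum((hi - ti) ** 2 for hi, ti in zip(h, t)) / n
    mae = sum(abs(hi - ti) for hi, ti in zip(h, t)) / n
    return r, mse, mae

print(r_mse_mae([1.0, 2.0, 3.0], [1.1, 1.9, 3.2]))
```

Note that R measures linear association only: a prediction that is a scaled copy of the target scores R = 1 while still having nonzero MSE and MAE, which is why all three measures are reported together.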

3.5 Database

As in other research areas, in STLF it is important to allow the reproduction of one's results. The only way of doing so is to use public-domain data sets. Our test case includes hourly load and temperature data from a North American electric utility in 1988; these data sets were also considered in [39]. Typically, the electric load shows a different type of behavior for each day of the week. On weekdays, the behavior tends to be similar, while on Saturdays and Sundays it is quite different. When used as input to a forecasting model, the information related to the day of the week may influence the results. Thus, our focus is on normal weekdays (i.e., no holidays or weekends).

For the GEP analysis, the database was randomly divided into training and testing subsets. In order to obtain a consistent data division, several combinations of the training and testing sets were considered. The selection was such that the maximum, minimum, mean and standard deviation of the parameters were consistent between the training and testing data sets. Of the 251 records, 201 (80%) were used for training and 50 (20%) for testing the generalization capability of the models. The descriptive statistics of the data used in this study are given in Table 2. To visualize the samples' distribution, the data are presented as frequency histograms (Fig. 9).
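The consistent-division strategy can be sketched as follows: draw several candidate 80/20 splits and keep the one whose training and testing summary statistics agree best. The records below are synthetic placeholders for the utility data, and the mismatch score is an assumed simplification (mean and standard deviation only):

```python
import random
import statistics

random.seed(42)
records = [random.uniform(30.0, 90.0) for _ in range(251)]  # placeholder data

def split_mismatch(train, test):
    """Disagreement between train/test summary statistics."""
    return (abs(statistics.mean(train) - statistics.mean(test))
            + abs(statistics.stdev(train) - statistics.stdev(test)))

best_split, best_score = None, float("inf")
for _ in range(20):                                  # candidate divisions
    shuffled = random.sample(records, len(records))
    train, test = shuffled[:201], shuffled[201:]     # 201 train / 50 test
    score = split_mismatch(train, test)
    if score < best_score:
        best_split, best_score = (train, test), score

print(len(best_split[0]), len(best_split[1]), round(best_score, 3))
```

The full procedure would compare all four statistics (maximum, minimum, mean, standard deviation) for every input and output variable, but the structure is the same.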
Table 2

Descriptive statistics of the variables used in the model development (standard error, standard deviation and sample variance of T1 (°F), T2 (°F) and T3 (°F), among other statistics)
Fig. 9

The histograms of input and output variables

4 Results

4.1 GEP-based formulation for peak and total loads

Formulations of the peak load (PL) and total load (TL) for the best result by the GEP algorithm are as given below:
$$ {\text{PL}}({\text{MW}}) = {\frac{729}{{T_{1} - T_{3} }}} - \sin (T_{2} )(T_{2} - T_{3} ) + (\ln (T_{3} - 6) - 65)^{2} - \ln (2T_{3} )^{0.5} (20T_{3} - 200) $$
$$ {\text{TL}}({\text{MW}}) = T_{1} T_{2} - 8T_{3} \cos (T_{2} ) - 793T_{1} - \sin \left( {{\frac{{T_{1} }}{7}}} \right)(T_{2} + 7) \cdot (T_{3} + 7) + 86943 $$
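The two formulas transcribe directly into code. Two readings are assumed here: the term ln(2T3)^0.5 is taken as the square root of ln(2T3), and the trigonometric terms take the Fahrenheit values as raw (radian) arguments, as GEP-evolved formulas typically use the inputs as-is:

```python
import math

# Peak-load and total-load formulas evolved by GEP (Sect. 4.1),
# valid for T3 > 6 deg F and T1 != T3 under the reading above.
def peak_load(T1, T2, T3):
    return (729.0 / (T1 - T3)
            - math.sin(T2) * (T2 - T3)
            + (math.log(T3 - 6.0) - 65.0) ** 2
            - math.sqrt(math.log(2.0 * T3)) * (20.0 * T3 - 200.0))

def total_load(T1, T2, T3):
    return (T1 * T2
            - 8.0 * T3 * math.cos(T2)
            - 793.0 * T1
            - math.sin(T1 / 7.0) * (T2 + 7.0) * (T3 + 7.0)
            + 86943.0)

print(peak_load(55.0, 70.0, 40.0), total_load(55.0, 70.0, 40.0))
```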
The comparisons of GEP-predicted and actual peak and total loads are, respectively, shown in Figs. 10 and 11. The expression trees of the above formulations are shown in Figs. 12 and 13.
Fig. 10

Results of GEP-predicted and actual peak load: a training; b testing
Fig. 11

Results of GEP-predicted and actual total load: a training; b testing
Fig. 12

Expression tree for the peak load
Fig. 13

Expression tree for the total load

4.2 MLSR-based formulations for peak load

Formulations of the peak load (PL) and total load (TL) for the best result by the MLSR algorithm are as given below:
$$ {\text{PL}}({\text{MW}}) = - 13.0944134\,T_{1} - 0.4942135227\,T_{2} - 34.85161858\,T_{3} + 4740.246312 $$
$$ {\text{TL}}({\text{MW}}) = - 89.20143218\,T_{1} - 146.2622355\,T_{2} - 553.1808846\,T_{3} + 85524.89205 $$
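The benchmark MLSR fits are plain linear functions of the three temperatures and transcribe directly:

```python
# MLSR benchmark formulas of Sect. 4.2, Eqs. (10)-(11).
def peak_load_mlsr(T1, T2, T3):
    return (-13.0944134 * T1 - 0.4942135227 * T2
            - 34.85161858 * T3 + 4740.246312)

def total_load_mlsr(T1, T2, T3):
    return (-89.20143218 * T1 - 146.2622355 * T2
            - 553.1808846 * T3 + 85524.89205)

print(peak_load_mlsr(55.0, 70.0, 40.0))   # about 2591.4 MW
```

All three coefficients are negative in both equations, so these linear fits can only describe load that decreases monotonically with temperature; this is the structural limitation the nonlinear GEP formulas avoid.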
The comparisons of MLSR-predicted and actual peak and total loads are, respectively, shown in Figs. 14 and 15.
Fig. 14

Results of MLSR-predicted and actual peak load: a training; b testing
Fig. 15

Results of MLSR-predicted and actual total load: a training; b testing

4.3 GRNN-based model for peak and total loads

The comparisons of GRNN-predicted and actual peak and total loads are, respectively, shown in Figs. 16 and 17.
Fig. 16

Results of GRNN-predicted and actual peak load: a training; b testing
Fig. 17

Results of GRNN-predicted and actual total load: a training; b testing

5 Discussion

As described above, two formulas for peak and total load forecasting were obtained by means of GEP. No rational model encompassing the influencing variables considered in this study has previously been developed to predict the peak and total loads. Thus, the GEP-based formulas were benchmarked against the GRNN and MLSR models. The overall performance of these models on the training and testing data is summarized in Table 3. A comparison of the ratio between the predicted and measured peak and total load values using the different methods is also visualized in Fig. 18. It can be observed from Figs. 10, 11, 14, 15, 16 and 17 that the GEP and GRNN models significantly outperform the MLSR model. The GRNN model provided slightly better results than the GEP model on the training, testing and whole data sets. Although GRNN is successful in prediction, it has a fundamental disadvantage: it cannot produce practical prediction equations and hence does not provide a better understanding of the nature of the derived relationship between the input and output data. This approach is appropriate for use as part of a computer program and is not suitable for practical calculations. On the other hand, GEP provides a simple equation that can readily be used in routine design practice via hand calculations. By including data for other types of peak and total loads, the proposed GEP model can be improved to make more accurate predictions over a wider range. Besides, empirical modeling based on regression analysis has significant limitations. Commonly used regression analyses can have large uncertainties, with major drawbacks pertaining to the idealization of complex processes and the approximation and averaging of widely varying prototype conditions. In regression analysis, whatever the nature of the problem, an attempt is made to model it with a predefined equation, either linear or nonlinear. Another major constraint in the application of regression analysis is the assumption of normality of residuals.
Table 3

Overall performance of the proposed models for peak and total load forecasting (train and test performance of GEP (8) (PL), MLSR (10) (PL), GEP (9) (TL) and MLSR (11) (TL))
Fig. 18

A comparison of the ratio between the predicted and measured peak and total load values using different methods (horizontal axis represents test number)

6 Parametric analysis

For further verification of the models, a parametric analysis was performed in this study. The main goal is to find the effect of each parameter on the values of PL and TL. The methodology is based on the change in only one input variable at a time, whereas other input variables are kept constant at the average values of their entire data sets. Figures 19 and 20 present the predicted values of PL and TL as functions of each parameter for the proposed GEP-based correlations. The sensitivity of PL and TL prediction to T1, T2 and T3 can be determined according to these figures. The parametric study results indicate that PL increases with increasing T1 up to about 50°F and then starts decreasing. It can be seen that PL is not sensitive to the variations of T2 up to 80°F and that thereafter, it increases with increases in T2. PL continuously decreases with increases in T3. The relevant results for TL imply that it decreases with increasing T1. It can also be observed that TL is not so sensitive to the changes in T2 and T3.
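The one-at-a-time methodology can be sketched as follows, illustrated here with the MLSR peak-load fit of Eq. (10) (the GEP formula can be swept the same way); the baseline values are assumed mid-range temperatures, not the paper's actual averages:

```python
# One-at-a-time parametric sweep in the spirit of Sect. 6.
def peak_load_mlsr(T1, T2, T3):
    return (-13.0944134 * T1 - 0.4942135227 * T2
            - 34.85161858 * T3 + 4740.246312)

baseline = {"T1": 55.0, "T2": 70.0, "T3": 40.0}    # assumed averages
sweeps = {}
for name in baseline:
    curve = []
    for value in range(20, 95, 5):                 # sweep grid in deg F
        args = dict(baseline, **{name: float(value)})
        curve.append(peak_load_mlsr(args["T1"], args["T2"], args["T3"]))
    sweeps[name] = curve

# For this linear fit every sweep decreases monotonically, consistent
# with its uniformly negative temperature coefficients.
print(all(a > b for c in sweeps.values() for a, b in zip(c, c[1:])))
```

Applied to the GEP formulas instead, the same sweep reproduces the non-monotonic trends described above (e.g., PL rising with T1 up to about 50°F and then falling), which the linear model cannot capture.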
Fig. 19

Peak load parametric analysis in the GEP-based correlations
Fig. 20

Total load parametric analysis in the GEP-based correlations

7 Conclusions

In this study, a robust variant of GP, namely GEP, is proposed for short-term load forecasting. Practical application shows that the GEP algorithm does not need the form of the forecasting model to be determined in advance and can create function expressions automatically. Two different correlations for peak and total load forecasting were developed. On the basis of a trial study and a literature review, the average, maximum and lowest daily temperatures were identified as the predictor parameters. The following conclusions can be derived from the results presented in this research:
  (i) The GEP-based correlations are capable of predicting the peak and total loads with high accuracy. The proposed nonlinear GEP models produced considerably better outcomes than the linear regression-based models.

  (ii) The proposed models simultaneously take into account the role of several important factors representing the load behavior.

  (iii) Unlike the traditional methods, GEP does not require any simplifying assumptions in developing the models.

  (iv) In addition to their acceptable accuracy, the proposed GEP-based formulas are relatively short and simple.

  (v) The sensitivity of the proposed correlations to the variation of the influencing parameters was evaluated through a parametric study. In most cases, the peak and total loads decrease with increasing average, maximum and lowest daily temperatures.

  (vi) The proposed GEP-based correlations give the user insight into the relationship between the input and output data. An interesting feature of the GEP approach is the possibility of obtaining more than one correlation for a complex phenomenon by selecting different parameters and function sets in its predictive algorithm.

  (vii) As more data become available, including those for other years or conditions, the proposed models can be improved to make more accurate predictions over a wider range.


The above conclusions confirm the efficiency of the developed models for their reliable applications to short-term load forecasting. GEP is quite robust in nonlinear relationship modeling. However, the underlying assumption that the input parameters are reliable is not always the case. Since fuzzy logic can provide a systematic method to deal with imprecise and incomplete information, the process of developing a hybrid fuzzy–GEP model for such problems can be a suitable topic for further studies.


Acknowledgments

The authors are thankful to Prof. Otávio A.S. Carpinteiro (Federal University of Itajubá) for his support and for providing the database.

Copyright information

© Springer-Verlag London Limited 2010