An effective co-evolutionary algorithm based on artificial bee colony and differential evolution for time series predicting optimization

Yun Yang (yangyun@chd.edu.cn), Zongtao Duan (ztduan@chd.edu.cn)
School of Information and Engineering, Chang'an University, Xi'an, China

Non-linear model optimization for time series prediction is a challenging problem. In Intelligent Transportation Systems (ITS), the need for short-term traffic flow prediction on big data makes the problem even harder. To improve prediction accuracy while ensuring real-time performance in a big data environment, we propose a novel co-evolutionary optimization algorithm, an artificial bee colony (ABC) algorithm improved by differential evolution (DE), combined with a traffic flow prediction model trained by an extreme learning machine (ELM) neural network. The proposed model inherits the good generalization performance and short training time of the standard ELM. It also achieves a more balanced search strategy, and its optimized weights and biases overcome the random-initialization deficiency of the typical ELM, yielding higher prediction accuracy than state-of-the-art methods. To verify the efficiency of the proposed model, we apply it to simulations of Lozi and Tent chaotic time series and to experiments on measured traffic flow time series. Simulation and experimental results demonstrate that the proposed model has superior performance and competitive computational efficiency.


Introduction
Accurate and rapid short-term traffic flow prediction is important to the performance of many advanced applications of intelligent transportation systems (ITS). Forecast results can help road users choose better travel routes, reduce traffic congestion, and improve the capacity of existing road networks. The non-linearity and complexity of short-term traffic flow prediction have led researchers to propose many solutions to this challenging problem. Traffic flow data are a typical time series whose values are collected from traffic sensors at a uniform time interval and a fixed sampling rate. Unlike traditional time series data, urban traffic flow data reflect periodic similarity, such as the morning and evening peaks in big cities, and zonal relevance, such as school, hospital, or edge zones. These unique temporal and spatial characteristics make traffic flow very difficult to predict accurately in real time, so forecast results often deviate noticeably from measured traffic flow data. There are two major reasons. First, earlier short-term traffic flow prediction methods relied mainly on limited historical detector data because of the lack of data collection infrastructure. Second, it is hard to build one predictive model that scales well and fits all traffic situations. Fortunately, with the development of infrastructure and new data transmission technology, we have entered the era of big data transportation.
Nowadays, various sensors collect real-time traffic flow data, including radars, cameras, mobile Global Positioning System (GPS) devices, and social media. However, big data ITS also introduce new problems for short-term traffic flow prediction. Given these characteristics of big traffic flow data, how to use the tremendous volume of data efficiently to improve prediction accuracy and timeliness has become a hot research topic [1,2]. Recently, deep-learning-based traffic flow prediction methods have been used successfully to extract deep features for prediction and have achieved superior performance [1,3]. However, the computational cost of both training and prediction is high for deep learning models [4]. Apart from deep learning models, the popular autoregressive integrated moving average (ARIMA), neural network (NN), and support vector machine (SVM) models have been used successfully for short-term traffic flow prediction. NN models are widely used in engineering applications of ITS because of their good performance and compact architecture.
A novel NN called the extreme learning machine (ELM) achieves good generalization at high speed, which makes it suitable for predicting short-term traffic flow in real time. However, the random weights and biases of ELM can make its output unstable, even when it is trained on the same dataset. To improve the stability of ELM while keeping its prediction accuracy, the input weights and hidden biases of ELM can be optimized by search algorithms [19]. However, each single optimization mechanism tends to emphasize either local search or global search. Therefore, combining different optimization algorithms into a hybrid strategy for ELM-based traffic flow prediction is an effective and interesting research direction, and it is the main motivation of our work.
In this paper, our contributions are as follows. First, we propose a novel hybrid NN model that combines a novel co-evolutionary artificial bee colony (ABC) optimization algorithm, improved by differential evolution (DE), with a traffic flow predictor trained by the extreme learning machine (ELM) algorithm. Second, we use the Lozi and Tent chaotic maps to verify that the improved ABC-ELM model outperforms several traditional models and the standard ELM. Finally, we collect real traffic flow sensor data to train and test the improved ABC-ELM model against the standard ELM model, demonstrating its competitive performance and computational cost in a real big data ITS environment.
The rest of this paper is organized as follows. The next section reviews popular short-term traffic flow prediction solutions, after which the detailed implementation of the proposed hybrid prediction model is explained. The simulation and experimental results are presented in the subsequent section. Concluding remarks and future work are given in the last section.

Related works
In addition to the works mentioned in the "Introduction" section, we further review works closely related to the research and theoretical analysis in this paper. In recent years, many data analysis models have been developed to solve the short-term traffic forecasting problem.
These approaches can be divided into two categories: parametric and non-parametric methods. Among parametric methods, the ARIMA model and its many variants have been widely used to predict short-term freeway traffic flow [5]. Parametric methods perform well when traffic flow shows regular variations, but they cannot deal with the obvious chaotic characteristics and non-linear nature of traffic flow data. Therefore, many researchers have focused on non-parametric methods, such as the K-nearest neighbor (KNN) model [6], the back-propagation (BP) neural network model [7], the radial basis function (RBF) neural network model [8], and support vector regression (SVR) [9]. Among these classical models, RBF neural networks offer not only stronger approximation ability but also better self-adaptability, and they have become a research focus of many experts in the traffic flow forecasting field [2].
Recently, a new RBF network called ELM has been shown to perform very well on benchmark regression and classification datasets [10,11]. For non-linear chaotic time series prediction, ELM with a sigmoid activation function can obtain high accuracy [12]. Researchers also proposed a hybrid framework in which ELM is optimized by self-adaptive differential harmony search for financial time series prediction [13]. Extensive comparison experiments against BP and RBF showed that ELM improved by this global search optimization approach achieves superior performance and is not easily trapped in local minima. However, the traditional optimization method in that ELM model could not handle uncertainties and outliers in complex time series data. More recently, time series prediction has become more accurate thanks to the development of deep learning. In [14], an ELM for wind speed time series forecasting optimized by stacked auto-encoders (SAE) is compared with current deep learning models such as a deep belief network (DBN) and a restricted Boltzmann machine (RBM). The evaluation results indicated that this ELM model obtained an average accuracy of 93.73% on single datasets and 94.04% on combined datasets. However, ELM with SAE is time consuming. To overcome these drawbacks and improve the stability of the ELM network, optimizing the parameters of ELM for time series prediction remains a challenging problem.
During the last decades, biologically inspired population-based optimization has provided robust and simple solutions to multimodal and multi-objective optimization problems. Classical biological optimization techniques include the early genetic algorithms (GA), based on the Darwinian theory of evolution of living beings. GA, the most popular optimization method, has the obvious advantage of powerful local search ability. Like GA, the more advanced differential evolution (DE) algorithm uses mutation, crossover and selection operations, but it overcomes GA's tendency to fall into local optima and performs well in many fields. Compared with GA and DE, particle swarm optimization (PSO), inspired by the behavior of bird flocks, has the obvious advantages of easy implementation and fewer tuning parameters. For short-term traffic flow forecasting, the recent work in [15] proposed PSO-ELM to improve accuracy and obtained competitive performance compared with several state-of-the-art methods. However, the PSO algorithm is easily trapped in local optima and suffers from premature convergence. The recent artificial bee colony (ABC) algorithm is a novel optimization method that is good at global search and can be directly combined with a neural network [16]. However, the local search ability of ABC is poorer than that of other biological optimization algorithms. To improve the performance of GA, DE, PSO and ABC, various hybrid combinations of these algorithms have been attempted. In [17], the authors compare the performance of DE, PSO and ABC on benchmark test functions. The experimental results show that DE obtains better solutions than PSO and ABC, and that, for the same number of iterations, the computation time is minimum for ABC and maximum for PSO. To achieve the short-term traffic flow prediction goal, we propose a novel ABC-DE algorithm to optimize the ELM neural network parameters.
The standard ABC algorithm starts from random solutions and iteratively searches for better solutions according to their fitness; it converges slowly and is prone to premature convergence [18]. Considering the real-time and accuracy requirements of traffic flow prediction in big data ITS, we improve the local search of ABC with a new optimal search strategy [19][20][21] based on the differential evolution (DE) [22] algorithm, and use it to optimize the input weights and hidden biases of ELM [22]. To verify the proposed model in big data ITS, Lozi and Tent chaotic functions [2,24,25] and real measured traffic flow data are used in simulations and experiments. Compared with ARIMA, KNN, BP, RBF, SVR and the standard ELM, the proposed model obtains better generalization, higher accuracy, and competitive time consumption in big data ITS.

Proposed network architecture for traffic flow prediction
In this section, we propose a novel neural network architecture for short-term traffic flow prediction in a big data environment based on ELM and an improved ABC algorithm, covering the theoretical analysis, model confirmation, and optimization.

ELM neural network
ELM, proposed by Guang-Bin Huang [10,16], was originally inspired by biological learning and aims to overcome the limitations of conventional machine learning theories and techniques. From the neural network architecture point of view, the output function of a standard ELM with L hidden nodes [16] can be defined as follows:

$$f_L(\mathbf{x}) = \sum_{i=1}^{L} \boldsymbol{\beta}_i\, f(\boldsymbol{\omega}_i \cdot \mathbf{x} + b_i) \tag{1}$$

where the activation function f(•) is a non-linear piecewise continuous function such as the sigmoid or a radial basis function, ω_i is the random input weight vector connecting the input layer to the ith hidden node, b_i is the random bias of the ith hidden node, and β_i is the output weight of the ith hidden node. For n given training samples {(x_j, t_j) | j = 1, ..., n}, the output of the ELM network can be defined as

$$\mathbf{y}_j = \sum_{i=1}^{L} \boldsymbol{\beta}_i\, f(\boldsymbol{\omega}_i \cdot \mathbf{x}_j + b_i), \quad j = 1, \dots, n \tag{2}$$

In formula (1), the initial values of ω and b can be assigned randomly according to ELM theory, and the output weights β can be calculated as a least squares solution. ELM training aims to reach not only the smallest training error but also the smallest norm of the output weights. Because an ELM network can approximate arbitrary samples with zero error [11], there exist β, ω and b such that y_j = t_j holds for all j. Therefore, the ELM function can be written in the compact matrix form

$$\mathbf{H}_{\omega,b,x}\,\boldsymbol{\beta} = \mathbf{T}, \qquad \mathbf{H}_{\omega,b,x} = \begin{bmatrix} f(\boldsymbol{\omega}_1 \cdot \mathbf{x}_1 + b_1) & \cdots & f(\boldsymbol{\omega}_L \cdot \mathbf{x}_1 + b_L) \\ \vdots & \ddots & \vdots \\ f(\boldsymbol{\omega}_1 \cdot \mathbf{x}_n + b_1) & \cdots & f(\boldsymbol{\omega}_L \cdot \mathbf{x}_n + b_L) \end{bmatrix}_{n \times L} \tag{3}$$

where H_{ω,b,x} is the hidden layer output matrix and T = [t_1, ..., t_n]^T. The output weights β can then be calculated by the least squares solution

$$\boldsymbol{\beta} = \mathrm{pinv}(\mathbf{H}_{\omega,b,x})\,\mathbf{T} \tag{4}$$

where pinv(·) computes the Moore-Penrose pseudo-inverse of the hidden layer output matrix H_{ω,b,x}. Because pinv(·) is easy to program, β can be readily calculated with this function. However, the convergence of the standard ELM is generally slow, because a standard ELM needs a large number of hidden nodes to approach an appropriate result. To overcome this issue, the kernel-based ELM (KELM) was suggested in [16].
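The ELM training procedure, random hidden layer followed by the pseudo-inverse solution for β, can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the sigmoid activation, the [−1, 1] sampling range for ω and b, and the helper names `elm_fit`/`elm_predict` are our assumptions, and `np.linalg.pinv` plays the role of pinv(·).

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def elm_fit(X, T, n_hidden, rng):
    """Basic ELM: random hidden layer, analytic output weights via pinv."""
    n_features = X.shape[1]
    # Input weights omega_i (stored as columns) and biases b_i, drawn once, never updated
    W = rng.uniform(-1.0, 1.0, size=(n_features, n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    H = sigmoid(X @ W + b)        # hidden layer output matrix, shape (n, L)
    beta = np.linalg.pinv(H) @ T  # Moore-Penrose least squares solution for beta
    return W, b, beta


def elm_predict(X, W, b, beta):
    # Network output for every row of X
    return sigmoid(X @ W + b) @ beta


# Usage: fit y = sin(3x) on [-1, 1] with L = 30 hidden nodes
rng = np.random.default_rng(0)
X = np.linspace(-1.0, 1.0, 200).reshape(-1, 1)
T = np.sin(3.0 * X)
W, b, beta = elm_fit(X, T, n_hidden=30, rng=rng)
train_rmse = float(np.sqrt(np.mean((elm_predict(X, W, b, beta) - T) ** 2)))
```

Only β is solved for, so training reduces to a single pseudo-inverse. The fit still depends on the random draw of W and b, which is exactly the instability that the optimization in the following sections targets.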
For the KELM network, the hidden layer output matrix is replaced by a kernel matrix:

$$\boldsymbol{\Omega}_{ELM} = \mathbf{H}\mathbf{H}^{T}, \qquad \Omega_{ELM}(i,j) = h(\mathbf{x}_i) \cdot h(\mathbf{x}_j) = k(\mathbf{x}_i, \mathbf{x}_j) \tag{5}$$

where h(x) is the hidden layer output (row) vector for input x and the function k(•) is the ELM activation kernel function, which can be, but is not limited to, a linear, polynomial, sigmoid, Fourier, hard limit or radial basis function (RBF) kernel.
A suitable kernel and kernel parameters can be chosen to achieve a competitive regression model and good generalization performance. Then, from the Karush-Kuhn-Tucker optimality conditions, the output weight matrix β of KELM can be expressed as follows:

$$\boldsymbol{\beta} = \mathbf{H}^{T}\left(\frac{\mathbf{I}}{\varepsilon} + \mathbf{H}\mathbf{H}^{T}\right)^{-1}\mathbf{T} \tag{6}$$

where ε is the penalty coefficient. The μ, σ and ε parameters of KELM play a role similar to that of the input layer weights and hidden layer biases of ELM, and they can be randomly initialized to train the KELM network. Random initialization of these parameters can make the output of a KELM or ELM network noticeably unstable, which reduces the advantage of their generalization ability. To solve this problem, a self-adaptive optimization algorithm can be introduced to optimize these parameters of the ELM or KELM neural network.
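A compact sketch of KELM regression with an RBF kernel follows. It assumes the parameterisation k(a, b) = exp(−‖a − b‖²/(2σ²)) and uses the equivalent kernel form of the KKT solution, f(x) = [k(x, x_1), ..., k(x, x_n)](I/ε + Ω_ELM)⁻¹T, so H never has to be formed explicitly. The helper names and the σ and ε values in the usage lines are illustrative choices, not values from this paper.

```python
import numpy as np


def rbf_kernel(A, B, sigma):
    """k(a, b) = exp(-||a - b||^2 / (2 sigma^2)) for every row pair of A and B."""
    sq = np.sum(A ** 2, axis=1)[:, None] + np.sum(B ** 2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))


def kelm_fit(X, T, eps, sigma):
    """Return alpha = (I/eps + Omega_ELM)^-1 T, so that beta = H^T alpha."""
    omega = rbf_kernel(X, X, sigma)  # Omega_ELM(i, j) = k(x_i, x_j)
    return np.linalg.solve(np.eye(len(X)) / eps + omega, T)


def kelm_predict(X_new, X_train, alpha, sigma):
    # h(x) beta = h(x) H^T alpha = [k(x, x_1), ..., k(x, x_n)] alpha
    return rbf_kernel(X_new, X_train, sigma) @ alpha


# Usage: fit sin(3x) and evaluate on unseen points between the training samples
X = np.linspace(-1.0, 1.0, 100).reshape(-1, 1)
T = np.sin(3.0 * X)
alpha = kelm_fit(X, T, eps=100.0, sigma=0.3)
X_mid = (X[:-1] + X[1:]) / 2.0
pred_mid = kelm_predict(X_mid, X, alpha, sigma=0.3)
mid_rmse = float(np.sqrt(np.mean((pred_mid - np.sin(3.0 * X_mid)) ** 2)))
```

Here σ and ε are exactly the kind of parameters whose random or hand-picked values the self-adaptive optimization is meant to replace.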

ABC algorithm
An ABC optimization algorithm, which imitates the foraging behavior of a honey bee colony, is a new intelligent optimization algorithm. In the ABC algorithm, three types of bees, namely employed, onlooker and scout bees, search for the best food source. The employed and onlooker bees perform the local search, and the scout bees control the global search [18]. The main steps of the ABC algorithm are as follows.

Step 1: Initialization stage
- Initialize the boundary of the position values, [Min, Max].
- Randomly initialize each search position X_i within the boundary using

$$X_{ij} = Min + \mathrm{rand}(0,1)\,(Max - Min) \tag{7}$$

where i = 1, ..., SN indexes the food sources (solutions), j indexes the dimensions, and rand(0,1) is a uniform random number in [0, 1].

Step 2: Employed bee phase
- Employed bees generate new solutions using the following expression:

$$V_{ij} = X_{ij} + \phi_{ij}\,(X_{ij} - X_{kj}) \tag{8}$$

where k ≠ i is a randomly chosen solution index, j is a randomly chosen dimension, and φ_ij is a uniform random number in [−1, 1]. The fitness of each solution is computed as

$$fit_i = \begin{cases} \dfrac{1}{1 + f_i}, & f_i \ge 0 \\ 1 + |f_i|, & f_i < 0 \end{cases} \tag{9}$$

where fit_i is the fitness value of the ith solution X_i, and f_i is the cost value of X_i.

Step 3: Onlooker bee phase
- Select a new solution among all the employed bee solutions using the selection probability p_i:

$$p_i = \frac{fit_i}{\sum_{n=1}^{SN} fit_n} \tag{10}$$

In the standard ABC algorithm, it is difficult to balance global and local search, which results in slow convergence or premature convergence to a local optimum. Therefore, accelerating convergence and avoiding local optima are the two main research directions for improving the ABC algorithm.
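The standard ABC loop, Steps 1 to 3 plus the usual scout phase, can be sketched in plain Python. The sketch assumes minimization of a non-negative cost, a scout limit of SN × D, clipping of candidates to [Min, Max], and the sphere function as a toy objective; `abc_minimize` is a hypothetical helper name, and the equation numbers in the comments refer to Eqs. (7) to (10).

```python
import random


def abc_minimize(cost, dim, lo, hi, sn=20, mcn=200, limit=100, seed=0):
    """Standard ABC sketch: Eq. (7) init, Eq. (8) moves, Eqs. (9)-(10) selection, scouts."""
    rng = random.Random(seed)

    def init():  # Eq. (7): uniform random position in [Min, Max]
        return [lo + rng.random() * (hi - lo) for _ in range(dim)]

    def fit(f):  # Eq. (9): convert a cost value into a fitness value
        return 1.0 / (1.0 + f) if f >= 0 else 1.0 + abs(f)

    X = [init() for _ in range(sn)]
    cost_x = [cost(x) for x in X]
    trial = [0] * sn

    def try_neighbour(i):  # Eq. (8) on one random dimension, then greedy selection
        k = rng.choice([m for m in range(sn) if m != i])
        j = rng.randrange(dim)
        v = list(X[i])
        v[j] = min(max(v[j] + rng.uniform(-1.0, 1.0) * (X[i][j] - X[k][j]), lo), hi)
        fv = cost(v)
        if fv < cost_x[i]:
            X[i], cost_x[i], trial[i] = v, fv, 0
        else:
            trial[i] += 1

    b = min(range(sn), key=cost_x.__getitem__)
    best_x, best_f = list(X[b]), cost_x[b]
    for _ in range(mcn):
        for i in range(sn):                       # employed bee phase
            try_neighbour(i)
        total = sum(fit(f) for f in cost_x)
        probs = [fit(f) / total for f in cost_x]  # Eq. (10)
        for _ in range(sn):                       # onlooker bee phase (roulette wheel)
            try_neighbour(rng.choices(range(sn), weights=probs)[0])
        for i in range(sn):                       # remember the best-so-far solution
            if cost_x[i] < best_f:
                best_x, best_f = list(X[i]), cost_x[i]
        for i in range(sn):                       # scout bee phase: abandon exhausted sources
            if trial[i] > limit:
                X[i] = init()
                cost_x[i] = cost(X[i])
                trial[i] = 0
    return best_x, best_f


def sphere(x):
    return sum(t * t for t in x)


best_x, best_f = abc_minimize(sphere, dim=5, lo=-5.0, hi=5.0)
```

Note that both the employed and onlooker bees call the same one-dimensional move, which is the limitation discussed in the improved ABC section.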

DE strategies
Many DE mutation strategies have been described, such as DE/rand/1, DE/best/1, DE/current-to-best/1, and DE/rand/2. These strategies are defined as follows:

$$\text{DE/rand/1:}\quad V_i = X_{r_1} + F\,(X_{r_2} - X_{r_3})$$
$$\text{DE/best/1:}\quad V_i = X_{best} + F\,(X_{r_1} - X_{r_2})$$
$$\text{DE/current-to-best/1:}\quad V_i = X_i + F\,(X_{best} - X_i) + F\,(X_{r_1} - X_{r_2})$$
$$\text{DE/rand/2:}\quad V_i = X_{r_1} + F\,(X_{r_2} - X_{r_3}) + F\,(X_{r_4} - X_{r_5})$$

where V_i is the perturbed (mutant) vector, F is the scaling control parameter, X_best is the best solution found so far, and r_1, r_2, r_3, r_4 and r_5 are mutually distinct indices randomly selected from the population, all different from the index i of the target solution X_i. The mutation strategies have different search abilities. DE/rand/1 is good at maintaining population diversity but has difficulty reaching the optimal solution. DE/best/1 converges fastest but usually obtains a local optimum. DE/rand/2, which involves two difference vectors, may produce better perturbations than the other strategies but easily leads to premature convergence. Because it takes both good-quality and inferior solutions into account, DE/current-to-best/1 is effective for multi-objective optimization problems and converges more quickly. Based on the experimental results for the four conventional DE strategies on 26 standard benchmark numerical optimization functions in [19], we select the DE/current-to-best/1 strategy to enhance the local search ability of the original ABC algorithm in the mutation phase.
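The four strategies can be written as one small function. This is a plain-Python sketch: the name `de_mutation` and the string labels are our own, and the crossover and selection steps that complete a DE generation are deliberately omitted.

```python
import random


def de_mutation(pop, i, best, F, strategy, rng):
    """Perturbed vector V_i for target X_i = pop[i] under a classical DE strategy.

    r1..r5 are mutually distinct indices, all different from i, so the
    population must contain at least six vectors.
    """
    r1, r2, r3, r4, r5 = rng.sample([m for m in range(len(pop)) if m != i], 5)

    def add_diffs(base, *pairs):
        # base + F * (a - b) for each difference pair, element by element
        out = list(base)
        for a, b in pairs:
            out = [o + F * (ak - bk) for o, ak, bk in zip(out, a, b)]
        return out

    if strategy == "rand/1":
        return add_diffs(pop[r1], (pop[r2], pop[r3]))
    if strategy == "best/1":
        return add_diffs(best, (pop[r1], pop[r2]))
    if strategy == "current-to-best/1":
        return add_diffs(pop[i], (best, pop[i]), (pop[r1], pop[r2]))
    if strategy == "rand/2":
        return add_diffs(pop[r1], (pop[r2], pop[r3]), (pop[r4], pop[r5]))
    raise ValueError(f"unknown strategy: {strategy}")


# Usage: with identical vectors every random difference vanishes, so
# current-to-best/1 reduces to X_i + F * (X_best - X_i)
rng = random.Random(1)
pop = [[1.0, 2.0] for _ in range(6)]
v = de_mutation(pop, 0, best=[3.0, 3.0], F=0.5, strategy="current-to-best/1", rng=rng)
```

The degenerate population in the usage lines isolates the pull toward X_best; in a diverse population the F(X_r1 − X_r2) term adds the exploratory perturbation.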

Improved ABC algorithm by novel DE strategy
Compared with popular evolutionary algorithms such as the genetic algorithm (GA), particle swarm optimization (PSO), differential evolution (DE) and ant colony optimization (ACO), the classical ABC can perform better on some multidimensional optimization problems. However, it also has some obvious drawbacks [20]: the search strategy of the classical ABC updates only one element of a vector at a time, which is good at global exploration but poor at local exploitation. Specifically, the update strategy of the original ABC differs from that of other representative population-based algorithms, so ABC hardly takes advantage of the best solution. However, information about the best solution and convergence performance are very important for solving real engineering problems. Inspired by the search strategy of the DE algorithm, we propose an improved ABC algorithm that modifies the local search mechanism of ABC in the onlooker bee phase, introducing the best-so-far solution to benefit exploitation.
The novel co-evolutionary algorithm combines ABC and DE strategies during the search process to improve search efficiency. For clarity, the main steps of the improved ABC algorithm with the novel DE strategy are described in Algorithm 1.
In the classical ABC algorithm, the onlooker bees use the same update equation as the employed bees. In the improved ABC algorithm, however, a new onlooker equation based on the DE/current-to-best/2 mutation strategy is proposed, which readily achieves good convergence speed and obtains a better local optimum. Therefore, the improved ABC algorithm can also maintain the global search ability of the original ABC algorithm.

Algorithm 1 Improved ABC algorithm
Input: SN (population size), MCN (maximum iterations), LM (maximum search limit), Trial_i (trial counter of solution i), Min (lower boundary), Max (upper boundary)
Output: BT (best solution)
Initialisation:
1: Initialize Trial_i to zero
2: Generate the initial solutions X_i by Eq. (7)
Repeat
Employed bees phase:
3: Generate a new candidate solution V_i by Eq. (8)
4: Greedily select the better of X_i and V_i
5: Compute the fitness Fit_i of each solution by Eq. (9)
6: Calculate the probability P_i of each solution by Eq. (10)
Onlooker bees phase:
7: Use probability P_i to select a solution X_i
8: Search for the best solution x^best_ij among the random neighbours x_r1,j and x_r2,j of x_i,j, including itself
9: Use the DE/current-to-best/2 mutation equation to generate a new candidate solution V_i
10: Greedily select the better of X_i and V_i
Scout bees phase:
11: if (Trial_i > LM) then
12: replace X_i with a new solution generated by Eq. (7)
13: end if
Until the number of iterations reaches MCN
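The onlooker-phase change in Algorithm 1 can be illustrated with a hedged sketch. Because Algorithm 1 does not spell out the DE/current-to-best/2 equation or the neighbourhood search of step 8, the code below assumes the DE/current-to-best/1 form V_i = X_i + F(X_best − X_i) + F(X_r1 − X_r2), applied to every dimension at once and clipped to [Min, Max]. The function name `improved_onlooker_phase` and the F value in the usage lines are hypothetical.

```python
import random


def improved_onlooker_phase(X, cost_x, trial, probs, best_x, cost, F, lo, hi, rng):
    """One onlooker phase using a DE-style move toward the best-so-far solution.

    Assumed update (DE/current-to-best/1 form, applied to every dimension):
        V_i = X_i + F * (X_best - X_i) + F * (X_r1 - X_r2)
    X, cost_x and trial are modified in place.
    """
    sn, dim = len(X), len(X[0])
    for _ in range(sn):
        i = rng.choices(range(sn), weights=probs)[0]  # step 7: roulette-wheel selection
        r1, r2 = rng.sample([m for m in range(sn) if m != i], 2)
        v = [
            min(max(X[i][j] + F * (best_x[j] - X[i][j]) + F * (X[r1][j] - X[r2][j]), lo), hi)
            for j in range(dim)
        ]
        fv = cost(v)
        if fv < cost_x[i]:                            # step 10: greedy selection
            X[i], cost_x[i], trial[i] = v, fv, 0
        else:
            trial[i] += 1                             # feeds the scout test in step 11


def sphere(x):
    return sum(t * t for t in x)


# Usage: one onlooker phase on a random population with uniform selection probabilities
rng = random.Random(0)
X = [[rng.uniform(-5.0, 5.0) for _ in range(4)] for _ in range(10)]
cost_x = [sphere(x) for x in X]
before = list(cost_x)
trial = [0] * len(X)
best_x = list(min(X, key=sphere))
improved_onlooker_phase(X, cost_x, trial, [1.0] * len(X), best_x, sphere, 0.5, -5.0, 5.0, rng)
```

Unlike the one-dimensional move of Eq. (8), this update changes the whole vector and explicitly uses the best-so-far solution, which is the exploitation gain the improved algorithm is after.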

Optimizing ELM neural network by improved ABC algorithm
The training of ELM is extremely fast and suitable for big data. In classical ELM, the output weights are calculated by an analytical solution instead of the traditional gradient descent algorithm, which noticeably improves the speed and generalization ability of the ELM network [10]. The training goal of ELM is to find the optimal input weight matrix and hidden bias vector that minimize the error between the observed and predicted values. Hence, the values and number of the input weights and biases strongly affect the performance and accuracy of ELM. In the ELM model, the input weights and hidden biases are randomly assigned and remain unchanged during training. The random parameters may therefore be non-optimal or unnecessary values, which makes the ELM network require a large number of hidden nodes to approach an appropriate result and respond slowly to testing data. Selecting optimal input weights and hidden biases can thus provide a more compact ELM architecture and better generalization performance.
Applying the ABC algorithm to train ELM is relatively straightforward. The main steps of the ABC-ELM model are described as follows: Step 1, the parameters of ABC algorithm are initialized according to the initialization stage of Algorithm 1.
Step 2, the input weights and hidden biases of ELM are encoded as food sources, and the colony population of the ABC algorithm is randomly created as follows:

$$\theta_i = [\boldsymbol{\omega}_1, \boldsymbol{\omega}_2, \dots, \boldsymbol{\omega}_L, b_1, b_2, \dots, b_L], \quad i = 1, 2, \dots, SN$$

where the input weights ω and hidden biases b are randomly assigned in the range [−1, 1], and L is the number of ELM hidden nodes.
Step 3, the employed bee phase, the onlooker bee phase and the scout bee phase are executed in sequence according to Algorithm 1. The best solution of the ABC algorithm is obtained once the maximum number of iterations is reached.
Step 4, the ELM network can be trained and tested with the best input weight and hidden bias solution obtained in Step 3.
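To connect Steps 1 to 4, each food source has to be decoded into ELM parameters and scored. The sketch below assumes a layout of one flat vector [ω_1, ..., ω_L, b_1, ..., b_L] with each ω_k of length d (the input dimension), a sigmoid activation, and the RMSE on a held-out validation split as the cost f_i. The validation split and the helper names `decode`/`elm_cost` are illustrative assumptions, not details given by the paper; a function like `elm_cost` can be handed to any ABC or improved ABC loop as its objective.

```python
import numpy as np


def decode(theta, n_features, n_hidden):
    """Split a flat food source into input weights W (d x L) and hidden biases b (L,)."""
    theta = np.asarray(theta, dtype=float)
    W = theta[: n_features * n_hidden].reshape(n_hidden, n_features).T  # column k is omega_k
    b = theta[n_features * n_hidden:]
    return W, b


def elm_cost(theta, X_tr, T_tr, X_val, T_val, n_hidden):
    """Cost f_i of one candidate: validation RMSE of the ELM built from theta."""
    W, b = decode(theta, X_tr.shape[1], n_hidden)
    H_tr = 1.0 / (1.0 + np.exp(-(X_tr @ W + b)))
    beta = np.linalg.pinv(H_tr) @ T_tr  # analytic output weights on the training split
    H_val = 1.0 / (1.0 + np.exp(-(X_val @ W + b)))
    return float(np.sqrt(np.mean((H_val @ beta - T_val) ** 2)))


# Usage: score one random candidate drawn in [-1, 1], as in Step 2
rng = np.random.default_rng(0)
d, L = 3, 10
X_all = rng.uniform(-1.0, 1.0, size=(120, d))
T_all = X_all.sum(axis=1, keepdims=True)
theta0 = rng.uniform(-1.0, 1.0, size=d * L + L)
f0 = elm_cost(theta0, X_all[:90], T_all[:90], X_all[90:], T_all[90:], n_hidden=L)
```

Scoring on data that were not used to solve for β keeps the search from simply rewarding candidates that overfit the training split.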

Metrics for forecasting error
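Three criteria are commonly used to evaluate the performance of traffic flow forecasting models: the root mean square error (RMSE), the mean absolute error (MAE) and the mean relative error (MRE), defined respectively as

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(y_i - \hat{y}_i)^2}, \qquad \mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}|y_i - \hat{y}_i|, \qquad \mathrm{MRE} = \frac{1}{N}\sum_{i=1}^{N}\frac{|y_i - \hat{y}_i|}{y_i}$$

where y_i is the observed value, ŷ_i is the predicted value, and N is the number of test samples.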
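The three criteria translate directly into code. The plain-Python sketch below follows the standard definitions of RMSE, MAE and MRE; returning NaN whenever an observation is zero is our choice, made so that the MRE behaves like the NaN entries discussed in the results instead of raising an exception.

```python
import math


def rmse(y, y_hat):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))


def mae(y, y_hat):
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)


def mre(y, y_hat):
    # Each error is divided by the observed value, so a zero observation is undefined
    if any(a == 0 for a in y):
        return float("nan")
    return sum(abs(a - b) / abs(a) for a, b in zip(y, y_hat)) / len(y)


# Usage: absolute errors are 1, 0 and 2
observed = [1.0, 2.0, 4.0]
predicted = [2.0, 2.0, 2.0]
scores = (rmse(observed, predicted), mae(observed, predicted), mre(observed, predicted))
```

For these values the MAE is 1.0, the RMSE is √(5/3), and the MRE is (1/1 + 0/2 + 2/4)/3 = 0.5.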

Experiment results
To verify the validity of the ELM optimized by the improved ABC algorithm, the Lozi and Tent chaotic time series and a measured traffic flow time series are used in the experiments. For the improved ABC algorithm, the colony size is set to 40 (employed plus onlooker bees), the maximum control limit is set to 100 × 2, and the maximum number of iterations is 6. The number of optimized parameters, comprising input weights and hidden biases, is 101 × 15,000 for the simulations and 101 × 8000 for the measured data. The lower and upper bounds of the parameters are set to [−1, 1], and the search radii of the onlooker bees are set to r1 = 1 and r2 = 3.

Time series prediction simulation
The Lozi time series is generated by a discrete chaotic system with randomness and ergodicity [23]. The Lozi map is given by

$$x_{n+1} = 1 - a|x_n| + y_n, \qquad y_{n+1} = b\,x_n$$

The parameters are a = 1.7 and b = 0.5, as suggested in [2], and the initial conditions are X_0 = −0.1 and Y_0 = 0.1. Figure 1 shows the Lozi chaotic generator mapping. The Tent time series is a simple chaotic system with uniform probability density and power spectral density [24], which makes it suitable for simulating the computational processing of big data. Figure 2 shows the Tent chaotic function mapping. The Tent mathematical model is defined as follows: In the experiment, the first 15,000 samples of each time series (75%) are selected as the training dataset, and the remaining 5000 samples (25%) are used as the testing dataset.
The SR29-S-402863 dataset is collected from the Caltrans Performance Measurement System (PeMS) database [26]. In this paper, the traffic flow data collected on weekdays in the first 7 months of 2017 are used for the experiments. The 8000 samples from the first 6 months are chosen as the training set, and the remaining month's 1900 samples are used as the testing dataset. Table 1 shows the prediction errors and run times of four kinds of models on the Lozi system, the Tent system, and the SR29-S-402863 dataset. According to the comparison, the ARIMA and KNN models can obtain high accuracy, but their high time cost makes them unsuitable for modern ITS. Traditional neural network models such as BP, RBF, and SVR perform poorly. Therefore, the stable accuracy and lowest time consumption of the ELM and improved ELM models make them more suitable for real-time prediction of big traffic flow data in modern ITS.
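The simulated series can be generated with a few lines of Python. The Lozi generator uses the standard map x_{n+1} = 1 − a|x_n| + y_n, y_{n+1} = b·x_n with the parameters and initial conditions given for the Lozi system, and takes the x component as the series. The Tent equation is not given in this text, so the sketch assumes the common form x_{n+1} = μx_n for x_n < 0.5 and μ(1 − x_n) otherwise, with a hypothetical μ = 1.99 and x_0 = 0.37 (μ = 2 exactly collapses to 0 in floating point, because exact doubling exhausts the binary mantissa after a few dozen iterations). The chronological 75%/25% split reproduces the 15,000/5000 division.

```python
def lozi_series(n, a=1.7, b=0.5, x0=-0.1, y0=0.1):
    """x component of the Lozi map: x' = 1 - a|x| + y, y' = b x."""
    xs, x, y = [], x0, y0
    for _ in range(n):
        x, y = 1.0 - a * abs(x) + y, b * x  # right-hand side uses the old x
        xs.append(x)
    return xs


def tent_series(n, mu=1.99, x0=0.37):
    """Tent map (assumed form): x' = mu x if x < 0.5 else mu (1 - x)."""
    xs, x = [], x0
    for _ in range(n):
        x = mu * x if x < 0.5 else mu * (1.0 - x)
        xs.append(x)
    return xs


def split_series(series, train_ratio=0.75):
    """Chronological split: the first part trains, the remainder tests."""
    cut = int(len(series) * train_ratio)
    return series[:cut], series[cut:]


lozi = lozi_series(20000)
tent = tent_series(20000)
lozi_train, lozi_test = split_series(lozi)
```

The split is chronological rather than shuffled, so the test portion always lies in the future of the training portion, as in a forecasting setting.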

Traffic flow prediction by improved ABC-ELM of freeway SR29-S-402863
The proposed ELM neural network model trained by the improved ABC algorithm is applied to the SR29-S-402863 dataset, with the number of ELM hidden nodes set to 100. Figure 3 presents the output of the proposed improved ABC-ELM model for traffic flow prediction, and Fig. 4 compares the improved ABC-ELM results with those of ELM. It can be clearly seen that the RMSE of the optimized model is more stable and lower than that of ELM. Figure 5 shows the absolute error of the improved ABC-ELM predictions, and Fig. 6 shows their relative error.

Discussion of the results
The prediction error due to variance reflects the variability of a model's prediction between the training dataset and the testing dataset. In our experiments, all models are run for multiple iterations to decrease the variance error. The prediction error due to bias is the difference between the true value that a model is trying to predict and the expected value of its prediction. After all models have been run, the testing results on the Lozi, Tent and SR29-S-402863 datasets are recorded in Table 1. Clearly, the RMSE and MAE of the improved ABC-ELM model on the Lozi, Tent and SR29-S-402863 datasets are better than those of the other models. For the MRE, the NaN results arise from zero values in the SR29-S-402863 testing dataset. We also compare the proposed improved algorithm with ELM optimized by ABC and by DE separately. The prediction errors on the SR29-S-402863 dataset are shown in Table 2, where the RMSE and MAE of the ABC-DE-ELM model indicate good performance. Thus, the proposed improved ABC-ELM model obtains higher accuracy than the other models compared. Admittedly, the time cost of the improved ABC-ELM model is higher than that of the standard ELM and standard ABC-ELM models because of the global search. Nevertheless, the comparison results show that the proposed improved ABC-ELM model achieves an optimal balance between high accuracy and low computational cost, which makes it suitable for traffic flow prediction in the big data era.

Conclusion
In this paper, we propose a novel and effective prediction model for traffic flow forecasting. This study uses an improved ABC algorithm to optimize the input weights and hidden biases of the ELM neural network, and simulates the prediction of two typical chaotic time series of big data, comparing the accuracy with the ARIMA, KNN, BP, RBF, and SVR models. When the proposed model is applied to real traffic flow measurements, the experimental results indicate that it has higher prediction accuracy and more stable results. We therefore believe that the improved model has good prospects for real-time traffic flow prediction in big data environments. Future work will focus on optimizing the kernel parameters of the KELM network to achieve better prediction results.