Neural Networks with Improved Extreme Learning Machine for Demand Prediction of Bike-sharing

Accurate demand prediction of bike-sharing is an important prerequisite to reducing scheduling costs and improving user satisfaction. However, it is a challenging issue due to the stochasticity and non-linearity of bike-sharing systems. In this paper, a model called pseudo-double hidden layer feedforward neural networks is proposed to approximately predict the actual demands of bike-sharing. Specifically, to overcome limitations of the traditional back-propagation learning process, an algorithm, an extreme learning machine with improved particle swarm optimization, is designed to construct the learning rules of the networks. The performance is verified by comparison with other learning algorithms on the dataset of the Streeter Dr bike-sharing station in Chicago.


Introduction
With the development of the sharing economy, bike-sharing systems have rapidly emerged in major cities all over the world. Bike-sharing can be described as a short-term bike rental service for inner-city transportation that provides bikes at unattended stations. It has become one of the most important modes of low-carbon travel. Compared with traditional rental services, bike-sharing is not limited by the docks at bike stations. It provides convenient services, but generates complicated problems. For instance, the layout of bike-sharing stations is flexible and the capacities of stations are not fixed, leading to large fluctuations in demand at stations. New characteristics are exhibited, such as the uneven distribution of user demands in time and space.
Accurate demand prediction of bike-sharing can effectively improve user experience and enhance brand competitiveness, as elaborated in [2]. El-Assi et al. [3] and Ermagun et al. [4] elaborated the main problems and difficulties in the demand prediction of bike-sharing. Solutions can be mainly divided into two types. The traditional one is based on statistical analysis. Yang et al. [5] proposed a semiparametric geographically weighted regression method to estimate bike-sharing trips using location-based social network data. [6] proposed a method combining bootstrapping and subset selection that utilizes partially useful information from each bike-sharing station; it can solve problems in which data cleaning approaches fail due to the lack of original data.

Part of this work [1] has been submitted to the 10th EAI International Conference on Mobile Networks and Management (EAI MONAMI 2020). Xun Shao, x-shao@ieee.org; extended author information is available on the last page of the article.
The other type is based on artificial neural networks. Yang et al. [7] proposed convolutional neural networks to predict daily demands of bike-sharing at both city and station levels. Lin et al. [8] proposed graph convolutional neural networks with a data-driven graph filter model, and also discussed the heterogeneity of demands among different bike-sharing stations. Xu et al. [9] developed a dynamic demand prediction model based on a deep learning approach with large-scale datasets; the comparison results suggested that the prediction accuracy of long short-term memory neural networks was better than that of statistical models and advanced machine learning methods. Chang et al. [10] developed a prediction framework integrating an artificial immune system and neural networks, and verified its performance by comparison with other models. Feng et al. [11] discussed a Markov chain population model to predict bike demands among different travel stations. Kim [12] studied the influence of weather conditions and time characteristics on the demands of bike-sharing. Furthermore, deep learning methods and comprehensive methods with heuristic algorithms have been applied in various engineering projects [13][14][15][16], but rarely for the demand prediction of bike-sharing.
In addition, those methods have some limitations. Increasing the number of hidden layers is a feasible approach to achieving a certain prediction accuracy, but it may lead to overfitting and reduce the generalization performance of prediction models. Moreover, accelerating gradient descent can improve the convergence rate, but it leads to unstable generalization. Considering these limitations, a novel neural network model is proposed, called pseudo-double hidden layer feedforward neural networks. In this paper, an algorithm, extreme learning machine with improved particle swarm optimization, is proposed to tune the weights and biases of the neural network to improve prediction accuracy. Finally, experiments are performed on the dataset of the Streeter Dr bike-sharing station in Chicago to verify the effectiveness of the proposed model.
Pseudo-double hidden layer feedforward neural networks

Network structure
Pseudo-double hidden layer feedforward neural networks (PDLFNs) are biologically inspired computational models, which consist of processing elements (or neurons) and weighted connections between them. The structure of PDLFNs differs from that of single hidden layer feedforward neural networks (SLFNs) and double hidden layer feedforward neural networks (DLFNs). As shown in Fig. 1, it contains one input layer, pseudo-double hidden layers, and one output layer. The hidden layers consist of layer V and layer B. In SLFNs and DLFNs, a hidden layer is a collection of neurons with activation functions and provides one or two intermediate layers between the input layer and the output layer. In PDLFNs, by contrast, layer V is a special hidden layer containing only one neuron with a smooth function. Thus, PDLFNs can directly process original sample data to produce the final results. The design of layer V mainly comes from the following two results. In the first place, multiple hidden layers can reach high prediction accuracy even with fewer neurons in each hidden layer [17]. In addition, there is some noise disturbance in sample data, which can be reduced by smoothing [18].
Without loss of generality, assume that the numbers of neurons in the input layer, layer B and the output layer are I, J and K, respectively. N samples are inputted into PDLFNs, commonly in the form of multivariate time series, and each sample contains I-dimensional data. Mathematically, the n-th (1 ≤ n ≤ N) sample is represented by the vector X^(n) = (x_1^(n), x_2^(n), ..., x_I^(n))^T, where x_i^(n) is the data presented to the i-th neuron in the input layer. The corresponding weight vector from the input layer to layer V is W^(IV) = (w_1^(IV), w_2^(IV), ..., w_I^(IV))^T, where w_i^(IV) specifies the influence between the i-th neuron in the input layer and the neuron in layer V.
Compared with DLFNs, the neuron of layer V in PDLFNs is no longer equipped with a bias value and an activation function, but with a smooth function. The smooth function, denoted as S_V(X^(n), W^(IV)), can take corresponding forms according to the characteristics of the sample datasets, such as moving average functions, exponential smoothing functions, autoregressive functions, and adaptive filtering functions.
The rest of PDLFNs is similar to traditional feedforward neural networks, where each neuron is assigned a bias and the output of each neuron is produced by an activation function. The summation in each neuron includes a bias for lowering or raising its input to the activation function, such as a linear function, sigmoid function or hard limit function. It is worth mentioning that the activation functions for the neurons in the same layer are always the same. The weight vector between layer V and layer B is denoted as W^(VB) = (w_1^(VB), w_2^(VB), ..., w_J^(VB))^T, where w_j^(VB) is the weight of the connection between the neuron in layer V and the j-th neuron in layer B. The bias for the j-th neuron in layer B is denoted as b_j^(B). The output of the j-th neuron in layer B is denoted as B_j(X^(n), W^(VB)) and given in Eq. 2,

B_j(X^(n), W^(VB)) = A_B(w_j^(VB) S_V(X^(n), W^(IV)) + b_j^(B)), (2)

where A_B(x) is the activation function for the neurons in layer B.
Finally, the outputs of PDLFNs can be represented by the vector O(X^(n)) = (O_1(X^(n)), O_2(X^(n)), ..., O_K(X^(n)))^T, where each O_k(X^(n)) is computed from the outputs of layer B in the same feedforward manner.
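The forward pass described above can be sketched as follows for the single-output case. This is an illustrative sketch, not the paper's code: the weighted-average smooth function in layer V is one plausible choice, and all function and variable names are ours.

```python
import numpy as np

def pdlfn_forward(x, w_iv, w_vb, b_b, w_bo, b_o):
    """Forward pass of a PDLFN (sketch; single output neuron assumed).

    x    : (I,)  input sample X^(n)
    w_iv : (I,)  weights from the input layer to the single layer-V neuron
    w_vb : (J,)  weights from layer V to layer B
    b_b  : (J,)  biases of the layer-B neurons
    w_bo : (J,)  weights from layer B to the output neuron
    b_o  : float bias of the output neuron
    """
    # Layer V: one neuron applying a smooth function (here a weighted
    # average, an illustrative choice) instead of an activation function.
    s = np.dot(w_iv, x) / np.sum(w_iv)
    # Layer B: conventional hidden neurons with a tanh activation.
    b = np.tanh(w_vb * s + b_b)
    # Output layer: linear activation, as in Eq. 10 later in the paper.
    return float(np.dot(w_bo, b) + b_o)
```

With zero hidden weights and biases the layer-B outputs vanish, so the network returns only the output bias, which gives a quick sanity check of the wiring.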

Learning algorithm
The traditional learning algorithms for feedforward neural networks are mainly based on gradient descent methods. The back-propagation (BP) learning algorithm is a representative one, where gradients are computed efficiently by propagation from the output to the input. It has been one of the most successful and widely used learning algorithms for training neural networks in recent years. However, it has several limitations, such as the difficulty of determining network structures and learning rates, unstable convergence results, and time-consuming learning.
To resolve the above issues with gradient-based algorithms, Guang-Bin Huang [19] proposed an efficient learning algorithm called the extreme learning machine (ELM) for feedforward neural networks, especially SLFNs. ELM is equipped with several salient features that distinguish it from traditional gradient-based learning algorithms. It avoids difficulties such as determining stopping criteria, learning rates and the number of learning epochs, and getting trapped in local minima. However, ELM tends to require more hidden neurons than traditional gradient-based learning algorithms, and can suffer from ill-conditioning due to the random determination of the input weights and hidden biases. Considering these limitations of ELM, an improved particle swarm optimization (IPSO) is proposed to optimize the input weights and hidden biases.
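The core of ELM can be sketched in a few lines: the input weights and hidden biases are drawn at random and kept fixed, and only the output weights are computed, analytically, via the Moore-Penrose pseudoinverse. This is a minimal sketch; the tanh activation and uniform initialization range are assumptions.

```python
import numpy as np

def elm_train(X, y, n_hidden, rng=None):
    """Train a basic ELM for a single-hidden-layer network (sketch)."""
    rng = np.random.default_rng(rng)
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))  # random input weights
    b = rng.uniform(-1.0, 1.0, size=n_hidden)                # random hidden biases
    H = np.tanh(X @ W + b)                                   # hidden-layer output matrix
    beta = np.linalg.pinv(H) @ y                             # output weights: beta = H^+ y
    return W, b, beta

def elm_predict(X, W, b, beta):
    """Predict with a trained ELM."""
    return np.tanh(X @ W + b) @ beta
```

Because the only "learning" step is a linear least-squares solve, training is very fast, which is the salient feature the paragraph above refers to; the price is the sensitivity to the random W and b that IPSO is meant to address.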

Improved particle swarm optimization
Particle swarm optimization (PSO) is one of the most representative meta-heuristic optimization algorithms. It mimics the social behavior of organisms, such as birds in a flock or fish in a school, which grants them surviving advantages. Consider a swarm with M particles in a D-dimensional search space. After the t-th iteration, the m-th particle (1 ≤ m ≤ M) has a position vector Z_m^t = (z_m1^t, z_m2^t, ..., z_mD^t)^T and a velocity vector V_m^t = (v_m1^t, v_m2^t, ..., v_mD^t)^T, where Z_m^0 and V_m^0 are the initial position and velocity, respectively. The best position of the m-th particle is denoted as PBEST_m = (pbest_m1, pbest_m2, ..., pbest_mD)^T, and the best position of all particles as GBEST = (gbest_1, gbest_2, ..., gbest_D)^T. In the classical version of PSO, the velocity and position vectors are updated according to Eqs. 6 and 7,

v_md^(t+1) = w v_md^t + c_1 r_1^t (pbest_md − z_md^t) + c_2 r_2^t (gbest_d − z_md^t), (6)
z_md^(t+1) = z_md^t + v_md^(t+1), (7)

where 1 ≤ d ≤ D; w, c_1, c_2, r_1^t and r_2^t are respectively the inertia weight constant, two positive acceleration constants, and two uniform random parameters within [0, 1]. The final solution of PSO is sensitive to these control parameters.
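One iteration of the classical update rules can be sketched as below; the function name and parameter defaults are ours, and the position update is the standard form in which the new velocity is simply added to the current position.

```python
import numpy as np

def pso_step(z, v, pbest, gbest, w=0.7, c1=2.0, c2=2.0, rng=None):
    """One classical PSO iteration: velocity update, then position update."""
    rng = np.random.default_rng(rng)
    r1 = rng.random(z.shape)   # r_1^t, uniform in [0, 1]
    r2 = rng.random(z.shape)   # r_2^t, uniform in [0, 1]
    # Inertia term + cognitive (personal best) term + social (global best) term.
    v_new = w * v + c1 * r1 * (pbest - z) + c2 * r2 * (gbest - z)
    z_new = z + v_new
    return z_new, v_new
```

Note that if a particle sits exactly at both its personal best and the global best with zero velocity, the update leaves it in place, which is why diversity in the initial swarm matters.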
The focus of improved approaches has revolved around adapting the inertia weight w, which is important for balancing global search, also known as exploration, and local search, known as exploitation.
In order to make the algorithm converge to the global optimum more quickly and effectively, a comprehensively improved method is proposed: an adaptive PSO algorithm combined with a compression factor. The velocity vectors are updated according to Eq. 5,

v_md^(t+1) = (w_max − (w_max − w_min) t / T_max) v_md^t + λ (c_1 r_1^t (pbest_md − z_md^t) + c_2 r_2^t (gbest_d − z_md^t)), (5)

where λ = 2 / |2 − β − √(β(β − 4))| is the compression factor given β = c_1 + c_2 > 4, and T_max, w_max and w_min are the maximum iteration number, the initial inertia weight and the final inertia weight, respectively. The adaptive inertia weight in the first part of Eq. 5 equips the algorithm with a strong global search ability in the early iterative stage; correspondingly, the compression factor in the second part equips it with a refined local search ability.
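The improved velocity update might look as follows. The linear schedule for the inertia weight between w_max and w_min is an assumption on our part; the paper only states that the weight adapts from an initial to a final value over the iterations.

```python
import numpy as np

def ipso_velocity(v, z, pbest, gbest, t, t_max,
                  w_max=0.9, w_min=0.4, c1=2.05, c2=2.05, rng=None):
    """IPSO velocity update (sketch): adaptive inertia weight plus
    compression factor. c1 + c2 must exceed 4 for the factor to be real."""
    rng = np.random.default_rng(rng)
    beta = c1 + c2
    # Compression factor lambda = 2 / |2 - beta - sqrt(beta * (beta - 4))|.
    lam = 2.0 / abs(2.0 - beta - np.sqrt(beta * (beta - 4.0)))
    # Inertia weight decreasing from w_max to w_min over t_max iterations
    # (assumed linear schedule).
    w_t = w_max - (w_max - w_min) * t / t_max
    r1, r2 = rng.random(z.shape), rng.random(z.shape)
    return w_t * v + lam * (c1 * r1 * (pbest - z) + c2 * r2 * (gbest - z))
```

Early in the run (t small) the large inertia weight favors exploration; late in the run the compression factor damps the attraction terms, which matches the global-then-local behavior described above.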

Improved extreme learning machine
In this paper, ELM combined with IPSO (IPSO-ELM) is proposed as the learning algorithm of PDLFNs, as shown in Fig. 2. IPSO searches for the input weights and hidden biases, while the output weights are still determined analytically as in standard ELM.
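The key piece of such a hybrid is the fitness function evaluated for each particle. A minimal sketch, with an illustrative flat encoding of the weights and biases into the particle vector (the encoding and names are ours, not the paper's):

```python
import numpy as np

def fitness(particle, X, y, n_hidden):
    """Fitness of one IPSO particle (sketch).

    The particle encodes the hidden-layer input weights and biases; the
    output weights are still obtained analytically with ELM's
    pseudoinverse, so the fitness is just the resulting training MSE."""
    d = X.shape[1]
    W = particle[: d * n_hidden].reshape(d, n_hidden)  # input weights
    b = particle[d * n_hidden:]                        # hidden biases
    H = np.tanh(X @ W + b)                             # hidden-layer outputs
    beta = np.linalg.pinv(H) @ y                       # ELM least-squares step
    return float(np.mean((H @ beta - y) ** 2))         # MSE to be minimized
```

IPSO then drives the swarm toward particles whose random-free hidden layer yields the smallest training MSE, addressing the ill-conditioning that purely random ELM initialization can cause.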

Prediction periods
Self-regulating abilities in bike-sharing systems can generally meet users' rental demands during flat hump periods and low peak periods. However, this does not hold for peak periods. Meanwhile, the users' rental demand during peak periods is one of the main factors influencing scheduling schemes. Thus, we only discuss the demand prediction problem during peak periods in this paper.

Demand Prediction Model
The demand prediction model of bike-sharing is shown in Fig. 3. In this model, the numbers of neurons in the input layer, layer B and the output layer are 3, 29 and 1, respectively. The bike rental demands from the (n − 3)-th day to the (n − 1)-th day are fed into the input layer. The weighted moving average function (WMA) in Eq. 8 is applied as the smooth function for the neuron in layer V.
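A weighted moving average over the three input days might be implemented as follows. The specific weights here are illustrative only; the actual coefficients are fixed by Eq. 8 (in the model they correspond to the learned weights W^(IV)).

```python
import numpy as np

def wma(demands, weights=(1.0, 2.0, 3.0)):
    """Weighted moving average over the last three days' demands (sketch).

    With the default weights, more recent days count more heavily."""
    w = np.asarray(weights, dtype=float)
    d = np.asarray(demands, dtype=float)
    return float(np.dot(w, d) / w.sum())
```

For example, with demands of 10, 20 and 30 bikes on the three preceding days, the smoothed value is (1*10 + 2*20 + 3*30) / 6.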
The hyperbolic tangent sigmoid function in Eq. 9 and the linear function in Eq. 10 are applied as the activation functions for the neurons in layer B and the output layer, respectively.
There is only one neuron in the output layer. Thus, W^(BO), the weight matrix from layer B to the output layer, degenerates to a weight vector in this model. O(X^(n), W^(BO)), the output of the neuron in the output layer of PDLFNs, is the predicted demand on the n-th day.

Evaluation criteria
In order to verify the effectiveness of the proposal, the mean square error (MSE) and the square correlation coefficient (R^2) are selected as the evaluation criteria:

MSE = (1/N) Σ_{n=1}^{N} (p_n − r_n)^2,
R^2 = (Σ_{n=1}^{N} (p_n − p̄)(r_n − r̄))^2 / (Σ_{n=1}^{N} (p_n − p̄)^2 · Σ_{n=1}^{N} (r_n − r̄)^2),

where p_n, r_n, p̄ and r̄ are the predicted bike rental demand on the n-th day, the recorded demand on the n-th day, the average predicted demand over the whole period, and the average recorded demand over the whole period, respectively.
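The two criteria can be computed directly; a straightforward sketch:

```python
import numpy as np

def mse(p, r):
    """Mean square error between predicted (p) and recorded (r) demands."""
    p, r = np.asarray(p, dtype=float), np.asarray(r, dtype=float)
    return float(np.mean((p - r) ** 2))

def r_squared(p, r):
    """Square correlation coefficient between predictions and records."""
    p, r = np.asarray(p, dtype=float), np.asarray(r, dtype=float)
    cov = np.sum((p - p.mean()) * (r - r.mean()))
    return float(cov ** 2 /
                 (np.sum((p - p.mean()) ** 2) * np.sum((r - r.mean()) ** 2)))
```

A perfect prediction gives MSE = 0 and R^2 = 1; a better model therefore has a smaller MSE and a larger R^2, which is how the comparison tables later in the paper are read.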

Load and prepare data
The dataset in this paper is from the official website of the Streeter Dr bike-sharing station in Chicago. We split the data into two sets, one for training and the other for testing. The recorded bike rental demands during the 92 days from March 1st, 2017 to May 31st, 2017 are aggregated as the training set, and the recorded demands during the next 30 days are aggregated as the test set. According to the regularities in the dataset, we can find that the peak periods of renting bikes always fall in the same interval. Taking March for instance, the number of rented bikes is shown in Fig. 4. The peak periods of renting bikes are mainly distributed from 1 p.m. to 5 p.m.
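The chronological split can be sketched as follows: the first 92 days (March 1st to May 31st, 2017) for training and the next 30 days for testing, per the text (the function name is ours).

```python
import numpy as np

def split_train_test(daily_demands, n_train=92, n_test=30):
    """Chronological train/test split of the daily demand series (sketch)."""
    d = np.asarray(daily_demands)
    return d[:n_train], d[n_train:n_train + n_test]
```

Keeping the split chronological (rather than random) matters for time-series prediction, since the model must be evaluated on days that come after its training period.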

Prediction results
To verify the performance of IPSO, the MSEs before and after the improvement are shown in Fig. 5. The global search ability of the particle swarm is effectively improved by optimizing the particles' velocities: adjusting the inertia weight in the early iterative stage and adding a compression factor in the later stage.

Comparison and discussion
In order to verify the effectiveness of the prediction model in this paper, comparative experiments are conducted, considering 3 different network structures (SLFNs, DLFNs and PDLFNs) with 3 different learning algorithms (ELM, PSO-ELM and IPSO-ELM). The evaluation criteria are the MSE and R^2 defined in the preceding section. The experimental results of the 9 prediction models are shown in Table 1.
The longitudinal comparison is among the same network structure with different learning algorithms. The MSE and R^2 of IPSO-ELM are always the minimum and maximum, respectively, for SLFNs, DLFNs and PDLFNs. This suggests that IPSO-ELM is the best of the 3 learning algorithms, and that neural networks with IPSO-ELM obtain the most accurate prediction results. The horizontal comparison among different network structures with the same learning algorithm shows that PDLFNs obtain the most accurate prediction results. From the perspective of MSE and R^2, PDLFNs with IPSO-ELM is the best of the 9 prediction models, and the accuracy improvement from changing the network structure is greater than that from changing the learning algorithm. Further experimental results are shown in Figs. 7 and 8.

Conclusion
Aiming at predicting the demands of bike-sharing, this paper constructs the PDLFNs model consisting of "input layer - layer V - layer B - output layer" and improves ELM by combining it with IPSO as the learning algorithm. This model has two advantages verified by the comparative experiments on predicting the demands of the Streeter Dr bike-sharing station. In the first place, a simple network structure is equipped with stable generalization and high accuracy. In addition, the improved ELM is an effective learning algorithm for feedforward neural networks beyond SLFNs.

In this paper, we have predicted the demands of bike-sharing during peak periods, which are main factors influencing scheduling schemes. Nevertheless, optimal scheduling schemes are also influenced by the demands during flat hump periods and low peak periods. We will consider this issue in the future and predict 24-hour demands in the bike-sharing system.