A system for electric vehicle’s energy-aware routing in a transportation network through real-time prediction of energy consumption

To tackle the range anxiety of electric vehicle (EV) drivers, it is necessary to accurately estimate the power/energy consumption of EVs in real time, so that drivers can get real-time information about the vehicle's remaining range. Such estimates can also be used for energy-aware routing, i.e., the driver can be told which route will consume the least energy. In this paper, an integrated system is proposed that provides reliable, real-time estimates of the energy consumption of an EV. The approach uses Deep Auto-Encoders (DAEs), cross-connected using latent space mapping, which take historical traffic speed as input and predict the traffic speed at multiple future time steps. The predicted traffic speed is used to calculate the future vehicle speed. The vehicle speed and acceleration, along with wind speed, road elevation, temperature, the battery's SOC, and auxiliary loads, are used as input to a multi-channel Convolutional Neural Network (CNN) to predict the energy consumption. The prediction is further fine-tuned using a Bagged Decision Tree (BDT). Unlike existing techniques, the proposed system can be easily generalized to other vehicles, as it is independent of internal vehicle parameters. Comparison with benchmark techniques shows that the proposed system performs better, achieving the lowest mean absolute percentage error of 1.57%.


Shatrughan Modi (shatrughanmodi@gmail.com), Computer Science and Engineering Department, Thapar Institute of Engineering and Technology, Patiala 147004, India

Introduction
Electric Vehicles (EVs) are emerging as an alternative to fossil fuel vehicles, and the automobile industry sees them as a solution to a number of problems, such as environmental pollution and rising petrol/diesel prices due to limited supply. Although the popularity of EVs is increasing rapidly, there are still barriers to their widespread adoption. The major factors that affect consumer perception of EVs, and hence their market penetration, have been analyzed by different researchers [1][2][3][4][5]. According to these studies, the major factors include range anxiety due to limited driving range, high purchase price, long battery recharge time, and limited availability of public charging infrastructure. Of these, the limited driving range, which makes the driver anxious about whether a particular destination can be reached with the current charge, is the major contributor to the non-adoption of EVs. People feel more comfortable purchasing plug-in hybrid vehicles than battery EVs. Although improvements in battery technology [6,7] have given consumers some confidence, accurate and real-time prediction of energy usage can greatly boost that confidence. Therefore, this work focuses on developing a system that can provide reliable real-time energy consumption predictions for different routes to the destination, so that the driver can choose the best possible route.
In the literature, researchers have proposed a number of methodologies to estimate the energy consumption of EVs. These methodologies can be divided into three categories, namely macroscopic, mesoscopic, and microscopic, based on the level of granularity at which they operate. Techniques that provide a total energy consumption estimate for a whole trip, which can include multiple connecting roads, are categorized as macroscopic [8][9][10][11][12][13]. Mesoscopic techniques [14][15][16][17] assign energy consumption as a cost to each road in a road network, so that optimal driving routes to the destination can be found. Microscopic techniques [18][19][20][21][22] provide more detailed output in the form of second-by-second energy consumption estimates. The output of microscopic techniques can be aggregated to assign an energy consumption to each road, so they can be used in place of mesoscopic techniques. Further aggregation over connecting roads gives the total energy consumption for the whole trip, which fulfills the purpose of macroscopic techniques. The second-by-second output of microscopic techniques can also be used to guide the EV driver with instructions, such as maintaining a particular speed or selecting a particular route, based on the remaining charge in the battery. Hence, these techniques can help boost the confidence of EV drivers. Therefore, in this work, a system has been developed that provides second-by-second energy consumption prediction and can be used to guide the EV driver in energy-aware route selection.
Researchers have employed different techniques to develop methodologies for estimating the energy consumption of EVs. Most have used simulation models [23][24][25][26] or regression-based techniques. The regression-based techniques include linear [13,20,27], polynomial [8,12,14,15,19], and logarithmic [11] regression. Simulation models have a major drawback: to simulate a vehicle, internal vehicle-specific parameters are required, and these parameters are not freely available from manufacturers. In addition, simulation models cannot be generalized, because they are calibrated to a particular vehicle. Regression-based techniques work on real-world data, but as such data are mostly collected from sensors, they contain noise. This causes regression models to give erroneous results, as regression-based techniques are noise sensitive [28,29]. Apart from simulation and regression-based techniques, some techniques have been developed using Neural Networks (NN) [9,10], Convolutional Neural Networks (CNN) [22], and Neuro Fuzzy systems [30,31]. Although these techniques perform better than regression-based techniques on noisy data, they are not suitable for real-world application. The NN models proposed in [9,10] are macroscopic models, i.e., they give a single output for the total energy consumption at the end of the trip. Hence, they cannot provide real-time guidance to the driver about alternative route selection or the optimal speed to maintain. Also, these models considered a limited number of factors, i.e., speed, jerk, acceleration, and some road-related parameters. Similarly, the CNN-based energy consumption estimation model developed in [22] considered only three parameters as input, namely road elevation/grade, tractive effort, and vehicle speed. This limits the application of these approaches in the real world.
In recent years, some researchers have also combined multiple techniques to develop hybrid models for energy consumption prediction of EVs. For instance, a machine learning-based hybrid system using the Markov Monte Carlo method and XGBoost [32] was developed for predicting future energy consumption. This system considers historical driving and environment data for predicting the energy consumption. An ensemble learning approach [33], combining decision trees, random forest, and k-nearest neighbor, was developed for EV energy consumption prediction. This system provides prediction results considering a number of factors, such as average trip speed, trip distance, air conditioning, and road gradient. Another hybrid model was proposed in [34] for energy consumption estimation, using a Recurrent Neural Network (RNN) with an attention mechanism and a deep neural network. The model takes traffic information and vehicle state as input for generating the prediction results. A combination of a residual neural network and Bi-LSTM (Bidirectional Long Short-Term Memory) was proposed in [35] for energy consumption prediction. Another hybrid system [36], combining Deep CNN, LSTM, Bi-LSTM, and ANN, was developed for predicting the energy consumption of EVs. Similar to [9,10], the hybrid systems [32,33,36] are macroscopic models and cannot be used for real-time driver guidance. Also, although the hybrid models discussed so far have provided encouraging results, they can be improved further, because they consider only a subset of the factors that influence energy consumption. For instance, the hybrid systems proposed in [34,36] ignored environment-related factors. Similarly, [35] ignored the effect of auxiliary loads such as air conditioning or heater usage.
A number of studies [37,38] have concluded that the energy consumption of an EV is affected by several factors, such as traffic, road elevation, auxiliary loads, wind speed, wind direction, environmental temperature, and the battery's initial state of charge. Hence, for reliable energy consumption estimation, the influence of all these factors must be considered.
To fill the gaps discussed above, a microscopic system has been developed that provides reliable real-time energy consumption estimates based on different influencing factors. The proposed approach uses Deep Auto-Encoders (DAEs), cross-connected using latent feature space mapping, to predict the traffic speed at multiple future time steps. For this, it uses the historical traffic speed from a number of neighboring roads. Then, a multi-channel Convolutional Neural Network (CNN) predicts the energy consumption based on the vehicle speed calculated from the predicted traffic speed, along with road elevation, wind speed, the battery's initial state of charge, and other factors. The estimates given by the multi-channel CNN are then fine-tuned using a Bagged Decision Tree (BDT), which further reduces the prediction error. As mentioned previously, one of the main problems is that calibrating a simulation model requires the internal parameters of a vehicle. As manufacturers do not share these parameters freely, the calibration process is very difficult. The proposed system instead uses six external parameters as input, namely historical traffic information, wind speed, temperature, road elevation, the battery's initial SOC, and auxiliary loads, and provides reliable energy consumption estimates. All these input parameters are easy to collect. For instance, historical traffic information is provided by different government agencies; traffic information for California can be obtained from the web portal of the California Department of Transportation Performance Measurement System (PeMS) [39]. Similarly, elevation data for different roads can be obtained using a Geographic Information System (GIS), and wind speed and temperature data can be obtained using openly available weather APIs, for example, the OpenWeatherMap APIs [40]. Another problem discussed above is the noise sensitivity of regression techniques.
It is a well-known fact that NN-based techniques can handle noisy data effectively, and as the proposed system uses a CNN and DAEs as its modules, it can work efficiently even with noisy data. Also, unlike other NN-based approaches [9,10], the proposed system provides real-time energy consumption as output. Hence, it can be used to give the driver real-time instructions about alternative routes to the destination with lower energy consumption, or about the speed to maintain for optimal energy consumption. The major contributions of the current work are as follows:
i. A system has been developed that provides reliable, real-time energy consumption estimates and hence can also be used for guiding the driver in real time.
ii. The proposed approach provides results that consider the influence of all major factors, namely vehicle acceleration, traffic, temperature, road elevation, auxiliary loads, wind speed, and the battery's SOC.
iii. The proposed system is easy to generalize to any vehicle, because it works independently of internal vehicle parameters, which manufacturers do not share publicly.
iv. The multi-channel CNN and cross-connected DAEs allow the approach to work efficiently even with noisy data, which is generally the case in the real world. They also allow the approach to effectively learn the non-linear relationship between the input parameters.
The rest of the paper is divided into a number of sections. First, the details of the real-world datasets used for this study are discussed in the section "Datasets". In the section "Analysis of factors influencing energy consumption", various factors that can influence the energy consumption of an EV are analyzed. The architecture of the proposed microscopic system is discussed in detail in the section "Proposed system". The details of the implementation are presented in the section "Experimental setup". The comparison of the proposed system with other state-of-the-art techniques is provided in the section "Results and discussion". Then, "Conclusion" concludes the current work.

Datasets
In this study, three different datasets are used. The first and second datasets contain traffic data from multiple loop detectors located on the highways of the Bay Area and Los Angeles in the state of California. The third dataset contains data generated from the simulated model of the Nissan Leaf provided in FASTSim (Future Automotive Systems Technology Simulator) [41], which is a very popular simulation tool. The following paragraphs discuss the details of these datasets.
Traffic data for the Bay Area and Los Angeles were obtained from a reliable data source, the web portal of the California Department of Transportation Performance Measurement System (PeMS) [39]. It provides real-time traffic data (such as traffic speed, traffic flow, and occupancy rate) from a number of loop detector sensors. These sensors are mainly located on different lanes of the freeways of California. Figure 1a shows the locations of the 660 sensors on the map of Los Angeles for which traffic data were used in this study. Out of these 660 sensors, 382 sensors (shown with green markers) were selected that cover five freeways, namely I110, I5, I405, I10, and US101. Two months of traffic data (from 1st June, 2017 to 31st July, 2017) were used for training and testing the proposed system. Similarly, Fig. 1b shows the 325 sensors on the map of the Bay Area. These sensors are installed on eight different highways, namely SR237, I880, US101, I280, SR85, SR17, SR87, and I680. Six months of traffic data (from 1st Jan, 2017 to 30th June, 2017) were collected from these Bay Area sensors and used in this study for training and testing the proposed system. Furthermore, as there is very little traffic at night, only traffic data from 7:00 a.m. to 10:00 p.m. have been used in this study. As traffic data are used from two different locations, i.e., Los Angeles and the Bay Area, these datasets are henceforth denoted as PEMS-Los and PEMS-Bay, respectively.
A simulated model of the Nissan Leaf provided in the simulation tool FASTSim was used to obtain a large amount of data on EV energy consumption. The simulated model of the Nissan Leaf in FASTSim is based on the vehicle-specific parameters that can be found in [22,42,43]. In the same way, other simulated models can be used to obtain datasets for other EVs. This depends on the availability of vehicle-specific parameters from the manufacturer, such as the internal resistance of the battery and the efficiency curve of the motor. In this study, the dataset was generated by varying a number of different factors: road elevation (from −20 to 20%), environmental temperature (from −5 °C to 35 °C at intervals of 10 °C), 40 different drive cycles (such as the New European Driving Cycle (NEDC) and the Highway Fuel Economy Test (HWFET)), initial state of charge of the vehicle's battery (from 30 to 90% at intervals of 20%), 4 different auxiliary load profiles, and 8 different wind speed profiles. These eight wind speed profiles correspond to the first eight categories of wind speed (out of 13) on the Beaufort scale [44]. Only the first eight categories were used, as the last five categories do not represent suitable conditions for driving. In this dataset, the data are recorded at a frequency of 10 Hz. The dataset was divided into a number of small partitions, each covering a 10 s interval, for conducting the experiments. As a result, experiments were conducted with approximately 1.7 million (17 lakh) partitions. Note that more data can be generated by changing factors such as environmental temperature, road elevation, and drive cycles.
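The partitioning step can be sketched as follows. This is a minimal illustration rather than the authors' code: it assumes consecutive, non-overlapping windows and drops any trailing samples that do not fill a complete window, neither of which is stated in the text. The function name `partition` and the example trace length are hypothetical.

```python
from typing import List, Sequence

SAMPLE_RATE_HZ = 10   # recording frequency stated in the text
WINDOW_SECONDS = 10   # partition length stated in the text


def partition(samples: Sequence[float],
              rate_hz: int = SAMPLE_RATE_HZ,
              window_s: int = WINDOW_SECONDS) -> List[List[float]]:
    """Split a signal into consecutive, non-overlapping fixed-length windows.

    Assumption: trailing samples that do not fill a whole window are dropped.
    """
    size = rate_hz * window_s          # 100 samples per partition
    n_full = len(samples) // size      # number of complete windows
    return [list(samples[i * size:(i + 1) * size]) for i in range(n_full)]


# Hypothetical trace of 7,650 samples (765 s at 10 Hz).
speed_trace = [0.0] * 7650
windows = partition(speed_trace)
```

At 10 Hz, each 10 s partition holds 100 samples of every recorded signal.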

Analysis of factors influencing energy consumption
Multiple factors (such as vehicle speed, acceleration, road elevation/grade, the battery's initial SOC, auxiliary loads, wind, and environmental temperature) influence the energy consumption of EVs. The following subsections analyze the influence of these factors on energy consumption.

Vehicle-related factors
The vehicle-related factors include vehicle speed, acceleration, and auxiliary loads. The vehicle speed and acceleration mainly depend on driver behaviour and traffic conditions. Figure 2 shows the effect of changes in speed/acceleration on the power/energy consumption of an EV. In the figure, the vehicle speed represents the speed profile of the Highway Fuel Economy Test (HWFET) drive cycle. It can be observed that power consumption changes with acceleration: whenever acceleration decreases, power consumption also decreases, and vice versa. The area under the power curve represents the energy, which has been categorized into energy consumed and energy regenerated. From the graph, it can be concluded that whenever there is negative acceleration, such as when the driver applies the brake, the EV can generate energy (shown in green) and store it back in the battery instead of consuming it. This makes EVs more efficient than fossil fuel vehicles in terms of energy consumption in urban areas with heavy traffic.
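The relation between the power curve and the consumed and regenerated energy can be sketched numerically. The snippet below is an illustrative assumption, not the procedure used in the paper: it integrates a sampled power trace with the trapezoidal rule, treats positive power as drawn from the battery and negative power as regenerated, and splits any interval that crosses zero at the crossing point. The function name `split_energy` and the example trace are hypothetical.

```python
from typing import Sequence, Tuple


def split_energy(power_w: Sequence[float], dt_s: float) -> Tuple[float, float]:
    """Integrate a power trace (W) sampled every dt_s seconds.

    Returns (consumed_J, regenerated_J), both as non-negative numbers.
    """
    consumed = 0.0
    regenerated = 0.0
    for p0, p1 in zip(power_w, power_w[1:]):
        if p0 >= 0 and p1 >= 0:
            consumed += 0.5 * (p0 + p1) * dt_s        # trapezoid above zero
        elif p0 <= 0 and p1 <= 0:
            regenerated += -0.5 * (p0 + p1) * dt_s    # trapezoid below zero
        else:
            # Sign change: split the interval at the linear zero crossing.
            frac = p0 / (p0 - p1)                     # share of dt before crossing
            for area in (0.5 * p0 * frac * dt_s,
                         0.5 * p1 * (1.0 - frac) * dt_s):
                if area > 0:
                    consumed += area
                else:
                    regenerated += -area
    return consumed, regenerated


# Hypothetical trace: acceleration phase followed by braking, sampled at 1 Hz.
trace = [0.0, 1000.0, 1000.0, 0.0, -500.0, -500.0, 0.0]
consumed_j, regenerated_j = split_energy(trace, dt_s=1.0)
net_j = consumed_j - regenerated_j
```

The net energy drawn from the battery is the difference of the two areas, which is why a braking-heavy trip can end with a low or even negative net consumption.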
Similar to speed and acceleration, auxiliaries such as the AC/heater and lights can influence the energy consumption during the trip. To analyze the effects of auxiliaries, a comparison of energy consumption for the HWFET drive cycle with three different auxiliary loads is shown in Fig. 3. The three auxiliary loads used for comparison are given in Table 1. It can be seen from the figure and the table that energy consumption has a linear relation with auxiliary loads, i.e., if more auxiliaries are used, more energy is consumed. For instance, energy consumption increases by approximately 1 MJ from auxiliary load profile 1 to auxiliary load profile 3 for the same HWFET drive cycle. This corresponds to an increase of approximately 12-13% in total energy consumption during the trip, which can be critical for deciding whether the destination can be reached. Therefore, it is important to consider the usage of auxiliary loads along with speed/acceleration when estimating the energy consumption.

Driving condition-related factors
Another set of factors that influence the energy consumption of EVs falls under the category of driving conditions, which include road grade, wind speed, and environmental temperature. Figure 4 shows the effect of road grade on the energy consumption of an EV for the HWFET drive cycle. In the figure, the energy consumption results for four different road grade profiles are shown, keeping the drive cycle (HWFET) and other factors constant, with no auxiliary loads and an environmental temperature of 25 °C. It can be observed that even though the drive cycle and other factors are the same, the energy consumption differs considerably for different road grades. Whenever the road grade is negative (declining road), the energy consumption starts decreasing. This happens because of the regenerative mechanism, by which energy is regenerated and stored back in the battery. Owing to this mechanism, instead of energy being consumed for the trip, the net energy consumption in Fig. 4c is approximately −3 MJ, i.e., energy has been stored back in the battery. Similarly, Fig. 4a shows that the total energy consumed with grade profile 4 is approximately 3.5 MJ less than the energy consumed with grade profile 1. This excess energy can further be used for covering more distance or operating the auxiliaries.
Wind speed can also influence the energy consumption of an EV during a trip. To analyze the extent of its influence, energy consumption for the HWFET drive cycle at 0% road grade, 25 °C temperature, and no auxiliaries has been compared for four different wind speeds in Fig. 5. According to the Beaufort scale, the four wind speeds used for comparison are Calm (< 0.5 m/s), Light Breeze (from 1.6 to 3.3 m/s), Moderate Breeze (from 5.5 to 7.9 m/s), and Strong Breeze (from 10.8 to 13.8 m/s). Note that in this study, the direction of the wind is assumed to be opposite to the vehicle's travelling direction. From the figure, it can be observed that the wind speed has a linear relation with energy consumption.
Other than road grade and wind speed, environmental temperature is also an important factor that can influence the energy usage of an EV during a trip. The environmental temperature mainly affects the battery's internal resistance, which in turn affects the capability of the battery to supply power to the vehicle. The temperature has an inverse relationship with the battery's internal resistance, i.e., the internal resistance increases as the temperature decreases, and vice versa. As the internal resistance increases, the capacity of the battery to supply power to the vehicle reduces. This may cause the vehicle to fail to achieve the desired acceleration or to move forward on a steeply inclined road, especially when the battery's SOC and the temperature are low. Figure 6 shows the effect of temperature along with the battery's initial SOC on energy consumption during a trip with the HWFET drive cycle at 0% road grade, calm wind, and no auxiliaries. The energy consumption is shown as different shades of gray from black to white, where darker shades represent lower energy consumption.
From the figure, it can be observed that when the temperature and the battery's initial SOC are low, such as a temperature of −5 °C and an initial SOC of 30%, the battery supplies less energy to the vehicle, and hence the energy consumption is lower. This does not mean that the vehicle was able to cover the trip with less energy, because in such cases, the vehicle stops before completing the trip. The energy consumption increases with temperature and the battery's SOC, because the battery's capability to provide power improves, allowing the vehicle to achieve the required acceleration and complete the trip.
In brief, the above discussion has analyzed the individual influence of different factors, such as road grade, vehicle speed, vehicle acceleration, environmental temperature, the battery's initial SOC, wind speed, and auxiliary loads, while keeping the others constant. In the real world, this is not the case, as all these factors affect the energy consumption simultaneously during a trip. This makes it necessary to take into account the influence of all these factors to accurately estimate the energy consumption for a trip.
Fig. 6 Effect on energy consumption with different temperatures and initial SOC for HWFET drive cycle

Proposed system
A system using deep learning has been developed for accurately predicting the energy consumption of EVs. The proposed system can be utilized to predict the energy consumption on different routes to a destination, after which the driver can select the best possible route, or a routing algorithm can be used for route selection. As discussed in the previous section, to provide accurate estimates, the system takes into consideration the effect of a number of factors, such as traffic, road elevation, wind speed, and environmental temperature. Figure 7 shows the overall framework of the proposed system, which consists of seven main computational modules. In brief, the data pre-processing module pre-processes the raw traffic data by cleaning, clustering, and formatting the data, and the traffic prediction module then predicts the future traffic speed. The predicted future traffic speed is used to predict the vehicle's future speed profile on a particular route. Then, to predict the power consumption, the power consumption estimation module takes seven inputs, including vehicle acceleration, speed, and road elevation for the route. The re-sampler module is used to increase or decrease the sampling frequency of different parameters, so that all the parameters have the same frequency before the fine-tuner module uses them as input. The fine-tuner module takes the estimated power consumption along with the re-sampled seven input parameters of the power consumption estimation module and further improves the power consumption estimates. Finally, the state of charge remaining in the vehicle's battery is calculated by the state of charge calculator module using the estimated power consumption given by the fine-tuner module. The following subsections discuss the working of each of these seven modules in detail.

Data pre-processing
This module is responsible for pre-processing the raw traffic data collected from different loop detector sensors. As explained in the section "Datasets", the sensors provide real-time traffic data such as traffic flow, occupancy rate, and traffic speed. In this work, historical speed data from a number of sensors have been used for experiments. The following subsections explain the pre-processing of the raw traffic data in detail.

Data cleaning
Traffic data collected from loop detector sensors can contain a number of anomalies, such as outliers due to noise and missing values due to sensor failures. It is therefore necessary to clean the data before using it for experiments, and in this study, the data were cleaned beforehand using different data cleaning methods. For instance, it was found that some loop detectors have a large number of missing values in the dataset. Sensors for which the number of missing values exceeds a particular limit were omitted from the dataset. For the remaining sensors, linear interpolation was used to fill the missing values.
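The cleaning rule described above can be sketched as follows. This is a minimal illustration under stated assumptions: the missing-value limit is not given in the text, so `max_missing_frac` is a hypothetical parameter, and gaps at the start or end of a series are filled with the nearest valid reading, which the paper does not specify. Missing readings are represented as `None`.

```python
from typing import Dict, List, Optional


def interpolate_linear(series: List[Optional[float]]) -> List[float]:
    """Fill None gaps by linear interpolation between valid neighbors.

    Assumption: leading and trailing gaps take the nearest valid value.
    """
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        raise ValueError("series has no valid readings")
    filled = list(series)
    first, last = known[0], known[-1]
    for i in range(first):                      # leading gap
        filled[i] = series[first]
    for i in range(last + 1, len(series)):      # trailing gap
        filled[i] = series[last]
    for a, b in zip(known, known[1:]):          # interior gaps
        for i in range(a + 1, b):
            filled[i] = series[a] + (series[b] - series[a]) * (i - a) / (b - a)
    return filled


def drop_and_fill(data: Dict[str, List[Optional[float]]],
                  max_missing_frac: float) -> Dict[str, List[float]]:
    """Omit sensors with too many missing readings, interpolate the rest."""
    cleaned = {}
    for sensor, series in data.items():
        if not series:
            continue
        missing = sum(v is None for v in series) / len(series)
        if missing > max_missing_frac:
            continue                            # sensor omitted from the dataset
        cleaned[sensor] = interpolate_linear(series)
    return cleaned


# Hypothetical readings (speed per 5-minute step) for two sensors.
raw = {
    "A": [60.0, None, None, 66.0, 65.0],
    "B": [None, None, None, 50.0, None],
}
cleaned = drop_and_fill(raw, max_missing_frac=0.5)
```

In this example, sensor "B" is dropped because 80% of its readings are missing, while the two gaps in sensor "A" are filled along the line between 60 and 66.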

Neighboring sensors' selection
To predict the traffic on a road, it is necessary to also consider the traffic from its adjacent roads. This effect is known as the traffic's spatial dependency. Furthermore, this spatial dependency can vary with time; for instance, one road can contribute more traffic to a particular road in the morning, and another road can contribute more in the evening. This is known as temporal dependency. Thus, traffic data need to be clustered according to both spatial and temporal dependencies. To achieve this, an algorithm (see Algorithm 1) has been developed to find the neighboring sensors of a particular sensor based on the existing spatial-temporal dependency. The algorithm has multiple input parameters, namely (i) a list $S$ of all $n$ sensors, (ii) a $\tau \times n$ traffic speed matrix $T_S$, where $\tau$ represents the total number of time instants, (iii) an $n \times n$ distance matrix $D$, where element $d_{i,j} \in D$ is the shortest road distance from sensor $i$ to sensor $j$, (iv) a time instant $t_0$, and (v) the target sensor $p$ for which the algorithm has to identify neighboring sensors. First, the sensors that are within a distance of $\delta$ from the target sensor $p$ are shortlisted and stored in $S$. The shortest distances of these selected sensors from $p$ are stored in $D_S$. In the next step, the past traffic speed data of all the selected sensors in $S$ over the interval from $t_0 - \Delta t$ to $t_0$ are recorded in $T_S$. Then, the traffic similarity between each sensor $q \in S$ and the target sensor $p$ is obtained from the past traffic data in $T_S$. Two metrics, namely the absolute mean traffic speed difference and Pearson's correlation coefficient [45], were used for calculating traffic similarity. With these steps, three attributes, namely the distance among sensors $D_S$, the absolute mean traffic speed difference $\Delta_S$, and Pearson's correlation coefficient $C_S$, were obtained for each sensor $q \in S$ with respect to the target sensor $p$.
Then, based on these three attributes, all the sensors in $S$ are ranked using the Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS) [46], which is a multi-criteria decision-making approach. An equal weight $\lambda$ was used for all three attributes, but the decision-making criterion $\xi$ was different for each of them. As the correlation should be maximum, $\xi = +1$ was used for correlation. Similarly, $\xi = -1$ was used for the absolute mean difference and the distance, as these two attributes should be minimum for a sensor to rank high. From this ranked list, the top $m$ sensors were selected and stored in $S^*$ as the neighboring sensors of the target sensor $p$. In the current study, the top 10 sensors were selected from the ranked list, i.e., $m = 10$ was used. This algorithm was applied for all 382 sensors at different time instants to select the neighboring sensors. Although the target sensor was taken from the selected 382 sensors, the algorithm can output any of the 660 sensors as a neighboring sensor, based on the spatial and temporal traffic dependency.
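The selection procedure can be sketched as follows. This is a simplified illustration, not a reproduction of Algorithm 1. The assumptions are that the absolute mean traffic speed difference is the absolute difference of the two sensors' mean speeds, that the target sensor itself is excluded from the candidates, and that TOPSIS uses vector normalization with Euclidean distances to the ideal and anti-ideal points. All function names and the example data are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's correlation coefficient of two equally long series."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def topsis(rows: List[List[float]], signs: Sequence[int]) -> List[float]:
    """Closeness score per row; signs are +1 (maximize) or -1 (minimize)."""
    k = len(signs)
    weight = 1.0 / k                                   # equal weights
    norms = [math.sqrt(sum(r[j] ** 2 for r in rows)) or 1.0 for j in range(k)]
    v = [[weight * r[j] / norms[j] for j in range(k)] for r in rows]
    cols = list(zip(*v))
    best = [max(c) if s > 0 else min(c) for s, c in zip(signs, cols)]
    worst = [min(c) if s > 0 else max(c) for s, c in zip(signs, cols)]
    scores = []
    for r in v:
        d_best, d_worst = math.dist(r, best), math.dist(r, worst)
        total = d_best + d_worst
        scores.append(d_worst / total if total else 0.0)
    return scores


def select_neighbors(p: str, speeds: Dict[str, List[float]],
                     dist: Dict[str, Dict[str, float]],
                     delta: float, m: int) -> List[str]:
    """Rank sensors within road distance delta of p and return the top m."""
    candidates = [q for q in speeds if q != p and dist[p][q] <= delta]
    mean_p = sum(speeds[p]) / len(speeds[p])
    rows = []
    for q in candidates:
        mean_q = sum(speeds[q]) / len(speeds[q])
        rows.append([dist[p][q], abs(mean_p - mean_q),
                     pearson(speeds[p], speeds[q])])
    scores = topsis(rows, signs=[-1, -1, +1])
    ranked = sorted(zip(candidates, scores), key=lambda t: t[1], reverse=True)
    return [q for q, _ in ranked[:m]]


# Hypothetical past speeds over the look-back window and road distances.
speeds = {
    "p": [60.0, 62.0, 58.0, 55.0, 57.0],
    "a": [61.0, 63.0, 59.0, 56.0, 58.0],   # near, same pattern
    "b": [40.0, 38.0, 45.0, 50.0, 47.0],   # near, opposite pattern
    "c": [60.0, 62.0, 58.0, 55.0, 57.0],   # same pattern but too far away
}
dist = {"p": {"p": 0.0, "a": 1.0, "b": 2.0, "c": 9.0}}
neighbors = select_neighbors("p", speeds, dist, delta=5.0, m=10)
```

Because the speed window is taken relative to the time instant $t_0$, re-running the selection at a different time of day can return a different neighbor set for the same target sensor, which is how the temporal dependency enters.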

Formatting the traffic data
After obtaining the neighbor sensors for the target sensor $p$, the historical traffic speed data of the selected neighbor sensors $S^*$ were formatted into a matrix $X$ of size $l \times m \times c$. Here, $l$, $m$, and $c$ represent the number of time instances during the time interval $\Delta t$, the number of selected neighbor sensors, and the number of channels, respectively. For this study, to consider 5 h of past traffic data, $l$ was taken as 60, as traffic data from loop detectors are available at an interval of 5 min in the dataset. As discussed in the previous section, $m$ was taken as 10. The number of channels $c$ was taken as 4, where each channel represents the traffic data for a different day, as shown in Fig. 8. In the figure, $d$ represents the current day, so the first channel contains traffic data for the current day, and the second channel contains traffic data of the previous day, i.e., $d - 1$. Similarly, the third and fourth channels correspond to the traffic data of the same day of the week as the current day but from the past 2 weeks, i.e., $d - 7$ and $d - 14$. By taking data from past weeks, the weekly traffic pattern can be captured. In addition to the days, the time instances also differ between channels: in the first channel, traffic data for time instances from $t_0 - \Delta t$ to $t_0$ were considered, but in the other channels, the time interval is from $t_0 - \frac{\Delta t}{2}$ to $t_0 + \frac{\Delta t}{2}$. This was done to capture the traffic pattern that followed time instant $t_0$ on the previous day and on the same weekday in the past 2 weeks.
Algorithm 1: Algorithm for selecting the neighboring sensors
After formatting the past traffic data into the traffic data matrix $X$, it was passed to the traffic prediction module as input to predict the future traffic speed vector $Y = [y_1, y_2, \ldots, y_n]^T$. Here, the element $y_i$ represents the future traffic speed at time instant $t_0 + i$ at sensor $p$. In the current study, the traffic prediction module is developed to predict the traffic speed for the next 1 h, i.e., $n$ is 12.
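The channel layout of $X$ can be sketched as follows. This is an illustrative assumption about the indexing only: time is counted in 5-minute steps, the current-day window is taken as the $l$ steps ending at $t_0$, and the other channels use the $l$ steps centred on $t_0$. The exact handling of window endpoints is not given in the text. The accessor `speed_at` is a hypothetical stand-in for a lookup into the cleaned dataset.

```python
from typing import Callable, List

L_STEPS = 60                   # 5 h of data at 5-minute resolution
DAY_OFFSETS = (0, 1, 7, 14)    # channels: d, d-1, d-7, d-14


def build_input(speed_at: Callable[[int, int, str], float],
                day: int, t0: int, neighbors: List[str],
                l: int = L_STEPS) -> List[List[List[float]]]:
    """Return X indexed as X[time][sensor][channel], of size l x m x c."""
    X = []
    for k in range(l):
        row = []
        for sensor in neighbors:
            cell = []
            for offset in DAY_OFFSETS:
                if offset == 0:
                    step = t0 - l + 1 + k          # window ending at t0
                else:
                    step = t0 - l // 2 + 1 + k     # window centred on t0
                cell.append(speed_at(day - offset, step, sensor))
            row.append(cell)
        X.append(row)
    return X


# Hypothetical accessor that encodes (day, step) in the returned value,
# so that the selected indices are easy to inspect.
def fake_speed(day: int, step: int, sensor: str) -> float:
    return float(day * 1000 + step)


sensors = [f"s{i}" for i in range(10)]             # m = 10 neighbors
X = build_input(fake_speed, day=20, t0=100, neighbors=sensors)
```

With this layout, the last half of every past-day channel holds traffic observed after the time of day $t_0$, which is the information the current-day channel cannot provide.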
For training the traffic prediction module, the data matrix $X$ and the future traffic speed vector $Y$ were normalized using Eq. (1):

$$\hat{z}_j = \frac{z_j - \min(z)}{\max(z) - \min(z)} \tag{1}$$

In the above equation, $z_j$ and $\hat{z}_j$ represent the traffic data and the normalized traffic data of the $j$th sensor, respectively, and $\max(z)$ and $\min(z)$ are the maximum and minimum of the traffic data of all sensors, respectively.
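A minimal sketch of this min-max normalization and its inverse is given below. The helper names are hypothetical, and the inverse mapping is included on the assumption that predicted speeds need to be converted back to physical units. The minimum and maximum are taken over all sensors together, as in Eq. (1).

```python
from typing import List, Tuple


def fit_min_max(matrix: List[List[float]]) -> Tuple[float, float]:
    """Global minimum and maximum over the traffic data of all sensors."""
    flat = [v for row in matrix for v in row]
    return min(flat), max(flat)


def normalize(z: float, lo: float, hi: float) -> float:
    """Min-max normalization of Eq. (1): maps [lo, hi] onto [0, 1]."""
    if hi == lo:
        raise ValueError("max and min are equal; normalization is undefined")
    return (z - lo) / (hi - lo)


def denormalize(z_hat: float, lo: float, hi: float) -> float:
    """Inverse of Eq. (1), mapping a normalized value back to a speed."""
    return z_hat * (hi - lo) + lo


# Hypothetical speeds: rows are time instants, columns are sensors.
speed_matrix = [[30.0, 60.0], [45.0, 75.0]]
lo, hi = fit_min_max(speed_matrix)
scaled = [[normalize(v, lo, hi) for v in row] for row in speed_matrix]
```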

Traffic prediction module
A deep learning model has been developed using Deep Auto-Encoders (DAEs) for predicting multistep traffic speed. Researchers have employed DAEs to extract robust features from an input and then used these features for regression or classification tasks. For instance, Vincent et al. [47] used DAEs to extract robust features from corrupted inputs. Similarly, Masci et al. [48] initialized a convolutional neural network with the hierarchical features extracted by DAEs. Although these studies address different problems, they demonstrate the capability of DAEs to extract robust features. Inspired by this, two auto-encoders are employed in this work for predicting multistep traffic speed. An auto-encoder has two main components, namely an encoder and a decoder. The encoder takes the input and tries to learn an intermediate representation of it, also known as the latent space representation. This intermediate representation is then used as the input to the decoder, which tries to regenerate the input from it.
The architecture of the proposed traffic prediction model is shown in Fig. 9. The model uses two different DAEs: one (denoted $\mathrm{DAE}_X$) for capturing the historical traffic speed patterns and the other (denoted $\mathrm{DAE}_Y$) for capturing future traffic speed patterns. The historical traffic speed X is converted to its latent representation $Z_X$ by the encoder of the first DAE, $E_X$, i.e., $E_X : X \to Z_X$. The decoder $D_X$, in turn, uses this latent representation to regenerate the historical traffic speed, i.e., $D_X : Z_X \to X$. Similarly, the encoder and decoder of the second DAE work on the future traffic speed Y by first converting it to its latent representation $Z_Y$ and then back to the input future traffic speed. The following equation represents the extraction of the ith latent feature map ($l_i$) from input I:

$$l_i = \sigma(I * W_i + b_i) \tag{2}$$

In the above equation, $b_i$, $W_i$, $*$, and $\sigma$ represent the bias, weight, convolution operation, and activation function, respectively. Similarly, the following equation represents the reconstruction from the latent representation:

$$\hat{I} = \sigma\Big(\sum_{i \in L} l_i * \hat{W}_i + c\Big) \tag{3}$$

where c, $\hat{W}$, and L represent the bias for the input channel, the transpose convolution operation, and the latent feature maps, respectively. The two DAEs are therefore used to learn the features of the individual domains through the corresponding intermediate latent representations. For this, the mean square error between the input I and the reconstructed output $\hat{I}$ is used as the loss function in both DAEs, as given in the following equation, where the sum runs over the N elements of I:

$$\mathrm{MSE}(I, \hat{I}) = \frac{1}{N}\sum_{k=1}^{N}\big(I_k - \hat{I}_k\big)^2 \tag{4}$$

The basic idea was to employ both DAEs, i.e., $\mathrm{DAE}_X : E_X - D_X$ and $\mathrm{DAE}_Y : E_Y - D_Y$, by cross-connecting them, i.e., connecting encoder $E_X$ with decoder $D_Y$, to predict the future traffic speed as output using the historical traffic speed as input. The motivation for this approach came from the successful application of cross-connection to image-to-image translation [49,50].
Fig. 8 The traffic data matrix X along with its channels
The cross-connection helps the model to first learn the critical features of both domains and then use these features to translate from one domain to the other. For the cross-connection, a Latent Feature Mapping Module (LFMM) was used in this work. The LFMM acts as a bridge between the encoder and decoder by mapping the latent representation of the historical traffic speed $Z_X$ to that of the future traffic speed $Z_Y$, i.e., $F_M : Z_X \to Z_Y$. The two DAEs, $\mathrm{DAE}_X$ and $\mathrm{DAE}_Y$, were trained separately; hence, the latent representations $Z_X$ and $Z_Y$ lie in different dimensional spaces. Thus, the main purpose of the LFMM is to bridge the gap between the two latent spaces through a proper mapping. As $\mathrm{DAE}_X$ was trained using historical traffic speed, the encoder $E_X$ has already learned to extract critical features from the input. Similarly, the decoder $D_Y$ of $\mathrm{DAE}_Y$ has already learned to decode the future speed Y from the corresponding latent representation $Z_Y$. The flow of data for the individual DAEs is shown with red and green arrows in Fig. 9. The blue arrows represent the flow of data in the cross-connected DAEs. The architectural details of both DAEs and the LFMM are discussed in the following subsections.

Deep auto-encoders (DAEs)
As explained earlier, the proposed architecture of the traffic prediction module contains two separate DAEs, $\mathrm{DAE}_X$ and $\mathrm{DAE}_Y$. The encoder $E_X$ of $\mathrm{DAE}_X$ has a linear architecture with three convolutional layers, each with a Tanh activation and a batch normalization layer. Convolutional kernels of size 3 × 3 are used in all of the convolutional layers, with a padding of 1 and a stride of 2. Past research has shown that deep learning models can achieve better performance with deeper architectures, but depth also raises the problem of vanishing gradients, which makes training difficult. The concept of residual learning was introduced in [51] to overcome this problem. Hence, three residual blocks are used in the encoder $E_X$ to take advantage of residual learning. Each residual block contains two convolutional layers with 3 × 3 kernels and a padding and stride of 1. Within each residual block, every convolutional layer is followed by a ReLU activation and batch normalization, and a shortcut connection adds the input of the residual block to the batch-normalized output of the next convolutional layer. A dropout layer is also used to avoid overfitting. The architecture of the decoder $D_X$ is similar to that of the encoder $E_X$, with the order of the layers reversed. In the decoder $D_X$, transposed convolutional layers are used in place of convolutional layers to reverse the effect of convolution.
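As a worked example of the layer settings above, the following Python sketch computes how three strided convolutions (kernel 3, padding 1, stride 2) shrink the spatial size of the input. It assumes that the convolutions act on the l × m = 60 × 10 plane of the traffic matrix and that the usual floor convention for convolution output sizes applies; neither detail is stated explicitly in the text, and the function names are hypothetical.

```python
def conv_out(size, kernel, stride, padding):
    """Output length of a convolution along one axis (floor convention)."""
    return (size + 2 * padding - kernel) // stride + 1


def encoder_shape(rows, cols, layers=3, kernel=3, stride=2, padding=1):
    """Spatial size after `layers` identical strided convolutions."""
    for _ in range(layers):
        rows = conv_out(rows, kernel, stride, padding)
        cols = conv_out(cols, kernel, stride, padding)
    return rows, cols


# 60 x 10 input: 60 -> 30 -> 15 -> 8 rows and 10 -> 5 -> 3 -> 2 columns.
latent_rows, latent_cols = encoder_shape(60, 10)
```

Under the same convention, a residual-block convolution with kernel 3, padding 1, and stride 1 leaves the size unchanged, which is what allows the block input to be added to its output.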
The second auto-encoder $\mathrm{DAE}_Y$ also has a simple linear architecture. Similar to encoder $E_X$, the encoder $E_Y$ has three convolutional layers, each followed by a Tanh activation and a batch normalization layer. Kernels of size 3 are used in each convolutional layer, along with a padding and stride of 1. To reduce the dimension of the features, two average pooling layers are also used. A similar architecture is used for the decoder $D_Y$; the only difference is that transposed convolutional layers are used instead of convolutional layers, and upsampling layers are used instead of average pooling layers. As in $\mathrm{DAE}_X$, dropout layers are used in both the encoder and the decoder of $\mathrm{DAE}_Y$ to avoid overfitting.
Fig. 9 Architecture of traffic prediction model

Latent feature mapping module (LFMM)
LFMM is used to cross connect the two DAEs. The main purpose of LFMM is to map the historical traffic speed latent features Z X with future traffic speed latent features Z Y . A linear architecture containing a convolutional, a flatten, and a fully connected layer is used for this. Kernels of size 3 × 3 are used in the convolutional layer along with stride of 1 and padding of 2 each for rows and columns. After training the individual DAEs separately, they were cross-connected using LFMM, and then, the complete network was fine-tuned.

Speed profile generator module
In this module, an algorithm (see Algorithm 2) has been developed to calculate the speed profile of a particular route. The algorithm takes four main inputs, namely, (i) $S_r$, the list of k sensors on route r, (ii) $D_r$, the sequence of distances between consecutive sensors, i.e., the distance of the second sensor from the first, then the distance of the third from the second, and so on, (iii) the future speed prediction $Y_r$ of size k × n, where n is the number of future time steps, and (iv) $t_0$, the initial time instant. In brief, the algorithm first calculates the time $t_d$ a vehicle takes to cover the distance to the next sensor based on the average speed $v_{avg}$. The average speed is the average of the current speed $v_{curr}$ at the current sensor and the next step's future speed at the next sensor. Then, based on the travel time, the speed at the next sensor is calculated using interpolation, and this becomes the new current speed. This process is repeated for all the sensors on route r. Finally, the algorithm returns a time-series $V_r$ that contains the traffic speed at the sensor locations of route r at different time instants if a vehicle starts the journey at time $t_0$.
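The procedure described above can be sketched in Python as follows. This is an illustrative reading of the prose rather than the paper's Algorithm 2: the function names, the use of linear interpolation between prediction steps, the units (miles, mph, hours), and the convention that column 0 of each sensor's row holds the currently observed speed are all assumptions of this sketch.

```python
def interpolate_speed(row, t_hours, step_hours):
    """Linearly interpolate one sensor's speed at elapsed time t_hours."""
    pos = t_hours / step_hours
    lo = min(int(pos), len(row) - 1)
    hi = min(lo + 1, len(row) - 1)
    frac = pos - lo if hi > lo else 0.0
    return row[lo] + frac * (row[hi] - row[lo])


def speed_profile(distances, speeds, step_hours=5 / 60):
    """Return [(elapsed_hours, speed_mph)] for each sensor on the route.

    distances[j] is the distance (miles) from sensor j to sensor j + 1.
    speeds[j][i] is the speed (mph) at sensor j at time t0 + i * step_hours;
    column 0 holds the current observation (an assumption of this sketch).
    """
    elapsed = 0.0
    v_curr = speeds[0][0]
    profile = [(elapsed, v_curr)]
    for j, d in enumerate(distances, start=1):
        # Next prediction step after the current elapsed time, clamped.
        next_step = min(int(elapsed / step_hours) + 1, len(speeds[j]) - 1)
        v_avg = (v_curr + speeds[j][next_step]) / 2.0
        elapsed += d / v_avg  # travel time t_d to the next sensor
        v_curr = interpolate_speed(speeds[j], elapsed, step_hours)
        profile.append((elapsed, v_curr))
    return profile
```

For example, on a route with a single 1.5-mile segment where the current speed is 60 mph and the next sensor is predicted at 30 mph, the average speed is 45 mph and the vehicle reaches the next sensor after 2 min.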

Power consumption estimation (PCE) module
This module estimates the power consumption of an EV based on environmental, traffic, and road conditions. This has been done by developing a multi-channel CNN [52], shown in Fig. 10. The architecture of the multi-channel CNN is inspired by the architecture of the network used in [53] for classifying different hand gestures based on time-series pose data. In the PCE module, the proposed multi-channel CNN takes seven time-series as inputs, namely, the speed of the vehicle ($V_r$), acceleration ($A_r$), road elevation ($El_r$), auxiliary loads ($Ax_{ld}$), initial SOC of the battery ($B_{soc}$), wind speed ($W_{sp}$), and environment temperature ($E_{tp}$). It has been assumed that a vehicle normally moves with the speed of the traffic. Therefore, the traffic speed profile provided by the speed profile generator module is first interpolated before being used as the input vehicle speed in the PCE module. Vehicle acceleration can be easily calculated from vehicle speed. Road elevation, wind speed, and temperature can be obtained using publicly available web services. As mentioned in the section "Datasets", all the input time-series (say s) are recorded at a frequency of 10 Hz, and each time-series is partitioned into n smaller time-series, each of 10 s duration. These partitioned time-series are then used as input to the multi-channel CNN. From the dataset, it was observed that two parameters, namely, the battery's initial SOC ($B_{soc}$) and the environmental temperature ($E_{tp}$), do not vary significantly within a 10 s interval. For this reason, these parameters were taken as constant for each partition, and feature extraction was not performed for them. For all other input parameters, a separate feature extraction module was used to extract the important features.
Algorithm 2: Algorithm for generating the future speed profile for a particular route. Input: $S_r$, $D_r$, $Y_r$, $t_0$. Output: $V_r$. Function SpeedProfileGenerator($S_r$, $D_r$, $Y_r$, $t_0$)
Each feature extraction module has similar configuration with three convolutional branches and one residual branch.
The residual branch is used in each feature extraction module because it improves the backpropagation of gradients during training. As a result, the network is easier to optimize and achieves better accuracy [51]. Normally, a residual branch acts as an identity function, but here, instead of passing the input directly to the output, three average pooling layers are used. The average pooling layers generate a down-sampled output by calculating the average over a particular region of the input data. This also makes the network locally invariant, i.e., the same features can be extracted by the network regardless of rotation, shifting, or scaling of the input [54]. Therefore, pooling layers extract important features while also reducing the scale of the network. Feature extraction mainly takes place in the three convolutional branches, which have a similar configuration, as discussed below.
Each convolutional branch of the feature extraction module has three convolutional layers activated by ReLU. The first two convolutional layers are each followed by an average pooling layer, and the third convolutional layer is followed by a dropout layer and an average pooling layer. The convolutional layers in a single branch have different numbers of kernels: the first convolutional layer uses 8 kernels, whereas the second and third use 4 kernels. The branches differ from one another in kernel size, i.e., the first branch has layers with a kernel size of 7, the second has layers with a kernel size of 5, and the third uses kernels of size 3. Kernels of different sizes help in extracting features at different time resolutions. Dropout layers have been used in each convolutional branch to avoid overfitting of the network. A padding of n has been used in each convolutional layer, calculated from the kernel size of that layer as n = (kernel_size − 1)/2.
The features extracted by each convolutional and residual branch are combined using concatenation and output of size 13 × 12 is obtained from feature extraction module. The output features extracted by each feature extraction module are concatenated and flattened further by the consecutive layers. The flattened features are further concatenated with two input parameters, i.e., temperature and initial battery's SOC. Then, the feature vector of length 782 is used as input to two consecutive fully connected layers and final power consumption output of size 10 is obtained for interval of 10 s.
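The sizes quoted above (13 × 12 per feature extraction module and a feature vector of length 782) can be cross-checked with a short Python calculation. The sketch assumes that each average pooling layer halves the sequence length with floor rounding, that the residual branch contributes a single channel, and that the padding n = (kernel_size − 1)/2 leaves the length unchanged in every convolution; these are assumptions chosen because they reproduce the reported numbers, not details confirmed by the text.

```python
def pooled_length(length, n_pools=3, pool=2):
    """Sequence length after n_pools average-pooling layers (floor division)."""
    for _ in range(n_pools):
        length //= pool
    return length


def feature_vector_length(samples=100, conv_branches=3, last_kernels=4,
                          residual_channels=1, extracted_inputs=5,
                          constant_inputs=2):
    """Length of the flattened vector fed to the fully connected layers."""
    channels = conv_branches * last_kernels + residual_channels  # 13
    per_module = channels * pooled_length(samples)                # 13 * 12
    return extracted_inputs * per_module + constant_inputs


# 10 s at 10 Hz gives 100 samples per partition: 100 -> 50 -> 25 -> 12.
total = feature_vector_length()
```

Five of the seven inputs pass through a feature extraction module (the SOC and temperature are appended as constants), which gives 5 × 13 × 12 + 2 = 782.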

Re-sampler module
The input parameters are down-sampled or up-sampled by this module to synchronize their frequency with the power consumption output given by the PCE module. This is necessary because the output of the PCE module has a frequency of 1 Hz, i.e., 10 readings for an interval of 10 s, whereas the input parameters have different frequencies. Two input parameters, namely, the battery's SOC and the environmental temperature, are taken as constant for each interval of 10 s. Therefore, these are up-sampled to the frequency of the PCE module's output, i.e., 1 Hz. The other input parameters have a frequency of 10 Hz; hence, they are down-sampled to 1 Hz.
(The values after @, K, and N represent the dimension of vector, the kernel size, and the number of kernels, respectively.)
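A minimal Python sketch of this re-sampling step is given below. The text does not state how the 10 Hz series are reduced to 1 Hz, so averaging each block of 10 samples is an assumption of this sketch, as are the function names; up-sampling a per-interval constant is done by simple repetition.

```python
def downsample_mean(series, factor=10):
    """Reduce a series by averaging consecutive blocks of `factor` samples.

    Block averaging is an assumption; the paper does not name the method.
    """
    usable = len(series) - len(series) % factor  # drop an incomplete tail
    return [sum(series[i:i + factor]) / factor
            for i in range(0, usable, factor)]


def upsample_constant(value, length=10):
    """Repeat a per-interval constant so that it has one value per second."""
    return [value] * length


# One 10 s partition: a 10 Hz series (100 samples) and a constant SOC.
speed_1hz = downsample_mean([float(i) for i in range(100)])
soc_1hz = upsample_constant(0.8)
```

After this step, every input has 10 values per 10 s partition and can be aligned row by row with the 10 power readings produced by the PCE module.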

Fine-tuner module
The re-sampled input parameters from the previous module and the power consumption output from the PCE module are combined and used as input for this module. The fine-tuner module refines the power consumption estimate by decreasing the error based on the input parameters. A number of regression models were trained to find the best one, and it was observed that the Bagged Decision Tree (BDT) gave the best results with the least error. Breiman [55] proposed an ensemble learning technique called bagging, or bootstrap aggregation, in which multiple weak learners are combined to improve accuracy. Each weak learner, in this case a decision tree, is trained separately using a subset of the training dataset. The subsets are selected using bootstrap resampling, in which the training samples are drawn randomly from the dataset with replacement, i.e., one sample can occur in multiple subsets. After training each learner separately, their outputs are combined by taking the average value, and the final fine-tuned power consumption estimate is obtained.
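The bagging procedure described above (bootstrap resampling, separate training, averaging) can be illustrated with a small self-contained Python sketch. To stay short, it uses one-split regression stumps on a single input feature as the weak learners, which is a deliberate simplification of the full decision trees used in the paper; all function names are hypothetical.

```python
import random


def fit_stump(xs, ys):
    """Fit a one-split regression stump on 1-D inputs (stand-in weak learner)."""
    best = None
    for thr in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= thr]
        right = [y for x, y in zip(xs, ys) if x > thr]
        if not right:
            continue
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((y - ml) ** 2 for y in left)
               + sum((y - mr) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, thr, ml, mr)
    if best is None:  # all inputs identical: predict the mean
        mean = sum(ys) / len(ys)
        return lambda x: mean
    _, thr, ml, mr = best
    return lambda x: ml if x <= thr else mr


def bagged_fit(xs, ys, n_learners=10, seed=0):
    """Train each learner on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    n = len(xs)
    learners = []
    for _ in range(n_learners):
        idx = [rng.randrange(n) for _ in range(n)]
        learners.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return learners


def bagged_predict(learners, x):
    """Average the predictions of all learners."""
    return sum(f(x) for f in learners) / len(learners)
```

Because each learner sees a different bootstrap sample, their individual errors are partly independent, and averaging reduces the variance of the combined estimate.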

State of charge calculator module
The state of charge calculator module takes the fine-tuned estimate from the Fine Tuner module and calculates the battery's remaining state of charge. The following equation, given in [56], can be used for this:

$$SOC_t = SOC_0 - \frac{\int_0^t P_{est}(\tau)\,d\tau}{E_{cap}} \tag{5}$$

where $SOC_0$ represents the battery's SOC at time t = 0 and $SOC_t$ is the SOC at time instant t. $P_{est}$ is the fine-tuned predicted power consumption for the given time interval, and $E_{cap}$ is the rated energy capacity of the battery. As the time series are partitioned into small time-series of 10 s each, the value of t is 10 for the first partition. For the next partition, the calculated remaining battery SOC is used as the initial SOC. This process continues until no partition is left.
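The partition-by-partition SOC update, with the remaining SOC of one partition fed back as the initial SOC of the next, can be sketched as follows. The sketch assumes that SOC is expressed as a fraction between 0 and 1, power in kW, capacity in kWh, and that each partition holds ten 1 Hz power readings; these unit choices and the function names are assumptions for illustration.

```python
def soc_after_partition(soc0, power_kw, e_cap_kwh, dt_s=1.0):
    """Remaining SOC after one partition of power readings.

    The integral of power over time is approximated by a sum of
    power_kw[i] * dt_s, converted from kW*s to kWh.
    """
    energy_kwh = sum(power_kw) * dt_s / 3600.0
    return soc0 - energy_kwh / e_cap_kwh


def soc_along_route(soc0, partitions, e_cap_kwh):
    """Apply the update partition by partition, feeding each result forward."""
    soc = soc0
    trace = []
    for power_kw in partitions:
        soc = soc_after_partition(soc, power_kw, e_cap_kwh)
        trace.append(soc)
    return trace
```

For instance, a constant draw of 36 kW over one 10 s partition uses 0.1 kWh, which lowers the SOC of a 40 kWh battery by 0.0025; negative power values (regeneration) raise the SOC in the same way.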

Experimental setup
As discussed previously in the section "Proposed system", there are three main modules in the proposed system: the Traffic Prediction module, the Power Consumption Estimation module, and the Fine Tuner module. These modules were trained separately and then integrated as part of the whole system. To train them, data from three different sources, described in the section "Datasets", were used. First, the traffic prediction module was trained using traffic data collected from multiple loop detectors. Second, the power consumption estimation module and the fine-tuner module were trained using data generated from the simulated model of the Nissan Leaf provided in the widely used simulation tool Future Automotive Systems Technology Simulator (FASTSim) [41]. All of these modules were trained with 70% of the corresponding dataset and then tested with the remaining 30%. After that, the modules were integrated and tested again as a complete system. In this section, the implementation details for training and testing of the proposed system are discussed.

Training the modules
The three main modules of the system, i.e., Traffic Prediction module, Power Consumption Estimation module, and Fine Tuner module, were trained separately. While training the fine tuner and power consumption estimation module, the feedback loop, in which the next interval's initial SOC is obtained using the remaining SOC from previous interval, was not used. However, while testing the complete system, the feedback loop was used.
As explained earlier, the proposed traffic prediction module has two DAEs. After each DAE was trained separately, they were connected using the LFMM, and the cross-connected DAEs were then fine-tuned. The Power Consumption Estimation module was trained in a similar way. During the training of these modules, the convolutional, transposed convolutional, and fully connected layers were initialized with Xavier initialization [57]. Xavier initialization draws the layer weights from a uniform distribution over the range $\left[-\frac{\sqrt{6}}{\sqrt{fan_{in} + fan_{out}}}, \frac{\sqrt{6}}{\sqrt{fan_{in} + fan_{out}}}\right]$. Here, $fan_{in}$ and $fan_{out}$ are the numbers of input and output connections of the layer, respectively. To find the optimized solution, the Adam optimizer [58] was used with an initial learning rate of 0.001. At each step, the optimizer maintains exponential running averages of the gradients and of their squares. The decay of these running averages was controlled using parameters $\beta_1$ and $\beta_2$ with values of 0.9 and 0.999, respectively. The parameter $\epsilon$ was set to a small value of $10^{-8}$ to avoid division by zero during training. Mean Square Error (MSE) was used as the loss function during training. The dropout layers present in the networks act as regularizers to avoid overfitting. For these dropout layers, a number of experiments were performed to find the optimal value of the drop rate p. It was observed that beyond a certain threshold (0.2 in this case), increasing the drop rate does not provide any advantage in the network's performance. The Fine Tuner module was trained after the power consumption estimation and traffic prediction modules.
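The Xavier (uniform) initialization rule can be illustrated with a short Python sketch. The function names and the nested-list weight layout are hypothetical; in practice, a deep learning framework's built-in initializer would be used.

```python
import math
import random


def xavier_limit(fan_in, fan_out):
    """Half-width of the uniform range: sqrt(6) / sqrt(fan_in + fan_out)."""
    return math.sqrt(6.0 / (fan_in + fan_out))


def xavier_uniform(fan_in, fan_out, rng=None):
    """Sample a fan_out x fan_in weight matrix from U(-limit, limit)."""
    rng = rng or random.Random(0)
    limit = xavier_limit(fan_in, fan_out)
    return [[rng.uniform(-limit, limit) for _ in range(fan_in)]
            for _ in range(fan_out)]


# Example: a fully connected layer with 782 inputs and 10 outputs.
weights = xavier_uniform(782, 10)
```

The range shrinks as the layer gets wider, which keeps the variance of the activations and of the back-propagated gradients roughly constant from layer to layer.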
The Fine Tuner module was implemented using a bagged decision tree that contains multiple decision trees ensembled together. The Bayesian optimization was used to find the optimal number of trees and other parameters for the bagged decision tree. It was found through Bayesian optimization that 10 weak tree learners ensembled together provide the minimum prediction error.

Testing the modules
After training the individual modules, they were integrated and tested as a complete system. For this, directed node-edge graphs G and G′, as shown in Fig. 11, were created, where nodes represent the sensor locations on the freeways of Los Angeles and the Bay Area and directed edges represent the roads connecting these sensors. The direction of an edge represents the direction of traffic flow. The elevation at the sensor locations and the distances between them were obtained using free web APIs provided by openrouteservice [59]. A number of routes were identified between different sources and destinations in graphs G and G′. Environmental temperature and wind speed on different dates and at different time instances were obtained from free web APIs provided by Meteostat [60]. Then, the system was tested for energy prediction accuracy along these routes at different time instances.

Results and discussion
In this section, the results obtained by comparing the performance of proposed system with the other state-of-art techniques are discussed. First of all, the cross-connected deep auto-encoder model proposed in traffic prediction module was compared with multiple machine learning and deep learning techniques. Then, the models which performed best were selected and integrated with other state-of-the-art energy consumption prediction models, and the final integrated models were compared with the proposed system.

Performance evaluation of traffic prediction module
In this section, the performance of the traffic prediction module is compared with a number of machine learning (kNN, SVR, ANN, XGBoost, and LASSO) and deep learning (CNN and LSTM) techniques.
i. ANN: An Artificial Neural Network (ANN) [61] with three layers was developed for predicting the traffic speed at different time horizons. The sigmoid function was used as the activation function for each neuron. An ANN cannot learn input patterns across time; hence, it lacks the ability to capture temporal dependency.
ii. LASSO: Regularized least-square regression (LASSO) [62] was used for traffic speed prediction. A separate LASSO was trained for prediction at each time horizon. Each LASSO was trained using elastic net optimization with Alpha = 0.75 and tenfold cross-validation.
iii. SVR: Another popular machine learning algorithm is Support Vector Regression (SVR) [63], which has been used successfully by different researchers for a number of regression tasks. Similar to LASSO, separate SVRs with a Radial Basis Function (RBF) kernel were trained for traffic prediction at different time horizons.
iv. kNN: k-Nearest Neighbor (kNN) [64] uses Euclidean distance to determine the k most similar input values and then predicts the future traffic speed based on the weighted sum of these values. The hyperparameter k was selected using cross-validation by varying k from 5 to 20.
v. XGBoost: XGBoost [65] was also developed for predicting traffic speed. It is one of the popular techniques that has already been used by a number of researchers for different regression-based tasks. To use it for traffic prediction, the historical traffic speed was reshaped into a vector before being passed as input.
vi. CNN: One of the most popular deep learning techniques is the Convolutional Neural Network (CNN) [54], which has already proved its performance for various tasks. Traffic data in the form of matrices were used for training a CNN. Due to their deep architecture, CNNs can easily identify complex patterns irrespective of their spatial orientation, but they lack the capability to learn temporal patterns.
vii. LSTM: To learn temporal patterns, Long Short-Term Memory (LSTM) [66] can perform much better than CNNs. As traffic data have an inherent temporal dependency, an LSTM was developed for traffic prediction.
To ensure a fair comparison with the aforementioned techniques, the most suitable parameters were selected for implementing them. The last 5 h of traffic data of the target sensor were used for training ANN, kNN, SVR, LASSO, XGBoost, and LSTM to predict traffic for the next 1 h. The optimal values of the hyperparameters were selected for these models using cross-validation; for instance, the kNN model provides the best performance with k = 17. Traffic data in the same matrix format as for the proposed model were used for training the CNN model. The CNN model contained three convolutional layers activated using ReLU, each followed by an average pooling layer, with the output obtained from a dense layer.
Three evaluation metrics, namely, Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and Mean Absolute Percentage Error (MAPE), are used to compare the performance of the above-mentioned state-of-the-art techniques with the proposed traffic prediction module. The following equations can be used to obtain the values of these metrics:

$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|x^{i}_{act} - x^{i}_{pred}\right| \tag{6}$$

$$\mathrm{MAPE} = \frac{100}{N}\sum_{i=1}^{N}\left|\frac{x^{i}_{act} - x^{i}_{pred}}{x^{i}_{act}}\right| \tag{7}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x^{i}_{act} - x^{i}_{pred}\right)^{2}} \tag{8}$$

In the above equations, N represents the total number of samples, and $x^{i}_{act}$ and $x^{i}_{pred}$ represent the actual and predicted values, respectively. The first two metrics, i.e., RMSE and MAE, measure the absolute deviation between actual and predicted values, whereas the third metric measures the relative deviation. The comparison results of the aforementioned benchmark techniques and the proposed traffic prediction module at five different time horizons for both datasets, PEMS-Los and PEMS-Bay, are presented in Table 2. The results presented in the table show the effectiveness of the proposed approach in predicting traffic at different time horizons. The proposed approach performed better than the other techniques for all time horizons on both datasets, with one exception: the 5-min time horizon on the PEMS-Los dataset. Even there, although the proposed approach does not have the best performance, its performance is still comparable to that of the other techniques. The table shows that the performance of each technique decreases as the prediction horizon increases; however, the decline is much smaller for the proposed traffic prediction network than for the other techniques. This is due to the capability of the proposed approach to capture spatio-temporal traffic dependency with the help of the proposed neighbor sensor selection algorithm, whereas the other techniques fail to capture both spatial and temporal dependency effectively. For instance, ANN, LASSO, SVR, kNN, and XGBoost cannot capture temporal dependency.
Although LSTM can effectively capture temporal dependency, it lacks the capability to capture spatial dependency. Similarly, CNN can effectively learn spatial dependency but lacks the ability to learn temporal dependency. In brief, from the above discussion and the results presented in Table 2, it can be concluded that the proposed traffic prediction module performs better than the other benchmark techniques. Besides the above-mentioned techniques, a number of more complex techniques have been proposed by different researchers using the PEMS-Bay dataset. These include the Spatio-Temporal Graph Convolutional Network (STGCN) [67], Graph Multi-Attention Network (GMAN) [68], Graph Wavenet [69], and Diffusion Convolutional Recurrent Neural Network (DCRNN) [70]. According to the results provided by the respective authors, the MAE of these techniques for the 30-min prediction horizon is 1.81, 1.62, 1.63, and 1.74, respectively, and for the 60-min prediction horizon is 2.49, 1.86, 1.95, and 2.07, respectively. In comparison, Table 2 shows that the proposed traffic prediction module has a lower MAE of 1.49 and 1.75 on the PEMS-Bay dataset for the 30-min and 60-min prediction horizons, respectively. This indicates the better performance of the proposed technique compared with these benchmark techniques. For more details, the reader is referred to [71]. Considering the above discussion and the comparison results of Table 2, three approaches with comparable performance, namely, LSTM, XGBoost, and CNN, are selected along with the proposed technique for further analysis.
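The three evaluation metrics used in this section (RMSE, MAE, and MAPE) can be computed with a few lines of Python. The sketch below uses only the standard library; the function names are illustrative, and MAPE is expressed in percent and assumes that no actual value is zero.

```python
import math


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values non-zero)."""
    return 100.0 * sum(abs((a - p) / a)
                       for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(sum((a - p) ** 2
                         for a, p in zip(actual, predicted)) / len(actual))
```

For actual speeds of 50 and 60 and predictions of 45 and 66, the errors are 5 and 6, giving an MAE of 5.5, a MAPE of 10%, and an RMSE of about 5.52. RMSE weights the larger error more heavily than MAE does.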

Performance evaluation of proposed approach
Multiple routes with different sources and destinations were identified from the node-edge graphs G and G′, as explained in the section "Testing the modules". Data from these routes at different time instances were used to test the performance of the proposed approach. To benchmark the results, the proposed approach is compared with multiple state-of-the-art techniques. For this, as discussed in the previous section, three models, namely, XGBoost, LSTM, and CNN, were selected based on the comparison of their performance with the traffic prediction module. These selected models were then integrated with three state-of-the-art techniques for energy consumption estimation, presented in [19,20,22], and results were obtained for all the routes selected from graphs G and G′. These energy consumption prediction techniques were implemented as given below:
i. Galvin [20] proposed a model for estimating the power consumption of an electric vehicle. The model consists of an equation that takes speed and acceleration as the main input parameters for calculating the power consumption, where V, A, and P represent the EV's speed, acceleration, and power consumption, respectively.
ii. Yang et al. [19] proposed a model for power consumption estimation of EVs that considers multiple factors for prediction. The model uses two equations for estimation: Eq. (10) for calculating power consumption in normal mode and Eq. (11) for estimating power generation in regenerative mode. In these equations, P represents the power consumption, $P_{reg}$ is the regenerative power, v is the vehicle speed, $\eta_{te}$ is the transmission efficiency, $\delta$ is a coefficient based on the vehicle's weight, m is the vehicle's weight, f is the coefficient of rolling resistance, i is the road grade, $\rho$ is the density of air, $C_D$ is the coefficient of aerodynamic drag, A is the vehicle's frontal area, $P_{accessory}$ is the power used for auxiliary loads, k is the percentage of energy that can be recovered from braking, and $\eta_m$ is the motor's efficiency. Parameter k lies in the range (0, 1); its defining equation is part of the model. The model was implemented for comparison by taking the values of A, $C_D$, and m for the Nissan Leaf from [22,42,43]. Yang et al. [19] provided the values of the other parameters, such as $\delta$, $\rho$, $\eta_{te}$, $\eta_m$, $\eta_e$, and f.
iii. Modi et al. [22] proposed a CNN model that can predict energy consumption for an EV by considering only three input parameters, namely, road elevation/grade, tractive effort, and vehicle speed. The CNN model has a simple architecture with seven layers in total (three convolutional layers, two max pooling layers, one flatten layer, and one fully connected layer).
The best values are represented using bold notation
The existing energy consumption prediction models discussed above were trained and tested using the data from different routes selected from graphs G and G′. Then, these models were integrated with the three selected traffic prediction models, i.e., XGBoost, CNN, and LSTM. The comparison results of the proposed approach and these integrated models are presented in Table 3. The comparison uses three evaluation metrics, i.e., RMSE, MAE, and MAPE, as defined in Eqs. (8), (6), and (7). These metrics are calculated from the energy consumption estimate given by the particular model and the actual energy consumption given by FASTSim. In Table 3, the notation XGBoost-Galvin represents an integrated model in which XGBoost has been used for traffic speed prediction and the model proposed by Galvin [20] has been used for energy prediction. Similar notation has been used for the other integrated models. From the table, it can be concluded that the proposed approach provides results with the least RMSE, MAE, and MAPE of 0.47, 0.32, and 1.60, respectively, for PEMS-Los and 0.27, 0.20, and 1.57, respectively, for the PEMS-Bay dataset. It can also be observed that the integrated models that use the CNN model proposed by Modi et al. [22] for energy consumption estimation performed better than the other integrated models. The main reason for this is that the model given by Galvin [20] does not consider the effect of other factors such as wind speed, road elevation, and auxiliary loads. Similarly, the model proposed by Yang et al. [19] does not consider the effect of environmental temperature, the battery's SOC, etc. These models also lack the ability to effectively represent the complex non-linear relationship between the different influencing parameters. The model proposed by Modi et al. [22] takes three inputs, namely, vehicle speed, road elevation, and tractive effort.
Here, the tractive effort is calculated from the combined effort the vehicle has to make to overcome the force due to road elevation, the backward force due to aerodynamic drag, etc. Also, the CNN can learn the non-linear relationship very accurately. Although the effect of most of the factors has been considered, the effect of auxiliary loads, environmental temperature, and battery's SOC was not considered in the CNN model of Modi et al. [22]. It can also be observed that the integrated model LSTM-Modi performed better than CNN-Modi and XGBoost-Modi for the PEMS-Los dataset and comparably for the PEMS-Bay dataset. This is due to the better performance of LSTM than CNN and XGBoost for traffic speed prediction in most of the cases, i.e., traffic prediction for the 15 min, 30 min, and 45 min horizons. Therefore, it can be concluded that the error accumulates further when the different traffic prediction models are integrated with different energy consumption estimation models. Similar observations can be drawn from Fig. 12, which shows the performance comparison of the proposed system and the other integrated systems for 100 randomly selected sample routes each from graph G and G between different source-destination pairs. The comparison has been performed by calculating the Absolute Percentage Energy Deviation (APED) for each route using the following equation:

$$\mathrm{APED} = \frac{\left| E_{act} - E_{est} \right|}{E_{act}} \times 100$$

In the above equation, $E_{act}$ and $E_{est}$ represent the actual and estimated energy consumption for a particular route. From Fig. 12, it can be observed that the proposed system provides better estimates for all the routes from graph G and G, with the least APED. Also, for most of the routes, the APED is in the range of 0-5%, shown in black. This validates the results presented in Table 3, as discussed above.
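The evaluation metrics used in this comparison can be sketched in a few lines of Python. This is only an illustrative sketch: the exact forms of Eqs. (6)-(8) are assumed to be the standard definitions of MAE, MAPE, and RMSE, APED is assumed to be normalised by the actual energy, and the energy values in the usage lines are hypothetical rather than taken from Table 3.

```python
import math


def rmse(actual, estimated):
    """Root mean squared error (assumed standard definition)."""
    n = len(actual)
    return math.sqrt(sum((a - e) ** 2 for a, e in zip(actual, estimated)) / n)


def mae(actual, estimated):
    """Mean absolute error (assumed standard definition)."""
    n = len(actual)
    return sum(abs(a - e) for a, e in zip(actual, estimated)) / n


def mape(actual, estimated):
    """Mean absolute percentage error, in percent."""
    n = len(actual)
    return 100.0 * sum(abs(a - e) / abs(a) for a, e in zip(actual, estimated)) / n


def aped(e_act, e_est):
    """Absolute Percentage Energy Deviation for a single route, in percent.

    Assumption: the deviation is normalised by the actual energy.
    """
    return 100.0 * abs(e_act - e_est) / abs(e_act)


# Hypothetical per-route energy values in MJ (not from the paper).
actual_mj = [10.0, 12.0, 8.0]
estimated_mj = [10.5, 11.4, 8.2]

print("RMSE:", rmse(actual_mj, estimated_mj))
print("MAE:", mae(actual_mj, estimated_mj))
print("MAPE (%):", mape(actual_mj, estimated_mj))
print("APED per route (%):", [aped(a, e) for a, e in zip(actual_mj, estimated_mj)])
```

RMSE, MAE, and MAPE summarise the error over many samples, whereas APED is computed per route, which is why it suits the route-wise comparison of Fig. 12.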
As a special case, out of the 100 randomly selected routes from graph G, two possible routes between locations A and B are shown in Fig. 13. One route, shown in blue, is approximately 7.7 miles long, and the other route, part of which is shown in gray, is approximately 8.7 miles long. Both routes share a common portion at the start and then separate. Henceforth, the shorter route is denoted R-I and the longer route R-II. The energy prediction comparison of the proposed approach and the integrated models for route R-I on 13th July, 2017 is presented in Fig. 14. From the figure, it can be observed that the proposed approach follows the actual energy consumption closely, i.e., the deviation of its estimates from the actual values is very small. It can also be seen that each approach takes a different time to complete the trip from location A to B. This is because the proposed approach and the integrated models give different traffic speed estimates and hence follow different speed profiles. Due to this, the energy consumption estimates also vary considerably across approaches, but the proposed approach performs better than all the integrated models. The comparison of energy estimation by the proposed approach and the different integrated models for routes R-I and R-II on two different days, i.e., 13th July, 2017 (Thursday) and 16th July, 2017 (Sunday), is presented in Fig. 15. From the figure, it can be observed that the energy consumption is higher on the weekend morning, i.e., 16th July, 2017 (Sunday) at 10 a.m., than on the weekday morning/evening and the weekend evening. This is due to less traffic on Sunday morning: the driver can drive at a higher speed and brakes less, so less energy is regenerated and restored to the battery. Hence, more energy is drawn from the battery.
On comparing the energy consumption of both routes, it can be seen that in most of the cases, the energy consumption for route R-II is higher than for route R-I. The obvious reason is the longer distance to be covered on route R-II. In Fig. 15c, however, the energy consumption for route R-II is lower than for route R-I. Here, the driver can save around 1 MJ of energy, i.e., a saving of around 9%, which can be vital when the battery's SOC is already low. This is just one case; there can be other such cases where the driver can save even more energy. Therefore, in such cases, using the estimates provided by the proposed approach, the driver can select route R-II over R-I to save energy, even though the vehicle has to cover a longer distance.
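The route choice described above amounts to picking the candidate with the lowest estimated energy and reporting the saving relative to the default (shortest) route. The following sketch illustrates this under stated assumptions: the function name `choose_route` and the energy values are hypothetical, chosen only so that the saving is roughly 1 MJ (about 9%) as in the case of Fig. 15c.

```python
def choose_route(estimates_mj, baseline):
    """Pick the route with the lowest estimated energy.

    estimates_mj: dict mapping route name -> estimated energy in MJ.
    baseline: name of the route the driver would take by default
              (e.g. the shortest one).
    Returns (best_route, saving_mj, saving_percent) relative to the baseline.
    """
    best = min(estimates_mj, key=estimates_mj.get)
    saving_mj = estimates_mj[baseline] - estimates_mj[best]
    saving_pct = 100.0 * saving_mj / estimates_mj[baseline]
    return best, saving_mj, saving_pct


# Hypothetical estimates for the two routes (illustrative values only).
estimates = {"R-I": 11.0, "R-II": 10.0}
route, saved, pct = choose_route(estimates, baseline="R-I")
print(f"Take {route}: saves {saved:.1f} MJ ({pct:.1f}%) compared with R-I")
```

When the baseline route is already the cheapest one, the function returns it with a saving of zero, so the same call covers the common case in which the shorter route also consumes less energy.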

Statistical analysis
To further validate the performance of the proposed technique, two statistical tests, namely, the Multiple Comparison Test (MCT) and the Kruskal-Wallis H Test (KWT), have been used. KWT is a rank-based non-parametric alternative to one-way ANOVA. This test is mainly used to determine whether independent samples from two or more groups are significantly different or not. Unlike KWT, the MCT helps to identify which groups are significantly different from one another, using one-way ANOVA for comparing multiple groups. To validate the performance, these tests were performed in sequential order. First, KWT was applied on the APED of the prediction results for different routes of G, obtained using the proposed and other integrated models. Then, using the rank information obtained from KWT, the second statistical test, MCT, was applied.
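To make the first step concrete, the following sketch computes the Kruskal-Wallis H statistic and the per-group mean ranks that a subsequent multiple comparison would use. It is a minimal illustration, not the authors' implementation: it assumes average ranks for ties and omits the tie correction factor, and the APED samples are hypothetical. In practice a library routine such as SciPy's `scipy.stats.kruskal` could be used to obtain the p value as well.

```python
def average_ranks(values):
    """Rank values from 1..N, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda idx: values[idx])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis(groups):
    """Return (H, mean_ranks) for a list of samples (no tie correction)."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total = 0.0
    mean_ranks = []
    start = 0
    for g in groups:
        group_ranks = ranks[start:start + len(g)]
        start += len(g)
        total += sum(group_ranks) ** 2 / len(g)
        mean_ranks.append(sum(group_ranks) / len(g))
    h = 12.0 / (n * (n + 1)) * total - 3.0 * (n + 1)
    return h, mean_ranks


# Hypothetical APED samples (%) for three techniques.
aped_samples = [[1.2, 1.8, 2.1], [4.0, 5.5, 6.3], [7.9, 8.4, 9.6]]
h_stat, ranks_per_group = kruskal_wallis(aped_samples)
print("H =", h_stat, "mean ranks =", ranks_per_group)
```

Under the null hypothesis, H approximately follows a chi-square distribution with one degree of freedom fewer than the number of groups, and the differences between mean ranks are what the multiple comparison step then examines pair by pair.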
The test results of KWT and MCT are given in Fig. 16. In Fig. 16a, a Box and Whisker plot is shown for the results obtained from the KWT. The horizontal lines of the boxes represent the 1st, 2nd, and 3rd quartiles. The vertical lines (whiskers) extending from the boxes represent the spread of the data. There are also some outliers, shown using red plus symbols. From the Box and Whisker plot, it can be observed that the mean APED for the proposed approach is the least as compared to the mean APED of the other integrated systems. Also, the small size of the box for the proposed approach validates that there is very little deviation in APED for different routes of G. Figure 16b shows the results of MCT applied on the rank information obtained from KWT. In the figure, the proposed approach is highlighted using a blue line. Red lines represent the techniques that are significantly different from the proposed approach, and gray lines represent the techniques that are not significantly different. The figure shows that the proposed approach is significantly different from most of the other integrated models. The detailed results of the MCT are given in Table 4. The first two columns of the table provide the techniques that are compared. For each pair of techniques, the third to sixth columns provide the lower bound, upper bound, mean rank difference, and p value, respectively. The MCT was performed by considering a significance level of α = 0.05. The lower bound and upper bound provide the boundary values of the 95% confidence interval. Since the p value in the table is less than α for most of the techniques, the proposed technique is significantly different from them. Hence, the null hypothesis for these pairs is rejected. The same behaviour can be observed using the values of the lower and upper bounds: if the confidence interval does not contain 0.0, then it can be concluded that the techniques compared are significantly different with respect to the significance level of 0.05.
There are three pairs for which the p value is greater than α; for these pairs, the null hypothesis cannot be rejected at the 0.05 level. As discussed previously, the proposed approach nevertheless provides better estimates than these techniques as well, with the least APED. Hence, from this discussion, it can be said that the proposed technique is statistically better than most of the compared techniques and provides lower APED than all of them.
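The decision rule applied to each row of Table 4 can be written down explicitly. The sketch below is illustrative only: the function name and the example rows are hypothetical and do not reproduce the values of Table 4. It reports the p value criterion and the confidence interval criterion separately, since the text states that both lead to the same conclusion.

```python
def compare_pair(lower, upper, p_value, alpha=0.05):
    """Apply the two equivalent decision criteria described in the text.

    lower, upper: bounds of the (1 - alpha) confidence interval for the
                  mean rank difference of a pair of techniques.
    Returns a dict with the outcome of each criterion.
    """
    return {
        "p_below_alpha": p_value < alpha,
        "ci_excludes_zero": lower > 0.0 or upper < 0.0,
    }


# Hypothetical rows: (technique pair, lower bound, upper bound, p value).
rows = [
    ("Proposed vs. Model-A", -180.0, -40.0, 0.0004),
    ("Proposed vs. Model-B", -70.0, 25.0, 0.62),
]
for name, lo, hi, p in rows:
    result = compare_pair(lo, hi, p)
    verdict = "significantly different" if all(result.values()) else "not significantly different"
    print(f"{name}: {verdict} {result}")
```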

Conclusion
A complete system has been developed which can provide accurate and real-time energy consumption estimates to EV drivers. These estimates can be used for energy-aware routing, i.e., the driver can be provided with information about which route to take to save energy. The proposed system takes into account the influence of traffic, wind, road elevation, temperature, battery's SOC, and auxiliary loads. The process of implementing, training, and testing the proposed system has been elaborated in detail. To validate the results, the performance of the system has also been compared with other state-of-the-art techniques using two real-world traffic datasets, namely, PEMS-Los and PEMS-Bay. The main conclusions are highlighted in the following points:
i. The proposed integrated system can provide reliable real-time energy consumption estimates and can be used for guiding EV drivers in real time, which can reduce the driver's range anxiety.
ii. No internal vehicle parameters are required from the manufacturer to train and use the proposed system. Hence, it can be generalized for other vehicles.
iii. The comparison of the proposed system with other benchmark techniques validates that the proposed system performs better than the other techniques, with the least mean absolute percentage error of 1.60% and 1.57% for the PEMS-Los and PEMS-Bay datasets, respectively.
iv. The proposed system can be used in the real world by converting it to TensorFlow Lite format and deploying it to the vehicular embedded system.

Declarations
Conflict of interest On behalf of all authors, the corresponding author states that there is no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.