Keywords

1 Introduction

Vehicles powered by internal combustion engines are one of the main causes of pollution in urban areas [1], and they are studied in order to determine the amount of pollutants they emit [2, 3]. Current regulations stipulate that pollutant emissions must be measured during real driving [4], since these emissions are substantially higher than those obtained in laboratory tests [5]: real driving involves factors such as traffic, route selection and driving conditions, which directly influence fuel consumption and the emissions generated [6]. Ortenzi and Castermans et al. determine that parameters such as excessive acceleration and deceleration, incorrect gear selection, and high engine and vehicle speed increase the emission of pollutants [7, 8], but determining their real influence requires extensive measurement campaigns with widely instrumented vehicles [9]. According to [10], polluting emissions depend on factors specific to the vehicle, such as the model, weight, type of fuel, technological level and route, and on operational factors, such as speed, acceleration, gear selection, road gradient and ambient temperature, so all emission models must consider these factors.

This article presents a novel method to estimate polluting emissions that takes vehicle driving variables obtained by GPS, such as speed and gradient, as input data and applies techniques such as Kmeans, classification trees and neural networks. In order to create a model for estimating pollutant emissions, a Real Driving Emissions (RDE) test was carried out on a route on which emissions data as well as OBD and GPS data were obtained. With these data an Artificial Neural Network (ANN) was trained and then validated with the data obtained in a second RDE test, confirming the validity of the emissions estimator. This estimator was applied to a data set of 1218 km of actual driving. The results were compared with those obtained with the IVE model and the OBD test, showing similar results.

2 Materials and Methods

2.1 Proposed Methodology

Polluting emissions must be measured in real driving conditions [4]. In this way, factors such as driving style, type of fuel, geographical location and the environmental conditions in which the vehicle circulates are reflected in the results [11]; these factors are currently not considered in the models used by the Mobility Company of the city of Cuenca (EMOV-EP). For the estimation of pollutants based on GPS data and machine learning, the following steps are proposed: a) collection of GPS data and pollutant emissions in real driving conditions on two routes based on [4], b) training of the machine learning model with the data from route 1, c) validation of the model obtained with the data from route 2, d) application of the generated model to the data set of 1218.9 km, and e) processing and presentation of results.

2.2 Data Collection in Real Driving Conditions

The vehicle used for data collection is a 2018 Kia Sportage, which according to [12] is the best-selling vehicle in Ecuador in the SUV segment. It has a 2.0 L DOHC engine and a 6-speed manual transmission; at the time of the experiment it had traveled 18,720 km and had received all the maintenance operations recommended by the manufacturer. The portable emission measurement system (PEMS) used is the Brain Bee AGS-688 gas analyzer, which uses the non-dispersive infrared absorption method (NDIR) to measure CO2 [%], CO [%] and HC [ppm] and an electrochemical cell to measure O2 [%] and NOX [ppm]. The equipment is powered by a battery independent of the test vehicle, as established in [4]. Fuel consumption is measured using the AIC Fuel Flow Master 5004. The GPS used is the one incorporated into the Freematics ONE+ data logger, which stores the latitude (Lat), longitude (Long), altitude (Alt) and speed (VGPSo) of the vehicle on an SD card in CSV format. In addition to the GPS data, the equipment stores driving data from OBD such as engine speed (RPM), intake manifold pressure (MAP), and vehicle speed (VOBD).
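Since the logger stores latitude, longitude and altitude, the road gradient used as a model input can be derived from consecutive GPS samples. The following is a minimal sketch under stated assumptions, not the processing actually used in this work: the function names, the spherical-Earth radius and the `min_dist_m` threshold are illustrative choices, and no smoothing of the altitude signal is applied.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius (spherical approximation, assumed)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def road_gradient(samples, min_dist_m=1.0):
    """Gradient (rise over run) between consecutive (lat, lon, alt_m) samples.

    Pairs closer than min_dist_m (hypothetical threshold) return 0.0 to avoid
    dividing by a near-zero distance while the vehicle is stationary.
    """
    gradients = []
    for (la1, lo1, al1), (la2, lo2, al2) in zip(samples, samples[1:]):
        dist = haversine_m(la1, lo1, la2, lo2)
        gradients.append((al2 - al1) / dist if dist >= min_dist_m else 0.0)
    return gradients
```

In practice GPS altitude is noisy, so some filtering before differentiation would probably be needed; that step is left out of this sketch.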

2.3 Test Routes

Two routes in the city of Cuenca, Ecuador, were selected for data collection, following the provisions of [4] for determining emissions in real driving conditions (RDE). They have urban, rural and highway segments: the Historic Center, the Panamericana Norte motorway and the Cuenca-Azogues motorway, respectively. Both tests were carried out under the same conditions, without rain or strong winds, at an average ambient temperature of 14 °C, with a hot engine and a total mass of m = 1719.5 kg, corresponding to the vehicle with a full fuel tank, the instrumentation and two passengers.

2.4 Estimation of Pollutants

The emissions generated in real driving conditions on the two established routes are measured by the gas analyzer as volumetric concentrations, which must be converted to mass using the procedure described in [4]. The emission factor \({F}_{j,k}\) of each pollutant (g/km) in section k of the RDE is then determined from \({m}_{j,k}\), the mass of pollutant j, and s, the distance traveled in section k of the RDE; k takes the values u, r and a for the urban, rural and motorway sections, respectively. The results obtained on route 1 are shown in Table 1.
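As an illustration of the emission factor calculation, the sketch below assumes that \({F}_{j,k}\) is the ratio of the pollutant mass to the distance traveled in each section, as its g/km unit suggests. The function name and the dictionary layout are hypothetical, and the conversion from volumetric concentration to mass described in [4] is taken as already done.

```python
def emission_factors(mass_g, distance_km):
    """Emission factor in g/km per section and pollutant.

    mass_g:      {section: {pollutant: mass in grams}}, sections "u", "r", "a"
    distance_km: {section: distance traveled in km}
    """
    factors = {}
    for section, masses in mass_g.items():
        s = distance_km[section]
        if s <= 0:
            raise ValueError(f"distance for section {section!r} must be positive")
        # F_{j,k} = m_{j,k} / s for every pollutant j in section k (assumed form)
        factors[section] = {pollutant: m / s for pollutant, m in masses.items()}
    return factors
```

For example, a hypothetical urban section with 2500 g of CO2 over 10 km would give `emission_factors({"u": {"CO2": 2500.0}}, {"u": 10.0})["u"]["CO2"]` equal to 250.0 g/km.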

Table 1. Emission Factors

As shown in [13], the most influential variables in the generation of polluting emissions are CO2: [G, RPM, TPS, MAP], CO: [TPS, MAP, VOBD, RPM], HC: [TPS, MAP, VOBD, G] and NOX: [MAP, RPM, GEAR, VOBD]. These predictors are obtained directly through OBD, so they cannot be used to train the GPS-based model. The gear used by the driver cannot be determined directly from VGPS, since the gear selected does not depend on the driving speed. For this purpose, relationships must be sought between the data provided by the GPS and the performance of the engine. Vehicle speed is related to engine speed by the gear ratios of each of the vehicle's 6 gears. In order to obtain a label for training a classification tree that estimates the gear used by the driver, the Kmeans algorithm is applied to the data obtained on route 1, specifically to the vector \({R}_{i}=\frac{{VSS}_{i}}{{RPM}_{i}}\). This generates a label for each of the 7 groups obtained from their centroids; the groups correspond to the 6 gears of the vehicle and to the neutral position. The RTG value is obtained by estimating the median of Ri for each gear. For the vehicle analyzed, the 5th gear (G = 5) is direct, so the transmission ratio is R5 = 1; from this, the transmission ratio of the differential crown group RC is determined by \({R}_{C}=\frac{3.6\pi {R}_{N}}{30{R}_{5}}\), where RN = 0.3607 m is the effective radius of the tire. The rotating mass factor and the efficiency of each gear are determined by (1) and (2), respectively.

$${\gamma }_{G}=1.04+0.0025{{R}_{G}}^{2}$$
(1)
$${\eta }_{G}=\frac{0.8{R}_{G}}{{R}_{G}-\frac{0.1}{{G}_{max}}}$$
(2)
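The gear labeling step and Eqs. (1) and (2) can be sketched as follows. This is an illustrative, standard-library-only sketch rather than the implementation used in this work: a plain one-dimensional Lloyd's iteration stands in for the Kmeans routine, the centroid initialization is a deterministic choice made here, all function names are hypothetical, and \({G}_{max}\) is assumed to be the number of forward gears (6), which the text does not state explicitly.

```python
from statistics import median


def speed_to_rpm_ratio(vss_kmh, rpm):
    """R_i = VSS_i / RPM_i; samples with non-positive RPM are skipped."""
    return [v / n for v, n in zip(vss_kmh, rpm) if n > 0]


def kmeans_1d(values, k, iters=100):
    """Plain Lloyd's algorithm on scalars.

    Returns (centroids sorted ascending, label per value), so that label 0 is
    the lowest ratio (neutral) and higher labels are higher gears.
    """
    if len(values) < k:
        raise ValueError("need at least k values")
    data = sorted(values)
    n = len(data)
    # Deterministic start (an assumption): spread centroids over the sorted data.
    centroids = [data[(2 * j + 1) * n // (2 * k)] for j in range(k)]

    def nearest(v):
        return min(range(k), key=lambda c: abs(v - centroids[c]))

    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            clusters[nearest(v)].append(v)
        updated = [sum(c) / len(c) if c else centroids[j]
                   for j, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    order = sorted(range(k), key=lambda j: centroids[j])
    rank = {old: new for new, old in enumerate(order)}
    labels = [rank[nearest(v)] for v in values]
    return [centroids[j] for j in order], labels


def median_ratio_per_label(ratios, labels):
    """Median of R_i within each cluster, used as the ratio of that gear."""
    groups = {}
    for r, lab in zip(ratios, labels):
        groups.setdefault(lab, []).append(r)
    return {lab: median(vals) for lab, vals in groups.items()}


def rotating_mass_factor(r_g):
    """Eq. (1): gamma_G = 1.04 + 0.0025 * R_G**2."""
    return 1.04 + 0.0025 * r_g ** 2


def gear_efficiency(r_g, g_max=6):
    """Eq. (2): eta_G = 0.8 * R_G / (R_G - 0.1 / G_max)."""
    return 0.8 * r_g / (r_g - 0.1 / g_max)
```

With the route 1 data, `kmeans_1d(ratios, 7)` would return the seven groups described above. In this sketch the labels are ordered by centroid so that they can be read directly as neutral followed by gears 1 to 6.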

The torque values obtained on the dynamometer bench at different throttle openings are used in Eq. (3) to determine the acceleration of the vehicle, where T is the torque delivered by the engine as a function of the gear chosen by the driver. The tractive force of the vehicle FT is related to the longitudinal acceleration ax, the braking force, the rolling and aerodynamic resistance, and the slope resistance; see Eq. (3).

$$m\frac{T{\eta }_{G}{R}_{G}}{m{\gamma }_{G}{R}_{N}}={F}_{T}-mg\left(f+{f}_{0}{V}_{GPS\,i}^{2.5}\right)+\frac{1}{2}\rho {C}_{X}{A}_{f}{V}_{GPS\,i}^{2}-mg\left(f+{f}_{0}{V}_{GPS\,i}^{2.5}\right)$$
(3)

The rolling and aerodynamic resistances are obtained from (3), in which the coefficients of static and dynamic adhesion are \(f=0.015\) and \({f}_{0}=0.01\) respectively, the density of air is \(\rho =0.89\,{\rm kg/m}^{3}\), the drag coefficient is \({C}_{X}=0.33\), and the frontal area of the vehicle is \({A}_{f}=3.015\,{\rm m}^{2}\). The tractive force of the vehicle is expressed in (3) and can be determined from the GPS data. The braking force is applied when FT is zero; that is, the accelerator and the brake cannot be activated at the same time. The tractive force depends on engine performance, which involves throttle opening, intake manifold pressure, and the gear selected by the driver. It therefore serves as a predictor that, together with the longitudinal acceleration and the gear label, is used to train a classification tree that determines the gear used by the driver. The vector taken on route 1 for training is Ini = [FTi, axi, VGPSi]. The classification tree obtained has 7, the highest hit rates occur in neutral (99.5%), 1st (89.3%), 2nd (82.5%) and 3rd gears (83.6%), while in 4th (61.5%), 5th (45.3%) and 6th (71.9%) gears the accuracy of the model decreases, because the performance of the vehicle under these conditions is very similar. The values obtained can be seen in Fig. 2; compared with the data obtained by OBD, they show significant differences during gear changes. During a gear change the driver does not apply the accelerator, so the engine speed drops to the idle value of 663 RPM, determined as the mode of the dead time on route 1, as shown in Fig. 1.
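The per-gear hit rates quoted above correspond to the fraction of samples of each true gear that the classifier labels correctly. A minimal sketch of that calculation is given below, assuming the Kmeans labels serve as the reference and the tree output as the prediction; the function name is hypothetical and the snippet does not reproduce the classification tree itself.

```python
from collections import Counter


def hit_rate_per_class(true_labels, predicted_labels):
    """Fraction of correctly predicted samples for each true class (gear)."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")
    totals = Counter(true_labels)
    hits = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    # Counter returns 0 for classes that were never predicted correctly.
    return {gear: hits[gear] / totals[gear] for gear in totals}
```

Reporting the rate per gear rather than one overall accuracy is what exposes the weaker performance in the higher gears, since the lower gears and neutral dominate urban driving.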

With the data obtained from Ii and the label of each gear, 4 neural networks are trained to estimate pollutant emissions. Each network has 3 neurons in the input layer, 10 in the hidden layer, and one in the output layer. The coefficients of determination R2 obtained are 0.935, 0.923, 0.914 and 0.943 for CO2, CO, HC and NOX emissions, respectively. The generated model is then applied to the data obtained over 1218 km of random routes in the urban, rural and highway areas of the city of Cuenca. In Fig. 2 it can be seen that the 1st, 2nd and 3rd gears are used at low average speeds, on short journeys that occur mostly in urban areas and very rarely in rural areas and on highways, and that at higher speeds CO2, CO and NOX emissions decrease.
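The 3-10-1 network structure and the coefficient of determination can be sketched as below. This is only an illustration of the architecture and the metric: the tanh hidden activation, the linear output, the random initialization range and all function names are assumptions made here, and no training procedure is shown.

```python
import math
import random


def init_mlp(n_in=3, n_hidden=10, seed=0):
    """Random, untrained parameters for a 3-10-1 network (assumed layout)."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b2 = 0.0
    return w1, b1, w2, b2


def forward(params, x):
    """One forward pass: tanh hidden layer (assumed) and a linear output."""
    w1, b1, w2, b2 = params
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    return sum(w * h for w, h in zip(w2, hidden)) + b2


def r_squared(y_true, y_pred):
    """Coefficient of determination: R2 = 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

One such network would be instantiated per pollutant (CO2, CO, HC and NOX), and `r_squared` would be evaluated on the measured and estimated emissions of the validation route.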

Fig. 1. Model Results

Fig. 2. Emission Factors

The error obtained in calculating the average driving speed in each gear is due to the model that determines the gear selected by the driver. The average emission factors for each model are shown in Table 2. The values estimated by IVE are higher than those of the other models analyzed; the main difference is in the CO2 emission factor, which is strongly influenced by the low circulation speeds in the urban zone.

Table 2. Emission Factors

3 Conclusions

This work proposes a method for the estimation of pollutant emissions based on GPS data and Machine Learning. To this end, a highly effective classifier was first obtained for determining the gear selected by the driver, based on labels obtained through Kmeans and the subsequent training of a classification tree; the errors generated occur in the brief moments that the transition between gears lasts. The calculation of pollutant emissions started from determining the importance of the predictors in the data obtained on the two routes of the RDE test, after which 4 ANNs were trained; these obtained high R2 coefficients of determination, which validate the applied method. The model obtained is more robust to different traffic conditions and gives better results at low average speeds than the IVE and RDE models, so it is recommended for calculating emission factors and estimating vehicular emission inventories.