Driving skill classification in curve driving scenes using machine learning

Driver support and infotainment systems can be adapted to the specific needs of individual drivers by assessing driver skill and state. In this paper, we present a machine learning approach to classifying the skill at maneuvering by drivers using both longitudinal and lateral controls in a vehicle. Conceptually, a model of drivers is constructed on the basis of sensor data related to the driving environment, the drivers’ behaviors, and the vehicles’ responses to the environment and behavior together. Once the model is built, the driving skills of an unknown driver can be classified automatically from the driving data. In this paper, we demonstrate the feasibility of using the proposed method to assess driving skill from the results of a driving simulator. We experiment with curve driving scenes, using both full curve and segmented curve scenarios. Six curves with different radii and angular changes were set up for the experiment. In the full curve driving scene, principal component analysis and a support vector machine-based method accurately classified drivers in 95.7 % of cases when using driving data about high- and low/average-skilled driver groups. In the cases with segmented curves, classification accuracy was 89 %.


Introduction
Driving support and infotainment systems inside vehicles are expected to improve the safety of driving and the comfort of those in the vehicle. As the data communication infrastructure matures, high-bandwidth communication is increasingly available between vehicles and the outside world. The functionality of some in-vehicle systems can be realized at data centers, with data exchanged between the vehicles and the data center at high speed. In Japan, some services of this type are already being offered, such as Toyota Motor Corporation's G-Book [1], Honda Motor Corporation's Internavi [2], and Nissan Motor Corporation's CARWINGS [3]. In addition to conventional vehicle data buses, various sensors, such as Global Positioning System (GPS) sensors, radar, and cameras, are now installed in vehicles, mainly as safety equipment. New services can be provided by uploading data from these sensors over a data communication network. For example, G-Book can locate a stolen vehicle by providing its position, and it can detect unlocked doors or open windows in a vehicle and then send an alert to the user.
Along with more recent advances in the field of active vehicle control, many other driver assist systems exist, such as anti-lock braking systems, vehicle stability control, adaptive cruise control, and lane keeping assist. These improve the safety of driving. It is expected that novel services to provide driving support and infotainment will be implemented in the future.
Our overall aim is to create a framework for providing drivers with the right information, at the right time, in the right way to make driving safer, more comfortable, and more enjoyable. Toward realizing this goal, a conceptual schematic drawing of a data center-based cyber-physical information processing system is shown in Fig. 1 [4]. Vehicle- and driver-related sensor data are uploaded to the data center and stored in structured databases. Data related to the outside world, such as data about the social networks of the driver and Web data, are also uploaded to the data center. Inside the data center, the main data-processing steps can be summarized as acquiring and accumulating the data, analyzing the data, making predictions, and then filtering the output according to driver needs and preferences. Optimal human-machine interfaces are expected to provide the right information to each driver at the right time, and the drivers' feedback data are to be uploaded to the data center for further processing.
Classification of driving skills and driver state is a basic issue in building systems for driver support and infotainment that can be adapted to the individual needs of specific drivers. In this paper, we present a machine learning approach to this problem. Conceptually, in this approach, a driver model is trained from sensor data related to the driving environment, the vehicle response, and the driving behavior of each driver. Once the model is trained, the driving skill of drivers can be classified automatically for novel situations.
The results of this system can be used in building user interfaces and interactive vehicular applications.

Related work
To understand the driving skill of a given driver, the driver's continuous control process, which is characterized as the Perception-Decision-Action cycle, has to be taken into account.
Table 1 shows a detailed comparison of machine learning methods used in prior studies of classification of driving maneuver skill. Tang et al. [17] and Zhang et al. [16] have dealt with analyzing driving data in the lateral direction (steering angle only) by setting the speed of the driving simulator to a constant. In contrast, Chandrasiri et al. [15,18] have dealt with a more natural case: motions in both longitudinal and lateral directions of the driving simulator/real vehicle are considered with multiple attributes (specifically, speed is not fixed, and the driver can control the vehicle in both longitudinal and lateral directions). In making this possible, a novel approach has been used, converting time-indexed data into distance-indexed data. The details of this are discussed in Sect. 3.1.
Most of the target driving scenes involve curves or double lane changes, although one includes a narrow straight lane [15]. Each scene is intended to elicit a display of maneuvering skill. Another novel aspect of our research is that it presents scenes with curves of different radii and angular changes. Analysis of results from these scenes can determine which curves require more maneuvering skill.
In using machine learning to characterize driving behavior, there are two main steps: feature extraction and classification. Various features and classifiers have been used. Tang et al. used wavelet transforms (WTs) and WTs with a discrete Fourier transform (WT + DFT), applying these to the steering angle and using a neural network and a support vector machine (SVM) for learning. Zhang et al. have applied DFT to the steering angle and used a neural network, SVM, and decision trees for learning [14,16]. Chandrasiri et al. have used DFT with probabilistic neural networks and SVM in their work [15].
In our work, we have experimented with a combination of multiple features that cover both lateral and longitudinal controls as discriminative features. For dealing with multivariate time-series data, we first apply the well-known principal component analysis (PCA) to reduce the dimensionality of the data. In doing so, we also aim to overcome the problem of data sparseness.
In this work, we use the k-nearest neighbor (k-NN) classifier, which is simple to implement, and SVM, which is known to be powerful. The driving scenarios and data segmentation methods that are used in our experiment distinguish this study from Refs. [16,17]. In those studies, a full set of data from single runs in driving simulator scenes involving a double lane change and a lane change in a curve (a single curve of common radius and angular change is used for all trials) is used for learning and prediction. In our work, we simulate typical curves to elicit typical driving maneuvers. Specifically, we use full curve data from six different curves of distinct radii and angular changes. By choosing driving scenes in this way, we can analyze differences in driving skill that are demonstrated in response to the difficulty of the driving scene, which can vary according to the radius and angular change of the curve. We also analyzed driving data after dividing the curves into small segments. This is a challenging task because the skill level of the driver must be inferred from the data of only a small segment of a curve. Despite this difficulty, knowing the driving skill displayed in a small driving segment would benefit driver support and infotainment applications.
As an example application, information overload could be prevented by taking instantaneous driver skill into account. Even for a highly skilled driver, real-time fluctuation in the classification of driving skill can be expected for reasons such as fatigue and driving conditions.

Driving simulator experiment
We collected data on driving in curve-containing scenes using the driving simulator shown in Fig. 2. The collected data include steering angle, speed, longitudinal acceleration, lateral acceleration, yaw rate, accelerator control, brake control, lateral displacement, longitudinal displacement, accelerator control speed, and brake control speed. These data were sampled at 60 Hz. Sixteen adult participants (men and women 20-40 years of age) took part in the experiment. As shown in Fig. 3, the driving course consisted of 6 curves (radius = 50, 100, or 200 m; angular change = 45° or 90°). In the simulator, each driver ran up to 12 trials in an urban scene with 2 trials per scene. In total, 960 (16 × 10 × 6) runs of driving data from curves were used for the analysis. Traffic signs indicating a speed limit of 60 km/h were set at 100 m and 50 m before the starting point of each curve (Fig. 4). This controls the speed at which drivers enter the curves. We asked drivers to drive in the left lane and not to cross the yellow center line.

Data conversion
The data were indexed by time when acquired from the driving simulator. However, we want to compare these data across different test runs and different drivers at a given location on the driving course. To allow this, we converted the time-indexed data into distance-indexed data. (In real vehicles, it will soon be possible to determine the driving location with centimeter-level accuracy using high-precision GPS sensors.)
The accompanying figure shows an example of data before and after the conversion.
The time-indexed sensor data are converted to distance-indexed sensor data by linear interpolation.
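The conversion by linear interpolation can be sketched as follows. This is a minimal illustration rather than the procedure used in the experiment: it assumes that a non-decreasing travelled distance is already available for every time sample (for example, from the longitudinal displacement signal), and the function name `to_distance_index` and the choice of grid are hypothetical.

```python
from bisect import bisect_right


def to_distance_index(dist, values, grid):
    """Resample one sensor signal onto a common distance grid.

    dist   -- travelled distance at each time sample (assumed non-decreasing)
    values -- sensor reading at each time sample
    grid   -- distances at which every run should be compared
    """
    out = []
    for s in grid:
        # Clamp to the recorded range so the result is never extrapolated.
        s = min(max(s, dist[0]), dist[-1])
        # Index of the sample just after s, kept inside the valid range.
        j = min(max(bisect_right(dist, s), 1), len(dist) - 1)
        d0, d1 = dist[j - 1], dist[j]
        # Interpolation weight; 0 if the vehicle did not move between samples.
        w = 0.0 if d1 == d0 else (s - d0) / (d1 - d0)
        out.append(values[j - 1] + w * (values[j] - values[j - 1]))
    return out


# Steering angle recorded at 0 m, 1 m and 3 m, resampled at four distances.
resampled = to_distance_index([0.0, 1.0, 3.0], [0.0, 10.0, 30.0],
                              [0.0, 0.5, 2.0, 3.0])
```

Applying the same grid to every run and every attribute gives series of equal length that can be compared point by point along the curve.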

Driving skill tagging
This section explains how the simulation runs of drivers are categorized into different skill classes. An expert on driving skill analyzed a single run of each driver, and the results were used to divide the sixteen drivers who took part in the driving simulator experiment into two skill classes. Five drivers were classed as high-skilled drivers, and the other 11 as low/average-skilled drivers.
Fluctuations in driving skill between different runs of a driver are not considered in this experiment. Tagging the driving skill on the basis of each individual run, instead of all runs of a driver together, may make the final results more accurate. However, there is a tradeoff between the tagging cost and the final outcome.
After tagging, there were 300 runs of curve driving data for high-skilled drivers and 660 for low/average-skilled drivers. With curve segmentation into five segments per curve, these increased to 1500 and 3300 data segments, respectively.

Data analysis methods
The flow of data analysis in this paper is depicted in Fig. 6. We used driving data acquired by a driving simulator as explained in Sect. 3. The ten driving data attributes analyzed here are those mentioned previously (steering angle, speed, longitudinal acceleration, lateral acceleration, yaw rate, accelerator control, brake control, lateral displacement, accelerator control speed, and brake control speed), which cover both longitudinal and lateral control of the vehicle. In the pre-processing step, the collected driving data are converted from time-indexed data into distance-indexed data. This enables comparison among different runs of drivers at the same point on a curve (see Sect. 3.1 for details). The segmentation method is depicted in Fig. 7. As shown in the figure, segments 1 and 5 are included in the data analysis along with curve segments 2-4. In those end segments, the vehicle enters and leaves the curve. Therefore, in the full curve case, analysis of driving skill via machine learning considers transition effects from straight segments to curves and vice versa. In our future experiments, the design of the road could be made smoother, for example by using clothoids.
To merge the data from different sensors into a single feature vector, some normalization method should be employed. For our analysis, we use standardized score (z-score) normalization at the points in the distance-indexed data. Feature extraction using PCA and driving skill classification using k-NN and SVM are explained in Sects. 4.1 and 4.2, respectively (Fig. 8). For extracting features to use in analyzing driving skill with respect to both longitudinal and lateral maneuvers in a curve, we use multivariate time-series data from the sensors. Because the dataset is high dimensional and the data points are sparsely distributed, we use PCA to reduce the dimensionality, as is conventional for data with these characteristics.
We combine these normalized multiple sensor data into a one-dimensional feature vector, characterizing the ith run by a vector $x_i$.
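The normalization and merging step can be illustrated with a short sketch. It rests on an assumption that the text does not spell out: each attribute is standardized at each distance point across all runs, which is the same as z-scoring every column of the concatenated vectors. The function name `build_feature_vectors` and the nested-list layout are hypothetical.

```python
from statistics import mean, pstdev


def build_feature_vectors(runs):
    """Return one flat, z-scored feature vector per run.

    runs[i][a] is the distance-indexed series of attribute a in run i;
    all runs are assumed to share the same attributes and distance grid.
    """
    # Concatenate the attribute series of each run into one flat vector.
    flat = [[v for series in run for v in series] for run in runs]
    columns = list(zip(*flat))
    mus = [mean(c) for c in columns]
    sigmas = [pstdev(c) for c in columns]
    # Standardize each column across runs; constant columns map to 0.
    return [
        [(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, mus, sigmas)]
        for row in flat
    ]


# Two runs, two attributes, two distance points each.
vectors = build_feature_vectors([
    [[1.0, 2.0], [10.0, 5.0]],
    [[3.0, 2.0], [30.0, 5.0]],
])
```

After this step, attributes with very different units (for example, speed and steering angle) contribute on a comparable scale to the feature vector.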
The covariance matrix of n data vectors, ($x_1, x_2, \ldots, x_n$), can be written as

$$C = \frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)(x_i - \mu)^{T},$$

where $T$ is the transpose operator and $\mu$ is the mean of the data.
By eigenvector decomposition of C, we calculate m eigenvectors and then sort them in order of decreasing magnitude of eigenvalue. By projecting the original observation data onto the eigenspace spanned by the top d eigenvectors, we can represent them effectively in a lower-dimensional space. In this paper, we selected the number of principal components d to optimize learning (choosing 10 %-15 % of the components) in representing the original data. Enough principal components to account for over 90 % of the variance in the data were used for the analysis in our previous work [18]. Overall accuracy could be increased by optimizing the number of principal components that are used (see Fig. 9). Depending on how we define $x_i$, there are two main cases studied in this paper.
1. Full curve analysis: $x_i$ is defined for the sensor data across the entire curve for each curve.
2. Segmented curve analysis: $x_i$ is defined for each segment of the curve, and PCA is performed separately for each segment of each curve.
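The projection onto the top d eigenvectors can be sketched with the standard library alone. Two points are assumptions made for this illustration: the covariance is normalized by 1/n, and the eigenvectors are found by power iteration with deflation, a simple stand-in for the eigen-solver of a numerical library. The function names are hypothetical.

```python
import math
import random


def covariance(vectors):
    """Return (C, mu) with C = (1/n) * sum_i (x_i - mu)(x_i - mu)^T."""
    n, m = len(vectors), len(vectors[0])
    mu = [sum(x[j] for x in vectors) / n for j in range(m)]
    centered = [[x[j] - mu[j] for j in range(m)] for x in vectors]
    cov = [[sum(c[a] * c[b] for c in centered) / n for b in range(m)]
           for a in range(m)]
    return cov, mu


def top_eigenvectors(cov, d, iters=500, seed=0):
    """Up to d (eigenvalue, eigenvector) pairs, largest eigenvalue first."""
    m = len(cov)
    rng = random.Random(seed)
    work = [row[:] for row in cov]
    pairs = []
    for _ in range(d):
        v = [rng.gauss(0.0, 1.0) for _ in range(m)]
        for _ in range(iters):
            w = [sum(work[a][b] * v[b] for b in range(m)) for a in range(m)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # no variance left in the deflated matrix
                break
            v = [x / norm for x in w]
        lam = sum(v[a] * sum(work[a][b] * v[b] for b in range(m))
                  for a in range(m))
        if lam < 1e-12:
            break
        pairs.append((lam, v))
        # Deflation: subtract the component that was just found.
        for a in range(m):
            for b in range(m):
                work[a][b] -= lam * v[a] * v[b]
    return pairs


def project(vectors, mu, pairs):
    """Coordinates of each centered vector along the retained eigenvectors."""
    return [[sum((x[j] - mu[j]) * v[j] for j in range(len(mu)))
             for _, v in pairs] for x in vectors]


data = [[-2.0, -4.0], [-1.0, -2.0], [0.0, 0.0], [1.0, 2.0], [2.0, 4.0]]
cov, mu = covariance(data)
pairs = top_eigenvectors(cov, d=1)
reduced = project(data, mu, pairs)
```

In the segmented curve case, the same three steps would be run separately on the feature vectors of each segment of each curve.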

Driving skill classifier
One method of classifying skill is to manually set up rules for classifying driving skills on the basis of extracted features of the driving sensor data.However, it is widely known that machine learning/data mining algorithms can effectively formulate these rules automatically using some training dataset.
Tang et al. have surveyed the ten most influential data mining algorithms in the research community [25]. In this work, we use k-NN and SVM, which are both in the top ten. The k-NN algorithm is easy to implement, and the model can be built easily. In contrast, SVM is more difficult to implement but is one of the most robust and accurate methods known so far.
In the k-NN algorithm, to classify unlabeled data, the Euclidean distances between the unknown data and labeled data are calculated in the feature space, the k-nearest neighbors are identified, and the class labels of these nearest neighbors are then used to determine the class label of the unknown data. In a two-class learning task, SVM finds the best hyperplane to distinguish between the two classes by maximizing the margin between the two classes from the training data. SVM also has one of the best generalization abilities for correctly classifying future data. We compared different kernels (radial, linear, polynomial, and sigmoid) in terms of their classification accuracy and used the radial kernel, which had the best performance, in this work (Fig. 9). Details of these two algorithms can be found in [21-25].
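The k-NN rule described above can be written in a few lines. The sketch below covers only k-NN, not the SVM; it assumes a majority vote among the k nearest neighbors, and the function name, the labels "high" and "low", and the sample points are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training vectors."""
    # Sort training indices by Euclidean distance to the query.
    order = sorted(range(len(train_x)),
                   key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    # On a tie, the label that appears first among the nearest neighbors wins.
    return votes.most_common(1)[0][0]


train_x = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
           [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]]
train_y = ["high", "high", "high", "low", "low", "low"]
label = knn_predict(train_x, train_y, [0.2, 0.1], k=3)
```

In this setting, `train_x` would hold the PCA-reduced feature vectors of the labeled runs, and k would be chosen by a search such as the one summarized in Fig. 9.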
In this paper, we use the leave-one-out method for evaluating the accuracy of the driving skill classification algorithms. We build the classifiers (k-NN and SVM) by leaving out the data to be classified and using the rest of the data for training. In the testing phase, the left-out data are used to test the classification ability of the classifier. This process is repeated for all available data.
Driving skill classification accuracy is defined as follows:

$$\text{Accuracy} = \frac{N_{HH} + N_{LL}}{N} \times 100\,\%$$

Here, the following are used:
$N_{HH}$: Number of correctly classified high-skilled driving runs.
$N_{LL}$: Number of correctly classified low/average-skilled driving runs.
$N$: Total number of runs.
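The leave-one-out loop and the accuracy measure can be combined in a short sketch. The classifier is passed in as a function, so any classifier with the same interface could be used; the 1-nearest-neighbor stand-in, the labels "H" and "L", and the one-dimensional sample data are hypothetical and serve only to exercise the loop.

```python
import math


def leave_one_out_accuracy(xs, ys, classify, high="H", low="L"):
    """Accuracy in percent, 100 * (N_HH + N_LL) / N, under leave-one-out."""
    n_hh = n_ll = 0
    for i in range(len(xs)):
        # Train on everything except run i, then classify run i.
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        pred = classify(train_x, train_y, xs[i])
        if pred == ys[i] == high:
            n_hh += 1  # correctly classified high-skilled run
        elif pred == ys[i] == low:
            n_ll += 1  # correctly classified low/average-skilled run
    return 100.0 * (n_hh + n_ll) / len(xs)


def nearest_neighbor(train_x, train_y, query):
    """Stand-in 1-NN classifier used only to exercise the loop."""
    best = min(range(len(train_x)),
               key=lambda i: math.dist(train_x[i], query))
    return train_y[best]


xs = [[0.0], [0.1], [0.2], [5.0], [5.1], [2.4]]
ys = ["H", "H", "H", "L", "L", "L"]
accuracy = leave_one_out_accuracy(xs, ys, nearest_neighbor)
```

In this toy data set, the run at 2.4 lies closer to the high-skilled runs than to the other low/average-skilled runs, so it is misclassified and five of the six runs are counted as correct.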

Results and discussions
In this section, we show and discuss some of the results from the analysis of driving skill that was performed using the experimental method discussed in Sect. 4. The driving course contained six curves with different radii and intersection angles, as explained in Sect. 3. In this paper, we report classification accuracy for the six full curves and for the curve-segmented cases.
Figure 10 shows how combinations of different attributes from feature extraction and two driving skill classifiers contribute to the accuracy of driving skill classification. In the previous research shown in Table 1, apart from the studies using our proposed methods (this work and [18]), only steering angle data were used. We wanted to compare how combinations of multiple parameters work, and this can be seen in Fig. 10. We selected a combination of parameters that can be easily acquired in a real driving scenario using currently available sensor data, such as the driver's control data (steering angle, acceleration pedal control, and brake pedal control) and basic parameters that account for both longitudinal and lateral movement of the vehicle (steering angle, speed, longitudinal acceleration, lateral acceleration, and yaw rate). Additionally, we included lateral displacement.
In terms of the classifier, SVM performs better than k-NN in all cases, which is not surprising. However, since we have extracted low-dimensional discriminative features via PCA, the performance of k-NN is not much worse.
In analysis of the full curve data, using all ten attributes gives the highest accuracy rates among tested cases.Overall, low/average-skilled and high-skilled drivers were classified with 95.7 % accuracy using SVM.
In analysis of the segmented curves, an average accuracy of 89 % was obtained, using the same classifier. These results show the feasibility of using only a segment of a curve for classification, in spite of the small data window. Figure 11 illustrates how driving skill recognition accuracy depends on segment number. Using SVM for classification, the accuracy varies from 85 % to 91 % in different segments of the curves, with the maximum accuracy of 91 % recorded at the fourth segment. This segment is near the curve exit, which may require higher use of skill, particularly if errors had accumulated in earlier segments.
An example of PCA-based projection of the driving data onto a low number of dimensions (here, 2) is depicted in Fig. 12 for the segmented curve case. Clustering of drivers by skill level is visible, even in two dimensions. However, we cannot directly compare the accuracies of driving skill through the qualitative analysis of data for each segment in Fig. 11 because we use several different dimensionalities, with the components to use determined from the cumulative contribution ratio of the principal components.
Table 1 shows a comparison of machine learning methods for classifying driving skill at maneuvers. In the analysis by the proposed method, both lateral and longitudinal control of a vehicle are considered. In contrast, only lateral control is taken into account in the works by Zhang et al. [16] and Tang et al. [17].
Fig. 12 Segmented curve with driving data analysis: PCA space for the sixth curve (R = 100 m, 45°), projected to 2D
Our work deals with a simple scenario of driving on a curve, and its highest accuracy in classifying driving skill compares favorably with results from prior research.
Driving skill classification results for curves of different radii and angular change [intersection angle (IA)] are depicted in Figs. 13 and 14. There is a tendency toward higher accuracy when the radius is smaller. This could be because the difficulty of driving increases as the radius becomes smaller. For curves of the same radius but different angular change, a change by 90° has higher difficulty than a change by 45° due to the longer distance of the curve in the first case. The above tendency can be seen explicitly in the analysis of the segmented curves. With the full curves, differences between the 50-m and 200-m radii are marked. From these results, we can select curves with a smaller radius and a larger angular change to increase the accuracy of driving skill classification.
In our experiment, all curves were to the left for convenience in building the simulator scenario. This may affect both the potential learning effects and the anticipation of control input needed. We intend to address these issues in future research.
However, this work and [18] are attempts to analyze driving skill on multiple curves and multiple segments and to compare driving skill in curves of different radii and angular change (as shown in Figs. 13 and 14) using machine learning methods.

Conclusions
In this paper, we analyzed the skills of drivers at longitudinal and lateral maneuvers, using sensor data from a driving simulator.
We demonstrated the feasibility of classifying driving skill in both full curve and segmented curve cases by comparing features composed of different attributes and classifiers.
As features, principal components of combinations of attributes such as steering angle, speed, longitudinal acceleration, lateral acceleration, and yaw rate were used. As classifiers, k-NN and SVM were used. In classification into two classes of driving skill with full curve scenes, an overall accuracy of 95.7 % was obtained using PCA components of ten attributes and an SVM classifier. The average accuracy was 89 % for the cases with segmented curves.
Analysis of driving skill in complex traffic environments, such as scenarios that contain surrounding vehicles and different road gradients, is left to future work.
In the future, we aim to build driver support and infotainment systems that can be adapted to individual user needs.

Fig. 1 Cyber-physical system for vehicle application

Fig. 7 Data segmentation at a curve

Fig. 9 Grid-search result for optimization of the threshold for including principal components in the analysis, k for k-NN, and the kernel for SVM

Fig. 10 Driving skill classification accuracy for full curve and curve-segmented cases

Table 1 Comparison of machine learning methods for classifying driving maneuver skill
PCA principal component analysis, DFT discrete Fourier transform, WT wavelet transform, k-NN k-nearest neighbor, SVM support vector machine, PNN probabilistic neural network, FFNN feed-forward neural network, RBFN radial basis function network
a The classification accuracy is shown based on the best case in the original paper