Introduction

Driver classification is a common strategy used to easily group individuals based on similar behavioral traits, driving style, or demographic characteristics. Classification provides an efficient approach to isolating unique characteristics associated with specific driver types under various driving conditions. Not only can this process be used to develop better driver behavioral models for car-following or lane changing, but also to identify risk-averse driving behaviors; thus, providing context for targeted safe-driving education.

Existing research indicates that driver classification has primarily been performed using aggregate or continuous driving performance metrics and driving-centric questionnaires. Furthermore, very few studies incorporate the use of physiological (defined as the information pertaining to a driver’s bodily functions such as cognition, heart rate, vision-related changes, skin conductance, perspiration, and others (Lin et al. 2006)) changes and environmental factors when assigning driving styles. To achieve a more holistic classification, this study seeks to address the above shortcomings using a data-driven approach to assess driving tasks performed under varying traffic conditions resulting in measurable changes to mental workloads (defined as the proportion of mental capacity required by an individual to perform a task (Brookhuis et al. 1991)). We also go a step further to assess the overall performance under complex tasks such as distracted driving. The main goals of this research are to classify drivers based on their simulated driving and psychophysiological (defined as physiological changes resulting from variations of psychological processes or behavior (Kummetha 2020)) performance using a machine-learning approach and predict their classification outcome (i.e., driving profile) from the scores of various psychometric questionnaires administered prior to their driving assessment, as independent variables within logistic regression. The novelty of this research stems from the extensive driving performance and psychophysiological time-series data collected across various task complexities and for a relatively large pool of ninety participants.

The manuscript begins by providing a comprehensive overview on the existing research on subjective and quantitative driver classification. The section detailing the data and methods is then presented, diving into the participant selection and simulation, psychophysiological measures, data preparation, and methodological approach to driver classification and statistical modeling. The results, discussion, and conclusions are then presented.

Background

Several studies have employed classification to identify behavior and driving styles. Some rely on continuous measures of driving performance or physiological parameters while others take on a subjective approach relying on questionnaires. Aoude et al. developed algorithms to classifying drivers by considering their behavior at intersections (Aoude et al. 2012). The authors collected roadside data which included vehicle speeds, range, and lateral positions. A GPS antenna also recorded video from all intersection approaches including the traffic signal phases. They implemented two machine-learning algorithms called Support Vector Machine (SVM) and Hidden Markov Model (HMM) to classify whether the driver was compliant or violating the signal indication. The developed model was then validated on another naturalistic dataset.

However, roadside data alone may not accurately classify drivers. Driving performance data from inside the vehicle can provide richer information for driver classification. Kondyli and Elefteriadou observed behavioral trends within a naturalistic setting suggesting three driver behavioral types: aggressive, average, and conservative. Aggressive drivers were observed to drive at high speeds (> 15 mph speed limit), perform six discretionary lane changes, and aggressive lane changing (Kondyli and Elefteriadou 2011). Average drivers did not exceed 10 mph over posted speed limits while conservative drivers maintained speeds within 5 mph of posted limits. Fernandez et al. used several data elements such as gas/brake pedal usage, speed indicators, and acceleration collected from a 3-D virtual driving simulator with a lifelike cockpit to group drivers (Fernández and Ito 2016). They developed a Fuzzy rule-based system that can profile the drivers into five groups (very passive, passive, normal, aggressive, and dangerous). Ly et al. built both supervised (SVM) and unsupervised (K-means clustering) machine-learning models using the inertia data collected from an instrumented vehicle’s Controller Area Network (CAN) bus (Ly et al. 2013). The study showed that features associated with the acceleration did not contribute much to classify drivers, rather the braking and turning events were prominent features to differentiate between drivers. Zhang et al. also developed a similar SVM model to classify drivers based on individual characteristics sensed by only low-level sensors (Zhang et al. 2016). Low-level sensor data include data gathered from sensors that are available on smartphones along with the sensor information available through the vehicle’s on-board diagnostics (OBD-II) port. Acceleration and jerk were used as main parameters instead of speed, as these variables were more closely related to driving behavior (Zhang et al. 2016). One of the shortcomings of the study was that the classifier model worked best with fewer drivers, which may not be a realistic case for real world-driving conditions. A general framework to classify drivers through mathematical strategies was proposed by Jensen et al. (2011). In-vehicle data and survey data were both collected to build the framework to classify drivers into six classes ranging from ‘timid’ to ‘aggressive’. The in-vehicle data were collected through the OBD-II recorder and included vehicle speed, acceleration, jerk, brake pressure, steering wheel angle, and yaw rate. The survey was mostly limited to questionnaires regarding demographic characteristics of the drivers.

Further, researchers have also explored the use of physiological measures in driver classification. Lin et al. classified ten drivers by analyzing physiological measures in response to an unexpected obstacle in a virtual simulator. Drivers were classified using driving trajectory, speeds, and event-related potentials (ERP) to form two categories: aggressive and gentle (Lin et al. 2006). Zeng et al. utilized Electroencephalography (EEG) signals to classify driver’s mental state (Zeng et al. 2018). Physiological data (EEG signal, blink rate, and heart rate) were collected through a driving simulator experiment. They proposed two deep convolutional neural network-based models called EEG-Conv and EEG-Conv-R, which can predict the mental state of the drivers (whether the driver is drowsy or not). The experiment and validation concluded that the developed models were more reliable and accurate for binary classification of the driver’s mental state than traditional machine-learning models like SVM.

Subjective questionnaires are frequently used in driving research to capture information not readily available or captured in performance datasets. The Driver Behavior Questionnaire (DBQ) was initially created to assess whether a questionnaire could distinguish between four main factors, i.e., unfocused errors, emotional errors, reckless violations, and confused errors. The questionnaire could explain up to 33% of the variance in aggressive driving taking into account various errors and violations (Hong et al. 2014). Berdoulat et al. investigated the potential effects of aggressiveness and transgressive behavior of the drivers while driving (Berdoulat et al. 2013). They utilized five different types of scales and questionnaires to assess driving anger, aggressiveness, and impulsiveness. DBQ was used to classify three types of behaviors: “violations”, “slips and lapses”, and “Errors”.

Tao et al. investigated the role of personality traits and driving experience in risky driving behaviors among drivers in China (Tao et al. 2017). Personality traits were measured by Eysenck Personality Questionnaire (EPQ) which has 12-item subscales assessing four personality constructs such as: extraversion (e.g., “Are you a talkative person?”), neuroticism (e.g., “Does your mood often go up and down?”), psychoticism (e.g., “Do you enjoy co-operating with others?”) and lie (e.g., “Do you say some bad words casually?”). The risky driving behaviors were analyzed by the Chinese version of DBQ which modified the original four factors of DBQ into: ordinary errors and violations (15 items), lapses (6 items), aggressive violations (3 items), and speeding (3 items). Another questionnaire that has been moderately used in driving behavior studies is “The Driving Vengeance Questionnaire (DVQ)”. It was created and used to gauge how often drivers avenge themselves in typical driving circumstances (Wiesenthal et al. 2000). Wiesenthal et al. further validated the DVQ where they showed that DVQ scores were predicted by driver violence and vengeful drivers had violence into their past acts. In situations of heavy traffic, vengeful drivers were more likely to display moderate aggression.

Other psychometric evaluation techniques used in behavioral studies to assess mood and personality, cognitive engagement, and empathy and social decision-making, could also be applied to link general behavioral characteristics to driving styles. The most popular questionnaires/metrics as found in the literature are summarized in Table 1.

Table 1 Summary of common psychometric evaluators

From the literature, it is evident that past studies have primarily performed classification using one of the three categories of data, i.e., driving performance, physiological measures, or driving-centric questionnaires. However, most metrics are directly associated with driving, without regard for other behavioral traits outside the realm of the automobile but still influence the tasks associated with driving. In addition, most of the previous driver-classification studies, focused on uncovering extreme behaviors (e.g., errors or risky behaviors) and their correlation with safety and crash occurrence. However, they do not investigate how driver variability and different driver styles are affected by the roadway/traffic environment and complexity, and consequently, how this relationship may impact the operation of the traffic stream. Therefore, the main objective of this study was to investigate driver variability and potential driver classes while completing driving tasks under different roadway environments and normal/crash-free operations, by combining a variety of complex psychophysiological and driving performance measures. A secondary objective was to assess whether static driver information obtained from subjective questionnaires and demographic data can equivalently be used to predict the derived driver classes. This is achieved by:

  • Data fusion of psychophysiological and driving performance changes experienced by drivers when engaged in simulated tasks of varying level of difficulty, coupled with machine-learning techniques to provide a more accurate estimate of the ground truth for behavioral classification, and

  • Employing pre-driving psychometric questionnaires to determine the most suitable subjective metrics for predicting driving style. If psychometric questionnaires have strong predictive qualities, this may save resources and time for defensive driving educators, researchers, driving rehabilitation specialists, and drivers.

Data and Methods

In this study, we utilize time-series datasets of complex psychophysiological and driving performance measures as the ground truth to complement a set of psychometric and demographic questionnaires to assign and predict driver classes. This section details the participant selection and driving simulation, psychophysiological measures employed, and data preparation and methodological framework.

Participants and Driving Simulation

The established methodology involves collecting driving performance and psychophysiological data from six tasks in a driving simulator setting as shown in Fig. 1. A fixed-base simulator, in a half cab, utilizing three forward screens to provide a 170° horizontal field of view and a rear screen was used in this research. Ninety participants ranging from 18 to 64 years of age were recruited for the study (mean age = 31.4 years, standard deviation (SD) = 14.2 years, Male driver to Female driver ratio of 1). PANAS, CRT, IRI, EAI, PES, 18-point NFC, and a basic demographic survey (i.e., age; gender; annual mileage traveled; crash history; insurance type; traffic violation history; cell phone use while driving; and education level) were administered to drivers prior to their participation in the simulated drive. Drivers were compensated for their participation in the study with a $50 gift card. Table 2 presents a summary of the demographic properties of the recruited drivers.

Fig. 1
figure 1

KU driving simulator in action

Table 2 Summary of participant demographics

Participants’ eligibility for the study was determined based on the following criteria:

  • Aged between 18 and 65 years and in possession of a valid U.S. driver’s license;

  • At least 1 year of driving experience with an annual mileage of not less than 1,000 miles; and

  • In good health (no eye conditions apart from corrective lenses, free from any heart conditions and seizures, no history of inner ear/balance problems, and little to no known history of severe motion sickness).

For the simulated drive, a tutorial and six tasks consisting of select external conditions as detailed in Table 3 were created. The tutorial comprised of familiarizing participants with the simulator environment and equipment, particularly gas and brake pedal responsiveness, steering wheel sensitivity, speed and gap adjustment, and lane changing activity. The tutorial also served as a tool to screen for simulator sickness and any participants that showed signs of severe simulator sickness (i.e., vertigo, nausea, sweating, dizziness, fatigue, stomach awareness, vomiting, and general discomfort) were advised to forfeit with a partial monetary compensation.

Table 3 Summary of common psychometric evaluators

Each task was 8 km long with a posted speed limit of 70 mph (112.7 km/h) and was designed to capture incremental changes to behavioral and driving performance measures. Tasks varied in traffic density, but drivers were free to maintain any gap or speed. Emphasis was placed in scenario design to carefully control in-vehicle body movements, visual and auditory discrepancies, and task uniformity, as these significantly affect data quality of psychophysiological measures. The task sequence for each participant was completely randomized to eliminate order bias. Further, psychophysiological measures were continuously collected during the drive by capturing micro-level changes to vision and attention. At the end of each task, participants were required to complete the National Aeronautics and Space Administration-Task Load Index (NASA-TLX) (Hart 1986) and Situation Awareness Rating Technique (SART) (Selcon and Taylor 1990) questionnaires to provide a well-rounded subjective baseline.

Tasks 4, 5, and 6 shared the same roadway configuration as shown in Fig. 2. Task 6 additionally required participants to attempt a visual secondary task on a 10-in. Windows tablet, up to four times at random intervals no longer than 20 s each.

Fig. 2
figure 2

Layout of tasks a Tasks 1 and 2, b Tasks 4, 5, and 6 (Kummetha et al. 2021)

The tablet was placed approximately 30° to the lower right from drivers’ line of sight and was used to simulate the act of driving while using the media center or entering text into an on-board navigation system as shown in Fig. 3 (Kummetha 2020; Kummetha et al. 2020). The data collection zone was limited to the middle 6.4 km to minimize data irregularities arising from the beginning and ending of the drive, shown in Fig. 2.

Fig. 3
figure 3

Secondary task used for distracted driving (Kummetha et al. 2021)

Psychophysiological measures

For the continuous measurement of vision-related metrics such as pupil diameter, gaze point vectors, blinks, and fixations, a Fovio FX3 eye tracker with a data collection frequency of 60 Hz was utilized (EyeTracking-Inc. 2019). Mental workload was established from the eye tracker in one-second increments using a patented pupillometric technique called Index of Cognitive Activity (ICA) (Vogels et al. 2018). ICA was chosen over the raw pupil diameter metric because of its ability to disentangle pupil response arising from changing lighting conditions especially prominent in virtual environments (Vogels et al. 2018). ICA scores vary continuously between 0 and 1 indicating the lower and higher spectrum of mental workloads, respectively.

Engagement level (EL), defined as the level of engagement/arousal experienced in response to a particular task, was an important component in this research (Pope et al. 1995). Enobio 8 portable Bluetooth-based system was used to capture participants baseline and time-correlated EEG at 500 Hz (Neuroelectrics 2019). Eight EEG electrode positions (i.e., P3, PZ, P4, CZ, T7, T8, O1, and FZ) aligned with the 10–20 international mapping layout and findings of Pope et al. were used as shown in Fig. 4 (Pope et al. 1995). An ear clip was also used to add two reference electrodes, i.e., common mode sense (CMS) and driven right leg (DRL). Driver EL was calculated using the power spectral density (PSD) relationship β/(α + θ) established by Pope et al. and Prinzel III et al. to assess alertness and engagement, mental effort, and attention investment; with theta band (4–8 Hz), alpha band (8–13 Hz): linked to mental workload, and cognitive fatigue, and beta band (13–22 Hz): linked to visual attention and short-term memory (Pope et al. 1995; Prinzel 2001).

Fig. 4
figure 4

EEG electrode positions used (yellow) (Neuroelectrics 2019)

Data were first system time-synchronized to identify critical analysis zones using MATLAB. The EEG data were isolated using unique identifiers and split into 10-s events. This resulted in approximately twenty 10-s events for each task with the goal of 20-point estimates of EL per task. Postprocessing was performed using the EEGLAB v14.1.1 toolbox in MATLAB (Delorme and Makeig 2004). A custom MATLAB script was developed to convert data from the native Neuroelectrics-Enobio 8 (.easy) format to EEGLAB (.set). The EEG data were then resampled from their native 500 to 240 Hz to ensure noise and processing time reduction. The dataset was filtered by applying a high-pass filter at 1.6 Hz and a notch filter at 60 Hz. An independent component analysis was performed by identifying and pruning artifacts from blinks and muscle movement to improve signal quality. This was followed by the application of a low-pass filter at 40 Hz. The filtered signal was then split into 2-s epochs with 5% overlap. The Darbeliai plugin in EEGLAB was applied to perform a fast Fourier transformation (FFT) using a 2-s window to generate PSD values for each electrode position. Using β/(α + θ) at PZ, CZ, P3, and P4 electrode locations, a single combined PSD value representing driver EL was obtained every 10-s of the drive. PSD combined values closer to 0 indicate lower EL.

Further, a Polar H10 chest strap was used to monitor drivers heart rate (HR) at 1 Hz as a surrogate measure of mental workload. HR data were manually synchronized with the frames of the driving simulator output. Baseline HR data were collected prior to the start of the drive and HR variability within participants was normalized with respect to the baseline.

Data Preparation and Methods Summary

Due to the vast data collection techniques used in the study design, data fusion is employed to merge all output elements. The data fusion framework used to uniformly synchronize all data variables to 10 Hz is shown in Fig. 5.

Fig. 5
figure 5

Data fusion framework

Figure 6 summarizes the methodological approach with respect to five main stages, i.e., psychometric questionnaires, simulator study, data fusion, driver classification, and the prediction model.

Fig. 6
figure 6

Summary of methodological workflow

Results

Task-Specific Analysis

As each task was designed to capture performance and behavioral variability, changes in mental workload and situation awareness between individual tasks were assessed using a series of repeated measure analysis of variances (ANOVAs) with task 1 as the baseline. The ANOVAs were used to test the null hypothesis that the driver-reported means of the average NASA-TLX and SART scores across the six tasks were similar and to identify the presence of any statistically significant differences, providing quantitative validity for the experimental design in terms of the varying complexity of the attempted driving tasks. The average NASA-TLX scores showed no significant differences (α = 0.05) between the mean scores of task 2 and task 1 (baseline). However, significant differences in scores were observed between the baseline and tasks 3 (F(1, 83) = 4.087, p = 0.046, \({\eta }_{P}^{2}\) = 0.047), 4 (F(1, 83) = 16.298, p < 0.001, \({\eta }_{P}^{2}\) = 0.164), 5 (F(1, 83) = 35.230, p < 0.001, \({\eta }_{P}^{2}\) = 0.298), and 6 (F(1, 83) = 201.257, p < 0.001, \({\eta }_{P}^{2}\) = 0.708). The significant increases in NASA-TLX scores suggest that the developed tasks successfully captured various levels of mental workload, shown in Fig. 7.

Fig. 7
figure 7

Average a NASA-TLX and b SART scores

Similar trends across tasks were observed with the SART scores. No significant differences were observed between the baseline and the mean scores of the task 2 and task 3; however, task 4 (F(1, 83) = 7.448, p = 0.008, \({\eta }_{P}^{2}\) = 0.082), task 5 (F(1, 83) = 7.840, p = 0.006, \({\eta }_{P}^{2}\) = 0.086), and task 6 (F(1, 83) = 26.794, p < 0.001, \({\eta }_{P}^{2}\) = 0.244) showed significant differences. Overall, with greater task complexity an increase in subjective mental workload was observed along with a decrease in situation awareness (Kummetha and Kondyli 2022).

Time-Series Clustering

As tasks were constant between participants, time-series clustering using the Dynamic Time Warping (DTW) algorithm was selected for the analysis. The DTW algorithm provides a robust method of measuring and matching similar time-series datasets, such as trajectory and psychophysiological data in our case. The DTW algorithm also allows us to compare time-series datasets of different lengths, or even with time lag between two pairs of interest, especially useful in driving studies where no two events can be completely identical (Giorgino 2009; Mueen and Keogh 2016). The DTW score (dissimilarity) for the two time-series (D1, D2) can be calculated using Eq. 1. The higher the DTW score the larger the dissimilarity and the closer the score to zero the more identical the observed pattern in the selected variable across the pair of drivers (Durrani and Lee 2019).

$$DTW\left(D1, D2\right)=\frac{1}{K}\sqrt{\sum_{k=1}^{K}{d}_{k},}$$
(1)

where \({d}_{k}\) is the distance in cell k in the shortest path. K is the total number of cells in the shortest path.

The same approach is employed to compare more than two time-series trends. In this study, acceleration (m/s2), steering wheel angle (degrees), headway (m), mental workload as ICA from left eye, and EL, were used for time-series clustering using the hierarchical clustering algorithm with DTW score as the distance metric (Yan et al. 2019). HR was not included as ICA was deemed to be more accurate for measuring the mental workload. Hierarchical clustering was selected due to its ability to provide intuitive visualization of the clustering process and unsupervised identification of cluster numbers (Johnson 1967). Hierarchical clustering was performed with various iterations of the number of clusters (i.e., n = 2 to 5), resulting in an optimal solution consisting of two clusters. The Silhouette coefficient was also calculated for various n values to verify the optimal solution and the best/largest coefficient value (0.33) was obtained using n = 2 (Rousseeuw 1987). Hierarchical clustering does have some limitations associated with sensitivity to the distance metric and inability to handle missing data points, however, with careful selection of the distance metric (i.e., DTW score) the effects were minimized.

The summary of the aggregate metrics for the two clusters established for each task is shown in Table 4. The two established clusters did not exhibit qualities of aggressive driving as evident from the average speeds and headways. Average speeds were within 6 km/h of the posted speed limit while the smallest headway was 68.4 m (2.2 s). Drivers showed distinct properties within these clusters, including differences in self-reported measures of mental workload (NASA-TLX score) and situation awareness (SART score).

Table 4 Cluster properties

Cluster outputs from the six driving tasks were aggregated to individually classify drivers as having moderate or conservative driving styles i.e., if a driver was classified to represent moderate driving style in three or more of the tasks, they were classified as moderate, else conservative. This was necessary because although hierarchical clustering classifies drivers, it is not apparent as to what each cluster represents, and the aggregate cluster properties help differentiate the behavioral composition within them. Figure 8 shows the mean variable spread of the conservative and moderate clusters across the six tasks. Cluster ID 2 in almost all instances has a larger surface spread, with the exception of the speed axis. This surface spread explains the differences between the two clusters and suggests that moderate drivers (i.e., Cluster 1) were younger with less driving experience and on average maintained shorter headway, at higher but safe speeds, and exhibited lower mental workload and higher situation awareness compared to conservative drivers.

Fig. 8
figure 8

Radar charts showing the mean spread across the two cluster ids

The recorded increase in self-reported mental workload and decrease in situation awareness across both driver clusters could be a result of driving under extreme/non-typical conditions (such as Task 6 with the presence of a distraction). Out of 90 drivers, 84 drivers completed all the components of the questionnaires and driving tasks. 20 drivers were classified as overall conservative across all tasks while 64 drivers were classified as moderate.

Logistic Regression

A 70/30 training–testing stratified split was applied to each of the obtained cluster IDs before performing further analysis. Binary logistic regression, shown in Eq. 2, is a form of statistical modeling appropriate for independent binary outcome variables, such as in classification problems to distinguish one class from another. Further, binary logistic regression is capable of handling both continuous and categorical independent variables and provides informative outcomes expressed as odds ratios. In this application, binary logistic regression was performed to distinguish between conservative (indicated by the value 0) and moderate (indicated by the value 1) drivers and predict their driving styles, using the psychometric metric and demographic questionnaires data.

$$\mathrm{log}\left(\frac{p}{1-p}\right)={\beta }_{0}+{\beta }_{1}{X}_{1}+{\beta }_{2}{X}_{2}+\cdots +{\beta }_{n}{X}_{n},$$
(2)

where \(\mathrm{log}()\) is the natural logarithm. \(p\) is the probability of the dependent variable. \({\beta }_{0}\) is the intercept of the constant term. \({\beta }_{1},\dots ,{\beta }_{n}\) are the coefficients assigned to the independent variables \({X}_{1},\dots ,{X}_{n}\)

Training

A total of 10 iterations were performed using the following independent variables from the survey and questionnaires: PANAS (subscales: positive affect, negative affect), CRT, IRI (subscales: perspective taking, fantasy, empathic concern, and personal distress), EAI (subscales: AR, SA, ER, PT, EA), PES, 18-point NFC, and other demographics (i.e., age; gender; annual mileage traveled; crash history; insurance type; traffic violation history; cell phone use while driving; and education level). A balance between over-fitting the model and classification accuracy was achieved by excluding correlated parameters within the questionnaires. The backward stepwise selection was implemented to sequentially remove statistically insignificant variables in the model. Manual checks were performed to ensure that the included independent variables were contributing substantially to the beta coefficient (B) i.e., Exp(B) and 95% confidence intervals deviating considerably from 1. The results of the logistic regression are shown in Table 5.

Table 5 Logistic regression summary

The results from the binary logistic regression indicate that an individual’s annual mileage, CRT score, Age, IRI fantasy (FS), and EAI affective response (AR) can significantly help in predicting their driving style (moderate or conservative) with a training prediction accuracy of 87.7%. Figure 9 shows the scatter plots of the training dataset with respect to the significant variables in the logistic regression model. Results indicate a weak to moderate correlation between Annual Mileage and Age (r = 0.39) and CRT score and EAI AR (r = − 0.32), suggesting no substantial effect of multicollinearity.

Fig. 9
figure 9

Scatter plots and respective Pearson correlation coefficients

Testing

Results from the confusion-matrix of the test sample, shown in Table 6, indicate a higher-than-average classification accuracy of 65.4% and a relatively good recall, precision score, and F1 score, suggesting a good predictive model (especially for moderate drivers). However, for conservative drivers the percent of correct predictions is relatively low at 33%. This may be attributed to factors such as the relatively low sample size of the conservative drivers, selected training–testing data split, and the existence of other non-linear relationships between the variables not accounted for within the binary logistic regression.

Table 6 Confusion matrix

Discussion

An important objective of this research was to investigate driver variability under different roadway/traffic environments and task complexity. This was achieved through the testing of six driving simulator scenarios (Table 3) with multiple levels of task difficulty by varying the traffic, geometry, and control conditions, and presence of distraction. It was found that, as task difficulty increased, mental workload increased and situation awareness decreased, supporting the linear relationship between these variables and ensuring that our data collection methods were appropriate to study these relationships.

Time-series clustering using the hierarchical approach with the DTW score as the distance metric was also performed to classify drivers by utilizing a holistic approach where both performance metrics (acceleration, headway, steering wheel angle) and psychophysiological characteristics (mental workload, situation awareness) were adopted. The two derived clusters (i.e., moderate and conservative) indicate distinct differences in driver age, driving experience, average headway, average speed, and average subjective scores for mental workload and situation awareness, similar to the observations made by Lin et al. (2006) and Manjunatha et al. (2019).

The second modeling approach of binary logistic regression was used to investigate whether we can use offline, static driver information to accurately cluster drivers. The results of the logistic regression indicate that annual driving mileage, CRT score, Age, IRI fantasy score, and EAI affective response score, can be used to predict driving styles with a decent accuracy and precision. The odds ratios (Exp(B)) for the significant variables shown in Table 5 were assessed for more insights into the findings. The odds ratio for annual mileage suggests that an increase in reported mileage increases the chance of being a moderate driver. To be specific, every 1000 miles increase in the reported annual mileage is associated with a 54.7% increase in the odds of being a moderate driver. The CRT score can be used as a measure of intuitive thinking and objective assessment, an important component of executive function, which is crucial for safe driving. The results suggest that an increase in the CRT score (indicating higher level of objective assessment) by one point decreases the odds of being a moderate driver by 60.7%. Similarly, for every 1-year increase in age, the odds of being classified as a moderate driver decreases by 10.4%.

The IRI FS score, which represents an individual’s tendency to transpose into imaginative feeling and characters, was also found to be significant. A one-point increase in IRI FS score decreases the odds of being classified as a moderate driver by up to 14%. Although, not directly intuitive to driving, inference can be made from the fact that a high IRI FS score indicates the general tendency to use more cognitive resources engaging in thoughts while driving suggesting a slower behavioral adaptation to changes in conditions, typically associated with decreased driving performance (Davis 1983).

EAI AR, which is a subcomponent of empathy derived from affective response, was also found to be significant. An increase in the EAI AR score by one point suggests up to 28.6% decrease in the likelihood of being classified as moderate driver. This result seems plausible and similar to the past research (Wiesenthal et al. 2000) as an increase in empathy could indicate less vengeance and lead to more conservative driving behavior such as letting motorists cut in front or allowing merging at on/off ramps.

Overall, the results show similarities with past research utilizing other forms of questionnaires with similar subcomponents of personality traits. However, the differences lie within the variety of psychometric and psychophysiological data collected, multi-complexity experimental design, clustering approach, and prediction methods. The combined accuracy of predicting driving style from the logistic regression performed on the training and testing datasets was 76.6%, which is higher than similar studies using non-driving performance metrics for classification (Liao et al. 2022; Razmara et al. 2018).

Conclusions

This study aimed at addressing the shortcomings in driver classification research using a data-driven approach and assess driving tasks performed under varying mental workloads. A simulator study was carefully designed to achieve variability in complexity between individual tasks. Ninety drivers were recruited to participate in both the subjective and driving components of this research. The main goals of this research were achieved by successfully performing data fusion of psychophysiological and driving performance. Time-series clustering using the DTW score as the distance metric within the hierarchical clustering was performed to provide a more accurate and holistic estimate of the ground truth for behavioral classification.

Evaluating the use of pre-driving psychometric questionnaires to determine the most suitable metrics for predicting driving style outside of the automobile suggest promising results. Binary logistic regression summary indicates that an individual’s annual driving mileage, CRT score, Age, IRI fantasy score, and EAI affective response score, can be directly attributed to identifying their driving style. The results provide a promising outlook towards the practical strength of using psychometric evaluators in driver profiling with respect to personality, attitudes, and habits, risk assessment to develop targeted interventions for road safety improvement, deeper understanding of driver behavior and decision making processes that affect the conscious and subconscious components of driving, development of training and education programs targeted towards pre-screened individuals, and design/development of intelligent and autonomous transport systems to tailor behavioral needs and preferences of individual drivers.

Although the research goals were achieved, the limitations of simulator-based driving data should be acknowledged. However, the use of complex physiological equipment such as EEG and presence of hazardous conditions (in-vehicle distractions) might not be feasible for real-world testing. A larger sample size would provide more concrete evidence of the observed findings. Lastly, large sample sizes are also needed to investigate the attributes of the aggressive driving style cohort, since the existing dataset only captured moderate and conservative driving styles.

Based on the conclusions derived from this study, future research can utilize a similar methodology to study how psychometric evaluators can be used to predict driver behavior during partial (i.e., adaptive cruise control and lane centering) and full automation. Another aspect to explore would be utilizing psychometric evaluators to subjectively predict individual driving styles during travel-altering emergencies such as pandemics (i.e., coronavirus disease 2019 (COVID-19) travel restrictions/lockdowns) and disaster evacuations (i.e., hurricanes, wildfires, floods, and other types of severe weather).