Introduction

The COVID-19 pandemic has affected mobility patterns since December 2019 and continued to do so for more than 2 years after its onset (Zhu et al. 2020). From the outset, many countries around the world imposed strict measures, such as lockdowns and the suspension of all non-essential movements, to reduce the human activity that contributes to the spread of the disease.

In this direction, the existing literature explores the dynamics of the pandemic in several countries to understand the impact of COVID-19 on the transport sector (Sharifi and Reza Khavarian-Garmsir 2020). As expected, the restriction measures affected typical patterns of travel activity and mobility in urban regions across the world (Kim 2021). Following the measures taken by governments to contain the spread of the disease, an unprecedented decline in traffic volumes was identified (Aletta et al. 2020; Katrakazas et al. 2020). For example, in the Netherlands, people reduced their outdoor activities due to the pandemic, leading to fewer trips in total and shorter distances traveled, while the proportion of people working from home increased (de Haas et al. 2020). Existing studies have also shown a major change in the choice of transport mode, especially during the first pandemic wave, and consequently a change in car traffic volumes (Bucsky 2020). A more recent study revealed an asymmetrical impact of COVID-19-related policies on the usage of on-demand mobility services: the mobility and business restrictions triggered a sharp decrease in ridership, but lifting these restrictions did not result in a fast rebound (Lei and Ukkusuri 2022).

In the context of road safety, the number of road collisions, injuries, and fatalities decreased significantly during the COVID-19 lockdown measures, especially during the first lockdown period. This has been documented, for example, in the Spanish province of Tarragona, where a sharp decrease in traffic crashes was revealed (Saladié et al. 2020). Similarly, Carter (2020) showed that during the first COVID-19 period (i.e., from March 15, 2020 to May 16, 2020), the total number of crashes in North Carolina decreased by half, fatalities decreased by 10%, and serious injuries increased by 6%, compared to the pre-closure baseline. A relevant study (Shilling and Waetjen 2020) indicated that all injury and fatal traffic crashes decreased on state highways and rural roads in California. Nevertheless, a study that used time-series models to predict the road collisions, injuries, and fatalities that would have been observed in the absence of the COVID-19 pandemic made clear that the reduction in fatalities and injuries was disproportionate to the reduction in traffic volumes (Sekadakis et al. 2021).

Driving behavior has also changed during the pandemic, as reported by recent studies (Katrakazas et al. 2020, 2021; Michelaraki et al. 2021). For example, the study by Katrakazas et al. (2020), which exploited driving data from the first lockdown period in Greece and Saudi Arabia, observed increased driving speeds (6–11%), along with more frequent harsh accelerations and harsh brakings per distance driven. Nevertheless, very few studies have investigated driver behavior in more depth by analyzing and modeling naturalistic driving data. Katrakazas et al. (2021) quantified the impact of the COVID-19 pandemic on driving behavior using Seasonal Autoregressive Integrated Moving Average (SARIMA) time-series modeling. The results showed that the observed values of three indicators of driving behavior (i.e., average speed, speeding, and harsh braking events per 100 km) were higher than the values predicted from the corresponding observations before the first lockdown period in Greece.

In this direction, the current study aims to identify and investigate the most significant factors that, throughout 2020, influenced the relationship of the COVID-19 pandemic metrics (i.e., COVID-19 cases, fatalities, and reproduction rate) and restrictions (i.e., stringency index and lockdown measures) with driving behavior. For this purpose, naturalistic driving data for a 12-month timeframe were exploited and analyzed. The examined driving behavior variables were harsh acceleration and harsh braking event rates over a time period before, during, and after the lockdown measures in Greece. The motivation is to fill the literature gap by giving insights into these two driving behavior indicators and how they were influenced over the entire year of 2020. A cross-lockdown comparison is also provided, giving insights into how the indicators varied across the examined conditions (i.e., no restrictions, 1st lockdown, and 2nd lockdown).

The paper is structured as follows: after the introduction, the methodology is described, including an overview of the dataset obtained for this study, descriptive statistics of the examined variables, the COVID-19 restriction measures, and the background of the chosen ML technique. Then, the analysis results are provided for both harsh acceleration and harsh braking event rates. Finally, the main findings and conclusions are discussed, along with recommendations for further research.

Methodology

Data

Data Overview

To correlate driving behavior with COVID-19 metrics and restrictions, both driving behavior data and data concerned with the COVID-19 cases evolution and the restrictions imposed were used. OSeven Telematics (oseven.io) contributed driving behavior data in the form of a randomized dataset, comprising naturalistic driving trips extracted from the OSeven database. Data were provided in a pre-processed and randomized form, and the specific randomization techniques used were unknown to authors. It should be noted that the microscopic trip data used referred to the users of the OSeven smartphone application and not the entire population of Greece. The time span of the database was from 01/01/2020 to 31/12/2020 and included approximately 305,000 trips, which were carried out throughout Greece. The aforementioned 1-year dataset contains data before, during, and after the first case of COVID-19 in Greece (i.e., 26/02/2020) and the imposition of two lockdowns for non-essential movements.

For each trip completed, a large amount of driving data was recorded, including data from the mobile phone sensors, such as the accelerometer, gyroscope, magnetometer, and GPS (speed, course, longitude, and latitude). Furthermore, the data provided include yaw, pitch, roll, linear acceleration, and gravity. After the end of each trip, the application transmits all recorded data to the central database of the OSeven back-end office via an appropriate communication channel, such as a Wi-Fi network or a 4G/5G cellular network. The total volume of data transmitted by an average driver was estimated at around 50 Mb/month in 2019 (Papadimitriou et al. 2019). The OSeven platform has clear privacy policy statements and follows strict information security procedures, in compliance with the General Data Protection Regulation (GDPR) and related EU directives. Therefore, all data have been provided by OSeven in a completely anonymized format, and no geolocation information for the trips (apart from the related country) has been included in the dataset. No examination or analysis based on any demographic or personal characteristics of the examined sample was possible. As a consequence, this study retains a scope of macroscopic examination of driver behavior, considering the trips produced by the drivers collectively. A similar dataset was utilized in previous analyses by the authors (Katrakazas et al. 2020, 2021). The privacy policy statements cover the type of data that are collected, the reason why they are collected, the time for which they are stored, and the measures that have been taken to protect them based on encryption standards for data in transit and at rest. OSeven technology has already been accepted and approved by several national authorities and compliance officers of multinational brands, and it complies with national regulations in the EU and around the world.

Five driving behavior variables [i.e., harsh accelerations (HA)/100 km, harsh brakings (HB)/100 km, mobile use/driving time, driving during risky hours, and distance] were exploited from the OSeven dataset and their description can be found in Table 1. These variables were chosen as they have been found to be critical surrogate safety measures and have been extensively used in recent literature concerned with driving behavior, crash risk, as well as road safety during the pandemic (e.g., Kontaxi et al. 2021; Papadimitriou et al. 2019; Sekadakis et al. 2022).

Table 1 Variables units, description, and source

To understand how the pandemic evolved in Greece during 2020, data from the “Our World in Data” (OWD) website (Our World in Data 2020) were exploited to capture the daily evolution of critical COVID-19 metrics, such as new cases, new fatalities, and the COVID-19 reproduction rate, which expresses the ability of the coronavirus to spread. Furthermore, to take into account the restrictions imposed by the Greek government, the Stringency Index from the Oxford University COVID-19 government response tracker (Hale et al. 2020, 2021) was used. Specifically, the stringency index ranges between 0 and 100 and represents the strictness of government responses to the pandemic. It is a composite measure based on nine response indicators (i.e., school closing, workplace closing, cancellation of public events, restrictions on gatherings, closure of public transport, stay-at-home requirements, restrictions on internal movements, international travel controls, and public information campaigns), rescaled to a value from 0 to 100 (i.e., 100 = strictest response).
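To make the idea of a composite index concrete, the following minimal sketch averages nine sub-index scores that are assumed to be already rescaled to 0–100. The unweighted mean, the function name `composite_index`, and all score values are assumptions made for illustration; the tracker's published methodology also handles ordinal scales and geographic-scope flags, which are omitted here.

```python
def composite_index(sub_indices):
    """Unweighted mean of sub-index scores, each assumed to lie in 0-100."""
    if not all(0.0 <= score <= 100.0 for score in sub_indices.values()):
        raise ValueError("every sub-index must lie between 0 and 100")
    return sum(sub_indices.values()) / len(sub_indices)


# Hypothetical sub-index scores for a single day (illustrative values only).
example_day = {
    "school closing": 100.0,
    "workplace closing": 50.0,
    "cancel public events": 100.0,
    "restrictions on gatherings": 75.0,
    "close public transport": 50.0,
    "stay-at-home requirements": 100.0,
    "restrictions on internal movements": 100.0,
    "international travel controls": 75.0,
    "public information campaigns": 100.0,
}

stringency = composite_index(example_day)
print(f"Composite index: {stringency:.1f}")
```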

Finally, to include traffic exposure data, the mobility data reports from Apple (Apple 2020) were used. As usually traffic data acquisition from national authorities requires additional time, it was chosen to use the Apple mobility report data as a proxy of traffic exposure in the study area. Similar data have been utilized in the previous work regarding the COVID-19 pandemic and driving behavior (Katrakazas et al. 2020). The driving requests from the Apple mobility reports are a surrogate measurement of traffic mobility. The aggregated data were collected from Apple Maps and show the mobility trends for major cities and several countries or regions. The information is generated by aggregating the number of daily driving requests made by Apple Maps users who requested navigation during the pandemic. These requests are expressed by the percentage change compared to a baseline of 100% on January 13th, 2020, a date prior to the pandemic.
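Because the baseline is fixed at 100, the change relative to January 13th, 2020 is simply the index value minus 100. The short sketch below illustrates this arithmetic; the function name and the index readings are hypothetical and are not taken from the Apple reports.

```python
def change_vs_baseline(index_value, baseline=100.0):
    """Percentage change of a mobility index relative to the baseline day."""
    return (index_value - baseline) / baseline * 100.0


# Hypothetical index readings (illustrative values only).
readings = {"baseline day": 100.0, "lockdown day": 35.0, "summer peak": 250.0}
changes = {label: change_vs_baseline(value) for label, value in readings.items()}

for label, change in changes.items():
    print(f"{label}: {change:+.0f}% versus the baseline")
```

Under this convention, an index value of 250 corresponds to 150% more driving requests than on the baseline day.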

All the variables examined in the current paper are summarized in Table 1.

Table 2 presents the descriptive statistics, i.e., mean, standard deviation, maximum, and minimum values of the investigated variables, for the random subset of trips (305,638 trips). More specifically, 16,927 trips (5.5% of the total) were observed during the 1st lockdown and 42,262 trips (13.8%) during the 2nd. It is worth noting that all the considered variables are continuous. The sample size for the COVID-19 metrics, measures, and mobility differed from that of the driving data, as these metrics had daily observations for the whole of 2020. The COVID-19 and mobility datasets derived from OWD, Oxford, and Apple were merged with each trip provided by OSeven into a common database for the purpose of the analyses.
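The merging step can be pictured as attaching the daily record of the trip date to every trip. The sketch below is one possible way to do this, not the authors' actual procedure; the field names (`trip_date`, `new_cases`, `stringency_index`, `driving_requests`, `ha_per_100km`) and all values are hypothetical.

```python
from datetime import date

# Hypothetical daily records keyed by calendar date (illustrative values only).
daily = {
    date(2020, 3, 30): {"new_cases": 50, "stringency_index": 84.0, "driving_requests": 30.0},
    date(2020, 8, 10): {"new_cases": 120, "stringency_index": 50.0, "driving_requests": 240.0},
}

# Hypothetical trip-level records.
trips = [
    {"trip_id": 1, "trip_date": date(2020, 3, 30), "ha_per_100km": 0.0},
    {"trip_id": 2, "trip_date": date(2020, 8, 10), "ha_per_100km": 12.5},
    {"trip_id": 3, "trip_date": date(2020, 8, 10), "ha_per_100km": 4.0},
]


def attach_daily_metrics(trip_rows, daily_rows):
    """Add the daily metrics of the trip date to each trip record."""
    merged = []
    for trip in trip_rows:
        metrics = daily_rows.get(trip["trip_date"])
        if metrics is None:
            continue  # skip trips without a matching daily record
        merged.append({**trip, **metrics})
    return merged


merged_trips = attach_daily_metrics(trips, daily)
print(merged_trips[0])
```

Several trips on the same day share the same daily values, which is why the trip-level and daily sample sizes differ.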

Table 2 Descriptive statistics of investigated variables

COVID-19 Restriction Measures

Table 3 summarizes the two lockdown periods of non-essential movements due to the COVID-19 pandemic that have been announced by the Greek government.

Table 3 Lockdown measures and important dates

The two lockdowns of 2020 are included in Fig. 1 in gray shades. Furthermore, the figure illustrates the evolution of driving mobility volumes (i.e., driving requests) through time in relation to COVID-19 new cases, stringency index of measures, and lockdown periods. An initial observation is that driving requests were significantly reduced during both lockdowns. Nevertheless, the greatest reduction in driving requests was during the first lockdown. Moreover, there was a spike of nearly 250% (150% more compared to the baseline of January 13th) in the requests in the first half of August.

Fig. 1 Overview of mobility along with COVID-19 restrictions (lockdown and stringency index) and new cases

XGBoost Analysis

For analyzing harsh event rates per trip during the COVID-19 pandemic and extracting the most important factors, Extreme Gradient Boosting (XGBoost) algorithms were deployed. The XGBoost algorithms are supervised machine learning techniques that incorporate multiple Classification and Regression Trees (CART). XGBoost has regularly outperformed other approaches due to its versatility and efficiency (Nielsen 2016). XGBoost is used for supervised learning problems, where the training data (with multiple features) xi is used to predict a target variable yi (XGBoost Documentation 2021). Specifically, XGBoost algorithms were deployed in this paper to evaluate the feature importance of the aforementioned variables, i.e., driving requests, COVID-19 metrics, and restrictions in regards to the naturalistic driving behavior indicators. The naturalistic driving behavior indicators utilized in the study were the frequency of harsh events in the form of rates (i.e., harsh brakings and harsh acceleration per 100 km of driving). The learning process of the algorithm is iterative and consequently involves correcting previous errors in future iterations of the algorithm. A detailed overview of the comprised parameters and technical specifications of the algorithm in the used library can be found in XGBoost Documentation (2021).

XGBoost has been used in past road safety publications that analyzed big data (Chakraborty et al. 2023; Formosa et al. 2023). It was used in the current analysis because it has been shown to be more accurate than logistic regression models and other ML methods, such as Random Forests, Artificial Neural Networks, and Support Vector Machines, in the area of traffic safety (Huang and Meng 2019).

Moreover, XGBoost has been successfully used in analyses investigating the influence of COVID-19 on driving behavior (Katrakazas et al. 2021) or the prediction of the transmission of the disease itself (Luo et al. 2021). Therefore, XGBoost has been selected as an appropriate method to reveal the factors of COVID-19 that affect driver behavior most critically.

In addition, the XGBoost algorithms can calculate the importance of each predictor variable in the developed model. These variable importance metrics are used in the analysis to show which variables are informative in describing the driving behavior indicators (HA and HB/100 km). The following three variable importance metrics were extracted (XGBoost Documentation 2021):

  • Gain describes the enhancement in accuracy that a feature adds to its branches.

  • Cover describes the relative amount of observations (or the number of samples) concerned by a feature.

  • Frequency describes how often a feature is used in all generated trees.

The gain metric is used for feature importance interpretation in the analysis.
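The iterative error correction and the gain-based importance described above can be sketched with a toy booster built from one-split trees. This is a simplified illustration in plain Python, not the xgboost implementation: the gain is taken as the reduction in squared error of a split (without XGBoost's regularization terms), and the data, function names, and settings are hypothetical.

```python
def best_stump(X, residuals):
    """Return (gain, feature, threshold, left_value, right_value) of the best split."""
    n = len(residuals)
    total = sum(residuals)
    best = (float("-inf"), None, None, 0.0, 0.0)
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X})[:-1]:  # keeps both sides non-empty
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            s_l, s_r = sum(left), sum(right)
            # Reduction in squared error achieved by the split (simplified gain).
            gain = s_l ** 2 / len(left) + s_r ** 2 / len(right) - total ** 2 / n
            if gain > best[0]:
                best = (gain, j, thr, s_l / len(left), s_r / len(right))
    return best


def fit_boosted_stumps(X, y, n_rounds=20, eta=0.3):
    """Fit stumps sequentially on the residuals of the previous rounds."""
    pred = [sum(y) / len(y)] * len(y)
    gain_by_feature = [0.0] * len(X[0])
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        gain, j, thr, left_val, right_val = best_stump(X, residuals)
        gain_by_feature[j] += gain
        # Shrink each correction by the learning rate eta.
        pred = [p + eta * (left_val if row[j] <= thr else right_val)
                for row, p in zip(X, pred)]
    total_gain = sum(gain_by_feature)
    importance = [g / total_gain for g in gain_by_feature]
    return pred, importance


# Hypothetical data: feature 0 drives the target, feature 1 is noise.
X = [[1, 5], [2, 3], [3, 8], [4, 1], [5, 7], [6, 2], [7, 6], [8, 4]]
y = [0.0, 0.0, 0.0, 0.0, 10.0, 10.0, 10.0, 10.0]

pred, importance = fit_boosted_stumps(X, y)
print("Gain share per feature:", importance)
```

In this toy setting, the informative feature collects the whole gain share, which mirrors how the gain metric ranks predictors in the analysis.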

XGBoost Parameters

The XGBoost algorithm was run in an R-Studio environment using the xgboost package. Before running the algorithm, all the outliers were identified and then removed from the dataset, creating a clean undistorted set for analysis. Then, a random split was employed in the data; 75% was the training set, while the remaining 25% was the test set. Furthermore, multiple values in terms of learning rate (eta) were tested (0.01–0.3) for each XGBoost ensemble for extracting the optimal model for harsh event rates. Learning rate is a tuning parameter in an optimization algorithm that determines the step size at each iteration while moving toward a minimum of a loss function (Murphy 2012).
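A random 75/25 split can be sketched as follows. The analysis itself was run in R, so this Python snippet is only an illustration; the function name, the seed, the stand-in trip records, and the specific learning-rate candidates (chosen inside the tested 0.01–0.3 range) are assumptions.

```python
import random


def train_test_split(rows, train_fraction=0.75, seed=42):
    """Shuffle a copy of the rows and cut it into training and test parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


trips = list(range(1000))  # stand-in for cleaned trip records
train, test = train_test_split(trips)

# Hypothetical candidate learning rates within the tested range.
eta_grid = [0.01, 0.05, 0.1, 0.2, 0.3]

print(len(train), "training trips,", len(test), "test trips")
```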

Additionally, K-fold cross validation was conducted to find the best number of iterations within the XGBoost algorithm, which prevents the model from overfitting. For each model, the K-fold cross validation function tested about 200 different iterations to determine the optimal one.
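The selection of the best iteration can be illustrated as follows: the data are divided into K validation folds, a validation error curve over the boosting rounds is recorded for each fold, and the round with the lowest mean validation error is kept. The sketch below assumes this reading; the number of folds, the function names, and the error curves are hypothetical.

```python
def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def best_iteration(fold_error_curves):
    """Return the 1-based round with the lowest mean validation error."""
    n_rounds = len(fold_error_curves[0])
    mean_curve = [
        sum(curve[r] for curve in fold_error_curves) / len(fold_error_curves)
        for r in range(n_rounds)
    ]
    return min(range(n_rounds), key=mean_curve.__getitem__) + 1


folds = k_fold_indices(10, 3)

# Hypothetical validation errors per boosting round, one curve per fold.
curves = [
    [5.0, 4.0, 3.0, 3.5, 4.0],
    [6.0, 4.0, 3.2, 3.1, 3.9],
    [5.5, 4.2, 2.9, 3.4, 4.1],
]
chosen_round = best_iteration(curves)
print("Selected boosting round:", chosen_round)
```

Stopping at the selected round avoids the later rounds, in which the validation error rises again while the training error keeps falling.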

The defined parameters for the XGBoost model for harsh event rates are provided as follows:

  • Learning rate (eta): 0.01–0.3

  • Gamma: 1

  • Maximum tree depth: 6

  • Subsample ratio of the training instances: 0.8

  • Subsample ratio of columns when constructing each tree: 0.5.

An explanation of the different parameters is given below, apart from the learning rate, which was described earlier.

  • Gamma is the minimum loss reduction required to make a split in a tree. A higher gamma value makes the algorithm more conservative, preventing overfitting by reducing the number of splits.

  • Maximum depth of a tree is the maximum depth of the tree used in XGBoost. A deeper tree can capture more complex relationships in the data.

  • Subsample ratio of the training instances: It is the fraction of observations to be randomly sampled for each tree. A lower value can make the model more robust to noise.

  • Subsample ratio of columns when constructing each tree: It is the fraction of columns to be randomly sampled for each tree. Similarly to the previous ratio, a lower value can make the model more robust to noise.
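For reference, the listed settings map onto the parameter names used by the xgboost library as sketched below. The dictionary is only a configuration fragment; the single `eta` value shown is one hypothetical candidate from the tested range, since the finally selected value is not stated here.

```python
# Parameter names follow the xgboost library; values are those listed above.
xgb_params = {
    "eta": 0.1,               # hypothetical candidate within the tested 0.01-0.3 range
    "gamma": 1,               # minimum loss reduction required to make a split
    "max_depth": 6,           # maximum depth of each tree
    "subsample": 0.8,         # fraction of training instances sampled per tree
    "colsample_bytree": 0.5,  # fraction of columns sampled per tree
}

for name, value in xgb_params.items():
    print(f"{name} = {value}")
```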

Model Evaluation

Model evaluation metrics were utilized to assess the predictive performance of the algorithm on the test set using the three metrics (i.e., Mean Error, Root-Mean-Squared Error, and Mean Absolute Error) indicated below, as a common practice. The \({e}_{t}\) represents the error, i.e., \({\mathrm{actual}}_{t}-{\mathrm{predicted}}_{t}\), and N is the number of fitted points:

  • Mean Error (ME):

    $$ME=\frac{1}{N} \sum_{t=1}^{N}{e}_{t}.$$
    (1)

  • Root-Mean-Squared Error (RMSE):

    $$RMSE=\sqrt{\frac{1}{N} \sum_{t=1}^{N}{e}_{t}^{2}}.$$
    (2)

  • Mean Absolute Error (MAE):

    $$MAE=\frac{1}{N} \sum_{t=1}^{N}\left|{e}_{t}\right|.$$
    (3)
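The three metrics can be computed directly from their definitions. The following minimal sketch uses hypothetical actual and predicted values; the function name is an assumption made for illustration.

```python
import math


def error_metrics(actual, predicted):
    """Return (ME, RMSE, MAE) with the error defined as actual minus predicted."""
    errors = [a - p for a, p in zip(actual, predicted)]
    n = len(errors)
    me = sum(errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    return me, rmse, mae


# Hypothetical harsh event rates per 100 km (illustrative values only).
actual = [2.0, 0.0, 4.0, 6.0]
predicted = [1.0, 1.0, 4.0, 4.0]

me, rmse, mae = error_metrics(actual, predicted)
print(f"ME={me:.3f}, RMSE={rmse:.3f}, MAE={mae:.3f}")
```

Positive and negative errors cancel in the ME, so it indicates bias, whereas the RMSE and MAE measure the size of the errors regardless of sign.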

Analysis and Results

Harsh Acceleration Events

In this subsection, the results of the XGBoost model for Harsh Accelerations (HA) per 100 km are presented. First, the predictive power and accuracy provided by the application of the XGBoost algorithms on the test subset can be extracted by the achieved error. Table 4 presents the obtained error values.

Table 4 Error values on test predictions

The obtained feature importance is provided in Table 5. The top three variables that impacted HA/100 km were distance, mobile use/driving time, and driving requests. Also, a small contribution was provided by driving during risky night-time hours. With regards to the COVID-19 metrics, new cases in Greece seem to affect HA the most. The reproduction rate of the virus, as well as the strictness of measures and the new rates of fatalities due to COVID-19, were found to have less impact on HA/100 km. The detailed influence of predicting harsh event rates as expressed by the gain scores of XGBoost is shown in Table 5.

Table 5 Feature importance of HA/100 km—XGBoost algorithms

Furthermore, boxplots were created to supplement the XGBoost analysis and reveal the trend of harsh accelerations under the three aforementioned restriction conditions of 2020 (i.e., 1st lockdown, 2nd lockdown, and the time period without restrictions). These boxplots are shown in Fig. 2. The boxplots show the median, interquartile range, minimum, and maximum values of harsh accelerations per 100 km for each condition. Figure 2a presents the boxplot for harsh accelerations including the whole dataset. As can be seen in the boxplot, the median values for each condition are equal to zero. In contrast to the non-zero mean value of harsh accelerations in Table 2, the medians are zero because the values are not normally distributed and many of the trips recorded no harsh accelerations at all; thus, the lower quartile (25th percentile) is equal to the median as well. Consequently, an additional boxplot (Fig. 2b) was created by excluding the zero values of the dataset. Hence, this boxplot presents only the trips in which harsh accelerations occurred. As shown in Fig. 2b, the 2nd lockdown in Greece had a narrower interquartile range than the 1st lockdown and the period without restrictions, and its upper quartile (i.e., 75th percentile) is lower than in the other conditions. The interquartile range depicts the range of harsh events and thus the different patterns among the restrictive conditions. In this respect, the lower upper quartile of the 2nd lockdown reveals that the majority of the observed values (between the 25th and 75th percentile) had lower values and a narrower range than in the other conditions. Also, the 1st lockdown has a higher upper quartile than the period without restrictions and the 2nd lockdown. The highest median was observed during the 1st lockdown, followed by the 2nd lockdown and then the period without restrictions.
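The effect of the many zero-event trips on the median can be reproduced with a small example. The rates below are hypothetical and only illustrate why the median and lower quartile can be zero while the mean is not, and why excluding zeros changes the picture.

```python
import statistics

# Hypothetical harsh acceleration rates per 100 km; most trips record no event.
ha_rates = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 12.0, 20.0, 8.0, 0.0]

mean_all = statistics.mean(ha_rates)
median_all = statistics.median(ha_rates)
q1, q2, q3 = statistics.quantiles(ha_rates, n=4)  # quartiles of all trips

# Keep only the trips in which at least one harsh acceleration occurred.
nonzero_rates = [rate for rate in ha_rates if rate > 0]
median_nonzero = statistics.median(nonzero_rates)

print("All trips: mean", mean_all, "median", median_all, "lower quartile", q1)
print("Trips with events only: median", median_nonzero)
```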

Fig. 2 a Harsh accelerations/100 km under different restriction measures. b Harsh accelerations/100 km under different restriction measures by excluding zero values

Harsh Braking Events

In this subsection, the results for Harsh Brakings (HB)/100 km are presented. Table 6 presents the obtained error values for this model.

Table 6 Errors on test predictions

The obtained feature importance is provided in Table 7. Similar to the harsh accelerations model, the three variables that impacted HB the most were distance, mobile use/driving time, and the number of driving requests reported by Apple. A small contribution was also provided by driving during risky night-time hours. However, the COVID-19-related variable that influenced HB in Greece the most differed from that of the HA model; in this case, it was the COVID-19 Reproduction Rate. Other COVID-19-related variables that influenced the frequency of harsh brakings in Greece were new COVID-19 Cases, the Stringency Index, and the number of new COVID-19 fatalities.

Table 7 Feature importance of HB/100 km—XGBoost algorithms

Similar to the model for harsh accelerations, Fig. 3a presents the corresponding boxplot for harsh brakings including the entire dataset. As can be seen in this boxplot, the highest median value was observed during the 1st lockdown. Then, the conditions without restrictions follow and it is noteworthy that the median for the 2nd lockdown equals zero. The box of the 2nd lockdown in Greece has a narrower interquartile range than the other conditions (i.e., 1st lockdown and without restrictions). This means that the upper quartile of the 2nd lockdown is lower than the other conditions. In this respect, the lower upper quartile reveals that the majority of the observed harsh braking event values (between the 25th and 75th percentile) had lower values and range than the other restrictive conditions. Also, it is shown that the 1st lockdown has a higher upper quartile compared to the period without restrictions and the 2nd lockdown. In Fig. 3b, similarly to the HA model, the highest median was observed at the 1st lockdown, followed by the 2nd, and the period without restrictions.

Fig. 3 a Harsh brakings/100 km under different restriction measures. b Harsh brakings/100 km under different restriction measures by excluding zero values

Discussion

The current paper aims to identify and investigate the most significant factors that influenced driving behavior during 2020, a year in which behavior was heavily influenced by the COVID-19 pandemic. Both COVID-19 metrics (i.e., COVID-19 cases, fatalities, and reproduction rate) and restrictions (i.e., stringency index and lockdown measures) were taken into account to identify their relationship with driving behavior. The XGBoost algorithm was chosen as the analysis method, and the results suggest a strong correlation of COVID-19 metrics and restriction measures with driving behavior. Furthermore, different patterns were revealed for both harsh event rates among the three examined conditions, i.e., without restrictions, 1st lockdown, and 2nd lockdown.

Modeling results demonstrated that three common factors affected HA and HB event rates the most during the pandemic: distance, mobile use/driving time, and driving requests (requested in Apple Maps). More specifically, trip distance and mobile use duration were the two most important of the eight examined variables that influence HA and HB. Trip distance had a great impact on HA and HB events, probably because longer trips are driven on highways and rural roads to a greater extent than trips within the urban environment. Hence, the change in road type probably influences the braking and acceleration patterns of drivers. Another possible explanation for the correlation between harsh events and distance is the fatigue that builds up as trip distance increases. However, these assumptions need further research to be validated. Additionally, the importance of mobile phone use shows how much it matters for drivers to remain undistracted to avoid generating HA and HB events. Along with trip distance and mobile phone use, driving requests also had a crucial effect on harsh events. These driving requests are a surrogate driving exposure measurement and an indication of the prevailing traffic volumes. This finding reveals the relation between this exposure measurement and HA and HB events. A higher value of exposure indicates a greater density of traffic and, by extension, changes the probability of a driver being involved in a harsh event, for instance because of denser surrounding traffic. A small contribution to HA and HB was also provided by driving during risky night-time hours (00:00–05:00). This indicates a higher chance of events during night-time driving due to the lighting conditions themselves, and the variable was probably also affected by the restrictions imposed by the Greek government during the night-time, which essentially reduced trips during risky hours (Katrakazas et al. 2021).

Additionally, four COVID-19-related variables were found to impact HA and HB event rates. New COVID-19 cases in Greece were found to prevail over the other COVID-19-related variables in terms of their effect on HA events. The COVID-19 Reproduction Rate, on the other hand, was found to influence HB events the most. The most influential pandemic-related factors for HA and HB events in Greece were the COVID-19 Reproduction Rate, the Stringency Index, and New COVID-19 Fatalities and Cases. This is in line with the existing literature. For example, Dong et al. (2022), Lee et al. (2020), and Vanlaar et al. (2021) found that COVID-19 restrictions negatively affected risky driving behaviors such as speeding and distracted driving.

With regard to traffic exposure during 2020, it can be concluded from Fig. 1 that driving requests decreased significantly during both lockdowns compared to the baseline of no restrictions. The reduction was greater in the first lockdown than in the second. This means that the traffic volume during the 1st lockdown was lower than in the other conditions (i.e., during the 2nd lockdown and without restrictions). Hence, with fewer vehicles ahead, drivers could accelerate more easily, as reflected in Fig. 2a, where the upper quartile was higher than in the other conditions. Additionally, in Fig. 2b, for trips in which harsh accelerations occurred, the median was higher during the 1st lockdown than in the other conditions, meaning that HA events were more frequent. This finding can be partly related to speeding: Lee et al. (2020) revealed an increase in the spatial extent and in the level of speeding, as well as statistically significant differences in speeding before and after the COVID-19 outbreak. With regard to the 2nd lockdown, for trips with harsh accelerations, the median was higher than in the conditions without restrictions as a result of the decreased traffic volume, but not to the same extent as in the 1st lockdown, in which the traffic volume was much lower.

With regard to HB events, in Fig. 3a, the upper quartile is again greater during the 1st lockdown than in the other conditions (i.e., during the 2nd lockdown and without restrictions). Combined with Fig. 3b, in which the median is higher for trips in which HB events occurred, this implies that HB events were more frequent. This finding is also consistent with the literature (Katrakazas et al. 2020). It can be explained, for instance, by the fact that the traffic volume during the 1st lockdown was lower than in the other conditions; hence, with fewer vehicles ahead, drivers could maintain higher speeds, as stated in Katrakazas et al. (2020). At higher speeds, drivers were more likely to be involved in a harsh braking event because of potential obstacles ahead (i.e., pedestrians, bikes, scooters, and traffic control signs or signals), especially during the lockdowns, when active transport increased (Linares-Rendón and Garrido-Cumbrera 2021). With regard to the 2nd lockdown, following the same logic as for HA, the median for trips with harsh brakings was higher than under no restrictions as a result of the decreased traffic volume, but not to the same extent as in the 1st lockdown, in which the traffic volume was lower.

The results of the exploratory analysis by XGBoost indicate a correlation of COVID-19 metrics and restrictive measures with harsh brakings and accelerations. This correlation could be explained by the fact that COVID-19 metrics and restriction measures obliged commuters to stay at home. Consequently, the stay-at-home restrictions led to a decreased traffic volume, as can be seen in Fig. 1, and the traffic volumes in turn directly affected driving behavior. This can explain why driving requests are a more important factor in the analysis than the COVID-19-related variables. The XGBoost analysis yields valuable insights, and when integrated with the descriptive findings (as demonstrated in Figs. 2 and 3), it reinforces and substantiates the observed changes during the various lockdown periods. One of the most significant impacts identified is the considerable decrease in driving requests, which emerges as the primary contributing factor during the COVID-19 lockdowns. At the same time, COVID-19 metrics and restrictive measures also emerge as crucial factors influencing the frequency of events. Combining the XGBoost results with the descriptive data thus provides a comprehensive understanding of the dynamics behind the observed changes, shedding light on the central role of reduced driving requests and the relevance of COVID-19 metrics and restrictive measures in shaping the outcomes.

It should be noted that data on road traffic crashes, fatalities, and severe and slight injuries were not available in the dataset. Nevertheless, additional insights into the impact of the COVID-19 pandemic and lockdowns on crashes can be found in Sekadakis et al. (2021) and in Katrakazas et al. (2021). Crash data from 2020 (e.g., provided by the Hellenic Statistical Authority) showed a decrease in the absolute numbers of crashes, fatalities, and injuries. Furthermore, the results indicated that driving performance was more careless and riskier overall during the lockdown period. These findings are supported by previous studies, in which lower vehicle traffic volumes and empty roads were found to lead to higher speeds and harsh events (Carter 2020).

Nevertheless, this work is not without shortcomings, and future research could address the gaps that remain. First, future studies could concentrate on more sophisticated models, such as deep neural networks, e.g., Convolutional Neural Networks (CNNs) or Artificial Neural Networks (ANNs), which may achieve lower errors and provide more insight into driving behavior variables. In addition, further driving behavior variables, i.e., speeding, speeding duration, and speed, could be examined with the same method to obtain results in the same context. These variables were tested in the present work, but they led to models with large errors and were therefore not included. Finally, it is important to note that telematics data can be separated by road environment type (e.g., highway, rural, and urban roads), as was done in Papadimitriou et al. (2019). Subsequent analyses of the influence of COVID-19 could therefore adopt an infrastructure-based scope (rather than the present trip-based scope), where additional data with geolocation and road type information could also enhance the current methodology. This could enable spatial analyses of the examined variables (such as in Ziakopoulos et al. 2022).

Conclusions

The present paper aimed to identify the most important factors that influenced driving behavior during 2020, a year in which driver behavior was heavily influenced by the COVID-19 pandemic. To this end, data from several sources were processed and transformed to a common reference framework. In particular, four databases were exploited: (i) a naturalistic driving database covering a 12-month timeframe, obtained from OSeven Telematics, (ii) COVID-19 pandemic metrics from “Our World in Data”, (iii) governmental response data expressed as a Stringency Index, by Oxford University, and (iv) traffic exposure data in the form of mobility data reports from Apple. The data were subsequently merged on a trip-level basis in order to gauge the specific traffic environment and to detect variables from all datasets that are correlated with driving behavior.
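As an illustration of the trip-level merge described above, the following minimal sketch attaches day-level pandemic and mobility values to each trip by calendar date. It is not the authors' actual pipeline: the function name, the field names (`new_cases`, `stringency_index`, `driving_requests`), and all numeric values are hypothetical and chosen for illustration only.

```python
from datetime import date


def merge_trips_with_daily(trips, *daily_tables):
    """Left-join each trip onto every daily table using the trip's date.

    Each daily table is a dict mapping a date to a dict of that day's values.
    Days missing from a table simply contribute no extra fields.
    """
    merged = []
    for trip in trips:
        row = dict(trip)  # copy, so the input trip records stay untouched
        for table in daily_tables:
            row.update(table.get(trip["date"], {}))
        merged.append(row)
    return merged


# Illustrative values only, not taken from the study's data.
trips = [
    {"trip_id": 1, "date": date(2020, 3, 30), "distance_km": 12.5},
    {"trip_id": 2, "date": date(2020, 7, 1), "distance_km": 40.0},
]
covid_daily = {
    date(2020, 3, 30): {"new_cases": 50, "stringency_index": 80.0},
}
mobility_daily = {
    date(2020, 3, 30): {"driving_requests": 30.0},
    date(2020, 7, 1): {"driving_requests": 110.0},
}

merged = merge_trips_with_daily(trips, covid_daily, mobility_daily)
```

A left join of this kind keeps every trip even when a daily source has no entry for that date, which is one reasonable choice when the driving data are the primary unit of analysis.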

The examined driving behavior variables were HA and HB events per 100 km, covering the periods before, during, and after the imposition of lockdown measures in Greece in 2020, i.e., the first year of the COVID-19 pandemic. The naturalistic driving data were recorded by a specially developed smartphone application and transmitted to the back-end telematics platform of OSeven Telematics. The three variables that influenced HA and HB event rates the most were distance, mobile use/driving time, and driving requests (as requested in Apple Maps). Among the COVID-19-related variables, the most significant factors across the whole of 2020 were new COVID-19 cases, new COVID-19 fatalities, the COVID-19 reproduction rate, and the stringency index.
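The normalization of harsh events per 100 km can be sketched as follows. This is a minimal illustration of the rate definition only; the function name and the example numbers are hypothetical and do not come from the study's dataset.

```python
def events_per_100km(event_count, distance_km):
    """Return the number of harsh events normalized to 100 km of driving."""
    if distance_km <= 0:
        # A rate is undefined for trips without a positive distance.
        raise ValueError("distance_km must be positive")
    return 100.0 * event_count / distance_km


# Hypothetical trip: 3 harsh brakings over 12.5 km.
hb_rate = events_per_100km(3, 12.5)  # 24.0 events per 100 km
```

Normalizing by distance makes short and long trips comparable, although very short trips can produce large rates from a single event.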

The results of the exploratory XGBoost analysis indicate that COVID-19 metrics and restriction measures are associated with harsh braking and harsh acceleration events. As mentioned previously, COVID-19 restrictions forced commuters to stay at home, and the resulting lower traffic volumes affected driving behavior. Furthermore, different HA and HB event rate patterns were revealed for each of the three investigated conditions, i.e., no restrictions, 1st lockdown, and 2nd lockdown. For trips in which harsh events occurred, HA and HB events were more frequent and spanned a wider range of values during the 1st lockdown, followed by the 2nd lockdown, than under non-restrictive conditions, owing to their correlation with driving exposure measurements (i.e., Apple driving requests). Moreover, HA and HB event rates were also affected by risky night-time hours (00:00–05:00), indicating a change in events during night-time driving. This change may be due to the lighting conditions themselves and probably also to the night-time prohibitions imposed by the Greek government, which essentially reduced trips during risky hours.