1 Cooperative Driving

The intensive research and development of recent decades on driver assistance systems and automated vehicles already enables a synthesis of assistance and automation systems, and the first automated vehicles [28] are appearing in real road traffic in Germany, although still in limited operational design domains (ODD), driving certain sections of the highway in a highly automated mode (SAE level 3, SAE International Standard J3016 2021). During active automation, the driver may turn to non-driving related tasks, but must still be able to take over the driving task at any time. Increasing vehicle automation, however, raises the question of how the cooperation and interaction between driver and vehicle should be designed in the future.

An initial idea in the early days of assistance systems and automation was to have the system take over either longitudinal or lateral control. By switching a function on or off, the driver decides which task is to be taken over, i.e. either lateral (LKAS) or longitudinal control (ACC). Beyond that, the driver cannot intervene further in the execution of the task, except by deactivating the automation. This form of interaction, however, was already identified as complicated and disadvantageous by Schieben et al. [35] and Hoeger et al. [27].

An approach that goes beyond this black-and-white view of automation is shared and cooperative control. This can be explained with the H(orse)-Metaphor, which describes how a driver and a highly automated vehicle cooperate similar to a rider and a horse, sharing and trading control along an assistance and automation scale of assisted, partially and highly automated driving. This insight was the starting point for the invention of highly automated driving (e.g. [13], Hoeger et al. [27]), for the German Federal Highway Research Institute (BASt) [16] and later for the SAE levels of automation [34]. More generally, this parallels the thought of Christoffersen and Woods [2], who, building on ideas such as assistant systems (e.g. Flemisch and Onken [11]), proposed designing an automation as a team player.

The concept of cooperative highly automated driving describes how cooperation increases with a mutual understanding between the human and the co-system, e.g. regarding the abilities of the partner and the distribution of control (e.g. Hoeger et al. [27, 15]).

A basic requirement for the co-system is to detect and understand the status of the driver and to use this information to balance between the driver and the co-system, e.g. by trading control towards the partner who still has the ability to control the vehicle. This allows harmonizing the driving strategies of the two agents (co-system and driver) into a common strategy [29]. Griffiths and Gillespie [18] use the term shared control to describe that the driver as well as the automation can have control over the vehicle at the same time. Flemisch et al. [5] describe more precisely a design space of cooperative control, which combines shared and traded control.

A concept that includes shared control but has a wider scope can be referred to as cooperative automation and cooperative guidance and control (e.g. [4, 6]). Cooperation in this context implies working together towards the same goal. Cooperative automation is mainly understood as the cooperation between vehicles, e.g. Stiller et al. [39], Völker et al. [44]. However, cooperation can also be applied to the cooperation between the driver/operator and the automated driving system, as already suggested by Onken [32], Schulte [37] and Flemisch [9]. This driver-vehicle cooperation requires a common mental model of the capabilities and limits of the automation and the driver [14]. In highly automated driving, the ability of the automation or co-system to observe and assess the abilities of the human partner has been an integral aspect of the concept of cooperative driving from the very beginning, e.g. in the EU HAVEit project (Hoeger et al. [27]).

One possible implementation of such an embedded mental model of the capabilities of cooperation partners is the concept of confidence horizons (Flemisch et al. [5], based on Flemisch et al. [10], Herzberger et al. [26], Usai et al. [43]). In this concept, the capabilities of the driver are continuously compared with those of the automated subsystem, resulting in two horizons: First, the confidence of the technical subsystem in its own ability to safely control the vehicle, and second, the confidence of the technical subsystem in the driver’s ability to take over the vehicle control. With that, it is possible to quickly identify whether transitions between different levels of automation are safe, whether there is a balanced distribution of control, and whether, when, and how a maneuver with minimal risk might be required. Figure 1 depicts the Confidence Horizon concept.
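The comparison of the two horizons can be sketched in code. The following is a minimal illustration under assumed names, distances, and thresholds; the concept itself prescribes neither the units nor the decision categories used here:

```python
from dataclasses import dataclass


@dataclass
class ConfidenceHorizons:
    """Illustrative container for the two horizons (distances in meters).

    automation_horizon: distance up to which the co-system is confident
                        it can safely control the vehicle itself.
    driver_horizon:     distance up to which the co-system is confident
                        the driver could take over control.
    """
    automation_horizon: float
    driver_horizon: float


def assess_transition(h: ConfidenceHorizons, safety_buffer: float = 50.0) -> str:
    """Return a coarse recommendation from the continuous comparison.

    The thresholds and categories are hypothetical; the concept only
    requires that both horizons be compared continuously.
    """
    if h.automation_horizon >= h.driver_horizon + safety_buffer:
        return "stay_automated"          # automation outlasts the driver comfortably
    if h.driver_horizon >= h.automation_horizon + safety_buffer:
        return "handover_to_driver"      # driver can cover what automation cannot
    if max(h.automation_horizon, h.driver_horizon) < safety_buffer:
        return "minimum_risk_maneuver"   # neither partner can safely continue
    return "balanced"                    # shared distribution of control remains viable
```

Such a rule captures the three questions named above: whether a transition is safe, whether control is balanced, and whether a minimal-risk maneuver is required.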

Fig. 1
Confidence horizon concept with an example of a potential safety buffer (Flemisch et al. [5], based on Flemisch et al. [10], Herzberger et al. [26], Usai et al. [43])

A fundamental cornerstone of a dynamic balance between driver and co-system, here within the concept of confidence horizon, is the assessment of the driver’s (takeover) capabilities, which is addressed in the following section. A more detailed description of driver assessment can be found in the dissertation by Herzberger [23], which was written as part of the Priority Program Cooperative Interacting Automobiles (CoInCar) of the German Science Foundation DFG.

2 Driver Monitoring—State of the Art

In their meta-study on driver state monitoring systems (DSMS), Hecht et al. [20] point out that there is currently no commonly used definition of the term driver state. Not only the basic definition but also the possible states differ greatly: for example, Rauch’s [33] model focuses on vigilance and drowsiness, Marberger et al.’s [30] model on drivers’ understanding of presented information during transitions, and Herzberger’s [23] model on drivers’ assessment of driving performance. The majority of research on the classification of possible states focuses on the following constructs, or subsets thereof: situational awareness, attention, stress, fatigue, strain, and confidence in automation (e.g., Heikoop et al. [21] or Guettas et al. [19]).

In this context, involvement in the driving task, as well as the associated awareness for relevant information (ARI), plays a role in the assessment of takeover quality [17]. The definition of Herzberger et al. [22] also follows this ARI concept. Regardless of which definition of potential driver states is followed, reliable detection of those operator states by the technical system will be essential in order to avoid handing over the driving task in critical situations to drivers who are not ready to take over. The following therefore gives an overview of the state of the art of current DSMS as well as current research approaches.

In the past, vehicle manufacturers have mainly focused on drowsiness detection and suitable warnings. Most systems monitor the driver’s steering behavior and conclude that a change in steering behavior, such as jerkiness, indicates a change in vigilance, e.g., the “Drowsiness Detection System” (Volkswagen AG). The detection of steering behavior is often additionally coupled with lane departure detection systems that register deviations from the zero line, such as “Attention Assist” (Daimler AG). Newer, SAE level 2-capable vehicles with traffic jam assist systems sometimes also drive independently for several minutes. Here, however, the attention checks by the systems differ greatly: some systems allow longer periods of driving in traffic jams without deactivation if hands are always detected on the steering wheel, e.g. “Traffic Jam Assist” (Audi AG). Other systems, such as “Driving Assistant Professional” (BMW AG) or “BlueCruise” (Ford Motor Company), enable periods of several minutes without hand contact with the steering wheel, provided that the driver’s gaze is always directed at the road. For this purpose, the gaze is monitored by a camera system, e.g., above the instrument cluster. Such a system enables a warning and deactivation if the driver turns away from the driving task, since in SAE level 2 the driving task must be permanently monitored despite activated assistance systems and the responsibility remains with the driver ([23, 34]).

In addition to systems that detect the direction of the driver’s gaze, a great number of research projects also focus on different systems for measuring physiological parameters, such as electrocardiography (ECG), photoplethysmography (PPG), electroencephalography (EEG) or the measurement of electrodermal activity (EDA, also known as skin conductance measurement). Sensors are usually attached to the driver’s body to record the signals as accurately as possible. Less intrusive methods are also being researched, in which the sensors are built into the steering wheel rim or the seat, for example Guettas et al. [19], but these still require continuous contact with the body. Most common are studies that use ECG to record cardiac parameters such as heart rate and heart rate variability, e.g., Minhad et al. [31] and Taherisadr et al. [40]. These are used to draw conclusions about fatigue, stress, emotional responses, and the general health of the driver. Much more complex are studies on EEG, which records electrical potentials of cerebral cortical neurons at the scalp. These studies attempt to detect cognitive states, such as activity or boredom, or even to transmit individual driving commands to the automation, such as hazard braking, e.g. Teng et al. [41]. Another approach is EDA measurement, where changes in sweat gland activity are recorded and analyzed. This methodology is used to assess emotional state, emotional arousal, or sleepiness [42]. However, it cannot validly capture which emotion is being measured because, for example, stress and anger elicit similar responses [45].

A disadvantage of some physiological measurement systems, e.g., EEG, is that they are expensive, which would significantly increase the total vehicle cost, and that their use is hardly practical. In addition, other systems, such as EDA, are sensitive to ambient temperature. Clearly more serious, however, is that some of the detected signals may have different causes and vary greatly from person to person, making an automated interpretation of these signals virtually impossible to date [1].

An alternative could be DSMS that focus on measuring the direction of the driver’s gaze. These have the distinct advantage that they can be permanently installed in the vehicle and do not need to be attached to the driver in any way. Further advantages are that vehicle manufacturers as well as suppliers already have experience with the series production use of camera systems for observation, and that these systems can now be procured at low cost and require little installation space. For these reasons, many development approaches currently focus on camera-based state estimation. Hecht et al. [20] therefore describe DSMS based on eye tracking as the technology with the greatest potential.

However, for the use of such DSMS, it is necessary to identify measurable criteria that correlate with possible driver states or the future takeover quality. Despite various efforts to compile such a set of criteria (e.g. [17, 25]), the authors are not aware of any valid set of criteria to date. And even if predicting the driver’s future takeover capability remains the major goal in driver assessment, the question arises whether today’s DSMS can already make takeovers safer. The concept of the Diagnostic Takeover Request (Diagnostic TOR), described below, pursues this idea.

3 Theoretical Concept of the Diagnostic Takeover Request

Since the desired criteria that would enable the prediction of a future takeover quality are not (yet) available, an alternative approach, first published by Herzberger et al. [24] and Schwalm and Herzberger [38], and now patented [36], was developed.

This method, hereafter referred to as Diagnostic TOR, no longer focuses on inferring a state from the operator’s behavior during automated driving, but on predicting risky takeovers based on a driver’s response to a takeover request (TOR). The general idea is to detect missing or reduced takeover capability by classifying drivers’ orientation reactions after a TOR. For that, orientation reactions are first recorded and evaluated for a large number of drivers, together with the subsequent takeover quality. Based on this data set, safe and thus good takeovers can be separated post hoc from riskier, poor takeovers. After this classification, the orientation reactions previously shown in response to the TOR can be analyzed. The hypothesis is that the orientation reactions of drivers before safe and unsafe takeovers differ significantly [23].
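The post-hoc separation described above can be sketched as follows. The toy recordings, AOI labels, and function name are illustrative assumptions, not the authors’ implementation or data:

```python
from collections import defaultdict

# Hypothetical recorded data: each entry pairs the gaze sequence shown
# after the TOR (tuple of AOI labels) with the observed takeover quality.
recordings = [
    (("T", "R"), "unsafe"),
    (("R", "MM", "R"), "safe"),
    (("T", "IC", "R"), "unsafe"),
    (("R", "MM", "R"), "unsafe"),
    (("IC", "R"), "safe"),
]


def unsafe_likelihood(recordings):
    """Estimate, per gaze sequence, the share of takeovers that were unsafe."""
    counts = defaultdict(lambda: [0, 0])  # sequence -> [n_unsafe, n_total]
    for seq, quality in recordings:
        counts[seq][1] += 1
        if quality == "unsafe":
            counts[seq][0] += 1
    return {seq: unsafe / total for seq, (unsafe, total) in counts.items()}


likelihoods = unsafe_likelihood(recordings)
# Sequences observed only before unsafe takeovers get likelihood 1.0,
# e.g. ("T", "R") and ("T", "IC", "R") in this toy data set.
```

In a real data set, sequences with a likelihood near 1.0 would be the candidates for flagging a reduced takeover capability.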

If the orientation reactions do indeed differ significantly, a risky takeover could be predicted as early as the end of the driver’s orientation phase, which would enable multiple early intervention options such as the initiation of a minimum risk maneuver (MRM) or the transition to a higher level of automation. This is illustrated in Fig. 2 using the example of an SAE level 4 automated driving system (ADS) with a system failure. Following the SAE recommendation, the vehicle has to wait (dynamic driving task (DDT) fallback) for the driver’s response to take over control. The Diagnostic TOR concept aims to significantly reduce this waiting time, since the system does not have to wait for the driver’s actual intervention, but only for his or her orientation reaction.
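The potential time gain can be expressed in a small sketch; the timings below are purely hypothetical placeholders, not values from the studies:

```python
# Hypothetical timings (seconds after the TOR), for illustration only.
T_ORIENTATION_END = 1.5   # driver's orientation reaction has been classified
T_DRIVER_RESUMES = 6.0    # driver actually resumes the DDT (SAE fallback wait)


def time_gain_for_mrm(predicted_unsafe: bool) -> float:
    """Time gained for starting an MRM when the orientation reaction
    already predicts an unsafe takeover, instead of waiting for the
    driver's actual intervention."""
    if predicted_unsafe:
        return T_DRIVER_RESUMES - T_ORIENTATION_END  # MRM starts earlier
    return 0.0  # driver takes over as planned, no MRM needed
```

With these placeholder values, an MRM could begin 4.5 s earlier than under the pure fallback-waiting scheme.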

Fig. 2
Pattern for a diagnostic takeover: potential time gain for a minimum risk maneuver (MRM) using the Diagnostic TOR in comparison to the recommendation for level 4 by the SAE [24]

4 Review of the Concept and First Application

In this study, first published by Herzberger [23], the concept of the Diagnostic TOR was applied for the first time to find out whether it is indeed possible to detect distinguishable orientation reactions before safe and unsafe takeovers.

The study was conducted in the static driving simulator of the Institute of Automotive Engineering (ika) at RWTH Aachen University (see Fig. 3). A 5-series BMW (F10) served as mock-up and Virtual Test Drive (VTD) was used as driving simulation software. By using a curved projection surface, a visual range of up to 210° horizontally and 40° vertically was achieved. A three-lane highway with hard shoulder (RQ 31; FGSV [3]), with both straight and curved passages, was chosen as the driving situation. Since the environment was to contain as few stimuli as possible, there was no flowing or oncoming traffic.

Fig. 3
In-vehicle study setup in the static driving simulator of the Institute of Automotive Engineering (ika) at RWTH Aachen University

The study design included an SAE level 2 driving function, which was activated by pressing an orange button on the steering wheel. The automation could be activated at speeds above 120 km/h, whereupon it displayed confirmatory feedback in the HMI and accelerated or decelerated to the maximum permissible speed of 130 km/h. After activation, longitudinal and lateral guidance was fully executed by the automation, provided the system limit (unforeseen situation ahead) was not reached. Furthermore, it was not necessary to keep the hands on the steering wheel continuously or to re-engage with the steering wheel after a certain period of time. This system design was chosen because the goal was to have roughly 50% of the subjects manage a safe takeover and 50% fail, in order to obtain a data set as balanced as possible between the two takeover qualities (safe/unsafe). The system could be deactivated by braking or steering interventions as well as by pressing the orange steering wheel button again.

After 15 min of SAE level 2 driving at 130 km/h in the center lane, a critical situation occurred in the form of a broken-down vehicle (white Audi Q5) in the center lane ahead. The vehicle was parked with its hazard warning lights activated, but without any other markings, e.g., a warning triangle. In addition, starting two minutes before the TOR, there was heavy traffic at a higher speed in the left lane, so that it was not possible to change to the left lane.

During automated driving, half of the participants were offered a non-driving related task (NDRT). This visual-haptic NDRT was a Tetris game running on a tablet (Samsung Galaxy Tab S3) mounted in the center stack. The NDRT was chosen to provide relevant variance in the takeover performance shown in the experiment. Under real-world conditions, playing Tetris during an SAE level 2 drive would not be an acceptable NDRT and thus represents misuse.

The subjects’ gaze direction was measured both during the automated drive and during the takeover. The head-mounted eye tracker Dikablis Professional by Ergoneers was used for this purpose. The defined areas of interest (AOIs) were based on the recommendations of ISO 15007:2020, see Fig. 4, which include the road ahead (1), the rearview mirror (2), the TICS (transport information and control system) display (3), the instrument cluster (IC, 4), the driver-side rearview mirror (5), the driver-side side window (6), the passenger-side rearview mirror (7), and the passenger-side side window (8). For this study, the AOIs highlighted in blue (1, 2, 3, 4, 5, 7) were selected, because side window detection would have required additional hardware that was not available at the time the experiment was conducted.

Fig. 4
AOIs according to the numbering of ISO 15007:2020 (adapted from Herzberger [23])

N = 50 subjects participated in the study (52% female). The age of the participants ranged from 20 to 69 years (M = 32.18 years, SD = 11.03 years) and the average annual mileage was M = 16,076 km (SD = 18,221 km). The results of the Karolinska Sleepiness Scale (KSS) and the SOFI scale, which measure the fatigue of test subjects, did not differ significantly between the groups safe takeover (ST) and unsafe takeover (UST); the participants of both groups thus assessed themselves as comparably awake. ST were defined as emergency braking in front of the broken-down vehicle or swerving into the clear right lane; UST as swerving into the right lane while it was occupied by faster moving vehicles, hitting the broken-down vehicle, or hitting the guardrail.

The subjects’ takeovers of the driving task were analyzed after the study. The evaluation of the objective driving simulator data revealed 34% ST by drivers without NDRT and 14% in the Tetris group. The UST break down as follows: 16% without NDRT and 36% in the Tetris group, in each case as a percentage of the total takeovers. Thus, the goal of obtaining data from both successful and unsuccessful takeovers in each group was achieved. The gaze sequences were then analyzed. No distinction was made as to whether the gaze sequences prior to a UST came from individuals with or without NDRT, since future algorithms would have to handle both drivers with and without NDRT. Several gaze sequences were identified that were unique to UST (results in Table 1). A discussion of the results, together with those from the second study, follows after Table 1.

Table 1 Gaze patterns from the first and second study (S1 and S2) and the likelihood (L) of an unsuccessful takeover (UST) after a gaze pattern (adapted from Herzberger [23])

However, the most obvious limitation is that the number of participants per gaze sequence is very small, which is due to the fact that the six possible AOIs can be strung together in any combinatorial order. Accordingly, there is a large number of possible combinations, which reduces the probability of each individual sequence occurring. Driving simulation studies are simply not suitable for obtaining a representative sample for each of these cases, as they are too time-consuming and too expensive. However, the aim of the study was not to generate an exhaustive data set, but to perform a first analysis of whether the Diagnostic TOR could be implemented in principle. This analysis showed that the orientation reactions differ before ST and UST (at least for this first sample), which strongly supports the usefulness of the approach. Given the small number of participants, however, a replication study (see next section) was needed to find out whether another sample shows similar gaze orientations and whether distinctive sequences are preserved despite the additional data.

5 Generalizability of Orientation Reactions

The purpose of this replication study, first published by Herzberger [23], was to find out whether comparable orientation reactions could be found in a different sample under conditions that were as comparable as possible. Furthermore, the recorded orientation patterns were to be merged with those from the previous study to check whether distinguishable sequences can still be identified in a larger sample.

This replication study was conducted in the static driving simulator of the IAW at RWTH Aachen University. Since this simulator is not based on a real vehicle mock-up but on a Bosch Rexroth setup, it was possible to replicate the dimensions of the vehicle exactly. For this purpose, the mock-up at the ika was precisely measured to record both the distances between the mirrors and the exact positions of the IC and TICS. This ensured that as few confounding variables as possible were introduced into the study. A four-camera remote eye tracking system (Smart Eye Pro 6) was used: two cameras were placed at the bottom of the A-pillars, one above the dashboard, and one at the bottom right of the TICS to reliably detect the relevant AOIs. This system was chosen because the head-mounted system had difficulties with sudden head movements, and in order to best meet the requirement for a method that could be used in the real world. The study design was replicated as closely as possible.

N = 38 subjects participated in the study (55% male). Participants’ ages ranged from 18 to 65 years (M = 33.26 years, SD = 15.01 years), and the mean annual mileage was M = 8,860 km (SD = 11,289 km). After the study, the takeovers were classified, resulting in the following takeover qualities by group: successful were 36.8% of the participants without NDRT and 23.7% with NDRT; unsuccessful were 7.9% of the subjects without NDRT and 31.6% with NDRT. Thus, the replication study also achieved its goal of collecting data from both successful and unsuccessful takeovers in each group.

Table 1 provides the participants’ gaze patterns from both studies as well as the likelihood of a subsequent UST. The AOIs were labeled according to the following classification: road ahead (1) is R, TICS display (3) is T, instrument cluster (4) is IC, driver-side rearview mirror (5) is ML (Mirror left), passenger-side rearview mirror (7) is MR (Mirror right), and rearview mirror (2) is MM (Mirror middle, since RM would have been too similar to MR). It becomes apparent that even after merging the data sets from the first and the second study (S1 and S2), it is still possible to identify distinct sequences. The gaze patterns with a probability >0.5 and a sample n > 1 are highlighted in gray.
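The selection rule for the highlighted rows can be sketched as follows. The counts below are invented toy values, not the actual numbers from Table 1; only the AOI labels and the filter criteria (likelihood > 0.5, n > 1) follow the text:

```python
# Hypothetical merged counts per gaze pattern: (n_UST, n_total) across S1 + S2.
# AOI labels follow the text: R, T, IC, ML, MR, MM.
merged = {
    ("T", "R"): (3, 4),
    ("T", "IC", "R"): (2, 2),
    ("R", "MM", "R"): (1, 5),
    ("IC", "R"): (1, 1),
}


def distinctive_patterns(merged, min_likelihood=0.5, min_n=2):
    """Keep patterns with UST likelihood > min_likelihood and sample n > 1,
    mirroring the gray-highlighted rows of Table 1."""
    return {
        pattern: ust / total
        for pattern, (ust, total) in merged.items()
        if total >= min_n and ust / total > min_likelihood
    }
```

With this toy data, only ("T", "R") and ("T", "IC", "R") would pass the filter; ("IC", "R") is dropped for its sample size of 1 despite its likelihood of 1.0.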

It is noticeable that takeovers are most often unsuccessful when they start at the TICS display (T). This is not surprising, since drivers who have not lowered their gaze to T may at least still be peripherally aware of their surroundings, or may even recognize the critical situation at an early stage. These results should be taken as an opportunity to reconsider the warning strategy for takeover requests, since warnings in the instrument cluster (IC) do not seem to be very helpful for safe takeovers, to say the least. An alternative could be to display the warning (additionally) in the TICS or the head-up display, so that the driver does not have to look into the instrument cluster to grasp the content or trigger of the warning. Since separating orientation reactions were identified even with the enlarged sample, the Diagnostic TOR seems to be a promising approach: by comparing a detected orientation reaction with previously recorded gaze patterns, a UST can be detected at an early stage when the driving task must be handed over to the driver, so that safeguarding measures can be initiated.

6 Conclusion and Outlook

The presented concept of the Diagnostic TOR shows that a meaningful use of DSMS is already possible today, which could enable early detection of unsuccessful takeovers. Nevertheless, the major goal remains to identify criteria that can be used to predict the driver’s capability to take over from the driver’s behavior during the automated drive. Until this is possible, however, approaches such as the Diagnostic TOR could help gain reaction time during critical transitions. Importantly, the limit in terms of reaction time as well as accuracy is far from being reached: in the presented studies, only the gaze pattern was used to estimate the human’s capability to take over. For more advanced approaches, such as confidence horizons, the human horizon can be determined in much greater detail. For example, there are ongoing studies that also include the driver’s body posture, weight shift and grip strength. Gaze direction is therefore only a small part of a takeover pattern [12, 8], and a large number of variables shapes takeover responses. More detailed takeover patterns are being investigated in the Exploroscope [7] of the IAW at RWTH Aachen University and will also be included in the design of the confidence horizon in the future. This concept, which allows the two agents (driver and vehicle) to have a shared mental model, will enable dynamic in-vehicle cooperation on the driving task. In a next development step, the information about the mutual capabilities or limits of driver and automated system could also be made available to other automated vehicles. If, for example, a transition to the driver fails and an evasive maneuver has to be performed at the last moment, surrounding automated vehicles could react adaptively based on this data, taking vehicle-vehicle cooperation to the next level.