Drowsiness impairs humans’ driving ability, and drowsy driving accounts for approximately 19%–24% of all fatal crashes in manual driving (Hell et al., 2012). To reduce this risk, car manufacturers and scientists have been working on reliable systems to detect, warn of, and ultimately prevent this critical state before it begins. To determine the state of the driver online while driving, several physiological measurement techniques, such as electroencephalography (EEG), electrocardiography (ECG), and electrooculography (EOG), as well as performance-based measures (e.g., steering behavior), have been evaluated in the past (Dawson, Searle, & Paterson, 2014; Dong, Hu, Uchimura, & Murayama, 2011). The term drowsiness will be used in this work as it was by Sahayadhas, Sundaraj, and Murugappan (2012), as synonymous with sleepiness. An acceptable detection system for drowsiness has to fulfill several requirements that the aforementioned methods do not completely satisfy. Preferably, to be accepted by all drivers, the system should not be attached to the body. Current series-production systems, which detect drowsy driving from either steering behavior or drivers’ ability to stay in the lane, meet this requirement (Daimler, 2008; Ford, 2010). However, with the development of automated driving functions, steering behavior is no longer a feasible detection method. Conditionally automated driving (CAD; based on the definition in SAE Standard J3016 [2014]), which is currently in development (Wei et al., 2013), gives drivers the freedom to let go of the steering wheel, relax their observations, and allow the algorithm to control the vehicle. Self-driving vehicles are expected to become reality within the next 5 years (BMW Group, 2016).
Among the technologies that will be needed to successfully deal with all constraints of future automated travel, the use of a camera to observe a driver’s behavior is the most promising technology for detecting drowsiness (Jafar Ali, Sarkar, Kumar, & Cabibihan, 2012). In particular, the analysis of eyelid movements and blinking as a drowsiness indicator has been the focus of many studies, as reported in reviews by Stern, Boyer, and Schroeder (1994) and Dawson et al. (2014). The onset of drowsiness influences normal blinking behavior, which can therefore be used to estimate the drowsiness state of the driver (Morris & Miller, 1996; Picot, Charbonnier, & Caplier, 2012). Unlike manual driving, CAD offers the driver the possibility to change their gaze and eye closing behavior without negatively affecting driving safety. Because drivers will be able to switch freely between the driving modes, it is crucial to have an algorithm that detects blinks or the eye aperture in both the manual and CAD phases, so that alert and drowsy driver behavior can be distinguished with high accuracy.

The goal of this study was to compare the performance of a blink detection algorithm in manual driving against its performance in conditionally automated driving, in phases during which the drivers were alert and phases during which they were drowsy. This should answer the open question of how strongly CAD influences the detection rates of a blink detection algorithm, and how a possible change in blinking behavior affects blink detection in future self-driving transportation. A second goal was to compare the performance of an EOG-based blink detection system with that of a simultaneously recording camera-based detection system. By these means, it should be possible to determine which technology is best suited for assessing driver behavior in future studies with CAD.

Human blinking behavior

Blinks are usually associated with the urge to clean the eyes from particles and to spread tear film. Therefore, blinks are affected by humidity, temperature, chemical factors and air particles (Stern et al., 1994; Wolkoff, Nøjgaard, Troiano, & Piccoli, 2005). The time between blinks increases when an observer watches a visual display unit (Patel, Henderson, Bradley, Galloway, & Hunter, 1991) or is distracted by a demanding task (Wolkoff et al., 2005). On the other hand, time between blinks is reported to decrease with an auditory task during driving as compared to driving without an additional task (Tsai, Viirre, Strychacz, Chase, & Jung, 2007). This suggests a relationship between the type of the task and its effect on the driver’s blinking frequency.

Studies that involve a long time-on-task, which evokes drowsiness, have reported changes in eyelid closure and gaze behavior. The palpebral aperture generally becomes smaller “associated with a downward shift in gaze angle” (Lobb & Stern, 1986, p. 17). In addition, Stern et al. (1994) reported compelling evidence of an increasing blink rate with time-on-task in their literature review.

Blinks are categorized by their origin as voluntary, reflex, or spontaneous and are often accompanied by saccades and eye movements (Collewijn, van der Steen, & Steinman, 1985; Stern, Walrath, & Goldstein, 1984). Stern et al. (1984) further distinguished longer eye closures, such as “microsleeps,” as a separate category representing nonblink closures. However, in this research, with the focus on driver drowsiness and the detection of blinks, microsleeps will be considered blinks.

Blink detection methods

EOG and video recordings are the main techniques used in driving/transportation studies to record the blinking behavior of a driver (Morris & Miller, 1996; Picot et al., 2012). Although EOG is considered to be the most reliable method due to its high sampling rate, video-based assessment has gained popularity in the automotive industry because it measures contact-free.

To obtain a signal for eye blink detection with EOG, several surface electrodes are positioned around the eyes. Since the cornea has a positive electric potential in reference to the fundus of the eye, a naturally occurring eyelid movement during a blink affects the electric potential between the two electrodes positioned above and below the eye. A blink can thus be measured as a change in the potential distribution of the eye (e.g., Jammes, Sharabty, & Esteve, 2008; Skotte, Nøjgaard, Jørgensen, Christensen, & Sjøgaard, 2007).

Using video recordings, eyelid movement is visible in the images and can be assessed using image processing methods. Different algorithms for that purpose are based on either the motion detection derived from differencing two consecutive images (e.g., Bhaskar, Keat, Ranganath, & Venkatesh, 2003; Chau & Betke, 2005; Fogelton & Benesova, 2016; Jiang, Tien, Huang, Zheng, & Atkins, 2013), a second-order derivative method of image differentiations (Gorodnichy, 2003), a state classification (e.g., Choi, Han, & Kim, 2011; Missimer & Betke, 2010; Pan, Sun, & Wu, 2008; Pan, Sun, Wu, & Lao, 2007), an evaluation of the color contrast or amount of visible color of specific eye regions (Cohn, Xiao, Moriyama, Ambadar, & Kanade, 2003; Danisman, Bilasco, Djeraba, & Ihaddadene, 2010; Lee, Lee, & Park, 2010), the distance between landmarks or arcs representing the upper and lower eyelid (Fuhl et al., 2016; Ito, Mita, Kozuka, Nakano, & Yamamoto, 2002; Miyakawa, Takano, & Nakamura, 2004; Moriyama et al., 2002; Sukno, Pavani, Butakoff, & Frangi, 2009), the missing regions of the open eye like the iris or pupil due to their occlusion by the upper and lower eyelid (Hansen & Pece, 2005; Pedrotti, Lei, Dzaack, & Rötting, 2011), or a combination of the described methods (Sirohey, Rosenfeld, & Duric, 2002). Instead of measuring the real distance between the upper and lower eyelid, most of these algorithms use an indirect measure (motion detection, classification, color contrast, missing eye regions) to conclude whether the eye is closed. This is similar to the EOG technique, which also infers an eye closure indirectly from a potential difference between two electrodes.

The collected eye or eyelid movement signal of the EOG or the image processing is usually processed further in a second step. General approaches to detect blinks in the signal include the evaluation with thresholds (e.g., Divjak & Bischof, 2009; Grauman, Betke, Gips, & Bradski, 2001), filtering (e.g., Grauman et al., 2001; Jammes et al., 2008), derivation of the signals (e.g., Ebrahim, 2016; Torricelli, Goffredo, Conforto, & Schmid, 2009), transformation (e.g., Benoit & Caplier, 2010; Malik & Smolka, 2014), and valley/peak detection (e.g., Malik & Smolka, 2014; Radlak & Smolka, 2012). Depending on the signal quality and data processing method, more or less detailed information (start time, speed of the eye closure, or duration of the eye closure; see Picot et al., 2012) of the blinks can be parameterized.

Drowsiness detection and influence by the performance of the blink detection

By identifying changes in eye blink parameters, several studies have been successful using EOG or video recordings to estimate the drowsiness of a driver (EOG: Hu & Zheng, 2009; Picot et al., 2012; Video recordings: Bergasa, Nuevo, Sotelo, Barea, & Lopez, 2006; Friedrichs & Yang, 2010; Garcia, Bronte, Bergasa, Almazan, & Yebes, 2012). Unfortunately, they often fail to report the performance of the pre-processing algorithms used to detect blinks or the degree of eye closure, which implies a high reliability of the sensor system and error-free parameterization of the collected data. According to a study by Pedrotti et al. (2011), a commercially available eyetracker does not guarantee a correct blink detection rate of 100%. Consequently, only performance tests of the blink parameterization algorithms can reveal the influence of errors in the preprocessing of the detection of blinks. These evaluation tests for blink detection algorithms need to be carried out under the same constraints as those used in the application. Hence, the testing procedure of an eye blink algorithm for drowsy drivers needs to be evaluated in alert and drowsy driving phases. Because blinking as well as measuring systems are influenced in many ways (EOG: electromagnetic compatibility, contact with the skin; Video recordings: position of the camera, frame rate of the video, glasses, eye physiognomy), the detection rate of blinks and the accurate measurement of the eyelid distance can differ completely in various experiments.

In a comparison of the detection rate of blinks between EOG and a remote eyetracker, Picot, Caplier, and Charbonnier (2009) used the blinks detected with a 250 Hz EOG system as a reference value for blinks detected with a remote eyetracker at different frame rates. In the study with data from 14 awake participants, they noticed that the false detection rate decreased with faster video frame rates (false detection rate at 30 Hz > 100 Hz > 150 Hz > 200 Hz). Yet, the authors could not evaluate the difference between the various frame rates on the same video recordings. It is also unclear how many errors their EOG blink detection contained.

With respect to the detection of blinks in drowsy manual driving, several studies report a drop in the correct detection rate relative to the detection rate of the alert driving phases (Ebrahim, 2016; Jammes et al., 2008; Skotte et al., 2007). Following the higher rate of long eye closures observed for drowsy drivers during CAD by Schmidt, Braunagel, Stolzmann, and Karrer-Gauß (2016), a similar drop in the correct detection rate between the manual mode and the CAD mode can be expected and will be investigated in the following sections.


Two experimental studies were conducted in a between-subjects design, using a manual and conditionally automated driving condition. None of the participants of the conditionally automated study were involved in the manual-driving study. In both studies, we intentionally used simulated evening conditions with a dark, heavily overcast sky and a monotonous roadside to induce drowsiness. The radio was switched off, and the use of any secondary devices was prohibited. The time for the whole process (introduction, pre-questionnaire, driving, and post-questionnaire) for each individual driver was limited in both studies to 4 h. The average driving time in the manual-driving study was 2 h 46 min for a distance of 335 km. In the CAD study, the participants drove on average for 2 h 45 min over 263 km. In both studies, half of the participants started the drive at 6:00 pm, and the rest at 10:00 pm. All of them had had a normal workday before the experiment. Participants had to rate their drowsiness level on the Karolinska Sleepiness Scale (KSS; Åkerstedt & Gillberg, 1990, Table 1) every 15 min (with a few exceptional extensions of up to 17 min in the CAD study). None of the participants in either study reported any known sleep disorder.

Table 1 Karolinska Sleepiness Scale (KSS)

Manual-driving study

A total of 18 people participated in the manual-driving experiment (12 males and six females), ranging in age from 27 to 56 years (mean [MN] = 34 years, standard deviation [SD] = 8 years). On average, the participants had possessed a driver’s license for 16 years (max = 38 years, min = 8 years, SD = 9 years). The study took place in a Mercedes-Benz moving-base simulator with a dome on a hexapod platform and a 12-m axis for linear motion. A detailed Mercedes-Benz S-Class cabin was placed inside the dome, and a 360° projection of the scenery was used throughout the drive to provide a realistic surrounding (Fig. 1). A supervisor was present in a separate observation room and was able to interact with the participant via intercom.

Fig. 1 Mercedes-Benz moving-base simulator

After a short pre-questionnaire and briefing, the participants started to drive on a circular two-lane highway 200 km long. Apart from two construction sites, located at Kilometer 62 and Kilometer 88, there were no particularly interesting landmarks on or alongside the road. Participants were instructed to drive at a speed of 130 km/h. On average, slower vehicles with a speed of 100 km/h had to be passed approximately every 2 min and the participants were passed by vehicles with a speed of 160 km/h approximately every 5 min. Furthermore, participants were requested by a recorded voice command to rate their subjective drowsiness level on the KSS. The scale was attached to the inside of the car with several descriptions of the different drowsiness levels (Table 1). Previous evaluations based on recordings of manual driving conditions can be found in chapters 5–8 of the dissertation by Ebrahim (2016), the results of which are not part of this study.

CAD study

The test drives in the CAD experiment took place in a fixed-base driving simulator. All 46 participants completed the study (32 males and 14 females). Their ages ranged from 28 to 57 years (MN = 44 years, SD = 7.0 years). They had had a driver’s license on average for 26 years (max = 42 years, min = 11 years, SD = 7.6 years). To simulate the driving environment, three flat screens (each 65 in. in diameter) were positioned 2.54 m in front of the steering wheel of a car mockup, with the two outer screens tilted 144° relative to the one in the middle (Fig. 2).

Fig. 2 Mercedes-Benz fixed-base simulator

An attendant supervisor was separated by a pin board from each participant and did not interact with them. Realistic sounds of headwind, tires, and engine were produced by speakers in response to the speed and acceleration of the simulated drive on a circular two lane highway with a length of 108 km. After arriving at the simulator site, the participants had to fill out a pre-questionnaire and were informed about the CAD function for the drive. The participants started the drive with a short manual introduction phase in which they got comfortable with the simulator and its dynamics by driving manually and performing several overtaking maneuvers. Since the implemented CAD system was new to all participants, a short practical introduction phase for the CAD function followed. During this time, the participants became familiar with all the prerequisites and limitations of the system and were guided by the examiner. The total duration of the manual and CAD introduction phase was 10 min.

To activate the CAD mode, the driver had to be driving in one of the two driving lanes and pull a lever on the side of the steering wheel. Additionally, this part of the route had to be “officially” approved for CAD (which was always the case, aside from small time periods explained later). Furthermore, the braking and acceleration pedal had to be released at the moment of CAD system activation and the driving speed had to be slower or equal to 110 km/h. To ensure the different modes were recognized correctly by the driver, different images in the middle of the speedometer display showed the current state of the CAD system (CAD not available, available, or active). Once switched on, the CAD function managed all lateral and longitudinal movements with a speed of 100 km/h. During an active CAD mode, the driver always had the possibility to take back control by pressing the brake or acceleration pedal, pushing the lever used for the activation of the system, or by changing lanes by steering. In each of these cases, the automated mode switched off immediately and the driver was back in charge of driving. Since CAD is limited to specific prerequisites that cannot always be guaranteed, it still relies on the driver reacting and taking back control if the system reaches these limits. For this reason, a basic reaction ability of the driver is always required. To ensure this, the driver had to confirm alertness requests within a given time of 5 s, similar to the alertness requests in trains, at intervals of either 30 s (23 participants) or 180 s (23 participants). If the driver failed to confirm the alertness request or the system reached its limit, a takeover request prompted the driver to take back the control of driving from the system. The CAD mode remained active during this time for up to 5 s (hand-over time) until the driver took over, and it switched off after 5 s if the driver did not take back control. 
During the drive, the participants faced several predefined situations, in which an ending of the availability of the CAD function on the highway was simulated. Directly after the maximum 5 s of hand-over time, different simulated road scenarios required adequate action by the driver. These situations, subsequently referred to as takeover situations, were relatively easy to handle and appeared after 30, 52, 70, and 92 min. An additional final situation was triggered dynamically. This situation was used to test the drivers’ reaction ability by challenging them in their potentially severest state of drowsiness. To ensure that all participants entered the first four situations at the same time, a leading vehicle drove in front of them, which the participants were asked not to overtake. Due to a blind spot on the side of the mockup that made it impossible to see overtaking vehicles, no faster cars overtook the driver during the experiment. Apart from the vehicles at the beginning, the leading vehicle, and the vehicles in the different situations, no vehicles were standing or driving in the same direction as the driver. The 15-min interval of the request to rate the KSS was occasionally prolonged by up to 2 min if it would have occurred during one of the predefined takeover situations. During the KSS query, the scale was presented to the drivers on the center screen and was switched off by the examiner immediately after the verbal KSS estimation of the driver. Unlike in the manual study, the scale only had annotations for the steps KSS = 1, 3, 5, 7, and 9, but not for the steps in-between, shown in Table 1. A detailed explanation of the situations and the reactions of the drivers can be found in a previously published article (Schmidt, Stolzmann, & Karrer-Gauß, 2016), as well as the evaluation of the process of drowsiness.

Signal measuring and use of existing signal processing methods

To record the driver behavior, the participants were equipped in both studies with an EOG measuring system called actiCAP (Brain Products GmbH, 2009) and the head-mounted eyetracker Dikablis (Ergoneers GmbH, 2016). Two electrodes were placed above and below the right eye to get the vertical EOG signal, two on the left and right side of the head for the horizontal EOG signal, and two more on the right and left mastoid bone to gather a reference signal and exclude noise. The camera of the head-mounted eyetracker was directed toward the left eye. Both eyes were used for the evaluation process (EOG on the right eye; head-mounted eyetracker on the left eye), under the assumption that both eyes would blink simultaneously (Collewijn et al., 1985). The two measuring devices had no visible effect on the driving, which was analyzed in Ebrahim (2016) and Schmidt, Stolzmann, and Karrer-Gauß (2016). The head-mounted eyetracker used infrared light to record the movements of one eye very precisely in the dark conditions of the experiments. Its resolution was 384 × 288 pixels. The eye occupied approximately 75% of the image width and 50% of the image height. The camera was adjusted individually for each participant to a fixed position. This reduced the influence of head and body movements on the results of the blink detection. Furthermore, the use of a head-mounted eyetracker excluded effects of the algorithmic detection of the eye region that is necessary for remote eyetrackers. The EOG signals were recorded at 250 Hz and the eyetracker data at 25 Hz. For this study, the EOG signal was downsampled to 50 Hz (\( f_1 \)) and 25 Hz (\( f_2 \)). Since the timestamp normally used for the synchronization was corrupted by lags caused by the signal communication between the measuring devices, distinctive eye movements in the video were manually synchronized with peaks in the EOG signal together with the eyetracking data.
The synchronized outcome was evaluated on several additional time frames from the entire drive that were distinct from the ones used for synchronization.

To detect eye blinks within the EOG signals, a signal processing procedure developed by Ebrahim (2016) was used. It was chosen due to its reported high recall and precision rate for blink events in the alert and drowsy driving phases. For a more detailed explanation, refer to pages 55 to 60 in the dissertation by Ebrahim (2016). In the following, the use of the algorithm for the EOG system with \( f_1 \) will be referred to as detection process a1, and the use of the algorithm with \( f_2 \) as detection process a2.

For the eye blink detection with the head-mounted eyetracker, an image processing algorithm developed by Fuhl et al. (2016) was used in combination with an algorithm for the pupil detection (Fuhl, Kübler, Sippel, Rosenstiel, & Kasneci, 2015). The algorithm of Fuhl et al. (2016) was chosen because the generated signal represents a direct measurement of the eyelid distance instead of an indirect measure based on motion detection, classification, color contrast or missing eye regions. Further, in contrast to the indirect measures by EOG, this algorithm can be later used in drowsiness detection algorithms to estimate the drowsiness level based on the direct distance of the eyelids. Since the algorithm developed by Fuhl et al. (2016) used data recorded with the same head-mounted eyetracker as in the presented manual and CAD studies, good signal estimations for the eyelid distances in the experiments were expected. Refer to the studies by Fuhl and colleagues (Fuhl et al., 2016; Fuhl et al., 2015) for more details of the image-processing algorithms. The output of the algorithm Fuhl et al. (2016) described the palpebral aperture in pixels according to the recorded video. An open eye was estimated with a value of approximately 144 pixels. The second algorithm by Fuhl et al. (2015) generated a binary signal for a detected or an undetected pupil.

Developed blink detection algorithm

Simple methods such as fixed thresholds for the classification of eye blinks without signal preprocessing did not detect blinks accurately for different participants in this study. The reasons for that are based on noise, misclassifications, and high inter- and intra-individual differences of the participants.

Therefore, a new blink detection algorithm was developed that included preprocessing and several participant-dependent thresholds. The different steps of the process are outlined below. In the following, \( x_i \) denotes the measured eyelid distance in pixels at time \( t_i \), MN the mean value, and SD the standard deviation.

1. Outlier removal A

The raw signal of the eyelid distance included several extrema that were not plausible, possibly derived from noise or wrong interpretations of the image processing method. Outliers were individually defined for each participant as eyelid distance values outside the range between the 1st percentile (eyelid distance: \( \mathrm{th}_{outl} \)) and the 99th percentile (eyelid distance: \( \mathrm{th}_{outh} \)) of all eyelid distance values during the whole experiment.

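All eyelid distance values \( x_i \notin [\mathrm{th}_{outl}, \mathrm{th}_{outh}] \) were replaced by \( x_i = \mathrm{MN}(x_j, x_k) \), where \( x_j, x_k \in [\mathrm{th}_{outl}, \mathrm{th}_{outh}] \) are the nearest valid samples before and after \( x_i \); that is, the associated times \( t_j < t_i \) and \( t_k > t_i \) had minimal values of \( t_i - t_j \) and \( t_k - t_i \), respectively.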
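The replacement rule of this step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `remove_outliers_percentile` is hypothetical, NumPy's default linear percentile interpolation is assumed, and at the borders of the recording (where only one valid neighbor exists) the single available neighbor is used, which the text does not specify.

```python
import numpy as np

def remove_outliers_percentile(x, lower_pct=1.0, upper_pct=99.0):
    """Replace samples outside the [1st, 99th] percentile band by the mean
    of the nearest in-band samples before and after them."""
    x = np.asarray(x, dtype=float)
    th_outl, th_outh = np.percentile(x, [lower_pct, upper_pct])
    valid_idx = np.flatnonzero((x >= th_outl) & (x <= th_outh))
    out = x.copy()
    for i in np.flatnonzero((x < th_outl) | (x > th_outh)):
        pos = np.searchsorted(valid_idx, i)  # number of valid samples before i
        neighbours = []
        if pos > 0:
            neighbours.append(x[valid_idx[pos - 1]])  # nearest valid sample before i
        if pos < valid_idx.size:
            neighbours.append(x[valid_idx[pos]])      # nearest valid sample after i
        # Assumption: at the signal borders only one neighbour may exist.
        out[i] = np.mean(neighbours)
    return out
```

Because the percentiles are computed per recording, the band adapts to each participant without a fixed pixel limit.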

2. Outlier removal B

Outliers with lower amplitudes were identified after the first step and replaced using neighboring data points, similar to a moving average filter with a constraint. All eyelid distances \( x_l \) that were larger than \( \mathrm{MN}(x_{l-2}, x_{l-1}, x_{l+1}, x_{l+2}) + 3\,\mathrm{SD}(x_{l-2}, x_{l-1}, x_{l+1}, x_{l+2}) \) were replaced by \( x_l = \mathrm{MN}(x_{l-1}, x_{l+1}) \). Similar to the first step, this replaced single values outside the 99th percentile of their four neighboring values, since an eyelid closure was longer than one sample.
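A minimal sketch of this neighbor-based spike removal follows. The function name `remove_spikes` is hypothetical. Two details are assumptions not stated in the text: the population standard deviation (`ddof=0`) is used, and every sample is tested against the unmodified input signal rather than against already corrected neighbors.

```python
import numpy as np

def remove_spikes(x, n_sd=3.0):
    """Replace single upward spikes that exceed mean + 3 SD of their four
    neighbours by the mean of the two adjacent samples."""
    x = np.asarray(x, dtype=float)
    out = x.copy()
    for l in range(2, len(x) - 2):
        neighbours = x[[l - 2, l - 1, l + 1, l + 2]]
        limit = neighbours.mean() + n_sd * neighbours.std()  # assumption: ddof=0
        if x[l] > limit:
            out[l] = 0.5 * (x[l - 1] + x[l + 1])
    return out
```

Only upward spikes are replaced, so downward excursions, which may be genuine eyelid closures, are left untouched.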

3. Filtering (optional)

The resulting signal was filtered using a third-order Savitzky–Golay filter and frame window size seven (Savitzky & Golay, 1964).
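For illustration, the filter can be applied with SciPy's `savgol_filter`. The wrapper name `smooth_eyelid_signal` is hypothetical, and the edge handling (SciPy's default `mode="interp"`) is an assumption, since the text does not describe how the borders of the signal were treated.

```python
import numpy as np
from scipy.signal import savgol_filter

def smooth_eyelid_signal(x, window_length=7, polyorder=3):
    """Third-order Savitzky-Golay smoothing over a seven-sample window."""
    # Assumption: SciPy's default edge mode ("interp") is used at the borders.
    return savgol_filter(np.asarray(x, dtype=float), window_length, polyorder)
```

A third-order filter of this kind reproduces any cubic trend exactly, so slow opening and closing movements are preserved while sample-to-sample noise is attenuated.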

4. Identification of a participant-dependent eyelid distance

Each participant had different values for the eyelid distances during driving. Therefore, a general eyelid distance for open eyes (\( \mathrm{th}_{o} \)) was calculated for each driver using all eyelid distance values between Minutes 15 and 30 of driving. This excluded effects of the driver’s adaptation to the environment and driving task. To account for the eyelid closure behavior in both manual and conditionally automated driving, the selection of \( \mathrm{th}_{o} \) included consideration of the changing behavior of the drivers in CAD. Drivers might close their eyes for longer intervals or look downward. Thus, only the eyelid distance values above the average of all eyelid distance values were used to calculate \( \mathrm{th}_{o} \). Out of those values, \( \mathrm{th}_{o} \) was defined as the mode, so that the most common eyelid distance position served as the individual threshold for open eyes.
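The selection of the open-eye threshold can be sketched as follows. The function name `open_eye_threshold` and the time vector in seconds are hypothetical. Rounding the distances to whole pixels before taking the mode is an assumption, made so that a most frequent value exists for non-integer signals.

```python
import numpy as np

def open_eye_threshold(x, t_seconds, start_min=15.0, end_min=30.0):
    """Mode of the above-average eyelid distances between minutes 15 and 30."""
    x = np.asarray(x, dtype=float)
    t_seconds = np.asarray(t_seconds, dtype=float)
    window = x[(t_seconds >= start_min * 60.0) & (t_seconds < end_min * 60.0)]
    upper = window[window > window.mean()]  # ignore closures and downward gaze
    # Assumption: distances are rounded to whole pixels so that a mode exists.
    values, counts = np.unique(np.round(upper), return_counts=True)
    return float(values[np.argmax(counts)])
```

Restricting the mode to above-average values keeps long closures or downward gaze in CAD from pulling the threshold toward small eyelid distances.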

5. Eyelid movement minimum detection

All eyelid closures contained an eyelid closing and opening phase. On this basis, all local minima in the signal lower than or equal to \( \mathrm{th}_{o} \) were identified over the entire drive and collected in the set \( M = \{ x_m \mid x_m \text{ is a local minimum} \wedge x_m \le \mathrm{th}_{o} \} \). The corresponding times are summarized by \( T = \{ t_m \mid x_m \in M \} \). If a minimum contained more than one data point, the point with the lowest index was chosen.
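One possible way to find these minima, including the plateau rule, is sketched below. The function name `local_minima_below` is hypothetical, and the definition of a local minimum (a strict descent into the point or plateau, followed by a strict ascent out of it) is an assumption.

```python
import numpy as np

def local_minima_below(x, th_o):
    """Indices of local minima with value <= th_o; for flat minima the
    lowest index of the plateau is returned."""
    x = np.asarray(x, dtype=float)
    minima = []
    i, n = 1, len(x)
    while i < n - 1:
        if x[i] < x[i - 1]:                      # signal is falling into x[i]
            j = i
            while j < n - 1 and x[j + 1] == x[i]:
                j += 1                           # walk across a flat plateau
            if j < n - 1 and x[j + 1] > x[i] and x[i] <= th_o:
                minima.append(i)                 # lowest index of the plateau
            i = j + 1
        else:
            i += 1
    return np.array(minima, dtype=int)
```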

6. Clustering of eyelid movements

All detected eyelid closure values from the fifth step were clustered using a k-means clustering method with k = 3 (Arthur & Vassilvitskii, 2007). The two values separating the clusters are denoted \( \mathrm{th}_{u} \) and \( \mathrm{th}_{d} \), with \( \mathrm{th}_{u} > \mathrm{th}_{d} \). A third value was calculated as \( \mathrm{th}_{m} = \mathrm{MN}(\mathrm{th}_{d}, \mathrm{th}_{u}) \). The smallest threshold \( \mathrm{th}_{d} \) was interpreted as a distinctive threshold for a complete eye closure with the upper eyelid being very close to the lower eyelid: \( M_d = \{ m_o \in M \mid m_o \le \mathrm{th}_{d} \} \). To distinguish blinks without a fully closed gap between the upper and lower eyelid, but with a small eyelid distance, \( M_m \) contained all local minima with eyelid distances above \( \mathrm{th}_{d} \) and below or equal to \( \mathrm{th}_{m} \): \( M_m = \{ m_o \in M \mid \mathrm{th}_{d} < m_o \le \mathrm{th}_{m} \} \). Minima larger than \( \mathrm{th}_{m} \) and lower than or equal to the upper threshold \( \mathrm{th}_{u} \) had a larger gap between the eyelids: \( M_u = \{ m_o \in M \mid \mathrm{th}_{m} < m_o \le \mathrm{th}_{u} \} \). All values larger than \( \mathrm{th}_{u} \) were very close to \( \mathrm{th}_{o} \) and could be interpreted as smaller eyelid movements rather than blinks.
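The derivation of the three thresholds can be illustrated with a small one-dimensional k-means. Everything here is a sketch under assumptions: the function names are hypothetical, a deterministic quantile initialization replaces the k-means++ seeding of the cited method, and the separating values are taken as the midpoints between adjacent cluster centers, which the text does not spell out.

```python
import numpy as np

def kmeans_1d(values, k=3, n_iter=100):
    """Plain Lloyd iterations on scalar data; returns sorted cluster centres."""
    values = np.asarray(values, dtype=float)
    # Assumption: quantile initialisation instead of k-means++ seeding.
    centres = np.quantile(values, (np.arange(k) + 0.5) / k)
    for _ in range(n_iter):
        labels = np.argmin(np.abs(values[:, None] - centres[None, :]), axis=1)
        new = np.array([values[labels == c].mean() if np.any(labels == c)
                        else centres[c] for c in range(k)])
        if np.allclose(new, centres):
            break
        centres = new
    return np.sort(centres)

def closure_thresholds(minima_values):
    """Return (th_d, th_m, th_u) from the eyelid distances at the minima."""
    c = kmeans_1d(minima_values, k=3)
    th_d = 0.5 * (c[0] + c[1])  # assumption: midpoint between adjacent centres
    th_u = 0.5 * (c[1] + c[2])
    th_m = 0.5 * (th_d + th_u)
    return th_d, th_m, th_u

def classify_minimum(value, th_d, th_m, th_u):
    """Assign a minimum to Md, Mm, Mu, or to the non-blink remainder."""
    if value <= th_d:
        return "Md"
    if value <= th_m:
        return "Mm"
    if value <= th_u:
        return "Mu"
    return "movement"
```

Because the thresholds come from each participant's own minima, the three categories scale with individual eye geometry and camera placement.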

7. Determining the start and end values of the detected eyelid movements

For all eyelid movements \( M_d \), \( M_m \), and \( M_u \), the eyelid distances at their start (\( S_d \), \( S_m \), and \( S_u \)) and end (\( E_d \), \( E_m \), and \( E_u \)) were determined. The start and end points are the local maxima of the eyelid distance signal before (\( S_d \), \( S_m \), and \( S_u \)) and after (\( E_d \), \( E_m \), and \( E_u \)) the eyelid distance signal intersected with the thresholds \( \mathrm{th}_{d} \), \( \mathrm{th}_{m} \), and \( \mathrm{th}_{u} \). If several minima of the same subgroup (e.g., \( \mathrm{md}_1 \) and \( \mathrm{md}_2 \) with \( \mathrm{md}_1, \mathrm{md}_2 \in M_d \)) had the same eyelid distance value (here: \( \mathrm{md}_1 = \mathrm{md}_2 \)) and the same start and end points (here: \( \mathrm{sd}_1 = \mathrm{sd}_2 \wedge \mathrm{ed}_1 = \mathrm{ed}_2 \) with \( \mathrm{sd}_1, \mathrm{sd}_2 \in S_d \) and \( \mathrm{ed}_1, \mathrm{ed}_2 \in E_d \)), the ones with higher indices were discarded from the sets (here: \( \mathrm{md}_2 \), \( \mathrm{sd}_2 \), and \( \mathrm{ed}_2 \)). If they had the same start and end points but different eyelid distance values (e.g., \( \mathrm{md}_1 < \mathrm{md}_2 \)), the ones with the higher eyelid distances were discarded (here: \( \mathrm{md}_2 \)). Note that several eyelid movements of the set \( M_d \) can lie within the boundaries of one single eyelid movement of the set \( M_m \); similarly, several eyelid movements of the set \( M_m \) can lie within one single eyelid movement of the set \( M_u \).
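The search for the start and end points of one eyelid movement might look like the sketch below. The function name `blink_bounds` is hypothetical. It assumes that the search walks outward from the minimum to the threshold crossing and then keeps climbing while the signal does not fall, so that flat maxima are crossed completely; the resulting overlaps between neighboring events are what Step 8 resolves.

```python
import numpy as np

def blink_bounds(x, minimum_index, threshold):
    """Return (start, end) indices of the local maxima that enclose the
    threshold crossings around a detected minimum."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    s = minimum_index
    while s > 0 and x[s] <= threshold:       # walk left to the crossing
        s -= 1
    while s > 0 and x[s - 1] >= x[s]:        # climb to the local maximum
        s -= 1
    e = minimum_index
    while e < n - 1 and x[e] <= threshold:   # walk right to the crossing
        e += 1
    while e < n - 1 and x[e + 1] >= x[e]:    # climb to the local maximum
        e += 1
    return s, e
```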

8. Separate eyelid movements

In case of overlapping sections between the ending of one eyelid movement and the start of the subsequent one caused by several equal values in the signal, the plateau was split into two equal parts with its center defining the end and the start point, respectively, of the two consecutive eyelid movements.

9. Approval of an eye closure

The availability of the pupil signal from Fuhl et al. (2015) was used to check the visibility of the pupil in the samples between the start and end points of the eyelid movements \( M_d \), \( M_m \), and \( M_u \), excluding the start and end samples. An eyelid movement was discarded if the pupil was visible in all investigated samples of the potential eye blink.
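The approval rule reduces to a check on the binary pupil signal, as in this sketch. The function name `approve_closure` is hypothetical, and keeping candidates that have no interior samples is an assumption, since the text does not cover that case.

```python
import numpy as np

def approve_closure(pupil_detected, start, end):
    """Return True if the candidate is kept, False if it is discarded
    because the pupil stayed visible throughout."""
    inner = np.asarray(pupil_detected, dtype=bool)[start + 1:end]
    # Assumption: a candidate without interior samples cannot be checked and is kept.
    if inner.size == 0:
        return True
    return not inner.all()
```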

10. Amplitude clustering

In this step, the amplitude of the eyelid movements of the set \( M_u \) was calculated with the equation \( \mathrm{Amp}_q = \min(\mathrm{su}_q - \mathrm{mu}_q,\ \mathrm{eu}_q - \mathrm{mu}_q) \), with \( \mathrm{su}_q \in S_u \), \( \mathrm{mu}_q \in M_u \), and \( \mathrm{eu}_q \in E_u \). The resulting values were sorted with a 3-means cluster algorithm (Arthur & Vassilvitskii, 2007). All eyelid movements in the group with the smallest amplitudes were excluded as potential blinks.

11. Data threshold

If the eyelid distances in the minima (\( M_d \), \( M_m \), and \( M_u \)) were greater than 100 pixels, they were excluded. A value of 100 pixels corresponds to an eye openness of approximately 70%.

12. Merge eyelid movements A

If at least one eyelid movement event with a smaller eyelid distance lay between the start and end points of an event with a bigger eyelid distance, a check was performed to determine whether each nested event originated from noise. As the criterion, the biggest distance between the threshold \( \mathrm{th}_{m} \) or \( \mathrm{th}_{d} \) and the eyelid distance in the minimum or minima of the set \( M_m \) or \( M_d \), respectively, was calculated. If the distance of one of the nested eyelid movements was smaller than or equal to d = 7 pixels (approximately 4% of eyelid closure in relation to the threshold), the associated minimum or minima of the set \( M_m \) (or \( M_d \), respectively) were discarded, and the start and end points of the enclosing event with the bigger eyelid distance (of the set \( M_u \) or \( M_m \), respectively) were retained. If the distance or distances were larger than d, the associated eyelid distances in the sets \( S_u \), \( E_u \), and \( M_u \) (or \( S_m \), \( E_m \), and \( M_m \), respectively) were eliminated, and the event with the smaller eyelid distances was retained.

13. Merge eyelid movements B

All eyelid closure events in the three categories Md, Mm, and Mu were merged into one single set of blinks with the set of start point Sa, minima Ma, and end point Ea.

14. Combine eyelid movement events

Due to occasional erroneous peaks in the eyelid distance signal, the described procedure could split one eyelid movement of one blink into several parts. To distinguish one single event from multiple ones, the pupil detection signal was checked during the time between the start and end samples extended on each side by one sample of two consecutive eyelid closure events. If the pupil was not detected between the two potential blinks, the two events were combined.
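The combination rule can be illustrated with a short sketch. The function name and data layout are hypothetical: events are assumed to be (start, end) sample indices sorted by time, and the pupil signal is assumed to be one boolean per sample. The inspected gap is interpreted here as running from one sample before the end of the first event to one sample after the start of the second event; this reading of the one-sample extension is an assumption.

```python
def combine_events(events, pupil_detected):
    """Merge consecutive events when no pupil is detected in the gap between them.

    events: list of (start, end) sample indices, sorted by start.
    pupil_detected: list of booleans, one entry per sample.
    Assumption: the gap is extended by one sample on each side, i.e. it spans
    the samples from end_of_first - 1 to start_of_second + 1 (inclusive).
    """
    if not events:
        return []
    merged = [events[0]]
    for start, end in events[1:]:
        prev_start, prev_end = merged[-1]
        lo = max(prev_end - 1, 0)
        hi = min(start + 1, len(pupil_detected) - 1)
        gap = pupil_detected[lo:hi + 1]
        if not any(gap):
            # Pupil never visible in the gap: treat both parts as one blink.
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged
```

Because each new event is compared with the last merged event, a blink that was split into more than two parts is reassembled in a single pass.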

15. Adjust detected phases (optional)

The signal contained noise that impaired the exact detection of the start and end points of a blink. To improve the detection, the raw signal was filtered using a Savitzky–Golay filter with a polynomial order of three and a frame window size of seven (Savitzky & Golay, 1964). Similar to Step 7, a check was made afterward to determine whether each eyelid distance of the sets Sa and Ea resulting from Step 14 was still located on a local maximum of the filtered signal. If a start point was on a monotonically decreasing slope or an end point on a monotonically increasing slope, it was extended toward the start or end of the experiment, respectively, until it reached a local maximum.
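A minimal sketch of this optional step is given below. It uses the standard Savitzky–Golay smoothing coefficients for a seven-sample window and a cubic polynomial, (−2, 3, 6, 7, 6, 3, −2)/21. The function names are hypothetical, and leaving the first and last three samples unfiltered is a simplifying assumption; the edge handling used in the study is not described.

```python
# Savitzky-Golay smoothing coefficients for window length 7 and a cubic fit.
SG_COEFFS = (-2, 3, 6, 7, 6, 3, -2)
SG_NORM = 21


def savgol_7_3(signal):
    """Smooth a signal with a 7-sample, third-order Savitzky-Golay filter.

    Assumption: the three samples at each edge are copied unchanged.
    """
    half = len(SG_COEFFS) // 2
    out = list(signal)
    for i in range(half, len(signal) - half):
        window = signal[i - half:i + half + 1]
        out[i] = sum(c * v for c, v in zip(SG_COEFFS, window)) / SG_NORM
    return out


def adjust_boundaries(filtered, start, end):
    """Move start and end outward until each sits on a local maximum.

    A start point on a falling slope is moved toward the beginning of the
    recording; an end point on a rising slope is moved toward its end.
    """
    while start > 0 and filtered[start - 1] > filtered[start]:
        start -= 1
    while end < len(filtered) - 1 and filtered[end + 1] > filtered[end]:
        end += 1
    return start, end
```

A filter of this kind reproduces polynomials up to third order exactly, which is why it smooths noise while largely preserving the shape of the closing and opening flanks.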

Due to the trade-off between good detection of blinks through several minima in the eyelid distance signal and the smoothing of the signal, the filtering of the raw signal after the removal of outliers in a late step (without using Step 3 – with Step 15; subsequently referred to as the filtered method; detection process: a3) was additionally evaluated in a comparison with an algorithm in which we applied filtering of the data at an early step (with Step 3 – without using Step 15; subsequently referred to as the raw method; detection process: a4). To evaluate the proposed detection processes a3 and a4 in combination with the chosen image-processing algorithm, the two proposed signal-processing algorithms by Bergasa et al. (2006) (detection process a5) and Sukno et al. (2009) (detection process a6) were implemented and tested on the same sequences as the other algorithms.

Definition of eyelid movements categorized as blinks

There have been different research approaches to measuring video-recorded eye blinks. Fogelton and Benesova (2016) compared different research groups, evaluating the different developed algorithms on the same data set recorded by Pan et al. (2007). The numbers of labeled ground truth blinks varied by nearly 7%. Differences in the lowest position of the upper eyelid during blinks were also identified by Jiang et al. (2013). To guarantee the precision and consistency of the labeling process and the characterization of each detected eyelid movement event of the algorithms implemented in this work, the lowest point of the upper eyelid was analyzed between the detected start and end samples of the detected eyelid movement events. To characterize the events, the middle and border of the pupil were used as the references for separating the eye into four regions where the border of the eyelid could reach its lowest point during an eyelid movement event (Fig. 3).

Fig. 3 Defined eye regions

For this study, an eyelid movement categorized as a blink in the video data had to fulfill the following requirements:

  1. The palpebral aperture had to gradually decrease at the beginning, reach its minimum, and then increase at the end of the eyelid movement, all between the determined start and end samples.

  2. The eyelid had to cover enough of the pupil in the closed phase to block the vision temporarily; that is, the lowest point of the upper eyelid had to reach either region R1 or R2, defined in Fig. 3.

  3. The pupil had to be at least partially visible at both the start and end of the eyelid movement.

Depending on the maximum descent of the upper eyelid during the eyelid movement, the eyelid movements were further categorized as Aa (reached region R1) and Ab (reached region R2). Note that due to the complexity of the definition of the start and end of an eyelid movement, a detected eyelid movement was classified as a blink if Requirements 1–3 were fulfilled within the range of the detected start and end. To give a detailed overview of the specific types of other detections, they were defined in several subclasses:

Ac :

All eyelid closure events that fulfilled Requirements 1 and 3 but in which the lid only reached region R3 during the detected boundaries were put in this class, counted as Ac events.

Ad :

Detected eye closures with the upper eyelid in region R1 or R2, but whose determined start and/or end sample failed one or both of Requirements 1 and 3, were put in this class. These eye closures had to fulfill an additional requirement: The real start and/or end fulfilling Requirements 1 and 3 had to lie within a 200-ms time window of the detected start and end points, respectively.

Ae :

All detected blink-related events with the upper eyelid in regions R1 or R2 but that did not fulfill the additional requirement of eye closures for Ad were assigned to this class.

Af :

All other events in which the eyelid only moved within or remained inside region R4 (e.g., gaze movements) were put in this category, describing non-blink-related (Af) events.

In case of multiple blinks inside one detected event, only one of the blinks (the one with the lowest eyelid distance) was assigned to one of the categories described above. All other blinks of a multiple-blink event were added to the undetected blinks and affected the detection rate.
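The classification scheme above can be summarized as a small decision function. This is only a sketch: the function and parameter names are hypothetical, the regions R1 to R4 are encoded as the integers 1 to 4, and the three boolean inputs are assumed to come from manual labeling. The case of an event that reaches only region R3 but fails Requirement 1 or 3 is not covered by the definitions above, so the sketch returns None for it.

```python
def classify_event(lowest_region, req1, req3, true_bounds_within_200ms=False):
    """Assign a detected eyelid movement to one of the classes Aa to Af.

    lowest_region: 1-4, the region (R1-R4) reached by the upper eyelid.
    req1, req3: whether Requirements 1 and 3 hold for the detected boundaries.
    true_bounds_within_200ms: whether the real start/end lies within 200 ms
        of the detected start/end (only relevant for Ad vs. Ae).
    """
    if lowest_region not in (1, 2, 3, 4):
        raise ValueError("lowest_region must be 1, 2, 3, or 4")
    if lowest_region == 4:
        return "Af"  # not blink related, e.g., gaze movements
    if lowest_region in (1, 2):
        if req1 and req3:
            return "Aa" if lowest_region == 1 else "Ab"  # blinks
        return "Ad" if true_bounds_within_200ms else "Ae"
    # Region R3: counted as Ac only if the boundaries are valid.
    if req1 and req3:
        return "Ac"
    return None  # combination not specified in the text
```

Classes Aa and Ab count as blinks, Ac to Ae as blink-related events, and Af as non-blink-related events.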


To obtain a detailed evaluation of the performance of algorithms based on various recording devices and signal-processing approaches (EOG vs. head-mounted eyetracker unfiltered vs. head-mounted eyetracker filtered) in different driving modes (CAD vs. manual), signal frequencies (f1 vs. f2), and participant conditions (awake vs. drowsy), comparable driving sequences had to be analyzed for each case. In the experiments, an alert phase was defined as a section in which the participants reported a KSS ≤ 6, and a drowsy phase as one with KSS > 6. This was based on the instructions given to the participants to rate themselves with a KSS of 8 or 9 if they were no longer able to drive. With respect to a future drowsiness detection algorithm based on blink detections, the driving phases with the state KSS = 7 were analyzed together with the phases in which the drivers had clearly rated themselves as no longer able to drive. In this way, the influence of rising drowsiness on blink detection in the transition phase from alert to drowsy was evaluated together with severe drowsiness in phases with a KSS of 8 or 9.

Due to missing data and errors in the data recordings, four participants in the manual-driving study and five participants in the CAD study had to be excluded. One participant in the manual-driving experiment did not become drowsy; thus, 13 participants experienced the drowsy phase and 14 participants the alert phase in the manual-driving experiment. To examine a comparable number of participants in the CAD experiment, 16 participants were randomly chosen from the four groups (Group 1: Start at 6 p.m. + 30-s alertness requests; Group 2: Start at 6 p.m. + 180-s alertness requests; Group 3: Start at 10 p.m. + 30-s alertness requests; Group 4: Start at 10 p.m. + 180-s alertness requests). A further constraint on the selection of the participants of the CAD study was that they had experienced both an alert and a drowsy phase. For the analysis of all participants, 2 min from the alert phase and 2 min from the drowsy phase were chosen using a random algorithm (Wong & Easton, 1980). It was ensured that the chosen parts from the manual-driving study did not occur during a KSS query or in a 2-min time window following the KSS task. For the CAD experiment, the 2-min sections in the alert and drowsy phases were chosen as continuous windows during a switch to CAD mode, not occurring during a takeover situation, a KSS query, or the 2 min after them. Due to the short and frequently experienced alertness requests, no additional exclusion criterion was applied. This resulted in 118 min of data as input for the evaluation. Overall, 2,941 eyelid movements in categories Aa and Ab were assessed in the video of the head-mounted eyetracker. These are further described as ground truth eye closures in the two categories GTa (upper eyelid reached region R1) and GTb (upper eyelid reached region R2) (see Table 2).
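The total of 118 min follows directly from the participant counts given above, as the short check below shows. The variable names are illustrative only.

```python
# Phase counts as reported in the text.
manual_alert, manual_drowsy = 14, 13   # manual-driving experiment
cad_participants = 16                  # each contributed an alert and a drowsy phase
minutes_per_phase = 2

manual_minutes = (manual_alert + manual_drowsy) * minutes_per_phase
cad_minutes = cad_participants * 2 * minutes_per_phase
total_minutes = manual_minutes + cad_minutes

print(manual_minutes, cad_minutes, total_minutes)  # 54 64 118
```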

Table 2 Overview of the ground truth data

Three detection rates were calculated to evaluate the algorithm: the true positive rate (TPR), the blink-related false detection rate (FDRBR), and the not-blink-related false detection rate (FDRNBR), based on the classification provided above. These were defined as follows:

$$ TPR=\frac{\left|A_a\right|+\left|A_b\right|}{\left|GT_a\right|+\left|GT_b\right|} $$

$$ FDR_{BR}=\frac{\left|A_c\right|+\left|A_d\right|+\left|A_e\right|}{\left|A_a\right|+\left|A_b\right|+\left|A_c\right|+\left|A_d\right|+\left|A_e\right|+\left|A_f\right|} $$

$$ FDR_{NBR}=\frac{\left|A_f\right|}{\left|A_a\right|+\left|A_b\right|+\left|A_c\right|+\left|A_d\right|+\left|A_e\right|+\left|A_f\right|} $$

|Az| denotes the number of eyelid movements detected by the algorithms in category Az (z = a, b, c, d, e, or f), and |GTz| the number of ground truth eye closures in category GTz (z = a or b), labeled and categorized as stated above.
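The three rates can be computed from the category counts as in the following sketch. The function name and the dictionary layout are assumptions made for illustration, and the counts in the usage example are invented and do not come from the study.

```python
def detection_rates(detected, ground_truth):
    """Compute TPR, FDR_BR, and FDR_NBR from category counts.

    detected: dict mapping 'a'..'f' to the counts |A_a|..|A_f|.
    ground_truth: dict mapping 'a' and 'b' to the counts |GT_a| and |GT_b|.
    """
    total_detected = sum(detected[k] for k in "abcdef")
    tpr = (detected["a"] + detected["b"]) / (ground_truth["a"] + ground_truth["b"])
    fdr_br = (detected["c"] + detected["d"] + detected["e"]) / total_detected
    fdr_nbr = detected["f"] / total_detected
    return tpr, fdr_br, fdr_nbr


# Hypothetical counts, for illustration only.
example_detected = {"a": 40, "b": 40, "c": 5, "d": 3, "e": 2, "f": 10}
example_truth = {"a": 50, "b": 50}
tpr, fdr_br, fdr_nbr = detection_rates(example_detected, example_truth)
```

Note that the TPR is normalized by the ground truth count, whereas both false detection rates are normalized by the number of all detections, so the three values do not sum to one.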

The distinction between blink-related and not-blink-related events was made because the blink-related events showed a strong similarity with blinks and contained a part of a true positive (part of a real blink). Furthermore, it could not be ruled out that events in the category Ac would have been reassigned to the category Ab if the labeling had been based on a video recording with a sampling rate higher than the 25 Hz of the head-mounted eyetracker.

The labeling process was started by labeling all eyelid movement events detected with EOG at sampling frequency f1. To limit extensive labeling, the eyelid movements detected using the EOG algorithm with the lower frequency f2 and the presented algorithm using the eyetracker data were checked for matching areas with a true positive from the start and end points generated by the first detection process a1. If the complete intersection between the labeled eyelid movement (starting point sl, end point el) and the corresponding boundaries from a2 (starting point so2, end point eo2), a3 (starting point so3, end point eo3), or a4 (starting point so4, end point eo4) exceeded P1 = 80% for eyelid movements of less than 500 ms, the event was categorized with the same label as the previously labeled event. If the time between sl and el exceeded 500 ms, the time interval Δt of the intersection between the boundaries derived from a1 and aw (w = 2, 3, or 4) had to exceed \( P_2 = 100\% \cdot \left(1 - \frac{100\ \mathrm{ms}}{\Delta t}\right) \) in order to adopt the same label. This was based on the natural time of an eye closure or opening, which Stern et al. (1984) described for longer eye closures. Eyelid movements with an intersection of less than P1 or P2 were labeled manually.
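The label-transfer rule can be sketched as follows. The function names are hypothetical and all times are assumed to be in milliseconds. The sketch expresses the intersection as a share of the labeled event's duration and takes Δt in the P2 formula to be that same duration; both points are interpretations rather than statements from the text. Under this reading, the two thresholds coincide at 500 ms (1 − 100/500 = 0.8), and the rule for long events amounts to tolerating at most 100 ms of non-overlap.

```python
def overlap_ms(a_start, a_end, b_start, b_end):
    """Length of the intersection of two time intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def can_adopt_label(s_l, e_l, s_o, e_o):
    """Decide whether a detected event may inherit the label of a labeled one.

    s_l, e_l: start and end of the labeled event from process a1 (ms).
    s_o, e_o: start and end of the event from process a2, a3, or a4 (ms).
    Assumption: the overlap is measured relative to the labeled duration.
    """
    duration = e_l - s_l
    if duration <= 0:
        return False
    share = overlap_ms(s_l, e_l, s_o, e_o) / duration
    if duration < 500:
        required = 0.80                      # P1
    else:
        required = 1.0 - 100.0 / duration    # P2
    return share > required
```

Events for which the function returns False would fall back to manual labeling.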

Since each of the research groups behind the detection algorithms a5 and a6 defined blinks differently from the definition proposed in this article, each eyelid movement event detected by the algorithms a5 and a6 was labeled separately. The events were labeled as true positives depending on the individual definition of the authors. All other detected events that fulfilled the definition of a blink-related event as described above were categorized as blink-related events if they differed from the individual definition of a blink.

As a summary for the evaluation of the main research goals, the results split by measurement technique (EOG 25 Hz vs. eyetracker 25 Hz), driver state (awake vs. drowsy), level of automation (manual vs. CAD), and signal frequency (50-Hz EOG vs. 25-Hz EOG) are given in Table 3, which shows the mean ± standard deviation (MN ± SD) of the detection rates of the eye closures. Since the detection rates with a3 were best for the video recording, those results were selected for the subsequent evaluation. Two detailed tables showing the performance of all algorithms are included in the Appendix.

Table 3 Performance of the blink detection algorithms (in %) with their MN ± SD

To evaluate the labeling process, all events of the basis detection with EOG and the sampling frequency f1 were labeled three additional times by an independent labeler. The event-based matching was 98.5% for the manual drives and 95.6% for CAD.


The results in Table 3 show that the detection rates TPR, FDRBR, and FDRNBR of eyelid movements vary between the different subgroups analyzed. A Friedman test showed significant differences in the TPR [χ²(2) = 22.052, p < .001] and the FDRNBR [χ²(2) = 28.056, p < .001]. In the direct comparison between the EOG system (f2) and the best result for the head-mounted eyetracker operated at the same frequency (a3), the rates are very similar, with few exceptions. The results show a significantly higher FDRNBR for the video recordings (Z = –3.2, p = .001, r = .42). The Bonferroni correction was used for the significance level of this and all subsequent post hoc tests. A larger difference between a2 and a3 is also visible in the detection rates between the awake CAD phases, favoring the video recordings. This difference was no longer significant after correcting the significance level. This indicates that for a higher level of automation and for the purpose of simulator or real road driving studies, a camera might be better suited for blink detections than an EOG system under the same constraints. One reason could be that EOG is an indirect measuring technique, whereas the video recordings rely on two direct methods (visibility of the pupil and eyelid distance). For manual driving, the EOG technique provides an advantage. During the labeling process, several problems related to each of the technologies were noted. EOG had problems with long eye closures, small amplitudes with vertical saccades resulting in signal courses similar to eyelid closures, and eye movements during eye closures. On the other hand, the image-processing step of the eyetracker video for the eyelid detection had problems with disturbances by eyelashes, the blockage of the lower eyelid during the drop of the upper eyelid, and wrinkles in the skin or edges around the eye that were misinterpreted as eyelids.
Shorter eyelid distances and longer eye closures are especially likely to occur during drowsy phases, contributing to the different problems of both measuring devices. This can be seen in the significant drop (Z = –4.172, p < .001, r = .45) in the correct detection rates from the alert to the drowsy phases (drop between 2% and 16%), independent of the measuring frequency (f1 vs. f2), measuring technology (EOG vs. video), and automation level (manual vs. CAD). Despite the different constraints in the experiments (speed, alertness, and takeover requests), the drivers were always aware of the relatively high travelling speed and their responsibilities in the CAD experiment. In particular, the short manual-driving portions during the takeover situations enhanced the drivers’ awareness of the need to be prepared for a takeover. The general and significant decrease in the correct detection rate from manual to automated driving (a drop of 9% to 29% in the detection rate of the EOG- and video-based algorithms; U = 1,832.500, p < .001, r = .49) and the significant increase in the FDRBR (U = 2,940.500, p = .004, r = .22) cannot be explained by the experiments’ constraints. Therefore, they imply a change in the eyelid movement behavior of the drivers, confirming the results of Schmidt, Braunagel, et al. (2016). This further shows that detection rates obtained during manual driving should not be applied universally to CAD mode. Comparing the two measurement frequencies f1 and f2, significantly better detections (improvements between 3% and 10%; Z = –4.736, p < .001, r = .62) can be seen in the outcome of the blink detection with the higher frequency. This effect is greater for CAD and indicates a higher potential for detecting blinks using higher frequencies for the measuring system in CAD. On the basis of these results, along with the findings from Picot et al. (2009), this effect is also expected when using different measurement frequencies for video recordings.
The four signal-processing algorithms tested (a3, a4, a5, and a6) show significant differences in the accuracy of eye closure event detection (see the detailed results in the Appendix). A possible reason for this could lie in the adaptation of the two re-implemented algorithms from Bergasa et al. (2006) and Sukno et al. (2009) to their own image-processing signals and study constraints, which do not apply to the present study.

The introduced subcategories Ab and Ac give a valuable indication of the differences in the eyelid behavior during manual driving. Since the decrease in the rates of Ab and Ac from an awake to a drowsy driver is larger than the overall missed events and the decrease in the correct detection rate, it can be concluded that alert drivers usually do not close their eyes as far during a blink as drowsy drivers do. This might be due to the urge to look at the street and the attempt to reduce the vision lost during a blink to a minimum.

The other subcategories of the FDRBR show the potential of improvement of the TPR in the blink detection process. The high rates for the CAD phases imply that the adapted blink behavior of the drivers during passive driving, with features such as longer shut times and eye closure and opening phases, causes more difficulties for the detection process.

Conclusion and future work

The goal of this article was to study different influences on blink detection. The results indicate that the detection of blinks is primarily influenced by the level of automation, the driver state, the measurement frequency, and the algorithms used. In this respect, the results quantify the influence on blink detection and show how known research methods can be used to investigate the various influences. Furthermore, the introduced detection methods offer a new approach for detecting blinks and can be implemented in other studies using our detailed description of the signal-processing steps. A detailed classification of eyelid movements reveals additional states in eyelid movement and behaviors of drivers in the tested circumstances. This could also be used as the common basis for comparing different approaches to blink detection in future studies. The video of a head-mounted eyetracker enabled a highly accurate labeling process, which is more difficult to obtain with a remote eyetracker. Using the head-mounted eyetracker, it was possible to show the results of the blink detection process without any influence of head movements and to gain greater insight into the influences of the automation mode and the drowsiness state of the driver. These were the main reasons why a head-mounted eyetracker was selected as the exemplary video source for this study. The use of other signal-processing methods and their lower performance shows the dependency of single blink detection methods on specific image-processing methods and the states of the drivers during the recordings of the studies. Therefore, adapting methods to different conditions and drivers is necessary for accurate blink detection. In contrast to the EOG method, detection based on video images allows more information to be extracted from the video than just a single signal describing the eyelid distance.
The variety of video-based blink detection methods could be used to improve the correct detection rate by combining several of them in a single algorithm. A higher video-recording measuring rate could increase the accuracy of the detection as well. Aside from an improvement in the correct detection rate, future analyses should focus on the examination of driver behavior during CAD, especially during drowsy driving. To build on the detected blink sequences, further information about the driver behavior, such as the blink frequency, duration of the eyelid closure, amplitude, or velocity of the eyelid closing and opening, should be evaluated. In this way, drivers could be accurately classified as too drowsy to continue driving in the conditionally automated mode. Overall, we showed that video recordings can be used to detect driver behavior in the future modes of travelling as a replacement for the estimation by steering behavior. Further studies should examine the influence of a remote system on the detection rates.