1 Introduction

1.1 Introduction to driver distraction and IVIS design

Imagine going on a road trip with a friend who is driving their car. The first thing you notice is how much attention your friend devotes to the touchscreen while navigating the In-Vehicle Infotainment System (IVIS). Alas, the touchscreen provides no auditory feedback. This vignette highlights several human factors problems common in the driving context: the driver is distracted visually (looking at the touchscreen), physically (reaching out to it), and cognitively (manipulating its content) (NHTSA 2010; [87]). Can the IVIS be designed differently to minimize these distractions? This question motivated our research.

Multiple resource theory holds that humans have multiple attentional resource pools. When people perform more than one task, performance suffers less if the tasks draw on different resource pools (e.g., visual type: focal vs. peripheral vision; modality: visual vs. auditory) than if they draw on the same pool [82]. For example, when the primary task (driving) requires visual resources, the secondary task (IVIS interaction) can be designed to rely on non-visual resources [77]. Because mid-air hand gestures are monitored through proprioception, they can be performed with minimal visual attention, an advantage over button or touchscreen controls [45]. Gesture-based interfaces have thus emerged as a promising alternative to touchscreen interfaces for IVIS interactions, significantly decreasing off-road glances and lowering driver workload [30, 71]. Along the same lines, auditory displays have been explored in driving environments and shown to decrease visual distraction [33, 44, 68]. Adopting both approaches, the present study aims to improve driving safety, usability, and workload in gesture-based menu navigation by investigating the effectiveness of adding auditory displays.

1.2 Optimizing IVIS interactions: the role of auditory displays

Research has shown that combining gesture-based interaction with auditory displays can be effective in reducing driver distraction [65, 71]. However, the choice of auditory display must be made carefully. Although well-implemented auditory systems can reduce visual distraction and keep a driver’s eyes on the road, poorly implemented systems may have the opposite effect, imposing high mental demand and requiring long glances away from the roadway to acquire information about system status. To realize their potential, auditory displays must be kept sufficiently simple while presenting accurate feedback, so as to reduce a driver’s cognitive demand [13]. In previous work, spearcons (compressed speech) [79] have outperformed other auditory displays, including earcons [6] and auditory icons [27]. However, the details of spearcon design for the in-vehicle gesture interface context have not yet been determined. The current study seeks to identify this design granularity to make it a truly user-centered interface.

1.3 In-vehicle air gesture menu navigation system

Before discussing the specific application of air gesture systems in vehicles, it is useful to consider the broader category of technologies that can be applied to the in-vehicle context. Spoken Dialogue Systems (SDS) represent a complex integration of human–computer interaction technologies designed to enable verbal communication between users and systems. These systems, which rely on speech recognition, natural language processing, and speech synthesis, facilitate tasks ranging from simple queries to complex procedural interactions [49, 86]. They are already pervasive in vehicles but remain imperfect, with problems including language coverage, recognition errors, noise sensitivity, and limited support for repetitive or precise control. Touchscreens with multi-touch capabilities are an alternative, allowing drivers to use gestures they are accustomed to from smartphones [31]. However, touchscreens still require drivers to reach out to the display, which adds physical demand, and research shows that well-designed gesture interfaces can reduce visual distraction compared to touchscreens [71].

Transitioning from these dialogue-focused interfaces to more directly controllable graphical user interfaces, many systems use the WIMP paradigm ('windows, icons, menus, pointer'), a framework in user interface design that uses graphical elements to facilitate human–computer interaction. This interaction style, which became the foundation for most graphical user interfaces, relies on windows to contain different tasks, icons to represent functions and files, menus for command selection, and a pointer to navigate and select items [11]. We chose a WIMP-based interaction style for our study because of its familiarity to most users and its proven efficiency in computer systems. While other interaction styles, such as those based on touch or voice commands, could also be considered, the WIMP approach provides a reliable and well-understood framework for extending traditional graphical user interfaces into the in-vehicle context, potentially easing the learning curve and enhancing the user's sense of control when interacting with the system.

Different menu types have also been investigated. For example, May et al. [47] designed a one-dimensional menu system in which users could scroll up or down to navigate a list of up to eight common in-vehicle functions. Sterkenburg et al. [71], on the other hand, developed two-dimensional grid menu navigation prototypes (2 × 2 or 4 × 4) in which users could move their hand along two axes (left-right and up-down), moving a visual cursor from one menu item to another. We followed the latter design and extended it to three pages.

Various studies have evaluated in-vehicle air gesture menu navigation. Researchers have compared gesture systems to existing touchscreen systems, e.g., Gable et al. [24], Graichen et al. [30], May et al. [47], Sterkenburg et al. [71], and Wu et al. [84]. For example, Walker and colleagues [24, 48, 84] showed that driving performance was equivalent between the two systems, but the air gesture system resulted in more short glances away from the road, and participants perceived higher overall workload when using the air gesture menu navigation system. In contrast, Sterkenburg et al. [71] showed that both systems resulted in comparable driving performance and driver workload. The auditory-supported air gestures allowed drivers to keep their visual focus on the road, but slightly decreased secondary task performance compared to the touchscreen. Given that people are already familiar with touchscreen systems, this secondary task outcome is understandable. Note that only the auditory-supported air gesture system led to improved visual attention, which demonstrates the importance of auditory displays in this context. In a subsequent experiment, Sterkenburg et al. (2023) evaluated control orientation: horizontal (mouse metaphor using the x and z axes) vs. vertical (direct manipulation using the x and y axes). Although there were no differences in performance, vertical controls showed significantly lower workload than horizontal controls. Thus, the present study adopted the vertical control method.

1.4 In-vehicle air gesture system with auditory displays

Research then converged towards evaluating feedback modalities, whether unimodal, bimodal, or trimodal. May et al. [47], Jaschinski et al. [32], Shakeri et al. [65, 66], and Sterkenburg et al. [70,71,72] evaluated the auditory modality. Large et al. [43] and Shakeri et al. [65, 66] evaluated the tactile modality. Roider and Raad [57] and Shakeri et al. [65, 66] evaluated the peripheral visual modality.

Among research efforts that evaluated the auditory modality, May et al. [47] provided fast but intelligible speech feedback, as well as non-speech sounds for system status, while users navigated a one-dimensional menu list. Sterkenburg and colleagues [70,71,72] conducted multiple evaluations examining the effects of speech displays on in-vehicle air gesture controls for a two-dimensional grid menu. They showed that combined auditory and visual feedback lowered the frequency of off-road glances and reduced driver workload. However, the addition of auditory displays did not have a significant impact on lane departures or secondary task performance. All three evaluations [70,71,72] showed a significant reduction in visual distraction without degrading driving performance, supporting the common inference that prototypes with auditory displays clearly improve on prototypes without them. Two other evaluations were conducted by Shakeri et al. [65, 66] to assess different types of feedback (visual, auditory, haptic, and peripheral visual; 2017) and bimodal feedback added to ultrasound feedback (2018) for a sequential gesture execution secondary task. In both evaluations, auditory feedback was presented as earcons directly mapped to six gestures and played after a gesture was executed. Shakeri et al. [65] showed that auditory feedback resulted in better secondary task performance than tactile feedback but worse than visual feedback, and significantly reduced time spent looking away from the road; however, all feedback conditions resulted in similar driving performance. Shakeri et al. [66] then provided corroborating results: the bimodal auditory-ultrasound condition produced less time looking away from the road than the visual and ultrasound-visual conditions, while driving performance was similar across all conditions. Additionally, auditory feedback resulted in the numerically highest secondary task performance, which was significantly better than unimodal ultrasound feedback, was preferred by 47% of participants, and showed significantly less physical demand than the visual conditions. Recently, Moustafa et al. (2023) compared auditory icons, earcons, spearcons, and a no sound condition in the context of in-vehicle air gesture menu navigation. They showed that spearcons produced the least visual distraction, the lowest workload, and the best system usability, and were favored by participants.

Although some mixed results exist, the potential of auditory displays to improve driving safety when multitasking with an IVIS air-gesture interface is evident. Whereas the use of auditory displays has shown benefits for air gesture IVIS operation in general and menu navigation in particular, there has been no in-depth analysis of how to design each auditory display type. Most auditory menus have explored the use of speech or earcons. Although Moustafa et al. (2023) investigated different auditory displays (auditory icons, earcons, spearcons), that study used only one design for each auditory cue. Ambiguity remains concerning how auditory displays should be designed and how different auditory supports for air gesture navigation affect primary driving safety and secondary task performance. Therefore, we aim to bridge this gap in the literature by conducting an exploratory study to ultimately provide informed design guidelines for spearcons, which showed the best outcomes in the literature.

1.5 Auditory displays in vehicles

Much research has been conducted on the use of auditory displays inside the vehicle, either to support the driving task such as with warning signals [29] or to support secondary tasks such as with the navigation of infotainment systems [33, 70]. Researchers commonly classify auditory displays under two labels: non-speech sounds such as earcons and auditory icons, and speech sounds.

Earcons [6] are non-verbal synthetic sounds, usually abstract musical tones or sound patterns, that can be used in structured combinations such as menus and typically have an arbitrary relationship with the referent item or action. Auditory icons [27] are brief non-verbal sounds associated with objects, functions, or actions; they use elements of the analogic sound of the referent. Auditory icons utilize familiar sounds from the environment, making them immediately recognizable and intuitive for conveying information or alerts [27]. They also offer a repertoire of sound options to map to the referent: they can directly represent the referent using a sound it produces, or can be indirectly related through a sound produced by a surrogate of the referent [36]. Spearcons (“Speech-based Earcons”) [80] are brief auditory cues created by running text through a text-to-speech (TTS) algorithm and then time-compressing the result to produce faster speech without altering its pitch. Spearcons provide a direct, non-arbitrary mapping to the item they represent. Although spearcons are based on speech, they can become unintelligible at high compression and are therefore often classified as non-speech auditory cues.

Sabic et al. [60] examined the recognition of auditory icons, spearcons at two compression speeds (40% and 60% of original length) and TTS as car warning signals. They showed that auditory icons had significantly lower recognition accuracy than TTS, while spearcons’ accuracy was not significantly different from either but numerically better than auditory icons. Auditory icons also had significantly slower reaction times compared to spearcons and TTS, and 40% spearcons produced significantly faster response times compared to TTS. Results also indicated no significant differences between the 40% spearcons and 60% spearcons in terms of accuracy, reaction time, perceived temporal demand, and perceived annoyance. Moreover, a trend in the reaction time data suggested a direct correlation with the compression rate of spearcons: a 40% compression yielded the fastest reaction times, whereas a 100% compression (full speech) resulted in the slowest. Sabic et al. [59] assessed the effectiveness of spearcons, TTS and auditory icons under various background noise conditions while driving in terms of recognition accuracy, reaction time and inverse-efficiency scores. Overall, auditory icons were the least efficient, and spearcons only outperformed TTS in quiet environments without added noise sources such as music or talk-radio.

In terms of air gesture menu navigation tasks, only one study has compared all three non-speech auditory cues: auditory icons, earcons, and spearcons (Tabbarah et al. 2023). It showed that spearcons reduced visual distraction and workload, led to the best system usability, and were most favored by participants, which led us to conduct the present study with spearcons.

The present study aims to improve driving safety, usability, and workload in gesture navigation by investigating the effectiveness of adding auditory displays. Specifically, it focuses on the effects of different spearcon compression rates on these factors in the context of in-vehicle air gesture menu navigation.

2 Current study and hypotheses

Although the use of spearcons has shown positive results, research on alternative spearcon designs is still scarce, with a few exceptions [16, 58, 60, 69]. Sabic and colleagues [58, 60] evaluated 40% and 60% compression speeds within a larger evaluation of auditory car warning display recognition (2017), and thoroughly examined spearcon recognition at compression speeds ranging from 100% (TTS) to 20% in 10% decrements (2016). As opposed to other evaluations of auditory displays [17, 59, 60, 75], Sabic and Chen [58] did not provide any training on the auditory displays before conducting their study. They evaluated participants’ ability to recognize a spearcon word without prior training and identified an intelligibility threshold of about 75% at 40% compression, below which identification rates declined rapidly. 40% spearcons and TTS resulted in similar recognition efficiency even though 40% spearcons were responded to significantly faster, suggesting a tradeoff between compression speed and reaction time on one hand and accuracy on the other. Srbinovska et al. [69] examined the impact of training on spearcon recognition at different compression rates. Training significantly increased recognition rates compared to untrained scenarios at 20% compression (82–89% versus 25–47%), 25% compression (84–95% versus 39–56%), and unintelligible compressions down to 10% (82–95% versus 6–16%) on a word-by-word basis. Trained participants expressed more confidence in their ability to recognize spearcons and found the recognition task less difficult as their familiarity increased. Finally, Davidson et al. [16] evaluated trained spearcon recognition in a dual-task environment involving linguistic tasks such as reading, saying, and listening. They found that sound-producing secondary tasks (saying and listening) worsened spearcon identification when multitasking, while non-sound-producing tasks (reading) did not, because the former create competition for the auditory modality and verbal processing resources, consistent with multiple resource theory.

From this background, the current study investigated in-vehicle air gesture menu navigation interfaces with a focus on alternative spearcon designs that vary the compression rate. A significant gap exists in understanding the effects of speech compression in dual-task conditions, especially while driving. To this end, we posed the following research questions.

RQ1: How does adding spearcons affect air-gesture IVIS interaction in terms of driving performance, eye glance behavior, secondary task performance, perceived workload, and user experience?

Hypothesis 1: Adding spearcons will improve driving performance, eye glance behavior, secondary task performance, perceived workload, and user experience compared to text-to-speech (TTS) or no auditory display condition.

The literature shows that spearcons have faster recognition times than TTS, and that 40% compression is the threshold at which participants understand about 75% of words. However, spearcon recognition has been evaluated only as a primary task, in which participants could allocate all of their cognitive resources to the stand-alone recognition task. It is therefore of interest to test whether recognizing 40% spearcons in a secondary task context requires more cognitive resources, affecting selection times and increasing visual demand during menu navigation. Because the defining design parameter of a spearcon is temporal, we also suspect that 40% spearcons may induce a sense of urgency and hence result in higher perceived temporal demand.

RQ2: How do different spearcon compression rates affect air-gesture IVIS in terms of driving performance, eye glance behavior, secondary task performance, perceived workload, and user experience?

Hypothesis 2: 70% spearcon will provide the most efficient secondary task performance with faster selection times than 40% spearcon and TTS, and less mental and temporal demand compared to 40% spearcon.

Hypothesis 3: 40% spearcons will result in higher visual distraction than 70% spearcons and TTS, but less than the no auditory display condition.

3 Methods

3.1 Menu and interaction design

We developed a 2 × 2 grid menu selection system (see Fig. 1) with four square targets, each measuring 10 × 10 cm in the air gesture space, inspired by the design by Sterkenburg et al. [71] that proved most efficient. To expand the number of menu items, we created two additional pages, allowing the user to access 3 (pages) × 4 (options) = 12 menu choices, which better represents the high-level main menu structures of real in-vehicle displays. To preserve fidelity, each of the 12 menu items represented an IVIS option present in commercial vehicles. For the four auditory display conditions, we generated four sets of equivalent menu items (Table 1).

Fig. 1

Air gesture navigation prototype in developer view and the menu display screen

Table 1 Menu sets for experiment

Our gesture menu selection system comprises four gestures, each mapped to an IVIS action: system activation, search and navigation within a menu page, switching between menu pages, and selection (see Table 2). We introduced an “activation gesture” to initiate the operator-system interaction and avoid accidental gestures due to inadvertent hand movements. To achieve stimulus-response compatibility, we used the familiar swiping motion to navigate between pages, similar to a finger swipe on touchscreens and smartphones. To keep the design simple, the menu system wrapped around with a unidirectional swipe: from page 1, participants could only swipe right to page 2, then to page 3, then back to page 1. A swipe-right gesture was therefore synonymous with a “next page” command. For selection, users tap on the desired menu item.

Table 2 Gesture and action library
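To make the wrap-around paging and single-selection rules concrete, the following is a minimal Python sketch of that logic. The menu item labels, class structure, and function names are illustrative only and are not taken from the actual prototype.

```python
# Illustrative sketch of the wrap-around page cycling and single-selection
# logic described above. Menu item names are hypothetical, not the exact
# labels used in the prototype.

MENU_PAGES = [
    ["Navigation", "Media", "Phone", "Climate"],        # page 1 (2 x 2 grid)
    ["Seat Heating", "Defrost", "Cruise", "Settings"],  # page 2
    ["Bluetooth", "Radio", "Messages", "Camera"],       # page 3
]

class GestureMenu:
    def __init__(self):
        self.active = False        # set by the activation gesture
        self.page = 0              # current page index
        self.selection_made = False

    def activate(self):
        self.active = True

    def swipe_right(self):
        """Unidirectional swipe: page 1 -> 2 -> 3 -> 1 (wrap-around)."""
        if self.active:
            self.page = (self.page + 1) % len(MENU_PAGES)

    def tap(self, cell):
        """Tap selects one of the four cells (0-3) on the current page.
        Only one selection is allowed per command; further taps are ignored."""
        if self.active and not self.selection_made:
            self.selection_made = True
            return MENU_PAGES[self.page][cell]
        return None

menu = GestureMenu()
menu.activate()
menu.swipe_right()        # move to page 2
print(menu.tap(3))        # -> "Settings"
```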

Visibility of system status is fundamental to interactive UI design. Particularly with in-vehicle interfaces, we aimed to provide efficient, continuous visibility of the gesture menu IVIS state. Air gesture interfaces present an additional challenge in communicating system status because the user must be informed about hand tracking and gesture recognition states [23, 28]. Accordingly, we included a visual display that communicates gesture recognition status through simple binary visual feedback noticeable in the user’s peripheral vision. A green background indicated that the system was activated, and a red highlight indicated the user’s hand position within the 3D menu, as depicted in Fig. 1. For all conditions, two sound cues provided feedback on successful gesture execution: a “swoosh” sound after a swiping gesture and a generic digital confirmation sound (“click”) after a selection was made.
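As an illustration of how the red position highlight could be driven, the sketch below maps a tracked palm position onto one of the four 10 × 10 cm cells. The coordinate origin and axis conventions are assumptions for the example, not details of the actual implementation.

```python
# Sketch of mapping a tracked palm position onto the 2 x 2 grid for the red
# highlight feedback. The 10 x 10 cm cell size comes from the menu design
# above; the origin and axis conventions below are assumptions.

CELL_CM = 10.0  # each square target measures 10 x 10 cm in the gesture space

def highlight_cell(palm_x_cm, palm_y_cm):
    """Return (row, col) of the cell under the palm, or None if outside
    the grid. Origin assumed at the lower-left corner of the 20 x 20 cm area."""
    if not (0 <= palm_x_cm < 2 * CELL_CM and 0 <= palm_y_cm < 2 * CELL_CM):
        return None
    col = int(palm_x_cm // CELL_CM)
    row = int(palm_y_cm // CELL_CM)
    return row, col

print(highlight_cell(4.0, 14.0))   # -> (1, 0): upper-left cell highlighted
```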

3.2 Spearcon design

Spearcons were created using an online text-to-speech (TTS) engine with an American male voice and then applying the SOLA (synchronized overlap-add) algorithm of the Spearcon Factory software [79] to generate spearcon WAV files. We chose to evaluate spearcons at two compression rates: 40% and 70% of the original TTS length. Sabic and Chen [58] identified the intelligibility threshold of spearcons at the 40% compression speed. Moreover, Sabic and Chen [58] and Sabic et al. [60] found that 40% spearcons were responded to significantly faster than TTS. 70% is the default compression rate of the Spearcon Factory [79]. In addition to showing the best performance in Tabbarah et al. (2023), 70% spearcons had numerically faster recognition times than TTS for unrelated words [58]. We consequently chose to evaluate 40% and 70% spearcons in the experiment. Participants neither received training on the spearcons nor were provided with a visual representation of the menu items within the menu structure. Training has been shown to increase spearcon recognition from 52 to 89% for 25% spearcons [69]. The absence of training encouraged recognition rather than recall and supports stronger inferences about the use of spearcons for larger menu structures containing more menu items.
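The study's compression step used the SOLA algorithm from the Spearcon Factory. As a rough illustration of the idea, the sketch below instead uses librosa's phase-vocoder-based time stretching, which likewise shortens duration without shifting pitch; the file names are placeholders.

```python
# Approximate sketch of the spearcon time-compression step. The study used
# the SOLA algorithm from the Spearcon Factory; librosa's time_stretch is
# used here only as an illustration. File names are placeholders.
import librosa
import soundfile as sf

def make_spearcon(tts_wav_path, out_path, target_fraction):
    """Compress a TTS clip to target_fraction of its original duration
    (e.g., 0.4 for a 40% spearcon, 0.7 for a 70% spearcon)."""
    y, sr = librosa.load(tts_wav_path, sr=None)
    # rate > 1 speeds playback up: compressing to 40% length needs rate = 1/0.4
    y_fast = librosa.effects.time_stretch(y, rate=1.0 / target_fraction)
    sf.write(out_path, y_fast, sr)

make_spearcon("navigation_tts.wav", "navigation_spearcon40.wav", 0.4)
make_spearcon("navigation_tts.wav", "navigation_spearcon70.wav", 0.7)
```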

3.3 Experimental design and independent variable

The study followed a within-subject repeated-measures design. Each participant engaged in a 90-minute session and experienced all four auditory display conditions: 40% spearcon, 70% spearcon, TTS, and no auditory display (see Table 3). The order in which participants experienced the auditory conditions was fully counterbalanced to minimize order effects.

Table 3 Experimental design
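Because four conditions yield 4! = 24 possible orders, full counterbalancing can assign one unique order to each of the 24 analyzed participants. A minimal sketch of generating such an order list follows; the assignment logic is illustrative.

```python
# Sketch of full counterbalancing: 4 conditions yield 4! = 24 unique orders,
# one per analyzed participant. Condition labels mirror the design.
from itertools import permutations

conditions = ["40% spearcon", "70% spearcon", "TTS", "No audio"]
orders = list(permutations(conditions))          # 24 unique orders
for participant_id, order in enumerate(orders, start=1):
    print(participant_id, " -> ".join(order))
```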

3.4 Dependent measures

The dependent measures for this study were classified into five categories: driving (primary task) performance, eye glance behavior, menu navigation (secondary task) performance, perceived workload, and user experience.

3.4.1 Driving performance

Driving behavior can be described as the actions taken by the driver to “maintain lateral and longitudinal control of the vehicle to safely move the occupants of a vehicle from one point to another” (Smith 2018). Four driving metrics were recorded, and their standard deviations served as dependent variables in this study. Standard deviation measures how dispersed the data are relative to the mean, which is indicative of driving consistency and of drivers’ ability to maintain control of their vehicle while performing a non-driving secondary task. The driving dependent measures are listed below (a computation sketch follows the list):

  • Standard deviation of following distance (the distance maintained by the driver between their vehicle and the leading vehicle directly ahead): indicative of longitudinal vehicle control

  • Standard deviation of lane deviation: indicative of lateral vehicle control

  • Standard deviation of steering wheel angle: indicative of lateral vehicle control

  • Standard deviation of vehicle speed: indicative of longitudinal vehicle control
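As referenced above, here is a minimal sketch of how these four standard deviation measures could be computed from a simulator log. The column names and CSV layout are hypothetical and do not reflect the MiniSim export format.

```python
# Sketch of computing the four driving-performance measures from a simulator
# log with one row per sample. Column names are hypothetical.
import pandas as pd

log = pd.read_csv("drive_log.csv")

driving_measures = {
    "sd_following_distance": log["following_distance_m"].std(),
    "sd_lane_deviation":     log["lane_deviation_m"].std(),
    "sd_steering_angle":     log["steering_wheel_angle_deg"].std(),
    "sd_speed":              log["speed_mph"].std(),
}
print(driving_measures)
```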

3.4.2 Eye glance behavior

NHTSA (2012) guidelines indicate that 85% of off-road eye glances should last less than 2 s. A naturalistic analysis conducted on the last five seconds prior to a near-crash incident discovered that drivers had an average longest off-road glance lasting 1 s [39]. Of those glances, 36% targeted the visual display of an IVIS or locations similarly away from the forward roadway. To understand what a glance is, we first need to define a gaze. A gaze can be explained as the direction towards which the eyes are directed. A glance is hence defined as the transition to or from the area of interest (AOI) and maintaining visual gaze within the boundaries of the AOI for at least one fixation.

Accordingly, eye glances were placed into three categories based on their duration: short (< 1 s), medium (1–2 s), and long (> 2 s). Four variables were hence evaluated (a classification sketch follows the list):

  • Frequency of short, medium, and long glances

  • Dwell Time: total glance duration for a single menu selection task
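As referenced above, a short sketch of how off-road glance durations can be binned into the three categories and aggregated into dwell time for one selection task; the example durations are made up.

```python
# Sketch of binning off-road glance durations (in seconds) into the three
# categories above and computing dwell time for one selection task.
# The glance durations listed are made-up example values.

def classify_glances(durations_s):
    counts = {"short": 0, "medium": 0, "long": 0}
    for d in durations_s:
        if d < 1.0:
            counts["short"] += 1
        elif d <= 2.0:
            counts["medium"] += 1
        else:
            counts["long"] += 1
    return counts

glances = [0.6, 0.8, 1.4]           # off-road glances during one selection task
print(classify_glances(glances))    # {'short': 2, 'medium': 1, 'long': 0}
print(sum(glances))                 # dwell time = 2.8 s
```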

3.4.3 Menu navigation performance

  • Selection accuracy: the percentage of correct selection tasks during a single driving scenario.

  • Selection time: time elapsed between the offset of the auditory selection command and the execution of a selection gesture.

3.4.4 Workload

Subjective workload was measured using the widely used NASA-TLX (Hart 1988). Participants rated their perceived workload on a 20-point scale for six subscales: mental demand, physical demand, temporal demand, effort, performance, and frustration. They then performed pairwise comparisons between the six subscales based on which contributed more to their overall workload. A weighted average was calculated to indicate perceived overall workload, and results are presented as scores out of a maximum of 100.
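For reference, a sketch of the standard weighted NASA-TLX computation follows. The rescaling of the 20-point ratings to a 0-100 score is our assumption about how the reported percentages were derived, and the example ratings and weights are made up.

```python
# Sketch of the weighted NASA-TLX computation. Ratings are on the 20-point
# scale used in the study; weights are the tallies from the 15 pairwise
# comparisons (they sum to 15). Rescaling ratings to 0-100 before weighting
# is an assumption about how the reported percentage scores were derived.

ratings = {   # example values on the 20-point scale
    "mental": 12, "physical": 6, "temporal": 9,
    "performance": 5, "effort": 10, "frustration": 4,
}
weights = {   # times each subscale was picked in the pairwise comparisons
    "mental": 4, "physical": 1, "temporal": 3,
    "performance": 2, "effort": 4, "frustration": 1,
}
assert sum(weights.values()) == 15

overall = sum(ratings[k] * 5 * weights[k] for k in ratings) / 15  # 0-100 scale
print(f"Overall weighted workload: {overall:.1f} / 100")          # -> 45.0
```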

3.4.5 User experience

  • System Usability Scale (SUS): widely used by usability practitioners to assess the usability of a product or service [9]. This “quick and dirty” survey consists of 10 questions answered on a 5-point Likert scale; the resulting SUS score ranges from 0 to 100 (a scoring sketch follows this list). Bangor et al. (2009; 2008) described two qualitative interpretations of SUS scores.

  • Sound user experience questionnaire: given only in the conditions that contained an auditory display. The answers were given on a 5-point Likert scale.

  • Preference choice among four sound conditions.
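As referenced in the SUS item above, a sketch of the standard SUS scoring procedure for a single respondent; the example responses are made up.

```python
# Sketch of the standard SUS scoring procedure for one participant.
# Responses are on the 5-point Likert scale (1 = strongly disagree,
# 5 = strongly agree); the ten example responses are made up.

responses = [4, 2, 5, 1, 4, 2, 5, 2, 4, 3]   # items 1-10

contributions = []
for item, score in enumerate(responses, start=1):
    if item % 2 == 1:                 # odd items are positively worded
        contributions.append(score - 1)
    else:                             # even items are negatively worded
        contributions.append(5 - score)

sus_score = sum(contributions) * 2.5  # 0-100 range
print(sus_score)                      # -> 80.0
```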

3.5 Apparatus

3.5.1 Driving simulator

A medium-fidelity National Advanced Driving Simulator (NADS) MiniSim (see Fig. 2) was used to present the driving scenarios. Each scenario lasted around six minutes and consisted of a car-following task on a suburban route with low to moderate traffic. The speed of the lead vehicle varied between 35 and 50 mph. Participants were instructed to maintain a uniform, safe distance while following the lead vehicle (Figs. 3, 4).

Fig. 2

Experimental setup (Driving simulator setup, visual display monitor, Leap motion, and representative participant wearing eye-tracker)

Fig. 3

Frequency of short glances (< 1 s) across different auditory display conditions. **p < 0.00833. Error bars denote standard errors

Fig. 4

Dwell time across different auditory display conditions. **p < 0.00833. Error bars denote standard errors

3.5.2 LEAP motion

A Leap Motion Controller (Model LM-010) was used to detect and track participants' hand movements within its interactive zone. This zone extends more than 60 cm (24 inches) from the device, within a field of view spanning approximately 140 × 120 degrees. The controller's software recognizes 27 distinct hand elements, such as bones and joints, and can maintain tracking even when these elements are partially occluded by other parts of the hand. The device is equipped with two near-infrared cameras, each with a resolution of 640 × 240 pixels and spaced 40 mm apart. These cameras operate within an 850 ± 25 nm spectral range and typically capture images at 120 frames per second, allowing for precise motion detection within 1/2000th of a second.

3.5.3 Eye tracker

A Tobii Pro Glasses 2 eye tracker (sampling rate of 50 Hz) was used to capture participants’ glance behavior during the study.

3.6 Participants

A power analysis determined that a sample size of 24 was needed to achieve 80% power with a medium effect size. Accordingly, a total of 26 participants were recruited. One participant was compensated and excused from the study because the eye-tracking device could not be calibrated to their eyes. Another participant’s driving data were corrupted and were excluded. The 24 participants included in the analysis (14 males and 10 females; age: M = 23.25, SD = 1.88) came from 12 different countries. Language proficiency was not a concern in this study, as all participants were fluent English speakers and students at Virginia Tech, whose admission requirements include English-language proficiency. Each session lasted at most 1 h and 30 min, and each participant was compensated with $15 for their time and contribution.

3.7 Procedure

Participants were first briefed about the experiment and signed a consent form, then completed a short driving scenario serving as training and as a simulation sickness test run [25]. Each participant watched a video tutorial on how to operate the menu-gesture system and was then offered as much time as needed to practice. Before each of the four driving scenarios, participants were introduced to the auditory display and given sufficient time to practice the hand gestures and become familiar with the interface. Familiarization with the auditory display was self-assessed, with participants proceeding only once they felt comfortable that they could navigate the system and interpret the auditory cues. During each data collection scenario, participants performed 12 trials of the secondary menu navigation task. Each trial consisted of a one-second command instructing the participant to select one of the twelve menu items. After the command, a timer started and the participant had 20 s to make a selection; if the 20 s elapsed without a selection, the trial was counted as a failed attempt. The menu gesture system allowed only one selection per command, so an inadvertent selection was also counted as a failed attempt. Secondary task commands were spaced 25–35 s apart. Following each driving scenario, participants completed the NASA-TLX workload assessment, responded to a subjective questionnaire, and filled out the System Usability Scale (SUS) questionnaire. Upon finishing all four scenarios, participants were asked about their preferred auditory condition.
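For illustration only, the sketch below mirrors the trial schedule just described (12 commands, a 20 s response window, and 25-35 s spacing). wait_for_selection() is a hypothetical placeholder for the gesture system's selection handling, not part of the actual experimental software.

```python
# Illustrative sketch of the secondary-task schedule within one driving
# scenario: 12 selection commands, a 20 s response window per command, and a
# random 25-35 s gap between commands.
import random
import time

N_TRIALS = 12
RESPONSE_WINDOW_S = 20

def wait_for_selection(timeout_s):
    """Hypothetical placeholder: block until a selection gesture or timeout.
    Returns the selected item, or None for a failed (timed-out) attempt."""
    time.sleep(timeout_s)   # stand-in; the real system listens to hand tracking
    return None

for trial in range(1, N_TRIALS + 1):
    # the one-second auditory command naming the target item would play here
    result = wait_for_selection(RESPONSE_WINDOW_S)
    outcome = "failed" if result is None else f"selected {result}"
    print(f"Trial {trial}: {outcome}")
    time.sleep(random.uniform(25, 35))   # inter-command spacing
```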

3.7.1 Leap motion controller and gesture interaction

While the Leap Motion Controller provides high-resolution real time tracking data, its performance can be affected by environmental factors such as lighting, hand orientation and hand proximity. These factors were mitigated through a controlled lab environment with consistent lighting and orientation guidance. Furthermore, an introduction video was provided to familiarize participants with the proper hand placement and movements required to interact effectively with the system. Additionally, participants engaged in a practice session to ensure comfort and reduce the likelihood of tracking errors during the actual experiment. The setup aimed for an error margin within acceptable limits defined by the accuracy needed for the hand gestures to be recognized by the system.

3.8 Data analysis

For all data, we planned a repeated-measures analysis of variance (ANOVA) with the auditory display condition as a within-subject variable. To ensure the reliability of the results, we checked the parametric assumptions of the repeated-measures ANOVA: normality of residuals and sphericity.

The normality assumption was checked qualitatively by inspecting the normal quantile plot and the frequency histogram, and quantitatively using the Shapiro–Wilk goodness-of-fit test with a significance level of 0.05.

Sphericity was checked using Mauchly’s Test of Sphericity with a significance level of 0.05.

Depending on whether the data violated the ANOVA assumptions, parametric or non-parametric analyses were performed. A one-way repeated-measures ANOVA was conducted on data conforming to the assumptions, and partial eta-squared was calculated to measure effect size. When significant main effects were present, post-hoc paired-samples t-tests were conducted with the Bonferroni adjustment to control Type-I error. All parametric tests were conducted using JMP 16.0 (SAS Institute Inc., 2020). When departures from the ANOVA assumptions were present, an appropriate transformation was applied to the data (e.g., logarithmic, square-root, exponential) to satisfy the assumptions. When no transformation was appropriate, non-parametric tests were conducted. For all non-parametric data, including ordinal data, the Friedman test was conducted, followed by Bonferroni-corrected Wilcoxon signed-rank tests for pairwise comparisons when applicable. All non-parametric tests were conducted using the 2022 QI Macros statistical add-in for Excel (KnowWare International Inc., 2022).
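The analyses themselves were run in JMP and QI Macros. As a rough Python equivalent of the decision flow described above, the sketch below assumes a hypothetical long-format table with participant, condition, and value columns; the sphericity check is noted but omitted.

```python
# Sketch of the analysis pipeline for one dependent measure, assuming a
# long-format table with columns "participant", "condition", "value".
import pandas as pd
from scipy import stats
from statsmodels.stats.anova import AnovaRM

df = pd.read_csv("dwell_time_long.csv")   # hypothetical long-format export

def is_normal(values, alpha=0.05):
    _, p = stats.shapiro(values)          # Shapiro-Wilk normality test
    return p > alpha

# Normality screened per condition here as a simple approximation of
# checking residual normality.
normal = all(is_normal(grp["value"]) for _, grp in df.groupby("condition"))

if normal:
    # One-way repeated-measures ANOVA (Mauchly's sphericity test, e.g. via
    # pingouin, omitted here for brevity)
    print(AnovaRM(df, depvar="value", subject="participant",
                  within=["condition"]).fit())
else:
    # Friedman test on the participant-by-condition matrix
    wide = df.pivot(index="participant", columns="condition", values="value")
    print(stats.friedmanchisquare(*[wide[c] for c in wide.columns]))
    # Bonferroni-corrected Wilcoxon signed-rank pairwise comparisons
    # (alpha = 0.05 / 6 = 0.0083 for four conditions) would follow.
```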

4 Results

4.1 Driving performance

A logarithmic transformation was performed on the mean and standard deviation of following distance to satisfy parametric assumptions. A repeated-measures ANOVA was conducted on the following distance, lane deviation, and vehicle speed data. The standard deviation of steering wheel angle did not meet parametric assumptions, so the Friedman test was conducted. ANOVA and Friedman test results are presented in Table 4; there were no significant differences between auditory display conditions for any driving performance metric.

Table 4 ANOVA and Friedman test results for driving performance metrics. p < 0.05 indicates significant main effect of the auditory display condition

4.2 Eye glance behavior

Table 5 presents a summary of the eye tracking data collected during the study. Glance frequency and dwell time results are per selection task. The number of glance-free selections is presented per driving scenario, in which each participant performed 12 selection tasks.

Table 5 Descriptive statistics of eye glance behavior. Glance Frequency and Dwell Time are per selection task. Number of glance-free selections is per driving scenario

4.2.1 Short glance frequency

ANOVA results revealed significant differences in short glance frequency between different auditory display conditions, F(3, 69) = 5.4113, p = 0.0021, η2p = 0.19. Post-hoc paired-samples t-tests were conducted, and the results are presented in Table 6.

Table 6 Results of the post-hoc paired-samples t-tests conducted for multiple pairwise comparisons of the frequency of short glances

4.2.2 Medium and long glance frequency

Medium and long glance frequency data did not meet parametric assumptions, so the Friedman test was conducted. Results from the Friedman tests for each dependent variable are presented in Table 7; there were no significant differences between auditory display conditions.

Table 7 Results of the Friedman tests on medium and long glance frequency. p < 0.05 indicates significant differences

4.2.3 Dwell time

An exponential transformation was performed on dwell time data to satisfy parametric assumptions. ANOVA results revealed significant differences in dwell time between different auditory display conditions, F(3, 69) = 6.0898, p = 0.0010, η2p = 0.21. Post-hoc paired-samples t-tests were conducted, and the results are presented in Table 8.

Table 8 Results of the post-hoc paired-samples t-tests conducted for multiple pairwise comparisons of dwell time

4.2.4 Number of glance-free selections per driving scenario

ANOVA results revealed significant differences in the number of glance-free selections between different auditory display conditions, F(3, 69) = 5.0984, p = 0.0030, η2p = 0.18. Post-hoc paired-samples t-tests were conducted, and the results are presented in Table 9.

Table 9 Results of the post-hoc paired-samples t-tests conducted for multiple pairwise comparisons of the number of glance-free selections

4.3 Menu navigation performance

4.3.1 Selection accuracy

An exponential transformation was conducted on the accuracy data to satisfy parametric assumptions. ANOVA results showed a significant main effect of auditory displays on selection accuracy, F(3, 68) = 3.642, p = 0.017, η2p = 0.14 (Fig. 5). Post-hoc paired t-tests with a Bonferroni-corrected alpha level of 0.0083 revealed that 70% spearcons (M = 93.75%, SD = 6.14%) resulted in significantly higher secondary task selection accuracy than TTS (M = 86.46%, SD = 10.66%), t(23) = 3.58, p = 0.0016. Although not significant at the Bonferroni-corrected alpha, 70% spearcons tended to show higher secondary task selection accuracy than 40% spearcons (M = 85.76%, SD = 15.64%), t(22) = 2.71, p = 0.0125.

Fig. 5

Selection accuracy across different auditory display conditions. **p < 0.00833. Error bars denote standard errors

4.3.2 Selection time

A logarithmic transformation of the data was performed to satisfy the parametric assumptions. An ANOVA on the transformed data revealed a significant main effect of auditory displays on selection time, F(3, 68) = 4.5551, p = 0.0057, η2p = 0.167 (Fig. 6). Post-hoc paired-samples t-tests with the Bonferroni-corrected alpha value of 0.0083 revealed that 70% spearcon (M = 9.91 s, SD = 2.33) resulted in significantly slower selection times than the no auditory display condition (M = 8.62 s, SD = 2.04), t(22) = 3.36, p = 0.0029. Although not reaching the 0.0083 significance threshold, 70% spearcon tended to show slower selection times than TTS (M = 9.01 s, SD = 2.04), t(23) = 2.79, p = 0.0104. Table 10 presents selection times across all four auditory display conditions.

Fig. 6

Selection time across different auditory display conditions. **p < 0.00833. Error bars denote standard errors

Table 10 Descriptive statistics of Selection Time

4.4 Perceived workload

ANOVA results revealed no significant effect of auditory displays on any of the six subscales of the NASA-TLX or on the overall workload score. NASA-TLX results are presented in Fig. 7, and ANOVA results are summarized in Table 11.

Fig. 7

Perceived workload self-reported by participants using the NASA-TLX tool. Error bars denote standard errors

Table 11 F*, P-value, and partial Eta-squared results of the ANOVA conducted on NASA-TLX results

4.5 User experience

4.5.1 System usability scale (SUS)

A summary of System Usability Scale (SUS) score results is presented in Table 12. Systems with a SUS score above 72.75 are described as “good”, and systems scoring above 70 are considered “acceptable”, as per Bangor et al. [4]. According to these standards, the auditory-supported air-gesture menu navigation interface was rated good and acceptable under all four auditory display conditions.

Table 12 SUS scores and adjective description for all auditory display conditions

4.5.2 Auditory display user experience questionnaire

Participants filled out a questionnaire about how they perceived each speech-based auditory display after interacting with it. Table 13 and Fig. 8 show how participants rated each auditory display on seven sound characteristics. The auditory display with the best outcome is highlighted in blue.

Table 13 Sound questionnaire descriptive statistics
Fig. 8

Mean scores for sound characteristics by auditory display condition. Error bars indicate standard errors. The patterns in the bars represent different auditory conditions (striped for 40% Spearcon, solid for 70% Spearcon, and dotted for TTS) to aid viewers with color vision deficiencies

4.5.3 User preference

At the end of the study, participants were asked to rank the four auditory display conditions based on their preference. The results are depicted in Fig. 9. Eleven participants chose 70% spearcon condition as their first choice.

Fig. 9

Auditory display condition preference

5 Discussion

5.1 Revisiting the results

5.1.1 Driving performance

The results of this study showed that the auditory display conditions had no influence on driving performance. These results agree with the findings of Sterkenburg et al. [70, 72] that speech-based feedback did not improve driving performance compared to the absence of any auditory display while navigating a 2 × 2 grid menu with air-gesture controls. The results also align with Tabbarah et al.’s previous study (2023), in which the 70% spearcon and no auditory display conditions did not differ significantly. This result further agrees with the previous study [65], in which there was no difference in driving performance among the multimodal conditions, including visual, auditory, and tactile feedback for the in-vehicle gesture interface.

5.1.2 Eye glance behavior

Results showed that the auditory display conditions significantly influenced eye glance behavior (short glance frequency, dwell time, and number of glance-free selections per driving scenario) compared to the no auditory display condition. Descriptive statistics for short glance frequency, dwell time, and number of glance-free selections suggest a consistent pattern: 70% spearcon induced the least visual distraction, closely followed by TTS, then by 40% spearcon, and lastly by the no auditory display condition, which induced the numerically highest visual distraction across all measures. Post-hoc pairwise comparisons between auditory display conditions strongly indicate that the use of 70% spearcon or TTS is visually safer than the no auditory display condition. A closer look at the direct comparison between spearcon conditions at different compression rates reveals a sizable difference: 70% spearcon induced 34% fewer short off-road glances than 40% spearcon. Participants using 70% spearcon spent on average 32% less time looking away from the road towards the visual display than when using 40% spearcon. Additionally, navigating the menu system with 70% spearcon feedback resulted in 44% more glance-free selections than with 40% spearcon. Even though only the frequency of short glances revealed differences at the conservative significance level of 0.0083, all other spearcon comparisons showed the same tendency (p < 0.05: p = 0.014 for dwell time and p = 0.022 for glance-free selections), demonstrating similar trends.

The pattern of visual distraction displayed by 70% spearcon, TTS, and the no auditory display condition is consistent with the results from Tabbarah et al. (2023) and with the literature. Sterkenburg and colleagues [70,71,72] found that TTS significantly reduced the frequency of off-road glances for in-vehicle air-gesture grid menu navigation, and Larsson and Niemand [44] found that spearcons reduced the frequency of off-road glances while decreasing dwell time compared to the absence of auditory displays. Both findings were replicated in our results. Nonetheless, no prior study has evaluated fast-paced spearcons in a driving context; our eye glance behavior results provide the first evidence on fast compressed speech in a driving secondary task.

5.1.3 Menu navigation performance

Selection time and accuracy results reveal a significant effect of the auditory display condition. The selection accuracy of 70% spearcon (93.75%) was numerically superior to the cluster of accuracies of 40% spearcon, TTS and the no auditory display condition (85.76%, 86.46%, and 87.32% respectively). The only significant difference in accuracy was, however, between 70% spearcon and TTS. Also, the mean selection time of 70% spearcon (9.91 s) was significantly slower than the mean selection time of the no auditory display condition (8.62 s). Selections using 70% spearcon were also numerically slower than selections using 40% spearcon (9.08 s) and TTS (9.01 s).

Both accuracy and selection time results for TTS and the no auditory display condition conform with the findings of [70, 71] that there is no significant difference between these auditory display conditions. Nonetheless, our results do not conform with Larsson and Niemand [44], who found no difference in selection times between the spearcon and no auditory display conditions while navigating a one-dimensional list menu while driving. However, there are major differences between the in-vehicle menu system in our study and Larsson and Niemand’s, notably the interaction modality (air-gesture vs. button) and the design of the menu structure (three-dimensional vs. one-dimensional list).

5.1.4 Perceived workload

There were no significant differences in any perceived workload measure across the auditory display conditions. However, an identifiable numerical trend in overall workload, mental demand, and effort indicates relatively higher perceived workload for the no auditory display condition, followed by 40% spearcon, then 70% spearcon, with TTS inducing the numerically lowest perceived workload. The limited body of research that has evaluated speech-based feedback in menu navigation agrees that speech-based feedback improves perceived mental demand and overall workload [34, 70,71,72]. Shakeri et al. [66] also showed that the auditory feedback condition produced significantly less physical demand than the visual condition. Therefore, the trend identified in our results conforms with the literature. The lack of significance, however, could be caused by a difference in experimental design: our design includes three speech-based auditory display conditions. The lack of a significant difference in temporal demand between the 40% and 70% spearcons also conforms with findings by Sabic et al. [60], who examined the recognition of 40% and 60% spearcons as in-vehicle warning signals.

5.2 Revisiting the research questions and hypotheses

To achieve the goal of this research, increasing understanding of how different attributes of auditory displays affect driving safety while interacting with in-vehicle information systems (IVISs) using mid-air gesture controls, the following research questions were devised.

RQ1: How does adding spearcons affect air-gesture IVIS interaction in terms of driving performance, eye glance behavior, secondary task performance, perceived workload, and user experience?

Hypothesis 1: Adding spearcons will improve driving performance, eye glance behavior, secondary task performance, perceived workload, and user experience compared to text-to-speech (TTS) or no auditory display condition.

RQ2: How do different spearcon compression rates affect air-gesture IVIS in terms of driving performance, eye glance behavior, secondary task performance, perceived workload, and user experience?

Hypothesis 2: 70% spearcon will provide the most efficient secondary task performance with faster selection times than 40% spearcon and TTS, and less mental and temporal demand compared to 40% spearcon.

Hypothesis 3: 40% spearcons will result in higher visual distraction than 70% spearcons and TTS, but less than the no auditory display condition.

Secondary task performance results show that menu selection using 70% spearcon resulted in the highest accuracy yet the slowest selection time. While using 70% spearcon, participants were significantly more accurate than when using TTS. Although not reaching statistical significance, 70% spearcon resulted in numerically higher accuracy than 40% spearcon and the no auditory display condition. As for selection time, 70% spearcon resulted in significantly slower selections than the no auditory display condition, and numerically slower selections than 40% spearcon and TTS. There is hence a tradeoff between selection accuracy and selection time while using 70% spearcon. Upon closer examination of participants’ subjective feedback, 70% spearcon was more comfortable to use than 40% spearcon and TTS. Multiple participants commented that 70% spearcon was easier to follow; it matched the speed of their hand movement or lined up with the speed at which they were reading. The slower selection time of the 70% spearcon condition might hence be associated with a higher level of comfort and ease of use. In terms of perceived workload, there were no significant differences between 40 and 70% spearcons. After careful consideration of all the results, we can infer that H2 is partially supported.

Statistically significant differences and numerical patterns in the eye glance behavior results strongly indicate that 40% spearcon results in more visual distraction than 70% spearcon. Although there were no significant differences between 40% spearcon and TTS, 40% spearcon resulted in numerically higher short glance frequency and dwell time, and a numerically lower number of glance-free selections. As for the comparison between 40% spearcon and the no auditory display condition, there were neither statistically significant nor numerical differences in the frequency of short glances or dwell time. Therefore, we can infer that H3 is also partially supported. After careful consideration of all dependent measures’ outcomes, we can infer that adding spearcons resulted in differences in a number of dependent measures (RQ1). However, spearcons showed a speed-accuracy trade-off, and the results also depend on the compression rate (RQ2). Therefore, taken together, H1 is also partially supported. In conclusion, 70% spearcons are recommended for in-vehicle air gesture menu navigation systems, with further research warranted.

6 Conclusion

This study explored in-vehicle air gesture menu navigation by adding spearcons at different compression rates, compared to TTS and no sound conditions. The results showed that 70% spearcons outperformed the other conditions in reducing visual distraction and improving menu navigation accuracy, and they were the auditory display most preferred by participants. However, 70% spearcons did not show any significant differences in driving performance or workload compared to the other conditions, and they exhibited a tradeoff between speed and accuracy in the menu navigation task, with slower selection times but higher accuracy compared to no sound.

Based on these findings, we recommend using 70% spearcons for in-vehicle air gesture interfaces. There are some limitations to the current study that can be addressed in future research:

  • The LEAP motion sensor used for hand tracking has its own errors and limitations. As hand tracking technologies improve, the air gesture interactions are expected to become more robust and reliable.

  • The current study used a three-dimensional menu system, whereas most previous studies used a one-dimensional menu navigation task. Investigating more complex and realistic menu structures with appropriate auditory displays for each context would help refine the design guidelines.

In summary, this research provides a granular analysis of one key variable in the design of auditory displays—spearcon compression rates—for in-vehicle air gesture interactions. We believe this type of detailed evaluation of specific design parameters will lead to more user-centered interfaces that are optimized for the driving context. This study was conducted with manual driving, but it would be interesting to explore these interactions in the context of different levels of vehicle automation, as the driver's roles and responsibilities change. Further research can build upon these findings to develop robust multimodal interfaces that maximize the potential benefits of air gesture controls and auditory displays in vehicles. Future studies could also explore dynamic spearcon compression, adjusting the compression rates in real-time, to potentially enhance reaction times while still maintaining satisfactory accuracy levels. Such adaptive auditory feedback mechanisms could optimize user interaction by calibrating to the user's performance and preferences over time.