1 Introduction

A modern flight deck with large visual-display units provides pilots with a massive amount of information. Although humans perceive with various senses, most information is gathered visually [1]. An increasing number of pilot assistance systems in current and prospective cockpits will raise further challenges in human-machine interaction [2, 3]. Perception issues will worsen with new and increasingly complex missions, which will require pilots to manage greater amounts of data, fly to higher precision standards, and take on new responsibilities [4]. Currently, new information systems such as head-up or head-worn displays and the increasing number of high-resolution displays in the cockpit mainly target pilots' visual perception [1, 5]. The expanding number of systems that the cockpit crew must use, manage, or monitor creates new operational burdens and new types of failure modes in the overall human-machine system [6, 7].

Besides the well-known ways of presenting information visually, audio is a promising way to support flight crews. In present commercial aircraft, audio only provides simple warnings or informational sounds, mainly drawing the pilot's attention to a designated display. In contrast to the increasing number of synthetic vision systems and visual 3D presentations, avionics systems rarely use spatial or 3D audio cues at the moment [8]; in civil commercial airliners, 3D audio is not present at all. With the increasing complexity of operations, pilot assistance systems must be designed to relieve the already overloaded visual channel in order to improve safety [9]. Under this assumption, research in the audio domain is necessary. Nevertheless, audio research has been sparse in aviation and mostly covers spatial audio produced by a set of loudspeakers around participants' heads or by simple left-right volume differences in the headset [10]. However, several studies have suggested a multitude of applications for 3D audio in the cockpit [1, 9, 11, 12].

The author has developed and tested a 3D audio system called Spatial Pilot Audio Assistance (SPAACE) to support pilots in future cockpits. Its localization precision was tested in various setups, and trials show that participants can locate 3D audio with high precision [13, 14]. Thus, the audio system is worth considering as an additional or supplementary information channel in future aircraft. However, all previous experiments were conducted in a clean experimental room environment without distraction or specific training. Both points must be considered when designing a pilot assistance system. This paper aims to fill this gap and presents the design and results of a psychoacoustic 3D audio experiment focusing on the effects of background noise and visual-based training on 3D audio localization.

The remainder of this paper is organized as follows. This first chapter has given a short introduction to the existing research. The following chapters describe the research question, the experimental setup, and the developed software. The findings of the experiment are presented in the results section and then discussed. Conclusions and an outline of future work are given in the final chapter.

2 Research Question

As mentioned, most information on flight decks is still presented visually [8]. Human visual processing can become overloaded in high-workload flight phases. This happens especially when flying a helicopter at low altitude in a degraded visual environment, such as a brown-out, white-out, or during night operations. In the domain of fixed-wing aircraft, high-workload flight phases occur during takeoff, approach, or in the case of a system malfunction. Several advanced technology concepts that support pilots during flight are integrated into cockpits side by side, and each has been proven to benefit safety and performance. However, those systems convey no, or only limited, audio information.

It is apparent that a cross-modal time-sharing technology improves situational awareness by dividing attention between the eyes and ears instead of between two visual sources or two auditory sources. Previous research has shown that during tasks demanding high visual attention, such as flying, auditory information is recognized better and more quickly than additional visual cues [12, 15,16,17,18]. It is therefore plausible that audio has a positive effect in high-workload situations in the cockpit.

When introducing spatial audio to the cockpit, several constraints must be considered. This work addresses two of them. The first is the impact of distraction from background noise on localization performance. Modern airplane and helicopter cockpits are quieter than those of earlier days: the cabin noise level is around 85 dB(A) during transition flight in a common helicopter and averages 70 dB(A) in an airliner [19, 20]. Additionally, new hearing protection with active noise reduction helps to lower external noise. However, since verbal communication with air traffic control (ATC) and inter-crew communication still play a major role in cockpit work, this dynamic distraction must be considered.

The second question concerns the influence of visual feedback as training. Commercial and military pilots are highly trained professionals who become familiar with all components and systems of their future aircraft during training. Following the literature [21, 22], it is worth testing the influence of making spatial audio a part of this training.

3 Method

The experiment took place at the Institute of Flight Guidance of the German Aerospace Center (DLR) in Braunschweig, Germany. Participants were randomly selected from the employees and students at the research facility without regard to an aviation background. In total, 27 participants, 4 female and 23 male, with ages ranging from 22 to 54 years (\(m=34.26, SD=8.89\)), volunteered for the experiment. All participants declared unobstructed hearing abilities. As the results of previous experiments showed no significant impact of hearing performance on localization performance [14], audiometry testing was not conducted this time.

A 360-degree round room, normally used as an Apron and Tower Simulator (ATS), served as the experimental environment. The inside wall of the simulator was a uniform light blue with a reference mark at 0 degrees for orientation and the initial calibration of the system. As shown in Fig. 1, the participants sat on a swivel chair in the center of the room. The shape of the room allowed participants free 360-degree movement around the vertical axis with a 360-degree field of vision. Combined with a head tracker, this gave participants the possibility of natural head movement, which is essential for localizing virtual sound sources [14, 23]. The experiment operator sat in the same room at approximately 100 degrees, 2 m away from the participant. Due to the running projectors in the ATS, a constant noise of 48 dB(A) (comparable to a quiet suburb in the daytime) was present during the experiment.

Fig. 1. The 360-degree tower simulator during the experiment. The participant wears headphones with an attached head tracker to control the sound position and the digital red ball on the wall. (Color figure online)

Two audio objects with a frequency of 440 Hz, the pitch A, form the target sound: an introductory sound, rising in volume, transitions into a second sound, the actual signal tone. The signal tone was generated from a sine tone; its characteristics are a short, hard transient and a clearly audible release. Harmonics were added by distortion for better acoustic performance. To create the introductory sound, the signal tone was copied and reversed to play backwards. This sound was placed before the signal tone, creating a direct transition between the two. The length of the introductory sound was adjusted by fading. The resulting target sound can be described as a warm, friendly 'bing' with a clear, synthetic character, motivating participants to follow it without becoming annoying. The sound remained unchanged throughout the experiment and was played repeatedly for 1 s with a 1 s pause until the participant pressed a button to determine the sound's location. The randomly distributed target sound was located clockwise from 0 degrees to 350 degrees in 10-degree intervals. All tones throughout the experiment were located at the participant's eye level, with a fixed elevation angle of 0 degrees, i.e., in the horizontal plane. Participants were informed of the characteristics of the target sound.
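
As an illustration, the construction of this target sound can be sketched in a few lines of NumPy. The paper does not report the exact durations, fade shapes, or distortion settings, so the values below are assumptions; only the overall recipe (distorted sine, reversed and faded copy as introduction) follows the description above.

```python
import numpy as np

SR = 44100  # sample rate in Hz (assumption)

def signal_tone(freq=440.0, dur=0.4, drive=3.0):
    """Sine tone with distortion-generated harmonics, a short hard
    attack, and a clearly audible release (durations are assumptions)."""
    t = np.arange(int(SR * dur)) / SR
    tone = np.tanh(drive * np.sin(2 * np.pi * freq * t))  # adds odd harmonics
    attack, release = int(0.002 * SR), int(0.15 * SR)
    env = np.ones_like(tone)
    env[:attack] = np.linspace(0.0, 1.0, attack)          # ~2 ms transient
    env[-release:] = np.linspace(1.0, 0.0, release)       # audible release
    return tone * env

def target_sound():
    """Reversed, faded-in copy of the signal tone as introduction,
    directly followed by the signal tone itself."""
    sig = signal_tone()
    intro = sig[::-1] * np.linspace(0.0, 1.0, sig.size)   # rising in volume
    return np.concatenate([intro, sig])
```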

All sounds in the experiment were played using off-the-shelf, over-ear Beyerdynamic DT 880 stereo headphones. These semi-open headphones have a frequency range from 5 Hz to 35 000 Hz and no built-in 3D audio features. A Carl Zeiss Cinemizer head tracker was attached to the headphones to transmit the participants' head movements. The head tracker sent its information to the DLR experiment software SPAACE. Figure 2 illustrates this structure with the head tracker linked to the headphones and the ATS.

Fig. 2. Structure of the experiment test system.
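
The internal 3D audio processing of SPAACE is not detailed in this paper, but the essential role of the head tracker in this structure can be stated compactly: the renderer must be fed the source azimuth relative to the current head orientation, so that the virtual source appears fixed in the room while the head turns. A minimal sketch, with names and conventions chosen for illustration:

```python
def head_relative_azimuth(source_deg: float, head_yaw_deg: float) -> float:
    """Azimuth of a world-fixed source relative to the current head
    orientation, wrapped into [0, 360); 0 = straight ahead, clockwise."""
    return (source_deg - head_yaw_deg) % 360.0

# A source at 90 degrees appears straight ahead once the
# participant has rotated to face 90 degrees:
assert head_relative_azimuth(90.0, 90.0) == 0.0
```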

The head tracking information was continuously sent to the 3D audio test application, which recalculated the sounds played inside the participants' headphones, giving the impression of spatial sounds at a steady position relative to the screen's 0-degree line. During the experiment, the participants were instructed to point to the perceived sound position: by rotating their whole body and head on the swivel chair, they moved a virtual red ball on the wall of the simulator. At the selected position, participants confirmed their input by clicking the button on a wireless presenter. For the experiment, 20 sound positions were defined. The first six angles were always presented in the fixed order 90, 30, 270, 330, 150, and 210 degrees; the next 14 angles were defined randomly, with each presented sound offset by at least 40 degrees from the preceding sound. For the test sessions, two distinct and independent angle sets were defined, as sketched below.
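
A minimal sketch of how such an angle set could be generated, assuming the 40-degree minimum offset is measured as circular distance (the fixed opening angles already satisfy it):

```python
import random

FIXED_START = [90, 30, 270, 330, 150, 210]   # first six angles, fixed order
GRID = list(range(0, 360, 10))               # 0 to 350 degrees, 10-degree steps

def circular_distance(a: int, b: int) -> int:
    d = abs(a - b) % 360
    return min(d, 360 - d)

def make_angle_set(n_random=14, min_offset=40, seed=None):
    """One session's 20 target angles: six fixed, then random angles,
    each at least min_offset degrees from its predecessor."""
    rng = random.Random(seed)
    angles = list(FIXED_START)
    while len(angles) < len(FIXED_START) + n_random:
        candidate = rng.choice(GRID)
        if circular_distance(candidate, angles[-1]) >= min_offset:
            angles.append(candidate)
    return angles
```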

The experiment was split into two sequences, and each sequence was divided into three sessions. As Table 1 shows, every participant started with the No Training sequence and went through its three sessions, answering a questionnaire after each session. The With Training sequence followed, again with a questionnaire after each session. The experiment finished with a final questionnaire and an open interview. The sequences and sessions are explained in detail below.

Table 1. Procedure of the experiment.

Sequence - No Training: During the first sequence, participants received no feedback about their localization performance. They completed all three sessions without visual feedback. This sequence was used as a baseline for how accurately participants could locate the spatial sound.

Sequence - With Training: The second sequence was set up to evaluate the impact of visual feedback as training on location error. Visual feedback was given only during this second sequence: after participants confirmed the perceived sound location, a yellow ball appeared on the ATS screen at the real target sound position. By comparing the red ball (the participant's perceived sound location) with the yellow ball (the real target sound location), participants could see their localization offset. Participants were instructed to correct subsequent sound localizations accordingly. The offset information was given directly after each of the 20 sound positions.

Both sequences were split into three sessions: Introduction, No Background, and ATC Background.

0 - Introduction: Prior to the two main sessions, a brief introduction session was conducted in which participants became familiar with the 3D audio and the research procedure. Each introduction session comprised five spatial sound positions to be located. Further, this session provided the opportunity to ask questions.

A - No Background: In this session, only the target sound was played without any background distraction or further auditory information. Each participant had to locate 20 predefined audio positions. The session provided basic information on location error rates without external disturbance and was used as a reference for the last session.

B - ATC Background: As stated in the introduction, aircraft have become quieter, but ATC and crew communication still takes place. Thus, this session evaluated whether background voices at low volume (half the volume of the target sounds) distracted participants or affected localization performance. During this session, a non-spatial ATC recording was played continuously, imitating a realistic aviation environment, and participants had to locate 20 target sounds. Participants were advised not to react to the ATC instructions but to concentrate solely on the accuracy of target sound localization. Further, it was evaluated whether background voices, with a frequency range between 500 Hz and 3000 Hz [24], can mask a target sound in the same frequency band.

During the experiment, SPAACE logged all audio positions, the related head tracker information, and the perceived and target sound locations. The applicable data was later imported into SPSS for analysis and visualization.
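
One detail such an analysis must handle is the wrap-around of azimuths at 360 degrees: a response of 10 degrees to a target at 350 degrees is a 20-degree error, not a 340-degree one. A small helper of the kind presumably applied to the logged data (a hedged illustration, not actual SPAACE code):

```python
import numpy as np

def location_error(target_deg, perceived_deg):
    """Signed angular error in degrees, wrapped into [-180, 180)."""
    diff = np.asarray(perceived_deg, dtype=float) - np.asarray(target_deg, dtype=float)
    return (diff + 180.0) % 360.0 - 180.0

# e.g. a 350-degree target answered with 10 degrees is a +20 degree error
abs_error = np.abs(location_error([350, 90], [10, 95]))   # -> [20., 5.]
```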

4 Results

For this experiment, homogeneity of variance was assumed, and the main sessions, A and B, were evaluated. The introduction sessions (0) served only as a warm-up for participants and were excluded from the statistical evaluation. Target sound positions are written as x degrees; calculated values as x\(^\circ \).

During the whole experiment, a total of 2430 sound positions were evaluated, of which eight were corrected due to a system malfunction. Five participants experienced front-back confusion at the sound angles of 160 degrees and 180 degrees. These responses were treated as spikes and were manually corrected by calculating the mean location error for each affected participant. Since the impact of front-back confusion is critical for understanding the possible risks of 3D audio, it was analyzed separately.
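
A front-back confusion places the response near the target's mirror image across the interaural axis, which allows such spikes to be screened for automatically. A sketch using location_error() from above; the 30-degree tolerance is an assumption:

```python
def mirror_front_back(az_deg: float) -> float:
    """Front-back mirror of an azimuth across the interaural axis
    (0 degrees = front, angles increasing clockwise): 160 -> 20."""
    return (180.0 - az_deg) % 360.0

def is_front_back_confusion(target_deg, perceived_deg, tol=30.0):
    """Flag a response that is far from the target but close to the
    target's front-back mirror image (tolerance is an assumption)."""
    direct = abs(location_error(target_deg, perceived_deg))
    mirrored = abs(location_error(mirror_front_back(target_deg), perceived_deg))
    return direct > tol and mirrored <= tol
```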

To better understand the impact of ATC background noise and visual-based training on 3D audio, the overall localization capability in this experiment is presented first. Figure 3 shows the mean localization error for all sessions. Although the results show a relatively constant distribution, participants improved over time, with the mean location error decreasing from \(m=10.09^\circ \) (\(SD=11.90^\circ \)) to \(m=6.56^\circ \) (\(SD=5.26^\circ \)). The mean absolute location error ranged from \(m=1.91^\circ \) (\(SD=3.17^\circ \) at 0 degrees) to \(m=10.66^\circ \) (\(SD=11.79^\circ \) at 320 degrees); the overall absolute location error across all target sound angles scattered around \(m=8.16^\circ \) (\(SD=8.43^\circ \)). The results show that participants located target sounds at 180 degrees with high accuracy (\(m=4.87^\circ , SD=4.72^\circ \)). Participants had the highest deviation between the target and perceived sound angles at the front left and front right, at 30 degrees (\(m=9.80^\circ , SD=13.60^\circ \)) and 320 degrees (\(m=10.66^\circ , SD=11.79^\circ \)).

Figure 4 depicts the high performance of participants for the target angles 0 degrees (\(m=1.19^\circ , SD=3.15^\circ \)) and 180 degrees (\(m=4.87^\circ , SD=4.72^\circ \)). The location errors at 90 degrees (\(m=8.82^\circ , SD=6.37^\circ \)) and 270 degrees (\(m=9.66^\circ , SD=7.07^\circ \)) scattered noticeably more.

Fig. 3. Comparison of absolute location errors by session and sequence.

Fig. 4. Scattering of location error around the 0, 90, 180, and 270 degree target sound positions. The orange line represents the target sound position. The blue circles mark participants' determined sound locations. (Color figure online)

Fig. 5. Comparison of the mean absolute location error between session A and session B. The blue circles represent outliers (\(>\!1.5~IQR\)). (Color figure online)

4.1 Background Noise

To evaluate whether ATC background noise affects spatial perception, the results of session A (no background) are compared with those of session B (ATC background). Since the results are not normally distributed (Shapiro-Wilk test: \(p<0.05\)), the Wilcoxon signed-rank test is used.
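
In SciPy terms, this procedure corresponds to a Shapiro-Wilk normality check followed by a paired Wilcoxon signed-rank test. The sketch below uses randomly generated placeholder values, not the experiment's data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# placeholder per-participant mean absolute location errors in degrees
err_session_a = rng.gamma(shape=2.0, scale=4.0, size=27)                 # no background
err_session_b = np.clip(err_session_a - rng.normal(1.0, 2.0, 27), 0, None)  # ATC background

# Shapiro-Wilk: p < 0.05 rejects normality, motivating a non-parametric test
for label, sample in (("A", err_session_a), ("B", err_session_b)):
    w, p = stats.shapiro(sample)
    print(f"session {label}: Shapiro-Wilk W={w:.3f}, p={p:.3f}")

# paired, non-parametric comparison of the two sessions
statistic, p = stats.wilcoxon(err_session_a, err_session_b)
print(f"Wilcoxon signed-rank: statistic={statistic:.1f}, p={p:.4f}")
print(f"medians: A={np.median(err_session_a):.2f}, B={np.median(err_session_b):.2f}")
```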

Figure 5 presents the localization error for session A and session B. The results suggest that background noise has a positive influence on localization performance: the error is significantly lower with ATC background (\(Mdn=6.70^\circ , z=-2.27, p<0.05\)) than without background noise (\(Mdn=7.48^\circ \)). Although the results are not normally distributed, comparing the average localization errors also shows a reduction from \(m=8.40^\circ , SD=9.34^\circ \) (no background) to \(m=7.37^\circ , SD=6.34^\circ \) (ATC background). These findings are reflected in the questionnaire results: 17 participants (61%) reported that the ATC background did not interfere with the 3D audio presented by SPAACE. Only one participant reported problems with localization due to the presence of background voices, but this was not detectable in that participant's localization results.

A further effect can be identified in the time participants needed to locate target sounds. The time measurement started with the first 3D target sound and stopped when participants pressed the button to confirm the perceived sound location. Although there was no time limit in any session and participants were not informed about the time tracking, the mean time to locate a target sound was short, with low variance among participants (\(m=12.54\) s, \(SD=6.84\) s). Splitting these times, which are not normally distributed (Shapiro-Wilk test: \(p<0.05\)), between session A (no background) and session B (ATC background) shows that the median time to locate the target sound with ATC background (\(Mdn=11.51\) s) was almost identical to that without background distractions (\(Mdn=11.50\) s, \(z=-2.28, p<0.05\)).

4.2 Visual Training

The second objective of the investigation is the influence of visual feedback as training on localization performance. Comparing the no-training sequence with the training sequence using the Wilcoxon signed-rank test, the location error without training (\(Mdn=7.96^\circ \)) is significantly higher; as expected, training has a positive impact on location error (\(Mdn=6.51^\circ , z=-4.33, p<0.05\)). Figure 6 shows how participants improved during the sessions with training (\(m=6.86^\circ , SD=1.97^\circ \)) compared to the sessions without training (\(m=9.47^\circ , SD=4.86^\circ \)).

Fig. 6. Comparison of absolute location error, separated into sequence 1 and sequence 2. The blue circles represent outliers (\(>\!1.5~IQR\)). (Color figure online)

To understand the effects of feedback as training, the localization error is also analyzed for each participant, comparing the mean location error of the first half of a session with that of the second half (see the sketch below). A positive effect on the location error is detectable: 19 participants (70%) reduced their location error in the second half of the no background session, and 15 participants (56%) reduced their location error in the second half of the ATC background session. The questionnaires show that more than 40% of all participants reported rarely or never being able to correct subsequent 3D localizations with the help of the visual feedback. However, 50% of participants judged the visual feedback as helpful in understanding their localization error, and fewer than 20% of all participants were sometimes confused by the visual feedback. Although the participants did not value the feedback highly for subsequent rounds, there was a statistically significant improvement.
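
Assuming a flat log with one row per located sound and hypothetical column names (participant, session, trial, abs_error), the half-session comparison can be expressed as:

```python
import pandas as pd

def improvement_counts(df: pd.DataFrame) -> pd.Series:
    """Per session, count the participants whose mean absolute location
    error in trials 11-20 is lower than in trials 1-10."""
    half = df["trial"].le(10).map({True: "first", False: "second"}).rename("half")
    means = (df.groupby(["session", "participant", half])["abs_error"]
               .mean()
               .unstack("half"))
    improved = means["second"] < means["first"]
    return improved.groupby(level="session").sum()
```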

Again, participants' times to locate target sounds were measured. According to the Wilcoxon signed-rank test (Shapiro-Wilk test: \(p<0.05\)), the time needed to locate the target sound with training (\(Mdn=12.02\) s) was significantly higher than during the sequence without visual training (\(Mdn=10.18\) s), \(z=-3.7, p<0.05\). Comparing sequence 1 (no training) with sequence 2 (with training), an increase from \(m=11.23\) s (\(SD=3.94\) s) to \(m=13.76\) s (\(SD=6.33\) s) can be observed.

5 Discussion

Based on the findings of this research, the effects of background noise and visual training on 3D audio are evaluated.

As the participants' absolute mean location error during this study was \(m=8.16^\circ \) (\(SD=8.43^\circ \)), it can be assumed that humans can locate spatial sounds with the developed SPAACE system within a range of \(\pm 10^\circ \). This accuracy is similar to free-field findings and listening tests with an array of loudspeakers. Concerning background noise generated by voices, this experiment shows no increase in location error during sessions with ATC background noise (\(m=7.37^\circ , SD=6.34^\circ \)) compared to sessions without background noise (\(m=8.40^\circ , SD=9.34^\circ \)). In addition, 61% of participants reported that ATC background noise rarely or never interfered with the spatial target sounds. Background noise neither impaired spatial perception nor masked the target sound; in fact, participants performed better during sessions with background noise than during sessions without it. These findings match other research, e.g., as stated by Godfroy-Cooper [25], and all participants agreed that it was easy to discriminate the sonifications from the background noise. Further, Ericson showed that the advantages of spatial separation are greatest when listeners are subjected to an ambient noise field [26]. Given these results, multiple spatial applications in the cockpit are feasible; the mere presence of background communication does not seem to have a negative influence. Future trials may consider active rather than passive ATC background, requiring participants to react to the background communication.

Training improves spatial perception. This becomes evident when comparing the no-training sequence with the training sequence: the experiment shows an average improvement in location error of \(2.61^\circ \) (from \(m=9.47^\circ , SD=4.86^\circ \) down to \(m=6.86^\circ , SD=1.97^\circ \)). Further, 70.37% of participants improved during the sessions without background noise and 55.56% during the sessions with ATC background. However, improvements within the training sequence itself were only marginal (session A: \(m=6.71^\circ , SD=5.25^\circ \); session B: \(m=6.56^\circ , SD=5.26^\circ \)). This low improvement between session A and session B in sequence 2 seems to result from the overall good performance of participants, close to natural human limits. As Majdak et al. [22] state, there is always a remaining localization error, even when subjects receive extensive training. The improvement of participants' localization with visual feedback as training was significant, which is consistent with other studies [1, 22, 27].

Information about the time taken to locate spatial audio was also collected during the study. As participants were not informed about time limits, they could locate spatial sounds without time pressure; on the other hand, participants were not told to speed up, so the task durations should not be over-interpreted. The overall mean time needed to locate spatial target sounds was \(m=12.54\) s (\(SD=6.84\) s), with participants needing more time during sessions with training. It can be assumed that the increase in time during visual feedback sessions results from a more deliberate execution of the task when receiving training in the form of visual feedback. Overall, it can be assumed that without any time pressure, humans can locate spatial target sounds in this setup and for the given task within 15 s. Further analysis of the experiment should investigate how time and accuracy correlate.

6 Conclusion

This experiment was conducted without any audiometry testing of the participants, but major hearing impairments would have been visible in the results; minor hearing impairments cannot be excluded. However, since this experiment targets later use in aviation, it is worth noting that audiometry testing for aviation personnel, according to Commission Regulation (EU) No 1178/2011, is only required every five years (every two years for personnel older than 40), so minor hearing impairments in aviation personnel can never be excluded completely. Also, the results of previous experiments showed no significant impact of hearing performance on localization performance [14]. The results build a basis for future 3D audio applications in aviation.

Today, cockpits rely mainly on visual warnings as the primary information source. However, underlying theories of human perception, e.g., multiple resource theory, hold that humans are better at dividing their attention across separate pools of information-processing resources (cross-modal) than within a single sensory channel (intra-modal) [18]. Monaural message systems are already incorporated in modern cockpits to distribute information across different sensory channels. By using spatial audio, systems can attach relevant directional information to an aural alert, e.g., during Traffic Alert and Collision Avoidance System (TCAS) alerts or Enhanced Ground Proximity Warning System (EGPWS) warnings. Especially under high workload, spatial audio can complement visual instruments to relieve the visual channel.

The experiment shows humans’ ability to locate spatial sounds and provides a basis for developing aviation-related 3D audio applications. The main findings are:

  • the overall mean location error of participants is \(8.16^\circ \) (\(SD=8.43^\circ \))

  • ATC background noise has no negative influence on sound localization

  • visual feedback as training improves participants’ localization ability

  • the overall mean time to locate the target sounds is 12.54 s

According to the research results, it can be assumed that 3D audio can complement visual instruments to improve information distribution and enhance situational awareness. By designing applications according to human limitations, 3D audio is a technology capable of supporting flight crews.

Further studies must be conducted to investigate the impact of visual feedback. Research must also be done in a demanding environment, e.g., during simulated flights, to examine the effects of stress on location error. Additionally, further research should examine people's ability to process different auditory information simultaneously, responding to ATC or inter-crew communication while locating target sounds.