1 Introduction

For many years, the mission task elements (MTEs) method has served as a cornerstone in the qualitative assessment of rotorcraft handling qualities (HQs) during flight tests [1]. MTEs are specific flight maneuvers or tasks that pilots execute to evaluate the handling qualities of helicopters and other rotorcraft. These elements are meticulously crafted to test the aircraft’s performance in scenarios that mimic real-world operational or combat missions. Through the evaluation of MTEs, it is possible to gauge how effectively a rotorcraft meets essential standards for maneuverability, stability, and overall ease of operation across various conditions. Each MTE is carefully chosen to align with the intended roles and missions of the rotorcraft. These tasks may include maneuvers such as hovering, slalom, nap-of-the-earth flight (low-level, terrain-hugging flight), and other precision tasks that demand substantial pilot skill and interaction with the aircraft’s control systems. The fundamental aim of incorporating MTEs in handling quality evaluations is to verify that the rotorcraft can be operated safely and efficiently across the full range of anticipated operational environments. Systematic assessment of handling qualities through MTEs enables evaluators to pinpoint potential deficiencies in aircraft design or performance, thereby guiding enhancements that boost both pilot safety and mission effectiveness.

To carry out these evaluations for maneuvers at low speeds or during hover, intricate ground courses are typically required, as demonstrated in Figs. 1 and 2 during an MTE evaluation campaign performed with an ultralight helicopter [2]. However, this traditional approach poses several significant challenges in the face of the changing landscape of rotorcraft design and flight profiles:

Fig. 1 CoAX2D during hover MTE evaluation [2]. Hoverboards, cones, and ground markings are used as visual cues

Fig. 2 CoAX2D during acceleration/deceleration MTE evaluation [2]. Cones and ground markings are used as visual cues

Infrastructure requirements: Implementing physical MTE courses for HQ evaluation requires considerable infrastructure [3]. This could include setting up specific geographical features or constructing physical markers for the course [4]. Furthermore, it could involve acquiring or allocating dedicated airspace for these tests. The development, maintenance, and management of such infrastructure can be time-consuming and costly, and may necessitate approvals or permits from regulatory bodies.

Limited to low-speed tasks: Physical MTE courses are limited when it comes to evaluating tasks at higher speeds. They are most suitable for low-speed maneuvers or hovering tasks close to the ground. High-speed or complex maneuvers might not be feasible or could pose safety risks, thereby restricting the scope of the tests. As a consequence, most high-speed testing has to be performed using head-down displays (HDDs) [5, 6], and the full range of a rotorcraft’s capabilities in operationally relevant tasks might not be evaluated effectively.

Inflexibility for research: The physical nature of MTE courses makes them less adaptable to changes or advancements in rotorcraft design and technology. Adjustments or modifications to the course to accommodate new research findings or to test different aspects of HQs could involve considerable manufacturing efforts, which can be both time- and resource-intensive [2]. This lack of flexibility can hinder the quick testing and iteration of innovative designs or technologies. In addition, the fixed nature of physical courses may not allow for easy replication or standardization across different testing sites or conditions, which is critical for robust and reliable research outcomes.

As rotorcraft designs become more advanced and flight profiles with specific demands on system HQs become available, the drawbacks mentioned above are becoming increasingly significant in both civil and military aerospace domains. The following paragraphs give a short introduction to current and future HQ evaluation drivers of the two domains:

1.1 Civil applications

The emergence of new air vehicles, such as urban air mobility (UAM) vehicles, has sparked a multitude of vertical takeoff and landing (VTOL) vehicle designs. These designs include electric vehicles (eVTOL), many of which have significant funding and are in various stages of prototyping and testing. These vehicles, intended for transportation in congested urban environments, will primarily feature advanced fly-by-wire flight control systems. However, the processes and regulations necessary for certifying these vehicles are still under development.

To support this certification process, the European Organization for Civil Aviation Equipment (EUROCAE) is developing a flight task maneuvers (FTM) approach. This approach provides preliminary guidelines for HQ testing for VTOL, including a set of flight tasks intended for means of compliance (MoC) demonstration [7], as part of the EASA SC-VTOL [8] certification program. These FTMs utilize a scaling technique based on the vehicle’s size, inspired by the MTE approach from ADS-33E [9], and reflect operational requirements [10]. At the same time, US authorities are also adopting a mission-oriented approach to define and assess MTEs, which will serve as MoCs for the FAA’s flight test certification [11]. This means that, for the first time in civil rotorcraft aviation, handling quality evaluation will be a mandatory part of the certification process. This creates a significant additional responsibility for manufacturers, especially considering the already extensive flight hours required for certification. Furthermore, given the prevalence of electric propulsion solutions and lengthy recharge periods, a streamlined testing environment could save flight test time and reduce overall certification costs.

A shift away from physical task cueing toward a digital solution for HQ evaluation could lead to a more uniform testing environment across manufacturers and reduce testing time. For instance, adjustments to test courses could be made more swiftly, eliminating the need for repositioning in between hovering tasks.

1.2 Military applications

Both NATO’s Next Generation Rotorcraft Capability (NGRC) and the US Future Vertical Lift (FVL) programs are focusing on upgrading military rotorcraft designs. Their goal is to engineer advanced configurations that outperform traditional helicopters in terms of speed, range, and efficiency. They are considering unique designs such as lift-offset coaxial compound helicopters [12], tilt rotors [13], and single main rotor helicopters with wings [14]. Additionally, they are researching new solutions for human–machine interfaces and pilot assistance systems. These advanced designs, with their enhanced maneuverability, speed, and user-friendliness, are expected to pave the way for entirely new missions and flight scenarios. To ensure these potential enhancements translate into effective mission performance, it is vital to create and validate HQ requirements that align with these new designs and mission profiles, especially at higher speeds [15, 16].

According to ADS-33E, all forward-flying tasks are classified as "up and away" tasks. This means no additional visual cues beyond the primary flight display (PFD) are necessary. However, several future mission profiles of interest are primarily categorized as "eyes out" tasks, such as nap-of-the-Earth (NOE) flight, contour flight (CF), or air-to-air refueling (AAR).

The use of head-down displays for showing task performance could interfere with the pilot’s natural scanning pattern so much that it fails to accurately reflect the pilot–cockpit interaction during the mission task. Moreover, for forward flight or high-altitude tasks, providing physical visual cues is simply not feasible. For example, the Break Turn MTE recommends using geographical features such as railroad crossings or airport runways as visual cues [17, 18]. This severely limits flight testing feasibility if these features are not readily available or if the airspace above them is restricted for testing. This underscores the need for alternative solutions to the visual cueing issue when conducting HQ evaluations under the MTE process for the next generation of military rotorcraft.

1.3 Holographic visual cue approach

DLR has designed a novel approach to HQ evaluation, aimed at reducing reliance on physical courses, adapting to evolving regulations, and simplifying the testing process. This involves using an augmented reality (AR) system to give test pilots a virtual visual experience of the MTE courses. The head-mounted display (HMD) utilized in this study is the Microsoft HoloLens 2 [19]. This HMD is a mixed-reality headset designed for use in enterprise and industrial settings. It uses holographic displays to create the illusion of digital objects existing in the real/simulator world. The virtual objects displayed on this device are often referred to as "holograms". Consequently, in this context, we will refer to the visual cues provided by the device as "Holographic Visual Cues", or HVC for short.

The system is intended to not interfere with the pilot’s natural visual scanning, as the display elements are fixed in the outside viewing space of the pilots, and position changes are driven by the aircraft’s actual movements in flight, or by data from a flight simulator during ground testing. This AR approach intends to assist in demonstrating compliance with MTE-based requirements for SC-VTOL and Part 23 eVTOL certification. Furthermore, it supports research into evaluating HQ for FVL vehicles, particularly during high-speed forward flight.

This paper explores the effectiveness of the AR system by presenting results from a piloted simulation campaign. The study compares holographic and dome-projected visual cues for two established ADS-33E MTEs, analyzing pilot ratings and task performance. The objective is to validate the feasibility of this technology as a tool for HQ evaluation.

1.4 Previous work

Head-mounted displays, also known as augmented reality or conformal displays, have been a subject of extensive research at DLR since the 1990s [22]. Studies conducted over the years have shown numerous advantages to integrating HMDs in cockpits. These benefits include reducing the time required for pilots to scan between instrument information and the outside world [23], decreasing instances of attentional capture, and enhancing situational awareness.

On the other hand, the use of HMDs can also lead to disadvantages such as increased weight causing pilot discomfort, cognitive overload from excessive information, visual interference obscuring visibility, reliability issues due to system failures, and a significant training and adaptation period for pilots.

The Institute of Flight Systems at DLR has been working on the use of HMDs as part of two significant projects: HELMA (Helicopter Flight Safety in Maritime Environments) and HEDELA (Helicopter Deck Landing Assistance). Both of these projects are designed to enhance flight safety and operational availability in offshore environments. The methods employed involve evaluating and utilizing various visual and guidance assistance systems. While the HELMA project is mainly concerned with offshore wind farms, the HEDELA project focuses on helicopter ship deck landings [24,25,26]. Following a thorough investigation and testing process, the Microsoft HoloLens 2 was selected as the foundational technology for further development within these projects [27]. The HoloLens 2 was integrated into DLR’s air vehicle simulator (AVES) as shown in Fig. 3, a research flight simulation facility that offers a six-degree-of-freedom hexapod motion system and the ability to use multiple cockpit layouts [28]. Based on the insights gained from these integration and evaluation efforts, the HoloLens 2 was chosen for the HVC integration presented in this study.

Schulze et al. recently introduced an innovative cockpit display technology designed for HQ assessments in UAM and rotorcraft vehicles [29]. The presented technology is a tablet-based cockpit display. It provides virtual courses for MTEs, against which vehicles can be evaluated. The system is proposed as an alternative to physical test courses, reducing infrastructure needs and creating more efficient test campaigns. A piloted simulation study found a first-person display to be insufficient due to a lack of adequate visual cueing. A top-down display was successful, but provided potentially excessive cues, which might oversimplify the task. A multi-view display also led to successful task completion, but pilots often used it like the top-down view. While these displays show promise, they need further refinement to ensure adequate visual cueing without diminishing task complexity.

The authors of the present paper note that a head-down display (HDD) view could substantially alter the pilot’s information flow and potentially require different control strategies, which might affect HQ assessments compared to traditional "eyes out" visual cues. This study therefore explores the feasibility of head-mounted displays (HMDs) as an alternative. It integrates insights from previous tests of the HoloLens 2 for pilot assistance systems with the concepts presented by Schulze et al., and evaluates the potential disadvantages of HMD systems to determine their effectiveness for the MTE evaluation process.

Fig. 3 Microsoft HoloLens 2 as HMD in DLR’s ACT/FHS cockpit

1.5 The MR spectrum

Mixed reality (MR), augmented reality (AR), and augmented virtuality (AV) are all immersive technologies that blend the physical and digital worlds in different ways. Some argue that MR serves as an umbrella term, covering a spectrum that ranges from AR, closer to the fully real environment, to AV, closer to the fully virtual environment. Other definitions follow the argument that AR overlays digital elements onto the real world, and in contrast, MR not only overlays but also anchors digital objects to the real world, allowing them to interact with the physical environment.

In the context of this work, the HVC implementation utilizes digital objects that are anchored to the real world, but do not offer interactive capabilities. These digital objects maintain a consistent, fixed position relative to real-world objects, but do not react to or interact with the physical environment or the user. While some might classify such experiences as MR due to the anchoring of digital objects, the authors decided to categorize this application as AR. This decision reflects the alignment with the perspective that AR involves the overlay of digital elements onto the physical world, with limited interaction between the digital and physical elements.

However, it is important to note that the terminology is still evolving and can be somewhat subjective as these technologies continue to develop and mature. The classification of these experiences largely depends on where one chooses to draw the lines within the MR continuum.

2 Task selection and visual cueing

The intention of the presented work is to evaluate a solution that aims to transition from physical MTE visual cues to HMD-based digital visual cues. It is important to note that the aim is not to fully recreate the test scene on the display, but rather to provide an adequate visual cueing environment that allows for a comprehensive evaluation of the vehicle’s handling characteristics. An environment offering insufficient or excessive cueing is considered equally disadvantageous.

Utilizing a multi-role helicopter as a reference, an initial batch of MTEs was selected from the ADS-33E catalog. The focus was on maneuvers that apply across all helicopter categories to ensure wide applicability during testing. It is imperative to create HVC representations for both hover or slow flight and forward flight maneuvers to examine the capabilities and limitations of virtual representations under varying flight conditions. The MTE Slalom was identified as the most suitable forward flight maneuver due to its high demands on agility and PFD cross-checks. This in turn should provoke a high degree of head movement by the pilot, thus highlighting deficits with the HMD setup. The MTE Hover was chosen based on its scope for future expansion and applicability. For instance, the visual cue setup can be used directly for the Landing MTE, and in an expanded form for the Hovering Turn and Vertical Maneuver MTEs. This makes the Hover course a versatile choice that can easily be adapted for four different tasks in future work.

2.1 Mission task elements

For reference, short descriptions of the selected MTEs taken from ADS-33E [30] are provided in the following section. The detailed task descriptions are provided in Appendix A.

Fig. 4 MTE Hover course layout for AVES dome projection (H-DP)

Fig. 5 MTE Slalom course layout for AVES dome projection (S-DP)

The MTE Hover task involves initiating a maneuver at a ground speed of 6–10 knots and an altitude of less than 20 feet, aiming to precisely hover over a target point oriented approximately 45 degrees relative to the rotorcraft’s heading. This task tests the pilot’s ability to smoothly transition from translating flight to a stabilized hover, maintaining precise control of the rotorcraft’s position, heading, and altitude under moderate wind conditions, as well as in calm winds, ensuring precise maneuverability and stability in varied environmental conditions. The detailed task description is provided in Appendix A.1.
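The entry conditions and target geometry summarized above can be sketched in a few lines. This is only an illustration of the quoted numbers (6–10 knots ground speed, altitude below 20 feet, target point at roughly 45 degrees to the heading). The function names, the `approach_ft` distance, and the `side` parameter are assumptions made for this sketch and are not taken from ADS-33E or from the HVC implementation.

```python
import math


def hover_entry_ok(ground_speed_kt, altitude_ft):
    """Check the entry conditions quoted in the text:
    ground speed of 6-10 knots and altitude below 20 feet."""
    return 6.0 <= ground_speed_kt <= 10.0 and altitude_ft < 20.0


def hover_target(start_north_ft, start_east_ft, heading_deg, approach_ft, side=1):
    """Place the target hover point 45 degrees off the initial heading.

    approach_ft is a hypothetical approach-path length; side=+1 puts the
    target to the right of the heading, side=-1 to the left.
    """
    bearing = math.radians(heading_deg + side * 45.0)
    north = start_north_ft + approach_ft * math.cos(bearing)
    east = start_east_ft + approach_ft * math.sin(bearing)
    return north, east
```

For a rotorcraft heading north from the origin with an assumed 100 ft approach path, `hover_target(0.0, 0.0, 0.0, 100.0)` places the target about 70.7 ft north and 70.7 ft east of the start point.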

The MTE Slalom task involves performing a maneuver starting from level unaccelerated flight, aligned with the centerline of the test course, and executing a series of smooth, coordinated turns at 500-foot intervals, with at least two turns to each side of the course. These turns must be made at least 50 feet from the centerline and maintain a maximum lateral error of 50 feet. The objective is to complete the maneuver below a specific reference altitude, finishing on the centerline in straight flight. This task assesses the pilot’s ability to handle the rotorcraft aggressively and smoothly in forward flight while maintaining precise coordination and monitoring for any unwanted interactions between control axes during dynamic maneuvering. The detailed task description is provided in Appendix A.2.
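The course geometry described above (turn points every 500 feet, alternating sides, at least 50 feet from the centerline) lends itself to a short sketch. The function names and the along-track/lateral coordinate convention are assumptions for illustration. The check only covers the "at least 50 feet from the centerline" condition and is not a full task-performance evaluation.

```python
def slalom_gates(n_turns=4, spacing_ft=500.0, offset_ft=50.0, first_side=1):
    """Return (along_track_ft, lateral_ft) turn points, alternating sides.

    Four turns correspond to two turns per side of the centerline.
    first_side=+1 starts to the right of the centerline, -1 to the left.
    """
    gates = []
    side = first_side
    for i in range(1, n_turns + 1):
        gates.append((i * spacing_ft, side * offset_ft))
        side = -side
    return gates


def turn_far_enough(lateral_ft, gate_lateral_ft):
    """True if the aircraft is on the gate's side of the centerline and
    at least as far out as the gate itself."""
    same_side = lateral_ft * gate_lateral_ft > 0
    return same_side and abs(lateral_ft) >= abs(gate_lateral_ft)
```

With the default values, `slalom_gates()` yields turn points at 500, 1000, 1500, and 2000 ft along the centerline with lateral offsets of +50, -50, +50, and -50 ft.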

2.2 Visual cues

The following gives a detailed overview of the different visual cue setups used for comparison during the piloted simulation campaign described in Sect. 3.

2.2.1 Dome projection (DP)

The baseline case is designed to emulate a real-world environment, serving as a proxy for live evaluations in an actual vehicle. In the simulator, the pilots were presented with a conventional dome-projected view (see Sect. 3.1 for further information) along with the suggested visual cues. Figure 4 shows the dome-projected setup for the MTE Hover (H-DP) and follows the design guidelines provided by ADS-33E as shown in Fig. 19.

Figure 5 shows the dome-projected setup for the MTE Slalom (S-DP) and follows the design guidelines provided by ADS-33E as shown in Fig. 20.

2.2.2 Holographic standard visual cues (AR1)

The holographic standard cues are meant to represent a replica of the baseline environment on the HMD.

For AR integration of the MTE visual cues, a head-mounted display, the Microsoft HoloLens 2, was used as explained in Sect. 1.3.

An example of the running application in AVES (as seen through the HMD) is shown in Fig. 6, which replicates the Hover MTE course. The MTE Hover holographic standard cue setup (H-AR1) consists of essential elements like a hover point (1), guide cone field (2), target direction (3), and a hoverboard (4). Guide cones visibly mark the start and end points.

Fig. 6 Holographic standard cues for the MTE Hover course as seen through the HMD (H-AR1)

Per ADS-33E, the approach corridor is a grid of guide cones, and the distance between this and the hover point equals the length of the approach path. The hover point is distinctly marked and encircled by two squares in the HMD display to indicate the permissible position deviation while hovering. The guide cone field indicates the longitudinal position deviation. This field is symmetrically established on both sides of the hover point, offering the pilot a bidirectional view. The target direction is indicated by a line extending from the outer square of the hover point to the hoverboard. An additional parallel line is provided at the start point for alignment.

Figure 7 shows the running application in AVES (as seen through the HMD) for the Slalom MTE course (S-AR1). Parallel start and finish lines (1) act as ground markers and are connected by a centerline (2). The design follows the ADS-33E suggestions for MTE Slalom, mandating two turns per side. The start and finish lines host gates, represented by guide cones. These gates, along with four additional ones (3), mark the two slalom turns per side.

Fig. 7 Holographic standard cues for the MTE Slalom course as seen through the HMD (S-AR1)

2.2.3 Holographic advanced visual cues (AR2)

To evaluate the benefits and potential pitfalls of the design freedom within AR visual cueing, “advanced” courses have been implemented which provide the pilot with additional data regarding the current task performance.

The running application in AVES (as seen through the HMD) for the advanced MTE Hover cues (H-AR2) is shown in Fig. 8. The primary objective of the design is to ensure that pilots receive comprehensive task performance information within a single viewing direction, mitigating any potential loss of information caused by the limited FOV of the HMD and the inability to display task-related information in the peripheral view.

Fig. 8 Holographic advanced cues for the MTE Hover course as seen through the HMD (H-AR2)

The image shows a semicircular approach tunnel (1), which starts from a platform and ends at the hover point (2), defined by a frame marking the desired position.

This tunnel is intended to help guide height maintenance and highlight the approach corridor through intentional over-cueing.

The starting platform is a square area, equal to the acceptable lateral and longitudinal deviation. This platform, along with two squares surrounding the target point, helps visualize the permitted positional deviation during hovering. A combined hover performance display containing three indicators is also included. The display is designed as a two-dimensional square with an inner desired area and a square reference symbol (3). Finally, two displays of longitudinal position are introduced to the side of the hover display (4). These help separate the longitudinal and lateral movement information, reducing information overload. In addition, a semicircular course display is implemented above the hover display (5). All displays have dynamically colored frames to alert pilots of any boundary value violations.
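The dynamically colored frames can be thought of as a simple threshold mapping from the current deviation to a frame color. The sketch below is a hedged illustration only: the three-color scheme, the function name, and the limit values used in the example are assumptions, since the text states only that the frame color changes when boundary values are violated.

```python
def frame_color(deviation, desired_limit, adequate_limit):
    """Map a signed deviation to a frame color.

    Assumed scheme (not taken from the HVC implementation):
    green  = within the desired limit,
    yellow = outside desired but within the adequate limit,
    red    = outside the adequate limit.
    """
    if desired_limit > adequate_limit:
        raise ValueError("desired_limit must not exceed adequate_limit")
    magnitude = abs(deviation)
    if magnitude <= desired_limit:
        return "green"
    if magnitude <= adequate_limit:
        return "yellow"
    return "red"
```

With purely illustrative limits of 3 and 6 units, a deviation of -4 units would color the frame yellow, and a deviation of 7 units would color it red.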

The running application in AVES (as seen through the HMD) for the advanced MTE Slalom cues (S-AR2) is shown in Fig. 9.

The primary design objective was to provide the pilot with additional altitude performance information to counteract any potential loss of visibility on the primary flight display (PFD), which could arise due to the darkening effect of the HMD screen. This supplementary altitude information is crucial for ensuring the pilot can meet the altitude requirements specified in the MTE procedures, particularly when the PFD’s readability is compromised.

The MTE Slalom advanced cues include a start and finish line (1), a central line (2), and holographic slalom gates (3). Additional intersection lines (4) are placed midway between slalom turns, serving as visual aids for pilots. The gates, allowing collision-free passage, define the acceptable area for a turn. To further assist pilots, a horizontal height line (5) is added to the gates, providing an additional visual indicator of maximum permissible flight height.

Fig. 9 Holographic advanced cues for the MTE Slalom course as seen through the HMD (S-AR2)

Table 1 Tested MTEs & Visual cue configurations
Table 2 Experience of pilots in the study

3 Test approach

The overall approach of this study aimed to clarify whether suitable visual references can be created and used for HQ evaluation with an AR representation, and how technical limitations such as the limited field of view (FOV) of the HMD and the diminished legibility of the PFD impact the HQ ratings and the perceived workload of the pilots. To do so, two AR visual cue setups were compared against the baseline dome projection. Table 1 provides an overview of the different MTEs and visual cueing setups that have been tested and compared, as discussed in Sect. 2.2.

A total of five pilots participated in the simulation study, with their experience detailed in Table 2. Alongside the three test pilots (A, D, and E) required for qualitative HQ evaluations, the study also incorporated feedback from two additional pilots. One was a novice with a low number of flight hours (C), and the other was an exceptionally experienced operational pilot (B). This was done to diversify the feedback and better understand how acceptance of new technologies may vary with flight experience.

All participating pilots were previously familiar with both the Cooper–Harper Handling Quality Rating Scale and the Bedford Workload Rating Scale. Specifically, Pilot C, despite having accumulated limited flight hours, had acquired substantial exposure to the Cooper–Harper rating principles and mission task element (MTE) evaluations through a comprehensive background in HQ research.

The three test pilots (A, D, and E) all had military and/or research flight backgrounds and were familiar with different kinds of current HMD technologies for in-flight usage.

3.1 Simulation environment

The simulation facility AVES is shown in Fig. 10. The simulator features three interchangeable modules: an Airbus A320 cockpit, a Eurocopter EC135 cockpit, and a single-aisle passenger cabin. These modules can be exchanged via a roll-on/roll-off system to utilize either a full-sized six-degree-of-freedom hexapod motion platform or a fixed-base platform. For the investigations described within this paper, the EC135 cockpit on the fixed-base platform was used. The projection system in both platforms consists of nine LED projectors, each with a resolution of 1920 \(\times\) 1200, which provide a horizontal FOV of \(240^\circ\) and a vertical FOV of \(-55^\circ\) to \(40^\circ\) [28]. All hardware and software systems within the AVES can easily be modified, which qualifies the simulator for a broad spectrum of research activities. In the development process of new systems and applications, the AVES is used as the test platform after a desktop simulation and before flight testing using DLR’s research helicopter ACT/FHS (active control technology/flying helicopter simulator).

Fig. 10 The air vehicle simulator (AVES) at DLR Braunschweig

3.2 Flight model description

The flight dynamics of the helicopter were calculated using the HeliWorX real-time model [31]. This model was developed in line with the flight dynamics of the ACT/FHS. To conduct the simulator studies, a flight stability and control augmentation system (FCS) was implemented on top of the inherent airframe dynamics. This system offers attitude command, attitude hold (ACAH) for the pitch and roll axes, a stability augmentation system (SAS) for the yaw axis, and direct control (DI) for the collective axis. This type of control system is typically preferred for low-speed and hover operations. The model was evaluated against a range of ADS-33E objective requirements to determine its overall predicted handling qualities (PHQs). The results of the ADS-33E criteria analyzed are listed in Table 3. The findings suggest that the vehicle is likely to achieve Level 2 HQs when performing MTEs defined by ADS-33E. This implies that the identified deficiencies would benefit from improvement, but that such improvement was not deemed necessary. Despite the HQ deficiencies, the vehicle should still be able to meet the desired performance standards. The detailed results are replicated in Appendix B and were first published by Atci et al. [32].

Table 3 Predicted handling qualities at low speeds

3.3 HoloLens 2 integration

The HoloLens 2 was integrated into DLR’s air vehicle simulator (AVES). It uses WiFi to receive data, such as the helicopter’s model state data. The application is built using the Unity3D game engine and is written in the C# scripting language.

One of the challenges is the alignment of the outside world presented on the simulator screen with the holographic world inside the HoloLens 2. The dome projection of the simulator is a transformation of the generated world onto a sphere. This transformation needs to be repeated for the holograms in the HoloLens 2. To position the hologram sphere at the same position as the simulator dome sphere, a calibration process is necessary. The head tracking of the HoloLens 2 is an inside-out tracking system that relies on an IMU and four environmental cameras. In various tests, it has been determined that the internal tracking is suitable within a stationary environment (i.e., a fixed-base simulation facility). To use the system within a dynamic environment (e.g., moving platform, vehicle), additional compensation or external head tracking is required.
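To illustrate the idea of mapping world content onto a sphere, the following sketch radially projects a 3D point onto a sphere around a given center. It assumes that the dome can be modeled as an ideal sphere and that the projection is a pure radial mapping about its center. The actual transformation and calibration used in AVES are not described here, and the function and parameter names are hypothetical.

```python
import math


def project_to_sphere(point, center, radius):
    """Radially project a 3D point onto a sphere of given center and radius.

    Sketch only: assumes an ideal spherical dome and a projection along
    the ray from the sphere center through the point.
    """
    direction = [p - c for p, c in zip(point, center)]
    norm = math.sqrt(sum(d * d for d in direction))
    if norm == 0.0:
        raise ValueError("point coincides with the sphere center")
    return tuple(c + radius * d / norm for c, d in zip(center, direction))
```

Under these assumptions, a hologram and the dome image of the same world point only coincide if both spheres share the same center and radius, which is why a calibration step is required.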

A detailed technical description of the integration process can be found in the work published by Walko [27].

4 Results and discussion

The following material summarizes the results of the formal piloted evaluations conducted in AVES. The analysis results (i.e., handling quality and workload ratings, task performance, control activity, and pilot comments) are broken into separate sections. Multiple practice runs were made available to the pilots to ensure they were well acquainted with the task, controls, and simulation environment, including the use of the HMD, before taking evaluation runs.

During the trials, pilots were not subjected to simulator sessions exceeding 1 h without a subsequent break of 15–30 min, depending on task intensity. During these sessions, no instances of simulator sickness or any other physiological issues were reported, nor did any pilots express discomfort or experience eye strain while using the HMD.

4.1 Handling qualities and workload

HQ ratings and perceived pilot workload ratings were evaluated using the Cooper–Harper Handling Qualities Rating Scale [33] and Bedford Workload Scale [34], respectively, and can be found in Tables 4 and 5. The scales are provided for reference in Appendix C.

4.1.1 Baseline validation

The baseline case is designed to emulate a real-world environment, serving as a proxy for live evaluations in an actual vehicle. In the simulator, the pilot was given a conventional dome-projected view along with the standard task cues, as explained in Sect. 2.2. The average Cooper–Harper Handling Qualities Ratings (HQR) for the dome projection (DP), shown in Table 4, are HQR 3.75 (Level 1/2) for MTE Hover and HQR 4.6 (Level 2) for MTE Slalom, which aligns with the predicted ratings based on objective requirements as discussed in Sect. 3.2. This alignment, coupled with the absence of any substantial negative feedback from the pilots regarding the dome-projected visual setup, validates the setup’s effectiveness as a reliable baseline for comparison with the HVCs.
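The level labels attached to the mean ratings above follow from the usual grouping of the Cooper–Harper scale (HQR 1–3 as Level 1, 4–6 as Level 2, 7–9 as Level 3). A minimal sketch of that mapping is given below. The handling of non-integer means that fall between two groups (labeled here as "Level 1/2" or "Level 2/3") is an assumption chosen to reproduce the labels used in the text, and the function name is hypothetical.

```python
def hq_level(mean_hqr):
    """Map a (possibly averaged) Cooper-Harper rating to a level label.

    Integer grouping: 1-3 Level 1, 4-6 Level 2, 7-9 Level 3.
    Means between two groups get a borderline label (assumption).
    """
    if not 1.0 <= mean_hqr <= 10.0:
        raise ValueError("Cooper-Harper ratings range from 1 to 10")
    if mean_hqr <= 3.0:
        return "Level 1"
    if mean_hqr < 4.0:
        return "Level 1/2"
    if mean_hqr <= 6.0:
        return "Level 2"
    if mean_hqr < 7.0:
        return "Level 2/3"
    if mean_hqr <= 9.0:
        return "Level 3"
    return "beyond Level 3"
```

Applied to the baseline means quoted above, `hq_level(3.75)` returns "Level 1/2" and `hq_level(4.6)` returns "Level 2".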

Table 4 Cooper–Harper Handling Quality Ratings awarded by pilots
Table 5 Bedford Workload Scale Ratings awarded by pilots

4.1.2 MTE Hover

Figure 11 presents the Bedford Workload Ratings (BWRs) and HQRs assigned by the pilots for the MTE Hover. When evaluating the mean ratings from all pilots, the visual cues in the H-AR1 setup, symbolized as "\(\blacktriangledown \)", yielded almost identical ratings to those of the baseline dome projection, designated as H-DP or "\(\bigstar \)". In contrast, the visual cues in the H-AR2 setup, denoted as "\(\blacksquare \)", were rated one HQR point and two BWR points above the baseline (HQR+1 and BWR+2). This suggests a marked rise in workload and a higher level of compensation required from the pilots compared to the baseline setup.

Looking at the individual pilot ratings, it becomes clear that the pilots’ assessments diverge. For instance, Pilot A discerned marginal differences between the baseline H-DP and the H-AR1 setup, both represented in green ("\(\bigstar \)" and "\(\blacktriangledown \)"). However, Pilot A also assigned the H-AR2 setup, represented as a green "\(\blacksquare \)", the maximum workload rating of BWR 10 and a Level 3 rating with an HQR 7 and commented:

I found myself trying to manipulate the display to optimize my performance, which is not ideal and threw me off.

Pilot D, on the other hand, rated the H-AR1 setup, marked as a blue "\(\blacktriangledown \)", slightly better than the baseline shown as a blue "\(\bigstar \)" (with BWR-1 and HQR-1 under baseline), and H-AR2, symbolized as a blue "\(\blacksquare \)", slightly worse (with BWR+1 and HQR+1 over baseline). Pilot D commented:

I found this [H-AR1] experience to be more enjoyable and preferable to using the dome projection.

and continued that

... presenting three-dimensional information in a two-dimensional form [H-AR2] feels a bit off, it does not translate well.

Conversely, Pilot B noticed no difference in HQRs between the baseline, depicted as a red "\(\bigstar \)", and the H-AR2 setup, denoted as a red "\(\blacksquare \)". However, Pilot B rated H-AR1, shown as a red "\(\blacktriangledown \)", marginally worse with an HQR+1 over baseline. The pilot explained:

I prefer to have the lateral and longitudinal position information centered in my field of view, as is the case here [H-AR2].

The broad range of these pilot ratings underlines the sensitivity of the hover task to various pilot control and visual scanning strategies. Clearly, the H-AR2 setup proved the most contentious among the pilots, exhibiting substantial deviations from the baseline ratings.

In contrast, the H-AR1 setup yielded very similar results for most of the pilots on an individual basis and, on average (HQR 3.75 and BWR 3.75), received almost identical ratings to the baseline (HQR 3.75 and BWR 3.5) in terms of workload and required pilot compensation. This suggests that H-AR1 presents a suitable alternative visual cue environment for HQ evaluation under the Hover MTE.
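The offsets quoted in this section (for example HQR+1 or BWR-1) are signed differences between the rating for a cue configuration and the rating for the baseline. A minimal sketch of that bookkeeping follows. The per-pilot ratings in it are made-up placeholders, not the values from Tables 4 and 5, and the helper names are hypothetical.

```python
from statistics import mean


def offsets_to_baseline(ratings: dict, baseline: str = "H-DP") -> dict:
    """Return the mean rating offset of each configuration from the baseline."""
    base = float(mean(ratings[baseline]))
    return {
        config: float(mean(values)) - base
        for config, values in ratings.items()
        if config != baseline
    }


def format_offset(scale: str, delta: float) -> str:
    """Render an offset in the 'HQR+1' / 'BWR-0.5' style used in the text."""
    return f"{scale}{delta:+g}"


# Placeholder HQRs for four pilots (illustrative only, not study data).
example_hqr = {
    "H-DP": [2, 3, 3, 4],
    "H-AR1": [2, 3, 4, 3],
    "H-AR2": [3, 4, 5, 4],
}

for config, delta in offsets_to_baseline(example_hqr).items():
    print(config, format_offset("HQR", delta))
```

The same function applies unchanged to the Bedford workload ratings by passing the BWR values and the scale label "BWR".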

Fig. 11
figure 11

Bedford Workload Ratings and Cooper–Harper Ratings for MTE Hover

Fig. 12
figure 12

Bedford Workload Ratings and Cooper–Harper Ratings for MTE Slalom

4.1.3 MTE Slalom

Figure 12 presents the BWRs and HQRs assigned by the pilots for the MTE Slalom. When considering the mean ratings across all pilots, the visual cues from both the S-AR1 and S-AR2 setups achieve an HQR 4.6. This is identical to the baseline dome projection S-DP. The subtle difference lies in the workload, where the dome-projected cues result in a slightly higher average workload.

In terms of individual pilot ratings, the ratings for the MTE Slalom are more closely clustered than the MTE Hover results. For example, Pilots A and D found no discernible difference between the baseline S-DP and the S-AR1 and S-AR2 setups (marked in green for Pilot A and blue for Pilot D). Pilot A commented that:

It is pretty much the same experience as before, but I did notice there’s more head movement compared to the dome projection.

Pilot D added:

Although the limited field of view is noticeable, it is not necessarily a problem, just something to adjust to.

Pilot B, however, assigned different workload ratings while giving the same HQR to all three setups (marked in red). The baseline was rated as BWR 5, while S-AR1 scored a BWR 4 and S-AR2 a BWR 3. Pilot B argued that:

I found myself estimating the height here [S-AR2] more from the corner of my eye rather than looking at the PFD.

Pilots C and E gave identical HQRs for the baseline S-DP and S-AR1 (marked in magenta and cyan, respectively). However, Pilot C rated the advanced cues S-AR2 one rating better than the baseline. Meanwhile, Pilot E assigned one rating higher to S-AR2 and gave the following explanation:

The effect of the reference disappearing with head movement is more pronounced [S-AR2], which takes some getting used to.

Fig. 13
figure 13

Time history performance for MTE Hover

Overall, the range of these pilot ratings shows that the MTE Slalom seems to be less sensitive than the hover task to differences in pilot control and visual scanning strategies. Both the S-AR1 and S-AR2 setups yielded very similar results for most of the pilots on an individual basis and, on average, received identical ratings to the baseline in terms of required pilot compensation. The workload was rated on average 0.5 points below baseline (BWR\(-\)0.5) for both HVC variants, which is a surprising result. The author’s opinion is that this could be due to a higher contrast of the visual cues provided by the HMD compared to the dome projection. The underlying reason remains inconclusive, primarily due to conflicting pilot comments on the matter. Although the observed difference is relatively minor and may be resolved by collecting more ratings, further investigation is required in subsequent research. The results suggest that both S-AR1 and S-AR2 present a suitable alternative visual cue environment for HQ evaluation under the Slalom MTE.

4.2 Task performance and visual cues discussion

The subsequent section presents a compilation of task performance data and pilot comments. The authors specifically chose exemplary evaluation data runs from individual pilots (A, B, and D) that emphasize noteworthy aspects identified and discussed during the simulation trials.

4.2.1 MTE Hover

Figure 13a illustrates the task performance of Pilot A during an evaluation run of the MTE Hover, while Fig. 13b does the same for Pilot D. Here, the green areas represent the desired task performance boundaries, and the orange areas represent the adequate task performance boundaries.
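Reading such a time history against the two sets of boundaries amounts to a simple threshold check. The sketch below is a simplification under stated assumptions: it classifies a run by its worst absolute deviation from the target, whereas actual MTE performance standards may also involve time-based criteria. The tolerance values and sample data are hypothetical placeholders, not the limits used in the study.

```python
def classify_run(deviations, desired_tol, adequate_tol):
    """Classify a run by its worst absolute deviation from the target.

    deviations:   sampled deviation from the target value (e.g. position error)
    desired_tol:  half-width of the desired performance band (assumed value)
    adequate_tol: half-width of the adequate performance band (assumed value)
    """
    if desired_tol > adequate_tol:
        raise ValueError("desired band must lie inside the adequate band")
    worst = max(abs(d) for d in deviations)
    if worst <= desired_tol:
        return "desired"
    if worst <= adequate_tol:
        return "adequate"
    return "outside adequate"


# Hypothetical longitudinal position error samples, in arbitrary length units.
samples = [0.2, -0.8, 1.4, -0.5]
print(classify_run(samples, desired_tol=1.0, adequate_tol=2.0))  # adequate
```

Applying the check separately to each monitored parameter (longitudinal and lateral position, altitude, heading) gives a per-axis summary of where a run leaves the desired or adequate band.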

Figure 14 displays the cyclic control activity for Pilot A, and Fig. 15 that of Pilot D during the MTE Hover. The extreme ends of the whiskers (\(\top \) \(\bot \)) on the box plot represent the maximum and minimum values of control deflection recorded during the 30-s hover sequence. The blue box outlines the 25th to 75th percentile range, with the red line indicating the median value.
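The five quantities shown in each box plot (minimum, 25th percentile, median, 75th percentile, maximum) can be computed from a recorded control deflection trace as sketched below. The quartile interpolation method is an assumption, since the text does not state which one was used, and the sample values are placeholders rather than recorded data.

```python
from statistics import quantiles


def box_stats(deflections):
    """Return the five box-plot quantities for a control deflection trace."""
    # 'inclusive' treats the samples as the full population of the window;
    # this choice of interpolation is an assumption made for the sketch.
    q1, median, q3 = quantiles(deflections, n=4, method="inclusive")
    return {
        "min": min(deflections),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(deflections),
    }


# Hypothetical lateral cyclic deflections, in percent of control travel.
trace = [-2.0, -1.0, 0.0, 1.0, 4.0]
print(box_stats(trace))
```

With whiskers drawn at the minimum and maximum, a single large control excursion during the hover window stretches the whisker without necessarily widening the box.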

Pilot A’s control input on the lateral cyclic axis saw a significant increase in both the maximum and minimum values during the H-AR2 configuration. This coincides with the exceeding of the longitudinal position requirement. The pilot also struggled to maintain the altitude and heading limits within adequate boundaries. This performance is consistent with the very high workload rating and HQR the pilot assigned. Evidently, the H-AR2 configuration introduced control effects that reduced Pilot A’s performance to an unacceptable level.

On the other hand, Pilot D achieved the desired task performance only when using the H-AR1 cue configuration. For the H-DP and H-AR2 configurations, violations of the longitudinal and lateral position, along with altitude, were observed.

When examining the control activity, a significant decrease in control activity on the lateral cyclic axis can be noted when transitioning from DP cues to the HVCs. Pilot D’s inceptor activity with the H-DP cues was significantly higher than with the other visual cues. In interpreting this variability, it is critical to consider the adaptive nature of pilot behavior in response to changing flight conditions.

Research into pilot dynamics has consistently demonstrated that pilots are active, adaptive components within the aircraft’s feedback control system, capable of modifying their "gain", that is, the level of responsiveness of their control inputs to deviations from desired flight states. This ability to adjust gain is a fundamental aspect of pilot interaction with aircraft dynamics. From the foundational studies in the 1960s, which introduced mathematical models of pilots as dynamic elements in control systems, to more recent investigations, the concept of pilot gain has been a pivotal factor in understanding pilot behavior. Notably, the research by Mitchell, Aponso, and Hoh in the late 1980s highlighted the variability of pilot gain not only across different pilots, but also within individual pilots, who might change their control strategies during a flight. These changes can be attributed to a range of factors including differing aircraft behaviors, mission requirements, or pilot perceptions of task complexity and risk.

Pilot D’s increased inceptor activity using H-DP cues likely indicates a deliberate adjustment in gain ("high-gain pilot" vs. "low-gain pilot"). The underlying reasons for the observed change in gain remain unclear, as the pilot provided no specific comments regarding this adjustment. Part of the explanation may be that the pilot was missing critical information in the dome projection, details of which were not communicated. Additionally, the pilot might have actively or subconsciously altered their control strategy, either to enhance task performance or to reduce workload during the task. This adjustment could have occurred irrespective of the visual cues presented, indicating a complex interplay of cognitive factors and situational awareness in pilot decision-making.

The H-AR2 cues were ranked lowest in terms of HQR and BWR, but interestingly they still resulted in the smallest control deflection overall. This suggests that the ratings were not due to an increase in control activity-induced workload.

Fig. 14
figure 14

Pilot A time history cyclic control activity for MTE Hover during the 30-s stable hover phase

Fig. 15
figure 15

Pilot D time history cyclic control activity for MTE Hover during the 30-s stable hover phase

In summary, the results above, combined with further pilot comments collected during the evaluations and in the debriefings, lead to the following observations and potential areas of improvement for the HVC configurations:

Fig. 16
figure 16

Time history performance for MTE Slalom

Holographic standard cues (H-AR1)

Overall experience: The pilots reported a more "comfortable" and "pleasant" flight experience using the H-AR1 setup, which they found more "enjoyable" compared to the dome projection visual cues. This indicates a positive reception of the system’s current design in terms of contrast, FOV, and ergonomics. Given the favorable pilot feedback, a design change appears unnecessary in this respect.

Impact of peripheral vision: For navigation and position maintenance during flight, pilots mainly depend on the visual references provided and their peripheral vision. Contrary to initial assumptions, the impact of information loss resulting from the limited horizontal FOV of the HMD and the inability to display task-related information in the peripheral view, on average, did not significantly influence pilot ratings. Still, possible improvements could include enhancing the visual cues and optimizing the system to better support horizontal peripheral vision.

Need for improved ground texture: Some pilots noted that the ground texture in the simulator lacked quality and impacted overall hover performance due to a lack of drift information compared to the real aircraft. This suggests a need for a design upgrade focused on enriching the visual details of the ground texture in the simulator to provide a more realistic flight experience.

Holographic advanced cues (H-AR2)

User interface simplification: Several pilots found certain aspects of the system, such as the approach tunnel, disconcerting and distracting. Furthermore, they considered the standard visual cues sufficient for their needs. Based on these comments, one potential design change could involve simplifying the user interface and reducing the complexity of visual cues to minimize distraction and increase user-friendliness.

Longitudinal performance information: Pilots suggested that the position in the longitudinal axis should ideally be displayed on the horizontal plane instead of the vertical (from the pilot’s point of view). Design changes could focus on presenting three-dimensional information more intuitively by integrating axis and height data more cohesively.

Relevance of advanced cues: Based on the findings, it may be concluded that the provision of advanced cues is unnecessary. Despite initial assumptions, the average impact of information loss due to the limited horizontal FOV of the HMD and the inability to display task-related information in the peripheral view did not have a substantial influence on pilot ratings. Therefore, the inclusion of additional cues beyond H-AR1 appears to be unwarranted.

4.2.2 MTE Slalom

Figure 16a illustrates the task performance of Pilot A during an evaluation run of the MTE Slalom, while Fig. 16b does the same for Pilot B.

Figure 17 displays the cyclic control activity for Pilot A, and Fig. 18 that of Pilot B during the MTE Slalom.

With the HVCs, Pilot A’s control input increases significantly, reaching a peak with nearly full-range control inputs for the S-AR2 configuration. Comparing this with the achieved task performance, it is clear that this heightened control aggression corresponds to higher ground speeds during the maneuver. All visual cues received the same rating in terms of HQR and BWR, suggesting that the pilot felt more confident and thus increased aggression during the maneuver.

The pilot argued that this was because the height reference supplied by S-AR2 made it possible to concentrate fully on increasing cornering speeds, eliminating the need for occasional altimeter cross-checks. While the pilot believed that altitude is typically assessed intuitively during maneuvering, the additional height information in the gates removed any guesswork, freeing up mental resources to enhance task performance. Interestingly, the pilots did not consider this to negatively impact the nature of the task, despite a clear change in control strategy. This underscores the idea that a change in control strategy and activity is not necessarily an indicator of deviation in task design goals. Instead, it should be evaluated in conjunction with the pilot’s intentions and task aggression.

After getting familiar with the highest level of performance achievable through training runs, Pilot B chose to prioritize adequate performance right from the start. It was decided to lower the target ground speed to 50 knots for all evaluation runs. In contrast to Pilot A, Pilot B showed no significant variation in control input intensity when transitioning between visual cues. Pilot B also mentioned that having an additional height reference in S-AR2 was beneficial. However, unlike Pilot A, who used the extra available compensation capacity to increase task aggression, Pilot B decided to maintain a consistent level of aggression and instead assigned lower BWRs to the HVCs.

Fig. 17
figure 17

Pilot A time history cyclic control activity for MTE Slalom

Fig. 18
figure 18

Pilot B time history cyclic control activity for MTE Slalom

In summary, the results above, combined with further pilot comments collected during the evaluations and in the debriefings, lead to the following observations and potential areas of improvement for the HVC configurations:

Holographic standard cues (S-AR1)

Diminished legibility of PFD: Pilots noted the dark glass tinting of the HoloLens 2 during PFD scans. While it was not necessarily a problem, it was a noticeable feature. Given the favorable pilot feedback, a design change appears unnecessary in this respect.

Losing sight of gates during cornering: Pilots observed that during tight turns and rapid head movements to focus on subsequent gates, the limited horizontal FOV of the HMD occasionally caused the gates to disappear from their peripheral vision. Although this phenomenon did not significantly impact pilot ratings, it was identified as a notable characteristic of the current setup. To address this issue, design enhancements could prioritize either improving the system’s horizontal FOV or incorporating additional scene elements to support peripheral vision. These adjustments may offer a more feasible implementation compared to adopting a new HMD with an increased horizontal FOV.

Losing sight of gates during PFD scan: The limited vertical FOV of the HMD made it difficult for pilots to keep track of the gates during PFD scans. To address this information gap, design enhancements could prioritize the vertical FOV, for example by adopting an alternative HMD with an expanded vertical FOV, enabling pilots to maintain better situational awareness and track the gates more effectively during their PFD scans.

Holographic advanced cues (S-AR2)

Disappearance of gates during head movements: Pilots noted that the disappearance of references with head movement was even more pronounced and required adjustment. This suggests that the system’s visual dynamics might be somewhat challenging for new users. A potential design change could involve developing more stable or persistent visual references that do not change drastically with head movement.

Object within the direct flight path: The comments indicated that the pilots were able to navigate around and through the objects in the flight field without any problems. It had been expected that pilots might be reluctant to fly through "solid" objects such as the S-AR2 gates, but this was not the case. This suggests that the system’s current object rendering and placement are effective and may not require significant modifications.

5 Conclusion

This research examined the effects of holographic visual cues (HVCs) on pilot handling qualities ratings, workload ratings, and task performance during piloted simulations, particularly within the MTE Hover and the MTE Slalom tasks. Although the presented analysis indicates that HVCs, when appropriately designed, on average do not significantly affect the ratings of pilots, individual reactions varied. The findings underscore the potential of this technology as an alternative visual cue environment for HQ evaluations, but also highlight areas for improvement and further exploration.

Handling qualities assessment: Pilots assigned comparable handling qualities ratings when using holographic visual cues and when using the traditional dome projection setup. However, individual pilot ratings highlighted the influence of task sensitivity and of personal control and visual scanning strategies on these ratings. Some pilots expressed a preference for the holographic standard cues (H-AR1 and S-AR1), while others found the advanced cues distracting or even unsettling. Notably, the H-AR2 setup introduced control effects that some pilots deemed unacceptable. These findings suggest that H-AR1 provides a suitable alternative visual cue environment for handling qualities evaluation in the Hover MTE. Additionally, both S-AR1 and S-AR2 are indicated as suitable alternatives for handling qualities evaluation in the Slalom MTE.

Design implications: Pilots primarily rely on visual references and peripheral vision for flight navigation and maintaining position. Surprisingly, the limited horizontal FOV and the inability to display task-related information in the peripheral view did not affect pilot ratings as significantly as initially assumed. However, the HMD’s limited vertical FOV posed challenges for pilots in tracking gates during PFD scans. To improve situational awareness, design enhancements should focus on expanding the vertical FOV of the HMD; an alternative HMD with an improved vertical FOV could address this concern. Furthermore, pilots’ feedback indicated that they successfully navigated around and through objects in the flight field without encountering difficulties. Contrary to expectations, pilots did not hesitate to fly through "solid" objects like the S-AR2 gates. Finally, pilots observed the dark glass tinting of the HoloLens 2 during PFD scans. Although it was noticeable, it did not pose a significant issue. Based on positive pilot feedback, it may not be necessary to make design changes in this regard.

Future work

While this study provides valuable insights into the influence of holographic visual cues on the handling qualities assessment process, further research is required for validation. Future studies should address identified hardware limitations, increase pilot sample sizes, and consider additional potential confounding factors such as pilot experience. Expanding holographic representation concepts to other maneuvers from ADS-33E and EASA SC-VTOL Handling Qualities MOCs would establish a foundation for comprehensive simulator studies and subsequent flight tests.

To enhance the validation process, future work will incorporate eye-tracking technology to analyze the visual behavior of pilots, allowing for a comparison of how pilots interact with different visual cueing hardware solutions. This approach will provide a deeper understanding of their engagement and attentional focus during maneuvers. By examining specific areas where pilots direct their gaze and the duration of their fixations on certain cues, insights can be gained into the effectiveness of holographic visual cues and any discrepancies in pilot performance can be identified. This targeted analysis will address the variability in pilot responses, contribute to refining the integration and design of visual cueing systems in simulators and flight tests, overcome hardware limitations, account for factors such as pilot experience, and support the expansion of holographic representation concepts to other critical maneuvers, thus establishing a more robust foundation for future evaluations.

Fig. 19
figure 19

Suggested course for MTE Hover [30]

The next steps for advancing the system involve conducting flight tests, for which initial integration testing has already been carried out. Feedback from these preliminary tests indicates the necessity for several improvements. Pilots have noted issues such as the poor visibility of green elements, which could be mistaken for the flight path marker or other established green-colored visual elements, suggesting a need for color adaptations. Additionally, while the incorporation of a 20% light transmissivity foil improved the readability of the primary flight display (PFD), visibility issues persist with standby instruments, particularly after extended use, highlighting the importance of further enhancing the display features for long-duration flights. Significant technical challenges with the internal head tracker of the HoloLens 2, such as drift in heading and gravity swim in pitch and roll during vehicle accelerations, have prompted the integration of an external head tracking system. This system aims to mitigate the latency and stability issues observed with the internal inertial measurement unit (IMU), ensuring more reliable and accurate tracking capabilities essential for dynamic flight conditions. With these adjustments, future flight tests will critically assess the refined HMD technology in various MTE evaluations.
