1 Introduction

Attempts to support human operators during work are increasing because the number of seafarers onboard is being reduced owing to automation. This situation will be exacerbated in the future due to plans for autonomous shipping. The importance of human factors is magnified by the fact that 80% of the incidents in maritime operations are caused by human error (Wróbel 2021). More specifically, 71% of human errors in maritime operations include the lack of situational awareness (SA) as a causal factor (Grech et al. 2002). In addition to the SA problem, alarm systems designed to support the operator can become problematic. Alarm systems are simple forms of automation; however, they increase the operators’ mental workload when installed inappropriately. Several alarm systems are reported to be misprioritized and to be a nuisance because they contain false alarms (Jones et al. 2006). Therefore, to enable a suitable workplace for operators in engine supervisory control, human factor aspects, such as SA, should be considered in the design process (Man et al. 2018).

The application of wearable devices, such as smart glasses, often called head-worn or head-mounted displays, to support operators during work has been examined in several studies. These studies have been conducted in different workplace settings, such as marine operations (Ostendorp et al. 2015), manufacturing (Aromaa et al. 2016; Danielsson et al. 2018), warehouses (Stoltz et al. 2017), and patient monitoring (Pascale et al. 2019; Klueber et al. 2019). A head-worn display is a wearable device with an optical display for at least one eye. It displays a projected immersive augmented reality (AR) blended with the surrounding environment, along with projected digital overlay data (Khakurel et al. 2018). The implementation of head-worn displays has been gaining momentum, and the number of studies in this area is increasing (Bal et al. 2021). In addition, many studies have confirmed the use of head-worn displays to balance job control and demand in complex work situations. However, compared with other work environments, few studies have implemented head-worn displays in ship operations, especially in engine department work. Therefore, this study aims to examine the use of head-worn displays as a cognitive aid to support onboard operators.

1.1 Engine department work and task allocation

The work of the engine department onboard a ship is divided into two areas: the engine room (ER) where the machinery is located, and the engine control room (ECR) where supervisory control is conducted. These two areas are located close to each other for flexible access and ease of movement. During the day, operators perform engine watchkeeping and maintenance work in the ER, leaving the ECR primarily unmanned. However, in case of trouble, the engineer returns to the ECR to confirm activated alarms displayed on the engine control console (ECC) and take necessary action. As stated in the guidelines, an alarm on the ECC originating from the machinery, steering gear system, control system, and bilge should be audibly and visually indicated (International Maritime Organization 2009). Currently, operators can follow the warning system by using display extensions, such as indicator columns, in the engine room. This helps transmit the alarms, but because of space limitations and simple basic installation, it provides less information than the actual alarm display installed in the ECR.

One problem with the alarm system onboard a ship is that the number of alarms increases as more sensors are installed (Vu et al. 2019; Maglić and Zec 2020). This situation is mainly followed by an increased number of alarm floods and false alarms, often triggered by sensor problems (Adhitya et al. 2014). A study on alarm performance assessment in engine supervisory control reported false alarms as the main issue in engine resources management (Nizar et al. 2021). When considering human performance, it has also been reported that false alarms lead to inefficient handling of alarms in engine operations (Lundh et al. 2011). In addition, false alarms affect the task allocation of the operator during work because the operators must perform unnecessary movements from the ER to the ECR to check incoming machinery alarms. Over time, false alarm exposure may reduce the operators’ reliance on the alarm system, as their trust in automation declines. This leads to the disuse and misuse of alarm systems (Parasuraman and Riley 1997).

The STCW has already included task allocation as an aspect of the non-technical skills (International Maritime Organization 2011). During cadet training, seafarers are taught to prioritize which task must be attended first. For instance, in engine departments with different engine operation profiles, the operator should focus on the engine supervisory control during leaving and entering the port. This is particularly important during the stand-by engine procedure. During these situations, other tasks, such as maintenance, are less crucial and can be postponed. When the engine position is more stable, such as in ocean-going operational conditions, operators are more flexible and perform maintenance tasks more efficiently. Providing the operator with more time for maintenance is one strategy that supports task allocation (Wróbel 2021). However, distractions such as false alarms that require the operator to manually acknowledge them in the ECR should be eliminated. Moreover, repeatedly moving from one task to another increases the possibility of slips or lapses in maintenance tasks, leading to poor performance and increased human error. Considering future operational projections, task allocation becomes even more critical when the number of onboard working operators is reduced.

1.2 Head-worn display and its application

Accessing information on a head-worn display can support the operators in selecting attention-requiring tasks and offer better SA. This allows the operators to focus more on essential stimuli (McLaughlin and Byrne 2020). The same function has been implemented in several process control operations using a handheld mobile device such as a personal digital assistant (PDA) (Blauhut and Seip 2018). However, a handheld device is less practical for operators because they usually require both hands to perform maintenance tasks. In addition, the head-worn display acts as a cognitive aid that stores information, freeing more of the operators’ working memory for other tasks. As operators can access information without confirming the details with another operator in another location, using a head-worn display eliminates unnecessary social support between operators (Bal et al. 2021). Implementing this type of device will allow operators to perform more tasks, drastically changing the work environment (Danielsson et al. 2018).

Alarm displays are essential in supervisory control to support data-driven monitoring, in which the form of information affects the operator’s monitoring behavior (Vicente et al. 2004). However, operators should also rely on knowledge-driven monitoring, in which attention is directed deliberately to specific information that is expected to help build a situation model. Knowledge-driven monitoring allows operators to detect problems before they become significant (Mumaw et al. 2000). Continuous information displayed on a head-worn display can be useful in achieving this. In other work environments such as patient monitoring, several studies have already examined its application in experiments with human subjects (Pascale et al. 2019; Klueber et al. 2019). When participants used head-worn displays during multi-tasking, they reported fewer physical and temporal demands. The participants also reported being able to recognize and discriminate false alarms. In supervisory control, it is also crucial to have the ability to distinguish between instrumentation and component failure (Mumaw et al. 2000). However, the main purpose of displaying basic information is not to reduce false alarms but rather to provide information on trends and context before an alarm sounds; this can help the operator judge its urgency (Klueber et al. 2019). The operator can use whatever the system offers in information acquisition as long as the operator has access to the raw data (Parasuraman and Manzey 2010). With regard to trust in automation, if such information is not readily available on the display, trust cannot be developed appropriately (Lee and See 2004).

The information presented on head-worn displays to disambiguate alarms may reduce workload by simplifying the decision-making process and thus encourage better alarm prioritization (Pascale et al. 2019). However, the use of head-worn displays while working has some disadvantages. First, the possibility of misfocus is high because the projected visual display is blended with the environment. If the real environment demands constant visual attention, the information displayed on the head-worn display may go unnoticed (Pascale et al. 2019). By contrast, if the display draws too much visual attention, information from the environment may be missed. Second, when the display becomes saturated with stimuli, the operator’s workload increases because of the increased amount of information in the visual field (Pascale et al. 2019). Regarding the operator’s skill discretion, using a head-worn display may hinder the operator’s skill acquisition because the provision of additional task information disrupts the learning curve or familiarization (Terhoeven et al. 2018; Bal et al. 2021). Accordingly, studies of user acceptance of head-worn displays report that operators fear that their existing skills may not be valued and may become obsolete (Aromaa et al. 2016; Stoltz et al. 2017). This trade-off should be considered, as there is merit in comparing it with the situation of not having information at all.

1.3 Current study

The use of head-worn displays in the work area has several advantages; however, their implementation should be examined to reduce the drawbacks in design and installation. This study examines the impact of head-worn display applications and how their utilization can improve engine department work, especially in handling false alarms. Several human performance shaping factors, such as workload, SA, and trust in automation, are assessed via subjective measurements. These measurements are generally used to evaluate the human-machine interface in process control (Xu et al. 2018).

2 Methods

2.1 Participants

Twelve undergraduate and graduate marine engineering students with an average age of 21.75 (± 1.07) years participated in the experiments. The participants were cadets with a month to a year of experience onboard a ship. Based on their curriculum and sea training experience, we assumed that the participants were equally capable of conducting the designated tasks during the experiments. All the participants provided informed consent by signing the participant agreement prior to the experiment. The participants’ recruitment and experimental procedures followed the ethical codes provided by the faculty board.

2.2 Experimental design

We used a 2 × 2 within-participant experimental design. All participants experienced two information display status conditions on the head-worn display: information-on and information-off. The participants continuously wore the head-worn displays in both conditions to reduce any additional workload interference. All participants also experienced two task load levels that changed during the task: high-load and low-load conditions, defined by the number of alarms activated in one session. The experiment was conducted using an engine room simulator, as illustrated in Fig. 1. The simulator was divided into two rooms: ER and ECR. The ER consisted of two monitoring displays that mimicked the equipment and components of an actual engine plant on a ship. The participants could control the system process during the experiment, including opening and closing a valve, starting and stopping a pump, and performing other machinery operations. In the ECR, an ECC was installed to monitor engine operation conditions and alarm indicators. There, the participants could acknowledge the alarms and take the necessary action to return the system to normal conditions.

Fig. 1 Engine plant simulator layout: separation between ER and ECR

The participants performed two tasks to replicate work onboard a ship. They were required to perform several maintenance tasks on the ER side. Simultaneously, the participants were asked to conduct a monitoring task by acknowledging each alarm from the ECR on its activation. The maintenance tasks included opening or closing a valve, starting or stopping a pump, and conducting a tank-sounding task. These maintenance tasks placed cognitive and spatial demands on the participants because they needed to focus on finding the component location along the flow line of the engine system on the monitor displays. In some tasks, such as taking a tank-sounding measurement, the participants also had to conduct simple calculations to estimate the tank level.

Two types of alarms were introduced during the experiment: true and false alarms. True alarms were activated when the process value increased (or decreased) beyond a threshold, owing to the engine operation conditions. False alarms were activated by a sensor failure or ship movement and did not need to be handled. During each session, we introduced 12 and 6 alarms in the high-load and low-load scenarios, respectively. There were 4 true alarms and 8 false alarms in the high-load scenario, and 2 true alarms and 4 false alarms in the low-load scenario. Consequently, the ratio of true to false alarms (1:2) was identical in the two scenarios.

The head-worn display used in this experiment was a binocular Epson Moverio BT-200, shown in Fig. 2. The binocular type is more comfortable and easier to focus on because the optical display can be observed by both eyes (Kim et al. 2019). During both the information-on and information-off scenarios, the participants could observe the engine indicators, as shown in the left part of Fig. 2, on the ECC located in the ECR. However, only in the information-on scenario could participants access the same information on the head-worn display. The parameters included in the engine indicator were selected by a subject matter expert (SME) based on their importance during engine supervisory control. Each engine indicator shows three values: the process value, the high-alarm threshold, and the low-alarm threshold. If a true alarm is activated, the process value slowly increases (or decreases) and crosses the high-alarm threshold (or low-alarm threshold). If a false alarm occurs, the process value suddenly drops (or climbs) to the lowest level (or highest level); for instance, the L.O INLET indicates 0.01 (or 0.99). An active alarm is indicated by a different color on the display, accompanied by an audible alarm. Alarms that the participants did not acknowledge were configured to turn off automatically after 20 s.
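The two alarm signatures described above (a gradual drift across a threshold versus a sudden jump to an extreme value) can be illustrated with a simple rule. The following is a minimal sketch, not the simulator's logic: the function name `classify_alarm`, the normalised 0 to 1 scale, and the `jump_limit` value are all assumptions made for illustration.

```python
def classify_alarm(trace, low, high, jump_limit=0.2):
    """Label the most recent sample of a process-value trace.

    trace: recent process values, oldest first (normalised 0..1, assumed).
    low, high: low-alarm and high-alarm thresholds.
    jump_limit: largest per-sample change still treated as a gradual
        drift (hypothetical value, chosen only for this example).
    """
    current = trace[-1]
    if low <= current <= high:
        return "normal"
    # Compare the last two samples: a slow drift suggests a true alarm,
    # an abrupt jump suggests a sensor fault (false alarm).
    step = abs(trace[-1] - trace[-2]) if len(trace) > 1 else 0.0
    return "false" if step > jump_limit else "true"


# Hypothetical traces with thresholds at 0.2 and 0.8.
drifting = classify_alarm([0.5, 0.6, 0.7, 0.82], low=0.2, high=0.8)
jumping = classify_alarm([0.5, 0.5, 0.01], low=0.2, high=0.8)
```

In the experiment this judgement was made by the participants from the displayed indicator; the sketch only makes explicit the cue they could rely on.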

Fig. 2 Engine indicator projected on the head-worn display

The participants were asked to visit the engine room simulator five times: three times for training sessions and two times for data measurements. The minimum interval between training sessions, and between the last training session and the first data measurement, was 1 day; the minimum interval between the two data measurement sessions was 6 days. The training sessions allowed the participants to become familiar with the experimental setup and tasks. The first data measurement session was divided into two trials with different task loads and the same display status on the head-worn display. The second data measurement session followed the same pattern but used the other display status condition.

2.3 Dependent variable (measurement tools)

In evaluating the human-machine interface, human performance evaluations such as SA, trust in automation, and workload are used as they are common in process control environments (Xu et al. 2018). In this experiment, the measurements were divided into subjective and objective categories. Objective measurements were taken during the experiments using simulator-recorded events and the participants’ response times. Subjective measurements were obtained using questionnaires after the participants finished each trial. At the end of the experiments, we verbally asked the participants to provide open feedback on the scenarios.

We used the NASA-TLX (NASA Task Load Index) questionnaire to measure the task workload in each trial (Hart and Staveland 1988). NASA-TLX uses 20-point Likert-type scales to measure six dimensions of workload: mental demand, physical demand, temporal demand, performance, effort, and frustration. The perceived workload was then defined as the sum of these dimensions after weighting using pairwise comparisons. To measure SA, we applied SART (Situational Awareness Rating Technique) (Taylor 1990). It covers ten dimensions grouped into three categories: attentional demand, attentional supply, and situational understanding. The perceived SA was then defined as the total situational understanding less the difference between attentional demand and supply. Further, trust in automation was measured using the TiA (Trust in Automation) questionnaire (Jian et al. 2000). TiA has 12 items, of which seven measure trust and five measure distrust.
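The two scoring rules described above can be written out as a short sketch. The function names and example values are hypothetical. Dividing the weighted sum by the total weight (15 pairwise comparisons in the standard NASA-TLX procedure) is an assumption about the scoring, since the text only states that the dimensions were weighted and summed.

```python
def nasa_tlx(ratings, weights):
    """Weighted NASA-TLX workload score.

    ratings: dict mapping each dimension to its rating.
    weights: dict mapping each dimension to the number of times it was
        chosen in the pairwise comparisons (15 in total in the standard
        procedure; the normalisation is an assumption here).
    """
    total_weight = sum(weights.values())
    return sum(ratings[d] * weights[d] for d in ratings) / total_weight


def sart(demand, supply, understanding):
    """SART score: understanding minus (demand minus supply)."""
    return understanding - (demand - supply)


# Hypothetical ratings for one participant in one trial.
dims = ["mental", "physical", "temporal", "performance", "effort", "frustration"]
ratings = dict(zip(dims, [14, 6, 12, 8, 11, 5]))
weights = dict(zip(dims, [5, 1, 4, 2, 3, 0]))
workload = nasa_tlx(ratings, weights)
awareness = sart(demand=12, supply=15, understanding=20)
```

Under this formulation, SA rises when attentional supply grows faster than attentional demand, which is the trade-off the discussion section returns to.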

For objective measurements, the response time of each participant was recorded using the simulator system. We defined the response time (RT) as the time that elapsed from alarm activation until the alarm was deactivated. Therefore, if the participants neglected an alarm, it was automatically deactivated, resulting in the maximum response time (RT = 20 s). Furthermore, the simulator recording system measured the percentage of completed maintenance tasks.
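The response-time definition above, including the 20 s ceiling for neglected alarms, can be expressed as a small sketch. The function name and the timestamp representation (seconds as floats) are assumptions; the simulator's actual log format is not described here.

```python
MAX_RT = 20.0  # seconds; unattended alarms are deactivated automatically


def response_time(activated_at, deactivated_at=None):
    """Seconds from alarm activation to deactivation, capped at MAX_RT.

    deactivated_at is None when the participant never attended the alarm,
    in which case the automatic switch-off time is used.
    """
    if deactivated_at is None:
        return MAX_RT
    return min(deactivated_at - activated_at, MAX_RT)


attended = response_time(100.0, 108.5)   # participant handled the alarm
neglected = response_time(100.0)         # alarm timed out
```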

3 Results

Two-way analysis of variance (ANOVA) was used to determine whether there was any significant difference in perceived workload. The two-way ANOVA testing the interaction of display status and task load on perceived workload showed no statistically significant difference (F(1,11) = 1.63, p = .23). In the main effect analysis, the participants perceived a higher workload in trials with information-off (M = 11.43, SD = 2.57) than in trials with information-on (M = 9.35, SD = 3.20), with a statistically significant difference (F(1,11) = 7.96, p < .05). The participants perceived similar workloads under high and low task loads (F(1,11) = 1.34, p = .27). Fig. 3(a) shows the interaction between the variables.
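For a 2 × 2 within-participant design, each main effect and the interaction has one degree of freedom, so each F(1, n − 1) statistic equals the squared one-sample t statistic of a per-participant contrast. The sketch below illustrates this equivalence. It is not the analysis software used in the study, and the data are made-up toy values for four hypothetical participants.

```python
from statistics import mean, stdev


def contrast_f(scores):
    """F(1, n-1) for a one-degree-of-freedom within-participant contrast."""
    n = len(scores)
    t = mean(scores) / (stdev(scores) / n ** 0.5)
    return t * t


def anova_2x2_within(data):
    """data: one (on_high, on_low, off_high, off_low) tuple per participant."""
    display = [(a + b) / 2 - (c + d) / 2 for a, b, c, d in data]
    load = [(a + c) / 2 - (b + d) / 2 for a, b, c, d in data]
    interaction = [(a - b) - (c - d) for a, b, c, d in data]
    return {
        "display": contrast_f(display),
        "load": contrast_f(load),
        "interaction": contrast_f(interaction),
    }


# Toy workload scores (hypothetical, not the study's data).
toy = [(9, 8, 12, 10), (10, 10, 11, 12), (8, 7, 11, 9), (11, 9, 12, 12)]
f_values = anova_2x2_within(toy)
```

A p-value for each F statistic can then be obtained from the F distribution with 1 and n − 1 degrees of freedom, for example with `scipy.stats.f.sf` if SciPy is available.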

Fig. 3 Interaction between display status and task load with (a) perceived workload, and (b) perceived SA

As with perceived workload, the perceived SA was analyzed using a two-way ANOVA for the effects of display status and task load. No two-way interaction was observed (F(1,11) = 0.04, p = .85). The main effect of display status did not show a statistically significant difference (F(1,11) = 2.58, p = .14). The main effect of task load showed that the participants perceived a higher SA in the low-load session (M = 24.21, SD = 5.94) than in the high-load session (M = 22.08, SD = 6.47), with a statistically significant difference (F(1,11) = 6.97, p < .05). The interaction between these variables is shown in Fig. 3(b).

The interaction between the display status and task load with perceived trust is shown in Fig. 4(a). A two-way ANOVA showed no two-way interaction (F(1,11) = 0.55, p = .47). The main effect of display status showed that the participants perceived more trust in trials with information-on (M = 31.21, SD = 7.01) than in trials with information-off (M = 27.75, SD = 8.39), with a statistically significant difference (F(1,11) = 7.32, p < .05). The main effect of task load was not significant (F(1,11) = 0.33, p = .58).

Fig. 4 Interaction between display status and task load with (a) perceived trust, and (b) task completed

The interaction between the display status and task load with the completed tasks is shown in Fig. 4(b). A two-way ANOVA showed no two-way interaction (F(1,11) = 0.38, p = .55). The main effect of display status showed that participants completed more tasks in the information-on trials (M = 122.13, SD = 27.61) than in the information-off trials (M = 109.08, SD = 23.11) with a small statistically significant difference (F(1,11) = 12.97, p = .07).

The alarm response ratio, as an objective measurement, is shown in Fig. 5(a). There was a two-way interaction between the variables (F(1,11) = 41.18, p < .01). A pairwise t-test for each variable revealed that the participants responded less often to false alarms in the information-on display status (M = 0.97, SD = 0.36) than in the information-off display status (M = 0.98, SD = 0.51), with a statistically significant effect (p < .01). Furthermore, the response time in seconds, as an objective measurement, is shown in Fig. 5(b). There was no two-way interaction between the variables (F(1,11) = 0.51, p = .49). The main effect of the display status showed that the participants responded to the alarms more slowly in the information-on session (M = 10.27, SD = 1.48) than in the information-off session (M = 8.94, SD = 1.27), with a statistically significant difference (F(1,11) = 7.69, p < .05). The main effect of task load was not significant (F(1,11) = 0.99, p = .34).

Fig. 5 Interaction between display status and alarm type with (a) alarm response ratio, and (b) alarm response time

4 Discussion

This study examined the advantages of using head-worn displays in engine department work. An experiment involving human subjects was conducted using an engine plant simulator, and cadets were asked to participate. The participants followed two conditions of information on the head-worn display (information-on and information-off) and two conditions of task load (high-load and low-load). Workload, SA, and trust in automation were measured subjectively. In addition, response frequency and time were measured objectively.

Workload measurement revealed that the participants perceived a lower workload in the information-on trials. This result suggests that when the participants have adequate information to identify an incoming alarm as a true or false alarm, they can safely avoid returning to the ECR. Moving less frequently between rooms effectively reduces their workload. However, in this experiment, the task load did not affect the perceived workload of the participants. This prevented us from examining the correlations of workload and SA with trust in automation. Moreover, several participants verbally reported that the two scenarios (high-load and low-load) felt similar.

We predicted that SA, one of the human performance factors, would increase if participants had information on head-worn displays during the task; however, it did not differ significantly between display conditions. This can be explained by the trade-off between attention demand and supply. As the attention supply increases with information on the head-worn display, the attention demand also increases because the layer of information creates a more complicated visual environment. Thus, the continuous input of information on a visual display is less effective in improving SA than expected. In practice, information on the head-worn display should therefore be turned off when the engine parameters are in the normal running condition and automatically turned on when the parameters move toward the alarm threshold range.
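The suggestion of showing information only when a parameter approaches its alarm range can be sketched as a simple gating rule. This is an illustration of the idea rather than a tested design: the function name and the `margin` parameter (the fraction of the band between the thresholds that counts as "near the threshold") are assumptions.

```python
def show_on_display(value, low, high, margin=0.1):
    """Return True when an indicator should appear on the head-worn display.

    value: current process value.
    low, high: low-alarm and high-alarm thresholds.
    margin: fraction of the low-to-high band treated as near a threshold
        (hypothetical value, chosen only for this example).
    """
    band = (high - low) * margin
    return value <= low + band or value >= high - band


# Hypothetical readings with thresholds at 0.2 and 0.8.
hidden = show_on_display(0.50, low=0.2, high=0.8)   # normal running range
shown = show_on_display(0.76, low=0.2, high=0.8)    # approaching high alarm
```

Such a rule would keep the visual field clear during normal running while still providing trend context before an alarm sounds.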

The subjective measurement of trust in automation indicates that the participants had more trust in the alarm system when wearing a head-worn display with additional information. Moreover, the response ratio results show that the participants did not return to the ECR when the alarms were false. The participants were reluctant to return to the ECR to attend to false alarms when information was available on the head-worn display. By contrast, the participants adopted a “better-safe-than-sorry” approach in the trade-off between maintenance and monitoring tasks when no raw information was available; they took the safe course of action and returned to the ECR every time an alarm sounded.

The number of completed maintenance tasks also improved because participants were more likely to continue performing maintenance tasks when information was available on the head-worn display, as false alarms from the ECR could be safely neglected. With the information available to confirm a false alarm, the participants placed more trust in the alarm system and could prioritize multiple tasks efficiently. However, the response time exhibited a different tendency. Participants with information on their head-worn displays took longer to respond to the alarms. The participants said that they only checked the information after the alarm sounded, and this confirmation time made the response time longer than when there was no information to confirm. Nonetheless, a suitable response action for the engine supervisory control is more important than a short response time. We suggest that this trade-off between confirmation and response time can be accepted for better performance.

The limitations of this study include the experimental setup and procedures. For example, the information on the head-worn display was shown permanently, whereas participants could instead be given the option to activate or deactivate it as they wish. Furthermore, the study was conducted in an ER simulator with screen displays. It may be more suitable to conduct an experiment in an environment closer to an actual situation, without additional displays that complicate the visual environment when using a head-worn display. ER environmental factors such as temperature and humidity may also require attention to ensure the operator’s comfort when using a head-worn display during work. Moreover, only a small number of participants were included in this study. Consequently, each participant had to perform all scenario combinations. Therefore, a carry-over effect may exist, and the participants may have performed better in the last scenarios. This limitation can be overcome by inviting more participants, who can then be divided into several scenario groups for more reliable outcomes.

5 Conclusion

The use of head-worn displays as a cognitive aid is gaining attention in various work areas. However, their utilization for onboard ship operations, especially in engine department work, has not yet been examined. Informative data such as sensor measurements and alarm indicators from the source console can be projected onto a head-worn display. This may help tackle alarm-handling issues on board a ship. In this study, the implementation of head-worn displays was examined using a full-mission engine plant simulator. The findings explained the trade-off between the availability of information and visual complexity in constructing situational awareness and its interaction with the workload and trust in automation.

The ability of human factors concepts such as workload, situational awareness, and trust in automation to explain different states of human behavior is beneficial in evaluating new technologies or devices in the work environment. However, as this study was conducted in an engine plant simulator, future studies may consider examining it in an actual engine room situation to investigate if such an advantage of using head-worn displays persists. These limitations must be addressed, and further study is required to extend these findings.