Effects of coherent, integrated, and context-dependent adaptable user interfaces on operators’ situation awareness, performance, and workload

Nautical traffic management in the Netherlands is shifting from local traffic control to corridor traffic management. Current traffic management systems do not sufficiently support operators in the perceptual and cognitive processes needed to interpret and understand the large amounts of information required for corridor traffic management. Newly developed user interface concepts aim to overcome deficiencies of current interface designs that insufficiently support situation awareness assessment. The effects of these new user interfaces, however, are insufficiently known due to the intricate relations between situation awareness, task performance, and workload. The objective of this study is to evaluate the effects of the three previously developed user interface concepts on operators’ situation awareness, task performance, and workload to gain better insights into the benefits and limitations of the user interface design concepts. The effects were tested in a simulator environment. The results show that user interface features of an integrated user interface allowed operators to apply more effective information processing, which resulted in better task performance. Features of a context-dependent adaptable user interface triggered proactive behavior of operators, which resulted in better task performance for tasks in which operators require insight into future activities of the elements in the environment.


Introduction
Nautical traffic management in the Netherlands is shifting from local traffic control to corridor traffic management (Van Doorn et al. 2017b). Corridor traffic management operators, called nautical operational network management (N-ONM) operators, remotely manage a traffic corridor, such as the main route and alternative routes between the Port of Rotterdam and Germany. They need to gain and maintain situation awareness (SA) based on large amounts of information about the corridor. Previous work, however, showed that current traffic management information systems do not sufficiently support users in the perceptual and cognitive processes needed to interpret and understand the presented information (Van Doorn et al. 2015, 2017b). In our previous work, we designed and developed three user interface (UI) concepts to overcome deficiencies of current traffic management systems, to increase operators' SA, and to improve operators' task performance (Van Doorn et al. 2017a). The three concepts build upon each other. The first concept, a coherent UI, is a logical, consistent, orderly, and harmonious interface in which the multiple UI windows form a coherent whole. The second concept, an integrated UI, additionally uses information fusion, clustering, and interaction between UI windows. The third concept, a context-dependent adaptable UI, additionally captures context information, assesses the implications of context, and adapts the interface content and composition accordingly. The context-dependent adaptable UI thus includes all features of the other two concepts and is the most elaborate of the three. Although this UI concept is more difficult and expensive to develop and maintain than the other two, the assumption in our previous study was that it would also provide significantly better user support.
Utility and usability testing, however, showed that operators did not report a significant difference in how much the different UIs supported them when comparing the integrated UI and context-dependent adaptable UI. We also found that operators did not experience deficiencies when working with the integrated UI. Despite an overall preference of operators for the context-dependent adaptable UI, this raised the question "whether using a context-dependent adaptable UI instead of an integrated UI will improve the SA of the operators to such extent that it warrants the consideration of the first one despite of the higher efforts and overheads of implementation" (Van Doorn et al. 2017a).
The objective of this study is to evaluate the effect of the three previously developed UI concepts on operators' SA, task performance, and workload in order to gain better insights into the benefits and limitations of the UI design concepts. For this purpose, we used the same experiment as the previous study in terms of research set-up, scenarios, and participants, but analysed different data, which had been gathered but not analysed earlier. Instead of subjective measures obtained through structured interviews, this study uses objective performance measures in a within-subject design. Data logged by the simulator system were combined with SA assessment according to the Situation Awareness Global Assessment Technique (SAGAT) developed by Endsley (2000). The Raw NASA Task Load Index (RTLX) (Hart 2006) was used to evaluate operators' workload.
To structure the reasoning about the effects of the UI concepts, Sect. 2 provides a brief overview of current knowledge on the intricate relations between SA, task performance, and workload in relation to user interface design. Section 3 introduces the research approach used to test the effect of the UI concepts on operators' SA, task performance, and workload. A good understanding of the UI concepts was required for selecting the methods to measure their effect. Section 3, therefore, includes a thorough description of the UI concepts. Section 4 gives an overview of the results and findings of our study. The discussion in Sect. 5 gives a short summary of the results, followed by a thorough discussion of the research question in relation to the literature provided in Sect. 2. Section 6 concludes with the implications of this study for selecting a suitable UI design for operators working in dynamic task environments, where operators require SA for time-constrained decisions and actions.

Relations between situation awareness, task performance, and workload
The concepts of SA, task performance, and workload are intricately intertwined. SA knowledge consists of the combination of perceptual knowledge (factual knowledge of elements in the current situation), comprehended knowledge (understanding of the meaning and relationships of knowledge in the current situation), and projected knowledge (insight into future activities of the elements in the environment) (Van Doorn et al. 2014). Sufficiently correct and complete SA knowledge is required for correct decision-making and, thus, low situation awareness can have a negative effect on operators' task performance (Endsley 1995). While SA could be improved by working harder, high mental workload can negatively affect operators' SA (Endsley 1995; Vidulich and Tsang 2012). A low cognitive task load, on the other hand, can result in boredom and under-load, which may also negatively affect SA assessment and task performance (Edwards et al. 2017). SA knowledge is the understanding of dynamic information associated with operators' goals and does not include more static knowledge stored in long-term memory (Endsley 2000). Command and control operators commonly require understanding of the current and prospective meaning and relationships of large amounts of information about their dynamic environment. Because such a large amount of information must be mentally available, the main challenge for operators in gaining and maintaining SA is their ability to locate and process this information (Endsley 2000). As such, SA and mental workload make use of the same cognitive processes, for which capacities are limited (Vidulich and Tsang 2015). A higher level of workload means that more attention is needed for performing tasks and less is left for maintaining SA.
The relation between SA and workload is especially relevant in cases of mental overload. It is argued that stress reduces working memory capacity and retrieval of information, and that overload, thus, negatively influences SA assessment (Endsley 1995). Operators under stress tend to focus their attention on a limited number of dominant pieces of information. They tend to (1) arrive at a decision without exploring all available information, (2) pay more attention to negative information, and (3) have a more scattered and poorly organized scanning of stimuli (Endsley 1995). While focusing on negative information can be a positive strategy, as negative information is a cause of problems to be solved, it may also result in operators missing other relevant information.
Which information operators focus on, however, also depends on how the information is presented to the user (Treisman 1985; Wolfe and Horowitz 2004). The degree to which working memory decrements affect SA, thus, also depends on the UI design of the systems used. At the same time, system design also influences the SA that operators require to achieve their goals (Van Doorn et al. 2017b). Consider a 'System A' which simply displays long lists with all available data, and a 'System B' that processes data and only presents a small overview of derivative information.
With 'System A', operators might need to use memory-based information processing strategies for timely task performance. With 'System B', the cost of accessing information is much lower, so operators could use display-based information processing strategies. The information contained and presented by the technical system, thus, influences which information operators need to have mentally available (Mogford 1997; Patrick and Morgan 2010; Stanton et al. 2006). With 'System B', less information might be part of operators' SA, but this does not mean that operators have better SA with 'System A'. It is the human-machine interaction that determines which information is essential for operators' SA (Van Doorn et al. 2017b). This makes it difficult to predict or understand the effect of automation, such as that implemented in the context-dependent adaptable UI, on operators' performance (Edwards et al. 2017; Parasuraman and Riley 1997). SA is also influenced by the design of interfaces and the type of information provided by the machine during interaction. Level 3 SA requires information about the situations that trigger and facilitate proactive behavior of operators (Koester 2019). The design of interaction and interfaces has several aspects, such as perceptual and processing proximity, that facilitate integration of information, reduce attentional distribution, and improve operators' perception, understanding, and projection (Li et al. 2020).
In our study, the three UI concepts differ in terms of how operators can access information, and additional information is displayed in the integrated and context-dependent adaptable UI. Thus, the information processing strategies applied by the operators might differ among the UIs. Considering the above, different information might be part of operators' required SA. This makes it less straightforward to evaluate the effect of the different UI concepts. While the UI concepts were designed to better support operators' SA assessment, more information as part of SA might not result in better task performance. Instead, combined measures of SA, task performance, and workload are required to understand the effect of UI concepts and, thus, to evaluate the added value of additional user interface features.

Methods
The method used in this study needed to support our objective: to gain better insights into the benefits and limitations of the UI design concepts. Therefore, the method used to evaluate the UI concepts needed to quantify the effects in terms that are meaningful for practice. It is not useful to test the effects of the UIs in extreme situations that will never occur in practice. The measured values should also be meaningful. For traffic management operation, it is relevant to evaluate task performance in terms of speed, accuracy, and order. Effects needed to be not only statistically significant, but also meaningful in practice. For example, a difference in speed of task performance of a second is not meaningful, while a difference of minutes is very relevant in incidental situations. Furthermore, the measures needed to be able to capture differences in operators' SA, workload, and task performance that occur due to differences between the UI concepts. Therefore, a good understanding of the UI concepts was required for selecting the method of measurement.
In our previous work, we presented three UI concepts to better support N-ONM SA and task performance: (1) a coherent UI, (2) an integrated UI, and (3) a context-dependent adaptable user interface (Van Doorn et al. 2017a). The coherent UI concept is the concept that is closest to current practice. In current practice, however, there is not yet a uniform UI concept for N-ONM operators. Each traffic management control room uses different workplaces and partly different information systems. In this study, we aimed to measure the differences between the concepts. For this purpose, it was important that all other variables, such as the workplace design and design aesthetics, were the same in our research set-up. The only differences between the UIs to be compared were the features of the concepts. It was, however, impossible to present a current N-ONM UI on the same hardware set-up and with the same design aesthetics as the three newly developed UI concepts. Besides, features such as using the same UI interactions and style guides everywhere in the UI are part of what distinguishes a coherent UI from current N-ONM UIs. We, therefore, did not include the current UI in our comparison. Instead, the coherent UI was used as best practice for the current system set-up, developed in a way that allowed us to measure the effects of the other concepts. The other two concepts were based on this coherent UI, with extra features implemented to overcome identified deficiencies (see Table 1). Paragraph 3.1 provides an overview of the coherent UI, which also formed the basis of the two other UI concepts. Paragraph 3.2 describes the features implemented to create an integrated UI. The features implemented to create a context-dependent adaptable user interface are explained in paragraph 3.3.

Coherent user interface
The implemented coherent UI consisted of six user interface windows, which were coherent in use of colors, buttons, and menu structures, as well as consistent in interactions. For example, double-mouse-click was used in all windows to display more information about the clicked information element. The different windows all used the same data source in case they presented the same information. For example, all windows used the same vessel information database. If this information was adapted by the user in one window, then this information also changed in the other windows. In terms of features, the coherency of the UI was summarized as feature 1:
1. UI windows together form a coherent whole (logical, consistent, orderly, and harmonious).
The total coherent UI concept consisted of the following windows (see Fig. 1):
1. Area of Focus window with static geographic information.
2. Area of Focus window with dynamic vessel traffic information.
3. Information overview window, which listed all available information elements. This window contained a tab per information cluster (vessel traffic information, nautical object information, event information, hydro-meteo information, etc.).
4. Information detail window, which provided an overview of all detailed information about one object of interest (one vessel, one nautical object, one event, one hydro-meteo location, etc.).
5. Area of Control window, which displayed the entire area under control of the operator.
6. Notices-window, which displayed the top priority notices relevant for the specific operator role.

Integrated user interface
The implemented integrated UI was a coherent UI with three extra features, see also Table 1 and Fig. 2.
Present all geographic information that is needed for the same task(s) in the same map
Two different types of N-ONM tasks required geographic information presentation. One set of tasks was related to handling local traffic management events, such as incidents. This required detailed information about the area of focus. Another set of tasks was related to corridor management, which required an overview of the entire corridor, or area of control. For both sets of tasks, operators needed both static geographic information and dynamic vessel and event information. The integrated UI, therefore, consisted of two windows containing a geographical information system (GIS), instead of the three that were present in the coherent UI. The area of focus window displayed detailed information about both static and dynamic elements, such as anchorage type, vessel course, and event type. The area of control window displayed only location information for most elements. Only for events did the area of control window also display detailed information, as this information was needed for both sets of tasks. This was in contrast to the coherent UI, where event information was only displayed in the 'Notices-window', 'Information overview window', and 'Information detail window'.
Support filtering of vessel information by human operators
Operators were able to filter the information that was displayed in the vessel information overview window by selecting a location in the corridor. Only vessels that would pass this location were then displayed, in order of projected arrival at this location. The selected location was also displayed in both the area of control and area of focus map.
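The filtering behavior described above can be illustrated with a minimal Python sketch. It keeps only the vessels whose planned route passes the selected location and orders them by projected arrival. The `Vessel` record, its field names, and the use of minutes as the time unit are hypothetical; the simulator's actual data model is not described here.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Vessel:
    name: str
    # Hypothetical field: projected arrival (minutes from now) at each
    # location on the vessel's planned route.
    eta_by_location: Dict[str, float]


def filter_by_location(vessels: List[Vessel], location: str) -> List[Vessel]:
    """Return vessels that will pass `location`, earliest projected arrival first."""
    passing = [v for v in vessels if location in v.eta_by_location]
    return sorted(passing, key=lambda v: v.eta_by_location[location])


if __name__ == "__main__":
    fleet = [
        Vessel("Vessel A", {"Lock 1": 50.0, "Lock 2": 95.0}),
        Vessel("Vessel B", {"Lock 3": 10.0}),
        Vessel("Vessel C", {"Lock 1": 20.0}),
    ]
    for vessel in filter_by_location(fleet, "Lock 1"):
        print(vessel.name, vessel.eta_by_location["Lock 1"])
```

Vessels that never pass the selected location are simply dropped from the overview, which mirrors the described reduction of the list to task-relevant entries.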
Interactions between UI windows and visualized relations between windows and elements
The following interactions between UI windows were implemented: (1) Highlight location of object (vessel, lock, event, hydro-meteo station, etc.) on both maps by clicking on this object in the information overview window. (2) Open-detail window of object (vessel, lock, event, hydro-meteo station, etc.) by double-clicking this object on the map. Type, status, and location of events/notifications were also visible on both maps.

Table 1 Features implemented in each UI concept (X = implemented, - = not implemented)
Feature | Coherent UI | Integrated UI | Context-dependent adaptable UI
1. UI windows together form a coherent whole (logical, consistent, orderly, and harmonious) | X | X | X
2. Present all geographic information that is needed for the same task(s) in the same map | - | X | X
3. Support filtering of vessel information by human operators | - | X | X
4. Interactions between UI windows and visualized relations between windows/elements | - | X | X
5. Context-dependently show relevant location in extra Area of Focus map | - | - | X
6. Automatically show available alternative routes in case of obstruction on main route | - | - | X
7. Context-dependently show traffic prognoses information if traffic intensity exceeds limit | - | - | X

Context-dependent adaptable user interface
The implemented context-dependent adaptable UI was an integrated UI with three extra features, see also Table 1 and Fig. 3.
Context-dependently show relevant location in extra area of focus map
The system assessed context to automatically display an extra area of focus window. The coordinates of the center of the map visualized in this window were the coordinates of the event. If there were multiple events, then the event type (priority) and event start time determined which coordinates were taken as the center of the map. In our research set-up, this feature was implemented as a 'Wizard of Oz' method (Green and Wei-Haas 1985). This means that the participants believed that it was the system that opened this window, while it was actually the test leader who opened the extra Area of Focus window. Since the events were part of the script, and not initiated by the participants, the test leader could do so without the need to understand users' actions.
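The rule for choosing the map center can be sketched as follows. The text states only that event type (priority) and start time determine the center when several events are active; the concrete ordering used here (most urgent priority first, then earliest start time) is an assumption for illustration, as are the `Event` fields and the numeric priority scale.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Event:
    event_type: str
    priority: int       # assumption: a lower number means a more urgent event type
    start_time: float   # seconds since scenario start
    lat: float
    lon: float


def focus_center(events: List[Event]) -> Optional[Tuple[float, float]]:
    """Pick the center of the extra area of focus map.

    Assumed tie-breaking: most urgent priority first, then earliest start time.
    Returns None when there is no active event (no extra window is needed).
    """
    if not events:
        return None
    chosen = min(events, key=lambda e: (e.priority, e.start_time))
    return (chosen.lat, chosen.lon)
```

With a single active event, the function reduces to the stated behavior: the map is centered on the coordinates of that event.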
Automatically show available alternative routes in case of obstruction on main route
If there was no obstruction on the main route, then all waterways were visualized in the color blue. If there was an obstruction on the main route, then the main route was visualized in grey blue, and each alternative route that had no obstruction at that moment was visualized in violet. Since the participants influenced which routes were available, we programmed the system to automatically carry out this feature. Thus, this feature did not depend on the test leader accurately understanding participants' actions.
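The coloring rule above amounts to a small decision function, sketched below. The color names come from the text. The route identifiers and the treatment of an obstructed alternative route while the main route is also obstructed (left blue here) are assumptions, since the text does not specify that case.

```python
from typing import Dict


def route_colors(main_obstructed: bool,
                 alternatives_obstructed: Dict[str, bool]) -> Dict[str, str]:
    """Map each route to its display color.

    `alternatives_obstructed` maps a hypothetical route id to True when that
    alternative route currently has an obstruction.
    """
    colors = {"main": "grey blue" if main_obstructed else "blue"}
    for route, obstructed in alternatives_obstructed.items():
        if main_obstructed and not obstructed:
            colors[route] = "violet"   # available alternative is highlighted
        else:
            colors[route] = "blue"     # assumption for the unspecified case
    return colors
```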
Context-dependently show traffic prognoses information if traffic intensity exceeds limit
The system constantly showed a simple bar with prognosis information below the area of control map. Only in cases of traffic intensities that would hinder traffic flows did the system automatically also display more detailed prognosis information as a layer on top of the waterways in the area of control window. Since the participants influenced prognosis information and prognosis information was complex to calculate, we programmed the system to automatically carry out this feature. Thus, this feature did not depend on accurate human calculations or on the test leader's understanding of participants' actions.

Relation between user interface features and required measurements
Only feature 1, the coherency of information content and presentation in all UI windows, was specifically designed to better support operators' Level 1 SA. This feature was present in all UI concepts. Consequently, no differences between operators' Level 1 SA were expected. Accessing information, however, differed between the coherent UI and the other two interfaces. This could influence information processing strategies and, therefore, SA knowledge. Measures were needed to evaluate whether this affected operators' Level 1 SA. We assumed that features 2, 3, 4, 5, and 6 would support operators in gaining Level 2 SA, as these features visualized relations between information elements. These features were not implemented in the coherent UI. Features 5 and 6 were only implemented in the context-dependent adaptable UI. Level 1 and Level 2 SA are required for gaining Level 3 SA. Additionally, we assumed that feature 7, which showed prognosis information and was only implemented in the context-dependent adaptable UI, would support gaining Level 3 SA. We assumed that features 2, 3, and 4, which were implemented in the integrated UI and the context-dependent adaptable UI, would help operators to more quickly access the information required to gain SA about an incidental situation. We assumed that an increase in the speed of gaining SA would also result in quicker task performance, and that more support for gaining SA would result in more accurate task performance. To understand the effects of the UIs on operators' SA and task performance, it was necessary to reflect on the intricate relations between workload, SA, and task performance.

Test environment and scenarios
The three UI concepts were all implemented in a nautical traffic management workplace simulator, which consisted of an operator desk, a test leader desk, and an observer desk, see Fig. 4. The simulator software logged all operators' actions. Communication was logged by both the test leader and an observer. They could log foreseen communication by clicking items in a script. Unforeseen communication was logged as typed text. Three realistic, challenging traffic management scenarios were developed together with four subject-matter experts (SMEs). We instructed the SMEs to aim for highly similar scenarios in terms of structure, duration, traffic intensity, and level of difficulty. The content, however, differed, see Table 3. Each scenario included communication to handle the events which were part of that scenario. Communication was imitated by an SME using scripts. Additionally, each scenario included communication scripts for the test leaders to initiate questions from skippers who were not involved in those events. The simulator software controlled when and which script needed to be activated.

Participants
Twenty traffic management operators were randomly selected to participate in the experiment. Data from one participant were not available because of errors made by the test leader. Four subject-matter experts (SMEs) were involved as test leaders, responsible for imitating communication using scripts. One SME, however, had only limited training prior to the experiments and only participated once. Data from this experiment were also excluded. Data from four participants only included test results for experiments with the coherent and integrated UI, because of bugs in the simulator system. Counterbalancing required six orders of treatment; therefore, the number of participants had to be a multiple of six. Consequently, the effect of the context-dependent adaptable UI was evaluated using a dataset of twelve operators, while the difference between the coherent UI and integrated UI was evaluated using a dataset of eighteen operators.
The majority of the involved operators were highly experienced and had prior experience as steersman and/or skipper, see Table 2. This is consistent with the entire population of N-ONM operators working in the Netherlands.

Procedure
Prior to the experiments, participants were sent a description of the research background, including which tasks were part of the experiment. This information was repeated at the beginning of the experiments. Participants read and signed the informed consent form and filled in a survey about their work experience. After that, the three UIs were explained to the participants and they completed a 10-min tutorial scenario for each UI. The UIs were referred to as UI1 (coherent UI), UI2 (integrated UI), and UI3 (context-dependent adaptable UI). The participants then performed the N-ONM tasks in the three traffic management scenarios, in a counterbalanced manner. Counterbalancing was used to ensure both that each scenario was played equally often with each UI concept and that the order of UI use was evenly distributed among the experiments. Each scenario took approximately 1 h and was followed by a short break.
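The counterbalancing constraint can be made concrete with a short Python sketch. Three UIs give 3! = 6 possible orders of treatment, which is why a complete dataset needs a multiple of six participants. The scenario labels, the fixed scenario order, and the round-robin assignment are assumptions made for illustration; the actual assignment procedure is not described in the text.

```python
from collections import Counter
from itertools import permutations
from typing import List, Tuple

UIS = ("UI1", "UI2", "UI3")
SCENARIOS = ("S1", "S2", "S3")  # hypothetical labels, assumed fixed order


def treatment_orders() -> List[Tuple[str, ...]]:
    """All six possible orders in which the three UIs can be used."""
    return list(permutations(UIS))


def assign(n_participants: int) -> List[Tuple[str, ...]]:
    """Assign UI orders round-robin; requires a multiple of six participants."""
    orders = treatment_orders()
    if n_participants % len(orders) != 0:
        raise ValueError("number of participants must be a multiple of six")
    return [orders[i % len(orders)] for i in range(n_participants)]


def pairing_counts(assignments: List[Tuple[str, ...]]) -> Counter:
    """Count how often each scenario is played with each UI."""
    counts: Counter = Counter()
    for order in assignments:
        for scenario, ui in zip(SCENARIOS, order):
            counts[(scenario, ui)] += 1
    return counts
```

Under these assumptions, `pairing_counts(assign(12))` pairs every scenario with every UI equally often, while `assign(14)` is rejected, which matches the reduction of the fourteen complete datasets to twelve.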

Measurements
As explained in Sect. 2 and paragraph 3.4, we needed to measure operators' SA, task performance, and workload to understand the differences between the three UI concepts. Measures were based upon the assumptions about the effects of the different UI concepts. The most commonly used and properly validated SA measurement technique is the Situation Awareness Global Assessment Technique (SAGAT) (Endsley 2000). A limitation of SAGAT is that it is a freeze-probe technique; it requires freezing a situation and blanking system displays. It is, therefore, advised to use other techniques in parallel, such as performance measures. Therefore, in this experiment, we measured what information was part of operators' SA knowledge during freeze probes, and additionally measured the speed of gaining SA. We also looked at speed and accuracy of task performance. In our analysis, we evaluated whether there is a significant relation between the UI used and (1) the execution of required actions, (2) the speed of executing required actions, (3) the accuracy of executed required actions, and (4) the order in which required actions were executed. As such, we measured both operators' reported SA and objective SA data.
Finally, we aimed to gain insight into whether operators' workload was too high or too low, as this would influence operators' SA and task performance. The performance measures (1) execution of required actions and (2) speed of executing required actions already provide some insight into operators' workload. Measuring the number of tasks executed within a defined amount of time, however, does not specify whether workload was too high or too low. This holds in general: most workload measures are suitable for comparing the effect of experiment conditions on operators' workload, but do not provide proper insight into whether the measured workload was too high or too low. We, therefore, compared the observed workload scores with ranges and percentile ranks found in similar studies. Because no scores from other studies concerning nautical traffic management were found, we compared our scores to the scores reported by Grier (2015). Grier evaluated the outcomes of the commonly used (Raw) NASA Task Load Index (RTLX). Her analysis of 1173 reported workload scores in 237 publications showed that 80% of the reported scores are between 26.08 and 68.00. Of the task environments considered by Grier, process control is the most relevant for comparison with our workload scores. For 38 process control test cases, the reported percentile ranks were: 25th: 31.91, median: 42.00, and 75th: 51.83 (Grier 2015). In our study, RTLX (Hart 2006) was used to measure subjective workload at three moments in each scenario, see Table 3. This method requires participants to respond to six questions about their workload. Since our operators were all native Dutch speakers, we translated the questions to Dutch. SAGAT was used to measure the quality of operators' SA at two moments (see Table 3) in each scenario. The SAGAT included queries about perception of data (Level 1 SA), comprehension of meaning (Level 2 SA), and projection of the near future (Level 3 SA).
These queries were developed together with the four SMEs after analysis of operators' required SA. The SMEs unanimously agreed on the desired answers for each freeze. Table 4 lists the SA queries that were used. SAGAT suggests that a portion of the SA queries may be randomly selected and asked each time if it may be impossible to query subjects about all SA requirements in a given stop due to time constraints (Endsley 2000). Trials with SMEs showed that a freeze with all queries took less than 5 min, which is short enough to allow subjects to access SA information without memory decay (Endsley 1995). To be able to collect sufficient SAGAT data, we, therefore, asked all queries in every freeze.
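The RTLX scoring and the comparison with Grier's process control percentile ranks described above can be sketched as follows. A raw TLX score is the unweighted mean of the six subscale ratings; the 0 to 100 rating scale and the subscale key names are assumptions of this sketch, and the reference values are the ones quoted above from Grier (2015).

```python
from statistics import mean
from typing import Dict

# The six NASA-TLX subscales; key names are chosen for this sketch.
SUBSCALES = (
    "mental_demand", "physical_demand", "temporal_demand",
    "performance", "effort", "frustration",
)

# Percentile ranks for process control test cases as quoted from Grier (2015).
GRIER_PROCESS_CONTROL = {"p25": 31.91, "median": 42.00, "p75": 51.83}


def rtlx_score(ratings: Dict[str, float]) -> float:
    """Raw TLX: unweighted mean of the six subscale ratings (assumed 0-100)."""
    missing = [s for s in SUBSCALES if s not in ratings]
    if missing:
        raise ValueError(f"missing subscale ratings: {missing}")
    return mean(ratings[s] for s in SUBSCALES)


def quartile_band(score: float,
                  ref: Dict[str, float] = GRIER_PROCESS_CONTROL) -> str:
    """Locate a score relative to the reference quartiles."""
    if score < ref["p25"]:
        return "below 25th percentile"
    if score < ref["median"]:
        return "25th to 50th percentile"
    if score < ref["p75"]:
        return "50th to 75th percentile"
    return "at or above 75th percentile"
```

Such a banding does not by itself say that workload was too high or too low; it only places an observed score relative to scores reported for a comparable task environment.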
Operators were required to perform two actions in their UI to gain SA in case of an incident with one or more vessels involved: (1) they needed to search for the vessel(s) on their area of focus map, and (2) they had to open the vessels' detail information window. The data logged by the simulator system were used to calculate how quickly operators carried out these actions. An operator was assumed to have identified the vessel involved in an incident when the operator did not adapt the location and/or zoom level of the area of focus window for at least 4 s while the vessel's location was displayed in the area of focus window at a zoom level that allowed the vessel's name to be read.
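The 4 s identification rule can be sketched as a scan over logged view changes. The log format is hypothetical: each record marks a moment at which the operator changed the location or zoom level of the area of focus window, with a precomputed flag stating whether the vessel is displayed at a zoom level at which its name is readable. Reporting the start of the dwell as the identification time is also an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import List, Optional

DWELL_SECONDS = 4.0  # minimum time without pan/zoom, as stated in the text


@dataclass
class ViewChange:
    time: float            # seconds since the incident started
    vessel_readable: bool  # hypothetical flag: vessel shown and name readable


def identification_time(changes: List[ViewChange], log_end: float,
                        dwell: float = DWELL_SECONDS) -> Optional[float]:
    """Return the start of the first view held for at least `dwell` seconds
    while the vessel was readable, or None if no such view exists."""
    ordered = sorted(changes, key=lambda c: c.time)
    for i, change in enumerate(ordered):
        # A view lasts until the next pan/zoom action or the end of the log.
        next_time = ordered[i + 1].time if i + 1 < len(ordered) else log_end
        if change.vessel_readable and next_time - change.time >= dwell:
            return change.time
    return None
```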
For each scenario, we defined which actions were required for accurate task performance. The SME, observer, and simulator software logged execution of required actions and the speed of actions. These actions included communication with stakeholders, such as skippers, emergency services, and the officer of duty. Additionally, operators needed to send notices to skippers through VHF radio, and they had to activate and release traffic measures using their computer system. In each scenario, operators additionally had to answer questions of skippers who were not involved in the incident. In each scenario, an equal number of questions required perception of data (related to Level 1 SA), comprehension of meaning (related to Level 2 SA), or projection of the near future (related to Level 3 SA).

Data analysis
For data analysis, we first evaluated whether the UIs had an effect on operators' SA, task performance, and workload, and how large that effect was. Accuracy of SA knowledge was evaluated for each SAGAT query separately, rather than combining queries to evaluate SA Level 1, Level 2 and Level 3. Effects of the UIs can differ per query, and combining queries is likely to reduce the sensitivity of the measures (Endsley 2000). To evaluate the effect of the UIs, the effect size was calculated. Data were analyzed using a within-subject design. The data could not be considered normally distributed due to the relatively small sample size. Two within-subject tests are commonly used for testing differences between conditions in human factors research if the assumption of normally distributed data is violated: Friedman's ANOVA is used for more than two categories and the Wilcoxon signed-rank test is used for two categories (Willages 2007; Field 2009). Friedman's ANOVA only shows whether there is a difference between the tested conditions, but does not show where this difference occurs. For that purpose, a post hoc analysis is required. The Wilcoxon signed-rank test is commonly used as post hoc analysis for Friedman's ANOVA. Since our sample size (n = 18) is large relative to the population (N = 60), a correction must be applied to the formulas used to compute the standard error (SE). This correction is called the finite population correction (FPC), which is calculated by FPC = √((N − n)/(N − 1)) (Ramachandran and Tsokos 2009). The standard error is corrected by multiplying it by FPC. To calculate the significance of the test statistic (T), the Wilcoxon signed-rank test uses the mean (T̄) and standard error (SE_T̄) of T in the formula Z = (T − T̄)/SE_T̄ (Field 2009). Applying FPC to the Wilcoxon signed-rank test therefore means that the test statistic Z is divided by FPC. The formula used to calculate Friedman's ANOVA test statistic does not include a standard error. Consequently, it is not possible to correct Friedman's test statistic with FPC. Therefore, we used only the Wilcoxon signed-rank test in our analysis.
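The correction described above can be illustrated with a minimal sketch. It assumes the standard large-sample normal approximation for the Wilcoxon signed-rank statistic without tie correction, in which the mean of T is n(n + 1)/4 and its standard error is √(n(n + 1)(2n + 1)/24), with n the number of non-zero paired differences. The function names and the value T = 40 are hypothetical and serve only as illustration; they are not the computations or data of this study.

```python
import math


def finite_population_correction(population_size, sample_size):
    """FPC = sqrt((N - n) / (N - 1)), as given in the text."""
    return math.sqrt((population_size - sample_size) / (population_size - 1))


def wilcoxon_z(t_statistic, n_pairs, fpc=1.0):
    """Z = (T - mean_T) / SE_T, with SE_T multiplied by the FPC.

    Assumption: normal approximation without tie correction;
    n_pairs counts only non-zero paired differences.
    """
    mean_t = n_pairs * (n_pairs + 1) / 4
    se_t = math.sqrt(n_pairs * (n_pairs + 1) * (2 * n_pairs + 1) / 24)
    return (t_statistic - mean_t) / (se_t * fpc)


# Sample and population sizes as reported in the text (n = 18, N = 60).
fpc = finite_population_correction(60, 18)

# Hypothetical test statistic, for illustration only.
z_uncorrected = wilcoxon_z(40, 18)
z_corrected = wilcoxon_z(40, 18, fpc=fpc)

print(f"FPC = {fpc:.4f}")
print(f"Z without FPC = {z_uncorrected:.3f}, Z with FPC = {z_corrected:.3f}")
```

Because FPC is smaller than 1 whenever n > 1, multiplying the standard error by FPC is equivalent to dividing Z by FPC, which increases the magnitude of Z.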
Pearson's correlation coefficient r = Z/√N is commonly used as an effect size for Wilcoxon signed-rank tests. Here, Z is the test statistic as defined by the formula above and N is the number of observations (not the population size used in the FPC). Cohen (1988, 1992) gives guidelines for evaluating effect sizes: an effect size between 0.10 and 0.29 is considered a small effect, an effect size between 0.30 and 0.49 a medium effect, and an effect size of 0.50 or more a large effect.
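As an illustration of how these guidelines are applied, the sketch below computes r = Z/√N and assigns a label based on the absolute value of r. The function names, the example values Z = −2.0 and N = 36, and the label "negligible" for values below 0.10 are assumptions made for this sketch; the guidelines above do not name that range.

```python
import math


def effect_size_r(z, n_observations):
    """Effect size r = Z / sqrt(N), with N the number of observations."""
    return z / math.sqrt(n_observations)


def cohen_label(r):
    """Classify |r| using the thresholds 0.10, 0.30, and 0.50.

    Assumption: values below 0.10 are labeled 'negligible';
    the sign of r only indicates the direction of the effect.
    """
    magnitude = abs(r)
    if magnitude >= 0.50:
        return "large"
    if magnitude >= 0.30:
        return "medium"
    if magnitude >= 0.10:
        return "small"
    return "negligible"


# Hypothetical values, for illustration only.
r_example = effect_size_r(-2.0, 36)
print(f"r = {r_example:.2f} ({cohen_label(r_example)})")

# Labels for some effect sizes of the magnitude reported in the results.
for r in (-0.22, -0.34, -0.52):
    print(f"r = {r:+.2f}: {cohen_label(r)}")
```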
If an effect was found, then the second question was: how likely is it that there is a true effect in the entire population of N-ONM operators? In line with common practice, we considered the effect statistically significant if p ≤ 0.05. In cases where an effect is found but cannot be considered significant, we cannot be sure at the 95% level that what we see is not due to a random fluctuation. There may indeed be an effect, but then our sample was too small to yield statistically significant results.

SAGAT queries:
1 Click on the map to enter the location of all current events. Provide a short description for each event.
2 Which of the following vessel types are the vessel type of vessels involved in an incident?
3 Which of the following names are the names of vessels involved in an incident?
4 Which of the following cargoes are the cargo of the vessels involved in an incident?
5 Which of the following names are of vessels with wounded persons on board?
6 Which of the following names are of vessels leaking fuel or cargo or that make water?
7 Which of the following locks are currently obstructed, or to a limited degree available?
8 For which of the following locks do skippers over an hour need to take into account that there will be extra crowds and possible longer delays as a result of blockages or restrictions elsewhere on the waterway?
9 Which of the following service vessels is currently the closest to an incident?
10 How long will it take for the closest service vessel to be on site of the incident?
11 Which of the following vessels need to take into account that there are obstructions on their current route?
12 Which of the following restrictions apply to a motor cargo (length 85.00 m, width 9.60 m, height 7.90 m, depth 1.30 m) that is currently at lock Weurt, when she wants to arrive at the Port of Amsterdam as quickly as possible?
13 Which of the following routes is best advised to a motor tanker (no cones, length 109.00 m, width 11.40 m, height 6.00 m, depth 2.25 m) which plans to depart in one hour from the Port of Rotterdam towards Enschede?
14 Which of the following routes is best advised to a container vessel (length 135.00 m, width 17.40 m, height 10.30 m, depth 2.10 m) that plans to depart in one hour from the Port of Rotterdam towards Duisburg in Germany?

Speed of gaining situation awareness
Speed of gaining SA was assumed to be influenced by features 2, 3, and 4. These were features of an integrated UI, and thus not implemented in the coherent UI. With an integrated UI (UI2), operators were significantly quicker in identifying the involved vessels in the area of focus window than when using a coherent UI (UI1) (effect size = −0.34 and p = 0.04). The difference in speed is not only statistically significant, but with a difference of up to minutes, also significant in terms of relevance for N-ONM tasks, see Table 5. The same effect was expected when comparing the coherent UI (UI1) with the context-dependent adaptable UI. Our results, however, do not show a significant effect when comparing these interfaces. This could be due to the small sample size (n = 12).
With an integrated UI, operators were not significantly quicker in opening the detail information window of vessels involved in an incident (effect size = −0.22 and p = 0.19). With a context-dependent adaptable UI, operators even seemed slower in opening this window compared to when using a coherent UI (effect size = −0.38 and p = 0.03). Operators, however, opened a detail information window significantly more often with a context-dependent adaptable UI than with a coherent UI (effect size = −0.34 and p = 0.05). An incident can be handled without opening this window. Instead, operators can ask the skipper about this information. SMEs, however, agreed that opening the detail information window is the quickest and most accurate way to access this information. Besides, evaluation of the communication scripts revealed that those operators who did not open this window did not ask the skippers about this information either. Indeed, several operators mentioned during the evaluation that they forgot to use the feature that allowed them to quickly open a detail information window by clicking the element of interest in the area of focus window, although they did consider it useful or very useful. They expected to use this feature routinely once they were used to it. A 10-min tutorial might have been too short to change the way in which they searched for detail information.

Accuracy of situation awareness knowledge
Accuracy of SA knowledge was evaluated for each SAGAT query separately, rather than combining queries to evaluate SA Level 1, Level 2 and Level 3. Effects of the UIs can differ per query, and combining queries is likely to reduce the sensitivity of the measures (Endsley 2000). SAGAT query 11 was significantly more often answered correctly when operators used a coherent UI instead of an integrated UI (effect size = −0.52 and p < 0.01). A similar trend was found when comparing the coherent UI with the context-dependent adaptable UI (effect size = −0.31 and p = 0.06). There was no significant difference in how well operators answered the other SAGAT queries when comparing the coherent UI and integrated UI. When comparing the coherent UI with the context-dependent adaptable UI, results show that for several queries, operators more often answered correctly when using a coherent UI: query 3 (effect size = −0.42 and p = 0.02), query 4 (effect size = −0.52 and p = 0.01), query 12 (effect size = −0.33 and p = 0.05), and query 13 (effect size = −0.38 and p = 0.03).
In Scenario A (collision near Houten), several operators reported the wrong location of the collision. An incorrect understanding of the location of an incident has a major impact: service vessels, the officer of duty, and emergency services are sent to the wrong location, traffic measures are wrongly placed, and skippers get incorrect advice. Since this occurred in just one scenario, our data are not sufficient to find statistical differences between UIs in how often this occurred. The trend, however, is serious enough to be mentioned. With a coherent UI, 33% of the operators thought that the incident took place at a different waterway section. With an integrated and context-dependent adaptable UI, 17% of the operators made the same mistake. SMEs reported that operators working with a coherent UI were not able to identify their mistake. With an integrated or context-dependent adaptable UI, operators were able to identify their mistake when the officer of duty arrived at the wrong location and contacted the operator.

Execution of required actions
Wilcoxon signed-rank tests showed that which of the required actions operators were more likely to execute depended on the UI concept used. Operators were more likely to report an incident with a marine VHF radio when using an integrated UI instead of a coherent UI (effect size = −0.37 and p = 0.03). This trend was the same, although not significant, when comparing the context-dependent adaptable UI with the coherent UI (effect size = −0.24 and p = 0.12). However, the opposite was found for sending out notices to skippers using their traffic management information system. Operators were more likely to send out a notice to skippers using a coherent UI instead of an integrated UI (effect size = −0.34 and p = 0.04) or context-dependent adaptable UI (effect size = −0.42 and p = 0.02). When looking at the total number of required actions that were executed, no significant difference was found between the coherent UI and the integrated UI (effect size = −0.15 and p = 0.37) or the context-dependent adaptable UI (effect size = −0.13 and p = 0.26).

Speed of executing required actions
Of all required operators' actions, a significant difference in speed of execution was found only for communication with priority stakeholders. The Wilcoxon signed-rank test shows that operators are up to minutes quicker (median = 154 s quicker) in communicating with priority stakeholders when using an integrated UI instead of a coherent UI (effect size = −0.44 and p = 0.01). A small effect was found when comparing the coherent UI with the context-dependent adaptable UI, but the data cannot confirm that this effect is not due to a random fluctuation (effect size = 0.19, p = 0.18). This result might be due to the small sample size (n = 12) in combination with an extreme outlier (speed = 1528) in the data of an operator using the context-dependent adaptable UI, see Table 6. Due to the already small sample size (n = 12) and the need for counterbalancing, we were not able to repeat this analysis after removing the extreme outliers. An evaluation of only six experiments would not be meaningful.

Accuracy of executing required actions
The data analysis shows that with all UI prototypes, most operators were able to correctly answer the skippers' questions related to Level-1 SA and Level-2 SA, see Table 7. Apparently, all UIs sufficiently supported answering these questions. Several operators, however, were not able to correctly answer the skippers' questions related to Level-3 SA. The analysis shows a medium and significant effect of the used UI on accuracy in answering questions related to Level-3 SA in favor of UI3 compared to UI1 (effect size = −0.33, p = 0.05). Although the data showed a similar trend when comparing UI1 with UI2, no significant difference was found (effect size = −0.13, p = 0.44).

Accuracy of order of task execution
Operators executed the necessary actions in the required order significantly more often when using an integrated UI instead of a coherent UI (effect size = −0.32, p = 0.03). The same trend, although not significant, was found when comparing the coherent UI with the context-dependent adaptable UI (effect size = −0.25, p = 0.12).

Workload
The Wilcoxon signed-rank tests showed no statistically significant difference in measured workload between the different UIs, so we found no indication that the UI prototypes differ in their impact on operators' workload. The workload measures are all at the lower end of the range found by Grier (2015), see Table 8.

Discussion
The objective of this study was to evaluate the effects of a coherent UI, an integrated UI, and a context-dependent adaptable UI on operators' SA, task performance, and workload. The theoretical framework provided in this paper showed that the concepts of SA, task performance, and workload are intricately intertwined. Evaluating and reflecting on our findings therefore requires considering the relations between results. When using a coherent UI, operators were (1) slower in gaining SA, (2) slower in communication with priority stakeholders, and (3) less likely to execute necessary actions in the required order compared to using an integrated UI or context-dependent adaptable UI. On the other hand, the results showed that operators had more information as part of their SA when using a coherent UI instead of an integrated UI or context-dependent adaptable UI. No significant difference was found in how likely operators were to execute the required tasks. There also was no significant difference found when comparing operators' workload.
In evaluating the relations between these results, we should consider that a mental workload that is too high or too low could negatively influence operators' SA or task performance. In our study, this could have explained our findings if the workload of operators working with a coherent UI had been appropriate, while the workload with the other two UIs had been either too high or too low. Adding support to an already highly automated system should be done with caution, as under continuous, non-interrupted conditions, change detection for SA support can result in higher operator workload (van der Kleij et al. 2018). Since no significant difference in workload was found, this apparently was not the case.
Another possible explanation that followed from Sect. 2 is that operators used a different information processing strategy when using a coherent UI compared to the other two UI concepts. Indeed, information access with a coherent UI was more difficult and time-consuming than with the other two interfaces. This makes it plausible that operators were more likely to use memory-based information processing when using the coherent UI, while they used a display-based information processing strategy with the other two interfaces. This line of reasoning is consistent with our findings. More information was part of operators' SA when using a coherent UI, but this did not result in better task performance. The measures used in this study, however, were selected to study the effects of concepts and not to provide a thorough overview of causes of these effects. Further research is needed to verify whether it is indeed the difference in information processing that causes the opposite effects on operators' SA and task performance. Such research requires additional measures. For example, Langer et al. (2017) proposed combining electrophysiology and eye tracking as a resource for investigating information processing.
While clear differences were found between the effects of a coherent UI and the other two UIs, only one aspect distinguishes the results of the integrated UI from the effects of the context-dependent adaptable UI. Operators were more accurate in answering skippers' questions related to Level-3 SA when using a context-dependent adaptable UI instead of one of the other two interfaces. This was not because operators could rely on display-based information processing when answering these questions with the context-dependent adaptable UI. The information required to answer these questions was not literally shown in any of the UIs. However, the context-dependent adaptable UI, in contrast to the other UIs, did display prognosis information and information about available routes. This can be considered information about the situation that triggers and facilitates proactive behavior of operators. It is possible that the prognosis information and information about available routes acted as early warnings, indicating that skippers may contact the operator with questions about the situation. This insight is important when considering the design of context-dependent adaptable UIs. In our case, the context-dependent adaptable UI included only simple visualizations to indicate changes. Several operators indicated that they would prefer more elaborate forms of prognosis information and information about alternative routes. However, previous research also indicates that complex change detection for SA support can result in higher operator workload (van der Kleij et al. 2018). It is therefore questionable whether more elaborate context-dependent information visualization would result in even better support for operators' SA, workload, and task performance.

Conclusion
In our case of three UI concepts for N-ONM tasks, the differences between a coherent UI and an integrated UI are sufficient to conclude that an integrated UI better supports operators' SA and task performance. The largest effects were found in relation to speed of task performance, especially speed of communication. Consolidating the results of this study reveals that the differences in effects of the coherent UI compared to the integrated UI and context-dependent adaptable UI are most likely due to a difference in information processing strategy applied by the operators. In this case study, the coherent UI more likely resulted in memory-based information processing, while the integrated UI and context-dependent adaptable UI resulted in more display-based information processing. Display-based information processing resulted in better task performance compared to memory-based information processing. When comparing the effects of the context-dependent adaptable UI with the effects of the other interfaces, our study showed that simple UI features that trigger and facilitate proactive behavior of operators result in better Level-3 SA. Although the features did not provide rich information to the operators, their presence did result in more accurate answers by the operators to skippers' questions related to Level-3 SA, without causing negative effects on operators' workload. This shows that automated support does not necessarily need to be complex or rich in information to have a positive effect.
Generalizing these findings suggests that approaches aiming at designing integrated UIs to support display-based information processing for SA support are most promising in dynamic task environments, where operators require SA for time-constrained decisions and actions. Context-dependent adaptable UI features that trigger and facilitate proactive behavior of operators proved advantageous for tasks in which operators require Level-3 SA.