1 Introduction: including users’ mental state

In modern industry, many aspects of operation have moved into the control room. Combined with increasing degrees of automation, this means that users must keep track of more information and are responsible for an increasing number of systems. Woods et al. (2002) highlight this problem of data overload, in which users are presented with an abundance of data without being able to process and act on it efficiently. In addition, control room activities are shifting toward monitoring systems, ensuring that operation is within normal limits. In the event of anomalies, the user must be able to perform appropriate actions to ensure safe and efficient operation. Bainbridge (1983) highlights a challenge of automation, namely that users mainly monitor systems instead of actively controlling them. This can be problematic when action must be taken, because users are less familiar with active control and might not have a feel for the process that they are controlling. When designing such systems, the user is usually modeled as having stable, rational behavior, i.e., responding to external events in a predictable manner (Balters and Steinert 2015). However, it has been shown that humans do not behave in stable, rational patterns (Kahneman and Tversky 1979, 1984). Behavior can be influenced in part by the mental state of users. We define mental state as the combination of affective state and experienced workload. Positive affect has been shown to influence decision making by increasing risk aversion (Isen 2001; Isen and Reeve 2005). Hart and Staveland (1988) claim that experienced workload influences problem-solving strategies. A user who experiences a high workload might start shedding tasks or adopt a lower criterion for performance.

This knowledge—namely, the notion of behavior being influenced by mental state—has been adopted by the fields of engineering design and human–computer interaction. Affective computing (Picard 1997) aims to improve the interaction between users and computers by including knowledge of the user’s mental state in the design and behavior of the system. Jiao et al. (2017) highlight the importance of integrating affective and cognitive needs when designing the user experience. A promising method for measuring mental state is through physiology. Many studies explore this correlation in strictly controlled experiments (Hjortskov et al. 2004; McDuff et al. 2014; Nasoz et al. 2004; Zhai and Barreto 2006; Zhou et al. 2011). In an engineering context, users are more prone to noise, which might influence the correlation between physiology and mental state (Balters and Steinert 2015). There are several studies that investigate more complex tasks, such as driving (Healey and Picard 2005), aviation (Nixon and Charles 2017; Wilson 2002), nuclear power plants (Gao et al. 2013), and ship navigation (Cohen et al. 2015).

Ship navigation is of special interest for the authors, especially because there is increasing activity in the field of remote and autonomous shipping. The YARA Birkeland will be one of the first autonomous ships in operation and will be operating on the Norwegian coast as of 2019 with a small crew for supervision, and then autonomously as of 2020. We believe that this is only the beginning of a shift in the maritime industry toward remote and autonomous vessels. This shift means that there will be people operating and monitoring ships from onshore control rooms. There is little research on how this type of operation will affect the people performing the tasks, which in turn may have unknown consequences on safety and efficiency of operation.

The aim of this paper is to investigate the mental state aspect of people who remotely operate and monitor ships. More specifically, we set out to understand how mental state and physiology change between different operational situations, and if mental state can be estimated from physiological responses. This can then form a basis for understanding how mental state is related to performance by monitoring physiological state in future experiments, which again can help designers when developing and evaluating new concepts.

We pursue this goal through an experiment investigating how physiological changes are related to changes in mental state in the context of ship bridges. The experiment was conducted with commercial ship simulation software. Thirty-one participants from a student population were tasked with navigating a large ship in two scenarios: on open sea and in a narrow harbor. Stimuli were designed and verified in cooperation with an industry partner, one of the world’s largest suppliers of ship bridge systems. Our contacts were professional ship simulator instructors with extensive experience operating large ships. The stimuli were created to be as realistic as possible, i.e., to replicate demands and actions encountered in a real context under normal operation. Tasks were deliberately selected to represent the majority of activities on large ships, not extreme events that occur seldom, if ever. This will most likely result in smaller effect sizes in changes of mental state and physiology compared to extreme situations. Data were collected through surveys and physiology sensors. Participants rated their own affective state and workload through survey questions. Physiological state was evaluated through electrocardiography (ECG) and electrodermal activity (EDA). The results showed significant changes in mental and physiological state between the two scenarios. Concepts of stress and workload were correlated with EDA and elements of heart rate variability (HRV). These results indicate that users may place changing demands on their interfaces and system behavior as their affective state changes and that the experienced stress and workload of users are related to EDA and HRV.

2 Background: Mental state

In this work, we are interested in the mental state of users. In our definition of mental state, we include the constructs of affective state and workload. Both are of interest to the human–computer interaction community, because these concepts might influence users’ behavior and how they perceive their situations. Below we present the constructs of affective state and workload, along with common subjective (Table 1) and physiological (Table 2) measurement tools for the respective constructs.

Table 1 Subjective evaluation of mental state
Table 2 Studies using physiological measurements to evaluate mental state

2.1 Affective state

We use the definition of Balters and Steinert (2015), which describes affect, or emotions, as a set of variables that might moderate behavior. We define affective state as the manifestation of the concept affect—i.e., the value of these variables at a specific time for a specific individual.

There are two main schools of thought on how to describe affect in the field of psychology. The first considers affect as a set of discrete categories (Tomkins 1962; Ekman and Friesen 1969, 1971; Ekman 1992). Tomkins and McCarter (1964) describe eight categories of emotions, naming them at medium and high levels. They are interest–excitement, enjoyment–joy, surprise–startle, distress–anguish, fear–terror, shame–humiliation, contempt–disgust, and anger–rage. Ekman et al. (1972) define the six basic emotions as happiness, sadness, surprise, anger, disgust, and fear. Later, Ekman expanded the list of emotions to amusement, anger, contempt, disgust, embarrassment, excitement, fear, guilt, pride in achievement, relief, sadness/distress, satisfaction, sensory pleasure, and shame (Ekman 1999). The second school of thought considers affect as a combination of multiple dimensions (Russell 1980; Russell and Barrett 1999; Thayer 1967; Watson and Tellegen 1985). Watson and Tellegen describe affect as a combination of the dimensions of positive affect and negative affect (Watson et al. 1988; Watson and Tellegen 1985). Positive affect is related to the extent to which a person feels enthusiastic, active, and alert. Negative affect is related to subjective feelings of distress and unpleasurable engagement. Positive affect and negative affect are measured through the Positive and Negative Affect Schedule (PANAS), in which people rate two ten-item mood scales assessing positive and negative affect on a five-point scale. Thayer (1967, 1978, 1986) explains affect as the two dimensions of energetic arousal and tense arousal, consisting of the four factors general activation (energy), deactivation-sleep (tiredness), high activation (tension), and general deactivation (calmness). This is measured through the activation–deactivation adjective checklist by rating 20–25 activation-descriptive adjectives on a four-point scale. Russell and colleagues (Russell et al. 1989; Russell 1979, 1980) propose a model of affect with the main dimensions arousal and pleasure–displeasure, in which different affective states can be described as combinations of these two dimensions, e.g., stress being the combination of displeasure and high arousal. Levels of arousal and pleasure can be assessed either through the single-item affect grid (Russell et al. 1989) or through rating arousal and pleasure along separate dimensions. Russell (1979) shows that Thayer’s dimensions of energetic arousal and tense arousal can be seen as an approximate rotation of arousal and pleasure. Watson and Tellegen’s dimensions can, according to Russell et al. (1989), be seen as a 45° rotation of the same dimensions.
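The 45° rotation between the two dimensional models can be made concrete with a small sketch. It is only illustrative: the function name `rotate_affect`, the sign convention, and the reading of the rotated axes as "positive-affect-like" and "negative-affect-like" are assumptions made here, not definitions taken from the cited works.

```python
import math

def rotate_affect(pleasure, arousal, degrees=45.0):
    """Express a (pleasure, arousal) point in axes rotated counterclockwise by `degrees`.

    Assumption for illustration: with a 45 degree rotation, the first axis points
    toward "pleasant + high arousal" and the second toward "unpleasant + high arousal".
    """
    theta = math.radians(degrees)
    first = pleasure * math.cos(theta) + arousal * math.sin(theta)
    second = -pleasure * math.sin(theta) + arousal * math.cos(theta)
    return first, second

# Hypothetical rating on a -1..1 scale: displeasure combined with high arousal,
# which the circumplex model would label as stress.
positive_like, negative_like = rotate_affect(pleasure=-1.0, arousal=1.0)
```

Under these assumptions, the stress-like point loads almost entirely on the second rotated axis and hardly at all on the first, which is the sense in which the two models describe the same plane from different angles.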

2.2 Workload

The notion of workload, or cognitive load, has been used in human factors research in relation to performance. Parasuraman et al. (2008) argue that workload is one of the few constructs that is predictive of both performance in complex human–machine interactions and of the mental state of the operator. Cooper and Harper (1969) define workload as the sum of physical and mental effort and attention required to maintain a given level of performance. When viewing workload as a function of effort, one should consider both the capabilities of operators and their state. For example, the skill level of operators might influence their effort, as might their physical and mental state, i.e., being tired or stressed. Parasuraman et al. (2008) describe workload as a function of the demand on mental resources in relation to the resources available from the human operator. Hart and Staveland (1988) describe workload as a multidimensional construct describing the cost incurred by a human operator to achieve a particular level of performance. Workload is not an inherent property but emerges from the interaction among task requirements, context, operator skills, behavior, perceptions, and affective state (Hart and Staveland 1988; Sheridan and Stassen 1979; Xie and Salvendy 2000). These definitions of workload are human centered, focusing on the subjective perception of workload. The notion of workload as a subjective experience is supported by Johanssen et al. (1979) and Sheridan (1980). The reason why the subjective experience of workload is important, according to Hart and Staveland (1988), is that it might alter behavior. Operators who experience a situation as a high workload might adopt strategies to mitigate workload and might experience distress.

Subjective workload is usually measured through surveys. Commonly used tools are the NASA Task Load Index (TLX) (Hart and Staveland 1988), Subjective Workload Assessment Technique (SWAT) (Reid and Nygren 1988), Modified Cooper–Harper scale (MCH) (Wierwille and Casali 1983), and Overall Workload (OW) (Vidulich and Tsang 1987). TLX and SWAT use multiple dimensions to assess workload, offering better diagnostic properties than one dimension when trying to assess the underlying mechanisms of workload. MCH and OW are unidimensional, providing less detail, but with the advantage of being faster to fill out. In addition, there is evidence that univariate methods have greater sensitivity than multivariate methods when estimating OW (Hendy et al. 1993; Vidulich and Tsang 1987).

2.3 Measuring mental state through physiology

Affective state and workload have been shown to be reflected in physiological responses (Andreassi 2010; Boucsein 2012; Ekman et al. 1983; Levenson 2003; Wilson and Eggemeier 1991). These physiological responses are controlled by the autonomic nervous system (ANS). The ANS is divided into two branches, the sympathetic nervous system (SNS) and the parasympathetic nervous system (PNS). The PNS and SNS work antagonistically to regulate physiological arousal (Appelhans and Luecken 2006). The SNS can be described as responsible for the body’s fight-or-flight reactions, whereas the PNS has been said to be in charge of rest and digest. Sympathetic and parasympathetic activities are expressed in various physiological phenomena, such as heart rate, respiration, EDA, brain activity, muscle tension, pupil dilation, skin temperature, and blood pressure. For an exhaustive overview of the use of physiological measurements in an engineering context, we refer to Balters and Steinert (2015). In Table 2, we show examples of previous work on correlating mental state with physiological changes.

3 Method: experiment investigating mental state in a ship simulator

An experiment was created to investigate how self-reported mental state relates to physiology in large ship navigation settings. Our focus has been to develop an experiment that addresses regularly occurring situations when operating a large ship. This means that stimuli have been designed to replicate reality rather than to elicit specific mental states or reactions. The experiment was conducted in commercial ship simulation software with student participants. For an exhaustive description of the experiment setup and execution, we refer to Dybvik et al. (2018).

4 Stimuli: open sea and harbor

Participants were asked to steer a 200-m-long cruise ship using a commercial ship simulation software in two scenarios. These scenarios were created in cooperation with experienced ship simulator instructors from one of the world’s largest suppliers of ship bridge systems. The stimuli were created to be as realistic as possible—i.e., to replicate demands and actions encountered in a real context under normal operation. It was our goal to create scenarios that represent the majority of activities on large ships, not extreme events that occur seldom, if ever. This decision will likely result in smaller effect sizes in changes of mental state and physiology compared to extreme situations.

The first scenario was designed to replicate the task of sailing on an open sea. This can be described as long periods of time with little activity, mostly spent monitoring systems. Participants were tasked to sail the ship out from port, across an empty body of water, and into a new port. The only additional stimulus given during this scenario was a low-frequency (LF) engine rumble to create a realistic backdrop. The scenario lasted 15 min, but the duration was not communicated to participants in advance, to reduce any expectation effects toward the end. When the scenario ended, the ship would be approximately halfway between the two ports.

The second scenario was supposed to recreate tasks associated with harboring. This includes navigating narrow channels and performing secondary tasks related to going to harbor under a time constraint. Participants were instructed to navigate to a berth marked on their map. When the ship started moving, a 10-min visible timer could be seen on the screen, instructing participants to reach their destination before time ran out. The secondary tasks consisted of asking prerecorded questions regarding crew and cargo over the radio at regular intervals. Crew and cargo lists were printed and placed face-down on the table in front of the participants. The lists were turned over by the participants at the beginning of the second scenario after the participants were given instructions to do so on screen. Participants were instructed in advance to give their answers using a handheld walkie-talkie. Questions were repeated after 90 s if no reply was given or upon request from participants. Should participants reach their designated berth, a new destination would be given. This was intended to make it nearly impossible to finish the primary task within the allotted 10 min. Throughout the second scenario, radio chatter and LF engine noise were added to create a realistic backdrop.

5 Data collection: subjective and physiological

42 participants were sampled from an engineering student population. Eleven were excluded because of technical issues or failure to follow instructions. 31 participants were included in the analysis—13 females and 18 males. Ages of the participants ranged from 19 to 33 years (24.0 ± 2.74). To address our research questions, we collected both physiological data and subjectively assessed affective state and workload.

5.1 Subjective measurements

Affective state and workload were assessed through survey questions. We adopted the circumplex model of affect (Russell 1980), evaluating arousal and pleasure, as a framework for affective state. Pilot studies showed that the participants had trouble understanding the concept of arousal. Thus, alertness and awakeness were added to arousal and pleasure in an attempt to triangulate the concept of arousal described by the circumplex model of affect. Because stress is a known term for most people, it was included to see how a subjective rating of arousal and pleasure would relate to perceived levels of stress. According to the circumplex model of affect, stress should show up as a combination of high arousal and displeasure. Workload was assessed in two ways: as a single dimension of OW (Vidulich and Tsang 1987) and through the multidimensional TLX (Hart and Staveland 1988) scheme. These were selected because TLX and OW have been shown to be superior in terms of sensitivity and user acceptance (Hill et al. 1992). OW can provide an overview of subjective workload, whereas TLX gives a more detailed view of which dimensions influence subjective workload in our context. Affective state and overall workload were rated on eleven-point Likert scales. TLX was rated on seven-point Likert scales plus pairwise comparisons of the six dimensions.
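To illustrate how the TLX ratings and pairwise comparisons combine into one workload score, the following is a minimal sketch of the standard weighting procedure described by Hart and Staveland (1988). The dimension labels, function names, and the example ratings are hypothetical, and this is not necessarily the exact scoring script used in the experiment.

```python
from itertools import combinations

# Short labels for the six TLX dimensions (labels chosen here for illustration)
DIMENSIONS = ["mental", "physical", "temporal", "performance", "effort", "frustration"]

def tlx_weights(winners):
    """Count how often each dimension was chosen across the 15 pairwise comparisons."""
    pairs = list(combinations(DIMENSIONS, 2))
    if len(winners) != len(pairs):
        raise ValueError("expected one winner per pairwise comparison (15 in total)")
    weights = {d: 0 for d in DIMENSIONS}
    for (a, b), winner in zip(pairs, winners):
        if winner not in (a, b):
            raise ValueError(f"{winner!r} is not part of the pair ({a}, {b})")
        weights[winner] += 1
    return weights

def tlx_score(ratings, weights):
    """Weighted mean of the dimension ratings; the weights sum to 15."""
    total = sum(weights.values())
    return sum(ratings[d] * weights[d] for d in DIMENSIONS) / total

# Hypothetical participant who always picks the first dimension of each pair
winners = [a for a, _ in combinations(DIMENSIONS, 2)]
weights = tlx_weights(winners)

# Hypothetical ratings on a seven-point scale
ratings = {"mental": 6, "physical": 2, "temporal": 7,
           "performance": 3, "effort": 5, "frustration": 4}
score = tlx_score(ratings, weights)
```

Because the weights come from the participant's own comparisons, two participants with identical ratings can still end up with different weighted scores.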

5.2 Physiological measurements

Selection of the physiological data type and related sensors to use in the experiment was guided by the feasibility of integration in future products. By this, we mean sensors that could be worn by users without interfering with normal operation, e.g., sensors that are wireless and comfortable to wear. In this experiment, physiological data were collected through ECG and EDA. Heart rate and HRV can be calculated from ECG. Heart rate and HRV are associated with both sympathetic and parasympathetic nervous system activity. Sympathetic activity tends to increase heart rate and decrease HRV, and vice versa for parasympathetic activity (Appelhans and Luecken 2006; Berntson et al. 1997; Camm et al. 1996). High-frequency (HF) variations (0.15–0.4 Hz) in heart rate are believed to be parasympathetically mediated, whereas LF variations (0.04–0.15 Hz) are considered a product of both parasympathetic and sympathetic activities (Berntson et al. 1997; Camm et al. 1996; Malliani et al. 1991). The normalized frequency components of LF and HF HRV are thought to reflect sympathetic and parasympathetic activity, respectively (Furlan et al. 2000; Pagani et al. 1997).
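As an illustration of the frequency-domain quantities, the sketch below integrates a power spectral density over the conventional LF (0.04–0.15 Hz) and HF (0.15–0.4 Hz) bands (Camm et al. 1996) and derives normalized units and the LF/HF ratio. Normalizing by LF + HF is one common convention and is an assumption here; the function names and the flat synthetic spectrum are hypothetical, and dedicated HRV software may define these quantities slightly differently.

```python
def band_power(freqs, psd, low, high):
    """Trapezoidal integral of the PSD over segments lying fully inside [low, high]."""
    power = 0.0
    for i in range(len(freqs) - 1):
        f0, f1 = freqs[i], freqs[i + 1]
        if f0 >= low and f1 <= high:
            power += 0.5 * (psd[i] + psd[i + 1]) * (f1 - f0)
    return power

def hrv_frequency_summary(freqs, psd):
    """LF and HF power, normalized units (percent of LF + HF), and LF/HF ratio."""
    lf = band_power(freqs, psd, 0.04, 0.15)
    hf = band_power(freqs, psd, 0.15, 0.40)
    return {
        "lf": lf,
        "hf": hf,
        "lf_nu": 100.0 * lf / (lf + hf),
        "hf_nu": 100.0 * hf / (lf + hf),
        "lf_hf": lf / hf,
    }

# Synthetic flat spectrum on a 0.01 Hz grid from 0 to 0.5 Hz (not real data)
freqs = [i / 100 for i in range(51)]
psd = [1.0 for _ in freqs]
summary = hrv_frequency_summary(freqs, psd)
```

With a flat spectrum the ratio simply reflects the band widths (0.11 Hz versus 0.25 Hz), which makes the sketch easy to check by hand before applying it to a real spectrum.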

EDA is influenced only by sympathetic nervous system activity (Boucsein 2012; Dawson et al. 2007). EDA can be divided into phasic and tonic activity. Phasic and tonic activities are, according to Dawson et al. (2007), related to attention and activation, respectively. Phasic activity, or skin conductance response (SCR), is elicited by almost any stimulus that is novel, unexpected, or potentially important (Siddle 1991). Tonic activity, or skin conductance level (SCL), is related to continuous stimuli, e.g., performing a task (Bohlin 1976). ECG data were collected using the Shimmer3 ECG device (Shimmer3 ECG/EMG Unit, 2017) with a sampling rate of 512 Hz. EDA data were collected with the Shimmer3 GSR+ device (Shimmer3 GSR+ Unit, 2017) with a sampling rate of 128 Hz. Both devices transmitted data wirelessly via Bluetooth to a central computer running iMotions 6.4 (iMotions 2017) for synchronization and storage.

5.3 Procedure

The iMotions (iMotions 2017) software platform was used to present stimuli and to collect and synchronize subjective and physiological data. After the participants expressly consented to the experiment, physiology sensors were attached to them. The ECG sensor was attached with five leads on the chests of participants per the instructions provided by Shimmer, with the Vx lead in position six. The EDA sensor was attached to the middle part of the index and middle fingers on the left hand. After the sensors were attached, the participants were seated in front of a computer screen in the simulator environment (Fig. 1). The participants were instructed on how the experiment would proceed and told that they should follow instructions given either onscreen or via audio.

Fig. 1 Experiment environment (harbor scenario), both physical and virtual. Physiological sensors are highlighted in the red dashed rectangle: ECG (top) and GSR (bottom) (Dybvik et al. 2018)

Figure 2 shows the experimental procedure as a timeline. First, the participants were asked to fill out a survey on their affective state to provide a baseline. This was followed by information about the ship they were supposed to steer and a video demonstrating the controls of the ship. After the instructions were completed, the first scenario was presented, and the participants were supposed to sail the ship out of harbor and across an open expanse of sea. At the end of the first scenario, surveys on affective state and workload were filled out. This was repeated for the second scenario. After both scenarios were completed and surveys on mental state were filled out, an additional survey on demographics was filled out. This concluded the experiment, and the participants were debriefed and sensors were disconnected.

Fig. 2 Experimental procedure

5.4 Analysis: classical statistics and multivariate analysis

Subjective measurements used for analysis were collected after each scenario. Measurements used for analyzing the physiological state were sampled from the last 5 min of each scenario (see Fig. 2). This was intended to avoid carryover effects from previous stimuli.

Heart rate and HRV in the time and frequency domains were calculated from ECG data using Kubios HRV (Tarvainen et al. 2014). The frequency domain of HRV is typically calculated using either the fast Fourier transform (FFT) or autoregressive (AR) modeling. We selected AR because of its increased robustness and accuracy for shorter periods (Malliani et al. 1991; Montano et al. 2009). EDA data were processed in Ledalab using continuous decomposition analysis (CDA) (Benedek and Kaernbach 2010). CDA has been shown to be more sensitive in peak detection and estimation of tonic activity than trough-to-peak algorithms (Benedek and Kaernbach 2010).
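For readers unfamiliar with time-domain HRV, the following minimal sketch computes a few standard measures from a list of RR intervals. It does not reproduce the Kubios HRV pipeline (no artifact correction or detrending), the choice of SDNN and RMSSD as examples is an assumption about which measures are of interest, and the RR values are made up.

```python
import math

def hrv_time_domain(rr_ms):
    """Basic time-domain HRV measures from RR intervals given in milliseconds."""
    n = len(rr_ms)
    if n < 2:
        raise ValueError("need at least two RR intervals")
    mean_rr = sum(rr_ms) / n
    # SDNN: sample standard deviation of the RR intervals
    sdnn = math.sqrt(sum((rr - mean_rr) ** 2 for rr in rr_ms) / (n - 1))
    # RMSSD: root mean square of successive RR differences
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    rmssd = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return {
        "mean_rr_ms": mean_rr,
        # Heart rate approximated from the mean RR interval
        "mean_hr_bpm": 60000.0 / mean_rr,
        "sdnn_ms": sdnn,
        "rmssd_ms": rmssd,
    }

# Hypothetical RR series (ms), far shorter than a real 5-min analysis window
measures = hrv_time_domain([800, 810, 790, 805, 795])
```

In practice the same function would be applied to the RR series extracted from the analysis window of each scenario, after ectopic beats and artifacts have been handled.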

Table 3 provides an overview of all physiological variables included in the analysis. Assumptions of normal distribution, absence of significant outliers, and skewness were evaluated, and statistical tests were selected accordingly. The paired-sample t test investigates whether the mean values differ between the two conditions. The Wilcoxon signed-rank test and the sign test investigate whether the median values differ between the two conditions. In addition to statistical tests comparing conditions, the results were analyzed through correlation tests and multivariate analysis, i.e., principal component analysis (PCA) and partial least-squares regression (PLSR), to investigate the relationship between different variables. For PCA and PLSR, each variable was mean centered and standardized (weighted by 1/std.dev. of each variable). Multivariate analysis was performed using The Unscrambler X (2018).
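The paired comparison between scenarios can be sketched as follows. This is a minimal illustration of the paired-sample t statistic only, with hypothetical ratings and an assumed function name; obtaining the p value requires the t distribution with n − 1 degrees of freedom, which a statistics package would normally provide.

```python
import math

def paired_t_statistic(condition_a, condition_b):
    """Paired-sample t statistic and degrees of freedom for matched observations."""
    if len(condition_a) != len(condition_b):
        raise ValueError("paired samples must have equal length")
    diffs = [b - a for a, b in zip(condition_a, condition_b)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    # Sample standard deviation of the within-participant differences
    sd_d = math.sqrt(sum((d - mean_d) ** 2 for d in diffs) / (n - 1))
    t = mean_d / (sd_d / math.sqrt(n))
    return t, n - 1

# Hypothetical workload ratings for five participants in each scenario
open_sea = [3, 4, 2, 5, 4]
harbor = [6, 7, 4, 8, 6]
t_value, dof = paired_t_statistic(open_sea, harbor)
```

The test operates on within-participant differences, which is why each participant must contribute one value per scenario.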

Table 3 Physiology variables

6 Results

Three outliers were removed from the arousal values owing to inconsistencies in subjective reporting of arousal compared to awakeness and alertness. In these cases, arousal levels were reported much lower than awakeness and alertness, and we assume the nonnative English-speaking participants misinterpreted arousal as sexual arousal. This was a known misunderstanding from pilot studies. The data used for both classical statistics and multivariate analysis can be found in Online Resource 1.

7 Subjective variables: affective state and workload

Table 4 shows the results from statistical tests of subjective variables. Data used for the analysis are based on surveys filled out after each scenario. Participants show significant (p < 0.01) changes in self-reported affective state and workload between the two scenarios, open sea and harbor conditions, for paired samples with the exception of subjective experience of performance (p = 0.694), as previously shown by Dybvik et al. (2018).

Table 4 Subjective variable results

8 Physiological variables: heart rate variability and electrodermal activity

The results from the statistical tests can be seen in Table 5. Heart rate variables such as the mean and standard deviation of heart rate show significant (p < 0.05) changes. Significant changes are found in the frequency domain for normalized LF and HF powers, whereas nonnormalized powers do not exhibit this change. The LF to HF ratio of HRV, which is a common metric used to describe physiological arousal, increases from the open sea scenario to the harbor scenario, although this change falls just short of the 0.05 significance level (p = 0.053). Data on electrodermal activity show highly significant changes between the two scenarios for all variables except the maximum phasic driver, which is significant only at the 0.05 level (p = 0.041).

Table 5 Physiological variable results

8.1 Relation within subjective variables

In the following two sections, we highlight interesting correlations from the analysis. The full correlation table can be found in Online Resource 2. Stress is defined according to Russell (1980) as a combination of high arousal and displeasure, i.e., the opposite of feeling pleasant. Our results show that arousal has a stronger correlation to stress (r = 0.47, n = 62, p < 0.001) than displeasure does (r = 0.25, n = 62, p = 0.01). For this context, we interpret arousal as the main factor of stress. In a professional context such as large ship navigation, this makes sense, because we expect affective changes to be linked more to arousal, i.e., energy level, than to feelings of displeasure. We find that workload, both OW and TLX, is associated with levels of stress, displeasure, and arousal, in that order.
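The correlation coefficients reported here are Pearson correlations accompanied by a significance test. As a minimal sketch under that assumption, the code below computes Pearson's r and the corresponding t statistic with n − 2 degrees of freedom; the function names and data are hypothetical and are not taken from Online Resource 2.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    cov = sum(a * b for a, b in zip(dx, dy))
    return cov / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

def r_to_t(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))

# Hypothetical ratings of arousal and stress from five observations
arousal = [1, 2, 3, 4, 5]
stress = [2, 1, 4, 3, 5]
r = pearson_r(arousal, stress)
t = r_to_t(r, len(arousal))
```

The conversion to t shows why the same r can be significant in a large sample and not in a small one.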

The subjective variables were also analyzed using principal component analysis (PCA). Values are given as cross-validated results followed by calibration in parentheses. Cross-validation was performed by leaving one participant out at a time. The results show that the first component accounted for 46% (52%) of the variance in the subjective variables. This component was mainly dominated by workload and stress, as seen in Fig. 3a. The second component, accounting for 16% (18%) of variance, consisted mainly of arousal and displeasure (Fig. 3b). These first two components, explaining 62% (70%) of the total variance in the subjective data, align with Watson and Tellegen’s (1985) model of affect consisting of the dimensions positive and negative affect.
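Leaving one participant out at a time means that all rows belonging to a participant are held out together. The sketch below generates such folds; the function name is hypothetical, and the layout of two rows per participant (one per scenario, giving 62 rows for 31 participants) is an assumption inferred from the reported sample sizes.

```python
def leave_one_participant_out(participant_ids):
    """Yield (held_out_id, train_indices, test_indices) for each unique participant."""
    unique_ids = []
    for pid in participant_ids:
        if pid not in unique_ids:
            unique_ids.append(pid)
    for held_out in unique_ids:
        test = [i for i, pid in enumerate(participant_ids) if pid == held_out]
        train = [i for i, pid in enumerate(participant_ids) if pid != held_out]
        yield held_out, train, test

# Assumed layout: two rows per participant (open sea, harbor), 31 participants
ids = [pid for pid in range(1, 32) for _ in range(2)]
folds = list(leave_one_participant_out(ids))
```

Holding out both rows of a participant together prevents the model from being validated on a person whose other scenario it has already seen.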

Fig. 3 PCA: correlation loadings for subjective variables

What we see from both the statistical tests and the PCA is that the largest changes in mental state are in workload and stress. This could mean that in an environment where participants are supposed to perform professional tasks, emotions—i.e., arousal and displeasure—are less influenced than the more task-related concepts stress and workload. One other explanation for the smaller changes in self-reported affective state could be the semantic understanding of the different concepts, given that our experience reveals that it is more common to discuss concepts of stress and workload in a professional context as opposed to emotions. Evaluating one’s own state of arousal and displeasure could be more challenging than stress and workload for nonnative English speakers owing to unfamiliarity of the concepts.

8.2 Relation between subjective and physiological variables

We find that the variables workload and stress have the strongest correlation to both HRV and EDA variables. Among the HRV variables, the HF peak frequency has the highest correlation to both stress and workload (stress: r = −0.33, n = 57, p < 0.001; OW: r = −0.43, n = 57, p < 0.001; TLX: r = −0.61, n = 57, p < 0.001). Of the EDA variables, the number of skin conductance responses (nSCRs), tonic level, and raw mean SC signal have the strongest correlation to stress and workload. Displeasure was found to be correlated with nSCRs (r = 0.30, n = 62, p = 0.02). Awakeness shows correlations to the LF peak of HRV (r = 0.37, n = 57, p = 0.01) and HR standard deviation (r = 0.36, n = 57, p = 0.01). Arousal and alertness were not correlated with any of the physiological variables (r < 0.30). The physiological variables with the strongest correlations to mental state were peak frequency in HF HRV and phasic and tonic EDA.

The relation between physiological and subjective variables was also investigated through PLSR (Fig. 4). The values are given as cross-validated results followed by calibration in parentheses. Cross-validation was performed by leaving one participant out at a time. We found that the first component accounts for 13% (19%) of variance in the subjective variables (marked in red) and 20% (29%) in the physiological variables (marked in blue). The second component accounted for 30% (30%) of the variance in the physiological variables but did not contribute to explaining the variance in the subjective variables. Here, the explained variance was −2% (2%), indicating a small amount of overfitting.

Fig. 4 PLSR: correlation loadings for physiological to subjective variables

The first component was dominated by HF peak, EDA variables, mean RR/HR, and to some extent normalized LF and HF components in terms of physiological variables (Fig. 4a, b). The mental state variables in the first component mainly consisted of workload and stress variables (Fig. 4a). The second component consisted of absolute power and time-domain HRV variables (Fig. 4c). The two scenarios (marked in green), cruising on open sea and docking in a narrow harbor, were down-weighted when calculating the factors. By including these variables in the model, we see the tendencies of how the variables changed between the two scenarios. Although the explained variance was relatively low, we found an emerging pattern of how the variables covaried. We observed a decrease in peak frequency HF HRV when workload and stress increased—i.e., HRV moved toward lower frequencies, which are associated with lowered parasympathetic activity, and could be a sign of increased sympathetic activity. This corresponds well to the increase in EDA for increasing levels of workload and stress, which is mediated solely by the SNS. The increase in both phasic and tonic EDA could be explained by the increased number of external stimuli and task demands, which should influence phasic and tonic levels, respectively.

9 Discussion

Our results show significant changes in both mental and physiological states between the two scenarios in the experiment. There is a general increase in arousal, displeasure, stress, and workload in the scenario where participants navigated a large ship in a narrow harbor compared to the scenario where they navigated on open sea. The same increase is found for EDA and partially for HRV. Through PLSR, we find that workload and stress account for most of the variation in the mental state of participants. Most variation in physiological data along the same component can be found in EDA and the HRV HF band peak frequency. When calculating correlations between mental and physiological state variables, we find that the strongest relations are between the dominant variables found in the PLSR analysis: stress and workload for mental state, and EDA and the HF HRV peak for physiological state. These correlation scores range from approximately 0.4 to 0.6. The relation between mental and physiological state variables might be influenced by the following factors. First, physiological responses are highly individual. Given the same stimuli, one participant might show strong changes in EDA, whereas another shows little to no change. Second, participants might react differently to the stimuli. What is considered stressful for one person might be routine for another. Third, there might be differences in how individuals rate concepts of affective state and workload, owing either to language barriers or to their own understanding of the concept in question. Finally, the analysis in this paper is based on average results from participants, meaning that individual differences are not considered. Even so, we found correlations between mental and physiological states. Our interpretation is that, because these results can be found despite the aforementioned factors, analysis of multiple data points from individual participants should provide even stronger results.
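The argument about individual differences can be illustrated with a toy calculation. The sketch below computes the Pearson correlation for two hypothetical participants whose EDA follows their stress rating perfectly but from very different baselines. The data and the function name are assumptions made for illustration and are not taken from the experiment.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical participants: same stress ratings, different EDA baselines.
stress_a, eda_a = [1, 2, 3], [1.0, 1.1, 1.2]
stress_b, eda_b = [1, 2, 3], [5.0, 5.5, 6.0]

r_a = pearson_r(stress_a, eda_a)  # within participant A
r_b = pearson_r(stress_b, eda_b)  # within participant B
r_pooled = pearson_r(stress_a + stress_b, eda_a + eda_b)  # baselines ignored

print(r_a, r_b, r_pooled)
```

Within each hypothetical participant the correlation is essentially 1, whereas the pooled correlation is much weaker because the difference in baseline dominates the variance. This is one way in which ignoring individual differences can mask a relation that is strong within each person.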

Changes in the HRV LF/HF ratio are similar to the findings of Gao et al. (2013) and Hjortskov et al. (2004), although our LF/HF ratio is higher. McDuff et al. (2014) found a much larger change in the LF/HF ratio, with the mean value doubling from approximately 0.5 to 1.0 between the resting and stress conditions. We believe that our large ratio values are due to a lower HF power in our data compared to that of Hjortskov et al. This could indicate parasympathetic withdrawal, which could be an indicator of high baseline physiological arousal. Total power is comparable to the studies of Hjortskov et al. and Gao et al., although the latter report a much lower total power for their low-complexity task. Healey and Picard (2005) compared skin conductivity, HRV, respiration rate, and electromyography, and show that driver stress level was correlated most strongly with mean skin conductivity and HRV. Of these, mean skin conductivity had the strongest correlation with their stress metric. Wilson (2002) found EDA and heart rate to be more sensitive to changes in cognitive demand than HRV, which is similar to our results when comparing physiological data to the subjective ratings.
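Because the LF/HF ratio is a quotient of two band powers, a lower HF power raises the ratio even when LF power is unchanged. The short sketch below shows this arithmetic. The power values are hypothetical and are chosen only to reproduce a change from 0.5 to 1.0; they are not data from any of the cited studies.

```python
def lf_hf_ratio(lf_power, hf_power):
    """Ratio of low-frequency to high-frequency HRV band power."""
    if hf_power <= 0:
        raise ValueError("HF power must be positive")
    return lf_power / hf_power

def normalized_units(lf_power, hf_power):
    """LF and HF power expressed as fractions of their sum."""
    total = lf_power + hf_power
    if total <= 0:
        raise ValueError("LF + HF power must be positive")
    return lf_power / total, hf_power / total

# Hypothetical band powers in ms^2: LF is held constant, HF is halved.
rest_ratio = lf_hf_ratio(500.0, 1000.0)
stress_ratio = lf_hf_ratio(500.0, 500.0)
print(rest_ratio, stress_ratio)

# The same change expressed in normalized units.
print(normalized_units(500.0, 1000.0))
print(normalized_units(500.0, 500.0))
```

Halving HF power while keeping LF power fixed doubles the ratio, which is why differences in HF power between studies can account for differences in the reported LF/HF values.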

From the experiment setup, we know that the harbor scenario, where higher levels of perceived stress and workload were measured, contained more stimuli. EDA, or more specifically SCR, has been shown to be highly reactive to external stimuli (Siddle 1991). We observe a relative increase in the number of SCRs in our data between the open sea and harbor scenarios, corresponding to the known reactivity to external stimuli. At the same time, we observe an increase in the tonic level of EDA as well as in stress and workload. We show that there is a relation among the number of stimuli, subjectively assessed mental state, and electrodermal activity in the context of large ship navigation. However, we do not claim causality between the variables, only that they covary.

Because the tasks were designed to replicate normal working conditions onboard a ship, we believe that the results of this paper are representative of the real context. However, the results are limited by how representative the participants are of actual operators. When interpreting the results of this experiment, it is important to keep in mind that the data were collected from a student population performing tasks in an unfamiliar environment, steering a large ship in a simulator. Even if the results are consistent for a student population, we do not know how they relate to the responses of professionals, were they to participate in the same experiment. We believe that students' reactions to stimuli might be stronger than those of professionals, because professionals have training and experience in handling the given tasks. Our assumption is that the direction of responses would be similar for students and professionals, although the magnitude might differ. This should be tested in a future experiment.

10 Conclusion: mental state and physiology

In this paper, we have presented an experiment aiming to investigate changes in mental state and physiology between different scenarios and how mental state and physiology are related in the context of large ship navigation. The motivation behind the experiment has been to create a foundation for future work on continuously monitoring users’ mental state in the context of remote operation and monitoring of ships.

The experiment tests the feasibility of using changes in electrodermal activity and heart rate variability as a proxy for users' self-reported mental state. Thirty-one participants from a student population were tasked with navigating a large ship in two scenarios: on open sea and in a narrow harbor. The results show that there are significant changes in the variables used to measure mental state between the two scenarios. We found significant changes in EDA and several variables representing HRV. We found that self-reported stress and workload were correlated with EDA and the peak frequency in the HF band of HRV. Awakeness was found to be positively correlated with the LF band of HRV and the standard deviation of heart rate. No correlation was found between arousal or alertness and physiological variables. Multivariate analysis showed that one component could explain 13% of the variance in mental state and 20% in physiological data. The second component did not contribute to explaining any variance in the mental state of users but did explain 30% of the variability of physiological data. The first component was dominated by workload, stress, EDA, and elements of HRV. The second component was dominated by absolute power and time-domain HRV variables.

We draw the following conclusions from our experiment: One, there is a significant difference in variables used to measure mental and physiological states between two regularly occurring scenarios in the context of large ship navigation. As these changes occur, the capacity and reaction pattern of the user change, and the user could place different or changing demands on the user interface and system behavior. Two, elements of mental state are correlated with changes in physiological state. Most prominently, stress and workload covary with EDA and elements of HRV. This finding can serve as a foundation for assessing changes in the mental state of users who remotely operate and monitor ships by measuring changes in their physiological state.

We believe our findings to be representative of similar control rooms. To verify this, experiments should be conducted in situ by collecting data from professionals working in their normal environments, such as ship bridges, power plants, airplanes, or air traffic control. Such experiments would be more prone to noise and uncontrollable variables that may influence results. Despite these challenges, results from such an experiment would have the highest ecological validity, because this is the real situation we are interested in. For research on how mental and physiological states are related in a control room context, this is the necessary step to find an answer. Should a connection between mental and physiological states be found from in situ experiments, it would be possible to evaluate the mental state of users unobtrusively, i.e., without interfering with their tasks by having them fill out surveys or answer questions. With this information, designers could compare design concepts by how they influence the mental state of users, which in turn may influence task performance. We hope this can become a tool in the designers' toolkit for evaluating designs when working on remote operation of ships.