
Human-Intelligent Systems Integration

Volume 1, Issue 1, pp 43–51

Relationships between physiological signals and stress levels in the case of automated technology failure

  • Celeste Branstrom
  • Heejin Jeong
  • Jaehyun Park
  • Byung-Cheol Lee
  • Jangwoon Park
Research Article

Abstract

Although successful automation can enrich people’s lives, prolonged use of unreliable automation negatively affects users. This study examines how prolonged use of an unreliable auto-proofreading system affects users’ trust levels and physiological responses. Nineteen native English speakers performed tasks correcting grammatical errors in each of 20 sentences under reliable and unreliable proofreading conditions. During the tasks, the participants’ electrodermal activity (EDA) was recorded and their perceived trust in the proofreading system was evaluated. As the unreliable auto-proofreading system worked improperly, perceived trust decreased gradually and a noticeably increasing pattern of EDA signals was observed. In contrast, with the reliable auto-proofreading system, perceived trust increased gradually and a stable or decreasing pattern of EDA signals was observed. Prolonged use of an unreliable system aggravates anxiety, causing increases in distrust and EDA signals. The findings provide empirical data that can be used to design a fail-safe feature of automation that minimizes a user’s anxiety level.

Keywords

Automation · Trust in automation · Physiological measurement · Human-automation interaction · Physiological psychology

1 Introduction

Technological advances have paved the way for the incorporation of many automated systems into daily life. The automation of various technologies facilitates the efficient completion of tasks and, therefore, results in higher levels of user reliance. When systems work well, users gain trust in automation and experience lower workload and stress levels. For example, Crocoll and Coury (1990) reported that when reliable decision-aiding information is provided by an automated system, the time needed to identify an aircraft is effectively reduced. Dzindolet et al. (2003) asserted that repeated successful automation builds trust and enhances reliance on automated technology. Merritt (2011) found that when using a reliable automated X-ray system, a subject’s reliance on the automated system intensifies with the amount of trust in the system. Users are more effective at work when they can rely on a trustworthy automated system than when they use a manual or an untrustworthy system.

Despite the practical benefits of reliable automation, putting too much trust in an unreliable automation system without noticing its malfunctions for a long period can result in complacency and negatively affect users and their work. For example, Crocoll and Coury (1990) reported that when people relied heavily on an ineffective automated system, misidentifications increased substantially and task performance diminished. Rice (2009) found that false alarms and misses caused by unreliable automation had negative effects on user trust. Sethumadhavan (2009) revealed that air traffic controllers’ reliance on unreliable automation increased the time needed to detect a potential airplane crash. These types of mistakes and unreliability cause a higher cognitive workload and result in mental stress for users (Jeong et al. 2018b), causing work efficiency to decrease (Jeong et al. 2017).

To design a trustworthy and fail-safe automation system, it is important to consider the effects of these failures on users. In other words, if one can detect, through objective measures, a user’s stress level or emotional changes caused by malfunctioning automation, then a fail-safe system could be activated to protect the user from a dangerous situation, such as missed control in self-driving mode. Despite the clear benefits of identifying the effects of automation malfunction on mental stress using an objective measure, such as physiological responses, there has been limited research on this topic. To the researchers’ best knowledge, studies on the relationships between perceived trust and physiological responses to unreliable automation, which would help in understanding human behavior under malfunctioning automation, have been lacking.

Many studies have demonstrated strong relationships between mental stress and physiological responses. Seiler et al. (2007) reported that the sympathetic nervous system becomes activated when a person is under stress, triggering the “fight or flight” response and causing a rise in various physiological responses, such as heart rate and electrodermal activity (EDA). Healey and Picard (2005) showed that heart rate and skin conductivity corresponded most closely to mental stress levels during a driving test. Sierra et al. (2011) found that skin conductivity and heart rate surge drastically during a stressful situation and can be used to determine whether a person is under stress from 3- to 5-s windows of data. Therefore, EDA could be a good signal for identifying a user’s stress during automation malfunction.

The objective of this study is to understand how users’ perceived trust and physiological responses, specifically the EDA signal, are affected during tasks using reliable and unreliable automation. Because a suggestion feature in an automated system gives a user options to support his or her decision or action, it is commonly found at higher degrees of automation (DOA). However, little research has quantified the effects of DOA in an unreliable automated system, and understanding these effects is important for designing a fail-safe feature. Therefore, suggestion was considered a factor in this study, even though this left fewer samples in each tested session, and the study was designed to identify the effects of two main factors (system reliability and suggestion) on perceived trust and physiological responses. An auto-proofreading system, similar to the AutoCorrect feature in Microsoft Word, was selected as a simple automation system.

2 Methods

2.1 Participants

Nineteen native English speakers (11 females, 8 males) participated in the experiment, with an age range of 18–82 years (mean = 33.6 years old; SD = 18.0). Most participants had at least 2 years of experience of using AutoCorrect in Microsoft Word. This research complied with the American Psychological Association Code of Ethics.

2.2 Apparatus

For the experiment, a program was developed in C# (Visual Studio 2015, Microsoft Co., USA). The program provided four different auto-proofreading sessions (i.e., sessions A, B, C, and D) to the participants, as shown in Fig. 1. Detailed information about the program is described in Jeong et al. (2017):
  • Session A ran a reliable auto-proofreading condition that highlighted a grammatical error with an underline and did not provide a suggestion (word).

  • Session B ran the same reliable auto-proofreading condition and provided a correct suggestion.

  • Session C ran an unreliable auto-proofreading condition that highlighted a correct word with an underline and did not provide a suggestion.

  • Session D ran an unreliable auto-proofreading condition that highlighted a correct word with an underline and provided an incorrect suggestion.

Fig. 1

Four different auto-proofreading sessions provided by the developed program: (a) reliable without suggestion, (b) reliable with a correct suggestion, (c) unreliable without suggestion, and (d) unreliable with an incorrect suggestion

Sentences for the proofreading tasks were selected systematically from online sentence completion test sets (http://www.majortests.com) of well-known standardized tests, such as the Scholastic Aptitude Test (SAT) for the easy level and the Graduate Record Examinations (GRE) for the difficult level. A total of 34 sentences, 17 each for the easy and difficult levels, were chosen based on their readability scores, which were measured with the Readability Test Tool (www.webpagefx.com/tools/read-able/). Among the 34 selected sentences, four were used for a training session; 10, for a manual proofreading session; and 20, for each of the four sessions (sessions A, B, C, and D).

A laptop with a 13.5-in. screen (Q524UQ, AsusTek Computer Inc., USA) and a 1920 × 1080 resolution was used for the experiment. The program was centered on the screen. The font was Times New Roman at approximately 14 to 16 points.

2.3 Measures

For an objective measure, participants’ EDA signals were recorded using the Empatica wristband (E4, Empatica Inc., USA) at a sampling rate of 4 Hz throughout the proofreading tasks. The room temperature was kept at around 22 °C to control for the effect of temperature on skin conductivity. The EDA signal was measured to objectively and quantitatively infer the level of anxiety under the reliable and unreliable automation; note that an increase in skin conductance is strongly associated with cognitive/emotional stimuli, such as stress or anxiety. For a subjective measure, perceived trust was evaluated using an online questionnaire with a 21-point Likert scale (very low, − 10; neutral, 0; very high, + 10). Perceived trust was defined in this study as the expectation that the auto-correction system would work as expected (for more information on measuring trust in automation, see Chien et al. 2014; Jeong et al. 2018a; Jian et al. 2000). Because trust is a variable between a human and automation that depends on context, the trust measured in this study is valid only for auto-correction tasks. The questionnaire was developed to collect perceived trust efficiently from each participant without changing the testing setup; for example, participants could evaluate perceived trust in the tested conditions on the same laptop without any delay. The 21-point interval scale was adopted because it offers greater variability than a 5- or 7-point scale.

2.4 Experimental procedure

The experiment was conducted in three stages (i.e., preparation, practice, and main experiment). In the preparation stage, the purpose and procedure of the experiment were explained to the participant, after which written informed consent was obtained using procedures approved by the Texas A&M University—Corpus Christi Institutional Review Board (human subjects research protocol #59-17). Each participant wore the Empatica E4 wristband to measure the EDA signal throughout the experiment. A 2-min rest period was provided before starting the practice stage. During the experiment, a moderator recorded the starting time of each proofreading task, which allowed the tasks to be synchronized with the measured EDA signals.

In the practice stage, each participant completed the training session to become familiar with the provided auto-proofreading system. During the training, the participant was asked to complete the proofreading tasks quickly and correctly. When the participant corrected an error, he/she was asked to click the “Next” button at the bottom right of the page, and the next sentence would then appear in the program. To increase stress levels during the proofreading tasks, each sentence had to be corrected within 20 s. If the sentence was not completed within the time limit, the program moved automatically to the next sentence. The program displayed the remaining time in seconds and the number of corrected answers among the 20 sentences, and the remaining seconds ticked audibly as the participant attempted to complete the task. Next, after a break lasting a couple of minutes, the manual proofreading session for the 10 sentences was conducted, with no automated proofreading system. After the manual proofreading session, the participant had a 2-min rest period before starting the main experiment.

In the main experiment stage, each participant started a pre-determined session, which provided either a reliable or an unreliable auto-proofreading system. Each participant was randomly assigned to one of the four sessions; the 19 participants were distributed across sessions A (5), B (5), C (4), and D (5). In each session, the participant was asked to complete a set of five sentences as quickly and correctly as possible and then evaluate his/her perceived trust in that system. The participant completed a total of 20 sentences, randomly separated into four sequential sets; perceived trust was measured at the end of each set. A short break was included between the sets to observe changes in physiological response. After completing the four sequential sets, the participant was allowed to take a break. Lastly, participants were compensated for their participation.

3 Results

3.1 Electrodermal activities in the auto-proofreading tasks

Figure 2 shows raw EDA signals for two participants, subjects #7 and #8, measured in sessions D (unreliable system with an incorrect suggestion) and B (reliable system with a correct suggestion), respectively. Both signals showed a similar increasing pattern in the training and manual proofreading sessions. However, during the main experimental tasks, the EDA signals measured in session D grew gradually, whereas the signals measured in session B either stayed relatively stable or decreased.
Fig. 2

EDA signals measured in the reliable vs. unreliable auto-proofreading sessions

Table 1 shows the normalized EDA signals measured from all the participants in each of the four sessions. To reduce individual differences in the magnitude of the raw EDA signals, the measured signals were normalized using z-score standardization. After normalization, the signals were plotted to visually examine the pattern in each of the four sessions. In addition, a linear regression equation was fitted to the sequential datasets, and its predictions are displayed as a dotted line on each graph. In session A, three participants (#2, #9, and #16) showed an increasing pattern with positive regression coefficients (4.751 × 10−4, 6.894 × 10−4, and 4.050 × 10−4, respectively); one (#1), a decreasing pattern; and another (#18), a fluctuating pattern. In session B, one participant (#13) displayed a slightly increasing pattern; three (#10, #12, and #17), a decreasing pattern with negative regression coefficients (− 1.313 × 10−3, − 1.500 × 10−3, and − 5.830 × 10−4); and one (#8), a fluctuating pattern. In session C, three participants (#3, #14, and #15) exhibited an increasing pattern with positive regression coefficients (4.051 × 10−4, 8.291 × 10−4, and 4.487 × 10−4, respectively) and one (#4), a fluctuating one. In session D, four participants (#5, #7, #11, and #19) showed an increasing pattern with positive regression coefficients (2.474 × 10−4, 3.789 × 10−4, 1.811 × 10−4, and 6.800 × 10−4, respectively), and one (#6), a fluctuating pattern.
Table 1

Normalized EDA signals with a trend line in the auto-proofreading sessions. The bold line indicates the signals measured in the four sequential sets of proofreading tasks; the trend line indicates the predictions of each regression equation (abscissa, sample index; ordinate, normalized EDA signal)
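The z-score standardization and trend-line fitting described above can be sketched in a few lines. The following is an illustrative reimplementation, not the authors’ analysis code, and the sample signals are synthetic:

```python
from statistics import mean, stdev

def normalize(eda):
    """z-score standardization to reduce individual differences in EDA magnitude."""
    m, s = mean(eda), stdev(eda)
    return [(x - m) / s for x in eda]

def trend_slope(eda):
    """Least-squares slope (regression coefficient) of EDA against sample index."""
    n = len(eda)
    mx, my = (n - 1) / 2, mean(eda)   # mean of indices 0..n-1, mean of signal
    num = sum((i - mx) * (y - my) for i, y in enumerate(eda))
    den = sum((i - mx) ** 2 for i in range(n))
    return num / den

# A steadily rising signal (as in the unreliable sessions) yields a positive
# coefficient; a falling signal (as in some reliable sessions) a negative one.
rising = [0.5 + 0.01 * i for i in range(200)]
falling = [5.0 - 0.01 * i for i in range(200)]
assert trend_slope(normalize(rising)) > 0
assert trend_slope(normalize(falling)) < 0
```

The sign of the fitted coefficient is what Table 1 summarizes per participant: positive for increasing EDA patterns, negative for decreasing ones.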

3.2 Perceived trust in the auto-proofreading tasks

Figure 3 shows the mean and standard deviation of trust measured in the four auto-proofreading sessions. On average, session B (reliable auto-proofreading system with a correct suggestion) scored the highest trust level over the course of the experiment (mean ± SD = 9.4 ± 0.7) among the four sessions; perceived trust increased gradually with sets completed. Although the only difference between sessions A (reliable auto-proofreading system without suggestion) and B was whether a correct word was suggested, the trust level in session A (mean ± SD = 3.4 ± 1.0) was less than half of that in session B (mean ± SD = 9.4 ± 0.7), and the difference was significant (p < 0.001). Sessions C (unreliable system without suggestion; mean ± SD = − 5.1 ± 2.2) and D (unreliable system with an incorrect suggestion; mean ± SD = − 5.0 ± 1.1) scored the lowest trust levels, and the difference between them was not significant (p = 0.679).
Fig. 3

Mean ± SE of trust in the four sequential sets for each of the auto-proofreading sessions

Table 2 shows the results of the analysis of variance (ANOVA) on perceived trust. A three-way mixed ANOVA for the unbalanced data was performed using the general linear model procedure. The model included two between-subjects factors, system reliability (reliable vs. unreliable) and suggestion (with vs. without suggestion), and one within-subject factor, time (four sequential sets). Only system reliability had a significant effect on perceived trust (p < 0.001) at a significance level of 0.05. Even though the interaction between system reliability and suggestion was not significant, the interaction plot (Fig. 4) shows the effect of suggestion in the reliable auto-proofreading system: in the unreliable system, the wrong suggestions did not affect trust, whereas in the reliable system, the correct suggestions substantially elevated trust.
Table 2

Summary of the three-way ANOVA

Source             Adj. SS    Adj. MS    F        p
Reliability (R)    2448.99    2448.99    35.79*   < 0.001
Suggestion (G)      175.82     175.82     2.57      0.130
R × G               163.12     163.12     2.38      0.143
S/RG               1026.49      68.43
Time (T)              4.34       1.45     0.18      0.908
R × T                13.42       4.47     0.56      0.642
G × T                60.33      20.11     2.53      0.069
R × G × T            33.36      11.12     1.40      0.256
T × S/RG            357.76       7.95
Total              4283.63

SS, sum of squares; MS, mean squares; Adj. SS, type III SS. *p < 0.05
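As a consistency check on Table 2, each F ratio is the effect’s mean square divided by the mean square of its error term: the between-subjects effects (R, G, R × G) use the S/RG term (MS = 68.43), and the within-subject effects (T and its interactions) use the T × S/RG term (MS = 7.95). A quick sketch reproducing the reported values:

```python
# Mean squares taken directly from Table 2
ms_error_between = 68.43   # S/RG, error term for between-subjects effects
ms_error_within = 7.95     # T x S/RG, error term for within-subject effects

f_reliability = 2448.99 / ms_error_between   # reported as 35.79
f_suggestion = 175.82 / ms_error_between     # reported as 2.57
f_time = 1.45 / ms_error_within              # reported as 0.18

assert abs(f_reliability - 35.79) < 0.01
assert abs(f_suggestion - 2.57) < 0.01
assert abs(f_time - 0.18) < 0.01
```

The ratios match the table to two decimal places, confirming the internal consistency of the reported ANOVA.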

Fig. 4

Interaction plot of mean trust for system reliability vs. suggestion

3.3 Linear relationship between the perceived trust and the EDA pattern

To identify the relationship between perceived trust and the EDA signals, a correlation analysis was conducted. Figure 5 shows a scatter plot of the negative correlation between mean perceived trust and the regression coefficients of the EDA signals (see Table 1) for all 19 participants. Pearson’s correlation coefficient between perceived trust and the EDA regression coefficient was r = − 0.416 (p = 0.077), meaning that as perceived trust decreased, the EDA signal tended to increase, and vice versa.
Fig. 5

Negative correlation (r = − 0.416) between the perceived trust and the calculated regression coefficients
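The correlation computed for Fig. 5 is a standard Pearson coefficient over 19 (trust, EDA-slope) pairs. A minimal sketch follows; the paired values below are hypothetical illustrations chosen to mimic the reported direction of the effect, not the study’s data:

```python
from math import sqrt
from statistics import mean

def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two paired samples."""
    mx, my = mean(xs), mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sqrt(sum((x - mx) ** 2 for x in xs) *
               sum((y - my) ** 2 for y in ys))
    return num / den

# Hypothetical pairs in the spirit of Fig. 5: higher trust with flat or
# falling EDA trends, lower trust with rising trends.
trust = [9.4, 7.0, 3.4, -2.0, -5.0, -5.1]
eda_slopes = [-1.5e-3, -5.8e-4, 4.1e-4, 2.5e-4, 6.8e-4, 8.3e-4]
r = pearson_r(trust, eda_slopes)
assert -1.0 <= r < 0   # negative, matching the sign of the reported r = -0.416
```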

4 Discussion

The negative effects of prolonged use of an unreliable auto-proofreading system on trust and anxiety level were observed. With the reliable auto-proofreading system, perceived trust levels increased while performing the tasks, and the EDA signals showed stable or even decreasing patterns at the same time. In contrast, with the unreliable auto-proofreading system, perceived trust levels decreased gradually, and many participants’ EDA signals exhibited an increasing pattern over the course of the experiment. As the EDA signal is known as the most sensitive indicator of anxiety levels (Epstein and Roupenian 1970; Naveteur and Baque 1987), it can be concluded that prolonged use of an unreliable auto-proofreading system could gradually aggravate anxiety, causing an increase in EDA signals.

A negative correlation (r = − 0.416) between the EDA signals and user trust in the auto-proofreading systems was observed. When the system worked properly, the EDA signals showed a stable or even decreasing pattern compared with the training and manual proofreading sessions, and perceived trust increased with usage time. In contrast, when the system worked improperly, the EDA signals showed an increasing pattern and perceived trust decreased. This could be caused by a decrease or an increase in anxiety as the system continually worked properly or improperly, respectively. This result is similar to the corresponding results of Park et al. (2018), such as significant Pearson’s correlation coefficients between EDA slope and mental demand (r = 0.477, p = 0.039), effort (r = 0.428, p = 0.068), performance (r = − 0.500, p = 0.029), and frustration (r = 0.474, p = 0.040).

When the auto-proofreading system worked reliably, the degree of automation (whether a correct suggestion was provided) had a significant effect on perceived trust and the physiological responses. The high DOA (providing a correct suggestion in this study) can increase perceived trust and stabilize EDA signals compared with the low DOA (no suggestion). In addition, most of the EDA signals measured under the high DOA in the reliable system showed decreasing patterns over the course of testing (three negative regression coefficients), which differs from the other three conditions (each with three or four positive regression coefficients). This may be because users gain strong confidence (and thus less anxiety) when the suggested words match their expectations, which ultimately elevates user trust. Therefore, we can conclude that a high degree of automation in a reliable auto-proofreading system can elevate user trust as well as confidence.

There were limitations in recruiting a homogeneous group of participants in terms of vocabulary level. The participants represented a variety of ages and education levels, ranging from a high school diploma to a graduate degree. A future study could include a variety of vocabulary levels; to recruit homogeneous participants, we will carefully select participants by age and education level to minimize potential uncertainty. Specific populations could also be grouped to verify whether trends exist within a specific group. A few participants did not show rapid EDA changes, whereas others showed large changes in EDA levels along with higher stress ratings. Many of the participants who had recently been in a school/academic setting showed more drastic changes in EDA levels, whereas those who had been out of school for some time displayed less drastic changes; these participants tended to show either little change in EDA levels or unexpected spikes and dips. A number of the participants were unfamiliar with the vocabulary used in this study or described themselves as poor in grammar and spelling. This may be among the factors leading certain participants to show little EDA change during the experiment, as they may not have put much pressure on themselves to do well.

By detecting a user’s trust or distrust in a system early using bio-signals, the user can be protected from the negative impacts of complacency in automation. Automation helps many parts of daily life; however, if automation malfunctions for a long period, complacent users can panic. When a failure of automation cannot be detected by the system itself, a user’s bio-signal can be an indicator of system instability in human-automation interaction. If the anxiety level can be estimated by interpreting the EDA signals obtained from a user of an unreliable automation system, a fail-safe feature can be activated by triggering an alarm or an automatic system shutdown. In addition, other physiological measures, such as heart rate, can be considered: Healey and Picard (2005) demonstrated that although certain participants lacked reliable EDA data for determining stress, they had reliable data for other physiological responses, such as heart rate. Different physiological changes could be tested to determine whether people express anxiety differently. Further research is needed to quantify the relationships between bio-signals and anxiety level in reliable and unreliable automation systems, so that user trust or distrust can be estimated quantitatively from a user’s bio-signal.
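One way such a fail-safe feature could use the EDA trend is sketched below. This is purely illustrative: the windowing rule and the slope threshold are hypothetical choices for demonstration, not values derived from or proposed by the study.

```python
from statistics import mean

def eda_trend(window):
    """Least-squares slope of an EDA window against sample index."""
    n = len(window)
    mx, my = (n - 1) / 2, mean(window)
    num = sum((i - mx) * (y - my) for i, y in enumerate(window))
    den = sum((i - mx) ** 2 for i in range(n))
    return num / den

def failsafe_triggered(eda_window, slope_threshold=4e-4):
    """Hypothetical rule: flag the automation for a fail-safe action
    (alarm or shutdown) when the EDA trend of the current window
    exceeds the threshold. The threshold value is illustrative only."""
    return eda_trend(eda_window) > slope_threshold

calm = [1.0 + 0.00005 * i for i in range(240)]   # ~1 min at 4 Hz, nearly flat
anxious = [1.0 + 0.002 * i for i in range(240)]  # steadily rising
assert not failsafe_triggered(calm)
assert failsafe_triggered(anxious)
```

In practice, such a threshold would need per-user calibration (e.g., against the training-session baseline), and corroborating signals such as heart rate, before triggering any intervention.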

5 Conclusion

The effects of system reliability on perceived trust and physiological response were quantified. An auto-proofreading system was chosen as a simple automated system, and the system was manipulated in terms of reliability (reliable vs. unreliable) and suggestion feature (with vs. without suggestion) to identify the relationship between perceived trust and physiological response (electrodermal activity, EDA) in automation. The main findings are as follows. First, system reliability in the auto-proofreading system had a significant effect (p < 0.001) on perceived trust and a substantial effect on EDA signals. Second, the suggestion feature substantially elevated perceived trust when the system worked reliably; when the system worked unreliably, the suggestion feature did not affect perceived trust. Third, the EDA signals were highly dependent on system reliability. Lastly, for the first time, a negative correlation (r = − 0.416) between perceived trust and the EDA signals in the auto-proofreading systems was quantified. The method used in this study is applicable to other types of automation for understanding the relationships between trust and physiological response.

Notes

Acknowledgements

This work was supported by Incheon National University (International Cooperative) Research Grant in 2017 (Grant No.: 20170303).

Compliance with ethical standards

This research complied with the American Psychological Association Code of Ethics. Written informed consent was obtained using procedures approved by the Texas A&M University—Corpus Christi Institutional Review Board (human subjects research protocol #59-17).

References

  1. Chien SY, Semnani-Azad Z, Lewis M, Sycara K (2014) Towards the development of an inter-cultural scale to measure trust in automation. In: International conference on cross-cultural design, pp 35–46
  2. Crocoll W, Coury B (1990) Status or recommendation: selecting the type of information for decision aiding. In: Proceedings of the Human Factors Society 34th annual meeting. https://doi.org/10.1177/154193129003401922
  3. Dzindolet M, Peterson S, Pomranky R, Pierce L, Beck H (2003) The role of trust in automation reliance. Int J Hum Comput Stud 58:697–718
  4. Epstein S, Roupenian A (1970) Heart rate and skin conductance during experimentally induced anxiety: the effect of uncertainty about receiving a noxious stimulus. J Pers Soc Psychol 16(1):20–28
  5. Healey J, Picard R (2005) Detecting stress during real-world driving tasks using physiological sensors. IEEE Trans Intell Transp Syst 6(2):156–166
  6. Jeong H, Park J, Park J, Lee BC (2017) Inconsistent work performance in automation, can we measure trust in automation? Int Robot Autom J 3(6):00075
  7. Jeong H, Park J, Park J, Pham T, Lee BC (2018a) Analysis of trust in automation survey instruments using semantic network analysis. In: International conference on applied human factors and ergonomics, pp 9–18
  8. Jeong H, Park J, Park J, Lee BC (2018b) Effects of automation type on human performance in proofreading tasks. In: Proceedings of the Human Factors and Ergonomics Society annual meeting 62(1):1140–1140
  9. Jian JY, Bisantz AM, Drury CG (2000) Foundations for an empirically determined scale of trust in automated systems. Int J Cogn Ergon 4(1):53–71
  10. Merritt S (2011) Affective processes in human-automation interactions. Hum Factors 53(4):356–370
  11. Naveteur J, Baque EFI (1987) Individual differences in electrodermal activity as a function of subjects’ anxiety. Personal Individ Differ 8(5):615–626
  12. Park J, Jeong H, Park J, Lee BC (2018) Relationships between cognitive workload and physiological response under reliable and unreliable automation. In: International conference on applied human factors and ergonomics, pp 3–8
  13. Rice S (2009) Examining single- and multiple-process theories of trust in automation. J Gen Psychol 136(3):303–319
  14. Seiler S, Haugen O, Kuffel E (2007) Autonomic recovery after exercise in trained athletes: intensity and duration effects. Med Sci Sports Exerc 39:1366–1373
  15. Sierra A, Ávila CS, Casanova JG, del Pozo GB (2011) A stress-detection system based on physiological signals and fuzzy logic. IEEE Trans Ind Electron 58(10):4857–4865
  16. Sethumadhavan A (2009) Effects of automation types on air traffic controller situation awareness and performance. In: Proceedings of the Human Factors and Ergonomics Society 53rd annual meeting

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Department of Biological & Agricultural Engineering, Texas A&M University, College Station, USA
  2. Department of Industrial and Operations Engineering, University of Michigan, Ann Arbor, USA
  3. Department of Industrial and Management Engineering, Incheon National University, Incheon, South Korea
  4. Department of Engineering, Texas A&M University – Corpus Christi, Corpus Christi, USA
