
1 Introduction

Humans benefit from emotional interchange as a source of information to adapt and react to external stimuli and navigate their reality. Computers, on the other hand, rely on classification methods to do so: they use models to distinguish affective information from other human inputs, drawing on the emotional expressions that emerge through bodily responses, language, and behavioral changes. The present study investigates methods for identifying and interpreting variations in physiological responses related to emotional states as a way of improving emotion recognition in interactive systems. The study observes emotional responses during interactions with Intelligent Virtual Agents (hereafter IVAs) in a simulated context.

In recent years, increasing investments have introduced IVAs in customer service, education, health care, entertainment, immigration settings, and social media. IVAs encapsulate the embodiment of different interactive channels and establish meaningful communication with humans by enacting emotions, empathy, and social behavior [6]. The social and emotional capabilities displayed by IVAs motivate the users to establish empathy and bonding [5]. Moreover, IVAs’ real-time perception, cognition, and emotional awareness bring novel solutions for human-machine cooperative tasks [21] within social domains.

1.1 Research Goal and Motivation

This study investigates methods for assessing psychophysiological responses to emotional experiences evoked in a simulated IVA interview. More specifically, the project reflects on the implications of IVA emotion recognition for applications in real-life scenarios, such as easing the critical situation of migration management in Europe [7, 8]. Ultimately, the study encourages an adequate use of emotional information and the thoughtful development of automated mediators to aid in contexts involving high-stakes decisions that shape people’s life chances [8, 23]. Identifying the aspects of asylum seekers’ narratives that substantiate their situation of fear is essential to granting asylum-seeker protection, as the Refugee Status Determination (RSD) procedure assesses the applicant’s “well-founded fear of being persecuted” due to “happenings of major importance” that render them unable to return to their home countries ([9] - Paragraphs 34, 36, 32–110). Therefore, to automate stages of the RSD assessment, it is critical to explore the design of emotion recognition methodologies to assist automated migration management processes.

1.2 Hypotheses and Research Question

Two hypotheses are formulated to guide the development of the research:

  • H1: People react emotionally to a simulated interview interaction with a virtual agent, even in as-if contexts where they are assigned specific roles (“Imagine that you are -”).

  • H2: Individual human subjects’ responses to the perceived affective information in simulated settings can be identified and related to a set of emotional states based on a combination of quantitative (psycho-physiological) and qualitative (subjective reports) measures of their behavior.

Correspondingly, three research questions are investigated:

  • RQ1: How do people emotionally appraise the context of the interview?

  • RQ2: Can the individual subjective self-reported emotional experience (qualitative data) be associated with features recognized from the same individual’s physiological behavior (quantitative data)?

  • RQ3: Within the context, is it possible to identify certain emotional reactions by looking at the patterns of physiological data?

2 Theoretical Background

Psychology researchers have waged a long-running debate since William James (1884) asked what emotion is. Nevertheless, despite the diverse answers presented, there is little consensus around the definitions [11]. Damasio’s studies described humans as using the body as a theater for emotions, supporting the view that the mind is embodied and that the experience of emotions is a manifestation of the drives and instincts that help the organism regulate itself and respond to changes in the environment [24]. As Fontaine and colleagues [4] point out, some elements of the emotional experience are uncontroversial:

  • Certain events tend to elicit specific affective responses in organisms.

  • Physiological changes, behavioral expressions, behavioral intentions, and shifts of direction characterize these organism responses.

  • The response patterns produce a conscious feeling of a certain quality in the person’s experience over a certain period.

  • Emotional episodes and the associated experiences are labeled by the experiencing person and by the observers with a specific word or expression.

The appraisal theory of emotion explains that affective states are elicited by people’s subjective evaluation or appraisal of the events’ significance for their well-being and goal achievement. Also, these theories acknowledge that emotions arise from comparing individual needs and external environmental demands. Brave and Nass’s investigation remarks that the knowledge of appraisal theories serves HCI researchers in modeling and predicting users’ emotional states in real-time [10, 25]. Likewise, the appraisal theory of emotions can be mapped to Artificial Intelligence concepts, like belief-desire-intention models of agency [21].

By examining their users’ psycho-physiological cues, computers can sense implicit communication by assessing affective and cognitive states. Emotion recognition is based on automated classifiers trained to identify general emotional states from patterns found in the users’ biosignals, whether explicit (e.g., measurements of facial expressions and gestures) or implicit (e.g., electrocardiography or electrodermal activity measurements). Furthermore, it expands the interface’s limited input modalities (e.g., mouse, keyboard, and camera), moving the limits of interaction beyond stated and visible emotion parameters [10]. Stemmler’s studies [12] demonstrated relationships between individual emotions and specific physiological changes, showing, for instance, that fear can be characterized by strong cardiovascular and electrodermal activity. Although there is evidence that it is possible to identify certain emotions through measures of bodily changes, the results are limited for multiple reasons; issues of validity and reliability are often pointed out in research [1, 3, 13].
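As an illustration of the classifier idea only (not the method used in this study), a minimal nearest-centroid sketch assigns the emotion label whose average training feature vector lies closest to a new biosignal feature vector; the feature values and labels below are hypothetical:

```python
import numpy as np

def nearest_centroid_predict(X_train, y_train, x):
    """Toy emotion 'classifier': return the label whose mean feature
    vector (centroid) lies closest to the new sample x."""
    y = np.asarray(y_train)
    centroids = {label: X_train[y == label].mean(axis=0) for label in set(y_train)}
    return min(centroids, key=lambda label: np.linalg.norm(x - centroids[label]))

# Hypothetical 2-D features, e.g. (heart-rate change, EDA phasic peak)
X = np.array([[0.1, 0.0], [0.2, 0.1], [0.9, 0.8], [1.0, 0.7]])
y = ["calm", "calm", "fear", "fear"]
print(nearest_centroid_predict(X, y, np.array([0.95, 0.75])))  # → fear
```

Real emotion recognition pipelines replace the centroid rule with trained statistical or machine-learning models, but the input/output contract is the same: biosignal features in, emotion label out.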

In general, emotions do not display a universal signature of physiological activity for different people, as emotions arise from a unique and personal experience for each individual, and the relationship between appraisal of a situation and emotional response is, to a great extent, context-bound [10]. Nonetheless, recent studies suggest that to foster the development of the classifiers, combined measurements of psycho-physiological activity during emotional events are suitable strategies for sorting out overlapping signal problems [11, 22]. Moreover, self-assessment results can be correlated to the physiological metrics to classify emotions measured from various parameters [14].

3 Methodology

This study adopts an experimental approach to examine the influence of questions raised by an IVA on participants’ emotions in the context of an automated initial interview in the RSD assessment. The researchers developed a within-subject study consisting of one session in which participants are presented with one hundred questions, as shown in Fig. 1 below, displayed audio-visually by a virtual agent. Precisely, in each session, participants interact with an interview simulation in which they encounter a looping video of a female avatar that articulates, via voice-over, questions regarding practical and personal matters (the stimuli). Essentially, each question represents one unique condition. The independent variable in this study is the answer provided to the emotional self-assessment, and the dependent variable is the emotional bodily experience recorded from participants.

Fig. 1. In a single session, the study exposes participants to the different conditions

The display of the female virtual agent on the screen remains constant throughout the session. Although the virtual agent has no adaptive reaction, its display in the simulation is meant to foster the enactment of an IVA presence by emulating the gaze of the computer [16].

3.1 Experiment Design

The researchers created a prototype interactive interview simulation that embeds simultaneous psycho-physiological data collection. At the experimental level, when studying participants’ emotional perception via as-if situations, the narrative supports psychological immersion into the interaction context [26]; furthermore, the fictitious scenario helps to situate the participants in the specific context of the research [17]. Moreover, inspired by the research on social appraisal [2, 15], it is assumed that even though participants might answer questions based on their own experience, the simulated interview context still allows them to identify with the asylum seeker’s emotional situation.

In this study, context is presented as a narrative prior to the experiment and later enhanced by the simulation that situates participants into the RSD procedure and invites them to take the role of an asylum seeker. To provide stimuli, the study uses references from the list of Affective Norms for English Words (ANEW) [19] for phrasing the emotionally evocative questions. The questionnaire comprises a hundred closed questions, sixty-five percent of which are assumed to be emotionally evocative. The simulation ultimately intends to benefit from the physiological responses generated over the enunciation of the sensitive questions and does not have follow-up inquiries.

Accordingly, as shown in Fig. 2, a question is first enunciated by the virtual officer and presented as text on the screen. Once that is completed, participants are taken to the self-report screen and asked to select from the wheel the label and intensity of the emotion they experienced. After completing the self-assessment, participants are presented with the answer options. Finally, after answering that question, participants are presented with a new one. Randomization of the questions was not included in the design of the simulation.

Fig. 2. Screenshots illustrating the loop of tasks through which participants labeled the emotional state experienced for each question presented by the virtual officer

As the experiment design establishes no fixed time for completing the simulation, every time a new question is presented to participants, a mark is created in the database to record the start of the question. This approach is justified by the aim of not inducing time pressure on participants during the simulation.
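A minimal sketch of such onset marking (the record fields and the injectable `clock` parameter are our assumptions for illustration, not the study’s actual database schema):

```python
import time

def mark_question_start(db, question_id, clock=time.monotonic):
    """Append an onset marker so physiological epochs can later be
    aligned with the moment each question was presented."""
    db.append({"question_id": question_id, "t": clock()})

# Usage with an in-memory list standing in for the database
marks = []
mark_question_start(marks, 1, clock=lambda: 0.0)
mark_question_start(marks, 2, clock=lambda: 7.5)
```

Storing onset times rather than fixed schedules is what makes the later fixed-length epoching possible despite the self-paced design.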

3.2 Procedure

It is worth remarking that the experiment was piloted twice to test a) the simulation setup and question structure and b) the data collection methods. After final adjustments, the experiments were carried out in laboratory settings under uniform conditions. Figure 3 illustrates the experimental procedure designed for the study.

Experiments were conducted during the COVID-19 pandemic; thus, the necessary safety measures were part of the procedure. First, participants answered the BIS/BAS questionnaire [20] to analyze the symmetry of personality in terms of the motivational systems underlying behavior and affective responses. Participants who conformed were considered eligible for the experiment. Next, information regarding the experiment, the purpose of the study, and the technical apparatus was provided. Participants were exposed to the contextual narrative upon signing the consent form. The sensors were then attached, as described in the Data Collection section below. Before engaging in the simulation, participants were invited to perform a simple breathing exercise. Notably, while playing the simulation, participants were alone and monitored from a nearby room via a camera streaming setup.

3.3 Participants

Seven (7) English-speaking individuals currently residing in Tallinn, Estonia, participated in the study. Participation was voluntary, and the age range of participants was 18–40 years. Even though the broader context of the study refers to a vulnerable population (refugees), no actual refugee applicants or members of any other vulnerable group were included in the study sample. Nevertheless, recruitment favored participants from foreign countries, who would be more likely to have a frame of reference for border-control inquiries.

Fig. 3. Stages of the experimental procedure

3.4 Data Collection

The simulation embeds a data collection procedure that synchronizes participants’ physiological data with their emotional self-reports. The goal is to enable the participant to experience and disclose emotional states during the physiological data acquisition without disruption or further interference.

Physiological Data. Participants’ physiological signals are gathered via the BITalino Plugged kit, a low-cost biosignal acquisition device [27]. The sampling frequency used for ECG is 1,000 Hz, as recommended by [28]. EDA and fEMG measurements are collected at the same sampling rate because a single BITalino device collects the three physiological signals simultaneously. The BITalino device was connected to a high-performance computer via Bluetooth. Participants’ physiological information is thus collected from three channels - ECG, EDA, and EMG - and continuously recorded using the compatible software OpenSignals.
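A sketch of such an acquisition step using the official `bitalino` Python API; the MAC address, channel-to-sensor assignment, and block size are illustrative assumptions, and the hardware call is wrapped in a function so it is not executed without a device:

```python
def samples_to_seconds(n_samples, fs=1000):
    """Duration covered by n_samples at sampling rate fs (Hz)."""
    return n_samples / fs

def acquire_block(mac_address, fs=1000, channels=(0, 1, 2), n_samples=1000):
    """Read one block of raw samples from a paired BITalino device.
    Requires the `bitalino` package and Bluetooth hardware (not run here)."""
    from bitalino import BITalino  # official BITalino Python API
    device = BITalino(mac_address)
    device.start(fs, list(channels))  # A1-A3 here assumed wired to ECG, EDA, fEMG
    try:
        data = device.read(n_samples)  # rows: [seq, digital inputs, A1, A2, A3]
    finally:
        device.stop()
        device.close()
    return data
```

In practice the study recorded via OpenSignals rather than a custom script; the sketch only shows what a 1,000 Hz three-channel acquisition loop looks like at the API level.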

Fig. 4. Electrodes placement for biosignal data collection

Sensors were placed on the clean surface of participants’ skin, following recommendations of the reviewed literature [22, 27]. Figure 4 illustrates the configuration used in the experiments. Two electrodes were used to measure EDA, placed on the thenar and hypothenar eminences of the right hand, as illustrated in Fig. 4a. For collecting ECG, three electrodes were placed on the participant’s chest, two on both sides of the clavicles and one at the lower left part of the rib cage, as illustrated in Fig. 4b. Additionally, facial muscle activity (fEMG) was acquired via three electrodes placed on the participant’s face: the reference electrode in the middle of the forehead and the other two on the corrugator supercilii and zygomaticus major, as in Fig. 4c.

Emotion Self-report. To collect the emotional self-reports of experiment participants on the go, the simulation interface uses a digital tool based on the self-report paradigm proposed by the Geneva Emotion Wheel (GEW) [2]. Self-report paradigms are used to grasp the individual nature of emotional experience, which reflects the integration of mental and bodily changes in the context of particular events. Researchers in emotional labeling point out that even though the results obtained by these paradigms are plausible and interpretable, statistical analysis is hampered by the abundance of emotion labels collected, making interpretation complex [2]. Embedding the GEW into the simulation allowed the emotional assessment to be done interactively through an adapted version of the tool and provided homogeneity of reports. The choice also benefits the processing and synchronization of physiological data collected through the sensors.
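A sketch of how a single GEW selection could be stored alongside a timestamp for synchronization with the biosignals; the field names and intensity range are assumptions for illustration, not the tool’s actual data model:

```python
from dataclasses import dataclass
import time

@dataclass
class EmotionReport:
    question_id: int
    label: str        # one of the GEW emotion families, e.g. "Fear"
    intensity: int    # assumed position on the wheel's intensity rings (1 = weak)
    timestamp: float  # shared clock with the biosignal recording

def record_report(question_id, label, intensity, clock=time.monotonic):
    """Capture one self-report, stamped on the same clock as the sensors."""
    return EmotionReport(question_id, label, intensity, clock())
```

Keeping the report on the same clock as the sensor stream is what later allows each label to be paired with its 10 s physiological epoch.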

3.5 Preprocessing Data

The multisource dataset required a composite of processing and analysis methods. Constraints faced at this stage led to the removal of the fEMG data from the analysis. First, the ECG and EDA data were downsampled from 1,000 Hz to 100 Hz to reduce computational overhead. Thereafter, filtering to remove noise from the ECG and EDA data was done automatically as part of the NeuroKit2 preprocessing pipeline [18]. Furthermore, we divided the ECG and EDA data into epochs with a duration of 10 s, starting one second before the question presentation and finishing nine seconds after it. Compiling the information into epochs returned a list of numeric identifiers for the physiological reactions associated with each question. The 10 s epoch duration was based on the calculation of the participants’ average time per question (Max: 12 min 32 s, Min: 08 min 15 s). Hence, each participant produced 100 data samples, giving us a total of 700 data samples. Two features were extracted from the ECG data: ECG Rate and ECG R Peaks. From the EDA data, we obtained the EDA Phasic and EDA Tonic features. All of these were obtained using the NeuroKit2 Python library [18].
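The downsampling and epoching steps can be sketched with plain NumPy stand-ins (NeuroKit2 performs the actual filtering and feature extraction; the naive decimation here skips the anti-aliasing filter a real pipeline applies):

```python
import numpy as np

def downsample(signal, factor=10):
    """Naive decimation, e.g. 1,000 Hz -> 100 Hz: keep every factor-th
    sample (NeuroKit2's pipeline filters before reducing the rate)."""
    return np.asarray(signal)[::factor]

def make_epochs(signal, onsets_s, fs=100, pre=1.0, post=9.0):
    """Cut fixed 10 s windows: 1 s before each question onset, 9 s after."""
    length = int((pre + post) * fs)
    epochs = [signal[int((t - pre) * fs): int((t - pre) * fs) + length]
              for t in onsets_s]
    return np.stack(epochs)

sig = np.arange(3000)                 # 30 s of a toy signal at 100 Hz
epochs = make_epochs(sig, [5.0, 15.0])
print(epochs.shape)                   # → (2, 1000)
```

With 100 questions per participant, this windowing is what yields the 100 epochs per participant (700 in total) described above.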

3.6 Analysis

Next, the dataset containing the labeled psychophysiological data of the sample (N = 7) was statistically analyzed, focusing on features of the physiological measurements acquired alongside the emotional self-reports. Independent two-sided t-tests were used to draw comparisons between the emotional reports and their specific physiological reactions. A p-value below 0.05 was taken to indicate statistical significance. Furthermore, correlation tests were performed to observe the relationship between the physiological data features and the emotion labels indicated by the participants. The statistical analysis was performed using SPSS software (ver. 28.0.1.1).
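The same kind of comparison can be reproduced outside SPSS, e.g. with SciPy; the arrays below are synthetic stand-ins drawn to mirror the reported group means and SDs, not the study data, and the group sizes are arbitrary:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Hypothetical ECG Rate values per epoch, grouped by self-reported valence
positive = rng.normal(loc=0.6452, scale=0.6760, size=300)
negative = rng.normal(loc=0.6645, scale=0.7092, size=325)

# Two-sided independent t-test (Welch's variant, used when equal
# variances cannot be assumed)
t_stat, p_value = stats.ttest_ind(positive, negative, equal_var=False)
significant = p_value < 0.05
print(f"t = {t_stat:.3f}, p = {p_value:.3f}, significant: {significant}")
```

The fractional degrees of freedom reported for the EDA Phasic test in the Results section (601.852) are characteristic of exactly this Welch correction.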

4 Results

Among the negative valence emotions, the most reported was Disappointment, representing 12.9% (n = 90) of the total answers. Fear was reported in 9.4% (n = 66) and Sadness in 9.1% (n = 64) of the answers gathered. Among the positive valence emotions, Relief was the most reported, chosen 8.0% (n = 56) of the time, followed by Interest, reported at 6.4% (n = 45), and Pride, at 4.0% (n = 28). Issues with the self-assessment led the analysis not to include the variations of intensity reported by participants. Marked differences in the data across participants hampered a separate analysis of the emotional reactions, leading the study to group emotions by negative and positive valence.

The analysis showed that negative stimuli increased cardiac activity in participants compared to positive stimuli. The mean ECG Rate for positive emotions was M = .6452 (SD = .6760), and for negative emotions it was M = .6645 (SD = .7092). However, no significant difference was found in ECG Rate between emotions of positive and negative valence (t(623) = −.330, p = .741). Thus, drawing general conclusions about participants’ emotional arousal through examination of cardiac activity was unattainable.

A similar issue was found in the features of the EDA data. Measurements of EDA Tonic were scattered and not pronounced. The small differences between the EDA Tonic values could indicate that the quick flow of questions piled up different stimuli and affected the measurements of the tonic component, which is distinguished by changing slowly over time. On the other hand, measurements of EDA Phasic suggested that negative emotions might have increased the electrodermal activity response compared to the positive reports. The mean EDA Phasic for positive emotions was M = −1.5396E−4 (SD = 1.243E−3), while negative emotions had a mean of M = −4.9404E−4 (SD = 1.859E−3). Additionally, the parametric test revealed a significant difference in the means of EDA Phasic between the groups of positive and negative emotions (t(601.852) = 2.728, p = .007).

Furthermore, Pearson’s correlation tests were performed to observe the relationship between the EDA and ECG measurements. Results indicated that ECG Rate and EDA Phasic have a significant, weak negative correlation (r = −.316, p < .001). In contrast, ECG Rate and EDA Tonic have a significant, moderate positive correlation (r = .603, p < .001). Such findings can indicate measures to consider in future studies.
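A Pearson correlation between two per-epoch feature series can likewise be computed with SciPy; the data below are synthetic, with an arbitrary coupling coefficient chosen only to produce a positive correlation:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
ecg_rate = rng.normal(size=700)  # 700 epochs, as in the study's sample size
eda_tonic = 0.6 * ecg_rate + rng.normal(scale=0.8, size=700)  # loosely coupled

r, p = stats.pearsonr(ecg_rate, eda_tonic)
print(f"r = {r:.3f}, p = {p:.3g}")
```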

5 Discussion

In an experimental investigation, we showed that our prototype enabled the simultaneous collection of participants’ quantitative and qualitative affective information during a simulated interview. This pilot study allowed us to identify what needs to be iterated for the upcoming trials of this study. Moreover, the process of the study illustrated the complexities entangling psycho-physiological studies in HCI. The emotion elicitation method and designed data collection interface had a positive outcome. However, generalized conclusions regarding associations of self-reported emotional experience and the two features of physiological measurements, EDA and ECG, were not attainable.

The analysis of the ECG and EDA measurements across different emotion reports did not allow the distinction of specific emotions experienced by the participants. Still, the results aligned with the previous studies showing that negative emotions are characterized by strong cardiovascular and electrodermal activity [12]. Besides, the results suggested that combinations of measures of ECG Rate could be used alongside EDA Tonic or EDA Phasic to detect positive and negative emotions.

The interview context was appraised negatively among participants, as Disappointment and Fear were the most reported emotions among the responses. Accordingly, the fictitious scenario was considered sufficient to situate the participants in the research context and allow them to identify with the emotional situation. Besides, the different sorts of appraisal triggered by the simulation suggested that using words from the ANEW list [19] had provided the necessary stimuli and evoked different assessments.

6 Conclusion

In general, emotion recognition is based on automated classifiers trained to identify general emotional states from patterns found in the users’ biosignals, whether explicit (e.g., from facial expressions and gestures) or implicit (e.g., from biosensor acquisition). Converged tasks are required to achieve accuracy and reliability in such systems [11, 14, 22]. HCI practitioners and designers should take advantage of physiological and affective computing methods to develop accurate emotion recognition models. This study adds to the evidence that simulated settings can be used to elicit emotional reactions, especially if the content of the simulation is framed through validated emotion elicitation approaches, such as the affective words of the ANEW list [19]. The findings of this pilot pave the way for novel methods that enable the understanding of emotional experiences in simulated interviews. We envision applications of these methods for assessing the emotional aspects of user experience with other interactive systems and in different contexts.

Limitations and Future Studies. We acknowledge that emotion is a subjective, context-dependent experience bound by different factors, such as the individual appraisal of the situation, culture, age, and gender, among other aspects. Thus, the specificity of each individual’s appraisal of and reaction to the emotional stimuli makes generalizations about the relationships between perceived affective information and physiological reaction measurements very complex. Future research should account for the differing physiology of people. Moreover, calibrated individual measurements could help to investigate bodily reaction patterns. Adjustments to the experiment settings should cover the definition of a fixed length for the stimuli, the review of the simulation content, and the possibility of a less extensive experiment setting. A larger sample size would also be required. Refinements could also address developing the necessary pipelines and testing different data processing methods, feature extraction, and analysis to achieve generalization. Trends in the data could be analyzed in future research by classification and clustering. Future studies would also benefit from using an alternative experimental design to test the influence of specific aspects of the simulation.

Ethical Implications. Recognizing the emergent risks connected to the automation of the refugee status determination assessment is necessary. Novel IVAs may serve as a tool for facilitating initial screening at border control stations. Nonetheless, the emotion monitoring underlying the adaptability of such systems should be made explicit to users before the interaction. Access to a person’s mental and emotional states may be seen as invasive and may place users in a vulnerable position, besides compromising the levels of trust in such interactions. Foremost, the assessment of implicit biosignals should serve neither surveillance purposes nor deception detection.