1 Introduction

In contemporary consumer society, with its wide range of technologies, products (both devices and software) must guarantee efficiency, satisfaction, comfort, and safety during use in order to meet users’ needs and desires (Cruz 2015; Defeo 2016; Solomon 2019). In this scenario, the study of user experience plays a primary role in product design because user experience is the result of human–product interaction (Soares 2021).

With the advent of the computer age and of computational systems, research into how human beings interact with products and systems grew. One of the emphases was on devices and on software/games. Classic ergonomic studies have evaluated users’ physical responses when they use computer systems (Siu et al. 2009), the cognitive demands of human–computer interaction (Parasuraman 2003; Dehais et al. 2020), and users’ decision-making when playing games (Jacobs and Baker 2002; Gillespie 2002). One data collection technique these researchers had in common was self-report.

Understanding the variables that influence the use of consumer products would not be possible without users’ participation. However, one issue arising from this has been debated and investigated: when people report their emotions after using a product or system, what they state in the evaluation does not always correspond to what they actually felt. The advantages and effectiveness of self-report have been questioned in various studies on emotions and usability, for example, Rebelo et al. (2012) and Granato et al. (2018).

These questions have led to uncertainties and new challenges for evaluating users’ emotional experiences, especially with regard to self-report data collection techniques and mistrust of their subjective nature (Papousek et al. 2011; Albraikan et al. 2018). Therefore, alternatives are needed to understand users’ real emotional states and opinions.

This article presents a case study in which 13 volunteers used two vehicle driving simulators, one for training and the other for entertainment. Self-report responses about their emotional state were compared with biofeedback responses obtained by EEG and digital thermography (psychophysiology). The results made it possible to check the correspondence between what the participants reported and what their bodies were actually registering in both situations, with emphasis on the training experiment. Evidence was found of the importance of multimodal studies of the emotions felt when using simulators, especially in the experiment using infrared thermography (IRT). This approach showed great potential for applications in experiments with games and virtual reality.

2 Theoretical contextualization

Using a product goes beyond meeting physical, usage, and operational needs. Cultural and social needs, emotions, and inspirations are also part of this context, and emotions must therefore be considered and evaluated (Netzer et al. 2018). The study of the psychological aspects and of the impact of the emotional experience that a product or a system can provoke in the user is a determining factor in design (Löbach 2001). In addition, it is important to study emotional connections with regard both to acquiring products and to using them (Norman 2004; Soares 2021; Ge et al. 2021).

Designers are believed to strive to develop products that evoke ever stronger, genuine, and lasting emotions in users, and achieving this presents great challenges. For game designers it is no different, especially those who develop simulators and, above all, virtual reality (VR). After all, the more immersive an experience is, the greater the emotional impact on the user.

In this context, immersion and presence are key elements. According to Rebelo et al. (2012), Oliveira et al. (2016), and Tori et al. (2018), immersion is the feeling of being inside an environment, not only visually but also through other senses, such as hearing and touch. Moreover, according to these authors, presence corresponds to the ability of a system to effectively detect user input. Both are related to biological and emotional phenomena.

Simulators make it possible to investigate users’ reactions to situations that resemble real life within a virtual representation of a specific situation. This type of activity, and the equipment designed for it, aims to simulate real user experiences in an immersive, empathetic, and participatory way (Salen and Zimmerman 2003; Martin and Hanington 2012; Drachen et al. 2013).

This type of interactive experimentation also occurs in virtual reality applications, which use computationally generated digital environments (Jerald 2015). Therefore, a VR application can also be considered a simulation. The purpose of the application, the construction of the virtual environment, and the quality of the application can make the experience immersive, semi-immersive, or non-immersive for users (Rebelo et al. 2012; Tori et al. 2018).

In several studies in which users’ emotions were investigated, various types of data collection techniques were identified. In addition to self-report techniques (e.g., questionnaires), equipment has also been used to assess users’ psychophysiological activity (Zhao et al. 2016; Wang et al. 2017).

Psychophysiological investigations can be defined as those that use physiological signals to study psychological phenomena (Li et al. 2015). They can be considered more reliable because physiological responses cannot be consciously controlled, which matters especially in research on emotional responses (Jenkins et al. 2009). This is because physiological components are involved in the unpredictable reaction of the organism that is prompted by emotions (Lelord and Andre 2002).

Objective, precise, and measurable technological responses are required in this investigative scenario. Neuroimaging techniques meet this demand: for example, they show how the brain behaves during cognitive activities and while performing tasks. The EEG is considered the best choice for recording information on brain activity because it is non-invasive, provides a continuous signal, is sensitive, captures signals in real time, and has excellent temporal resolution (Mehta and Parasuraman 2013; Lecoutre et al. 2015; Zhang et al. 2016). Using EEG in experiments with computational systems, especially games, has made it possible to identify changes in several frequency bands of players’ brain activity (Kivikangas et al. 2011; McMahan et al. 2015; Kerous et al. 2017). Virtual reality is also part of this context (Sun et al. 2019; Rogers et al. 2021).

Another technique with excellent potential for evaluating users’ emotional stimuli is digital infrared thermography (Jenkins et al. 2007). It uses infrared radiation to record the temperature of a body or object and its distribution (Jones 1998). Recognizing emotions by means of thermography has been practiced by a growing number of researchers over the past two decades (Or and Duffy 2007; Nakanishi and Imai-Matsumura 2008; Barros et al. 2016; Jian et al. 2017).

Although there is no standard model of thermal analysis that enables human emotions to be recognized (Fu and Frasson 2016; Filippini et al. 2020), monitoring the change in temperature of regions of interest (ROI) on the face has allowed correlations with specific emotions and two-dimensional models of emotion to be identified (Merla and Romani 2007; Robinson et al. 2012; Ioannou et al. 2014; Salazar-López et al. 2015; Diaz-Piedra et al. 2019).

As for cognitive aspects, thermography has already been used in multimodal studies that have compared self-report responses (what is said) with physiological responses (what is felt) in studies on the veracity of information given by individuals to public authorities in everyday situations, e.g., at airports (Pavlidis 2004; Warmelink et al. 2011).

Emotion recognition is an important and highly valued line of investigation in the area of human–computer interaction (HCI) (Dzedzickis et al. 2020). Owing to the complexity of studying emotions and to how quickly they dissipate, the scientific community is looking for alternatives to increase the reliability of results. Multimodal analyses can meet this demand because they use several methods for data collection and analysis (Khan and Sharif 2017; Siqueira et al. 2018).

Several studies have used EEG and thermography together to investigate emotions. Nozawa and Tacano (2009) used musical auditory stimuli to assess the level of arousal. Jenkins et al. (2009), and a detailed extension of that study (Jenkins and Brown 2014), investigated students performing the cognitive task of creating routes on maps of fictional zoos in order to evaluate stimuli. Sheba et al. (2012) evaluated the stress level of subjects after contact with therapeutic robotic pets. Fu and Frasson (2016) analyzed subjects after they watched provocative images from the IAPS (International Affective Picture System). Zeng et al. (2020) carried out a systematic literature review of studies involving emotion, EEG, and infrared thermography.

In the current scenario of constant and growing computerization, digitalization, and gamification of the most diverse procedures, systems, services, and human activities in society, investigating emotions in the interaction with digital systems has become fundamental. As an example, we cite the multimodal research by Yamagishi et al. (2011), who assessed how attractive users found mobile phones. In this scenario, conducting multimodal studies such as those mentioned above is an innovative approach.

Currently, digital games, with or without VR technology, have become an important field of research on human emotion for several reasons: the diversity of elements involved requires mixed methods of investigation (Lieberoth and Roepstorff 2015); such investigations require various competencies to be deployed collaboratively (Kampa 2020; Pearson 2020); and there is great potential for using multimodal systems (Mishra et al. 2016; Kotsia et al. 2016).

Electronic games have long met demands that go beyond the desire for entertainment. Serious games offer good examples of this because they have the potential to develop users’ perceptual, cognitive, and motor skills (Connolly et al. 2012; Jerzak and Rebelo 2014). For instance, Boyle et al. (2016), Tahmosybayat et al. (2018), and Neves (2022) draw attention to the successful use of electronic games to improve the quality of life of people with health problems.

Several studies have made use of racing games. This type of game simulates the need for the user to learn and/or improve the driving skills needed in nearly all social contexts related to human mobility. A vehicle driving simulator, whether used for entertainment or training, immerses users in situations with which they empathize because what they experience is real to them (Salen and Zimmerman 2003; Martin and Hanington 2012). Several multimodal studies have already made use of this category of games (Tognetti et al. 2010; Uriarte et al. 2015; Lee and Bae 2019), including studies on virtual reality (Hartfiel and Stark 2021).

In view of this context, the main objective of the present study was to examine whether users’ self-reported responses about their emotional state when using driving simulators show any concordance in emotional valence with their psychophysiological reactions. By putting users in one situation for fun and in another for training purposes, it was possible to check in which of them there was greater concordance between the answers reported in the questionnaires and the psychophysiological responses obtained from the users’ brain signals and facial temperatures.

3 Methodology

This paper presents the results of investigating the emotional responses of a group of 13 participants (five women and eight men, with an average age of 19 years and 3 months). Initial data collection was carried out with 30 volunteers, but it was necessary to discard the results of 17 of these subjects for the following reasons: (1) some thermograms of the ROIs did not have sufficient contrast to allow thermal change to be determined; (2) some subjects’ EEG electrical signal was below 95% accuracy; (3) data from users who completed the experiments too quickly or too slowly were deleted, since it was decided to analyze the data of those with average completion times; and (4) there were some delays in restarting from one session to the next, which is believed to have caused loss of concentration and stress in the subjects. In total, there were data failures for 17 subjects. Therefore, data from the remaining 13 subjects were used, with 100% of their results in perfect condition, in accordance with the protocols adopted.

The two case studies were as follows: (1) the CGame experiment (Commercial Game), an electronic racing game coupled to a Formula 1 cockpit-type simulator, and (2) the TSimu experiment (Training Simulator), software for training someone to drive a vehicle, with equipment that simulates driving, of the kind used in driving schools to train drivers (Tavares 2022).

The two experiences used in the case study can be considered semi-immersive because they place users in partially virtual environments. Although the digital projection system resembles the windshield and windows of a car, the audio resembles the sounds of traffic and the surroundings, and users sit in cockpits with steering wheels, gearshifts, and pedals, the systems used allow users to remain connected to the external world and their physical surroundings.

The reasons for choosing two computational systems with different characteristics (entertainment and training) were as follows: a) this type of media provides complex emotional stimuli (Järvelä et al. 2015); b) the importance of investigating emotion and affectivity in the use of computational systems and VR has been increasing (Hassenzahl and Tractinsky 2006; Kaza et al. 2016; Kumar and Kumar 2016); c) there are potential advantages to using psychophysiology techniques to investigate the emotional responses of users while they play computer games (Calvo and D’Mello 2010; Yannakakis et al. 2016); and d) such systems have been used in professional training experiments (Borsci et al. 2016; Calandra et al. 2023).

4 Research procedures

Figure 1 shows the layout of the two research environments. In Fig. 1a, the total area of the TSimu (15.40 m2) is shown, and in Fig. 1b, the total area of the CGame (11.61 m2) is shown. The environmental conditions of the two rooms were regulated equally with regard to the following elements: sound conditions, temperature, humidity and speed of the air, circulation of the air, and the flow of activities.

Fig. 1
figure 1

TSimu (a) and CGame (b) research environments

As to Fig. 1, number 1 indicates the entrances to the rooms, and the other numbers represent the equipment and accessories used in data collection, namely: 2) acclimatization chair; 3) support chairs; 4) air conditioning; 5) researcher’s notebook; 6) cockpit-type simulator; 7) camera to record the participants’ reactions; 8) camcorder camera for recording the performance of activities; 9) thermometer; 10) monitor to show the game/software; and 11) Playstation 4 console. Items 10 and 11 were not used in the TSimu experiment.

Figure 2 shows real images of the two research environments. On the right side, Fig. 2A shows a subject participating in the training experiment (TSimu), and in Fig. 2B, on the left side, a subject is shown taking part in the fun experiment (CGame).

Fig. 2
figure 2

A The training simulator used in the TSimu experiment and B the racing simulator used in the CGame experiment

The numbers visible within the images identify the equipment and accessories specified by the numbers in Fig. 1 and described there.

4.1 Data collection

Data collection followed electroencephalography and digital thermography protocols (IACT 2002; Posner et al. 2005; Ring and Ammer 2012; Sheikholeslami et al. 2007; Monk 2008; Malik and Amin 2017; Fernández-Cuevas et al. 2015), which were adapted by the author to meet the characteristics of the research. The research complied with ethical principles for studies with human beings and was approved by the Ethics and Research Committee of the Federal University of Pernambuco, Brazil.

4.2 Tools for collecting data

This research used four tools to collect the participants’ responses, divided into two categories: (1) two self-report questionnaires about emotions (GEW and PANAS) and (2) two sets of equipment for psychophysiological analysis (the Emotiv EPOC+ portable electroencephalography device and a digital infrared thermography camera).

4.2.1 Self-report questionnaires about emotions

The Geneva Emotion Wheel (GEW) was used to analyze users’ emotional valence. It is an empirically and theoretically tested instrument for measuring users’ emotional reactions after they use objects in various contexts (Sacharin et al. 2012; Unige 2019), including in games research (Lv et al. 2017; Jercic et al. 2018). The circular structure of the GEW presents 20 emotions, 10 with positive valence (quadrants 1 and 4) and 10 with negative valence (quadrants 2 and 3), and participants only need to choose the option that represents the emotion they are feeling at the time they answer the GEW questionnaire.

The Positive and Negative Affect Schedule (PANAS) was used to verify the users’ level of affect. Affect is part of an individual’s sensory experience and plays an important role in both cognition and perception. This tool was used to investigate an individual’s affective state when faced with a situation that arouses his or her emotions (Watson and Clark 1994; Duncan and Barrett 2007; Carvalho et al. 2013), including in games research (Buelow et al. 2015; Evans et al. 2018). Participants assign values (1–5) to all 20 PANAS affect items. The overall result indicates whether the affect is positive or negative.
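As an illustration of how such responses reduce to a single valence, the sketch below sums the positive and negative item intensities; the item labels are the standard English PANAS terms, and the scoring function itself is ours, not part of the instrument.

```python
# Minimal sketch of PANAS scoring: sum the intensities (1-5) of the ten
# positive and ten negative items and compare the totals. Item labels follow
# the standard English PANAS; the function is illustrative.
POSITIVE_ITEMS = ["interested", "excited", "strong", "enthusiastic", "proud",
                  "alert", "inspired", "determined", "attentive", "active"]
NEGATIVE_ITEMS = ["distressed", "upset", "guilty", "scared", "hostile",
                  "irritable", "ashamed", "nervous", "jittery", "afraid"]

def panas_valence(ratings: dict) -> str:
    """ratings maps each of the 20 items to an intensity from 1 to 5."""
    positive = sum(ratings[item] for item in POSITIVE_ITEMS)
    negative = sum(ratings[item] for item in NEGATIVE_ITEMS)
    return "positive" if positive > negative else "negative"
```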

4.2.2 Tools for evaluating the psychophysiological elements

Electroencephalography (EEG)—To capture the electrical activity of participants’ brains, we used the device Emotiv EPOC + 14 Channel Mobile Brainwear and the Emotiv Pro v.2 Academic Plus + software. This is portable equipment used in various scientific research studies (Rodríguez et al. 2013; Williams et al. 2020), including studies on electronic games (Ahn et al. 2014; Abhishek and Suma 2014) and virtual reality (Krokos and Varshney 2022).

Infrared Thermography—a FLIR E60 thermal imager (320 × 240 pixel resolution) and FLIR Tools software were used. The regions of interest (ROIs) chosen to analyze the participants’ thermal changes were the tip of the nose, the cheeks, and the chin, as these are considered important facial reference points for capturing emotions (Nhan and Chau 2010; Robinson et al. 2012; Ioannou et al. 2014; Legrand et al. 2015; Salazar-López et al. 2015; Cruz-Albarran et al. 2017; Kosonogov et al. 2017; Goulart et al. 2019; Zhang et al. 2019).

The aforementioned authors indicate that (1) when positive valence occurred, there was an increase in the temperature of the nose and a drop in the temperature of the chin, and (2) when negative valence occurred, there was a drop in nose temperature and an increase in cheek temperature. These, therefore, were the thermographic parameters adopted for the present study.
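A minimal sketch of these criteria follows; the function name, dictionary keys, and temperature values are illustrative and not taken from the study.

```python
# Illustrative sketch of the valence criteria described above, applied to mean
# ROI temperatures (degrees Celsius) captured before and after an experiment.
def thermal_valence(before: dict, after: dict) -> str:
    """Each dict holds mean temperatures for the keys 'nose', 'cheeks', 'chin'."""
    nose_delta = after["nose"] - before["nose"]
    cheek_delta = after["cheeks"] - before["cheeks"]
    chin_delta = after["chin"] - before["chin"]

    if nose_delta > 0 and chin_delta < 0:    # nose warms, chin cools
        return "positive"
    if nose_delta < 0 and cheek_delta > 0:   # nose cools, cheeks warm
        return "negative"
    return "indeterminate"

# Example with made-up values: a cooling nose and warming cheeks -> "negative".
print(thermal_valence({"nose": 34.1, "cheeks": 33.5, "chin": 33.0},
                      {"nose": 33.4, "cheeks": 34.0, "chin": 33.2}))
```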

4.3 Design of the present study

The design of this study followed the protocols established for electroencephalography and infrared thermography to avoid incompatibilities and inaccuracies both (a) with respect to the topic and (b) during data collection. Thus, each of the two experiments was performed in four steps, as set out in Table 1.

Table 1 Sequential data collection scheme in CGame and TSimu

Table 1 shows that data were collected in Steps 2 and 4 and that the psychophysiological data were collected before the self-reports. The steps are described below.

Step 1–Initial Acclimatization—This step comprises the 15 min of initial acclimatization required to regulate the temperature of the body as described in the protocol for applying thermography (IACT 2002; Ring and Ammer 2012). Then, the participant sat down in the simulator cockpit, and an Emotiv EPOC + was placed on his/her head. Next, he/she was informed that the experiment would begin.

Step 2–Before the game—The objective was to obtain information about the participant’s psychophysiological emotional state by using EEG and thermography, and about his/her self-reported emotional state through the GEW and PANAS questionnaires. These tools were applied before the two experiments began. The results from these initial data were compared with the final data obtained in Step 4, called “After the game.”

Step 3–Experiment—This step corresponds to carrying out the TSimu and CGame experiments themselves. The three sections of this step, also called rounds, were identical in narrative, and each of them corresponded to a cycle of activities with specific goals to be accomplished. The participants were informed of these goals at the beginning of Step 1, and they are described in the next paragraph. It should be made clear that no evaluation tool was applied during Step 3, so as not to interrupt the participants’ performance.

In the CGame experiment, a Formula 1 cockpit-type racing simulator was used, and the game “The Crew 2” was played (Tavares 2020a). The proposed activity required the participants to finish the three sections in the shortest possible time and to improve on the result of the previous section. They also had to avoid crashing into other cars, obstacles, and buildings. In the TSimu experiment, use was made of the Pro Simulator driving simulator, which is generally used in driver training schools in Brazil, together with the “ProS.auto” software (Tavares 2020b). The objective of the activity was for the participants to complete the three sections without committing traffic violations. They were also asked to drive the longest possible distance before the stopwatch ran out (4 min and 30 s) and to improve their performance over the three sections.

Step 4–After the game—The goal of this step was to obtain the participant’s psychophysiological and self-report emotional responses after he/she had completed the two experiments. What could be verified in the two case studies was whether the experience was positive or negative from the participant’s point of view and from measuring the reactions of his/her body.

5 Results and analysis of the data from the experiments

Data analysis was divided into categories, and the results were recorded under the respective tools. The results were prepared considering the group of subjects, i.e., there was no individual analysis. Therefore, most results will be displayed in percentage form.

5.1 Self-report category

GEW—The GEW result corresponds to the sum of emotions with positive and negative valence. Table 2 is divided into two parts. The left side shows the TSimu results, and the right side shows the CGame results.

Table 2 The results of the GEW questionnaire in the TSimu and CGame experiments

The “Q” column indicates how many times the emotion was chosen, and the “P/N” column indicates whether it is a positive or negative valence emotion. Taking the valence type of the subjects’ responses as the general result, we arrived at the following:

In TSimu, the initial stage showed 12 positive emotions and 1 negative emotion (92.3% positive emotions). In the final stage, there were 8 negative and 5 positive emotions (38.5% positive emotions). In other words, the share of positive emotions fell from 92.3% to 38.5% (a drop of 53.8 percentage points) by the end of TSimu. As for CGame, in the initial stage all volunteers chose positive emotions (100%), while the final stage showed 9 positive emotions and 4 negative emotions (69.2% positive emotions), a drop of 30.8 percentage points in the entertainment experiment.
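These proportions follow directly from the counts over the 13 participants; a quick arithmetic check:

```python
# Quick check of the GEW proportions reported above (13 participants).
n = 13
for label, positive_initial, positive_final in [("TSimu", 12, 5), ("CGame", 13, 9)]:
    p0 = 100 * positive_initial / n
    p1 = 100 * positive_final / n
    print(f"{label}: {p0:.1f}% -> {p1:.1f}% positive "
          f"(drop of {p0 - p1:.1f} percentage points)")
# TSimu: 92.3% -> 38.5% positive (drop of 53.8 percentage points)
# CGame: 100.0% -> 69.2% positive (drop of 30.8 percentage points)
```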

Note that there was a sharp fall in positive valence in both experiments. In other words, the responses to the GEW questionnaire indicate that the subjects’ emotional state showed an increase in negative emotions, and the TSimu experiment ended slightly more negative than the CGame experiment.

PANAS—The final result was obtained with the sum of the intensities of the ten positive emotions and the ten negative emotions. If positive affects have a higher intensity score than negative affects, it means that positive valence predominated.

Table 3 shows the 20 PANAS affects in the first column. The second column indicates whether the affect is of positive (P) or negative (N) valence. The “TSimu” and “CGame” columns are divided into three subcolumns. The “Initial” and “Final” subcolumns show the score for each affect according to the responses of the 13 subjects. The “Diff.” subcolumn shows the difference in points between these two subcolumns. In other words, if the final value was greater than the initial one, the difference will be positive. But if the value is lower, the difference will be negative.

Table 3 The results of the PANAS questionnaire in the TSimu and CGame experiments

For this research, the value corresponding to the difference in the increase or decrease in each affect was considered after comparing the results of the initial and final stages (Diff. column). The general result corresponds to the percentage comparison between the means for the difference in positive affects and the means for the difference in negative affects (displayed in the last two lines of the table). The highest mean difference value will indicate whether there was a general predominance of positive or negative valence.

Before the analysis, it is important to mention the total PANAS points in each stage: (1) initial TSimu (positive = 471, negative = 202) and final TSimu (positive = 408, negative = 307); (2) initial CGame (positive = 454, negative = 188) and final CGame (positive = 434, negative = 244). The last four lines of Table 3 show a percentage summary of these values.

TSimu’s positive affects lost an average of 7 points, i.e., 13.37% of their initial score. Negative affects showed an average increase of 10.5 points, i.e., 51.98% of the initial score. In CGame, there was a drop of 2 points in the average of the positive affects, representing 4.4% of the initial value, and an increase in the average of the negative affects of 5.6 points, i.e., 29.78% of the initial score.
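These percentage changes can be recomputed from the stage totals listed above; small differences with respect to the figures in the text are due to rounding.

```python
# Recompute the PANAS percentage changes from the stage totals given above,
# stored as (initial, final) for each valence; results match the text within
# rounding.
totals = {
    "TSimu": {"positive": (471, 408), "negative": (202, 307)},
    "CGame": {"positive": (454, 434), "negative": (188, 244)},
}
for experiment, affects in totals.items():
    for valence, (initial, final) in affects.items():
        change = 100 * (final - initial) / initial
        print(f"{experiment} {valence} affects: {change:+.1f}% vs initial score")
```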

These results indicate that, in general, the responses to the PANAS questionnaire showed an increase in negative affects. Notably, the scores of all 10 negative affects increased in both experiments. When comparing the self-report results from TSimu and CGame, it was again found that the training experiment presented the stronger negative valence.

5.1.1 Analysis 1—emotional self-report tools

The comparison between the final and initial data from the GEW and the PANAS was considered. Both the TSimu and the CGame experiments showed a drop in positive valence and an increase in negative valence in the GEW. In the PANAS, both experiments showed an increase in negative affect and a drop in positive affect. The GEW and PANAS thus produced similar results, and statistical tests also supported the concordance between them (Cochran’s Q test, p = 0.002; McNemar’s test, p = 0.617). However, the values in Tables 2 and 3 indicate that the negative intensity of TSimu was greater. In other words, the experience with the entertainment simulator was more positive than with the training simulator.

Several factors may have contributed to this, including: (1) CGame was a fun experience; (2) the research environment was playful; (3) the participants were familiar with the racing game category; (4) the state of flow (total involvement) caused by the game (Csikszentmihalyi 1990) may have been greater because it was a fun situation; (5) the expectation of feeling the thrill of using the Formula 1 cockpit for the first time (none of the participants had used equipment like this before); (6) the freedom to drive the car without the need to follow traffic rules; (7) the subjects’ lack of experience in driving cars may have contributed to their poor performance on the TSimu, and this may have influenced their responses; (8) the excess of traffic rules to be obeyed in TSimu may have stressed the subjects; and (9) TSimu was carried out before CGame. Therefore, the fun nature of CGame may have contributed to it being considered more positive than TSimu.

5.2 Psychophysiology category

The results of biofeedback measurements can be considered objective because they are represented by values generated through exhaustively tested and scientifically proven calculations and mathematical formulae (EEG) and also by the comparative analysis of results (images) obtained with precision equipment in pre-determined periods (thermography).

EEG—An epoch (period of time) of 60 s of the electrical activity of the brain was defined as the period to be analyzed at the beginning (initial EEG) and at the end (final EEG) of each experiment. The result of each epoch corresponds to the average of all alpha-band electrical activity within that epoch. However, in order to synchronize the moment of capture of the thermal image with the EEG epoch average, a protocol had to be created to synchronize the EEG with the thermogram. Therefore, the EEG average corresponded to the same instant as the thermogram capture, both at the initial Thermo and at the final Thermo.

To verify the emotional valence, the valence hypothesis was used (Davidson 1992; Zhang et al. 2011; Prete et al. 2014). It indicates positive emotions when there is greater activity in the left hemisphere of the brain and negative emotions when the right hemisphere is more active. To quantify the asymmetry in the activity of the cerebral hemispheres, the Frontal Alpha Asymmetry Index (FAAI) of the alpha frequency band was used (Coan and Allen 2004; Stewart et al. 2010; Suo et al. 2017). The electrodes investigated were F7, F3, F4, and F8 (Moridis et al. 2017). The index is calculated as FAAI = ln(alpha power of the right frontal electrodes, in µV²) − ln(alpha power of the left frontal electrodes, in µV²).
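A minimal sketch of this computation follows, assuming the alpha power of each hemisphere is taken as the mean of its two frontal electrodes; the variable names and values are illustrative, not the study’s data.

```python
import math

# Illustrative FAAI computation from mean alpha-band power (microvolts squared)
# per frontal electrode over a 60-s epoch; the values below are made up.
alpha_power = {"F7": 2.1, "F3": 1.8, "F4": 2.6, "F8": 2.9}

left = (alpha_power["F7"] + alpha_power["F3"]) / 2    # left frontal alpha power
right = (alpha_power["F4"] + alpha_power["F8"]) / 2   # right frontal alpha power

# FAAI = ln(right alpha power) - ln(left alpha power). Because alpha power is
# inversely related to cortical activation, FAAI > 0 suggests relatively greater
# left-hemisphere activation (positive valence under the valence hypothesis).
faai = math.log(right) - math.log(left)
print(f"FAAI = {faai:.3f}")
```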

Frontal alpha activity reflects reduced cortical activity in a given region: if brain activity in one of the hemispheres is relatively high, alpha power there is low, and vice versa (Valenzi et al. 2014). This inverse relationship between alpha-band power and brain activation is accounted for in the FAAI logarithmic function. Therefore, the final FAAI result is consistent with the valence hypothesis.

The average value of the FAAI for each stage of the two experiments was generated and organized in accordance with the model in Table 4. Table 4 shows only the initial EEG result from the CGame experiment, for demonstration purposes. The first column presents the number of each participant (Part.). The columns of the left hemisphere (electrodes F7 and F3) and the right hemisphere (electrodes F4 and F8) show the average values of the electrical activity of each participant’s brain. The last column corresponds to the FAAI, and below it is the global FAAI value for this step (average of the initial EEG = −1.377).

Table 4 Average FAAI values generated in the initial EEG step of the CGame experiment

Similar tables were generated for the final EEG of the CGame and for the initial and final EEG of the TSimu, but they are not included in this paper.

Table 5 presents a summary with the average FAAI values in the two experiments. The line “Difference” presents the total value of the change from the initial average to the final average. The “left hemisphere” and “right hemisphere” columns display the average values of their respective electrodes (F7-F3 and F4-F8), and the last column shows the overall average of the FAAI of each step.

Table 5 Overview of FAAI scores in the right and left hemispheres

In the CGame experiment, the overall average of the FAAI in the initial EEG (last column) was a high negative value (−1.377), which means an overall negative response before the experiment. The overall average of the FAAI in the final EEG was 0.002, i.e., there was a decrease in negative valence and, therefore, an increase in positive valence. In the TSimu experiment, the overall average of the FAAI in the initial EEG was 0.04 and in the final EEG it was 0.20, so there was also an increase in positive valence. Note that there was an overall increase in the activity of the left hemisphere in both experiments, as can be seen in the values of the “Difference” line.

In short, when considering the inverse relationship between cortical activity (the values in the right hemisphere were greater than those in the left hemisphere) and alpha power activity (which shows positive values in the two final averages), it follows that there was an increase in positive valence in both experiments.

Looking at the “Difference” line, it can be seen that the experiment with the CGame entertainment simulator showed the greatest changes (differences) in the values of the electrical activity of the brain. This means that this experiment was more intense than the experiment with the training simulator according to the data obtained with the EEG.

Statistical analyses were performed on the EEG results. For TSimu, the Shapiro–Wilk normality test (p = 0.000004) and the Wilcoxon test (p = 0.878) showed that the medians of the initial and final stages do not differ significantly, which explains the proximity of the numbers in the “FAAI Average” column. For the CGame, the Shapiro–Wilk normality test (p = 0.456) and the paired t-test (p = 0.049) indicate that the EEG means of the initial and final stages are not statistically equal, and the “FAAI Average” column shows the large difference in values between the initial and final stages.
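A minimal sketch of this testing sequence with SciPy is shown below; the arrays are placeholders for the paired FAAI values, not the study’s data.

```python
import numpy as np
from scipy import stats

# Placeholder paired FAAI values for 13 participants; not the study's data.
rng = np.random.default_rng(0)
initial = rng.normal(0.0, 0.5, 13)
final = initial + rng.normal(0.1, 0.3, 13)
diff = final - initial

# Normality of the paired differences decides which paired test to apply.
_, p_normal = stats.shapiro(diff)
if p_normal < 0.05:
    # Non-normal differences: Wilcoxon signed-rank test (as used for TSimu).
    _, p_value = stats.wilcoxon(initial, final)
else:
    # Approximately normal differences: paired t-test (as used for CGame).
    _, p_value = stats.ttest_rel(initial, final)
print(f"normality p = {p_normal:.3f}, paired-test p = {p_value:.3f}")
```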

Thermography—The descriptive statistics of the thermal images of the regions of interest (ROIs) and a paired-samples test were analyzed. The average initial (initial Thermo) and average final (final Thermo) temperatures of the tip of the nose, the cheeks, and the chin were obtained. The results were then compared to verify whether there had been a thermal increase or decrease in these ROIs. The results are presented in Table 6; the temperature values (min, max, and average columns) are in degrees Celsius.

Table 6 Results of the capture of the temperature of the regions of interest (ROI)

The columns identify the minimum temperature (min), maximum temperature (max), the average temperature (average), and the standard deviation (S.D.). Although the temperatures of the right and left cheeks were obtained separately, we chose to consider the average value between the two cheeks, as can be seen in the item “Average temp. of cheeks.”

As seen in the “average” column of the TSimu experiment, when comparing the results of Step 2 (before the game) and Step 4 (after the game), there was a drop in temperature only at the tip of the nose. The chin and cheeks showed an increase in temperature. In the “average” column of the CGame experiment, it can be seen that all ROIs showed a drop in temperature after the end of the fun experiment.

Table 7 shows a summary of the previous table, presenting only the average temperatures in degrees Celsius. It is important to remember that, according to the sources cited at the end of Sect. 4.2.2, a positive valence condition occurs if there is an increase in the temperature at the tip of the nose and a drop in the temperature of the chin, and a negative valence condition occurs if the temperature of the tip of the nose drops and the temperature of the cheeks increases.

Table 7 Summary of mean values of the temperature of the regions of interest (ROI)

The “Results” column shows the valences (P or N) of the three ROIs according to the criteria in the previous paragraph and according to their respective indicators (1, 2, and 3). Note that the TSimu experiment presented three thermal changes that indicate negative valence (N), and the CGame experiment presented two positive changes (P). This means that, according to the three ROIs analyzed, the training experiment was completely negative, and the CGame experiment was partially positive.

5.2.1 Analysis 2—concordance between self-report and biofeedback

The comparison of the initial and final data from the EEG and the thermography with the self-report results was considered. Table 8 presents the largest changes in the average valences of the final stages of the four data collection tools. It was decided not to analyze the average valences of the initial stages, because the thermography results require at least two measurements to be compared, i.e., the final measurement against the initial one, whereas the protocols of the other tools allow the valence to be identified in a single step, with just one measurement.

Table 8 Result of the final valences of self-reports and biofeedback

EEG VS. Self-report (based on Tables 2, 3, and 5)—The final EEG of both experiments showed an increase in positive valence, with greater intensity in the CGame. In the self-report questionnaires, the average valence of the final stage of the TSimu GEW was negative; and although the final PANAS average was positive, there was an increase in all negative affects. This means that, in general, the TSimu experiment showed disagreement between the reported results (GEW and PANAS) and the EEG. Regarding the CGame experiment, there was agreement between the final valence of the EEG (positive) and the GEW (positive), although the GEW showed a drop in positive valence. The final mean of the PANAS was positive but also showed an increase in all negative affects, which points to an overall negative affect and a disagreement with the EEG. In short, the training experiment showed no agreement between the EEG and the self-reports, whereas the fun experiment showed agreement between the EEG and the GEW.

Thermography VS. Self-report (based on Tables 2, 3, and 7)—All PANAS negative affects increased in both experiments, while the final GEW valence was negative in TSimu and positive in the CGame. When comparing with the thermography results, it was found that all TSimu ROIs presented negative valence, which characterizes complete agreement between the self-reports and thermography in the training experiment. Regarding the fun experiment, there was agreement only between the GEW and thermography, not with the PANAS; in other words, there was partial agreement with the self-reports.

Thermography VS. EEG—Table 8 makes it clear that in TSimu the final FAAI valence (positive) does not agree with the thermography (3 ROIs with negative valence). In the CGame, there is partial agreement, as two of the three ROIs indicate positive valence, as does the FAAI.

In summary, we had the following results: (I) TSimu: total agreement between the thermography and the two self-reports, and disagreement between the two biofeedback measures; (II) CGame: partial agreement between the thermography and the self-reports (agreement with the GEW only), agreement between the EEG and one self-report (the GEW), and partial agreement between the two biofeedback measures. This means that thermography showed greater concordance with the self-report results than the EEG did, considering the protocols and data collection methods adopted in this research.

5.2.2 Analysis 3—CGame versus TSimu

Which experiment presented greater concordance between what was said (self-reports) and what was felt (psychophysiology)? The results from the GEW and PANAS self-report tools (Analysis 1) showed that the TSimu experiment presented the greater concordance between the two self-report responses. The CGame (entertainment) was the experiment that presented the highest percentage of positive emotions, especially in the biofeedback measures. Among the psychophysiological tools (Analysis 2), the TSimu experiment showed the highest concordance between the self-reports and the thermography results, while the CGame showed a balance in concordance, highlighting the agreement between the biofeedback measures and the GEW.

Given the results, we can see that biofeedback can be used to check the emotional state of subjects, but research is needed in different contexts and situations so that it can reach a high level of reliability.

6 Discussion

The results confirmed how complex it is to use different tools to investigate the emotional state of individuals after they have used computational systems. Nevertheless, doing so proved to be necessary and reinforced the understanding that adopting only one tool can conceal results or generate incomplete or erroneous conclusions about individuals’ real emotional opinions. The ever greater use of virtual reality in extremely diverse situations reinforces this finding.

We believe that this investigation needs to be expanded and applied in other contexts, with other types of participants, and with other methods in order to yield new results. All multimodal research presents challenges and difficulties due to the methodology adopted, the amount of data generated, and the complexity of analyzing the results. This research reinforced the idea that the use of biofeedback in research on emotional state can make great contributions to the understanding of human emotions.

In this research, we identified several points that can be improved in future work: (1) check whether any ROI predominates, i.e., whether the change in one ROI is more representative than in another; (2) compare the subjects’ performance results after the sections to see whether there is any relationship with the self-report and biofeedback results; (3) investigate male and female participants separately and identify the emotional differences reported and obtained through biofeedback; (4) increase the number of subjects investigated; (5) check whether there is a difference between the intensity and agreement of positive biofeedback and negative biofeedback; and (6) check which aspects and situations of a training experience present emotional responses similar to those of fun games. The answers can contribute to understanding the role of gamification in training processes.

7 Conclusions

The results of this study confirm the importance of undertaking studies involving user experience and emotions to reach a better understanding of people’s real emotional state when they use products and systems. Note also that these investigations can be extended beyond entertainment or training situations as verified here.

Multidisciplinary studies such as this one help to expand how investigations in each of the areas involved (design, emotions, games, virtual reality, HCI, usability, and user experience) are tackled, including by combining emerging technologies such as EEG and infrared thermography. After all, user experience involves both what is reported and what is felt.

Thus, given that this study has shown the wide applicability of different tools and has taken advantage of the lessons learned, it is believed that the procedures of this research can be reproduced in other contexts in which the emotional state of the user, or even of the worker, is an essential element in carrying out tasks. Such tasks can be carried out both in real and in virtual environments and situations.

These contexts include research involving stressful situations in the use of products or systems; assessment of tasks in home–office environments (especially after the COVID-19 pandemic); research on measuring human satisfaction with games used as complementary therapeutic treatment to maintain or promote health (exergames); training situations with virtual reality; research on emotional responses and the usability of virtual reality, augmented reality, and extended reality equipment; and studies on user experience (UX) in various contexts, especially with digital and virtual reality products.

This research reinforced the authors’ view that using a single tool to understand a user’s emotional state can lead to incomplete or erroneous results. Using multiple techniques (qualitative, quantitative, or both) is critical both in real and virtual situations.

Although research into human emotions is still a great challenge, many answers can be found by using psychophysiology. The increasing use of virtual reality also requires new multimodal studies to examine the emotional reaction of individuals to this technology. Using digital thermography proved to be an innovative and promising tool in this investigation.