
1 Introduction and Related Works

Recent years have brought a dynamic growth of diverse approaches and solutions to novel modes of user interaction and interfaces in the field of Human-Computer Interaction (HCI). One of the technologies undergoing significant evolution as new technical solutions and interfaces become available is Virtual Reality (VR), which challenges established user interface paradigms, in particular the well-established WIMP paradigm (windows, icons, mouse, pointer).

This compels developers and academics to explore novel interfaces that facilitate effective human interaction with a three-dimensional virtual world such as VR. There are multiple indicators of immersion in VR [14] in the field of applied psychophysiology [17], which may be used to evaluate presence [5]. This aspect is key in evaluating interaction with avatars [2] or virtual agents [16], which are necessary in VR to engage people in social situations.

This study was a starting point for exploring various modes of voice interaction. It builds on previous work of our XR Lab team in this field, carried out as part of the HASE research group (Human Aspects in Science and Engineering) through the Living Lab Kobo research activities on virtual reality rapid prototyping and development. These activities include end-user engagement [10] and rapid content and software development [8], as well as alternative interfaces presented at major conferences, including CHI and the previous MIDI conference, i.e. voice interfaces [7, 11] and brain-computer interaction and interfaces (BCI) [9].

Therefore, the primary objective of this work is to propose a hardware and software solution that enables repeated experimental research on user interaction with agents equipped with various types of avatars. Another objective is to determine the differences in perception of various forms of avatars representing virtual agents in virtual reality. The main research hypothesis is that regardless of the visual depiction of the avatar, i.e. the virtual “person” giving the information, there are no differences in the user’s perception of identical content. In other words, this research endeavour is in line with the concept of ecological validity.

The concept of ecological validity refers to the extent to which experimental findings can be generalized to real life [1]. In research measuring emotional and cognitive processes, two approaches are often used: experimentally testing those processes in the laboratory, or using retrospective recall with self-reported measures. Both of these methods impair the ecological validity of the study. First, laboratory settings are often far removed from everyday life, and thus the psychological processes measured in the lab might not fully reflect the everyday life of a given individual or group of individuals. Second, some of these processes, e.g. avoidance, cannot be reliably measured through self-reported measures prone to retrospection biases. Thus, one of the main challenges in current research in psychology and related disciplines is to assess and test psychological processes in the larger context of ecological validity, taking into account not only a given process but also the context of its development and maintenance [6]. Providing tools to study such psychological processes under ecologically valid conditions is therefore a crucial research problem.

The results of this study formed the foundation for our XR framework for the development of advanced immersive environments and research tools providing ecologically valid conditions with multimodal experimental data acquisition, including self-reported data (e.g. surveys) as well as objective psychophysiological data related to eye movements, cardiac functions, or skin conductance, described in the method section below. The results of this study thus paved the way for follow-up studies and further research within the HASE group member labs, including the Emotion Cognition Lab at SWPS University and the Institute of Psychology of the Polish Academy of Sciences.

2 Methods

2.1 Study Aims

To validate the study hypothesis while also evaluating the system’s usability, the following research variants of the virtual agent interaction are compared (see Fig. 1). They are embedded in the same omnidirectional visual environment:

  1. Avatar 1. High-fidelity model (rendered on the basis of photogrammetry) with scripted 3D animation,

  2. Avatar 2. Video recording of a real person,

  3. Avatar 3. VA (Voice Assistant), which is audio emitted from a virtual assistant model.

Fig. 1. From left to right: voice assistant case, high-fidelity photogrammetry model, video recording.

2.2 Measures

As previously indicated, the study employed traditional research methodologies [3], both quantitative, including questionnaire surveys (conducted prior to, during, and after the VR session), and qualitative, in the form of semi-structured interviews (prior to and after the VR session).

These methods were validated using objective psychophysiological markers, specifically:

  1. Eye Movement (EM) as a major sign of attention, measured by eye tracking,

  2. Synchronized signals from auxiliary sources, namely:

    (a) Cardiac function (PPG, photoplethysmography: assessment of heart parameters based on blood flow analysis),

    (b) Changes in skin conductivity (EDA/GSR, electrodermal activity / galvanic skin response).

The automated measurement of the aforementioned psychophysiological indicators within the proposed approach (research framework) was utilized to generate objective measures for evaluating the reliability of reception of the presented content. The objective of such verification was to eliminate inconsistencies in declarative data that are caused by natural human factors inherent in evaluating human-computer interaction, such as: the Hawthorne effect [12, 13], which refers to the impact of the researcher’s presence and implicit expectations on the subject’s response; the desire to present oneself as more proficient than other subjects; and the possibility of obtaining insincere answers from the participants.
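As an illustration of how such objective measures can complement declarative data, the sketch below shows a minimal baseline correction of a skin-conductance (EDA) signal, so that responses recorded during the VR scenes can be compared against the participant’s resting level. The helper names are hypothetical and illustrative only, not part of the actual framework.

```python
# Illustrative sketch (hypothetical helpers): baseline-correcting an EDA
# signal against the participant's 5-minute resting recording.

def baseline_corrected(signal, baseline):
    """Subtract the participant's resting mean from each EDA sample."""
    rest = sum(baseline) / len(baseline)
    return [s - rest for s in signal]

def mean_response(signal, baseline):
    """Average deviation from baseline: a simple objective arousal index."""
    corrected = baseline_corrected(signal, baseline)
    return sum(corrected) / len(corrected)
```

An index like this can then be contrasted with the same participant’s questionnaire answers, which is the kind of cross-check the framework aims to automate.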

To test the research hypothesis, the results of the study participants’ declarative responses were compared to psychophysiological data on several dimensions relevant to assessing the immersion quality of the user’s interaction with virtual reality [5]. These dimensions include the sense of immersion and co-presence, as well as the attribution of anthropomorphic features to agents, taking into account the potential occurrence of the uncanny valley effect, which has been studied extensively both in and outside of VR [15]. The last factor is especially pertinent when evaluating the quality of potential high-fidelity content, particularly humanoid avatar models [4].

Fig. 2. Schematic of the proposed research solution.

2.3 Research Application

The research conducted for this work resulted in the creation of the dedicated solution depicted in Fig. 2, which was subsequently validated through an empirical survey with users, as detailed later. The research solution consisted of:

  1. Arduino - to mediate the Unity-Biopac communication.

  2. Unity - with the necessary prefabs, such as: GazeObjectManager, EyetrackerManager, SMI_CameraWithEyeTracking and SceneSwitcher.

The following tools were utilized to develop the software required for the study: Unity, the Arduino IDE, MS Visual Studio, and the HTC iViewHMD software.
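On the analysis side, the Arduino-mediated synchronization boils down to simple bookkeeping: each Unity event (scene start, questionnaire shown) is time-stamped when the trigger fires, and later mapped onto sample indices of the Biopac record. A minimal sketch of that mapping, with hypothetical function names:

```python
# Illustrative sketch: mapping trigger time stamps onto indices of the
# psychophysiological recording. Names and conventions are assumptions.

def event_to_sample(event_time_s, record_start_s, sampling_rate_hz):
    """Map an event timestamp (seconds) to the nearest signal sample index."""
    if event_time_s < record_start_s:
        raise ValueError("event precedes the recording")
    return round((event_time_s - record_start_s) * sampling_rate_hz)

def segment(signal, start_idx, end_idx):
    """Cut the signal fragment belonging to one VR scene."""
    return signal[start_idx:end_idx]
```

For example, an event 2.5 s into a recording sampled at 1000 Hz lands at sample 2500; cutting between two such indices yields the per-scene fragment used in the analyses.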

2.4 Research Flow

The research flow comprised the following stages:

Baseline. Data from the Biopac sensors are gathered without the headset to serve as a baseline for evaluating the psychophysiological data gathered during the experiment proper. The participant sits alone in a room for 5 min, facing a black wall.

Survey settings. The first scene after starting the application, visible only to the researcher. Here, the researcher enters the prefix of the result files for the test subject and the port number to which the Arduino is connected. Additionally, a data simulation mode for the eye tracker can be selected to facilitate testing the application.

Startup scene/calibration. The first scene visible to the subject. This is where the eye tracker is calibrated and the order of the presented scenes is implicitly selected.

Preliminary survey (training, warm-up). A scene that allows the subject to become familiar with the questionnaire interface. It also serves to establish the subject’s baseline mood.

VR 360 scenes with an agent. The main scene of the application, showing stages with different agents in VR: 3D animation, video recording and voice assistant, in a random order for different participants.

Follow-up questionnaire after each VR scene. A scene used for the survey after each stage with an assistant.
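The per-participant randomization of the agent scenes can be sketched as follows; seeding the shuffle with the participant’s result-file prefix keeps the order reproducible across application restarts. The function and scene names are illustrative assumptions, not the framework’s actual code.

```python
# Illustrative sketch: reproducible pseudo-random ordering of the three
# agent scenes, seeded by the participant's result-file prefix.
import random

SCENES = ["3D animation", "video recording", "voice assistant"]

def scene_order(participant_prefix):
    """Return a shuffled copy of SCENES, deterministic for a given prefix."""
    rng = random.Random(participant_prefix)  # seed from the prefix
    order = SCENES[:]
    rng.shuffle(order)
    return order
```

Deterministic seeding of this kind makes a session reconstructible from its result files, which matters when questionnaire answers must later be matched to the scene the participant actually saw.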

2.5 Experimental Setup

The pilot experimental study was conducted in the VR Lab of the Institute of Psychology of the Polish Academy of Sciences (IP PAN). The lab is equipped with the technology fulfilling the requirements of the study, including an SMI eye tracker paired with a virtual reality headset, a system for psychophysiological assessments (Biopac), as well as statistical analysis capabilities for research hypothesis verification (Fig. 3).

Fig. 3. Workflow of the survey application.

2.6 Participants

The pilot study involved twenty-two Living Lab Kobo participants: 18 in the experimental group, which comprised seniors over the age of 60, and 4 in the control group (under 50 years old). Overall, the sample included 13 women and 9 men. The mean age for the entire study was 64.1 years (standard deviation, SD = 15.52), with the experimental group averaging 70.8 (SD = 7.65) and the control group 37.25 (SD = 8.31). The study’s youngest participant was 23 years old, while the oldest was 90. The median age was 66 years overall, 68 in the experimental group and 41 in the control group. In total, 22 sets of measurements were taken throughout the study: 18 from the experimental group and 4 from the control group (Fig. 4).

Fig. 4. Preparing the study participant at the VRLab IP PAS.

3 Results

The results of declarative (ex-ante, control, and ex-post questionnaires) and psychophysiological (EM, PPG, and EDA) tests conducted during the analyses revealed statistically significant differences in the perception of avatars, supporting rejection of the hypothesis that no differences exist in the perception of different types of avatars representing virtual agents in virtual reality.

The results of participants’ declarative responses to survey questions, asked both before and after the study (on paper) and during the study (via a questionnaire module integrated into the research framework), were utilized to verify the research hypothesis. The questionnaire responses were examined in the context of psychophysiological data gathered using eye-tracker EMs synchronized with Biopac signals (PPG and EDA/GSR).

Data analysis was conducted on several psychological dimensions identified in the formulation of the research problem that are relevant to assessing the immersion quality of a user’s interaction with virtual reality, specifically: a sense of immersion in virtual reality, a sense of co-presence, attributing anthropomorphic characteristics to agents, Belief in Human Nature Uniqueness (BHNU), and the uncanny valley effect. BHNU had a particularly strong link with the experience of co-presence in scenes with a humanoid avatar, with a correlation coefficient of 0.57 for the video footage and 0.44 for the rendered avatar. This indicates that the rendered avatar conveyed a weaker sense of human features than the video recording.
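The correlation coefficients reported above are standard Pearson product-moment correlations between per-participant BHNU scores and co-presence ratings. A minimal sketch of the computation (the implementation is illustrative; the study's actual analysis tooling is not specified here):

```python
# Illustrative sketch: Pearson's r between two per-participant score lists,
# e.g. BHNU scores vs. co-presence ratings. Placeholder implementation.
from math import sqrt

def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

Values near 1 indicate that participants who scored high on BHNU also reported strong co-presence; the 0.57 vs. 0.44 contrast quantifies how much more the video condition exhibited this link.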

Moreover, additional extensive analyses of the anthropomorphic qualities assigned to avatars revealed further evident and statistically significant differences in the perception of avatars. Fewer participants attributed human characteristics to the VA avatar than to the video recording and the rendered avatar. The perceived sense of co-presence was most prominent for the video, decreased for the rendered avatar, and was lowest for the VA.

Additionally, statistically significant variations were discovered in evaluations of the uncanny valley dimension: the phenomenon was observed most strongly for the rendered avatar and least for the video recording.

The findings shown above, which are based on questionnaire and psychophysiological data, are consistent with the information gathered from in-depth qualitative interviews, as well as the analysis of eye tracker data.

4 Discussion

The results of the pilot study conducted at the XR Lab PJAIT, in cooperation with the Emotion Cognition Lab at SWPS University and the Virtual Reality and Psychophysiology Lab of the Institute of Psychology of the Polish Academy of Sciences, were deemed very promising by members of the HASE research group. As a result, work on the presented solution will continue, and the framework will undergo further development. Further waves of the study are planned to confirm the hypotheses by expanding the size of both the experimental group of seniors and the control group of younger individuals.

With these objectives in mind, it is worth noting that the configuration of the connection between the Arduino and the Biopac, which is critical for synchronizing psychophysiological signals and correlating them with declarative questionnaire responses, proved to be effective and sufficient. However, due to the nature of the basic Biopac module (electrically unbuffered diagnostic ports), a more secure solution is recommended for the future: either a specialized Biopac module for digital communication (STM type) or an additional installation galvanically separating the electrical signal, such as an optocoupler.

5 Conclusions

The solution presented in this paper was validated through an experimental research procedure with users, demonstrating its efficacy and utility in resolving the primary research problem, which is the evaluation of interactions in virtual reality via new interfaces in the form of virtual agents with a variety of avatars. The experiment demonstrates that the numerous psychological measures used to assess users’ immersion in virtual reality reveal statistically significant variations in the perception of agents and their avatars. At the same time, this study formed the basis for further work on the XR framework, which enables research teams to conduct XR experiments under conditions of ecological validity, while verifying their qualitative findings through numerous psychophysiological measures. Such alignment of multimodal research measures in immersive virtual reality enables the development of reproducible experiments providing more reliable, triangulated results.