1 Introduction: Motivation and Related Work

Any VR experience allows subjects to choose their paths of visual attention [4, 5] and spatial behavior [11, 12, 14, 15], which leads to different emotional reactions [7, 9, 16]. VR designers who employ a system of attention cues and plan to manipulate emotional arousal in immersive storytelling may be interested in measuring the effectiveness of their actions. Tools are emerging, but so far they are limited (e.g., based only on eye and head movements [13]). Our transdisciplinary, multi-dimensional approach addresses the problem of reliably measuring audience response to cinematic VR experiences.

The number of virtual training and research environments in which participants are free to explore limited or (nearly) unlimited space is increasing [1, 8, 18, 19]. Analysis of participants' movements, physiological responses, and declarations allows inferences about the impact of the experience: approaching or avoiding objects, being relaxed or stressed, and responding to virtual agents (passive or active characters).

Cinematic VR experiences (CVR) are limited compared to computer-generated 3D environments because they do not allow users to walk freely or interact with objects and characters. CVR is typically experienced as a linear story, but the use of spherical projection and the central placement of the participant can make the experience different for each viewer, depending on the creator's intention. CVR can be more immersive than computer-generated environments due to its higher level of photorealism, which can lead to stronger spatial presence and co-presence illusions. Some CVR experiences can elicit intense arousal in viewers through primary distress signals, such as sudden loud noises or changes in lighting [3, 10]. Such increased arousal may be unrelated to the content of the narrative. However, high tonic activity (TA) can make it easier to elicit a response to important parts of the story; it can also make it harder if the creator includes too many primal stimuli and TA reaches ceiling level. Situations in which the story itself evokes arousal are much more interesting: for example, the appearance of characters, interactions between them, or music. Reliable measurement enables us to register, e.g., increased heart rate or skin conductance level as a response to objects and characters.

Successful storytelling in CVR depends on participants facing the right direction at critical moments, giving them the physical ability to follow what is visible and audible and to respond to it in a predicted way.

Art creators and filmmakers are looking for a new VR narrative paradigm and want to do so systematically. The VR technology that made it possible to shoot stereoscopic 360\(^{\circ }\) videos and create interactive experiences also allows us to analyze participants' reactions to the virtual content. The use of this information depends on close, transdisciplinary collaboration between art and science.

2 Overview of the Cinematic VR Research Method

2.1 The Research Application

The research application was created using the Unity engine, which is the core of the method and enables the collection of behavioral data and communication with external devices. However, the transdisciplinary approach to CVR research also involves the use of a variety of other methods to provide additional insights into viewer reactions to CVR and ensure the reliability of the research.

2.2 Screening

Before inviting participants to the lab, we ensure that there are no health issues or medications that could interfere with psychophysiological measurements (PF), and that participants have not consumed any psychoactive substances, alcohol, or coffee. We also verify that participants’ vision is corrected if necessary.

2.3 Measures Used Before VR Experience

Before watching the experience, participants complete questionnaires to assess psychological variables, individual differences, attitudes, and traits that may impact their perception of the experience or be affected by it. The specific methods used in these questionnaires may be tailored to the content of the experience or used consistently across all experiences, such as the mood scale and the Self-Assessment Manikin (SAM) [2, 6]. We measure temperamental traits, which are relatively stable human characteristics, to better understand participants' motor behavior and emotional reactions. We also assess participants' mood and emotional state to determine the positive or negative impact of the experience on them.

2.4 Baseline Measures

Research using PF and behavioral measures requires a baseline measurement in neutral conditions.

Calibration: 1) PF measures (e.g., EDA, ECG, EEG, if applicable) require external equipment and trained personnel; 2) eye tracker - standard procedure from the manufacturer; 3) body motion and position (head, arms, legs, torso, and waist) - participants perform a series of standard movements (head and whole-body rotations), providing data for the correction of motor artifacts in the PF signal.

Baseline Sequence: 1) 3 min in a quiet, minimalist environment, eyes open; 2) 360 neutral and affective videos as a reference for the PF response. Videos from Li et al. [7]: Abandoned City (valence 3.33/9, arousal 3.33/9, 50 s); Spangler Lawn (valence 5.9/9, arousal 3.27/9, 58 s); Seagulls (valence 6/9, arousal 1.6/9, 60 s); Walk Among Giants (valence 5.79/9, arousal 2/9, 60 s); Tahiti Surf (valence 7.1/9, arousal 4.8/9, 60 s).
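The published valence and arousal ratings of the reference clips above can be kept as structured data when selecting baseline material. A minimal sketch (values copied from the text; field names are our own, not from the original study):

```python
# Baseline 360 reference clips from Li et al. [7]; valence and arousal are
# mean ratings on 1-9 scales, duration in seconds. Field names are illustrative.
baseline_clips = [
    {"title": "Abandoned City",    "valence": 3.33, "arousal": 3.33, "duration_s": 50},
    {"title": "Spangler Lawn",     "valence": 5.90, "arousal": 3.27, "duration_s": 58},
    {"title": "Seagulls",          "valence": 6.00, "arousal": 1.60, "duration_s": 60},
    {"title": "Walk Among Giants", "valence": 5.79, "arousal": 2.00, "duration_s": 60},
    {"title": "Tahiti Surf",       "valence": 7.10, "arousal": 4.80, "duration_s": 60},
]

# Example: identify the calmest (lowest-arousal) and most arousing clip.
calmest = min(baseline_clips, key=lambda c: c["arousal"])
most_arousing = max(baseline_clips, key=lambda c: c["arousal"])
```

Encoding the set this way makes it easy to order clips from low to high arousal when building a baseline sequence.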

2.5 Cinematic VR Experience Test

After completing the baseline, the target CVR experience begins. Participants start watching from a fixed position and are free to look around (3DoF) and explore the experience (6DoF). Both body and eye movements are recorded. Digital markers make it possible to time-synchronize the PF signal recorded with external equipment.

2.6 Measures Applied After VR Experience

Participants complete a series of questionnaires. Presence and co-presence, repeated measures of mood and SAM, user experience, and declarative evaluations of the baseline and target CVR are measured. Questionnaires specific to the tested VR experiences are introduced. An in-depth interview regarding the content and impressions related to the experience is conducted.

2.7 Digital Markers Calibration

Synchronization with external devices (e.g., BIOPAC, NEUROSCAN) requires the ability to send digital triggers via an LPT or USB TTL adapter. Unity allows us to send time- or event-related and conditional markers with millisecond precision, which is important when analyzing subtle changes in the PF signal. We use 8-bit markers (1–255). A recording frequency of 1000 Hz or 2000 Hz enables precise reading of the markers. A test is performed before each series of studies: a series of markers at 1–1000 ms intervals is sent and registered on an external device with 20 kHz resolution. The marker times are compared and corrected if necessary.
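The comparison step of this calibration test can be sketched as follows. This is an illustrative analysis with hypothetical data and function names, not the authors' actual code: sent timestamps come from the stimulus PC, recorded timestamps are read back from the external device, and the mean transmission offset is used as the correction.

```python
# Illustrative marker-timing check (hypothetical helper, not the study's code).
def marker_timing_report(sent_ms, recorded_ms):
    """Compare sent vs. recorded marker times (both in ms) and return
    offset statistics plus offset-corrected recorded times."""
    if len(sent_ms) != len(recorded_ms):
        raise ValueError("marker count mismatch")
    # Per-marker transmission delay.
    deltas = [r - s for s, r in zip(sent_ms, recorded_ms)]
    mean_offset = sum(deltas) / len(deltas)
    jitter = max(deltas) - min(deltas)
    # Correct recorded times by the mean offset so both clocks align.
    corrected = [r - mean_offset for r in recorded_ms]
    return {"mean_offset_ms": mean_offset,
            "jitter_ms": jitter,
            "corrected_ms": corrected}

# Markers sent at growing intervals; here each arrives ~0.4 ms late.
sent = [0.0, 1.0, 11.0, 111.0, 1111.0]
recorded = [t + 0.4 for t in sent]
report = marker_timing_report(sent, recorded)
```

A constant offset with negligible jitter, as in this toy example, means the markers can be corrected by a single shift; large jitter would indicate an unreliable trigger path.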

3 Current Research - Method

3.1 Participants

247 people participated in the study (157 female, 88 male, 2 did not indicate gender or chose a different answer); mean age 31.65 years (SD = 9.32). Half of the participants were involved in creative activities, while the other half were recipients of art (interested in, e.g., cinema or art exhibitions - screened to rule out the distracting influence of low motivation) (Fig. 1).

Fig. 1. One of the participants of the cinematic VR experience study.

The study involved volunteers recruited through online ads. Screening was performed to exclude people who had contraindications to participate in VR experiments (e.g. problems with stereoscopic vision) or used substances that could influence PF measurements (e.g. anti-arrhythmic drugs).

3.2 Materials and Apparatus

Measures. Due to length restrictions, we limit the scope of this paper and analyze only a part of the data. The full list of measurements includes: 1. questionnaires related to the CVR productions and to individual differences that may affect their reception, 2. repeated measures of mood and well-being, 3. PF data (eye movements and fixations, pulse, EDA), 4. a post-VR questionnaire that included:

  • User experience: experimental setting and equipment scale (from 1 = definitely not to 5 = definitely yes)

    • instructions: The instruction given by the experimenter was clear and understandable

    • lab.comfort, .hot, .tooSmall, e.g., The room where the simulation took place made me feel comfortable,

    • appar.comfort The apparatus used during the study was comfortable,

    • HMD.heavy The HMD was too heavy,

    • HMD.hot It was too hot with the HMD on,

    • HMD.unhyg The HMD looked unhygienic,

    • Bothering.wires Wires connected to HMD bothered me,

    • Bothering.noVisual (...) after putting on HMD, I couldn’t see the experimenter,

    • Bothering.observed It bothered me that other people were watching me,

    • afraid.fall After putting on the HMD, I was afraid that I might bump into something or fall over,

    • afraid.damage The equipment used for the training was too delicate, I was afraid that I might damage something.

  • User state - Self-Assessment Manikin (SAM) [2, 6]

    • valence (positive - negative),

    • arousal (high - low),

    • dominance/control (low - high);

  • Short Presence Scale (SPS) created by the authors, based on the MEC-SPQ [17] (e.g., I felt as if I were taking part in events, not just watching them): Spatial Presence - Self Location (SPSL), four items; Spatial Presence - Possible Actions (SPPA), four items; Suspension of Disbelief (SoD), two items; Attention Engagement (AE), two items.

Hardware. 64-bit PC (Intel Xeon i7, 32 GB RAM, GTX 1080 Ti, Windows 10), a Neuroscan device, 6 HTC Vive Trackers 2.0, a Vive Pro Eye HMD; accessories: elastic wrist, ankle, belt, and chest straps with tracker mounts, and a replacement facial interface from VRCover with single-use hygienic covers.

Experimental Application. The experimental application was prepared in the Unity engine. It consisted of a single-scene environment, with a simple cubical room as a neutral starting point for the stimulus. Its key features were: playback of high-resolution 360 videos with ambisonic audio, sending event markers via the parallel port, and gathering user positional and eye-tracking data at high resolution.

Video playback was handled by a Unity Asset Store package designed for uninterrupted 4K playback with stereoscopic 360 video support. For ambisonic audio playback we used the Facebook 360 spatial decoder, synchronizing its playback internally with the video. This approach, combined with splitting the audio and video streams into two separate files, proved sufficient to meet the requirements above. For parallel port communication an external library was used. To achieve the best precision of event marker delivery, we developed a solution that prepares event markers in advance and moves the timing and sending of markers to the parallel port into a separate thread; with the help of the C# built-in Stopwatch class, millisecond precision of the markers was achieved.

The highest possible data-gathering resolution was limited by Unity to 90 Hz, as the core of the application is constrained by the application framerate (consistent with the HMD's framerate); this proved sufficient. The data were gathered as follows: from within the scene, positional and rotational data of each connected VR device and accessory; via the OpenVR API, positional data plus velocity and pose-matrix data of those devices; and via the SRanipal API, eye-tracking data consisting of combined, left-, and right-eye gaze data and the current eye focus point. Outgoing event markers were recorded, allowing us to synchronize with the PF data (Fig. 2).

Fig. 2. Diagram of sources of gathered data.
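The marker-delivery design described above (markers prepared in advance and dispatched from a dedicated thread, decoupled from the render loop) can be sketched in Python as an analogue of the C#/Stopwatch approach; function names and the schedule format are our own illustration, not the application's API:

```python
# Python analogue of the marker-sending design: pre-prepared markers are
# dispatched from a worker thread using a high-resolution clock
# (time.perf_counter plays the role of the C# Stopwatch class).
import threading
import time

def run_marker_thread(schedule, send_fn):
    """schedule: list of (delay_seconds, marker_code) pairs relative to start.
    send_fn: callable that writes one 8-bit marker (1-255) to the output port."""
    def worker():
        start = time.perf_counter()
        for delay, code in sorted(schedule):
            # Sleep until the scheduled time, independent of the frame loop.
            remaining = delay - (time.perf_counter() - start)
            if remaining > 0:
                time.sleep(remaining)
            send_fn(code)
    t = threading.Thread(target=worker, daemon=True)
    t.start()
    return t

# Usage: collect "sent" markers in a list instead of writing to a real port.
sent = []
t = run_marker_thread([(0.0, 10), (0.01, 20), (0.02, 30)], sent.append)
t.join()
```

Keeping the timing loop off the render thread is what allows marker delivery to be more precise than the 90 Hz frame-locked data gathering.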

3.3 Procedure

Six HTC Vive Trackers were attached at the ankles, wrists, belt, and chest areas with special anti-slip straps. The participant was seated in a chair, where electrodes and sensors were applied. Then the HMD was put on, electrode readouts were checked, and per-user eye-tracker calibration was performed. After setup, all lights in the experiment room were turned off, and the baseline and target recordings followed under supervision.

Table 1. Global scores and Personal characteristics
Table 2. Contents characteristics

4 Results

4.1 User Experience - Experimental Setting and Equipment Evaluation

The overall user experience rating was very good. The pattern of results for individual aspects was as predicted - see Table 1 “Global”: 1) The instructions provided by the experimenter and in the app were clear and understandable; 2) Lab comfort was rated very high, which is consistent with the temperature and lab-size ratings; 3) The apparatus was comfortable, although rated lower than the lab in general; the HMD had appropriate weight and temperature and was hygienic (important in times of epidemic threat); 4) Neither the equipment nor the experimental situation bothered the participants. Participants were not bothered by the lack of visual contact with the experimenter after putting on the HMD, or by the possibility of being observed during the VR experience; 5) Participants were not afraid to fall or bump into something. They also did not see the equipment as fragile and easy to damage; 6) Differences: women rated the HMD as significantly heavier than men did. Art recipients were more concerned about being watched than art creators. Surprisingly, some aspects of comfort differed between the narrative vs. non-narrative and live-action vs. animated conditions.

4.2 User State - Emotion, Arousal and Control

A key ethical issue was to ensure that participants were in a good emotional state upon completion of the study. This goal was achieved: participants felt rather happy, calm, and in control. As expected, there were no group differences on the SAM scales (Table 1). Differences appeared between types of content (Table 2). As expected, participants felt more in control in the narrative compared to the non-narrative condition. They also felt happier, more aroused, and more in control in the animated compared to the live-action condition.

4.3 Presence

The global level of presence was moderate (SPSL and SPPA, see Table 1 “Global”). SPPA was rated lower than SPSL. There are individual differences: men rated SPPA below the midpoint of the scale, whereas women rated it above the midpoint. More differences were found between art creators and recipients: art creators rated SPSL and SPPA significantly lower than recipients. They also scored lower on SoD. Ratings on all presence sub-scales were lower in the live-action compared to the animated condition (Table 2).

5 Discussion

We presented scientific tools for researching virtual narratives - 360 3D CVR. Our methods are comfortable and safe for participants. They rate the lab conditions and the equipment used highly (in terms of, e.g., comfort and hygiene). There are also no significant distractions from the equipment or procedure (cf. SAM and presence scales). After the study, participants feel happy, calm, and in control. The level of self-location is slightly above average, indicating an illusion of presence. The level of possible actions is lower, because CVR is not interactive. This may lower the overall presence level, as CVR photorealism may trigger an expectation of interaction. Another reason could be the experimental situation: participation in the study may trigger participants' urge to carefully analyze the presented materials. Attention engagement is very high and suspension of disbelief low. This may cause participants to notice all the flaws of the cinematic experience while being distracted by internal and external stimuli. This is in line with the differences observed between art creators and recipients. Creators (being more critical) have lower scores on the presence subscales except for attention engagement. They analyze the experience more, and therefore have lower suspension of disbelief and presence. Differences between animated and live-action experiences may be due to the contents of the experiences: live-action VR is very realistic, whereas animated VR looks artificial - expectations differ.

The unexpected differences between the narrative and non-narrative conditions can be explained by artifacts that may have appeared in the between-subjects design. The lab and HMD temperature ratings differed, which coincided with the execution of one of the conditions (experiences were tested right after they were produced); this could have caused systematic differences in ratings. It needs further investigation, as there is no theoretical reason for them to differ this way (especially as there were no differences in other factors, e.g., gender).

There are differences in the self-assessment of emotional state between content conditions, but not between art-related and gender groups. This further supports the explanation based on differences in the tested narratives, which match the more global classification of content groups. Further investigation is needed with more comparable material - for example, an experience with the same content but a different form: live-action vs. animated. It is worth noting that despite the differences, participants in all groups were satisfied, calm, and in control after completing the study.

6 Conclusions

In terms of user experience and predictability of results, we have created and presented an effective method for CVR research. It can provide insight into the perception of CVR experiences that art creators can use to better understand their audience and improve their means of expression. The obtained data are also attractive to various fields of science. This gives hope for closer cooperation between art and science and for the improvement of both CVR productions and the tools for studying them.