1 Introduction

Virtual Reality (VR) technology has been defined as an interactive, immersive experience generated by a computer [1], in which interpreting and encoding information from the surroundings involve the integration of inputs from high screen resolutions [2] and all viewing directions. When designing a VR application, the ability to create an intense sense of reality plays a key role in enhancing the user’s experience. In fact, real immersive experiences stimulate the five sensory systems and cause an increase in the activity of the parasympathetic nervous system, thus enhancing the person’s awareness [3]. When immersed in a VR environment, the human brain has to merge the virtual visual information to build up a coherent and accurate representation of the events occurring in it. Therefore, although VR technology has emerged and has been deployed in several application scenarios over the past decade (e.g., entertainment [4], healthcare [5], education [6], occupational safety [7], and tourism [8]), the design of VR applications still entails several challenges. A recent study [9] highlights eight key barriers to the creation of VR applications: i) it is difficult to select the right VR tools, ii) it is difficult to fully exploit online learning resources (understanding the nomenclature, formulating search queries, and finding relevant and up-to-date information), iii) there is a lack of concrete design guidelines and examples, iv) it is difficult to design the physical aspect of immersive experiences, v) it is difficult to plan and simulate motion, vi) it is difficult to design story-driven immersive experiences, vii) there are too many unknowns in development, testing, and debugging, and viii) it is complex to perform user testing and evaluation. Among these challenges, we decided to address the lack of guidelines by focusing on the definition of user attention-oriented design recommendations.
In fact, as highlighted in [10], the design of a VR application requires the understanding of how users approach the virtual environment and interact with the virtual contents. These issues become even more relevant when VR applications are employed for safety-related tasks, such as the training of employees in charge of managing emergency situations (e.g., earthquakes, fires, or natural disasters). To prevent trainees from ignoring important information, the virtual application should be designed in such a way that the user’s attention is correctly driven in the virtual environment [11]. For this reason, the aim of this work is to define a set of guidelines for designing a VR application optimized in terms of visual attention.

To enhance user attention during virtual experiences, different issues need to be addressed [12,13,14]. More specifically, the main challenges related to attention guidance in VR application design can be summarized as follows:

  • when using 2D screens, all the visual content is displayed in an area that is smaller than the human field of view. On the other hand, in virtual reality, due to the possibility of exploring a \(360^\circ \) environment, relevant content may be missed by users who are looking in a different direction [12];

  • the distance between the virtual object and the user may prevent the possibility of obtaining clear content information [13];

  • the use of Head Mounted Displays (HMDs) can induce cybersickness [14]. This phenomenon can be defined as a combination of several symptoms such as disorientation, fatigue and dizziness [15]. These symptoms can discourage affected users from using simulators or reduce the efficiency of VR applications.

To address these issues, this work presents a study of the factors affecting the virtual experience and the capability of the user to identify relevant information in the virtual world. In addition, a purpose-built virtual reality application has been employed during an experimental session involving 36 participants to test the impact of the different factors on the users. To quantify the user’s readiness to detect important information, the Reaction Time (RT) has been employed as a measure of attention. More specifically, during the experimental session, visual stimuli appeared in the virtual environment, and the time needed for the users to detect their presence and react to them has been recorded. The research questions addressed in this work can be summarized as:

R1

Which are the factors affecting user attention during an immersive virtual experience?

R2

Is it possible to assess how those factors influence the virtual experience?

R3

Is it possible to define a set of guidelines for designing an optimal VR application in terms of visual attention?

The rest of the paper is organized as follows. In Section 2 related works are reviewed. In Section 3 the factors impacting the user attention in VR applications are presented, and in Section 4 the experimental session definition is described. Then, in Section 5 the results of the experimental tests are analyzed and discussed, and in Section 6 the design guidelines are provided. Finally, in Section 7, the conclusions are drawn and in Section 8 the future research directions are presented.

2 Related works

As stated by Zimmermann et al., practical or cognitive activities are under the control of the attentional system [16]. While exploring an environment, the capacity of the attentional system determines our performance, and inattention and lack of alertness can cause a loss of information. Therefore, the VR application should be designed to capture and guide the user’s attention to the right target and reduce the likelihood of the user not noticing the presence of relevant information [17]. To this aim, several papers in the literature investigated the use of visual cues. In [12], the authors propose to use swarm motion to guide the users to relevant content. Similarly, in [18] the authors investigated the impact of a set of visual cues (i.e., arrow signal, radar signal, and auto focus) to guide the user to the main scene while watching a \(360^\circ \) video using an HMD. Differently from these approaches, we propose to optimize the virtual application design to increase the user’s responsiveness to relevant information, without the use of additional visual cues. In more detail, we provide a set of guidelines for the design of a VR application based on the enhancement of user attention. As highlighted in [9], the definition of guidelines for VR application design is one of the challenges that have to be tackled when dealing with VR. Although some works have been presented in the literature for defining such guidelines, they are focused on specific use cases. For instance, in [19] a systematic mapping is used to identify design elements of existing research dedicated to the use of VR in higher education. In more detail, the authors analyze the learning contents, the VR design elements, and the learning theories that can be applied to VR-based learning. Moreover, in [20] a taxonomy of social VR application design choices has been presented. Furthermore, a set of guidelines for VR training for the chemical industry has been presented in [21].
In this work, on the contrary, we propose a generic set of guidelines aimed at enhancing the user’s attention independently of the application scenario. These guidelines can thus be employed for the design of VR applications in several fields, ranging from industrial training [22] to gaming [23] and even to virtual workspace environment design [24].

To measure the attentional performance of a user, a well-known parameter is the RT. It can be defined as the time interval from the onset of a stimulus to the instant in which the user reacts to it. This response involves both signal transmission and neural processing. Therefore, RT can be considered a relevant index for understanding the speed of brain processing [25]. As detailed in [16], an RT experiment consists of showing the users simple and easily distinguishable stimuli, to which they react with a simple motor response. In [26], the RT is one of the parameters monitored during a continuous visual attention test to evaluate the attention performance of soccer referees and assistants. In [27], the authors analyzed the relationship between the RT, physical activity, physical fitness, and selective attention in children between 10 and 12 years of age.

In the VR field, RT has been employed in several studies to assess the responsiveness of the users. In [28], the impact of an auditory stimulus on the users’ attention in a \(360^\circ \) visual search task has been analyzed. More specifically, each participant was instructed to detect a salient horizontal line segment in a \(360^\circ \) VR environment. For evaluating the users’ promptness, the time needed for identifying the position of the visual stimulus (i.e., the RT) was recorded. In addition, in [29], a comparison between real-life scenarios and VR for cognitive tasks has been performed. The users’ performances have been measured in terms of an RT-based assessment of perceptual speed. Moreover, in [30], the authors studied how the width and spacing of buttons in virtual environments affect the user-application interaction. The authors provided a set of guidelines concerning the button layout based on a clicking-task experiment in which the clicking position and time were analyzed.

In this work, the RT of the users is the monitored parameter to quantify the suitability of the VR environment design, thus providing useful insights for the guideline definition. Low RTs indicate that the user has effectively become familiar with the virtual environment, while large RTs suggest inadequate adaptation, and hence an unsuccessful VR system design. Similarly to [30], we consider how the content placement influences the user experience, but we do not focus on a specific content type, thus providing content-independent guidelines. Moreover, our study is not limited to the enhancement of the virtual world arrangement, but it investigates the different factors which contribute to the overall application effectiveness.

3 Selection of the factors impacting on the user attention

The first part of our work focused on the identification of the factors that impact user attention and responsiveness. According to [27], human responsiveness is affected by several factors which can be classified into two main categories: subject-related and stimulus-related. The former class includes physical condition, experience, motivation, gender, age, and fatigue. The second category comprises stimulus features such as its type, intensity, or duration. When it comes to VR, stimulus-related factors can be further decomposed into operational and technological [31]. Technological factors are related to the characteristics of the HMD, such as optics, display, and ergonomics. The operational factors, on the contrary, are directly related to the virtual environment design. Examples are the required head movement, the environment features, and the duration of the virtual experience.

Although [31] investigates the effects of technological and operational factors on cybersickness, in this work we extend the analysis of the impact of the mentioned factors to the overall application design.

As for subject-related factors, we evaluate the influence of the user age and gender on RT and analyze the incidence of cybersickness. Although this phenomenon is widely recognized as a key factor affecting the virtual experience, its main cause is not yet clear. According to [32], cybersickness symptoms develop due to the lack of an appropriate strategy to control and maintain postural stability within the virtual world. Another cause of cybersickness could be the lack of coherence between the perceived visual stimuli and the user’s physical movements in the virtual environment [33]. Moreover, technical limitations of the display presenting the virtual content (e.g., low refresh rate, visual flicker, and latency) contribute to cybersickness symptoms [14].

Concerning stimulus-related factors, we first investigate the operational ones. We analyze the impact of the placement of objects in the VR environment in terms of:

  • angular position, by splitting the user’s vertical Field of View (FoV) in three regions as detailed in Section 4.2;

  • distance with respect to the user, by splitting the scene in background and foreground as detailed in Section 4.2.

Finally, we study the influence of task complexity and duration on visual attention. In fact, the user may get tired both due to the length of the experiment and to the cognitive workload it requires.

Regarding the impact of the technological factors, we focused on the hardware characteristics of the HMD. We analyzed the FoV, resolution, and screen type since they characterize the HMD independently of the virtual reality application, as will be detailed in the following. In this work, we present an objective comparison between different hardware options and provide some guidelines for the selection of the HMD.

The performed analysis provides an answer to research question number one (R1) and the summary of the analyzed factors is reported in Fig. 1.

Fig. 1 Factors affecting the virtual experience analyzed in this study

4 Experimental tests

In order to verify the effectiveness of the selected factors in characterizing human attention, a subjective experiment has been designed and carried out.

The technological factors are HMD-dependent and are fixed once the HMD has been selected. The operational factors, on the contrary, vary with the application design. Therefore, we developed an interactive VR application to investigate their impact. Finally, the subject-related factors depend on the users participating in the study. Therefore, we organized an experimental session involving 36 volunteers to analyze their impact.

The experimental session consists of five experiments in which visual stimuli appear in the virtual environment. Since the aim of this work is to assess the impact of the VR application design on user attention independently from the application content, white cubes (\(0.25 \times 0.25 \times 0.25\) meters) have been selected as stimuli. Every experiment includes 30 stimuli which are presented to the participants every 10 seconds for a total test duration of about 25 minutes. To measure the user attention, the participants are requested to detect the stimulus and click on it. We recorded the RTs as the time between the appearance of the stimuli and the moment in which the participant presses the hand-held trigger button while pointing at the stimulus. The pointing direction has been assessed by tracking the controllers so that the trigger clicks were considered valid only if the user pointed toward the target.
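The RT definition used above can be illustrated with a minimal Python sketch. It is not the code of the application employed in the study: the `Click` record, the `reaction_time` function, and the timestamps are hypothetical names and values introduced here, and the controller-based pointing check is reduced to a Boolean flag.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Click:
    """Hypothetical record of one trigger press (not from the study's code)."""
    time_s: float     # timestamp of the trigger press, in seconds
    on_target: bool   # True if the controller pointed toward the stimulus


def reaction_time(onset_s: float, clicks: List[Click]) -> Optional[float]:
    """Return the RT of the first valid click after stimulus onset.

    A click is valid only if it happens at or after the onset and the
    user is pointing toward the target; otherwise it is ignored.
    Returns None when the stimulus received no valid click.
    """
    for click in sorted(clicks, key=lambda c: c.time_s):
        if click.time_s >= onset_s and click.on_target:
            return click.time_s - onset_s
    return None


# Illustrative values: a stimulus appears at t = 10.0 s; the first press
# misses the cube and is discarded, the second one is valid.
clicks = [Click(time_s=10.4, on_target=False), Click(time_s=11.2, on_target=True)]
rt = reaction_time(10.0, clicks)

# Session length implied by the protocol: 5 experiments x 30 stimuli x 10 s.
total_minutes = 5 * 30 * 10 / 60  # 25.0
```

Returning `None` for a missed stimulus, rather than a sentinel value, is a design choice of this sketch; how missed stimuli were handled in the study is not stated.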

4.1 Subject-related factor assessment

As detailed in Section 3, the assessment of the impact of the subject-related factors involved the analysis of the users’ age and gender, and the incidence of cybersickness, as described in the following.

Age and gender

Before starting the experiments the participants were asked to read and sign an informed consent form and to fill out a questionnaire with their general information, such as age, gender, and familiarity with virtual reality. Then, the Snellen test for assessing visual acuity and a color blindness test have been performed. The data provided in the questionnaire have been employed for analyzing the test results based on subject-related factors.

Cybersickness

At the end of the experimental session, each participant was requested to fill in the Simulator Sickness Questionnaire (SSQ), which is widely used for the assessment of the severity of perceived cybersickness symptoms [34, 35]. The questionnaire contains 16 items: general discomfort, fatigue, headache, eye strain, difficulty focusing, increased salivation, sweating, nausea, difficulty concentrating, “fullness of the head”, blurred vision, dizzy (eyes open), dizzy (eyes closed), vertigo, stomach awareness, and burping. For each item, the participant can rate the severity of the experienced discomfort using a 4-point scale (1 for none, 4 for severe).

4.2 Operational factor assessment

As mentioned in Section 3, the assessment of the impact of the operational factors involved the analysis of the angular position and distance of the stimuli, and the task duration and complexity, as detailed in the following.

Fig. 2 Human visual field. Redrawn and adapted from [36, 37]

Stimuli angular position

To analyze the impact of the angular position of the visual stimulus on the RT, we exploited the characteristics of the human visual system to select the region of appearance of the stimuli in the VR environment. A representation of the human visual field is shown in Fig. 2. The central vision corresponds to the region depicted in orange, the near periphery to the one shown in green, the middle periphery covers the area represented in light blue, and the far periphery corresponds to the portion depicted in dark blue. In addition to the division into central vision and near, middle, and far periphery, another distinction can be made. More specifically, as highlighted in [38], there is an upper limit for eye rotation at \(+25^\circ \). Furthermore, as reported in [38], a head movement of \(\pm 30^\circ \) can be considered comfortable, whereas wider rotations can cause discomfort. By adding the upper limit for eye rotation to the maximum comfortable head rotation, we defined an upper limit of \(55^\circ \) for the angular interval. In addition, in order to define a symmetrical interval, we also set the lower limit to \(-55^\circ \). To evaluate the impact of the angular position of the stimulus, we decided to split the vertical visual field into three regions encompassing the central FoV, the near periphery, and the middle periphery. It is useful to highlight that the transition between the near and middle periphery also includes the color discrimination boundary [38]. Therefore, instead of considering an abrupt separation in the middle of the interval between \(0^\circ \) and \(\pm 55^\circ \), we introduced a transition region of \(10^\circ \) both for positive and negative angles.
In more detail, we set the extension of the first FoV portion to \(45^\circ \), going from \(0^\circ \) to \(\pm 22.5^\circ \), the extension of the second FoV portion to \(20^\circ \), going from \(\pm 22.5^\circ \) to \(\pm 32.5^\circ \), and the extension of the third portion to \(45^\circ \), going from \(\pm 32.5^\circ \) to \(\pm 55^\circ \). Therefore, as shown in Fig. 2, the first portion comprises the central FoV and the inner portion of the near periphery, the second represents the transition between the near and middle periphery, and the third represents the outer portion of the middle periphery. A visual representation of the angular portions is provided in Fig. 3.
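The partition of the vertical visual field can be summarized by the following sketch, which maps a vertical angle to one of the three portions. The function name, the portion labels, and the assignment of each boundary angle to the inner portion are assumptions made here for illustration; only the angular limits come from the text.

```python
# Angular limits taken from the text (degrees, symmetric around 0).
FIRST_LIMIT_DEG = 22.5       # end of the first portion
TRANSITION_LIMIT_DEG = 32.5  # end of the transition portion
OUTER_LIMIT_DEG = 25.0 + 30.0  # eye rotation limit + comfortable head rotation


def angular_portion(angle_deg: float) -> str:
    """Classify a vertical angle into the portion it falls in.

    Labels are hypothetical; boundary angles are assigned to the inner
    portion, which is an assumption of this sketch.
    """
    magnitude = abs(angle_deg)
    if magnitude > OUTER_LIMIT_DEG:
        return "outside"
    if magnitude <= FIRST_LIMIT_DEG:
        return "first"
    if magnitude <= TRANSITION_LIMIT_DEG:
        return "transition"
    return "third"


# Total extension of each portion, counting positive and negative angles.
extensions = {
    "first": 2 * FIRST_LIMIT_DEG,                                # 45.0
    "transition": 2 * (TRANSITION_LIMIT_DEG - FIRST_LIMIT_DEG),  # 20.0
    "third": 2 * (OUTER_LIMIT_DEG - TRANSITION_LIMIT_DEG),       # 45.0
}
```

The three extensions add up to \(110^\circ \), i.e., the full interval between \(-55^\circ \) and \(+55^\circ \).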

Stimuli distance

Regarding the distance between the user and visual stimuli, two different conditions have been considered: objects in the foreground and objects in the background. According to [13], the space surrounding a person can be divided into three egocentric regions: the personal space, the action space, and the vista space. This taxonomy is also employed in [39], where the perceptual space is divided into the near, medium, and far fields. The first region extends to about 1.5 meters and represents the area in which human depth perception is reliable. The medium field (or action space) goes from 1.5 meters to about 30 meters. It represents the region in which depth perception starts becoming compressed and objects appear closer than they really are. Finally, the last region extends from 30 meters to infinity and represents the area in which the compression effect becomes significant and increases with distance. Based on this, in this work we considered visual stimuli appearing in the near and medium regions.
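As a small illustration of this taxonomy, the sketch below assigns a user-to-object distance to one of the three fields. The function name and the handling of the boundary values are assumptions; the thresholds of about 1.5 and 30 meters are approximate in the cited literature, so the sharp cut-offs used here are a simplification.

```python
NEAR_LIMIT_M = 1.5     # approximate end of the near field (personal space)
MEDIUM_LIMIT_M = 30.0  # approximate end of the medium field (action space)


def depth_field(distance_m: float) -> str:
    """Return the perceptual field for a given egocentric distance.

    Boundary distances are assigned to the closer field, which is an
    assumption of this sketch.
    """
    if distance_m < 0:
        raise ValueError("distance must be non-negative")
    if distance_m <= NEAR_LIMIT_M:
        return "near"
    if distance_m <= MEDIUM_LIMIT_M:
        return "medium"
    return "far"


def used_in_study(distance_m: float) -> bool:
    """Stimuli in this work appear only in the near and medium fields."""
    return depth_field(distance_m) in ("near", "medium")
```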

Task duration and complexity

In order to evaluate the effect of the task duration and complexity, five experiments have been defined. A flowchart of the VR content in terms of task complexity is shown in Fig. 4, and a detailed description is provided in the following.

During the first and second experiments the user was placed in an empty room where stimuli, consisting of white cubes, appeared randomly at different angles in the foreground or in the background. An example is shown in Fig. 5.

Fig. 3 Angular location of visual stimuli

Fig. 4 Experimental session description

Fig. 5 Example of the experimental setup

Such experiments have been performed to study the impact of the visual stimuli location on the RT (in terms of angle and distance) independently from any other factor. In addition, these experiments allow the users to get familiar with the environment and with the task to be performed, thus undergoing a learning effect. In the third experiment, visual distractions have been introduced, thus increasing the overall workload. At the beginning of the experiment, the user is placed in an empty room where stimuli appear as in the first two experiments. After 5 stimuli, smoke appears with increasing density. At the \(15^{th}\) stimulus a fire explodes, and at the \(25^{th}\) stimulus an earthquake sensation starts. In more detail, lag and high-speed motion of objects have been introduced. By manipulating the positions and velocities of the virtual objects in real time, the shaking and instability associated with an earthquake have been simulated. These distractions appeared all around the user, thus impairing a clear view of the environment and making the task of detecting the stimulus presence more complex. In the fourth experiment, the same design as the first and second experiments has been used. Since this experiment is performed after a demanding task (i.e., the third experiment), the comparison with the RT recorded during the first two experiments allows us to analyze the influence of fatigue on user attention. In more detail, the comparison between the first and the fourth experiments contrasts a situation in which the participants are experiencing neither the learning effect nor the fatigue effect with a condition in which they are experiencing both. The comparison between the second and the fourth experiments contrasts a situation in which the participant has learned how to perform the required task but has a low fatigue impact with a condition in which the participant is affected by both learning and fatigue effects.
Finally, the fifth experiment shares the design with the first two experiments but introduces auditory distractions. The two types of distractions have been chosen to investigate the impact of an element that engages the same human sense used for performing the required task (i.e., sight), and of an element that interacts with a sense which is not directly involved in the required task (i.e., hearing). A summary of the main inter-experiment comparisons and of the corresponding investigated characteristics is reported in Table 1.

Table 1 Main comparisons performed between experiments and analyzed characteristics

4.3 Training

Before the beginning of the experimental session, all participants undertook a training session of five minutes to get familiar with the headset, the controllers, and the interaction with the VR environment. At the end of the training session, a screen appeared to help the user assume the correct position and orientation before the beginning of the experimental session.

Fig. 6 Participant during the visual test (A) and the VR experiment (B)

4.4 Participants

A group of 36 participants took part in the experiments. The ages ranged from 21 to 65 years, with an average of 40 and a standard deviation of 15. The subjects have been drawn from a pool of employees of Leonardo S.p.A. company and students of Roma Tre University. An example of the test environment is provided in Fig. 6. All the tests have been approved by the ethical review board of both the university and the company. The participants reported no brain injury, seizures, or other neurological issues, and no severe motor or auditory disorders. However, 12 participants with visual impairments wore corrective glasses or lenses. Since the data were collected during the coronavirus (COVID-19) pandemic, the recommendations of the Italian Ministry of Health have been followed. Participants were screened for their body temperature, and at most one subjective experiment was performed per day. All the equipment was disinfected before and after use. Moreover, disposable masks and gloves were used to cover all surfaces to protect them from contact with skin and hair.

5 Experimental results

In this section, the results concerning the RT are provided. Before processing the recorded data, a denoising procedure has been applied since different types of noise from various sources (such as simultaneous clicks on the same target or hardware delays) may alter the RT signal. More specifically, following the approach employed in [40] for the heart rate signal, the denoising has been performed by thresholding the Discrete Wavelet Transform (DWT) coefficients of the RTs. The tests have been conducted with a total of \(36 \, (participants) \times 30 \, (stimuli) = 1080\) RTs. For the application development we used the Unity Game Development Engine.
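To make the idea of wavelet-threshold denoising concrete, a minimal pure-Python sketch is given below. The text only states that the DWT coefficients of the RTs are thresholded; the choice of the Haar wavelet, of a single decomposition level, of soft thresholding, and of a user-supplied threshold are all assumptions of this sketch and do not necessarily match the procedure of [40] or the one applied in this work.

```python
import math
from typing import List, Tuple

_SQRT2 = math.sqrt(2.0)


def haar_dwt(signal: List[float]) -> Tuple[List[float], List[float]]:
    """Single-level orthonormal Haar DWT of an even-length signal."""
    approx = [(signal[i] + signal[i + 1]) / _SQRT2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / _SQRT2 for i in range(0, len(signal), 2)]
    return approx, detail


def haar_idwt(approx: List[float], detail: List[float]) -> List[float]:
    """Inverse of haar_dwt: rebuild the signal from both coefficient sets."""
    signal = []
    for a, d in zip(approx, detail):
        signal.append((a + d) / _SQRT2)
        signal.append((a - d) / _SQRT2)
    return signal


def soft_threshold(coefficient: float, threshold: float) -> float:
    """Shrink a coefficient toward zero by `threshold` (soft thresholding)."""
    return math.copysign(max(abs(coefficient) - threshold, 0.0), coefficient)


def denoise_rts(rts: List[float], threshold: float) -> List[float]:
    """Denoise an RT sequence by thresholding its Haar detail coefficients."""
    if len(rts) % 2 != 0:
        raise ValueError("this sketch requires an even number of samples")
    approx, detail = haar_dwt(rts)
    detail = [soft_threshold(d, threshold) for d in detail]
    return haar_idwt(approx, detail)
```

With a threshold of zero the sequence is reconstructed unchanged, while a very large threshold replaces each pair of consecutive RTs with its mean. Each experiment contains 30 stimuli, so the even-length requirement is met by a per-experiment RT sequence.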

5.1 Technological factors impact

When designing a virtual reality application, the choice of the HMD is very important. This component, in fact, represents the interface that allows the user to interact with the virtual environment. Available technologies can be differentiated according to features such as the type of integrated sensors, the FoV, the screen type, the resolution, the presence of wires connecting the HMD to the computer, and the type of user tracking. The importance of the type of integrated sensors, tracking, and portability highly depends on the application. In contrast, the FoV, the screen type, and the resolution influence the virtual experience independently of the specific application needs. Concerning the FoV, the human eyes allow an angular vision of approximately \(180^\circ \). As reported in [41], a horizontal FoV of around \(110^\circ \) has enough slices for fast head movement without the view limitation being visible. The main advantage of a wide FoV is the increase of the sense of immersion, which affects both the emotional and physiological sensations of the user. As for the display type, AMOLED displays are often preferred over LCD displays. In fact, LCD panels require a backlight with a colored mask, which might cause eyesight issues. OLED and AMOLED displays, on the contrary, use organic LEDs, thus allowing for a better color gamut and true blacks. Moreover, LCD displays often show a longer persistence, which can result in blurry images during rotations. This issue can be avoided by using OLED and AMOLED displays [42]. Concerning the resolution, a screen with a high resolution will guarantee that the image is as sharp as possible. In addition, devices with the correct resolution and screen type can alleviate vision impairment issues that are typical of elderly people [43].

A review of available headsets is provided in Table 2. The complete analysis of technological factors shall include the use of different types of VR hardware and will be the subject of future contributions. In this work, the HTC Vive headset has been selected since it meets the requirements concerning FoV and screen type, and has a high resolution. Moreover, it was one of the earliest premium VR headsets available to consumers. Based on the hardware selection, we used the HTC Vive packages for Unity while developing the application.

Table 2 Technological factors. The visible FoV is indicated as vertical / horizontal / diagonal, and the PPD are indicated as vertical / horizontal

5.2 Operational factors impact

In this section, how the operational factors affect the users’ attention is studied. More specifically, the impact of task complexity and duration is described, and the effects of the visual stimuli position are analyzed.

5.2.1 Effects of the task complexity and duration

First, the RTs for each visual stimulus in all experiments have been analyzed. Figure 7 represents the trend of the RT value averaged over all participants for each visual stimulus. By computing the average, it is possible to show the impact of the task complexity independently from the influence of human factors. From Fig. 7 it is possible to notice that the first, the second, and the fourth experiments lead to a similar average RT value. On the contrary, due to the introduction of distractions, in the third and fifth experiments the average RT value increases.

Fig. 7 Average RT trend for all experiments

Further information can be extracted from the boxplots shown in Fig. 8. More specifically, although the first and the second experiments are the same, the RT variation decreases in the second experiment since the users get familiar with the task. When the distractions are added in the third experiment, the RT increases significantly with respect to the first two cases. As for the fourth experiment, although the median RT is very similar to the first two, the RT variation changes. In more detail, the variation is smaller than in the absence of the learning effect (Experiment #1), but it increases due to fatigue with respect to the second experiment (presence of learning effect and absence of fatigue). Finally, in the fifth experiment, both the median and the variation are higher due to the presence of auditory distractions.

Fig. 8 Boxplot of the RTs for each experiment. The box extends from the first to the third quartile, the line inside the box indicates the median value, and the whiskers indicate the maximum and minimum values except for points that are determined to be outliers

These considerations are confirmed by the RT distribution shown in Fig. 9. In more detail, the Kernel Density Estimation (KDE) shows that the RT distributions for the first, the second, and the fourth experiments are centered at almost the same RT value, while the distributions of the first and fourth experiments are wider. Moreover, the distributions for the third and fifth experiments are centered at higher RT values, and are wider with respect to the other three. In addition, auditory distractions appear to have a smaller impact than visual ones. A possible reason is that users were asked to perform a visual task, so that visual distractions interfere with it more directly than auditory ones. To further confirm this behavior, we report in Table 3 the means and standard deviations of the RTs.

Fig. 9 Reaction time distribution for all the experiments

Table 3 Means and standard deviations of the RTs for all the experiments

To gain further insights into the RT trends, linear regression has been performed on the data recorded during each experiment. The results are shown in Fig. 10. The regression line decreases in the first experiment while the user becomes familiar with the task. In addition, the trend is almost constant in the second and fourth experiments, and it increases significantly in the third and fifth experiments.

Fig. 10 RT linear regression for all the experiments. The blue shaded band represents the \(95\%\) confidence interval

To provide a quantitative analysis of the linear regression result, we computed the regression line slope and intercept with the corresponding \(95\%\) confidence intervals. Moreover, we computed the \(R^2\) parameter to provide an indication of the percentage of the variability in the data explained by the selected independent variable. Furthermore, we report the p-values for the t-test and the F-test, considering as null hypothesis the condition in which there is no relationship between the dependent (RT) and independent (target) variables. The computed values are provided in Table 4. As can be noticed, except for experiment #4, the p-values are smaller than 0.05, so that the null hypothesis is rejected.
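For reference, the slope, intercept, and \(R^2\) of a simple linear regression can be obtained with ordinary least squares as in the sketch below. This is a generic textbook formulation, not the analysis code of this work: the function name and the toy data are assumptions, and the confidence intervals and p-values reported in Table 4 would additionally require the Student's t and F distributions, which are left out here.

```python
from typing import Sequence, Tuple


def linear_fit(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Ordinary least squares fit y = intercept + slope * x.

    Returns (slope, intercept, r_squared). Here x would be the stimulus
    index and y the corresponding RT.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if s_xx == 0:
        raise ValueError("x must not be constant")
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    if ss_tot == 0:
        raise ValueError("y must not be constant")
    r_squared = 1.0 - ss_res / ss_tot
    return slope, intercept, r_squared


# Toy data (not from the study): four stimuli with illustrative RTs.
slope, intercept, r_squared = linear_fit([1, 2, 3, 4], [1.0, 3.0, 2.0, 4.0])
```

In a regression with a single independent variable, the t-test on the slope and the overall F-test are equivalent (the F statistic is the square of the t statistic), so the two tests lead to the same p-value.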

Table 4 Regression line parameters and specifications, slope and intercept are reported with the corresponding \(95\%\) confidence intervals
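
The quantities listed in Table 4 can be obtained with an ordinary least-squares fit. The sketch below is a minimal, dependency-free illustration; the function name `fit_line` and the sample values are hypothetical and do not reproduce the study's data or tooling.

```python
import math

def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Residual and total sums of squares give the R^2 parameter.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot

    # Standard error of the slope and the t statistic for H0: slope = 0.
    se_slope = math.sqrt(ss_res / (n - 2) / sxx)
    t_stat = slope / se_slope if se_slope > 0 else math.inf
    return {"slope": slope, "intercept": intercept, "r2": r2,
            "se_slope": se_slope, "t_stat": t_stat}

# Hypothetical target indices and reaction times (NOT the study's data).
targets = [1, 2, 3, 4, 5]
rts = [2.0, 4.0, 5.0, 4.0, 5.0]
fit = fit_line(targets, rts)
```

The p-value of the t-test follows from comparing `t_stat` with a Student t distribution with n - 2 degrees of freedom, and the \(95\%\) confidence interval of the slope is the slope plus or minus the corresponding t quantile times `se_slope`. The quantile is not available in the Python standard library, so it is left out of this sketch. With a single independent variable, the F statistic equals the square of the t statistic.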

Furthermore, we performed the Kruskal-Wallis test to verify whether the differences between the RTs recorded for the five experiments are statistically significant. The test confirmed a statistically significant difference between the experiments, with a p-value smaller than 0.01. Finally, we evaluated the effect size for the performed statistical test to assess the practical relevance of the presented experimental results. For the Kruskal-Wallis test, we computed the effect size as [44]:

$$\begin{aligned} \eta ^2_H = \frac{H-k+1}{n-k}, \end{aligned}$$
(1)

where H is the Kruskal-Wallis output statistic, n is the total number of observations and k is the number of groups. The interpretation values commonly employed are: \(\eta ^2_H < 0.06\) (small effect), \(\eta ^2_H\) between 0.06 and 0.14 (moderate effect), and \(\eta ^2_H \ge 0.14\) (large effect). We obtained \(\eta ^2_H=0.71\), thus indicating a large effect. The statistical results of the Kruskal-Wallis test are reported in Table 5.

Table 5 Effect of task complexity and duration: Kruskal-wallis test results
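
To make Eq. (1) concrete, the following sketch computes the Kruskal-Wallis statistic H from grouped observations and then the effect size \(\eta ^2_H\). It is a simplified illustration: the function names are hypothetical, the H statistic is computed without the tie correction that statistical packages usually apply, and the sample groups are invented rather than taken from the study.

```python
def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks

def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (no tie correction, for simplicity)."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12.0 / (n * (n + 1)) * total - 3.0 * (n + 1)

def eta_squared_h(h, k, n):
    """Effect size of Eq. (1): (H - k + 1) / (n - k)."""
    return (h - k + 1) / (n - k)

def interpret(eta2):
    """Thresholds as stated in the text: 0.06 and 0.14."""
    if eta2 < 0.06:
        return "small"
    if eta2 < 0.14:
        return "moderate"
    return "large"

# Hypothetical groups of observations (NOT the study's data).
groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
h = kruskal_wallis_h(groups)
eta2 = eta_squared_h(h, k=len(groups), n=sum(len(g) for g in groups))
```

For these invented groups the rank sums are 6, 15, and 24, so H = 7.2 and \(\eta ^2_H = 5.2/6 \approx 0.87\), which the thresholds above classify as a large effect.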

To further investigate this difference, we performed a Tukey HSD post-hoc test. The result is presented in Fig. 11. Note that the x axis reports rank means, since the Kruskal-Wallis test operates on the ranks of the data.

Fig. 11 Output of the Tukey HSD post-hoc test for the RTs recorded for the different experiments. The filled dots represent the mean RT rank for each experiment, and the line represents the comparison interval corresponding to the 0.05 significance level. Different colors are used to indicate the RT groups which are significantly different

In the figure, the filled dots represent the mean RT rank for each experiment and the line represents the comparison interval corresponding to the 0.05 significance level. Two RT rank means can be considered significantly different if their intervals are disjoint, while they are not significantly different if their intervals overlap. Different colors mark the RT groups that are significantly different according to the Tukey HSD post-hoc test. As can be clearly noticed, the experiments including distractions, i.e., the third and fifth experiments, are significantly different from the ones without distractions, i.e., the first, second, and fourth experiments. In addition, the comparison between the RTs recorded during the experiments with the two types of distractions did not allow the null hypothesis to be rejected. The same occurred for the RTs recorded during the experiments without distractions. We report the p-values for the different one-paired comparison tests in Table 6 under the null hypothesis that the corresponding mean difference is null.

Table 6 Results of the one-paired comparison tests for the experiments. The asterisks indicate statistical significance
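
The reading rule for the comparison intervals (disjoint intervals indicate a significant difference, overlapping intervals do not) can be sketched as follows. The interval values and names are hypothetical and chosen only to mimic the qualitative pattern described above; they are not the values plotted in Fig. 11.

```python
def intervals_disjoint(a, b):
    """True when the two (low, high) intervals do not overlap."""
    (a_lo, a_hi), (b_lo, b_hi) = a, b
    return a_hi < b_lo or b_hi < a_lo

def significantly_different_pairs(intervals):
    """List all pairs of groups whose comparison intervals are disjoint."""
    names = sorted(intervals)
    return [(p, q)
            for i, p in enumerate(names)
            for q in names[i + 1:]
            if intervals_disjoint(intervals[p], intervals[q])]

# Hypothetical comparison intervals on the mean-rank axis (NOT from Fig. 11).
intervals = {
    "exp1": (100.0, 140.0),
    "exp2": (110.0, 150.0),
    "exp3": (200.0, 240.0),  # visual distractions
    "exp4": (120.0, 160.0),
    "exp5": (190.0, 230.0),  # auditory distractions
}
pairs = significantly_different_pairs(intervals)
```

With these invented intervals, every pair that combines one experiment with distractions and one without is flagged, while no pair within the same group is.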

5.2.2 Effects of the placement of the visual stimulus

Figure 12 shows the average RT for the considered visual field angular portions and the two distance ranges. In the figure, the angular portion ranging from 0° to ±22.5°, which includes the central FoV and the inner portion of the near periphery, is referred to as “Central/Near”, the angular portion ranging from ±22.5° to ±32.5°, which represents the transition between the near and middle periphery, is referred to as “Near/Middle”, and the portion ranging from ±32.5° to ±55°, which represents the outer portion of the middle periphery, is referred to as “Middle”. Going from the foreground scenario to the background one, the RT behavior is opposite for targets placed in the central and near periphery of the human visual field, and for the ones placed in the outer portion of the middle periphery. As for targets placed at the transition between near and middle periphery, the RT is not influenced by the distance between the user and the target. These results can be explained by considering the actions a user has to perform before the RT is recorded. In fact, the measured time interval goes from the cube appearance to the click performed by the user on the controller while pointing at the stimulus. Let us consider background objects first, and define a user rest position as the one in which the user stands with the arms along the body. Background objects appear at a depth larger than 1.5 meters from the headset so that for pointing at the stimulus the user’s arm is always outstretched, and the clicking procedure requires only shoulder extension and flexion movements. This holds whether the user’s movement is from the rest position to the pointing direction or directly from one target to the next. For this reason, the reaction time has an increasing trend going from the central visual field and the near periphery towards the middle periphery. When objects are displayed in the foreground, on the other hand, the type of movements the user has to perform is different. 
In more detail, the user’s movement could involve extension and flexion of the elbow in addition to extension and flexion of the shoulder. This holds whether the user moves from the rest position to the pointing direction or directly from one target to the next. This phenomenon is more frequent when the depth of the visual stimulus is reduced. Furthermore, for a given depth value, the actual distance between the user and the target is smaller for objects in the central visual field or in the near periphery and increases with the visual field angle. For this reason, the recorded RTs show a decreasing trend for objects in the foreground.
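
The grouping of targets used in Fig. 12 can be summarized by the following sketch. The function names are hypothetical, the handling of the exact boundary angles is an assumption, and so is the treatment of depths up to 1.5 m as foreground (the text only states that background objects appear at a depth larger than 1.5 m).

```python
def angular_portion(angle_deg):
    """Map a horizontal visual-field angle (degrees) to the label of Fig. 12."""
    a = abs(angle_deg)
    if a <= 22.5:
        return "Central/Near"
    if a <= 32.5:
        return "Near/Middle"
    if a <= 55.0:
        return "Middle"
    raise ValueError("angle outside the analyzed range of +/-55 degrees")

def depth_range(depth_m, threshold_m=1.5):
    """Background if deeper than the threshold; foreground otherwise (assumed)."""
    return "background" if depth_m > threshold_m else "foreground"

# Example: a target 40 degrees to the left, 2 m away from the headset.
label = (angular_portion(-40.0), depth_range(2.0))
```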

Fig. 12 RT trends for foreground and background objects in the analyzed portions of the visual field

5.3 Individual factor impact

In this section, the impact of individual factors on the users’ promptness is analyzed. More specifically, both the effects of user age and gender and the incidence of cybersickness are detailed.

Fig. 13 Reaction time for each experiment per gender

5.3.1 User gender and age

The mean RT results in terms of user gender are provided in Fig. 13. Among the 36 participants involved in the tests, 12 were women and 24 were men. Figure 13 suggests that the RT is not influenced by the user gender. In fact, for experiments one, two, and four (i.e., those without distractions) the results are almost identical for men and women. In the presence of distractions, on the contrary, a slight difference can be noticed. Interestingly, the RT recorded for male participants is higher in the presence of visual distractions, whereas it is smaller when auditory distractions are added. This phenomenon may require further investigation with a larger and more balanced sample.

To further verify the impact of gender on RT, we performed the Mann-Whitney U-test. The test failed to reject the null hypothesis, with a p-value of 0.80, thus showing that, based on the current sample, it is not possible to detect a significant difference between women and men for the required task. Moreover, we evaluated the effect size. For the Mann-Whitney U-test, we computed the effect size as the rank-biserial correlation [45]:

$$\begin{aligned} r = \frac{2U_1}{n_1\times n_2}-1, \end{aligned}$$
(2)

where \(U_1\) is the Mann-Whitney statistic for the first group, and \(n_1\) and \(n_2\) are the number of observations for groups 1 and 2, respectively. The effect size is considered small for \(r<0.10\), medium for r between 0.30 and 0.50, and large for \(r>0.50\) [46]. We obtained an effect size of 0.53. This result may indicate that a difference between the RTs for different genders exists, but the analyzed sample is too small to highlight it. We report the statistical results of the Mann-Whitney U-test in Table 7.
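
As a worked illustration of Eq. (2), the sketch below counts the Mann-Whitney statistic \(U_1\) directly from two samples and converts it into the rank-biserial correlation. The function names and the sample values are hypothetical; the pairwise counting is a simple equivalent of the rank-sum definition of \(U_1\), with ties counted as one half.

```python
def mann_whitney_u1(group1, group2):
    """U statistic of group 1: pairs where a group-1 value exceeds a group-2 value."""
    u1 = 0.0
    for a in group1:
        for b in group2:
            if a > b:
                u1 += 1.0
            elif a == b:
                u1 += 0.5  # ties contribute one half
    return u1

def rank_biserial(u1, n1, n2):
    """Effect size of Eq. (2): r = 2 * U1 / (n1 * n2) - 1."""
    return 2.0 * u1 / (n1 * n2) - 1.0

# Hypothetical reaction times in seconds (NOT the study's data).
group1 = [1.2, 1.5, 1.9]
group2 = [1.0, 1.1, 1.6, 1.3]
u1 = mann_whitney_u1(group1, group2)
r = rank_biserial(u1, len(group1), len(group2))
```

Here 9 of the 12 possible pairs have the group-1 value larger, so \(U_1 = 9\) and \(r = 18/12 - 1 = 0.5\). The value of r ranges from -1 to 1, and r = 0 corresponds to complete overlap between the two groups.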

Table 7 Effect of gender difference: Mann-Whitney U-test results
Fig. 14 Reaction time for each experiment per age

Finally, we performed a right-tailed Mann-Whitney U-test under the null hypothesis that the median RTs for men during experiment #3 (visual distractions) and experiment #5 (auditory distractions) are equal, against the alternative hypothesis that the former is larger than the latter. We obtained a p-value of 0.0002, thus rejecting the null hypothesis.

Concerning the impact of the user age, three age intervals have been defined:

  • younger than 30 years old (i.e., \(< 30\) years old);

  • between 30 and 50 years old (i.e., \(\ge 30\) and \(< 50\) years old);

  • 50 years old or older.
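
The three age intervals listed above can be expressed as a simple binning function. The function name, the class labels, and the sample ages are hypothetical and serve only to make the interval boundaries explicit.

```python
from collections import Counter

def age_class(age):
    """Assign an age in years to one of the three intervals defined above."""
    if age < 30:
        return "<30"
    if age < 50:
        return "30-49"
    return ">=50"

# Hypothetical participant ages (NOT the study's sample).
ages = [22, 29, 30, 49, 50, 67]
counts = Counter(age_class(a) for a in ages)
```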

Among the 36 participants, 14 belonged to the first class, 10 belonged to the second class, and 12 belonged to the third class. The mean results obtained for each experiment are shown in Fig. 14. The bar plot shows that, for the majority of the experiments, lower ages correspond to smaller RTs. An exception is represented by the fifth experiment. As shown in Fig. 14, in fact, people between 30 and 50 years old perform slightly better than people younger than 30 years.

Moreover, we performed the Kruskal-Wallis test to verify whether the difference between the RTs recorded for people belonging to the considered age intervals is statistically significant. The test confirmed a statistically significant difference between the age intervals, with a p-value smaller than 0.01. Finally, also in this case we evaluated the effect size, obtaining \(\eta ^2_H=0.09\), which indicates a moderate effect. We provide the statistical results of the Kruskal-Wallis test in Table 8.

Table 8 Effect of age difference: Kruskal-wallis test results

To further investigate this difference, we performed a Tukey HSD post-hoc test. The result is presented in Fig. 15.

Fig. 15 Output of the Tukey HSD post-hoc test for the RTs recorded for the different age intervals. The filled dots represent the mean RT rank for each age group, and the line represents the comparison interval corresponding to the 0.05 significance level. Different colors are used to indicate the RT groups which are significantly different

In the figure, the filled dots represent the mean RT rank for each age interval and the line represents the comparison interval corresponding to the 0.05 significance level. As Fig. 15 clearly shows, people belonging to the third age interval (i.e., 50 years old or older) behave significantly differently from younger participants. Moreover, to investigate the slight difference observed in Fig. 14 in the fifth experiment between people younger than 30 years and people between 30 and 50 years old, we performed a left-tailed Mann-Whitney U-test. More specifically, the test has been performed under the null hypothesis that the medians for the two age classes during experiment #5 are equal, against the alternative hypothesis that people between 30 and 50 years old have smaller RTs than people younger than 30 years. We obtained a p-value of 0.215, which does not allow the null hypothesis to be rejected. We report the p-values for the different one-paired comparison tests in Table 9 under the null hypothesis that the corresponding mean difference is null.

5.3.2 Cybersickness

According to Rebenitsch et al. [47], the likelihood of cybersickness occurrence in virtual environments ranges from 30% to 80%. The SSQ administered after the experiments provided the results reported in Table 10. As can be noticed, the majority of the participants did not experience cybersickness. To quantify the presence of cybersickness symptoms, we report in the last column the percentages of participants who developed symptoms for each category of the SSQ, considering effects from slight to severe. The results show that blurred vision and fatigue occurred more frequently than the other symptoms. The first issue is probably due to the fact that, since the tests were performed during the COVID-19 pandemic, the users wore a face mask, which contributed to fogging of the lenses. As for fatigue, its main cause may be the length of the experimental session. In addition, the fact that the user had the ability to control the virtual objects [48] and the reduced walking interaction [49] may account for the reduced percentage of users experiencing cybersickness in this context compared to the estimates provided in [47].

Table 9 Results of the one-paired comparison tests for the age groups. The asterisks indicate statistical significance
Table 10 SSQ results for the 36 participants, expressed as percentages

6 Discussion and VR design guidelines

The performed study allows the definition of a set of guidelines for the design of VR applications, thus answering research question number three (R3). Unlike the approaches proposed in the literature, we do not focus on a specific application scenario but propose a generic set of guidelines aimed at enhancing the user’s attention. Moreover, instead of relying on additional visual cues [12, 18], we defined a set of experiments to understand how the application design itself can be adapted to increase user responsiveness, thus proposing an attention-driven application design.

The first step of the application design consists in the definition of the virtual environment itself and of the object location taking into account the operational factors. Concerning the objects in the foreground, according to the results presented in Section 5.2.2, the central portion of the FoV and the near periphery should be avoided. In fact, reduced RTs have been recorded for objects located at angles larger than \(\pm 22.5^\circ \). As for the background objects, on the contrary, the central portion of the FoV and the near/middle periphery in the range \(\pm 32.5^\circ \) should be preferred.
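
The placement recommendation above can be condensed into a small check, sketched below under stated assumptions. The function name is hypothetical, and rejecting angles beyond \(\pm 55^\circ \) reflects only the range analyzed in Section 5.2.2, not a claim about what happens outside it.

```python
def placement_recommended(angle_deg, is_foreground):
    """Check a target angle (degrees) against the placement guideline above."""
    a = abs(angle_deg)
    if a > 55.0:
        raise ValueError("angle outside the range analyzed in the experiments")
    if is_foreground:
        # Foreground: avoid the central FoV and near periphery (up to 22.5 deg).
        return a > 22.5
    # Background: prefer the central FoV and near/middle periphery (up to 32.5 deg).
    return a <= 32.5

# Example: the same 30-degree angle is acceptable at both depth ranges.
ok_foreground = placement_recommended(30.0, is_foreground=True)
ok_background = placement_recommended(30.0, is_foreground=False)
```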

Concerning the task complexity and the task duration, the results presented in Section 5.2.1 highlight that at the beginning the RT decreases with time since the user gets familiar with the task. This indicates that a training period can be useful for the user to fully understand the tasks he/she needs to perform. In addition, as the fourth experiment highlighted, although the mean reaction time is related to the current task, its variation depends also on the user’s fatigue. For this reason, relevant tasks should not be presented immediately after demanding assignments. In addition, both visual and auditory distractions impact the user’s attention. Therefore, the choice of adding distractions depends on the application scope. When employed for safety-related training, the virtual environment should be as simple as possible, containing only the required information. For gaming purposes, on the contrary, different amounts and types of distractions can be added for defining increasing game levels.

In addition, the application design should take into account the features of the targeted end users. The individual factors analyzed in Section 5.3.1 show a significant influence of age on the users’ attentional response, while the same does not hold for gender. More specifically, the responsiveness of younger users tends to be higher. Therefore, the time interval for displaying relevant information should be adjusted depending on the expected age of the end users. In addition, independently of the user’s age, cybersickness should be avoided. The results presented in Section 5.3.2 suggest that the incidence of sickness symptoms may be reduced by providing the user with sufficient control of the virtual environment and, when possible, by reducing the walking interaction. Moreover, fatigue appears to be one of the predominant cybersickness effects. Therefore, the task complexity and duration should be adjusted taking this element into account. However, it is important to notice that cybersickness issues go beyond individual factors, and are also linked to technological features like system latency. Therefore, as discussed in Section 5.1, technological factors play an important role in the overall application design. In this direction, although a single HMD has been employed, the participants provided useful feedback. More specifically, some of them reported discomfort in using the HMD while wearing glasses, found the employed HMD heavy, and considered the choice of a wired HMD a limitation.

A summary of the guidelines developed based on the performed tests is provided in Table 11. The presented study highlights that additional trials are needed to gain further insights into the optimal VR application design. More specifically, it is essential to collect data from a wide sample of end users who differ in background, expertise with VR, gender, and age. Moreover, the practical impact of technological factors needs to be investigated.

Table 11 Individual, technological and operational factors impact

7 Conclusions

In this paper, a set of guidelines for VR application design has been presented. In more detail, we analyzed individual, operational, and technological factors impacting the user’s attention during a virtual application experience and defined an experimental session to analyze their influence on the user’s promptness. For this purpose, we recorded RT data from a sample of 36 participants and analyzed the collected information. The study highlights that age, cybersickness incidence, task duration and complexity, and the positioning of virtual objects in the space strongly influence the user’s responsiveness in VR. More specifically, statistical tests showed that the presence of distractions causes significantly different reaction times with respect to the case of no distractions, and that users belonging to different age intervals behave significantly differently. In addition, guidelines concerning the choice of the VR hardware have been provided.

8 Future works

This work represents a first proof of concept for the definition of guidelines for the design of VR applications based on the enhancement of user attention. In this direction, several additional studies can be conducted. First of all, in this work we used a white cube of fixed dimensions as the stimulus for the guideline definition. Further studies will be dedicated to evaluating the impact of the stimulus content (i.e., color, texture, shape) and size on the users’ responsiveness. Moreover, we will analyze how the visual background affects user attention. Depending on the saliency of the background environment, in fact, the user may get distracted from the principal application task. Furthermore, future studies will include the collection of a larger dataset and the testing of different hardware technologies to provide more insights for optimizing VR application design in terms of user attention.