1 Introduction

Agent-based crowd simulations are widely used for modelling building and space usage. They allow designers to make predictions about hypothetical real-world scenarios, such as day-to-day use of buildings or crowd behaviours at large events (Johansson et al. 2012; Tang et al. 2017); they are also used for training and planning of extraordinary events, particularly evacuations (Helbing et al. 2002; Pelechano et al. 2008a, b; Zheng et al. 2009).

The embedding of human participants directly into large-crowd simulations has considerable potential to generate richer and more informative outcomes. For example, the effects of deploying fire marshals on evacuation times could be assessed, or the impact of staff training could be evaluated. However, the effectiveness of such applications is dependent on the ability of the simulation to generate the same kind of experiences and behaviours as one might experience in equivalent real-world scenarios. Existing work (for example, Filingeri et al. 2017) has identified and described common themes in typical crowds. These include negative responses to feelings of being overcrowded such as lack of personal space, inability to move easily, and anxiety: these factors can be seen to contribute to negative outcomes (such as panic) in extreme situations. The aim of our work is to determine whether such experiences can be reproduced in VR simulations and to what extent they are dependent on crowd density, in order to warrant the use of such simulations with embedded human participants for training, evacuation simulations, or similar scenarios.

Whilst some existing work has considered the use of crowd simulations in virtual reality (VR) settings, it has generally focused on three areas: firstly, the validation of simulation models by a human observer (Ahn et al. 2012; Kim et al. 2016; Pelechano et al. 2008a, b; Rojas and Yang 2013; Rojas et al. 2014); secondly, investigations of additional human–agent interactions, such as gestures and gaze (Kyriakou et al. 2016; Narang et al. 2016); and finally, user response to agent proximity (proxemics). Proxemics is of most relevance to our work, but existing investigations use single agents or objects, or small groups in predefined configurations, rather than full crowd simulation models (Bruneau et al. 2015; Llobera et al. 2010; Wilcox et al. 2006; Sanz et al. 2015), and also use physiological measures of arousal rather than measures of affect.

1.1 Objectives and contributions

Our work builds on previous studies to investigate the effect of immersion in a full crowd simulation on user affect and behaviour, using VR. High density in real crowds is known to create negative experiences such as stress, frustration, and discomfort (Filingeri et al. 2017); however, existing work has not directly considered whether such effects might be replicated in VR simulations. We approach VR crowd simulation from this perspective. We use a social forces simulation, adapted from Helbing and Molnar (1995), and insert a human participant, using the HTC Vive platform. Participants undertake a routine manual task and navigate by walking naturally; human–agent interactions are mediated by the same social force model. Our objectives are:

  • To examine the effects of varying agent density in the simulation on participants’ affective state. We hypothesise that users will experience higher levels of negative affect with increasing agent density.

  • To investigate the effect of varying crowd density on participants’ behaviour. Previous works (e.g. Pelechano et al. 2008a, b) have anecdotally reported observations of gestures and behaviours associated with negative experiences or discomfort, such as perceived collisions or avoidance. We aim to quantify and report the prevalence of such behaviours at varying crowd density, as we believe that this gives additional insight into user experience and may help guide the design and implementation of future VR-based crowd simulations.

  • We further aim to quantify trajectory data across different densities, to support our findings.

Previous works on proxemics in VR have used a mixture of physiological measures such as electrodermal activity (EDA), bespoke questionnaires, and trajectory data to investigate user response. EDA is an established measure of arousal, but is also sensitive to users’ physical movement (Boucsein 1992; Dawson et al. 2007), making it unsuitable for studies like ours (in which users navigate by walking and perform manual tasks). Moreover, we are interested in assessing affective state, rather than arousal, as a measure of user experience. To this end, we propose the use of the positive and negative affect schedule (PANAS) self-report scale (Watson et al. 1988) as a measure of user experience. The PANAS scale is widely used in experimental psychology, and comprises two ten-item scales which assess the subject’s affective state (mood) on both positive and negative dimensions. The questionnaire comprises a number of words which describe emotions (e.g. Excited, Hostile), and participants rate to what extent they feel each of those. Reliability and validity have been reported as good (Watson et al. 1988). PANAS is also used in games user research: previous work has established a relationship between emotions, user experience, and physiological measures, further warranting the use of validated self-report measures where EDA cannot be used reliably. However, we also propose PANAS (as an established measure of affect rather than arousal) as a descriptive and meaningful measure of experience. Despite widespread use in other contexts, only a very small number of VR studies have used the PANAS scale (e.g. Zibrek et al. 2018) to investigate user affect; no previous works have used PANAS to study VR crowd simulations, and none have so far investigated user response to varying crowd density in VR.

The outcomes from our study comprise the following contributions to the study of VR crowd simulation.

  • We use the PANAS self-report measure to evaluate the effect of density on participants’ affective state, while undertaking a simple manual task within the simulation. We demonstrate a significant increase in negative affect between low- and high-density conditions.

  • We use video data collected from participants to show significant changes in participants’ behaviour between density conditions, particularly regarding the use of gestures and reactive behaviours associated with observation and avoidance.

  • We have collected trajectory data from the simulation describing user speed and proximity to agents, which varies across different crowd densities. We use this data, together with post hoc interviews with participants, to support our findings.

The remainder of our paper is organised as follows. We present a summary of previous related work in Sect. 2, focusing on recent results which warrant our own study. We proceed to describe our simulation and experimental set-up in Sect. 3, and our methodology in Sect. 4. We present our experimental results in Sect. 5, and conclude with a discussion focusing on participants’ interactions within the simulation, relationships to previous work, application areas, and considerations for future work.

2 Previous work

A considerable body of previous work exists in the field of crowd simulation. The use of VR as a platform is not a new concept (for example, see Ulicny and Thalmann 2001); however, the technology to properly realise such systems has only recently become widely available. Consequently, most related experimental work has been conducted relatively recently and falls broadly into the following three themes.

2.1 Validation of simulation models using VR

The validation and comparison of crowd simulations is an active area of research. Some recent work has considered the use of human observation in VR as an objective validation method. For example, work by Pelechano et al. (2008a, b) is related to our own: they used presence as a metric to compare simulation methods (including social forces). Evaluation was conducted using a bespoke questionnaire and qualitative descriptions of video recordings. VR was also used by Rojas and Yang (2013) and Rojas et al. (2014) to study small group formations, and Ahn et al. (2012) used a similar method to compare different collision avoidance algorithms in a street scene. In this case, the user remained stationary, and observed the scene using a CAVE environment. Recently, Kim et al. (2016) used both 2D screens and HMDs to rate the similarity of simulations to videos of real scenes. Model validation using VR represents an ongoing theme in crowd simulation research, and a recent summary and discussion is provided by Pelechano and Allbeck (2016).

2.2 User experience and human–agent interactions

Investigations into user experiences with VR displays are still being made. For example, Buttussi and Chittaro (2018) recently compared VR headsets with 2D displays; they used the IPQ (Schubert et al. 2001) to demonstrate a significantly increased sense of presence when using a high-fidelity headset, in an evacuation simulation. Similarly, work by Hupont et al. (2015) indicates an increased sense of presence when using an HMD for a forklift simulator.

A number of researchers have sought either to use virtual agents to enhance users’ immersion or to add further interactions, such as gaze, to increase users’ sense of presence. Garau et al. (2005) demonstrated that user response to virtual characters in VR is to some degree mediated by normal social behaviours; participants experienced a higher sense of personal contact when agents were responsive to them. Narang et al. (2016) adapted the bespoke questionnaire used by Garau et al. in their recent study of agent gaze and showed that participants had a preference for agents who made visual contact with them. Kyriakou et al. (2015, 2016) investigated how collision avoidance and other interactions (gaze and salutations) affected user experience. They found that collision avoidance improves the user’s reported sense of realism, whilst additional behavioural features increased presence. Sohre et al. (2017) also found that lack of collision avoidance decreased users’ sense of comfort while walking through a stream of agents.

2.3 Proxemics

A number of researchers have studied participants’ response to the perceived physical proximity of virtual agents in VR environments. Wilcox et al. (2006) placed virtual characters into participants’ personal space, using a stereographic display, guided by the well-known personal proximity zones defined by Hall (1963). Electrodermal activity (EDA) was used as a quantifiable metric, which demonstrated that participants’ arousal response to virtual agents was comparable to their responses to real people. Llobera et al. (2010) performed related experiments using a VR HMD and virtual characters which approached the stationary participant in groups of one to four agents. Again, EDA data were collected and showed a similar arousal response to close proximity. The form of the characters did not affect response (representations of inanimate objects were also used). Christou et al. (2015) performed a similar study: participants remained stationary in a CAVE environment, and groups of six agents were used, either stopping at predefined distances (similar to Llobera et al.) or passing with the same minimum distances. Similar results were achieved using EDA, and a memory test was also used: cognitive performance was found to be reduced under close proximity.

Bruneau et al. (2015) investigated group avoidance behaviour in a VR CAVE environment. Participants were able to adapt their distance from a group of agents, by choosing a trajectory to navigate between two points (crossing the group’s path). Participants used a joystick control to navigate, and trajectories were recorded and used to build an energy-based avoidance behaviour model. A small number of previous studies have studied behaviour of walking participants in VR environments, using static obstructions. Gérin-Lajoie et al. (2008) used a static cylinder obstruction to compare avoidance distances of walking pedestrians in both real and virtual reality conditions. They used an HMD for the virtual environment and found that participants allowed slightly larger distances as compared to the real obstruction. Similar results were also reported by Fink et al. (2007). More recently, Sanz et al. (2015) similarly compared clearance distances in virtual environments using a stereoscopic immersive projection space. They also found increased distance in the virtual space and increased distance when a static anthropomorphic representation was used, rather than an inanimate object.

These studies demonstrate that users respond to the physical proximity of virtual characters in ways which are comparable to responses to real people. However, existing work is limited to specific contexts and interpretations. Wilcox et al., Llobera et al., and Christou et al. all used stationary participants and restricted agent interactions. Bruneau et al. allowed participants to move, but again only in a limited way, using a joystick control, in discrete scenarios. The small number of studies using walking participants was limited to single static obstructions. This theme is most closely related to our own, and motivates our examination of user response to varying crowd density. We assert that the successful integration of human and virtual agents within real simulations warrants study of human participants in open scenarios, using naturalistic locomotion methods; this motivates our experimental design, presented in Sect. 4. Furthermore, no existing work has directly measured affective state; we believe that this is an important dimension for understanding user experience and response, which complements other measures, and we use it in our own work.

3 Simulation model

We implemented our simulation using Unreal Engine 4 (UE4) and the HTC Vive VR platform. Sample screen images (taken in 2D) are shown in Fig. 1. Users held a pair of controllers, which were tracked and appeared in the simulation as a pair of hands. The HMD was also tracked, so that the user could move around the scene by walking: this was the only locomotion method available.

Fig. 1 Screenshots of the simulation. From top to bottom: low-, medium-, and high-density conditions

A large number of models have been proposed for agent-based crowd simulation. Many of these are based on the well-known and popular social forces model (SFM) by Helbing and Molnar (1995); for example, researchers have recently used this model directly to model evacuation scenarios (Sticco et al. 2017) and with adaptations for specific situations (e.g. Li et al. 2017). Adaptations have been proposed to improve model performance (e.g. weighting forward motion, and adding groups, Farina et al. 2017). It has also been used in some previous investigations of VR simulations (Pelechano et al. 2008a, b). We have therefore chosen to use a social forces simulation based on Helbing and Molnar (1995) to create our simulation, as a baseline for user experience which can inform researchers and practitioners currently seeking to integrate VR with existing crowd simulations or those looking to design new experiences, with a variety of application areas and models.

We used the UE4 physics simulation (PhysX 3.3) to apply the model forces: we do not use explicit collision detection, as overlaps are handled by component forces of the model. We parameterised the model by hand, using test sequences observed in VR by two of the authors. The resulting parameters are shown in Table 1 and relate to the force equations as described by Pelechano et al. (2008a, b). The model was parameterised to produce walking speeds close to those of real pedestrians. We took samples of agent walking speeds during our experiments; these are reported in Sect. 5 and show little variation across conditions.

Table 1 Social force model parameters, used for simulation in Unreal Engine 4

Previous work (e.g. Karamouzas et al. 2009) has noted that collision avoidance in social force models can result in late, high-curvature adjustments to trajectories. In our initial tests, direct head-on approaches between agents occasionally resulted in near-collisions and visible model overlap in bidirectional traffic. To mitigate this, we created early adjustments to head-on collision trajectories: in the case that two agents are travelling directly towards each other, we applied small additional and opposing impulses to each, perpendicular to their direction of travel.
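The head-on adjustment described above can be illustrated with a minimal sketch. It is not the UE4 implementation: the function name, the alignment tolerance, the look-ahead range, and the impulse magnitude are all hypothetical values chosen for illustration, since the text does not report them. Each agent receives a small impulse to its own left, so the two impulses point in opposite directions in world space.

```python
import math

def _unit(v):
    """Return v scaled to unit length, or a zero vector if v has no length."""
    n = math.hypot(v[0], v[1])
    return (v[0] / n, v[1] / n) if n > 0 else (0.0, 0.0)

def head_on_impulses(pos_a, vel_a, pos_b, vel_b,
                     cos_tol=0.98, max_range=5.0, magnitude=0.2):
    """Return (impulse_a, impulse_b) as 2D vectors.

    cos_tol, max_range and magnitude are hypothetical tuning values.
    Both impulses are zero unless the agents are close, travelling in
    opposing directions, and agent A is heading directly towards agent B.
    """
    zero = (0.0, 0.0)
    da, db = _unit(vel_a), _unit(vel_b)
    offset = (pos_b[0] - pos_a[0], pos_b[1] - pos_a[1])
    dist = math.hypot(offset[0], offset[1])
    if dist == 0.0 or dist > max_range:
        return zero, zero
    to_b = _unit(offset)
    opposing = da[0] * db[0] + da[1] * db[1] <= -cos_tol
    a_faces_b = da[0] * to_b[0] + da[1] * to_b[1] >= cos_tol
    if not (opposing and a_faces_b):
        return zero, zero
    # Perpendicular to each agent's own heading (a 90 degree left turn).
    left_a = (-da[1], da[0])
    left_b = (-db[1], db[0])
    return ((magnitude * left_a[0], magnitude * left_a[1]),
            (magnitude * left_b[0], magnitude * left_b[1]))
```

Because each agent is nudged relative to its own heading, two agents approaching head-on are displaced to opposite sides of their shared line of travel.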

Finally, we modified the force model to successfully incorporate the human participant as an active agent in the simulation. We note that repulsive forces between agents are symmetric and are tuned to produce mutual deflections which are sufficient to avoid collisions in most cases. However, no repulsive force is experienced by a human participant: in the case of human–agent interactions, the deflections are smaller, and more apparent collisions (or overlaps) occur. To avoid this, we use an asymmetric force similar to that of Curtis et al. (2013), so that agent forces are scaled when interacting with a human participant. We used a scaling factor of 2.0 in our experiments, which we set by trial and error during testing. This was sufficient to reduce collisions/overlaps to a level comparable to that of agent–agent interactions. In all other respects, agents responded to the human participant in the same way as to other agents.
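As an illustration of the asymmetric scaling, the sketch below applies the reported factor of 2.0 to a generic exponential repulsion term of the kind used in social force models. The exponential form, the function name, and the values of `strength`, `falloff`, and `combined_radius` are assumptions made for this example; they are not the parameters of Table 1.

```python
import math

HUMAN_SCALE = 2.0  # scaling factor reported in the text

def repulsive_force(agent_pos, other_pos, other_is_human,
                    strength=2.0, falloff=0.3, combined_radius=0.6):
    """Repulsive force on an agent due to a neighbour, as a 2D vector.

    strength, falloff and combined_radius are hypothetical placeholders.
    The force points from the neighbour towards the agent and decays
    exponentially with distance; it is doubled when the neighbour is the
    human participant, who cannot be pushed by the model in return.
    """
    dx = agent_pos[0] - other_pos[0]
    dy = agent_pos[1] - other_pos[1]
    dist = math.hypot(dx, dy)
    if dist == 0.0:
        return (0.0, 0.0)  # direction undefined for coincident positions
    magnitude = strength * math.exp((combined_radius - dist) / falloff)
    if other_is_human:
        magnitude *= HUMAN_SCALE
    return (magnitude * dx / dist, magnitude * dy / dist)
```

The scaling compensates for the missing half of the mutual deflection: the agent alone must produce the separation that two agents would normally share.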

3.1 Other interactions

Previous works (Kyriakou et al. 2016; Narang et al. 2016) have shown that agent–human interactions such as gaze, salutations, and vocal interactions can increase users’ sense of presence and realism in VR. In our case, we are interested primarily in users’ response to the crowd simulation model itself. Furthermore, no existing works have investigated the effect of such interactions in high-density simulations, and we consider that they may not be consistent across different crowd densities; for example, agent gaze may increase presence in sparse interactions, but may be unnerving in denser crowds. We therefore omitted such interactions in our experiments, so that differences in user experience across conditions were produced only by agent behaviours generated by the simulation model, participant interactions with the agents, and differences in density. We suggest that the additional effects of agent gaze (for example) could be taken up as future work.

3.2 Experimental set-up

Agents in the simulation were represented by animated models, created using Adobe Fuse™ and Mixamo™. We used 20 different models to create a heterogeneous crowd, though some repetition of characters was evident. Our scene reproduced the ground floor of a real campus building, constructed from CAD drawings. The area is a busy thoroughfare, with lots of bidirectional pedestrian traffic, and society stalls with staff interacting with passers-by.

Virtual characters were spawned at a set frequency at each end of the thoroughfare (one every 8 s at low density, 4 s at medium, and 1.75 s at high density) and moved to a point at the opposite end, beyond the view of the participant. This appeared as a naturalistic flow of pedestrians. A navigation mesh was used to generate way-points, and we varied the spawn rate for each of our experimental conditions (values reported in Sect. 5).

We constructed a task-based scenario for our study. In the centre of the scene, we defined an area of approximately 14.6 m², bounded by three tables: this area corresponds to the walkable area of the HTC Vive VR environment, such that the participant can walk between the tables in the virtual scene. A top-down view of the area is shown in Fig. 2. The markers A, B, and C show the table positions, and virtual agents traverse the area, with bidirectional pedestrian flow moving left–right in the direction indicated by the arrows. Agents moved across the full width of the area during simulation.

Fig. 2 The task area for our user study

Participants were asked to move objects (bottles and cans) between the three tables, such that all cans were moved to Table C, and all bottles to Table B. This involved making a minimum of 14 distinct journeys perpendicular to the direction of agent traffic flow and 7 parallel journeys. The participant’s virtual hands could be used to pick up a single object at a time, but not to interact with agents. A circular area on the table surface changed colour to indicate completion of the task. Figure 3 shows an example trajectory map for a participant while completing this task. The task is representative of actual tasks which are undertaken in the real-world space on which this virtual environment is modelled (drinks and food are often served from tables). Furthermore, we designed the activity intentionally to create interactions where participants walk both perpendicular and parallel to the flow of agents.

Fig. 3 Trajectory map for a participant completing the task. Green lines represent agent trajectories, and red lines the participant. The area shown represents a space of approximately 15 m × 7.5 m

4 Experimental methodology

The experiment included three conditions, featuring varying degrees of crowd density. Density levels were set by controlling the character spawn rate, and measured density estimates in the area occupied by the player are reported in Sect. 5. Screenshots from each condition are shown in Fig. 1.

4.1 Participants and procedure

The study was carried out in the research spaces of the University of Lincoln and approved by the institution’s ethics board, following prescribed health and safety requirements for the use of VR equipment. The simulation was presented simultaneously on the HTC Vive headset and on a projection screen to facilitate video recording. Twenty-five participants (12 female, 13 male, mean age = 31.0, SD = 10.3) participated in the study. Fourteen participants had previously used VR equipment. The HTC Vive HMD is a wired device, and we took precautions to prevent the wiring from interfering with participants while they conducted their task. The wire was suspended from the ceiling, and the investigator also lifted the wire as required to enable smooth movement while walking. Participants reported post hoc that they were not aware of investigator intervention. Our experimental set-up is shown in Fig. 4.

Fig. 4 Our experimental set-up

Participants gave informed consent and were reminded that they could stop the simulation if they felt uncomfortable. They completed a demographics questionnaire and a training sequence in which they performed the task required in the conditions, but with no agents present. Participants were then given the opportunity to practise the main task before starting the experimental conditions. We assessed participants’ affective state before and after the training session. After training, participants entered the first condition, and then completed another questionnaire to assess their affective state. This process was repeated for all three conditions. The mean duration of trials ranged from 132 s in low density to 148 s in high density. At the end of the study, we conducted a semi-structured exit interview that explored their experience. The study followed a within-subjects design, and conditions were counterbalanced using a Latin square to avoid ordering effects. All sessions were video-recorded to allow for post hoc analysis of participant behaviour.
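The counterbalancing step can be sketched as follows. This is an illustrative construction of a cyclic 3 × 3 Latin square; the function names are hypothetical, and the assignment of participants to rows in rotation is an assumption, as the text does not state how rows were allocated.

```python
def latin_square(conditions):
    """Build a cyclic Latin square: each condition appears exactly once
    in every row (presentation order) and every column (position)."""
    n = len(conditions)
    return [[conditions[(row + col) % n] for col in range(n)]
            for row in range(n)]

def assign_orders(n_participants, conditions):
    """Assign each participant a row of the square in rotation
    (an assumed allocation scheme, for illustration only)."""
    square = latin_square(conditions)
    return [square[p % len(square)] for p in range(n_participants)]

orders = assign_orders(25, ["low", "medium", "high"])
```

With 25 participants and three orders, rotation cannot be perfectly balanced: one order is used nine times and the other two eight times each.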

4.2 Measures

The experiments included a range of dependent measures including logging of trajectory data within the simulation, participant responses to validated scales, and video data of participant behaviour. We use the positive and negative affect schedule (PANAS) (Watson et al. 1988) as a validated self-report measure of user affect. PANAS has not previously been used for VR-based crowd simulation studies, but is widely used in related fields such as games user research; the scale comprises two dimensions: negative and positive. Based on previous works, which report a negative response to close proximity, we conjecture that discomfort due to crowding will be most evident in the negative dimension, and adopt the null hypothesis H0: there is no measurable difference in negative affect associated with changes in density of a virtual crowd.
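For readers unfamiliar with the instrument, the following sketch shows how the two PANAS dimensions are conventionally scored. It assumes the standard form of the scale from Watson et al. (1988), with ten items per dimension, each rated from 1 to 5; the function name and the input format are hypothetical, and the text does not describe the scoring procedure actually used in the study.

```python
POSITIVE_ITEMS = ["interested", "excited", "strong", "enthusiastic", "proud",
                  "alert", "inspired", "determined", "attentive", "active"]
NEGATIVE_ITEMS = ["distressed", "upset", "guilty", "scared", "hostile",
                  "irritable", "ashamed", "nervous", "jittery", "afraid"]

def score_panas(ratings):
    """Return (positive_affect, negative_affect) for one questionnaire.

    ratings maps each of the 20 item words to an integer from 1 to 5
    (assumed response format). Each dimension is the sum of its ten
    items, so each score lies between 10 and 50.
    """
    for item in POSITIVE_ITEMS + NEGATIVE_ITEMS:
        if not 1 <= ratings[item] <= 5:
            raise ValueError(f"rating for {item!r} must be between 1 and 5")
    positive = sum(ratings[item] for item in POSITIVE_ITEMS)
    negative = sum(ratings[item] for item in NEGATIVE_ITEMS)
    return positive, negative
```

One questionnaire per participant per condition then yields the paired scores compared across density conditions.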

We gathered trajectory data from participants: positions and orientations of the participant and agents were logged approximately 30 times/s and time-stamped, along with other events (for example, picking up objects). We post-processed this data in order to extract two metrics. We wished to estimate to what extent participants were impeded and so computed their mean walking speeds when crossing the walkable area: we defined a central area (with boundaries 0.5 m within the walkable area) and computed the speed for each visit into this area (in most cases this corresponds to crossing the flow of virtual characters). We computed mean speeds for each participant, for each condition. To connect our work with existing work on proxemics, we also used this data to compute the mean minimum distance to an agent, for each condition. To this end, we took sample points at regular time intervals, and at each sample we computed the distance from the participant to the nearest agent. We were thus able to compute a mean distance to the nearest agent for each participant, across each condition.
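The two trajectory metrics can be sketched as below. The data layout and function names are hypothetical, and speed is computed here as path length divided by elapsed time for a single visit to the central area; the text does not specify whether path length or net displacement was used, so this is one plausible reading.

```python
import math

def nearest_agent_distance(participant_xy, agents_xy):
    """Distance from the participant to the closest agent at one sample."""
    return min(math.dist(participant_xy, a) for a in agents_xy)

def mean_nearest_distance(samples):
    """Mean of the nearest-agent distance over regularly spaced samples.

    samples is a list of (participant_xy, [agent_xy, ...]) pairs;
    samples with no agents present are skipped.
    """
    distances = [nearest_agent_distance(p, agents)
                 for p, agents in samples if agents]
    return sum(distances) / len(distances)

def visit_speed(track):
    """Walking speed for one visit to the central area.

    track is a time-ordered list of (t, x, y) log entries; the speed is
    the summed path length divided by the elapsed time (assumed reading).
    """
    length = sum(math.dist(a[1:], b[1:]) for a, b in zip(track, track[1:]))
    elapsed = track[-1][0] - track[0][0]
    return length / elapsed
```

Averaging `visit_speed` over all visits in a trial gives one mean speed per participant per condition, matching the unit of analysis described above.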

We also wished to conduct a more detailed analysis of player behaviour, to quantify patterns of common behaviour, and also to further examine anecdotal accounts of participant behaviour from previous works (e.g. Pelechano et al. 2008a, b; Narang et al. 2016). Accordingly, we made video recordings of participant behaviour during all conditions. Initial coding schemes were developed through independent iterations by two of the investigators. These focused on aspects such as: different participant actions at distance (e.g. looking, stopping, or changing direction); reactions to close proximity of agents (e.g. side-stepping or reactive hand gestures); participants’ verbal reactions to the simulation; and operational artefacts, such as equipment adjustments. The coding scheme was developed using a subset of pilot data (five users) collected before the main study. The scheme was refined through iterative coding and comparison of individual videos, by the two investigators, guided by experiences reported by DeCuir-Gunby et al. (2011). Differences in individual interpretations were discussed and resolved; instances of potential ambiguity were identified and informal guidelines were agreed (although we note that in some cases, some subjectivity in interpretation of actions remains). The full data set was then coded by the first investigator.

The post hoc interviews explored participants’ perceptions of the conditions, feelings about real crowds, and to what extent they found the simulated behaviour realistic. We also enquired about interactions with the virtual agents, and whether participants found it easy or hard to move around. Interview results are reported to support quantitative findings.

5 Results

In this section, we present quantitative results from the questionnaires, trajectory, and video data, and we provide qualitative data to further support our findings. Crowd density and speed measures were calculated post hoc from trajectory data. For density, the walkable area was sampled 20 times for each participant, for each condition, and the number of agents counted (500 samples per condition). A density measurement was computed, and the means and standard deviations are reported in Table 2. The highest density level corresponds to approximately one agent per 2 m², which can be challenging to negotiate at full walking speed in bidirectional traffic.
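The density estimate described above amounts to dividing each sampled agent count by the walkable area and summarising the result. A minimal sketch follows, assuming the walkable area is the approximately 14.6 m² task area described in Sect. 3.2; the function name and the use of the sample standard deviation are assumptions.

```python
from statistics import mean, stdev

WALKABLE_AREA_M2 = 14.6  # approximate task area reported in Sect. 3.2

def density_summary(agent_counts, area_m2=WALKABLE_AREA_M2):
    """Return (mean, standard deviation) of density in agents per square metre.

    agent_counts holds the number of agents counted inside the walkable
    area at each sample (500 samples per condition in the study).
    """
    densities = [count / area_m2 for count in agent_counts]
    return mean(densities), stdev(densities)
```

On this basis, a density of one agent per 2 square metres corresponds to roughly seven agents inside the walkable area at any one time (14.6 / 2 = 7.3).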

Table 2 Density and speed of virtual agents for each condition, with corresponding spawn intervals

5.1 Participants’ response to the simulation

Summary statistics of the results collected from each condition using the PANAS self-report questionnaires are presented in Table 3, with corresponding box plots shown in Fig. 5. The high-density condition showed higher mean values than the medium or low conditions on the negative dimension, and a lower mean on the positive dimension. Our data showed significant deviation from the normal distribution, so we used the Friedman test to examine the statistical significance of our results; this showed a significant effect of density on the negative dimension (χ²(2) = 13.975, p = 0.001). Wilcoxon signed-rank tests were used post hoc, with a significance level set at p < 0.05 (0.017 with Bonferroni correction). Statistically significant increases in negative affect were identified between high and medium density conditions (Z = −2.842, p = 0.004) and high and low density conditions (Z = −2.725, p = 0.006), establishing an increase in reported negative affect at the higher crowd density. There was no significant difference between the low and medium conditions (Z = −1.393, p = 0.163). We accordingly reject the null hypothesis H0. The Wilcoxon signed-rank tests (Bonferroni correction, significance level p < 0.017) revealed no significant effect on positive affect between conditions.
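To make the omnibus test concrete, the sketch below computes a tie-corrected Friedman statistic for three related conditions in plain Python, together with the Bonferroni-adjusted threshold for three pairwise comparisons. It is an illustrative re-implementation rather than the software used for the reported analysis; the function names are hypothetical, and the closed-form p-value is valid only for two degrees of freedom (three conditions).

```python
import math

def average_ranks(values):
    """Rank values from 1, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def friedman_three_conditions(rows):
    """rows holds one (low, medium, high) score triple per participant."""
    n, k = len(rows), 3
    rank_sums = [0.0] * k
    tie_term = 0
    for row in rows:
        row = list(row)
        ranks = average_ranks(row)
        for c in range(k):
            rank_sums[c] += ranks[c]
        for value in set(row):
            t = row.count(value)
            tie_term += t * (t * t - 1)
    correction = 1 - tie_term / (n * k * (k * k - 1))
    if correction == 0:
        return 0.0, 1.0  # every row fully tied: no evidence of a difference
    stat = (12 / (n * k * (k + 1)) * sum(r * r for r in rank_sums)
            - 3 * n * (k + 1)) / correction
    p_value = math.exp(-stat / 2)  # chi-square upper tail, df = 2 only
    return stat, p_value

# Bonferroni threshold for three pairwise post hoc comparisons.
BONFERRONI_ALPHA = 0.05 / 3
```

As a consistency check on the reported values, exp(−13.975 / 2) ≈ 0.0009, which rounds to the stated p = 0.001, and 0.05 / 3 rounds to the stated threshold of 0.017.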

Table 3 Summary statistics of PANAS results (positive and negative dimensions)

Fig. 5 Box plots for positive and negative affect, by condition

Regarding their responses to the simulation conditions, a number of participant comments mirrored quantitative results regarding negative affect, for example: “The more people there were the more frustrating I think it was”, and “it got more claustrophobic with more people”; another stated “I felt more anxious and more distressed because there were lots of people in the way”. When considering affective state, another aspect that warrants closer examination is participants’ interpretation of agent behaviour, which emerged throughout analysis. Several participants described the agents as being “rude” or “aggressive”; for example “they didn’t stop and give way as you would expect from a real crowd they tended to be more aggressive”. Another participant commented: “if they saw me coming they wouldn’t change direction, it would almost elevate hostility. You almost feel like you’re being walked over”.

5.2 Trajectory data

We used the trajectory data to further quantify features of participant activity and experience, and post-processed it to extract a number of metrics. A data file for one participant, for one condition, was incomplete: we removed this participant from our post-processing. We computed mean speeds for each participant, for each condition, and summary statistics are presented in Table 4, with corresponding box plot in Fig. 6. The data passed the Shapiro–Wilk test for normality (p > 0.05). We used a repeated measures one-way ANOVA which revealed a main effect of density on speed at p < 0.05 (F(2,46) = 41.339, p = 0.001, ηp² = 0.977). Post hoc tests with Bonferroni correction showed significant differences between low- and high-density conditions (p = 0.001) and medium and high conditions (p = 0.001).

Table 4 Summary statistics for participant walking speed while crossing flow of virtual agents

Fig. 6 Box plots for mean velocity and closest approach, by condition

Qualitative findings further support the notion that crowd density had an impact on participants’ ability to walk and complete their tasks. For example, one participant remarked “when there were less people it seemed less worrying cause there was more space and you could see a line to go through”. Along these lines, another participant stated “it was the only time I forgot what I was supposed to do” when describing the high-density condition. Additionally, there was some evidence that participants had difficulty in navigating past agents because their behaviour did not afford fully naturalistic interactions; for example, one participant reported that “they felt they were aware of me when they were in proximity to me. I didn’t feel they were aware of me in their trajectory”, and “They seemed like they had semi-awareness”. Another participant noted that “they weren’t stopping as they would in real life, like stop and let me go past, obviously, but they were changing direction if I was there”.

We computed the mean distance to the nearest agent for each participant in each condition. The data passed the Shapiro–Wilk test for normality (p > 0.05). Summary statistics are presented in Table 5, with the corresponding box plot in Fig. 6. A repeated measures ANOVA revealed a main effect of density on proximity (F(2,46) = 766.938, p = 0.001, partial η² = 0.971). Post hoc tests with Bonferroni correction showed significant differences between all conditions at p < 0.05 (all at p = 0.001). In Sect. 6, we discuss how this result connects our work to previous works in VR proxemics. We note that the proximity results in particular may depend to some extent on the model parameters, as these control the speed and rate of change in direction of agents. However, the parameters are consistent across conditions, so we can reasonably consider that the observed difference is due to the change in density of agents and the response of the participants.
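The proximity metric can be sketched in the same spirit. This example assumes that each logged frame holds the participant position and the positions of all agents on the floor plane, and that distance is measured centre to centre; both the data layout and the name `mean_nearest_agent_distance` are assumptions made for illustration.

```python
import math


def mean_nearest_agent_distance(frames):
    """Average, over frames, of the distance to the closest agent.

    `frames` is a list of (participant_xz, agents_xz) pairs, where
    participant_xz is an (x, z) tuple and agents_xz is a list of
    (x, z) tuples, all in metres. This layout is an assumption.
    """
    nearest = []
    for (px, pz), agents in frames:
        if not agents:
            continue  # skip frames in which no agent is present
        nearest.append(min(math.hypot(ax - px, az - pz) for ax, az in agents))
    if not nearest:
        raise ValueError("no frames containing agents")
    return sum(nearest) / len(nearest)


# Hypothetical two-frame log.
frames = [
    ((0.0, 0.0), [(3.0, 4.0), (0.0, 1.0)]),  # nearest agent at 1.0 m
    ((1.0, 0.0), [(1.0, 2.0), (4.0, 4.0)]),  # nearest agent at 2.0 m
]
proximity = mean_nearest_agent_distance(frames)  # 1.5 m
```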

Table 5 Summary statistics for agent proximity
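For readers who wish to reproduce the form of the speed and proximity analyses, the sums-of-squares decomposition behind a one-way repeated measures ANOVA can be sketched as follows. This is a minimal sketch, not the analysis pipeline used in the study: the function names and the toy data are hypothetical, sphericity is not checked, and no p-value is computed (that would require the F distribution). With 24 participants and three conditions, this layout gives the degrees of freedom (2, 46) reported above.

```python
def rm_anova(data):
    """One-way repeated measures ANOVA.

    `data` holds one row per participant and one column per condition.
    Returns (F, df_condition, df_error, partial_eta_squared).
    """
    n = len(data)      # number of participants
    k = len(data[0])   # number of conditions
    if n < 2 or k < 2 or any(len(row) != k for row in data):
        raise ValueError("need a complete participants x conditions table")
    grand = sum(sum(row) for row in data) / (n * k)
    cond_means = [sum(row[j] for row in data) / n for j in range(k)]
    subj_means = [sum(row) / k for row in data]
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    # Between-participant variance is removed from the error term.
    ss_error = ss_total - ss_cond - ss_subj
    df_cond, df_error = k - 1, (k - 1) * (n - 1)
    f = (ss_cond / df_cond) / (ss_error / df_error)
    eta = ss_cond / (ss_cond + ss_error)
    return f, df_cond, df_error, eta


def partial_eta_squared_from_f(f, df_effect, df_error):
    """Recover partial eta squared from an F statistic and its df."""
    return f * df_effect / (f * df_effect + df_error)


# Hypothetical measurements: 4 participants x 3 density conditions.
toy = [[1, 2, 3], [2, 3, 5], [3, 5, 6], [4, 4, 7]]
f_value, df1, df2, eta_p = rm_anova(toy)
```

The helper `partial_eta_squared_from_f` uses the identity between partial η² and F; applied to the proximity result, F(2,46) = 766.938, it gives 0.971 after rounding.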

5.3 Analysis of video data

Our code book comprises 28 separate codes, which we further classified into broad categories, shown in Table 6. We coded all videos from all conditions. Three video files were incomplete or did not record correctly due to equipment failure. We removed data from the corresponding participants, leaving data from 66 videos (22 participants). We then counted the codes in each video and computed the mean number of occurrences of each code (in a single video) for each condition. We used the Friedman test to identify eight codes which showed statistically significant differences (p < 0.05) between conditions, as described in Table 7. We then used the Wilcoxon signed-rank test post hoc to identify statistically significant differences between pairs of conditions for those behaviours (assuming significance at 0.017 with Bonferroni correction). These are shown in Table 8.

Table 6 Behavioural code summary
Table 7 Statistically significant behaviours
Table 8 Post hoc tests on significant behaviours
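The counting step and the corrected significance threshold can be illustrated with a short sketch. It assumes that each coded video is stored as a condition label plus the list of codes observed in it; this layout, the function names, and the example data are hypothetical, and the Friedman and Wilcoxon tests themselves are not reproduced here.

```python
from collections import Counter, defaultdict


def mean_code_counts(coded_videos):
    """Mean occurrences of each code per video, by condition.

    `coded_videos` is a list of (condition, codes) pairs, one per video,
    where `codes` lists every coded event in that video.
    Returns {condition: {code: mean count per video}}.
    """
    totals = defaultdict(Counter)
    n_videos = Counter()
    for condition, codes in coded_videos:
        n_videos[condition] += 1
        totals[condition].update(codes)
    return {
        condition: {code: count / n_videos[condition]
                    for code, count in counts.items()}
        for condition, counts in totals.items()
    }


def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison threshold under Bonferroni correction."""
    return alpha / n_comparisons


# Hypothetical coded videos (two per condition).
videos = [
    ("low", ["watching", "watching", "planned stop"]),
    ("low", ["watching"]),
    ("high", ["sudden stop", "sudden stop"]),
    ("high", ["sudden stop", "avoiding gesture"]),
]
means = mean_code_counts(videos)
# Three pairwise comparisons: low-medium, low-high, medium-high.
threshold = bonferroni_alpha(0.05, 3)
```

With three pairwise comparisons, the corrected threshold is 0.05 / 3, which rounds to the 0.017 used for the post hoc tests.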

The codes in Tables 7 and 8 mainly relate to considered actions and reactive behaviours (which show the largest differences in occurrences). Observational behaviours occur with high frequency in all conditions, but show no significant differences between densities, with the exception of the watching behaviour, which is prevalent in low and medium density. As with the affect measures, the largest difference is between the high-density condition and the other two.

Interview results further explain some of these codes, suggesting that a few participants initially tried to interact with the agents in an experimental fashion, for example, by speaking to or physically touching agents. One participant commented that “I did say sorry once which was a bit odd”. From the perspective of future applications, this hints at possible challenges for mixed human–agent VR simulations if interactions are not fully intuitive (where simulations only model human movement, but not communication). Interviews also supplied evidence that some participants may be able to disassociate themselves from the simulation in order to complete the task. One participant stated “you realise they just of bounce off of you after a while oh well, just carry on walking through them” and another similarly said “The crowds activity didn’t really affect me at all, as soon as I realised that me bumping into them had no effect it didn’t seem to matter”.

5.4 Findings

Here, we present our main findings with a focus on our initial research questions around the impact of crowd density on user experience and behaviour. Most significantly, our results show that high crowd density has a negative impact on participants’ affective state, as demonstrated by quantitative results for negative affect along with qualitative feedback from participants. This suggests that participants generally perceived high crowd density in VR as an uncomfortable experience, mirroring responses to real-world crowds, reported in previous works (for example, Filingeri et al. 2017). However, interview feedback also suggests that certain agent behaviours were perceived as rude, possibly contributing to negative perceptions among participants. Thus, we conjecture that negative affect may be attributed to a combination of agent proximity and behavioural artefacts produced by the simulation model.

Additionally, we have shown through video analysis that higher crowd density affects participant behaviour on various levels, suggesting that participants found it more difficult to carry out the task in the high-density condition. In terms of movement planning, the number of direction changes increased with density, and higher numbers of planned stops were observed at high density. Participants visually tracked agents more frequently at low density, whereas four reactive behaviours (avoiding and deflecting gestures, small reactions, and sudden stops) occurred more often at high density. Participant feedback generally supports the suggestion that higher density affected movement planning and execution; however, there was some evidence that a lack of realistic communication led to unnatural behaviours (for example, completely ignoring agents while carrying out the experimental task).

6 Discussion

Our work explores the impact of crowd density on user experience in VR settings; it complements and extends existing work by evaluating user affect and behaviour when participants are performing a task-based activity in a continuous social forces-based simulation. Our results show a significant increase in negative affect in high crowd density as well as significant reduction in participant movement speed, increased proximity, and changes in some types of behaviour. Here, we discuss our results with a focus on the impact of crowd density in VR settings; we consider the implications of our findings for models of crowd behaviour, applications of VR crowds, and we generalise our findings beyond our experimental setting.

6.1 The impact of crowd density in VR settings

A key motivation for our work is the potential to embed human participants into existing simulation models, for the purpose of developing a more detailed understanding of human behaviour in crowds of different densities. The advantage of including human participants is that they supply the full range of human response and behaviour, which existing agent models, being comparatively simple, do not capture (e.g. Moussaid et al. 2016). Crowd density is an important variable in real-life settings: evacuations or other extraordinary events are often characterised by unusually high crowd densities. Our results indicate a clear response to crowd density in VR, and we thus assert the utility of such simulation scenarios, and of the intentional creation of stressful or challenging crowd conditions, which could help to better inform processes such as evacuation planning, building design, event management, and training.

Our work contrasts with previous work in proxemics which uses either stationary participants or non-naturalistic control methods, such as game controllers. We used the HTC Vive platform, in which participants walk in a natural way while interacting with agents and are represented by a pair of hands with which they can interact with the environment. This adds a kinaesthetic dimension to the study: participants have a finer level of control over interactions, proximity, and behaviour, which they can express naturally through body movement and which are also subject to natural constraints such as balance.

We observed differentials in proximity by density, shown in Table 5: these appear small compared to the granularities used in proxemics studies, but they show a significant effect which is the result of interactions between participants’ response to proximity, the movement-based interface provided by the platform, and the simulation itself. We believe that these give a more useful and deeper insight into user experience and response to increasing crowd density than previous studies and can help guide the design of mixed human–agent simulations.

6.2 Leveraging self-reported affect as indicator of human response to crowd density

We again draw attention to the measurements and metrics used in our study. Previous works have mainly utilised participant presence (e.g. Witmer and Singer 1998), bespoke questionnaires, trajectory data, and bio-physical measures such as EDA. We have instead used the PANAS scale (Watson et al. 1988) which evaluates users’ affective state, rather than their perceptive or cognitive response. Previous work has correlated presence and emotional response (e.g. Diemer et al. 2015); however, we believe that the affect scale, which is widely used in experimental psychology and games user research, gives a direct and validated measure of experience and user outcomes from a design perspective. Moreover, unlike EDA, which measures arousal, PANAS is suited to experimental contexts with significant user movement (e.g. walking). We have also systematically categorised and quantified user behaviours (previously reported anecdotally), and we believe that such measures are useful additional tools for future study.

6.3 Reconsidering models of crowd behaviour

The combination of qualitative and quantitative data offers the opportunity to gain further insights into the use of models of crowd behaviour for human interaction, further addressing participants’ response to the simulation, agent behaviour, and interpretation of quantitative measures.

Many participants made comments about the general behaviour of agents which indicate a negative response or interpretation, for example, using descriptors such as “aggressive” or “rude”. These are direct responses to the simulation model, which, like most crowd simulations, is not explicitly designed for human interaction. This may be further exacerbated by an increase in artefacts such as rapid changes in agent direction and orientation, which are more common in higher densities and may influence user experience; however, participants did not comment on such artefacts. Together, these observations raise a number of questions around the design of simulations which incorporate human participants, which warrant further study.

A number of previous works have looked at the use of gestures and other interactions in VR simulations, and the layering of such interactions on top of existing simulation models may go some way to alleviating this problem in future. This is a potentially interesting direction of future study, with good results reported by, for example, Narang et al. (2016) and Kyriakou et al. (2016); however, as we have noted, combining such techniques with varying density conditions is not necessarily straightforward. There is potential for conflicting and confusing behaviours to arise, and also for seemingly positive interactions to produce negative effects when combined with other conditions. For example, experiencing eye contact may put the user at ease in a sparsely populated environment, but could be intimidating or upsetting when experienced repeatedly in a dense crowd. As mentioned, we have omitted such interactions in order to isolate user response to the simulation model and density conditions; however, the use of such interactions warrants deeper consideration.

6.4 Application areas for VR crowd simulations

Given that our work suggests an impact of virtual crowd density on the behaviour and affective state of individuals, we believe that it motivates the further study of simulations that prepare individuals for possibly challenging situations, for example, guiding evacuations. To this end, VR simulations could not only be used to induce negative affect through large crowds, but would also offer the opportunity to adapt environmental factors that can contribute to difficult human behaviour, e.g. lighting and visibility, environmental noise, and comparable factors.

Additionally, there are a number of other potential application areas worthy of further consideration. We note firstly that VR has been widely used in therapy, for example, for treatment of anxiety conditions (Opris et al. 2012). We believe that our results motivate further study of the use of crowd simulations either to directly treat anxiety associated with crowds, or to use crowds to induce higher levels of stress or anxiety in connection with other scenarios or tasks.

We also note that social force models are used outside of crowd simulations: for example, in robotics, such models are used to mediate human–robot interactions (HRI) (for example, Ferrer et al. 2013). VR crowd simulations could thus be developed as an effective research tool for HRI, for example, enabling the creation of extraordinary scenarios or the hypothetical deployment of large numbers of robots.

7 Limitations and future work

We believe that our results are both more general and more detailed than previous studies in VR crowd simulation. However, our findings need to be interpreted in the light of the following limitations. Our study employed a repeated-measures design that only offered short bouts of interaction with the VR simulation. To fully understand the impact of crowd density on human and agent behaviour, and possibly explore application to real-world scenarios, longer-term study of participants’ response to such simulations is warranted. Likewise, future work should explore the possibility of combining different measures of participant response to crowd simulations, for example, by combining self-report measures such as the PANAS questionnaire with physiological measures, and should integrate existing work that measures interaction between humans and agents, e.g. through gaze.

In terms of technology, the HTC Vive only offers a restricted walkable area (which is typical of room-scale VR systems). This currently limits the scale and types of scenarios in which a human participant can engage. We propose that other methods of locomotion (or combined methods) could be investigated to alleviate this restriction. Teleportation methods are commonly used in games for movement over distance; however, we consider that the ability to displace oneself might substantially alter the effects we have described in this paper.

In our study, we represented participants in VR using a pair of active hands. Some previous works (Peck et al. 2013) have shown that full body representation can significantly increase users’ sense of agency and ownership, and the use of such representations may alter users’ experience of virtual crowds. We also note that the HMD view angle is narrower than normal human vision. This is common with HMDs; however, in dense situations this could contribute to an increased sense of crowding, as peripheral vision is effectively reduced, or to some of the reported behaviours. For example, sudden reactions or gestures could, in some cases, be due to delayed perception of agents outside of the field of view.

Our findings (particularly comments made by participants) indicate that our choice of simulation model has had a direct effect on participants’ experiences. Indeed, this is unavoidable, as different models are likely to produce different behavioural artefacts. We used the social forces model by Helbing and Molnar (1995) as a baseline for user experience, as this model has been used extensively, and many adaptations have been proposed and are still being investigated by researchers. Predictive models (e.g. Karamouzas et al. 2009) might provide a more positive user experience, and we propose to make comparisons with such models as future work. Despite the limitations of using a single model, we suggest that our results in high-density situations generalise to other models, where close contact is unavoidable.

Finally, we believe that the success of mixed human–agent simulations in VR requires a new perspective on crowd simulation models. Currently, work in this area is focussed on determining whether agents’ interaction with each other appears realistic to a human observer. Such models also need to appear realistic when they are mediating interactions between agents and humans, so that participants can interact with them more naturally. This presents a quite different challenge for future work.

8 Conclusion

In this paper, we explored the impact of crowd density on user experience and behaviour in VR simulations. We contribute a study of user affect and behaviour using a task-based scenario in a continuous simulation, in which we use crowd density as the independent variable.

Our results demonstrate a significant increase in negative affect and reactive behaviour with increasing crowd density. This correlates with previous work in proxemics, and also with participants’ accounts of their feelings about real crowds, suggesting that participants respond to crowd density in VR in a comparable way to real crowd situations. This opens up new opportunities for refining simulation models and for applying VR to explore crowd simulation together with human behaviour. It also suggests that VR can be leveraged to create virtual experiences for studying human behaviour, with findings that can be extrapolated to human response and action in real-world settings.