1 Introduction

The term ‘Virtual Reality’ (VR) was coined by Jaron Lanier in the late 1980s, defined as a three-dimensional, computer-generated environment, in which people can immerse, explore and interact [32]. Emphasising users’ dynamic control of viewpoint, Brooks defines a VR experience as “any in which the user is effectively immersed in a responsive virtual world” [2]. VR has recently gained widespread attention for research and applications with affordable headsets, such as the Oculus Rift and the HTC Vive. Both are tethered to high-end PCs with cables, with a wide 110-degree field of view and high-resolution displays. At the same time, several mobile VR devices have been well-received. Two typical examples are the Samsung Gear VR and the Google Cardboard. These are wireless, light-weight and low-cost options, but are essentially cases with lens arrangements for mobile phones. Therefore they suffer from two key challenges: first, the refresh rates and resolutions are restricted by the mobile phone platforms on which they are delivered - and while we can assume this issue will go away as the technology miniaturises, the second issue is more likely to persist: limited options are available to control the device. While equipment like the Gear or the Google Daydream offers handheld controls, the same does not apply to systems like Google Cardboard. As the cheapest and most available form of VR, it is worth considering how we might interact with and control experiences in this limited context.

Meanwhile, museum and cultural heritage exhibitions are shifting from being collection-based and communicating history with labels, to being activity-oriented and engaging visitors with captivating experiences [12]. They are exploring the use of digital technology to attract visitors and to provide interactive experiences which are otherwise impossible with conventional museum technologies. Museums such as the Natural History Museum in London and the Acropolis Museum in Athens [16] are adopting VR technologies to create virtual exhibition experiences and applying AR technologies to develop augmented guide systems. In addition, Brown introduced a mixed reality system that enables collaboration between onsite and online visitors [3]. The use of VR in an exhibition context essentially comes in two forms: (1) situated experiences, where the VR equipment is set up in the gallery and visitors use that equipment to explore a curated exhibition; and (2) at-home experiences, often delivered on mobile phones, which either allow a user to ‘visit’ an existing exhibition remotely or recreate a historical environment.

As mobile phone-based VR becomes increasingly ubiquitous and familiar, the prevalence of such home VR exhibitions is only likely to increase. In the light of this, we suggest that it is necessary to understand the way in which the interaction in such experiences should be designed given the constraints of limited control. To begin to address this, we created a simple virtual exhibition of famous paintings and performed a user study to question users’ preferences about two aspects: movement control and access to additional information. The User Experience Questionnaire (UEQ) [18] and the Simulator Sickness Questionnaire (SSQ) [17] were adopted in the user study and interviews were carried out to discuss their preferences. The key findings of our research are (1) users prefer to be able to directly control their movement; (2) this does not make a notable difference to simulator sickness; (3) embodied guides may be a good way to deliver additional information in VR exhibition settings.

2 Related Work

Museums and cultural heritage exhibitions have traditionally been based on physical collections, providing static labels to communicate history and the story behind them. However, they are adopting activity-oriented presentations to engage visitors and offer more captivating experiences [12]. The deployment of digital technologies in the cultural heritage sector has significantly increased in the last few years, promoting the diffusion of culture by developing creative narratives to support education and recreation [34].

Researchers have investigated and exploited the immersive, interactive and imaginative nature of VR [5] to disseminate and present arts and humanities [7, 8, 11, 14, 25]. This includes the emergence of ‘Virtual Museums’, of which there are two main types [4]: one is the replication of an existing museum, which allows remote access to its digitalised exhibits; the other is the reconstruction of a lost archaeological site and activities that enables users to navigate and observe virtual objects, such as the Rome Reborn project [10]. Thanks to the recent advances in rendering techniques, the immersive systems could provide high-fidelity visualisations and simulate a virtual construction that is both physically correct and perceptually equivalent [15].

When immersed in a virtual environment, users need some navigation capability to move around and explore the environment, just as they would walk around in the real world. Aside from ‘walking’ in VR by specifying a direction of movement, users can also teleport, which could help reduce motion sickness by eliminating visible translational motion [1]. However, as teleporting is only afforded in the digital world, it could make the experience less natural. Synchronising real-world movements to VR is also feasible with external tracking sensors. This feature is usually supported by desktop VR equipment, but it is less feasible with mobile phone-based VR. Such systems also depend on the availability of a significant open physical space, which may be impractical for at-home use.

As in physical exhibitions, a virtual exhibition needs to consider whether and how to direct and maintain users’ attention towards different exhibits, for example, whether the experience should be a guided tour or a free look-around. The visiting experience varies when different navigation techniques are adopted. Previous designs reflecting navigation concerns include providing visual cues in the virtual environment, such as landmarks that help users know their location within the 3D environment [31]; using map menus that allow direct movement to a selected site of interest [4]; and allowing users to move around freely in the environment using a keyboard and mouse [6]. In many real-world museum experiences, visitors would find it difficult to navigate around the site without guidance. It is common practice for visitors to be shown a recommended visiting route by a tour guide, a map or an audio guide system. Similarly, a virtual museum visit could also be a guided experience in which the system takes over movement control and automatically shows users around. However, surrendering control in VR could contribute to simulator sickness when moving visual imagery drives a compelling sense of self-motion [19]. In addition, studies have shown that simulator sickness symptoms were reported at higher levels during passive viewing than during active control over movement in the virtual environment [27].

Most guide information in virtual museums has been presented in the form of 2D text and audio clips, which either remain in view as part of the scene or are triggered by a button. Chittaro et al. [6] presented an embodied guide that assisted users with exhibit information in a virtual museum tour. Although the guide was programmed to give a tour that encompassed objects in a certain sequence, which restricted users’ control over the information, it has been shown that an embodied guide could help draw users’ attention by providing visual cues [26], and help bring users more motivation and engagement [22].

Virtual museums have adopted different control techniques for movement and guide information, yet the effect of different control methods on user experience has not been well studied. In addition, while complex systems with high computational performance allow more control capabilities with joysticks and tracking sensors, there is less control available on mobile systems and low-cost equipment. The control on these devices is supported by buttons or touchpads embedded in the headset. In this paper, we propose that AR object and marker recognition techniques could be used as an efficient supplementary control method that allows a mobile computing device to detect and track the augmented object. This may allow for interaction without the need for joysticks or controllers, which may work well for applications delivered on self-contained mobile VR platforms, where the headset is used as the sole input and interaction method. Meanwhile, using marker recognition is similar to hand gesture control, which could support more intuitive interactions than using controllers. As indicated in [21], incorporating movements and gestures in the real-world environment could be a more intuitive interaction approach for mobile VR scenes. We therefore describe an experiment using AR-inspired marker recognition as a stand-in for gesture control, without introducing extra sensors on the headset or users’ hands. Marker recognition could be a cheap alternative that allows a degree of control in the absence of external control devices such as joysticks, particularly for mobile VR experiences.

Fig. 1. Control matrix of different guide approaches

Fig. 2. Top-down room view with movement and guide controls settings (Color figure online)

3 Experimental Virtual Museum

Figure 1 presents a matrix of different approaches that can be offered to users with regard to control over their movement and navigation within the VE and guide information. By fixed movement control, we refer to the situation where users are passively shown around and have little proactive control over movement, such as following a museum tour guide on a planned trajectory within a certain period. Fixed guide control refers to the case where users are given the information automatically, regardless of whether they have asked for it or not. This usually happens during a group visit, where a tour guide talks to the group and the user listens. A similar situation happens when users take a tour with a time-based or position-based audio guide system. The most common user experience in museums is free movement around the site with guide information provided by in-situ placards and externally-triggering audio guides. In Fig. 1 we have categorised these options as free movement and free guide, denoting options for allowing users to freely move around in a physical space and read or listen to guide information at will. As indicated in the matrix, tour guide and audio guide systems limit users’ freedom of movement and guide control while more flexible options such as labels are usually less interactive.

In order to explore user preferences for these control options, we developed a very simple virtual museum as a low-fidelity experimental setup, displaying famous paintings in a series of rooms, each of which offered a different combination of controls over both the users’ movement and access to additional guide information. The experience was built in Unity and presented using a Samsung Galaxy S6 mobile phone in a Gear VR headset. The different combinations of movement and guide controls were configured in four rooms mapping to the control matrix (see Fig. 1), which we distinguished visually by four different colours (see Fig. 2).

3.1 Guide Design

One innovation tested in this study is the idea of embodying [9] audio guides as a virtual character, as shown in Fig. 4. Rather than have the virtual tour guide visible all of the time and leading the user around the VE, it can be called upon when required during the visit. For example, the virtual guide can appear when the user is looking at a museum exhibit to give the user information about the exhibit, but does not visually accompany the user as they move around the museum. The core design principle was that the guide could be ‘carried around in the user’s pocket’ and taken out when needed to introduce the exhibits to users. To achieve this, we utilised an AR technique that recognised a code card held in the user’s hand and augmented the virtual guide onto the card (see Fig. 3). Several designs for the guide presentation in the VE were explored. The initial implementation mapped the character exactly to the card, including both its position and orientation. This makes sense when one can see a virtual representation of one’s hands. However, without that visual cue, it was difficult for users to relate the movement of their hand to the movement of the virtual guide, especially with the limited tracking rate of the phone’s relatively slow processor. We also considered placing the guide at a fixed position within the world space, such as the left bottom corner of an exhibit. This could reduce the visual distraction as users’ visual attention could then be more focussed on the exhibit located in the centre of the view. However, this option disassociated the virtual guide from the AR card - making it harder for users to have a clear mental model of how the guide works.

Ultimately we settled on mapping the guide to the position of the card but keeping it upright relative to the user’s viewpoint, regardless of card orientation. A low-pass filter was applied to smooth its movement and avoid flickering that could be caused by sudden changes in the position of the code card and the hand. The character was further embodied with some natural movement animations, such as idling, looking/turning around and talking.
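The form and parameters of the low-pass filter are not specified here. As a minimal sketch, assuming a first-order exponential filter applied to the tracked card position, the smoothing could look like the following. It is written in Python for brevity rather than as a Unity script, and the class name `PositionSmoother` and the smoothing factor `alpha` are illustrative assumptions, not values from the study.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class PositionSmoother:
    """First-order (exponential) low-pass filter for a tracked 3D position."""

    alpha: float = 0.2  # assumed smoothing factor in (0, 1]; smaller is smoother
    state: Optional[Vec3] = None  # last filtered position, None before first sample

    def update(self, measured: Vec3) -> Vec3:
        if self.state is None:
            # First sample: nothing to smooth against, so adopt it directly.
            self.state = measured
        else:
            # Move a fraction alpha of the way from the old state to the new sample.
            self.state = tuple(
                s + self.alpha * (m - s) for s, m in zip(self.state, measured)
            )
        return self.state
```

With a small `alpha`, a sudden jump in the detected card position is spread over several frames, which trades a little lag for less visible flicker.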

Fig. 3. Use code card to control virtual guide

Fig. 4. Virtual guide introducing Starry Night (https://www.mixamo.com)

3.2 Control Design

Guide Control. The guide’s presence is triggered by the system recognising a code card (a proxy for actual hand gesture recognition). Once the code card is recognised, the virtual guide is presented at the place where the card is recognised. When the guide first appears, it faces away from the user, allowing users to ‘show’ the guide the exhibit that they would like to receive information about. Once the guide has recognised the exhibit, it turns to face the user and begins to talk. The user can adjust the position of the guide in the virtual room by moving the code card in the real world. The audio can be stopped by hiding the code card to dismiss the guide character; however, simply looking away from the guide does not dismiss it - the action has to be deliberate (see Fig. 3). This is designed to alleviate the physical burden on the arm so that users do not have to hold the card all the time to keep the guide talking. It also helps prevent the audio information from being stopped accidentally while wandering around the room. In rooms with no guide control, the audio information is instead triggered automatically when users reach an exhibit and look closely at it.

Movement Control. For the control of movement around the VE, users were either moved on a fixed path (so-called ‘on rails’ movement) or controlled their own movement by interacting with the touchpad on the headset as follows: a single tap to move forwards; a backwards swipe to move backwards; and a single tap (again) to stop moving (see Fig. 5). The speed of movement was kept constant at a walking pace, slow enough to avoid simulator sickness caused by fast movement and to ensure a clear focus at all times. For the experiment, the speed was set to 5.0f in Unity. The direction of movement is consistent with the user’s head orientation, which is approximately their gaze direction. Using head movement for input has been shown to be effective, as it is easy to learn and does not tire participants [13].
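The touchpad scheme described above can be read as a small state machine. The following Python sketch illustrates one possible reading and is not the Unity implementation used in the study. It assumes that a tap also stops backward movement, that movement is restricted to the horizontal plane, and that the speed value of 5.0 is in scene units per second; the names `Mover`, `on_tap`, `on_swipe_back` and `step` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Mover:
    speed: float = 5.0   # assumed to be scene units per second
    direction: int = 0   # +1 forwards, -1 backwards, 0 stopped

    def on_tap(self) -> None:
        # A tap starts forward movement when stopped, otherwise it stops
        # (assumption: a tap also stops backward movement).
        self.direction = 1 if self.direction == 0 else 0

    def on_swipe_back(self) -> None:
        self.direction = -1

    def step(self, position: Vec3, head_yaw_deg: float, dt: float) -> Vec3:
        # Move along the horizontal projection of the head direction
        # (yaw 0 looks along +z; height is left unchanged by assumption).
        yaw = math.radians(head_yaw_deg)
        distance = self.speed * self.direction * dt
        x, y, z = position
        return (x + math.sin(yaw) * distance, y, z + math.cos(yaw) * distance)
```

Calling `step` once per frame with the current head yaw reproduces the behaviour that the user always travels in, or directly opposite to, the direction they are facing.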

Fig. 5. Movement control

Fig. 6. Virtual room layout

3.3 Room Settings

The virtual art gallery developed for this study comprised a tutorial room and four exhibition rooms, all with the same virtual room layout. Twenty-four paintings were randomly distributed across the exhibition rooms, with six paintings on the walls of each room (see Fig. 6).

In the pink room, users were moved on a fixed path and audio information was automatically played when the user reached each exhibit. The system moved the user to the next exhibit once the previous audio information finished. The experience ended after the user had been taken to all of the exhibits in the room. Users could turn their head freely to look at the exhibits on the wall.

In the yellow room, users were moved on a fixed path. Users had control over the audio guide using the AR code card. The system auto-moved the user to the next exhibit after a fixed amount of time (5 s plus the audio length) at the previous exhibit. The experience ended after the user had been taken to all of the exhibits in the room.

In the green room, users had control over movement and were able to navigate themselves using the headset touchpad, but they had no control over the audio information. This was triggered automatically with a proximity detector. Any currently playing audio ceased when they moved close to another exhibit. Users were free to stay in the room for as long as they wanted and so they could finish the experience at any time.

In the blue room, users were able to navigate themselves using the headset touchpad. They also had control over the audio information in the same way as in the yellow room. Users were free to stay in this room for as long as they wanted and finish the experience at any time.

3.4 Technology Setup

The system was developed in Unity with the Vuforia AR platform. In this study, it ran on a Samsung Galaxy S7 and Samsung Gear VR headset (SM-R322). The graphics were displayed at a 60 Hz refresh rate and a 96-degree field of view.

A Vuforia ImageTarget was created for the camera to recognise, and a 3D character was applied on top of it. The virtual guide character followed the position of the code card in the real world while the rotation of the character was disabled. The movement was smoothed to avoid character flickering caused by sudden changes of card position. An idle animation was played by default; turning-back and talking animations were played when the character detected an exhibit and started to introduce it. The triggering of the audio was implemented using a Raycast fired from the character’s eyes. When the ray hit an exhibit, the character ‘recognised’ it and began talking.
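In Unity, the recognition step would typically be a `Physics.Raycast` call against colliders attached to the exhibits. To illustrate the underlying geometry only, the Python sketch below intersects a ray with flat rectangular paintings and returns the nearest one hit. The `Painting` fields, the function names and the example dimensions are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple

Vec3 = Tuple[float, float, float]


def dot(a: Vec3, b: Vec3) -> float:
    return sum(x * y for x, y in zip(a, b))


def sub(a: Vec3, b: Vec3) -> Vec3:
    return tuple(x - y for x, y in zip(a, b))


@dataclass(frozen=True)
class Painting:
    name: str
    centre: Vec3
    normal: Vec3        # unit vector pointing out of the wall
    right: Vec3         # unit vector along the painting's width
    up: Vec3            # unit vector along the painting's height
    half_width: float
    half_height: float


def ray_distance(origin: Vec3, direction: Vec3, p: Painting) -> Optional[float]:
    """Return the ray parameter t at which the ray hits the painting, else None."""
    denom = dot(direction, p.normal)
    if abs(denom) < 1e-9:
        return None  # ray is parallel to the painting's plane
    t = dot(sub(p.centre, origin), p.normal) / denom
    if t <= 0:
        return None  # the plane is behind the ray origin
    hit = tuple(o + t * d for o, d in zip(origin, direction))
    offset = sub(hit, p.centre)
    inside = (abs(dot(offset, p.right)) <= p.half_width
              and abs(dot(offset, p.up)) <= p.half_height)
    return t if inside else None


def first_hit(origin: Vec3, direction: Vec3,
              paintings: Iterable[Painting]) -> Optional[str]:
    """Name of the nearest painting hit by the ray, or None if nothing is hit."""
    best = None
    for p in paintings:
        t = ray_distance(origin, direction, p)
        if t is not None and (best is None or t < best[0]):
            best = (t, p.name)
    return best[1] if best else None
```

A guide facing away from the user would cast such a ray each frame and switch to its talking animation as soon as `first_hit` returns an exhibit.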

Movement control was achieved by reading inputs from the headset touchpad and updating the position of the main first-person camera accordingly. With forward input, the system moved the user in the direction that they were facing; with backward input, it moved them in the opposite direction. The moving speed was set to a constant at approximately walking pace.

The exhibits displayed in the virtual rooms were 24 paintings. Twenty-four audio clips were recorded to describe the paintings, each lasting around 60 s. In addition, four instructional audio clips were recorded to be played at the beginning of each exhibition room to remind the users of the controls that were available to them.

4 Methodology

4.1 Participants

In total, 10 participants (6 male, 4 female) were recruited from a pool of university students, after appropriate ethical review. The experiment was conducted with two groups (A and B). Participants in Group A experienced the pink room first, followed by the yellow room, green room and the blue room. Participants in Group B experienced the rooms in the opposite order. The study lasted approximately 1 h. Each participant was compensated for their time with a voucher.

4.2 Study Design

The aim of the study was to investigate users’ preferences for the control of their movement and the control of the guide information in the virtual exhibition. The independent variables were the control of movement and the control of the guide information. The dependent variables were the user experience (measured by the User Experience Questionnaire), simulator sickness (measured by the Simulator Sickness Questionnaire), and users’ attitude towards the controls in the system (discussed in a post-experience interview). The configuration of user control over movement and guide in each room is presented in Fig. 2. The room visiting order followed a pattern of increasing control for users in Group A and decreasing control for users in Group B. Two pilot studies, one with an expert user and one with a naive user, were conducted to finalise system design details and questionnaire settings before recruiting participants.

Two questionnaires were used to measure user experience and simulator sickness. The User Experience Questionnaire (UEQ) [18] uses 26 pairs of contrasting attributes with a 7-point Likert scale to measure the attractiveness, perspicuity, efficiency, dependability, stimulation and novelty of the system. The Simulator Sickness Questionnaire (SSQ) [17] has 16 items that allow users to report possible symptoms related to nausea, oculomotor disturbance and disorientation, and to rate their severity on a four-level scale: none, slight, moderate and severe. A semi-structured interview was conducted to discuss users’ preferences of controls and attitudes towards system interactions.

Considering the potential risk of motion sickness when being in the virtual environment, users were informed that they could remove the headset at any time during the study if they felt uncomfortable. No user made this request. In any case, users were required to take off the headset at the end of each experimental task and so they spent no longer than ten minutes wearing the headset at one time.

4.3 Procedure

At the start of the study, users were asked to read the information sheet and sign a consent form. An overview of the study was provided with screenshots to explain the room layout, menu settings, movement interactions, and guide information control. Headset fittings were adjusted and the tutorial room was used to allow users to get familiar with the interaction methods. When they were ready to start the experimental study, users in Group A started with the pink room whereas those in Group B started with the blue room. Users from both groups were required to explore all four exhibition rooms and fill in the two questionnaires for user experience and simulator sickness after exploring each room. During their explorations in the room, their head positions and the time duration were logged to monitor the exhibits they looked at and to analyse their points of interest. In addition, they were asked if there was anything they found interesting about the exhibits in each room, either from the audio information or their observations. After having explored all four rooms, they were asked to rank the rooms based on their preferences. A semi-structured interview was conducted to discuss their rankings.

4.4 Data Collection and Analysis

Quantitative data included the two questionnaire responses collected from the users after visiting each room, as well as the gaze data captured during their exhibition visits.

The original UEQ uses a 7-point Likert scale for each item, where the middle value does not indicate any preference. In addition, some items such as ‘secure/not secure’ did not apply to the context of this system. Therefore, we adapted the questionnaire to use a 6-point Likert scale with an additional ‘Not applicable’ option. Data collected from this questionnaire were analysed using the analysis toolkit of Laugwitz et al. [18]. The analysis transformed the 26 items into scores for the six evaluation scales, which were compared with the benchmark containing data from 246 product evaluations with 9905 participant evaluations in total. For each scale, the benchmark classifies a product into five categories: Excellent (top 10%), Good (10% to 25%), Above average (25% to 50%), Below average (50% to 75%) and Bad (75% to 100%).

The SSQ results were analysed following the scoring procedure of Kennedy et al. [17]. Each symptom is scored as none (0), slight (1), moderate (2) or severe (3). The Total Severity (TS) score and separate scores for the three sub-scales (nausea, oculomotor and disorientation) were derived by adding the corresponding symptom scores and applying a constant weight. The score values do not have particular interpretive meanings, but were compared with the scale statistics based on the 1,100+ calibration samples provided by Kennedy et al. [17] (see Table 1).

Table 1. Descriptive statistics for SSQ scale provided by Kennedy et al.
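The weighted-sum scoring described above can be sketched as follows. The item-to-subscale groupings and the weights (9.54, 7.58, 13.92 and 3.74) are those commonly reported for the SSQ and should be checked against [17] before use; the function name `score_ssq` and the assumed item order are illustrative, and the sketch is in Python rather than taken from the study's own analysis.

```python
from typing import Dict, Sequence

# Assumed item order (1-based): 1 general discomfort, 2 fatigue, 3 headache,
# 4 eyestrain, 5 difficulty focusing, 6 increased salivation, 7 sweating,
# 8 nausea, 9 difficulty concentrating, 10 fullness of head, 11 blurred vision,
# 12 dizzy (eyes open), 13 dizzy (eyes closed), 14 vertigo,
# 15 stomach awareness, 16 burping.
NAUSEA_ITEMS = (1, 6, 7, 8, 9, 15, 16)
OCULOMOTOR_ITEMS = (1, 2, 3, 4, 5, 9, 11)
DISORIENTATION_ITEMS = (5, 8, 10, 11, 12, 13, 14)

NAUSEA_WEIGHT = 9.54
OCULOMOTOR_WEIGHT = 7.58
DISORIENTATION_WEIGHT = 13.92
TOTAL_WEIGHT = 3.74


def score_ssq(ratings: Sequence[int]) -> Dict[str, float]:
    """Score one completed SSQ; ratings are 0 (none) to 3 (severe) for 16 items."""
    if len(ratings) != 16 or any(r not in (0, 1, 2, 3) for r in ratings):
        raise ValueError("expected 16 ratings, each between 0 and 3")

    def raw(items: Sequence[int]) -> int:
        # Unweighted sum of the ratings belonging to one sub-scale.
        return sum(ratings[i - 1] for i in items)

    n = raw(NAUSEA_ITEMS)
    o = raw(OCULOMOTOR_ITEMS)
    d = raw(DISORIENTATION_ITEMS)
    return {
        "nausea": n * NAUSEA_WEIGHT,
        "oculomotor": o * OCULOMOTOR_WEIGHT,
        "disorientation": d * DISORIENTATION_WEIGHT,
        # Total Severity uses the sum of the three raw sub-scale sums.
        "total_severity": (n + o + d) * TOTAL_WEIGHT,
    }
```

Because some symptoms contribute to two sub-scales, the Total Severity score is not simply the sum of the three weighted sub-scale scores.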

Gaze information, inferred from a user’s head position, was captured using a script that logged when the user looked at an exhibit. The data were stored in a list and written to CSV files on the local storage of the mobile phone before users exited each room. This log file captured the identifier of the exhibit that the user was looking at, the absolute times when they faced towards the exhibit and when they looked away, and the duration of the gaze. The mean time that users spent in front of each exhibit was calculated and compared to the length of the audio information.
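A log of this kind, and the per-exhibit averaging, could be sketched in Python as below. The column names, the function names and the choice to average only over users who looked at a given exhibit are assumptions made for illustration; the actual logging script ran inside the Unity application.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable, List, TextIO, Tuple

# (exhibit identifier, time the user faced it, time the user looked away)
GazeEvent = Tuple[str, float, float]


def write_gaze_log(events: Iterable[GazeEvent], out: TextIO) -> None:
    """Write one user's gaze events for one room as CSV rows."""
    writer = csv.writer(out)
    writer.writerow(["exhibit_id", "start_s", "end_s", "duration_s"])
    for exhibit_id, start, end in events:
        writer.writerow([exhibit_id, start, end, end - start])


def total_time_per_exhibit(log: TextIO) -> Dict[str, float]:
    """Sum the gaze durations per exhibit from one user's log."""
    totals: Dict[str, float] = defaultdict(float)
    for row in csv.DictReader(log):
        totals[row["exhibit_id"]] += float(row["duration_s"])
    return dict(totals)


def mean_time_per_exhibit(per_user: List[Dict[str, float]]) -> Dict[str, float]:
    """Average each exhibit's total over the users who looked at it (assumption)."""
    sums: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for totals in per_user:
        for exhibit_id, seconds in totals.items():
            sums[exhibit_id] += seconds
            counts[exhibit_id] += 1
    return {exhibit_id: sums[exhibit_id] / counts[exhibit_id] for exhibit_id in sums}


if __name__ == "__main__":
    # Hypothetical events for a single user; exhibit identifiers are made up.
    buffer = io.StringIO()
    write_gaze_log([("P01", 0.0, 10.0), ("P01", 20.0, 25.0)], buffer)
    buffer.seek(0)
    print(total_time_per_exhibit(buffer))
```

The resulting means can then be set against each clip's audio length, as in the comparison reported in the results.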

In the interview, users were asked to comment on their preferences for the different control methods in the rooms. A theme-based content analysis method was applied to analyse the interview transcripts [23], with top-level categorisation divided into two groups: (1) movement control and (2) guide control.

5 Results

After experiencing the four rooms, users were asked to rank the rooms from 1 to 4 based on their preferences, where 1 indicates the room that they enjoyed the most. The responses are shown in Fig. 7. It can be seen that most users preferred the blue room, where they had control of both movement and the audio guide information. The second preferred room was the green room, where they were allowed to move freely but the guide information was automatically triggered. The pink room with no control was ranked third and the yellow room, in which they had control of the audio guide but no movement control, was ranked last.

Fig. 7. Users’ preference for room control (Color figure online)

Fig. 8. User experience results (Color figure online)

A similar pattern of results was revealed by the UEQ (see Fig. 8): the blue room (free movement, free guide) created the best user experience, with most scales falling into the ‘Excellent’ category. Next was the green room (free movement, fixed guide) with slightly lower scores in each scale than the blue room. Although there was no control allowed in the pink room (fixed movement, fixed guide), users reported a good experience in general. The yellow room (fixed movement, free guide) yielded the lowest UEQ scores on most scales, with scores generally at, or below ‘average’. In all rooms, the lowest rating was for ‘novelty’.

Analysis of the SSQ indicated that in general, the total severity of symptoms reported in each room was below the Kennedy standard average (M = 9.8, see Table 1), except the yellow room. In addition, the blue room resulted in fewer reports of discomfort and sickness symptoms; whereas the yellow room had the highest sickness score (see Fig. 9). These results correlated with the results from the UEQ as well as the rankings provided by the users (see Table 2). Fewer sickness symptoms were reported in rooms with a better user experience in general. Meanwhile, an analysis of symptom change did not indicate a significant consistent increase or drop in sickness throughout the experimental period for either of the participant groups.

Fig. 9. Simulator sickness scores

Table 2. Correlations between the preference, UEQ and SSQ

Figure 10 shows the average time users spent looking at an exhibit in relation to the length of its audio information. In the pink room and the yellow room, although users could look around freely and look at other places while positioned in front of a painting, they were moved around by the system on a fixed path. Therefore, the time users spent in front of each painting was roughly the same as the audio length. In rooms with free control over movement, the time users spent looking at the exhibits was less predictable because they could linger in front of the paintings they were interested in and skip those they were less concerned about. It is worth noting that the average time users spent in front of each painting in the green room was shorter than the length of the audio clips. Leaving exhibits half way through the audio due to a loss of interest could account for this. A more convincing reason is that users did not stop to observe exhibits they were not interested in, or accidentally triggered an exhibit for several seconds without intending to view it; these brief views were nevertheless included when deriving the average viewing time. Such cases were significantly reduced in the blue room, and false positives were avoided, because the audio there required intentional triggering. In the green room, the audio was triggered when the system detected the user facing an exhibit. Therefore the audio would start again if the user remained in front of the exhibit - something users commented on and disliked. When this happened, they would usually leave and move to another exhibit. Being unable to stop the audio information did not permit them to appreciate the exhibits without listening to anything. This was one of the reasons given in the interview to explain why they preferred the blue room (with control of the audio guide information) over the green room.

Fig. 10. Average time spent in front of each painting compared with the audio length (Color figure online)

Users suggested in the interview that they preferred to have control of their movement because they would like to move freely in the exhibitions, decide the order of visit and take their time. When they were moved on a fixed path, they felt that they were forced to look at the exhibits and listen to them.


Users considered the guide as a vivid embodiment of the audio information. They preferred to take control of the audio guide information and decide when to start or stop it, rather than having the audio triggered automatically.


In general, no significant sickness symptoms were reported and users enjoyed the VR experience. Some slight discomfort was reported regarding the movement directions, and further discomfort could potentially result from the duration of use and the headset design.


6 Discussion

In this section, user preferences for the exhibition rooms and their attitudes towards the control over movement and guide are discussed. Comments on simulator sickness that could be caused by the interaction design are also presented with a view to supporting future system designs for virtual exhibitions. The quantitative data from UEQ and SSQ indicated that rooms with both movement control and audio guide control provided better user experience and caused less sickness or discomfort.

6.1 Movement Control

In general, most users said that they preferred the control of their own movement, allowing them to choose their order of viewing and take their time.

As for the method of control over movement, users commented that “it is easy to pick up” and “the interaction is quite intuitive”. However, one user suggested swiping forwards to move forwards, as it provides a better mapping between the direction of movement on the touchpad and in the virtual room. It was observed that other users also tried to control movement in this way, especially when they wanted to go forwards after a backwards move. ‘Tapping’ was selected to trigger forward movement because the directions on the touchpad were not visible to the users. However, this comment and the observed behaviour suggest that a more intuitive system based on direction may be preferable. From an ergonomics point of view, the design of the movement control should consider the natural mappings and the conceptual model the user might have in mind [20, 24], but at the same time, the affordance, visibility of control and the learning cost should be taken into consideration [33].

Some restrictions on the types of movement made available with a touchpad control were discussed in the interview. For example, in a real-world exhibition, users may look at an exhibit and navigate from one side of it to the other - and this happens quite often. The direction we are looking in is not always consistent with the direction we are walking in. However, the movement direction in the virtual world is determined by the user’s head direction, which is roughly the gaze direction. Therefore one cannot move from side to side in this system. One possible implementation of this feature could be to use a gyroscope to adjust positions through head tilting. However, the effect of this needs to be explored, as a disconnection between the movement direction and the facing direction could be an inducer of simulator sickness [19]. Another issue raised was the distance between the user and the exhibit. In the real world, a guardrail or at least a warning sign would prevent users from going too close to an exhibit. Since there is no concern about the exhibits being damaged or stained in the virtual world, users are allowed to go as close as they like. One user commented that the downside of this is “it is unnatural to go too close to the painting, we are like in search of pixels in there”. It was also suggested that movement should slow down as the user approaches an exhibit. Given the lack of resolution in VR environments, it is arguably desirable to be able to move closer to the object than one might in the real world.
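The head-tilt and slow-down suggestions above can be sketched as follows. This is a minimal illustration under stated assumptions, not the system used in the study: the function name `movement_step`, the roll dead zone, the speeds and the slow-down radius are all hypothetical values chosen for the example. Forward motion follows the head yaw, a head roll beyond a dead zone adds sideways motion, and speed is scaled down linearly once the user is within a given radius of the nearest exhibit.

```python
import math


def movement_step(yaw_deg, roll_deg, forward_input, dist_to_exhibit,
                  base_speed=1.0, roll_dead_zone_deg=10.0,
                  slow_radius=2.0, min_scale=0.2):
    """Return the (dx, dz) velocity on the ground plane.

    All parameters are illustrative assumptions:
    yaw_deg          head yaw; 0 faces +z, positive turns towards +x
    roll_deg         head roll (tilt); positive tilts to the right
    forward_input    1 = forwards, -1 = backwards, 0 = no input
    dist_to_exhibit  distance to the nearest exhibit
    """
    yaw = math.radians(yaw_deg)
    # Unit vectors along the facing direction and to the user's right.
    fwd = (math.sin(yaw), math.cos(yaw))
    right = (math.cos(yaw), -math.sin(yaw))

    # Head tilt beyond the dead zone triggers a sideways step.
    strafe = 0.0
    if abs(roll_deg) > roll_dead_zone_deg:
        strafe = math.copysign(1.0, roll_deg)

    # Slow down linearly inside slow_radius, never below min_scale.
    scale = max(min_scale, min(1.0, dist_to_exhibit / slow_radius))
    speed = base_speed * scale

    dx = speed * (forward_input * fwd[0] + strafe * right[0])
    dz = speed * (forward_input * fwd[1] + strafe * right[1])
    return dx, dz
```

For example, `movement_step(0, 20, 0, 10)` yields a purely sideways velocity while the user keeps facing the exhibit. Whether such decoupling of facing and movement direction increases simulator sickness would still need to be evaluated, as noted above.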

Meanwhile, as a digital representation, the virtual world could offer more possibilities than the real world [28]. Users identified some features that the system could possibly have, such as a zoom-in function to allow the paintings to be observed in even more detail. In addition, it was asked whether the navigation could be designed so that looking at a painting for several seconds would automatically move the user close to that exhibit. This gaze-based approach is currently one of the most widely used for menu selection in VR systems. A dot in the virtual world indicates the object to be selected, and a loading bar is usually presented along with it to provide instant visual feedback. Users may have derived these proposals from their experience of picture viewing and game navigation, which indicates that these interaction approaches could be accepted if adapted to the virtual world.
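The gaze-based selection described above can be sketched as a simple dwell timer. This is a hedged illustration only: the class name `DwellSelector`, the two-second dwell time and the string identifiers for paintings are assumptions made for the example, and the sketch is independent of any particular VR toolkit. Each frame, the application passes in the object currently under the gaze dot (or `None`) and the frame time; the returned progress value could drive the loading bar.

```python
class DwellSelector:
    """Select a target once the user has gazed at it for dwell_seconds."""

    def __init__(self, dwell_seconds=2.0):
        self.dwell_seconds = dwell_seconds
        self.target = None   # object currently being gazed at
        self.elapsed = 0.0   # seconds spent on the current target

    def update(self, gazed_target, dt):
        """Advance the timer by dt seconds.

        Returns (progress, selected): progress in [0, 1] for the
        loading bar, and the selected target or None.
        """
        # Looking at something else (or nothing) restarts the timer.
        if gazed_target != self.target:
            self.target = gazed_target
            self.elapsed = 0.0
        if self.target is None:
            return 0.0, None

        self.elapsed += dt
        progress = min(1.0, self.elapsed / self.dwell_seconds)
        if progress >= 1.0:
            selected = self.target
            # Reset so a new selection needs a fresh full dwell.
            self.target = None
            self.elapsed = 0.0
            return 1.0, selected
        return progress, None
```

In use, a returned selection would trigger the action the users asked for, such as moving the viewpoint close to the selected painting.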

A natural and intuitive way of interaction should consider the potential mental model of users and the visibility of the control, as well as the learning cost in the system context. Although virtual movement has many restrictions compared to movement in the real world, as a digital representation it offers many possibilities for interaction. It is also known that there is a relationship between methods of movement and simulator sickness in VR settings [19]. In the rooms with no movement control (pink and yellow), users were moved along a fixed path, part of which moved them from left to right. It was reported that this caused some instances of discomfort. This is entirely understandable, as the direction of movement was inconsistent with the direction in which they were looking, and sensory disconnect is a well-known cause of simulator sickness [29]. Additionally, some users commented that after they finished the sessions, they did not know their orientation with respect to the room they were sitting in. Of course, room-scale movement provided by systems such as the HTC Vive sidesteps many of these issues; however, many current VR systems, especially those delivered on mobile phone platforms, are unlikely to provide such a facility in the near future.

In this study there does not appear to be a difference in reported levels of simulator sickness symptoms between rooms where the user is moved (fixed path) and rooms in which the user controls their own movement (free path), possibly because the movement is fundamentally similar in both cases - being a smooth translation of position. It would be interesting to compare full ‘on rails’ movement, controlled free movement, and the so-called ‘room scale’ movement as epitomised by the HTC Vive system.

6.2 Guide Control

The guide acted as an embodied controller of the audio information for the exhibits. Generally, users wanted to have their own control of the audio guide, because there were cases where they found it annoying that they could not stop the audio information or that the audio started straight away. When asked about their real-world museum and gallery visiting behaviour, some considered museums and galleries to be places for knowledge and said they would like to acquire information where possible, while many suggested they would like to take their time looking at the exhibits and may only look for information when they are interested in them. In particular, some deemed art appreciation to be a personal experience and would feel bothered when people keep talking and introducing exhibits to them. Therefore, they preferred to take their own control of the audio guide and decide when to trigger and stop the audio information.

As for the guide character, analysis of the interviews indicated that users favoured it for the following reasons. First, the look of the character was interesting, and the animations added some semblance of ‘intelligence’ to it. In addition, they considered the character to be an embodiment of the audio guide, which explained why and how they were receiving the audio information and indicated where the voice was coming from [9]. Moreover, some assumed the guide was an artificial intelligence because it could ‘recognise’ the painting the user was standing in front of and started talking to them. The idea that even a simple embodied guide suggests intelligence stands as an indicator that there may be a place for actually intelligent agent-based guides, which could converse more dynamically with the user, such as the virtual humans described in [30].

When listening to the audio information, some users preferred to dismiss the guide character while the audio was playing. They found the character distracting because, when it was in view and animated, they were more inclined to look at it. Similar situations could occur with a real-world tour guide or expert as well - but of course, the real world does not offer the option of dismissing them as easily! When a person is talking to you, a natural social response is to look at their eyes to show attention. Providing users with the flexibility of keeping or excluding the guide character from the view could suit different requirements and enhance the user experience in general.

In addition, the embodied interaction in the guide control was shown to have a positive impact on the user experience. Dourish introduced tangible and social computing in [9]. Tangible computing is concerned with physical interaction in an augmented environment and uses computation as part of the physical world, while social computing uses social understandings of interaction to enhance the interaction with computation. Dourish argued that both exploit the sense of familiarity to smooth the interaction and are founded on the same idea of embodiment. In this study, the audio information was presented to the users through the animated guide character, which involved users’ proactive interaction with the physical marker and fitted users’ mental models of a guide’s activities in an exhibition. Embodied interaction that users are familiar with could enhance their user experience and contribute to engaged participation and comprehension of the digital content.

7 Conclusion

In this paper, we have presented a simple study to look at aspects of user control related to movement and audio guides in VR exhibitions. The results suggested that users prefer to have control over both movement and the audio guide. However, an increase in control does not necessarily result in an enhancement of user experience or a decrease in simulator sickness symptoms. When users were moved on a fixed path, they preferred to have everything automatically controlled, rather than taking control of the audio information alone. From the interviews, we surmise that users would prefer to walk freely and decide their own order of visit in an exhibition. They would ask for information about an exhibit when they are interested in it, and they find it overwhelming and annoying to be automatically provided with all information about every exhibit. On a second visit, they would like to look at the exhibits without having to hear about anything. Therefore, having control of both the movements and the audio guide is preferred.

We have also shown that there may be benefits to providing an embodied virtual guide, with users generally reacting positively to its presence, while appreciating the control of being able to call or dismiss it at will. We were surprised by the amount of ‘intelligence’ users attributed to the guide.

The key contributions of this paper are as follows:

  • Experimental evidence that users prefer the direct control of movement in VR exhibitions, but that there is no notable difference in simulator sickness between the user-controlled and fixed-path conditions.

  • An example of the use of an embodied virtual guide, which suggests that this may be a good alternative way of providing users with information in VR exhibitions, different to traditional audio guides or labels.

  • Experimental evidence that users would prefer the option to dismiss supporting information at will.