Real-Walk Modelling: Deep Learning Model for User Mobility in Virtual Reality

Abstract

Understanding human interactions in virtual reality (VR) can help develop intelligent applications that adapt to users' needs and enhance the user experience. The significant growth of VR content has increased the complexity of virtual environments, making their spatial characteristics more difficult to understand. While user mobility is a crucial part of users' interactions with the VR environment, the current literature still does not provide a suitable framework to interpret and model VR user mobility data. We conducted a user experiment in the context of an abstract VR painting exhibition, where users were prompted to walk naturally in a physical area to explore the VR painting. Deep learning models are used to model user mobility sequences and predict users' future movements while they engage with the art exhibition. Our user mobility model can support the development of new VR applications for improved user navigation and social experience in VR.


Introduction
VR tends to imitate reality; most applications seek to simulate the real world by adding more elements and richer visualisation. The development of VR headsets has extended the virtual environment (VE) to embed more human characteristics inside VR, such as eye tracking, hand gestures, and body movements. This ongoing development of VR headsets towards realism adds ever more physical characteristics to the VE.
Within a typical VR environment, users can use different navigation methods, including teleporting or walking via VR controllers, and the real walk, where users walk in a physical space while wearing the VR headset. Navigation is thus a fundamental attribute of user interaction in VR. Research shows that the effectiveness of VR navigation is often determined by the user's previous experience [1]. A common challenge of VR navigation is users getting lost while exploring an open virtual environment. Machine learning approaches have been used to improve users' navigation experience in VR. For instance, Alghofaili et al. developed a deep learning model to predict when users need navigation help and adaptively aid them in finding the right way [2]. The results demonstrated the potential to improve user engagement in virtual navigation while effectively guiding users to their destinations.
The real walk is a vital aspect of the immersive experience, adding a stronger sense of reality to the virtual environment [3,4]. It is the most immersive mobility approach for navigating VR, as it gives participants a greater sense of presence [1,5,6]. Therefore, modelling users' real walk is a crucial aspect of studying user interactions in VR and of supporting the future development of intelligent applications that can adapt to users' needs and preferences.
We piloted a VR real-walk study in the context of fine art VR painting exhibitions, where the audience can explore paintings through free walking. To this end, we teamed up with a fine art artist and developed a mobility experiment based on a large-scale abstract VR painting. The study's main goal is to investigate and model human movements during free-walk interactions with a VR artwork. Thus, we involved a range of human-related features in this experiment. The VR painting was created by the artist Goodyear using Google Tilt Brush [7]. The experimental environment was developed using the Unity3D game engine with a combination of hardware sensors and software tracking tools. It enables us to collect user data with eye gaze and body movement tracking capabilities to capture fine-grained user interactions. The experiment was carried out with 35 participants invited to explore an abstract VR painting while freely walking within a 4×4 meter physical space. The collected data include time-coded eye gaze, head orientation, hand movements, mobility, and voice comments.
Our study on user mobility in VR aims at supporting human-driven navigation tools for people new to VEs, so that they can navigate the VE profitably. We collected eye gaze, head elevation, and mobility movements in a purposely designed user experiment. These data have been used to train and evaluate a deep learning (DL) classifier model. The classifier model takes a series of a user's historical movements as input and predicts the user's next location. The remainder of this paper is organised as follows. Section 2 discusses the background and related work in VR art, behavioural tracking, and modelling in VR. Section 3 introduces the authors' VR artwork, experimentation system, and user experiment. Data analysis and modelling are discussed in Sections 4 and 5. Section 6 concludes the paper.

Background and Related Work
There is an increasing adoption of alternate reality platforms by content creators and visual artists worldwide. Blortasia is an abstract art world in the sky where viewers fly freely through a surreal maze of evolving sculptures [8]. The authors believe that exploration through art and nature reduces stress, anxiety, and inflammation, and has positive effects on attitude, behaviour, and wellbeing. Hayes et al. created a virtual replication of an actual art museum with features such as gaze-based main menu interaction, hotspot interaction, and zooming/movement in a 360-degree space. The authors suggested that allowing viewers to look around as they please and focus their attention on the interaction happening between the artwork and the room is something that cannot easily be replicated [9]. In [10], Battisti et al. presented a framework for a virtual museum based on the use of the HTC VIVE. The system allows moving in the virtual space via controllers as well as by walking. A subjective experiment showed that VR, when used in a cultural heritage scenario, requires the system to be designed and implemented with multi-disciplinary competences such as arts and computer science.
The recent COVID-19 outbreak has greatly impacted the art and cultural sector in many parts of the world. Many galleries and museums closed indefinitely, while exhibitions and auctions were postponed or cancelled. This triggered a new wave of exploring alternative digital spaces with online and VR exhibitions [11]. With physical spaces no longer the priority, the cultural sector rushed to adapt events, exhibitions, and experiences for an entirely digital-first audience. In America, the Art Institute of Chicago and the Smithsonian are among the institutions that embraced VR and took on new significance as the lockdown deepened [12].
Pfeuffer et al. investigated body motion as a behavioural biometric for virtual reality, either to identify a user in the context of authentication or to adapt the VR environment to users' preferences. The authors carried out a user study where participants performed controlled VR tasks, including pointing, grabbing, walking, and typing, while the system monitored their head, hand, and eye movement data. Classification methods were used to associate behaviour data with users [13]. Furthermore, avatars are commonly used to represent attendees in social VR applications. Body tracking has been used to animate the motions of an avatar based on the body movements of its human controller [14]. In [15], full visuomotor synchrony is achieved using wearable trackers to study implicit gender bias and embodiment in VR. Gender-based eye movement differences in indoor picture viewing were studied using machine learning classification in [16]. The authors discovered that females have a more extensive search, whereas males show more local viewing.
The real walk is the most familiar travelling method for humans. It gives humans a greater sense of presence and lets them navigate the surrounding environment naturally [1,3,17]. Virtual reality environments help users conceptualise a spatial reality. Different locomotion techniques within the virtual model can influence how people conceptualise that spatial reality [18]. In a virtual reality environment, the effect of locomotion on spatial cognition has already been observed in many studies. Different navigation techniques cause different levels of spatial awareness [19]. In [20], researchers examined human eye-head coordination in virtual reality versus physical reality. The results showed that users move their heads more often in VR than in physical reality.
Locomotion in VR is designed to fit the virtual environment and to control the locomotion experience in a realistic and functional manner. The direction of locomotion is determined by the head-mounted display, supporting backward, forward, or sideways movements [21]. The technology must be tailored to the body movements of a human being and understand the meaning of every command. For instance, finger movements such as pointing, curling, or straightening the fingers help individuals carry out real-life experiments using virtual reality. A data glove worn on an individual's hands helps to pass commands virtually in real time, just like real-life experiences [21,22].
To improve on earlier methods for 3D geometry estimation and reconstruction, machine learning and deep learning techniques are very tempting and desirable options [23][24][25]. To quantify and measure VR sickness during adaptive interactions in the virtual environment, an LSTM-based model was proposed using dynamic information from normal-state posture signals [26]. Other researchers have discussed the telepresence of participants from the perspective of behaviour understanding [27]. Abtahi et al. [28] proposed different methods to enable walking in VR; they found that the experience is more immersive when users are at ground-scale level while the walking speed is increased so that they can reach more locations in VR. Physical navigation (including head/body movements) is essential to improve user interaction and engagement in VR applications [29]. Recently, LSTMs have made a good contribution to locomotion prediction in VR [30]. An LSTM has been used to predict the position 2.5 seconds ahead of the current one. The research reported an average prediction error of 65 cm.

Experimental Design
To collect the required data for VR mobility research, we considered abstract VR painting as the use case for constructing the virtual environment. VR paintings often see the most unpredictable movements from the audience, as each person has their own favourite perspective and viewing habits when it comes to art exploration. The experimental VE consists of a 3D exhibition room with a large-scale abstract VR painting that is made of tens of thousands of brushstrokes. The VR artwork occupies one half of the room, opposite the participants' starting position. Participants can freely change their location to observe different parts of the artwork from different viewing angles. Participants can also walk into the painting to explore the extensive content behind the brushstrokes on the outside.
The design of our experiment endeavours to employ more human factors to investigate their effect on understanding behaviour. Thus, we used the HTC VIVE Pro EYE as the primary headset for the experiment. The Tobii-based eye tracker embedded in the headset allows eye-tracking data collection and mapping of gazed objects. Moreover, the headset provides head orientation and position tracking using external base stations. We attached a Leap Motion to the VR headset to track hand movements and reactions for follow-up research. The experimental system also supports the FOVE 0 headset, which comes with built-in eye tracking.

Virtual environment
For this research, we designed indoor conditions to construct the VR environment. There are three main elements in the scene: the abstract painting, the virtual space, and the lighting. The abstract painting is the core element of the environment. Goodyear is a professional VR artist who created the VR painting for this experiment [31]. She has held several VR artwork exhibitions in public galleries. Goodyear uses Google Tilt Brush to create VR artwork. The painting selected for the experiment consists of several brushstroke types in a range of colours. The brushstrokes are constructed in a virtual space that allows participants to walk through them. They take different shapes and styles and appear under different lighting conditions. The artist aims to investigate how participants split their attention among these brushstrokes. In previous work, we studied user attention modelling and eye-gaze based community generative art [7,32]. In this paper, we focus on user interactions related to walking and navigation.
The virtual environment was also designed based on the artist's requirements for how the artwork should be perceived and interacted with by the audience, alongside other environmental settings such as lighting and scaling. The VE consists of a 3D room with black walls and a paint palette as the floor (as shown in Figure 1), on which the painting is placed. The room is scaled to suit the artwork, and the paint palette is adjusted to serve as the walking terrain of the environment. The lighting is one of the most critical elements for conveying the appearance of the painting. It has been customised to produce a bright view over the different brushes. The experiment was arranged in a university public space. Since the experiment aims to study human mobility and behaviour, the authors emphasised having appropriate space and conditions to achieve the research aim. The VIVE Pro EYE comes with two tracking base stations that cover a distance of up to 4 meters, which defines the research space of 16 square meters shown in Figure 2. We designed the physical space to match the virtual space of the artwork. The benefit of this matching is that it elicits a more immersive experience while participants navigate in VR.

Physical Space Observations
The experiment space was surrounded by belt barriers that stop participants from travelling beyond the edges of the physical experimental space. Safety measures were taken to ensure sufficient precautions for participants, as part of the experiment was conducted during the COVID-19 pandemic. Overall, the experiment attracted 35 participants, 20 female and 15 male (Figure 3). The user information shows that the vast majority of the participants are aged between 16 and 25. More than half of the participants stated that they do not play or rarely play computer games (MD - many times every day, OD - once a day, OW - once a week, RL - rarely, NA - not at all). Regarding their experience with VR, 15 had not tried VR before, while 18 had some experience. Only 2 participants claimed to be very experienced with VR. Similarly, only 3 participants, who studied Fine Art, had extensive knowledge of abstract painting, while 18 participants were familiar with this form of artwork (Figure 3).

User movements
The experiment produced a dataset that includes head orientation, position, eye tracking, and hand tracking [33]. The dataset was gathered using a range of sensors, including the headset, the position-tracking base stations, and the Leap Motion. The raw data were collected as follows. Head orientation is one of the headset parameters; it represents the head rotation in the virtual environment as a quaternion with four coordinates (x, y, z, w). The player's position is represented by two vectors, in the virtual and physical world, as (head_x, head_y, head_z) and (player_x, player_y, player_z) respectively. The Pearson correlation between these vectors is 0.99, which indicates that the tracking of the player in the virtual and physical world is closely mapped. These vectors are captured every frame during the experiment. The data are also labelled and timestamped to record each participant's journey.
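As a sketch of this virtual-to-physical consistency check, the two position traces can be compared with a Pearson correlation. The synthetic paths and the (x, z) column layout below are illustrative assumptions, not the study's actual data:

```python
import numpy as np

# Hypothetical per-frame traces: one row per captured frame, with the
# virtual head position and the tracked physical player position.
rng = np.random.default_rng(0)
t = np.linspace(0, 60, 600)                       # 60 s of frames
head = np.column_stack([np.sin(t), np.cos(t)])    # synthetic virtual path
player = head + rng.normal(0, 0.01, head.shape)   # physical path, small noise

def position_correlation(a, b):
    """Pearson correlation between two flattened position traces."""
    a, b = a.ravel(), b.ravel()
    return np.corrcoef(a, b)[0, 1]

r = position_correlation(head, player)
print(f"Pearson r = {r:.2f}")  # close to 1 when tracking is well mapped
```

With small tracking noise the correlation is near 1, matching the 0.99 reported for the real traces.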
Figures 4 and 5 show the walk paths of participants. The blue lines mark the edge of the artwork at ground level. The area to the left of a blue line is where the artwork resides, while the area to the right is open space. The orange lines trace users' walks inside the virtual environment. All users started the walk outside the artwork. The two coordinates used to generate these figures are (head_x, head_z), which represent the head position within the experimental area. It is evident in the figures that participants had different and distinctive walking patterns. Some participants preferred to stay within a small area and mainly viewed the artwork from a distance, as they would behave in a physical art gallery (e.g., p3703 and p1679). Others enjoyed exploring wider areas but chose to stay outside the painting and avoided too much direct virtual contact with any brushstrokes (e.g., p2654, p7613 and p7075). There were also participants who were very adventurous and walked deep into the artwork (e.g., p3425 and p4786).
The lack of user interactions from some participants reflects a major challenge in designing VR applications for an open virtual environment. We believe that many participants did not have sufficient knowledge and confidence in navigating the VE, especially when exploring an unfamiliar environment such as a new abstract VR painting. We analysed the raw data from the participants and compared it to the recorded comments and interview questionnaire. Some participants preferred to dive into other objects and attempted to interact by touching brushstrokes. Also, during the walk, participants sometimes lowered their bodies to get a different view of the artwork, which is reflected in the changes of their head elevation (head_y). There also appears to be a connection between the change of head elevation level and the intensity of eye gaze when participants lower or raise their heads to get closer to some objects in the scene. The natural walk generates small elevation waves that can be recognised as a walking pattern. This pattern can help differentiate between walking and head movements while standing. When participants navigate the VE and change their location using the free walk, their views of the artwork change accordingly.

Data processing
Data processing is one of the essential operations applied to the data to prepare it for modelling. The dataset was gathered from multiple sensors at various time points during the experiment. One of the first priorities is to ensure the data from all devices are synchronised. Synchronisation ensures a consistent timestamp across the captures of the multiple sensors. All data have been translated to match the game time in order to map participants' actions to their interactivity. Data pre-processing steps such as handling missing values and removing anomalies, bias, and outliers are applied to prepare the data for the next stage of processing. Human data processing is a complex task.
The collection of such data usually takes place in an experimental environment. Data cleaning includes removing redundant data and repetitive attributes from the dataset and ensuring the data have the appropriate format. We chose to remove incomplete data entries from the dataset instead of using filling techniques to extrapolate missing values, as filling techniques are not appropriate for human activity data.
Removing outliers from the data is challenging because the measurements fluctuate throughout the experiment, so there is no stable reference level for the walking data flow. Sensors generate outlier values for different reasons, such as bystanders occasionally blocking a sensor during the experiment, participants stepping out of the tracking range, electromagnetic interference, or geometry desynchronisation. We applied statistical moving windows of fixed duration over the data series to extract outliers and improve the quality of the dataset. To ensure all sensor data carry correct, sequential timestamps, we followed mining techniques that verify the data with a statistical approach, using a threshold on the change of body velocity. For each position in the data series, we created two windows with the same number of data frames: the previous window w_P and the next window w_N. We considered a static window size to ensure the span covered is relevant to the distribution of the data, and the window edges introduce no sharp change in the root variance of the window. The mean velocity of a window of length L is calculated as

    v(w) = (1/L) Σ_{i=1}^{L} Δv_{p_i},    (1)

where Δv_{p_i} is the change of velocity between consecutive positions in the window during the time interval t. We then compare the changes of the current participant position P_c against the previous position P_{c−1} and the next position P_{c+1}, checking that they remain within the range of directions and velocities of w_P and w_N. The change of velocity is verified by calculating the root variance of the locations over the playtime,

    σ² = (1/n) Σ_{i=1}^{n} ||P_i − μ||²,    (2)

where σ² is the squared change in the mobility of a participant, n is the total number of captured frames per participant, P is the set of positions with each P_i = (x, z), and μ is the mean of the set P.

Following the data pre-processing stage, feature extraction was performed on the raw experimental data. The raw data collected from the experiment are captured per frame (Figure 6(a)). At different time slices, we found different numbers of captured frames, which can bias the results, as the density of data may vary across experiments. To remove this bias, data samples are aggregated using windows with a duration of one second. In each window, we calculate the statistical features used for modelling. As a result, we obtain a dataset with the same number of data samples across all participants based on a unified timestamp, as shown in Figure 6.
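The windowed velocity check can be sketched as follows. The window length, threshold, and synthetic walk are illustrative assumptions rather than the study's exact parameters:

```python
import numpy as np

def window_velocity(pos, times, lo, hi):
    """Mean speed over a window of positions (frames lo..hi, exclusive)."""
    d = np.linalg.norm(np.diff(pos[lo:hi], axis=0), axis=1)
    dt = np.diff(times[lo:hi])
    return np.mean(d / dt)

def flag_outliers(pos, times, L=10, thresh=3.0):
    """Flag frame c as an outlier when the speed implied by jumping to it
    exceeds `thresh` times the mean speed of the previous/next windows."""
    flags = np.zeros(len(pos), dtype=bool)
    for c in range(L, len(pos) - L):
        v_prev = window_velocity(pos, times, c - L, c + 1)   # window w_P
        v_next = window_velocity(pos, times, c, c + L + 1)   # window w_N
        step = np.linalg.norm(pos[c] - pos[c - 1]) / (times[c] - times[c - 1])
        if step > thresh * max(v_prev, v_next, 1e-9):
            flags[c] = True
    return flags

# Synthetic walk at ~1 m/s with one tracking glitch injected at frame 50.
times = np.arange(100) * 0.1
pos = np.column_stack([times * 1.0, np.zeros(100)])
pos[50] += [5.0, 5.0]   # sensor glitch: sudden jump of several meters
print(flag_outliers(pos, times).nonzero()[0])  # the glitch frames are flagged
```

The same per-frame trace can then be aggregated into one-second windows (e.g., mean position and speed per window) to obtain the unified sampling described above.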

Clustering Walk Data
To investigate how participants navigated in VR and how they switched between standing, walking, and elevating in different locations, we defined three primary keys behind the mobility of users in VR: 1) the user's personal background, such as their previous VR and gaming experience, 2) the virtual environment, and 3) the physical setup. The virtual environment includes the design of virtual elements and environmental characteristics, while the physical setup includes the headset and the physical space that accommodates the experiment. In this research, we employed these keys in the experiment to develop a dataset that reflects the use of virtual and physical spaces.
Participants moved to explore different parts of the artwork; thus, any single position in the VE can be an individual viewpoint with different brushstrokes and environment conditions. Neighbouring positions show similar views as a group (scene), but the views are distinctive between groups. To comprehend the relation between the VE and participants, we mapped the participants' locations in the experiment into clusters using K-means clustering to find the most visited scenes in the VE. The decision to use K-means was based on the nature of the data, as we group Cartesian coordinates (x, z). We experimented with various clustering configurations to investigate the effect of the generated scenes. The process took into account three factors: the distribution of data among the clusters, the distance between the clusters' centroids, and the diameter of the clusters. The efficiency of the clustering was measured using the WCSS (within-cluster sum of squares) value, testing numbers of clusters from 1 to 50, as shown in Figure 7. A lower WCSS value indicates improved clustering results, but normally at the cost of a larger number of clusters. We found that configurations with more than 40 clusters suffer from small cluster diameters, resulting in many minimal virtual spaces and a floor space too fragmented to develop a useful model. For fewer than 20 clusters, the WCSS values are still quite large, and the clusters may not separate distinctive scenes. For cluster counts between 28 and 32, we found a good balance between WCSS and the number of clusters, with an area between 5 ft² and 9 ft² per cluster.
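A minimal sketch of this WCSS sweep using scikit-learn's KMeans; the uniformly sampled floor positions below stand in for the real recorded (head_x, head_z) data:

```python
import numpy as np
from sklearn.cluster import KMeans

# Stand-in for the recorded floor positions of all participants
# on the 4 m x 4 m experiment space.
rng = np.random.default_rng(1)
positions = rng.uniform(0, 4, size=(2000, 2))

wcss = {}
for k in (10, 20, 30, 40):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(positions)
    wcss[k] = km.inertia_   # inertia_ is the within-cluster sum of squares
    print(k, round(km.inertia_, 1))

# WCSS decreases monotonically with k; the elbow analysis in the paper
# balances this decrease against cluster diameter and picks k = 30.
```

Plotting `wcss` against `k` reproduces the elbow curve used to select the cluster count.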
We chose 30 clusters to satisfy the three factors above. As shown in Figure 8, 30 clusters help reveal more data patterns when building user paths, as navigating across this number of clusters helps us understand user mobility. The current data format is the coordinates in space plus the cluster number. Each cluster indicates a different view that a participant visited or may visit in the future. We also ensured the clusters cover the entire physical and virtual space.

Deep Learning Modelling
The DL modelling aims to discover a common pattern that can capture and simulate user mobility in VR. The advantage of deep learning is that it allows us to make predictions about complex problems by discovering hidden patterns and features in the data. We experiment with predicting users' mobility in VR based on their previous walk steps. Both the data from the clustering process and the original data are used to develop different DL-based models in order to compare their performance. Our data is a time series, where each tuple is linked to the next. At the current stage, the data is represented as a flow path drawn from one cluster to another by an arrow. The data has been mined to represent one-directional and multi-directional paths, making it easy to create subpaths that help model mobility. After the processing and clustering stages, the dataset has a fixed data density across all time slices. Each data tuple represents a participant's data over a one-second duration. The data structure consists of spatial coordinates, time, cluster ID, and participant ID. We believe that users' previous locations are a deterministic factor for their future movements. Therefore, we consider a series of historical movements for the prediction of a user's next location. Locations are defined as the views or groups generated by the K-means clustering. The prediction is based on participants' navigation among the clusters (views) and forecasts the next potential cluster (view). To achieve this goal, we have a full path for each of the 35 participants navigating in VR. We format these paths into smaller subpaths that can be used to feed the deep learning model. Each subpath consists of a sequenced data flow over several clusters that a participant has visited. Any node on the path is a time-step that the participant generated during the experiment. Subpaths are formatted to avoid repeating clusters within a single subpath, so each subpath contains unique clusters.
The modelling of subpaths has been tested with various configurations of the number of clusters to generate series of subpaths. The data has a limited number of paths and views, so generating subpaths with a high number of time-steps (nodes) results in fewer total generated paths and more complex mobility patterns. A lower number of time-steps per subpath can lead to a model with low learning efficiency. We settled on a four-step sequence to build the paths of the participants' walks in the experiment. A four-step path refers to four different sequenced clusters that a participant has visited. The prediction target is the next cluster (the fifth cluster) in the data based on the timestamp, which acts as the predicted class.
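One plausible reading of this subpath construction, using hypothetical cluster IDs: collapse consecutive repeats in the per-second cluster trace, slide a four-step window, keep only subpaths whose clusters are unique, and take the fifth cluster as the label:

```python
def make_subpaths(trace, n_steps=4):
    """Turn a participant's per-second cluster trace into
    (input_sequence, next_cluster) training pairs."""
    # Collapse consecutive repeats: A A B B C -> A B C.
    path = [c for i, c in enumerate(trace) if i == 0 or c != trace[i - 1]]
    pairs = []
    for i in range(len(path) - n_steps):
        seq, nxt = path[i:i + n_steps], path[i + n_steps]
        if len(set(seq)) == n_steps:   # keep subpaths with unique clusters
            pairs.append((seq, nxt))
    return pairs

trace = [3, 3, 7, 7, 7, 12, 5, 5, 9, 14]   # hypothetical cluster IDs per second
for seq, nxt in make_subpaths(trace):
    print(seq, "->", nxt)
```

Each printed pair is one training example: a four-cluster history and the fifth cluster as the predicted class.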
We developed different DL models with various criteria to test the learning efficiency and data performance of different prediction techniques. We started with a recurrent neural network (RNN), specifically Long Short-Term Memory (LSTM), as we deal with time-series data in this modelling. In the first attempt, we treated clusters as categorical data without taking into account their physical locations in space, and converted the data into one-hot encoded vectors, an approach commonly used for deep learning. Each vector contains one element per cluster ID; the element corresponding to the cluster is set to one while all other elements are set to zero. The DL model is structured to receive a sequence of clusters C_{t−n}, C_{t−n+1}, C_{t−n+2}, ..., C_t as input and predict the next cluster C_{t+1}, where t refers to the time-step and n refers to the number of sequential clusters (nodes) in the participant's path, as shown in Figure 10. The one-hot encoded time-steps produced average results. We tested configurations with different numbers of clusters, from 1 to 40, but this did not enhance model performance; prediction peaked at n = 30 clusters. The one-hot inputs have a low influence on modelling mobility, resulting in an average testing accuracy of µ = 0.32. The Top-K validation accuracy is fairly high, averaging µ = 0.64 using C_{n±1} with n = 30. The motivation for considering Top-K accuracy lies in the high number of classes: with n = 30, the model performs well above a chance baseline whose prediction probability is P(C_{t+1}) = 1/n. We then piloted a different technique to model the data using Geo-VR locations. Instead of treating clusters as categorical data, their coordinates are used as the input. The Geo-VR location consists of the coordinates (X, Y) on the VE floor, which can be mapped to the physical space coordinates. Geo-VR locations include more information about the spatial relationship between clusters. The prediction in this technique uses these coordinates as the input of the DL model to predict the next potential cluster, as shown in Figure 11. The Geo-location trials ran through the same procedure of generating different numbers of clusters and manipulating model layers. The Geo-locations showed better performance than the one-hot encoded clusters, with an average testing accuracy of µ = 0.66 and a Top-K accuracy of 0.99, using C_{n±1} with n = 30, as shown in Figure 12.
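A minimal Keras sketch of the coordinate-input LSTM variant. The layer sizes, optimiser, and synthetic batch are assumptions; the paper does not specify the exact architecture:

```python
import numpy as np
from tensorflow import keras

N_CLUSTERS, N_STEPS = 30, 4

# Each time-step is the (x, z) coordinate of a visited cluster; the
# target is the ID of the next cluster in the subpath.
model = keras.Sequential([
    keras.Input(shape=(N_STEPS, 2)),
    keras.layers.LSTM(64),
    keras.layers.Dense(N_CLUSTERS, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["sparse_top_k_categorical_accuracy"])

# Tiny synthetic batch, just to show the shapes involved.
X = np.random.rand(8, N_STEPS, 2).astype("float32")
y = np.random.randint(0, N_CLUSTERS, size=(8,))
model.fit(X, y, epochs=1, verbose=0)
probs = model.predict(X, verbose=0)
print(probs.shape)   # one probability distribution over the 30 cluster IDs per subpath
```

The one-hot variant differs only in the input shape, `(N_STEPS, N_CLUSTERS)`, with each step a one-hot vector instead of a coordinate pair.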
We discovered that one of the challenges for accurate prediction is predicting time-steps in the correct order. For instance, the deep learning model may predict the next 10 movements of a user very accurately, but the results can sometimes be out of order when compared with the actual movements made by the user. We applied two additional approaches to evaluate and improve the prediction. The Top-K checker is more efficient when calculating the model accuracy, as most of the top-K predictions contain the true class in the data label. The Top-K checker is valuable for restricting the options for the potential prediction: it can be used to recommend the top-K predicted classes, which can be considered the range of the most likely next views. Based-Nearest Destination is an approach we introduce to enhance and evaluate walk prediction. This approach uses the future visited views for the prediction accuracy. We assume the number of future places is equivalent to the number of assigned time-steps for the model input. The Based-Nearest Destination approach significantly improved the model, resulting in an average accuracy of µ = 0.90. We also tested using only neighbouring clusters for prediction, which resulted in an average accuracy of µ = 0.98. The neighbouring clusters are selected based on the last cluster in the input subpath; clusters have varying numbers of neighbours, as shown in Figure 8. Using the trained output, we identify the Top-1 neighbouring cluster from the prediction to evaluate the model performance.
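The Top-K checker itself is straightforward; a small numpy sketch with an illustrative five-class distribution:

```python
import numpy as np

def top_k_hit(probs, true_class, k=3):
    """True when the labelled next cluster is among the k highest-probability
    predictions -- the Top-K style of accuracy check."""
    top_k = np.argsort(probs)[::-1][:k]
    return true_class in top_k

probs = np.array([0.05, 0.40, 0.30, 0.15, 0.10])
print(top_k_hit(probs, true_class=2, k=1))  # False: class 1 ranks first
print(top_k_hit(probs, true_class=2, k=3))  # True: class 2 is in the top 3
```

Averaging `top_k_hit` over all test subpaths gives the Top-K accuracy; restricting `top_k` to the last cluster's neighbours gives the neighbour-only variant.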
Pattern prediction has also been tested with a feed-forward dense network (FDN). We built an FDN model to perform pattern prediction on the same data. The same flow of generating clusters and different FDN configurations were applied to examine the influence on model performance. The FDN predictions have lower accuracy than the recurrent LSTM network (Figure 13). The FDN model was used with both types of data input, the one-hot encoding and the coordinates. The examination of the FDN shows better performance for the coordinates than for the encoded input; however, the overall accuracy was found to be 38%, lower than that of the LSTM. The impact of the number of K-means clusters on prediction results was also studied. Although 30 clusters were chosen as the result of our clustering analysis, we experimented with predicting participants' movements using a higher number of clusters. This led to very complex movement patterns and no improvement in model performance. Furthermore, gender information was used in combination with the walk data in an attempt to enhance the model, based on the hypothesis that there is a correlation between gender and user interaction in VR. However, gender did not show any significant impact on model performance. This means that male and female participants did not exhibit significantly different mobility patterns while exploring VR paintings.
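For comparison, a Keras sketch of an FDN baseline on the coordinate input (layer sizes are assumptions, not the paper's configuration):

```python
from tensorflow import keras

N_CLUSTERS, N_STEPS = 30, 4

# Feed-forward baseline: the four (x, z) steps are flattened into one
# 8-value vector, so the network sees no explicit temporal ordering --
# one plausible reason it trails the LSTM on this sequence task.
fdn = keras.Sequential([
    keras.Input(shape=(N_STEPS * 2,)),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(N_CLUSTERS, activation="softmax"),
])
fdn.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
fdn.summary()
```

Training and evaluation proceed exactly as for the LSTM, only with the subpath reshaped to a flat vector.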

Evaluation
To evaluate the prediction model, we used unseen data samples to test model accuracy. The testing data were processed with the same methods as the training data: each point was connected to the nearest centroid from the training clusters, ensuring that the testing data belong to one of the previously trained clusters. The testing data were then modelled into sub-paths of the same length as the training data and fed into the pre-trained model to obtain the predicted class. Figure 14 shows the original mobility (left side) of a holdout dataset for a group of participants compared to the predicted mobility (right side) using the deep learning model. The figure clearly shows that participants follow different navigation patterns while walking in the environment. Some participants took shorter paths to move in the environment, as in Figures 14(a) and 14(b), thus limiting the area they explored. Other participants walked deep into the VR artwork and reached the edges of the tracking area (physical environment), as shown in Figures 14(e), 14(f) and 14(h). The model shows outstanding results in predicting participants' different travelling patterns in the VE based on the first few steps of their movements.
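The test-time pipeline described above can be sketched as two small helpers; `centroids` stands for the cluster centres learned from the training data, and the helper names are our own illustration:

```python
from math import dist

def assign_cluster(point, centroids):
    """Connect an unseen test position to the nearest training centroid,
    so every test sample falls into a previously trained cluster."""
    return min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))

def make_subpaths(cluster_seq, length):
    """Cut a participant's cluster sequence into sub-paths of the same
    length as the training inputs, each labelled with the next cluster."""
    return [(cluster_seq[i:i + length], cluster_seq[i + length])
            for i in range(len(cluster_seq) - length)]
```

Each labelled sub-path can then be fed to the pre-trained model, and the predicted class compared against the held-out label.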

Discussions
The experimental environment produced a unique dataset that allows us to track user activities and behaviour in a VR environment. The deep learning models show outstanding performance in modelling and predicting user movements in VR environments, especially given that no clear walking paths or directions were given to the participants. The current modelling consists of several sequential steps that need to be applied in order to obtain the required results. Different models and techniques were used to improve the prediction. We reached the best model performance using clustering, Geo-VR locations, and LSTM.
The deep learning model has its limitations. There is a potential "cold start" issue, similar to that of a recommender system: the model cannot draw good inferences before it has received sufficient information. Very short mobility can lead to poor prediction, as shown in Figure 14(b); when a participant is not willing to explore more views in the scene, the model may not predict the right next view to be visited. In addition, a participant visiting clusters A, B, C and then ending in B causes a loop in the input mobility pattern. Looped movements can lead to excessive travel between neighbouring clusters in the prediction, as shown in Figure 14(f).
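Such looped input patterns are easy to flag before prediction; a minimal check (our own helper, not part of the paper's pipeline) is:

```python
def has_loop(subpath):
    """True when a cluster is revisited within one input sub-path,
    e.g. the pattern A, B, C, B loops back to cluster B."""
    return len(set(subpath)) < len(subpath)
```

Flagged sub-paths could be handled separately, for instance by weighting them down or by applying a loop-avoidance rule to the prediction.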
One of the main use cases of our deep learning model is a virtual tour guide that recommends areas for exploration to new visitors of a VR exhibition. The recommendation would be based on the first few movements of the visitor and the modelled movements of previous visitors or artists. While our results show high performance in model predictions, the integration of such a model into a virtual environment for recommendation requires additional considerations. Firstly, the model's Top1 suggestion for the next location (cluster) may not be immediately next to the user's current location, so the application will need to provide a path for the user to travel, or alternative suggestions can be drawn from the model's TopK results. Secondly, the model may suggest a path that includes loops (users returning to a previously visited location) based on data from previous users. Loops can be avoided by replacing the model's prediction with the lowest-loss prediction C_{n+1} such that C_n ≠ C_{n+1}.
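The loop-avoidance rule C_n ≠ C_{n+1} amounts to picking the most probable cluster other than the current one. A minimal sketch, assuming `probs` is the model's softmax output over the clusters:

```python
def next_without_loop(probs, current_cluster):
    """Choose C_{n+1} as the lowest-loss (highest-probability) cluster
    subject to C_{n+1} != C_n, avoiding an immediate return loop."""
    candidates = [c for c in range(len(probs)) if c != current_cluster]
    return max(candidates, key=lambda c: probs[c])
```

The same idea generalises to excluding any set of recently visited clusters rather than only the current one.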
Our work encapsulates a sequence of data processing and modelling steps.Many detailed configurations and machine learning hyper-parameters were tailored for the specific VR environment, physical space, and user interactions.However, the entire process from raw data acquisition to data pre-processing and deep learning can be automated without human intervention.This is particularly important when the system is deployed to support a public VR exhibition where expert support is limited.

Conclusions and Future Work
The understanding of user interactions with the virtual environment plays a pivotal role in the design and development of future VR applications. We conducted a VR user experiment to model users' free-walk mobility in a VR art exhibition. Based on the analysis of complex user movements, a range of machine learning techniques was used to define the scenes of user interest in VR in order to capture user mobility patterns. An LSTM-based deep learning model was developed to model and predict participants' movements during VR art encounters. The model shows good performance in predicting users' future navigation movements based on their previous locations. The model can greatly benefit artists' understanding of audiences' interactions with the artwork while supporting the development of new applications such as community-based navigation, a virtual art guide, and a virtual audience. Future work will investigate additional use cases beyond abstract VR painting. We will also investigate the relationship between the real walk, eye gaze, and hand gestures captured during the user experiment.

Fig. 1 Virtual and physical environment for the abstract VR painting experiment

Fig. 2 Floor arrangement for the experiment

Fig. 9 Traces of participants' movements based on clustering-generated views; numbers 1 to 30 represent the ID of the generated cluster/view

Fig. 13 Validation accuracy of the FDN model