
1 Background and Introduction

Wearable Immersive Virtual Reality (WIVR) offers interactive virtual experiences that simulate realistic or imaginary environments on Head-Mounted Displays (HMDs). These devices are worn on the head, often as part of a helmet, and place small display optics in front of the user’s eyes that make users “feel inside” the virtual space. In recent years, the evolution of WIVR technology has made HMDs affordable and much more comfortable than previous-generation devices. In addition, 360° cameras have enabled the spread of realistic 360° videos that can be played on WIVR viewers. As a result, WIVR technology and applications are increasingly popular in domains that range from gaming [24], entertainment [16], cultural heritage [14], and tourism [22] to education, professional training [12], therapy, and rehabilitation [13].

An example of a WIVR application used for educational purposes, and one of the inspirations for our work, is Google Expeditions [17]. This mobile app allows groups of students to explore 360° video-based virtual trips around the world under the teacher’s control. The teacher can select one of the available tours, which is automatically visualized on all the students’ HMDs, and can place a marker on a specific point of the 360° environment to draw their attention to it.

Currently, there are two types of HMDs. Embedded solutions provide a complete VR experience without requiring integration with external devices; examples are Oculus Rift [18] and HTC Vive [21]. Embedded solutions usually ensure more accurate user interaction and better graphical quality than modular solutions, but at a much higher cost. Modular solutions exploit external devices, mostly smartphones, as the enabling technology for displaying the simulated world, and are much cheaper. Examples are Google Cardboard [20] (starting from $5) and Samsung Gear VR [25] (starting from $79). These viewers are composed of two biconvex lenses mounted on a plastic or cardboard structure available in different colors and shapes. The smartphone set inside the visor displays the visual content as two near-identical two-dimensional images: the illusion of 3D depth and immersion is the result of the human brain’s interpretation of the stereoscopic effect generated by the viewer lenses (Fig. 1).

Fig. 1.

On the left, example of Google Cardboard with a smartphone inserted. On the right, a stereoscopic view of a 360° video.

This paper presents the design and implementation of XOOM, a novel tool for the development of web-based WIVR applications. Specifically, XOOM enables non-ICT experts to develop wearable immersive virtual tours based on realistic 360° videos played on any smartphone placed in a Google Cardboard. Video materials can be any MP4 file downloaded from a video platform or recorded with any 360° camera available on the market. In XOOM applications, the user can change her/his visual perspective on the virtual space, move forward/backward across it, and interact with active areas and active elements that generate visual or audio effects. XOOM provides the functionality to import 360° videos, concatenate them, and superimpose active elements on the virtual scenes, so that the resulting videos and their interaction affordances are customized to the requirements of a specific user group. The VR user’s view can be shown on an external display (e.g., a TV monitor or projection) through standard casting protocols over Wi-Fi. This enables other users to observe and participate in the WIVR experience, seeing what the VR user sees in the HMD. In addition, XOOM applications are integrated with an external web-based application that enables an external supervisor to (i) control the VR user’s experience, e.g., pausing, restarting, stopping, and changing the video while it is playing in the HMD; (ii) add active elements at run-time; and (iii) have a real-time perception of the user’s behavior, e.g., to understand what the user wearing the HMD is focusing on the most. Finally, XOOM supports automatic gathering and visualization of relevant data about the user’s experience in the immersive VR space (e.g., tracking the user’s gaze direction), which can be inspected for analytics purposes as well as for user evaluation (e.g., in education, training, or therapy contexts).

The remainder of the paper is organized as follows. Section 2 presents the design of the tool in terms of the functionality available to designers. Section 3 describes the implementation approach and the software architecture. Section 4 reports a case study in which XOOM has been used in a therapeutic context to create WIVR applications for children with neurodevelopmental disorders. Section 5 discusses the contribution of our work. Section 6 outlines our future research directions.

2 XOOM: Design

XOOM is a web-based software platform that, through various interactive tools, supports the creation, customization, and analysis of web-based WIVR virtual tours called “XOOM experiences” (X-Experiences). An X-Experience is defined as a sequence of one or more smoothly integrated 360° videos (hereinafter “scenes”) enriched with customized interactive elements. The user proceeds along the flow of scenes through his/her specific interactions, through commands activated by an external user (if any), or both. An X-Experience is natively integrated with an external web application for observation, monitoring, control, and evaluation purposes. XOOM distinguishes three types of users:

  • developers, who create an X-Experience; the creation functionalities are organized as a guided, easy-to-understand flow of actions and are available through a simple, user-friendly interface; therefore the developer does not need to be an ICT expert to use XOOM;

  • external supervisors (developer, evaluator, or caregiver), who monitor and control the end-user’s running X-Experience;

  • end-users, who experience the immersive VR environment on the HMD.

XOOM is composed of two modules: the Experience Manager and the Experience Viewer. The Experience Manager is used by the developer and the supervisor to personalize the X-Experience. The Experience Viewer is used by the end-user and manages his/her interaction with the virtual environment.

2.1 Experience Manager

The Experience Manager integrates three main tools called Creator, Runtime Controller, and Analyzer that are used respectively before, during, and after the end-user’s experience.

The Creator (Fig. 2) allows the developer to create or modify a WIVR tour by:

Fig. 2.

Creating an X-Experience in a museum using the Creator component of XOOM. In the background is the original video, on which the customization element (in yellow) is superimposed by the developer. Next to it is the menu for setting the element’s properties. (Color figure online)

  • Selecting 360° video fragments previously stored in the XOOM repository, and inserting them in the X-Experience under development. The selected fragments are visible at the bottom of the window and, by clicking on them, the developer can switch from one to another for personalization.

  • Reordering the video fragments, to create a flow of different “scenes” to be shown in sequence.

  • Setting specific frames at which the scene must pause at runtime, by clicking the Pause button at the corresponding instant of the timeline. During the running experience, the video will automatically stop at that frame, allowing the end-user, for example, to focus on a point of interest before going on.

  • Personalizing the video and its interactive experience by adding different types of items, such as arrows, masks, geometrical shapes, textual popups, and images. These can be: static items, which are overlaid on the video with no dynamic effects; dynamic items, which trigger dynamic effects when activated (e.g., showing animations, zooming in/out, rotating); and control items, which, when selected by the user, control the video execution, e.g., playing a paused video or moving from one scene to the next. For each of these elements the developer can set position, dimensions, duration from a specific starting frame, and other properties. For example, with the Highlight function, the developer can select an area of the current frame so that, at runtime, it will be lit up while a mask obscures the rest. This functionality is useful to drive the user’s attention to a desired point, area, or object of the virtual environment (in the Creator this area is visualized as a yellow sphere).

  • Saving the created experience in the XOOM repository.

The Runtime Controller (Fig. 3) enables the supervisor to monitor and control the end-user’s running experience. He/she can see what is displayed to the user in the HMD through two windows on the screen. The control window (Fig. 3, left) shows the complete 360° scene “flattened”, so that the external observer can move freely in the timeline of the experience without affecting the viewer. The monitor window (Fig. 3, right) shows what the end-user is currently seeing through the HMD. Note that this is not what is displayed on the HMD, but what the user actually sees: we insist on this aspect because we provide the controller with the user’s field of view of the 360° video after the transformation performed by the human brain from the two almost identical images on the smartphone screen to a single perceived image.

Fig. 3.

View of the Runtime Controller during the running experience (under creation in Fig. 2), showing the effect of the Highlight function at runtime.

The Analyzer visualizes a 360° heatmap of each session, summarizing with different colors the points and areas of the scenes on which the end-user focused his/her attention the most and the least (Fig. 4).

Fig. 4.

Heatmap of an X-Experience; red areas are those on which the user focused during the interaction. (Color figure online)

2.2 Experience Viewer

The Experience Viewer manages the end-user experience: it allows end-users to select an X-Experience and interact with it. The Experience Viewer can run on any end-user smartphone placed inside a Google Cardboard (or similar HMD). The phone must be equipped with accelerometer and gyroscope sensors (needed to track the user’s head movements), Wi-Fi connectivity, and a mobile web browser (recommended: Chrome).

End-users can navigate the virtual world by rotating their head, which rotates the virtual scene projected on the screen accordingly. Technically, the phone sensors track the motion and orientation of the user’s head, which are interpreted as gaze orientation and focus: the head orientation defines the direction of the gaze, and the center of the screen is the gaze focus. XOOM casts a ray that intersects the visible scene and derives from the ray collision what is being “pointed at” – either an active object or the background scene – and at which time. If an active object is hit by the ray for 3 consecutive seconds, the system interprets this as an intention to interact with it and triggers the reactive behavior defined for that object.
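The dwell-based selection described above can be sketched as follows. This is a minimal illustration, not XOOM’s actual code: the class name `DwellSelector` and its interface are hypothetical. (A-Frame’s built-in `cursor` component offers a comparable mechanism through its `fuse` and `fuseTimeout` attributes.)

```javascript
// Hypothetical sketch of dwell-based selection: an object is "selected"
// only after the gaze ray has hit it continuously for a given time.
class DwellSelector {
  constructor(dwellMs = 3000) {
    this.dwellMs = dwellMs;
    this.current = null; // object currently under the gaze ray
    this.since = 0;      // timestamp (ms) when it was first hit
  }
  // Called on every frame with the raycast result (or null) and a timestamp.
  // Returns the object to activate, or null.
  update(hitObject, now) {
    if (hitObject !== this.current) {
      // Gaze moved to a different object (or to the background): restart timer.
      this.current = hitObject;
      this.since = now;
      return null;
    }
    if (hitObject && now - this.since >= this.dwellMs) {
      this.since = now; // reset so the behavior is not re-triggered every frame
      return hitObject; // dwell threshold reached: trigger its behavior
    }
    return null;
  }
}
```

On each rendered frame, the raycast result and the current timestamp would be fed to `update`, and a non-null return value would fire the reactive behavior defined for that element.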

3 XOOM: Implementation

From a software perspective, XOOM is a client-server web application (Fig. 5) in which two classes of clients (the Viewer and the Manager) access the two previously described modules on a web server, which hosts our application and is connected to a cloud storage that holds the created experiences and the related data.

Fig. 5.

Component view of XOOM

The videos used for the creation are selected from the ones available on YouTube [26] and must be equirectangular. If users want to employ their own videos, these are automatically uploaded to the video platform before being inserted into an experience. XOOM thus privileges videos hosted on external sources (e.g., YouTube) to avoid overloading our system storage; when this is not possible (for example, for users with no YouTube account), we provide storage on our system after registration. The experience is saved as a JSON file with the timeline of each video in the sequence and all the added customization elements with their properties.
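The saved JSON file might look like the following hypothetical sketch. All field names and values here are illustrative (the paper does not specify the actual schema); the video id `"abc123"` is a placeholder.

```javascript
// Hypothetical sketch of an X-Experience descriptor as saved by XOOM.
// Field names are illustrative; the actual schema may differ.
const experience = {
  name: "Museum tour",
  scenes: [
    {
      videoId: "abc123",   // id of the 360° video on the hosting platform
      pauses: [12.5],      // seconds at which playback auto-pauses
      elements: [
        {
          type: "highlight",                 // e.g. static | dynamic | control
          position: { yaw: 45, pitch: -10 }, // degrees in the 360° sphere
          start: 8.0,                        // seconds from scene start
          duration: 6.0
        }
      ]
    }
  ]
};
// The experience is serialized to JSON before being stored in the repository.
const saved = JSON.stringify(experience);
```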

To start a session, the Manager (in particular the Runtime Controller component) and the Viewer must be univocally associated through a unique code. The connection between them is achieved using Firebase [19], a platform developed by Google that offers several independent features for developing mobile and Internet-connected applications. In particular, we use the Real-time Database component, a cloud-hosted NoSQL database, to synchronize the two modules of XOOM in both directions: the Controller sends the commands issued by the supervising user, while the Viewer sends synchronization signals (time and orientation of the video) every 100 ms to keep the monitor window on the supervising user’s screen in sync. The experience is not streamed from the Viewer to the Controller; rather, it runs in parallel on the two devices, showing the same images at any time thanks to the synchronization information. We chose this solution because supporting real-time streaming would have been complex, especially over slow Internet connections (in a slow-network scenario, caching strategies could be implemented to temper the perceived lack of bandwidth). The Real-time Database, instead, yields fluid experience reproduction, thanks to the small quantity of exchanged data and the fast response time of Firebase. Direct communication between the two modules would have required an ad-hoc tunnel and thus a more complex network configuration.
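The Viewer-side synchronization loop can be sketched as follows. This is an illustrative sketch, not XOOM’s actual code: the `send` callback stands in for the Firebase Real-time Database write, and the payload field names are assumptions.

```javascript
// Hypothetical sketch of the Viewer-side synchronization: every ~100 ms the
// current playback time and head orientation are pushed to the shared
// real-time database, where the Controller picks them up.
function makeSyncSender(send, intervalMs = 100) {
  let lastSent = -Infinity;
  // Called on every rendered frame with the current state and a timestamp (ms).
  // Returns true if a message was actually sent, false if throttled.
  return function sync(state, now) {
    if (now - lastSent < intervalMs) return false; // throttle to ~10 msg/s
    lastSent = now;
    send({
      time: state.time,   // current position in the video (seconds)
      yaw: state.yaw,     // head orientation (degrees)
      pitch: state.pitch
    });
    return true;
  };
}
```

Throttling keeps the quantity of exchanged data small, which is what makes the database-mediated approach fluid compared to streaming the video itself.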

Moreover, the time and orientation values sent by the Viewer are saved by the application and later used by the Analyzer to build the final heatmap of the experience. The use of external platforms (in particular YouTube, Firebase, and cloud storage) improves the scalability of our system, allowing the management of large quantities of data and users. The management of the virtual environment, both at creation time and at runtime, is supported by A-Frame, an open-source web framework for creating 3D and virtual reality applications with JavaScript and HTML [15].
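The heatmap construction from the recorded orientation samples can be sketched as follows. This is a simplified illustration under assumed conventions (yaw in [-180, 180), pitch in [-90, 90)); the actual Analyzer implementation is not detailed here.

```javascript
// Hypothetical sketch of heatmap accumulation: each (yaw, pitch) gaze sample
// recorded during a session increments a cell of an equirectangular grid.
function buildHeatmap(samples, cols = 36, rows = 18) {
  const grid = Array.from({ length: rows }, () => new Array(cols).fill(0));
  for (const { yaw, pitch } of samples) {
    // Map yaw in [-180, 180) and pitch in [-90, 90) onto grid coordinates.
    const x = Math.floor(((yaw + 180) / 360) * cols) % cols;
    const y = Math.floor(((pitch + 90) / 180) * rows) % rows;
    grid[y][x] += 1;
  }
  return grid; // higher counts mark the areas the user looked at the most
}
```

The resulting grid would then be colored (e.g., red for high counts) and projected back onto the 360° scene, as in Fig. 4.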

4 Case Study

We have applied XOOM in our research on WIVR technology for persons with Neurodevelopmental Disorders (NDDs). NDD is an umbrella term for a group of disabilities that appear during the developmental period and are characterized by deficits and limitations in the cognitive, emotional, motor, and intellectual spheres [2]. The use of WIVR as a therapeutic or educational tool for people with NDD was explored in the past and abandoned because of the drawbacks of first-generation viewers [13] (e.g., high cost, weight, motion-sickness effects). With the advent of cheaper, more comfortable, and technically more accurate hardware and software solutions, WIVR has raised growing interest in the NDD community [6, 11]. Our research in this area has so far considered virtual spaces based on the contents, characters, and environments of fantasy tales used with NDD children at the therapeutic centers that collaborate in our research. Storytelling plays an important role in educational practices, in particular for children with NDD, to promote skills ranging from high-level to more basic ones, such as the ability to focus on what is most interesting at a given moment, generalization skills, and the development of appropriate elementary behaviour. A number of empirical studies at therapeutic centers, involving overall 8 children with NDD and 4 therapists, have shown the effectiveness of our applications in promoting attentional skills and cause-effect understanding. For example, we have witnessed that the Highlight effect helps users understand which details of the scene are important: while in the first therapy sessions this customization effect was necessary to guide the children’s attention, in the following sessions with the same X-Experience they progressively learnt to focus on the same points without the help of the Highlight.

Three therapists at the centers we are collaborating with in a number of national and international projects have used XOOM to create a new class of applications for this target group, called WIVR social stories. The term “social story” is used in NDD therapy to denote visual materials (paper- or video-based) that describe everyday life situations as simple short narratives (Fig. 6, left). These tools are used in the treatment of persons with NDD, particularly ASD (Autism Spectrum Disorder), to help them develop social and practical skills and to learn appropriate behaviour and norms [1, 3, 11].

Fig. 6.

An example of a paper-based social story (left) and a WIVR-based social story (right).

XOOM has enabled the therapists to combine the power of traditional social stories with that of WIVR. Using XOOM, they have created a set of WIVR-based social stories on everyday situations (e.g., going to school, shopping at a grocery store, visiting a museum). An example is visible in Fig. 6: the therapists recorded a 360° video in a supermarket and built an X-Experience from it to teach patients to search for and buy specific products. As shown in the figure, a PCS (Picture Communication Symbols) item was inserted over the video at creation time to suggest that the end-user look for some fruit.

Transforming social stories into interactive immersive virtual narratives can increase the benefits of the traditional social-story approach. WIVR-based social stories allow persons with NDD to train for everyday life activities in a secure environment, while VR headsets promote attention and engagement, as they remove the distractions caused by external visual stimuli [5] and help users focus on therapy or learning tasks. The increasingly low cost of WIVR technology paves the way towards large-scale adoption of this class of assistive applications, at therapeutic centers and in other contexts of life (e.g., at home and at school). The customization features offered by XOOM empower caregivers and give them a degree of control over the patient’s experience with WIVR technology that no other existing tool allows. Finally, the nature of the contents exploited in WIVR social stories offers novel opportunities to give family caregivers an active role in the therapeutic process: patients’ parents can be involved in recording videos in the real contexts of their children’s life to feed the XOOM video repository.

5 Contribution and Discussion

Our work provides several contributions to the current state of the art in WIVR.

The first contribution concerns the nature of the applications supported by our tool. The applications developed using XOOM are interaction-rich. Their gamut of interaction affordances goes beyond the typical virtual-tour “navigation” of most 360° videos for HMDs, which supports only changing the visual perspective on the virtual space or moving forward/backward across it. In XOOM applications the user can also interact with any number of “active elements”, rendered as active areas or graphic elements superimposed on the video content, which generate engaging visual or audio effects. In addition, while most existing interaction-rich WIVR applications are delivered as native apps, XOOM applications are web-based and embrace a VRaaS (VR as a Service) paradigm. A few years ago there was a migration from desktop native apps to cloud-served solutions (e.g., SaaS), and now we witness a similar trend on mobile platforms. WIVR end-users can derive several benefits from a web-based migration of this class of applications. They are not bothered with installing apps that they may use only once: the web guarantees easy access to contents, replacing the installation overhead (download, install, grant permissions, open) with a single click on a link that opens the VR environment in a browser. Web apps are intrinsically cross-platform, as they require only a browser and are agnostic of the underlying operating system; thus they are easier to distribute and maintain. Their distribution mechanism needs no dedicated marketplace and no associated approval period before publication: they are accessible “as-is”. Nonetheless, web applications must still face some challenges. They strongly depend on connectivity (even if service workers in progressive web applications are remarkably addressing this issue). They have access to a limited set of hardware features from the web interface (again, a limitation being tackled by the open Physical Web approach). Being so easily served, they expose a broader attack surface than mobile apps and may be more subject to security issues.

The second contribution of XOOM concerns the original functionality of the applications created with the tool. XOOM applications natively integrate features that support participation in, control of, and analysis of the VR experience by external users. A control panel enables them to see what the VR user wearing the HMD sees and to customize the experience content and the video behavior at run-time; existing WIVR applications support the run-time presentation of the contents of the HMD on an external display only through screen casting. In addition, XOOM applications are integrated with an interaction-analysis tool. Both these features are particularly useful for caregivers in educational or therapeutic contexts. Educators and therapists need to know what the learner or patient is currently watching within the HMD, so as to be able to intervene at the right moment of the experience if needed. These stakeholders also benefit enormously from automatic data gathering and visualization: properly aggregated and presented interaction data offer valuable information that would otherwise have to be collected manually, through textual reporting or video analysis. Heatmaps, for example, offer insights for evaluating the user experience and the user’s progress, making it possible to identify where and when during the experience the user’s attention and concentration were lower or higher.

As a third contribution, XOOM supports an end-user development (EUD) process [9]. The EUD approach goes beyond conventional methodologies for the design of interactive systems: its goal is to shift control of application design, development, and evolution from skilled ICT professionals to the people who are the key owners of problems in a domain, creating opportunities for extensions and modifications that are appropriate for those who need to make changes but are not technology experts. EUD tools provide non-ICT professionals with the means to customize applications, or create brand-new ones, to support personal, situational needs and address the requirements of a specific domain and its intended users. An EUD approach is particularly relevant in domains such as training, education, and therapy, where the capability of autonomous customization by therapists or educators is fundamental, as the value of any interactive technology in these contexts is directly related to its ability to meet the specific characteristics and needs of the target user group(s). There is empirical evidence, for example, that digital therapeutic tools for persons with disability that cannot be easily customized, or can be customized by ICT professionals only, are more often abandoned [10]. To our knowledge, the EUD approach has never been applied in the WIVR domain to the degree achieved in XOOM. The only similar idea comes from the previously cited Google Expeditions, a WIVR application easily usable by non-ICT experts. However, the EUD features of that platform are currently limited: as already noted, teachers cannot create their own experiences but can only select one of the available “explorations”, and the only possible customization consists in the insertion of passive “markers” in specific areas of interest of the virtual environment. Starting from this concept, we have enriched our tool with many more customization possibilities, from the insertion of images in the 360° environment to the ability to control the video execution through control items.

The fourth contribution of our work is from a software engineering perspective. XOOM integrates several software frameworks, adopting a modular approach and organizing each module by feature. This modularity enables a top-down problem decomposition. For example, concerning graphical modelling, XOOM is powered by the A-Frame framework, which establishes a high-level abstraction for addressing graphical virtual elements with as little complexity as possible; when A-Frame’s capabilities are not sufficient, our system leverages the underlying Three.js layer and, ultimately, can call the WebGL library directly for virtual-content manipulation. This vertical problem resolution is accompanied by a horizontal procedure that interconnects modules in a seamless way (e.g., real-time adaptation of the virtual world), achieved through fast online synchronization. XOOM benefits from low coupling, in the sense that each module is not strongly dependent on the others: for example, if the Runtime Controller fails, the failure does not affect the Experience Viewer module. Moreover, XOOM was designed with high cohesion in mind: we organized components by feature (e.g., a geometries folder grouping the shapes that can be added, a shaders folder implementing the highlight effect, etc.). Following this approach, XOOM results in a flexible, robust, and easy-to-maintain application.
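The layered approach can be illustrated with the following sketch of an A-Frame component that reaches down to the underlying Three.js object when the declarative layer is not expressive enough. The component name `dim-mask` and its behavior are hypothetical, not XOOM’s actual shader code.

```javascript
// Hypothetical A-Frame component illustrating the layered approach:
// the component is declared at the A-Frame abstraction level, while its
// init() reaches down to the underlying Three.js mesh for fine control.
const dimMask = {
  schema: { opacity: { type: 'number', default: 0.6 } },
  init: function () {
    const mesh = this.el.getObject3D('mesh'); // Three.js layer underneath
    if (mesh && mesh.material) {
      mesh.material.transparent = true;       // direct Three.js manipulation
      mesh.material.opacity = this.data.opacity;
    }
  }
};
// Registered only when the A-Frame library is present (e.g., in a browser).
if (typeof AFRAME !== 'undefined') {
  AFRAME.registerComponent('dim-mask', dimMask);
}
```

In a scene, such a component would be attached declaratively (e.g., `<a-entity dim-mask="opacity: 0.4">`), keeping the high-level markup simple while the low-level work happens in the Three.js layer.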

6 Future Work

So far XOOM has been used to develop WIVR applications in therapeutic contexts, such as the ones described in the previous section. According to the feedback received from the specialists who have used XOOM and its applications, or tried them in demo events, XOOM may pave the way towards large-scale adoption of WIVR assistive technology at therapeutic centers and in other contexts of a patient’s life (e.g., at home and at school).

Still, the spectrum of XOOM application domains is much wider. People can use this simple tool to create videogame-like WIVR experiences, gaming gadgets in brick-and-mortar shops, educational contents for professional training, or virtual experiences in cultural heritage or tourism contexts. For example, visitors of a cultural heritage site can use the HMD display to zoom in and explore details of the physical space they are visiting and get insights into its artifacts. Customers at a travel agency can experience a possible destination virtually through an HMD instead of just being shown photos of a location in a generic catalogue. Each situation in each specific domain requires different videos and interaction elements, but the generality and power of XOOM enable non-ICT-expert domain operators to create VR experiences that can be quickly adapted to every goal and target.

Our short-term research agenda envisions an improvement of the user interface, the full-scale engineering of XOOM’s functionalities, and its extension. We will simplify the UX of some existing functions (e.g., the move, rotate, and scale commands, which created some initial difficulty for the users in our case study). We will enrich the available customization features and their interaction properties. Finally, we will improve the system that manages accounts, authentication, and end-user profiles. In addition, we have designed two empirical studies that will start in Spring 2017. The first study will explore the adoptability of XOOM in therapeutic contexts: we will systematically evaluate the usability of XOOM with 10 NDD specialists and will elicit the socio-organizational requirements for its deployment and adoption as a commercial tool. The second (controlled) study will involve 20 children with NDD aged 6–12 and will focus on the therapeutic benefits of WIVR-based social stories, also in comparison with traditional social-story approaches.