1 Introduction

Recent technological advances have made virtual, mixed, and augmented reality (VR, MR, and AR) applications more affordable and available. Aside from gaming and entertainment, they are used in many other sectors, such as healthcare, education, industry, and tourism (Tom Dieck and Jung 2018).

In cultural heritage in particular, Innocente et al. (2023) proposed a framework study to identify the key features of headset-based extended reality technologies used in this domain that boost immersion, sense of presence, and agency. They also highlighted core characteristics that favor the adoption of these systems over more traditional solutions, as well as unsolved issues that must be addressed to improve the user experience. Pillai and Mathew (2019) focused on the applications of VR in healthcare and presented their benefits in different areas of this field. Adriana et al. (2022) reviewed the use of these technologies in different settings of Industry 4.0 and analyzed trends and research directions. The application of extended reality technology in sport was reviewed by Le Noury et al. (2022).

Moreover, the combination of these technologies with gamification and serious games provides engaging and motivating training scenarios (Cipresso et al. 2018). Focusing on healthcare, rehabilitation has become one of the principal applications of gamified VR technology. The development and use of commercial and custom-made VR systems for rehabilitation have increased significantly (Afsar et al. 2018; Huang et al. 2022). Evidence demonstrates that these applications improve motivation, adherence, and training dose among patients (Borrego et al. 2018; Lee et al. 2020; Weber et al. 2019), resulting in increased neural plasticity that enhances patient recovery (Tran et al. 2016; Törnbom and Danielsson 2018). In addition, the proposed systems have been customized to target specific patient disabilities, including reduced motor function, mobility, postural control, or cognitive impairments, among others (Törnbom and Danielsson 2018; Lohse et al. 2014).

Focusing on VR-based rehabilitation, sessions usually take the form of groups of gamified exercises that take place in a virtual scenario. Patients, using devices such as head-mounted displays (HMDs), interact with the scenario to achieve predefined goals while receiving feedback as output via visual and audio channels, vibration, or even olfactory cues, depending on the devices. The patient's interaction requires specific actions whose repetition contributes to the recovery. The complexity of these actions depends on the patients and pathologies, and the feedback provided is a key component to be considered when preparing rehabilitation exercises. Feedback is defined as the response to players' interactions that informs them about their performance so they can improve it. There are different theories on how and when feedback has to be presented, which can take into account aspects such as the goals and tasks, the context, the target population, or the game environment (Johnson et al. 2017). Feedback strategies include immediate feedback, which returns information according to the immediate actions; outcome- or summary-based feedback, which returns information at the end of the exercise; and guidance feedback, which indicates to the user the actions to be carried out. There are also different ways to present this feedback, such as text messages, sounds indicating correct/incorrect actions, images or videos presenting the actions to be carried out, or vibrations when complementary devices are used.
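To make this taxonomy concrete, a minimal sketch in Unity-style C# follows; the type and field names are hypothetical illustrations, not part of any existing API:

```csharp
using UnityEngine;

// Hypothetical types sketching the feedback strategies described above.
public enum FeedbackTiming { Immediate, Summary, Guidance }

[System.Serializable]
public class FeedbackSettings
{
    public FeedbackTiming timing = FeedbackTiming.Immediate;
    public bool textMessages;  // textual prompts about performance
    public bool sounds;        // cues for correct/incorrect actions
    public bool demonstration; // images or videos of the actions to carry out
    public bool vibration;     // requires a complementary haptic device
}
```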

In VR-based rehabilitation scenarios using HMDs, feedback can be decisive in solving common issues that hinder patient performance. Particularly, during the observation of VR rehabilitation sessions in real clinical settings, three categories of issues have been detected, related to (i) the position of the hands with respect to the HMD; (ii) the interaction between the hands and the virtual objects; and (iii) the patient's impairments caused by a medical condition (Quintana et al. 2023). In the majority of cases, solving them requires the rehabilitator to interact with the patient and restart the session.

These rehabilitator interventions disrupt patient performance, resulting in a loss of concentration that may contribute to patient frustration. In this paper, to reduce interruptions during rehabilitation sessions, the use of non-photorealistic rendering techniques as a feedback strategy to overcome the detected patient limitations is proposed.

Non-photorealistic rendering (NPR) is a sub-field of computer graphics that aims to produce graphical representations emphasizing the expressive and illustrative aspects of a scene by applying techniques from paintings, cartoons, or drawings. Its final goal is to simplify or exaggerate scene features to convey a particular emotion, artistic style, or mood. These features can be exploited to emphasize parts of VR scenarios and thus tackle the detected problems. However, the integration of these effects in a VR environment is challenging. To tackle this problem, a new system that integrates NPR techniques in the context of VR-based rehabilitation sessions is proposed. The system provides rehabilitators with an authoring tool that treats rehabilitation exercises as a set of components, such as background scenes, non-interactive objects, and interactive objects, and allows the application of NPR styles to each component according to the rehabilitator's interests. The main advantage of the system is that it allows professionals without technical skills to use NPR as a new feedback strategy to reduce patient interruptions during rehabilitation sessions. The paper presents this system and the different experiments that have been carried out to demonstrate its good performance.

Besides this introduction, the paper is structured as follows. In Sect. 2, related work is presented. In Sect. 3, the proposed system and the tests defined to evaluate it are described. In Sect. 4, the obtained results are presented and discussed. Finally, conclusions and future work are given in Sect. 5.

2 Related work

Our proposal builds on related research involving visual feedback strategies in virtual reality (VR) rehabilitation sessions, NPR techniques in VR scenarios, and authoring tools for professionals without technical skills.

2.1 Feedback strategies in VR rehabilitation scenarios

Repetition, feedback, and motivation are the three main concepts to be considered when designing an effective rehabilitation session (Schonauer and Pintaric 2014). Gamification combined with task subdivision has been proposed as a strategy to increase motivation and support the repetition of rehabilitation sessions. Moreover, to be performed properly, these sessions need to be complemented with feedback that supports patients during task execution. In virtual scenarios, feedback aims to reproduce the audio, visual, and haptic feedback provided by rehabilitators in real scenarios. The required balance between the amount of information and the time and form in which feedback is presented makes feedback integration challenging. It is necessary to control the level of disruption the user will experience, and also to adapt to the context according to the target audience, rehabilitation goals, expected outcomes, etc.

The feedback that reaches the user can be visual, auditory, haptic, or a combination of these. Our interest focuses on visual feedback in VR rehabilitation scenarios where only an HMD is available. In these cases, it is difficult to identify the position of the hand with respect to scene objects, which can be placed at different depths. The absence of haptic feedback makes basic interactions such as selecting or grasping objects difficult, and some form of additional visual feedback might improve the task by providing further cues. Basic visual feedback techniques include changes in the objects' colors (Poupyrev et al. 1998), the use of shadows to display relative distances (Hu et al. 2000), the illumination of nearby collision areas (Sreng et al. 2006), or the vibration of the virtual hand when touching an object (Prachyabrued and Borst 2015). Vosinakis and Koutsabasis (2018) evaluated four visual feedback techniques based on object coloring, connecting lines, shadows, and object halo for virtual grasping with bare hands. They concluded that coloring feedback techniques, with color or halo, are more usable than shadowing and no feedback, while the line technique often distracted users rather than contributing to usability. In this paper, we explore, in the context of VR rehabilitation sessions, the use of NPR techniques as a visual feedback strategy.

2.2 NPR in VR

One of the main advantages of using virtual reality for rehabilitation is its capacity to simulate real-world settings, allowing patients to practice real-life activities in a secure and controlled manner. Research has demonstrated that realistic environments enhance patient engagement and motivation during the rehabilitation process; consequently, the development of high-fidelity scenarios has become a continuous focus of research and development. However, there are situations, such as visual impairment or cognitive rehabilitation scenarios, where realistic scenarios can distract the patient and simplified environments that focus patient attention on specific parts of the scene are preferred. It is in these cases that NPR techniques can be of great interest.

NPR techniques abstract or simplify scenes, making them less realistic and more artistic to enhance important information while hiding non-important information (Winnemoller et al. 2006; Akhtar and Falk 2017). As a result, faster visual recognition and better memorization can be obtained (Winnemoller et al. 2006). The benefits of image abstraction for improved perception have long been known in psychology (Ryan and Schwartz 1956). From a theoretical perspective, Viola et al. (2019) considered NPR as a visual abstraction from a more photo-realistic (PR) visual representation to a less photo-realistic one, where a visual abstraction is defined as a transformation from data (in that case, the photo-realistic image) to its visual representation with some information loss. Despite the benefits of NPR, little attention has been given to its application in VR environments for health. Particularly, Sauvaget et al. (2014) considered the potential of NPR in applications for visually impaired people, and Wattanasoontorn et al. evaluated the use of NPR vs PR in a serious game for cardiopulmonary resuscitation (Wattanasoontorn et al. 2013).

NPR research has mainly focused on its application to images and animations. For instance, Magdics et al. (2013) reviewed the use of NPR in films and videogames, and developed a post-processing NPR library for Unity 3D that allows changing the visual language of a videogame in real time. This library was subsequently used in the development of educational videogames (Hernández et al. 2014; Hernández et al. 2015) and is the basis of our proposal. Tokunaga et al. (2010) discussed the 3D perception of virtual contents rendered with NPR techniques in the context of stereoscopic information visualization. Fajnzylber et al. (2017); Fajnzylber (2020) considered NPR as an extension of cinematographic language and studied its interaction with stereoscopy in VR 3D scenes and in cinematography, aiming to study the perceptual effects of NPR with the help of eye-tracking measurements (Fajnzylber et al. 2018). Curtis et al. (2020) introduced tools for their non-photorealistic animation short film to balance the screen-space rules of a 2D visual style against 3D motion coherence, and to account for stereo spatialization and interactive camera movement. More recently, Doi et al. applied NPR in the context of global illumination (Doi et al. 2021). In our case, we want to exploit the benefits of NPR as a strategy to focus the attention of the patient in VR rehabilitation scenarios and thus overcome some of the limitations the patient can experience.

2.3 Authoring tools

Generally, VR applications are created by experts with advanced knowledge of programming, 3D modeling, geometry, etc., using specific software tools such as Unity (Unity Technologies 2023), Unreal Engine (Epic Games 2022), or CryEngine (Crytek GmbH 2022). These tools include visual programming editors that make VR content creation easier, faster, and more efficient, although they still require a high level of technological knowledge. To help programmers in the creation process, there are also immersive authoring tools that allow them to create VR content while immersed in the virtual environment. In the context of industry, for instance, Cassola et al. (2021) proposed a novel immersive authoring tool for experiential and situated learning in VR-based training. Focusing on academic content creation, López et al. (2017) reviewed VR tools for the creation of virtual learning environments, analyzing how these kinds of tools can be combined with authoring tools in intelligent tutoring systems that present realistic, interactive, and immersive educational content to students. Zhang and Oney (2020) introduced FlowMatic, an immersive authoring tool that allows programmers to specify behaviors that react to discrete events such as user actions, system timers, or collisions, and also provides primitives to create and destroy objects, abstract and re-use functionalities, and import three-dimensional models. More recently, Coelho et al. (2022) surveyed the existing literature on authoring tools for immersive content creation, analyzing their features and how they are evaluated. In the context of rehabilitation, Quintana et al. (2022) proposed easy-to-use editors for experts to prepare rehabilitation exercises that take place in an expanding and varied open virtual world created using procedural generation strategies. Focusing on non-experts, Sanna et al. (2016) proposed the leap embedder (LE), a graphical editor to associate hand gestures/poses with interactive digital content actions without the need to write code. The editor enables non-experts to create their own interfaces and helps programmers reduce implementation times. Later, the visual scene editor (VSE), an evolution of the LE, was proposed to allow users with little to no programming skills to create 3D interactive applications by combining available assets through an interactive visual process (A visual editing tool 2020). More recently, Kuhail et al. (2021) analyzed different visual programming language tools considering dimensions such as interaction style, target users, domain, and platform, and characterized how visual programming approaches empower end-users to develop applications.

Note that the majority of proposals have been designed with a focus on programmers, and less attention has been given to the final users (rehabilitators in our case). In most cases, final users have been considered during requirements gathering, to detect the needs of the application area, and during testing (Kuhail et al. 2021). Moreover, due to the novelty of the approaches, the integration of authoring tools for these final users has received little attention (Sanna et al. 2016; A visual editing tool 2020). As a consequence, modifications to VR scenarios generally require continuous interaction with the programmers. To overcome these limitations, an authoring tool for final users, such as rehabilitators, to integrate NPR effects into VR scenarios is proposed. Different from other proposals that focus on visual programming tools (Kuhail et al. 2021), our interest is oriented toward the configuration of existing algorithms already implemented in the system.

3 Materials and methods

In this section, the proposed system to integrate NPR effects in the context of VR rehabilitation scenarios is presented. First, we focus on requirements gathering. Then, the basis of the proposed system is described, as well as the improvements that have been required to support VR scenarios. Finally, details on testing are given.

3.1 Requirements gathering

A group of experts has collaborated in the requirements gathering. The requirements have been defined considering the needs of patients and rehabilitators, assuming that patients perform rehabilitation tasks in an HMD-VR scenario following the indications of rehabilitators. For our purposes, low-cost devices and off-the-shelf solutions have been preferred to ensure the availability of the proposal.

Prior to defining the requirements, it is important to consider the structure of the virtual scenarios where rehabilitation takes place. We consider these scenarios to be composed of multiple types of objects, some of which are part of the background environment and others part of the exercise. The exercise objects can be further separated based on their interactivity, with non-interactive objects, such as the exercise surface or a target, and interactive objects, which must be manipulated as part of the exercise's objective. These three categories of objects (background, non-interactive, and interactive) need to be able to have different types of effects applied in order to configure feedback strategies. Because these categories roughly correspond to different levels of attention, from further away to closer, they will be referred to as layers.
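As an illustration, this layer decomposition could be represented in Unity-style C# as follows (a minimal sketch; the names are ours and not part of an existing API):

```csharp
using UnityEngine;

// Hypothetical component tagging each scene object with its attention layer.
public enum SceneLayer
{
    Background,     // environment objects furthest from the task
    NonInteractive, // exercise props such as the surface or a target
    Interactive     // objects that must be manipulated to meet the objective
}

public class ExerciseObject : MonoBehaviour
{
    public SceneLayer layer = SceneLayer.Background; // set in the authoring tool
}
```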

Taking these considerations into account, the following requirements have been defined.

  • With regard to the Authoring Tool for the rehabilitator to integrate NPR effects in the VR scene, the system has to:

    • Allow the rehabilitator to create and edit NPR presets that give the desired feedback.

    • Display a preview of how the NPR effect will look during the exercise.

    • During exercise creation, allow assigning objects to the different layers of a virtual scene.

    • During exercise selection, allow assigning an NPR preset to each layer.

    • Apply the chosen NPR preset to each layer when the exercise starts.

    • Not apply the NPR effects to objects from other layers.

  • With regard to the rehabilitation session for the patient to perform the rehabilitation exercises, the system has to:

    • Work with a Virtual Reality Head-Mounted Display (VR-HMD).

    • Apply the NPR effects to both eyes.

    • Ensure that the image is consistent across the eyes.

  • With regard to performance, the system has to maintain a fluid frame rate to avoid breaking the immersion, while running on a system with CPU, GPU, and memory capabilities corresponding to the upper mid-range hardware currently available in the market.

3.2 Proposed system

Taking into account the described requirements, a new system to support NPR in the context of VR scenarios is proposed. The system is an extension of the Magdics et al. (2013) post-processing NPR library for Unity 3D, which allows changing the visual language of a videogame in real time. For the sake of comprehensibility, a quick overview of the Magdics et al. approach is given first, focusing on its limitations in the context of VR and the solutions that lead to our proposal.

3.2.1 The Magdics et al. approach

Fig. 1: The main stages of the Magdics et al. (2013) post-processing NPR library for Unity 3D, which allows changing the visual language of a videogame in real time

The main stages of the Magdics approach are illustrated in Fig. 1 and described below; for more details, see Magdics et al. (2013). As shown in Fig. 1, given an image, the approach requires as input the color and depth information in the form of two textures and returns as output a modified version of the color information with the NPR effects applied. To obtain this output, the approach applies the following stages, labeled with letters to follow the notation of Fig. 1 (a condensed sketch of the resulting data flow is given after the list):

  (a) Flow Field. Take the color information from the rendering and generate a temporary texture with the flow field, which contains vectors indicating the direction and magnitude of change of the visual information in the image.

  (b) Edges. Use the color and depth information from the rendering to calculate edges. It consists of the following rendering passes:

    (b.1) Simplify the image that will be given to the edge detection passes. This step is equivalent to the simplification step in Stage (d), but with a separate configuration block.

    (b.2) Apply an edge detection filter that uses the image color information to detect edges based on the magnitude of changes in the image. To maximize the effectiveness of the edge filter, use the image flow information provided by the Flow Field stage (Stage (a)) so that the filter is applied in the direction of flow.

    (b.3) Apply an edge detection filter that uses the depth information to detect the contours of objects and overhanging areas within objects.

    (b.4) Finally, combine the results from (b.2) and (b.3) by taking, at each pixel, the biggest edge value.

  (c) Shadows. Extract the shadow information by comparing the color information to a rendering pass that excludes shadow rendering (c.1); then apply contrast and quantization effects to the shadows and highlights (c.2).

  (d) Simplification. Apply a simplification effect to the color information, reducing the complexity of the image and the number of color shades. It consists of three passes:

    (d.1) Calculate the gradients of the image, i.e., the rate of change of the color information, based on the Flow Field (Stage (a));

    (d.2) Use the gradient information to apply a threshold such that high-detail areas are preserved and low-detail areas are smoothed;

    (d.3) Reduce the number of color shades.

  (e) Desaturate. Use the color (the result of Stage (d)) and depth information to apply desaturation effects to the image. It has two modes of operation: (i) desaturation based on the screen position, such that it is more intense further away from the center of the image; (ii) desaturation based on the distance from the camera, such that far-away objects are more desaturated. In both cases, the desaturation intensity, rate of change, and number of stages (continuous, 1 stage, 2 stages, ...) are configurable.

  (f) Compose. Take the outputs from each of the previous stages and compose them into the final image. During composing, a number of additional effects are possible. For the edges (Stage (b)), the color used for drawing edges can be configured. For the shadows (Stage (c)), the color of the shadows can be configured. For the image content (Stages (d) and (e)), the content can be displayed as-is, replaced by a texture pattern, or replaced by a fixed color. In addition, when the image content is replaced, the original color of the image (the result of Stages (d) and (e)) can be used to draw the edges instead of the fixed color.
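The overall data flow can be condensed as follows (a sketch only; the function and variable names are illustrative and do not correspond to the identifiers used in the library):

```csharp
// Sketch of the stage sequence of Fig. 1: color and depth textures in,
// stylized color texture out. Each call stands for one or more render passes.
Texture ApplyNpr(Texture color, Texture depth, NprSettings settings)
{
    Texture flow    = FlowField(color);                            // (a)
    Texture edges   = Edges(SimplifyForEdges(color), depth, flow); // (b.1)-(b.4)
    Texture shadows = Shadows(color, RenderWithoutShadows());      // (c.1)-(c.2)
    Texture simple  = Simplify(color, flow);                       // (d.1)-(d.3)
    Texture desat   = Desaturate(simple, depth);                   // (e)
    return Compose(edges, shadows, desat, settings);               // (f)
}
```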

3.2.2 Limitations of the Magdics et al. approach

Unfortunately, when trying to apply the Magdics et al. approach to VR scenarios, several limitations appeared (see Fig. 6).

  1. The approach was designed for a legacy rendering method; newer methods exist that provide direct advantages and are more future-proof. Therefore, to support VR, the architecture of the post-processing feature needs to be redesigned while maintaining the same data flow.

  2. The approach was not designed to support an environment separated into layers. Therefore, a new masking feature and a new composing stage are needed to support VR rehabilitation scenarios with scenes composed of background scenes, non-interactive objects, and interactive objects as main layers.

  3. The approach was not designed for VR, and some effects cannot be applied as-is because stereoscopic rendering produces a different output for each eye. Therefore, a modification is needed so that object-space coordinates are used instead of screen-space ones. In addition, the shaders need to be modified to account for the presence of two eyes, which requires using some helpers provided by the Unity engine libraries.

  4. The NPR effects have a high impact on performance and memory usage, which is compounded when stacking effects. Therefore, some optimizations are required to tackle this problem. Particularly, a selective reduction in the resolution of intermediate render textures is proposed to address both factors.

3.2.3 The Magdics et al. approach extension to support VR

Taking into account the detected limitations, an extension of the Magdics approach is proposed to support NPR in the context of VR scenarios. This new approach solves the limitations described above. A global view of the new system is illustrated in Fig. 2, where the proposed modifications are represented as pink boxes.

In addition to the modifications needed to support VR, the approach has been updated to the modern rendering system present in the Unity engine, the Scriptable Render Pipeline (SRP) (Scriptable Render 2023). Specifically, the Universal Render Pipeline (URP) (Linowes 2020; Universal Render Pipeline 2023), an implementation of SRP designed to be flexible for use on a wide range of devices, has been selected. Moreover, two approaches have been considered for applying different NPR effects to different sets of objects. The first is to use contextual information, such as material or depth, to segment the image (Kim and Lee 2020). The second is to render different objects with different cameras and compose the result. Since the object sets need to be defined by rehabilitation experts in the production environment, camera layering has been preferred over image segmentation.
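Under URP, the chosen camera-layering approach can be set up roughly as follows (a minimal sketch; the "Interactive" layer name is an assumption, and API details vary across URP versions):

```csharp
using UnityEngine;
using UnityEngine.Rendering.Universal;

// Sketch: stack an Overlay camera for one object set on top of a Base camera.
public class LayerStackSetup : MonoBehaviour
{
    public Camera baseCamera;    // renders the background layer
    public Camera overlayCamera; // renders, e.g., the interactive layer

    void Start()
    {
        var baseData = baseCamera.GetUniversalAdditionalCameraData();
        baseData.renderType = CameraRenderType.Base;

        var overlayData = overlayCamera.GetUniversalAdditionalCameraData();
        overlayData.renderType = CameraRenderType.Overlay;
        overlayData.clearDepth = true; // optionally clear depth for this layer

        // Each camera culls only its own object set ("Interactive" is assumed
        // to be a layer defined in the project).
        overlayCamera.cullingMask = LayerMask.GetMask("Interactive");

        baseData.cameraStack.Add(overlayCamera); // draw overlay on top of base
    }
}
```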

Fig. 2: The proposed system to support NPR in VR scenarios, obtained from the Magdics et al. approach. Pink boxes indicate the modifications carried out to overcome the limitations of the original approach

  1. Support the Universal Render Pipeline (URP). In contrast to the original system, which was built on a procedural paradigm (where a single method call could perform immediate actions on the images and execute render passes), the URP system is more structured and requires some changes. Particularly, two main core modifications are needed: (i) changes to the code structure that performs the processing stages, and (ii) changes to the shaders that implement the algorithms.

  Regarding the code structure, the URP system starts with a Render Feature, which configures a number of ScriptableRenderPasses. The passes declare all the temporary textures they will use and, when executed, enqueue a sequence of actions into the Command Buffer. To adapt to this system, each of the stages of the original system has been converted into a separate ScriptableRenderPass. The Render Feature can be added to a URP Renderer configuration, and the shaders can be assigned to it for use by the Render Passes (a simplified skeleton of this structure is sketched after this list).

  Regarding the algorithms, two changes are needed to adapt the shaders that implement them: (i) the same shader logic could be used, but with updates to the use of engine-provided data and functions, which are different in the URP system; and (ii) a new logic had to be applied for the vertex shader, since the URP system uses a single triangle to perform a full-screen blit operation, instead of a rectangle composed of two triangles as in the legacy system.

  In addition, a last modification related to how the NPR configuration is assigned to the camera is needed. The original system used a single script to attach the configuration and logic to a Camera. Because the Render Feature is now a separate asset object that can be selected in multiple cameras simultaneously, it is necessary to separate the NPR configuration from the Render Feature. A new configuration object can be attached to a Camera, which is queried by the Render Feature when setting up the passes before rendering a frame. Note that the changes described here did not result in changes to the structure in Fig. 2, because the stages and passes are still performed in the same processing flow.

  2. Camera Stacking. To extend the original system to support multiple user-specified layers, an additional feature has been implemented that allows selecting which objects are included in each layer and rendering each layer with different settings. The URP system allows cameras to be defined as Overlay cameras and added to a camera stack on top of a Base camera. This feature has been chosen as the means to implement the layering feature, such that all objects not kept in the background layer have an overlay camera defined for them, with a separate NPR configuration attached (see Fig. 3). Note that camera stacking renders each camera on top of the existing image, optionally clearing the depth buffer before the new layer. This allows different layers to render efficiently, but it means that the NPR feature cannot differentiate which pixels belong to the current camera's render output and which are from a previous layer. To tackle this problem, three main modifications are required: (i) a new mask texture to allow distinguishing the pixels that should be included in the effects; (ii) a copy of the original image color for the composing step; and (iii) adaptation of the corresponding passes to use these new textures.

  Regarding the mask texture, a new render stage before the Flow Field has been added to the system (represented as Generate mask in Fig. 2). This stage creates a mask texture by performing a new render pass that draws all the objects of the current camera's object set, using a shader that marks all the pixels on which geometry has been rendered. The result is a mask texture that indicates in white the pixels that belong to an object of the current camera layer, and in black all the pixels that are from a previous layer.

  Regarding the image color, a new render stage (represented as Copy color in Fig. 2) has been added which makes a copy of the original image color, so that it is preserved and can be used by the Compose stage (Stage (f)) to select between the original color and the color that has been processed by the simplification stage (Stage (d)) and the desaturation stage (Stage (e)).

  Regarding the stage and pass modifications, the following changes have been performed (represented as pink boxes in Fig. 2): (i) the Edge stage (Stage (b)) has been modified to only sample texture pixels that are included in the mask, so that areas in a previous layer are considered to be a different color for the purposes of image edges (b.2), and are considered to have a different depth for the purposes of geometry edges (b.3); (ii) the Compose stage (Stage (f)) has been modified to use this mask. The edge color pass (f.1) already has the edge sampling masked, and edges are intended to be able to draw on top of masked areas, since they can extend outside the objects they belong to; the shadow color input (f.2) needs to have the mask applied, to avoid coloring shadows outside the objects that are part of the current camera; and finally, the image contents input (f.3) needs to have the mask applied so that only image pixels belonging to the current camera have the simplification (Stage (d)), desaturation (Stage (e)), and color replacement (Stage (f.2)) effects applied.

  3. Stereoscopic Rendering. VR requires rendering two eyes for stereoscopic vision to function, and stereoscopic rendering can be performed in multiple ways. Particularly, the URP system supports three methods: (i) the Separate mode uses two cameras, one designated for each eye; this gives the most flexibility but has the biggest performance hit; (ii) the Multi-pass mode uses a single camera but draws the scene twice using texture arrays; the performance is improved versus the Separate method, but the scene is still drawn twice; (iii) the Single-pass instanced mode uses a single rendering pass but draws each object twice in the same call, once for each eye; this reduces the number of draw calls and can perform culling and other optimizations only once, achieving additional performance over the other two at the loss of some flexibility. The URP system provides a set of helpers for the shader language which enable distinguishing between eyes during rendering. Implementing these helpers is critical to support the Multi-pass and Single-pass instanced modes, so all the NPR shaders have been updated to support them. Additionally, some of the effects do not work correctly when applied to stereoscopic rendering. The desaturate algorithm, which is used for the Desaturate stage (Stage (e)), has a radial mode that uses screen coordinates to apply the desaturation. Because stereoscopic rendering has two eyes offset from each other, this results in a desaturation area mismatch between the eyes. A new method has been designed to replace this effect, which uses the view position (the position relative to the look direction of the virtual head) to calculate the distances (the computation is sketched after Fig. 3). This allows the same desaturation intensity to be applied to both eyes, in both the radial and depth modes.

  4. Performance. The NPR system has a high performance cost (Magdics et al. 2013) and a high video memory usage. This cost is compounded when accounting for the high resolution of VR headsets, stereoscopic rendering, and multiple layers, so a number of optimizations are needed to achieve real-time performance when the NPR effects are used. Particularly, to mitigate the performance cost, the system has been improved to support distinct render texture sizes for each stage and each pass. Additionally, each ScriptableRenderPass defines the temporary render targets it requires ahead of time, whereas the legacy system was able to request and discard them on the fly; because of that, the URP system cannot safely reuse those render targets between stages in an automatic way. To further reduce the video memory usage, a new helper has been created to declare and track which render targets are used only within a single Render Pass, so that the same render target can be reused in subsequent passes.
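As referenced in item 1, the resulting code structure can be sketched as follows (a simplified skeleton under the pre-RenderGraph URP API; NprCameraConfig and the pass bodies are hypothetical):

```csharp
using UnityEngine;
using UnityEngine.Rendering;
using UnityEngine.Rendering.Universal;

// Hypothetical per-camera configuration object queried by the feature.
public class NprCameraConfig : MonoBehaviour
{
    public bool edges, shadows, simplify, desaturate;
}

public class NprRenderFeature : ScriptableRendererFeature
{
    class MaskPass : ScriptableRenderPass
    {
        public override void Execute(ScriptableRenderContext context,
                                     ref RenderingData renderingData)
        {
            CommandBuffer cmd = CommandBufferPool.Get("NPR Generate Mask");
            // Here the current camera's object set would be drawn with a
            // shader that writes white wherever geometry is rendered.
            context.ExecuteCommandBuffer(cmd);
            CommandBufferPool.Release(cmd);
        }
    }

    MaskPass maskPass;

    public override void Create()
    {
        maskPass = new MaskPass
        {
            renderPassEvent = RenderPassEvent.AfterRenderingTransparents
        };
    }

    public override void AddRenderPasses(ScriptableRenderer renderer,
                                         ref RenderingData renderingData)
    {
        // The NPR configuration is attached to the Camera rather than to the
        // feature, because the same feature asset can serve several cameras.
        var config = renderingData.cameraData.camera
                                  .GetComponent<NprCameraConfig>();
        if (config == null) return;

        renderer.EnqueuePass(maskPass);
        // One ScriptableRenderPass per stage (flow field, edges, shadows,
        // simplify, desaturate, compose) would be enqueued here in the order
        // of Fig. 2.
    }
}
```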

Fig. 3: Camera stacking, where the different layers of the image are represented as a base camera with a number of overlays to support the application of different NPR effects in the same scene
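As referenced in item 3, the stereo-safe desaturation distance can be illustrated as follows (the real computation runs in a shader; it is expressed here in C# only to show the math, and the parameter names are ours):

```csharp
using UnityEngine;

public static class StereoSafeDesaturation
{
    // The view-space position is relative to the look direction of the virtual
    // head and is therefore identical for both eyes, unlike per-eye screen
    // coordinates, which made the radial mode mismatch between eyes.
    public static float Weight(Vector3 viewPos, bool radialMode,
                               float intensity, float rate)
    {
        float distance = radialMode
            ? new Vector2(viewPos.x, viewPos.y).magnitude // offset from view axis
            : -viewPos.z;                                 // distance from camera
        return Mathf.Clamp01(intensity * distance * rate);
    }
}
```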

As a result of all these modifications, the proposed approach is able to efficiently support NPR in VR scenarios. For the sake of usability, and especially focusing on non-experts, a user-friendly authoring tool exposing the different NPR effects that can be applied to the virtual scene has been implemented. For more details on this authoring tool, please see Sect. 4.

3.3 Testing scenario

The system has been evaluated from a technical point of view and also from the perspective of the end user, in our case, the rehabilitator responsible for creating the rehabilitation scenarios. The role of the patient has not been considered. For the tests, a standard PC with an Intel Core i7-12700F processor and an AMD Radeon RX 6700 XT will be used, along with a Meta Quest 2 headset.

3.3.1 Technical evaluation

Image quality and performance will be considered for the method evaluation.

  • Image quality will be evaluated using the non-VR scenario illustrated in Fig. 4, which contains a variety of shapes and textures at different depths. Since texture size directly impacts image quality, our evaluation will also consider texture size as a key factor. Note that, to achieve real-time performance when the NPR effects are used in VR, the system has been improved to support distinct render texture sizes for each stage and each pass of the proposed approach. The reduced resolutions will be configured as the denominator of a fraction of the full resolution of the screen or HMD, with 1/1 for full size, 1/2 for half, and so on. Therefore, to evaluate the impact of this optimization, the reduction in render texture resolution will be tested by iteratively configuring increasing denominators and comparing the results visually.

  • Performance of the system is affected by three factors: (i) the number of NPR effects active at the same time, i.e., the number of NPR effects in each layer, independently of the number of objects in each layer; (ii) the number of layers in the camera stack; and (iii) the resolution of the render textures. The impact of each of these factors on the test scene of Fig. 5 will be evaluated in terms of frames per second (fps), considering different texture resolutions (full vs reduced), different numbers of layers (from 1 to 5), and the use of NPR effects (a single effect vs all the effects). For the tests, VR and non-VR scenarios will be considered, and the applied configuration will be accepted if a minimum average, 90 fps for VR and 60 fps for non-VR, is achieved (Bhattacharya et al. 2021). Additionally, to test the stability of the fps measure, the first percentile will be reported; this will be considered acceptable if it is no lower than two-thirds of the average.
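The acceptance rule just described can be summarized in code (thresholds taken from the text; the function itself is only a sketch):

```csharp
// Accept a configuration if the average fps meets the scenario minimum and the
// first percentile is no lower than two-thirds of the average.
static bool ConfigurationAccepted(double avgFps, double firstPercentileFps,
                                  bool vrScenario)
{
    double minAvg = vrScenario ? 90.0 : 60.0; // Bhattacharya et al. (2021)
    bool stable = firstPercentileFps >= (2.0 / 3.0) * avgFps;
    return avgFps >= minAvg && stable;
}
```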

Fig. 4: The test scene used for the image quality evaluation, where (a) shows the background layer with desaturate and simplify effects; (b) the decorations layer with a simplify effect; (c) the interactive layer with an edges effect; and (d) the final image

Fig. 5: The test scene used for the performance evaluation of the proposed approach, where (a) corresponds to the background, (b) to the non-interactable objects, (c) to the interactable objects, and (d) to the combined scene

3.3.2 End-users evaluation

Regarding end-users, three rehabilitators (R1, R2, and R3) will be considered. R1 is an experienced rehabilitator, with more than 20 years of experience, who belongs to the stroke rehabilitation team of our reference hospital. R1 delivers rehabilitation when the patient is in the hospital, i.e., during the first five to seven days after a stroke. R2 is also an experienced rehabilitator, with more than 20 years of experience, who belongs to a rehabilitation center that delivers rehabilitation after the hospital stay. Both rehabilitators have experience using computer-based rehabilitation tools as final users but have no technical computer knowledge. R3 is a less experienced rehabilitator, with only five years of experience using computer-based rehabilitation tools, and also has no technical background.

The rehabilitators will be asked to prepare a session using the software to which the proposed authoring tool has been added. This software allows selecting a rehabilitation scenario, configuring the objects the patient will interact with, and defining the patient interactions. To add NPR effects to this scenario, they will be given no additional information, only access to the NPR authoring tool, a brief introduction to the interface, and the help description that can also be accessed from this interface. To assess whether the information provided in the interface is enough to apply the NPR effects, the tool will be evaluated in independent sessions under the observation of a member of our development team, who will identify issues and points at which the rehabilitation professional had difficulties, focusing on the NPR configuration step. The time spent in the configuration screen and the clicks made during the configuration session will be measured and compared with the minimum and maximum number of clicks, i.e., when at least one option has been enabled and when all effects are enabled with advanced options, respectively. At the end of the testing sessions, the rehabilitators will be interviewed to collect their impressions.

4 Results and discussion

Results will be presented and discussed focusing on technical aspects, including image quality and performance, and end-users’ impressions of the system.

4.1 Technical evaluation

Fig. 6: In the first row, NPR effects obtained with the Magdics et al. approach applied to the testing image: (a) scene as rendered by the legacy rendering system; (b) scene with the NPR effects applied. In the second row, NPR effects obtained with the proposed approach: (c) scene as rendered by the URP rendering system; (d) scene with the NPR effects applied

4.1.1 Image quality

The proposed approach has been conceived as an extension of the Magdics et al. approach to solve its limitations when applied in VR scenarios. To illustrate the good performance of our approach, Fig. 6 shows the testing image with NPR effects obtained using the Magdics et al. approach (first row) and using our approach (second row). From these images, it can be observed that, outside the context of VR, the effects give equivalent results. Note that there are small differences because URP renders the scene using different lighting algorithms than the legacy renderer. Therefore, we can conclude that the proposed approach is able to reproduce in VR scenarios the NPR effects obtained with the Magdics et al. approach.

With regard to the texture size, the size for each render texture has been tested by iteratively increasing each denominator. The obtained results are illustrated in Fig. 7 and summarized in Table 1. From Table 1, it can be seen that:

  • The Mask stage is somewhat sensitive to the resolution. A resolution of 1/2 of the final target can be used, but only with thicker edges that cover up the masking errors; a full-size texture is recommended for any effect that does not include thick edges.

  • The Flow Field stage can have a texture that is up to 1/8 of the original without visually affecting the results.

  • The Edge stage is highly sensitive to the resolution; 1/2 is only possible when very soft edges are wanted, and keeping the original size is necessary otherwise.

  • The Shadow stage can have the texture reduced to 1/2 or 1/4 if smoothing and quantization are applied to the shadows.

  • The Simplify stage can have the texture reduced by a fraction proportional to the intended smoothing level, with 1/3 at smoothing level 2 looking visually similar to the full size at smoothing level 5, but with some loss of detail; a reduction to 1/2 reduces the loss of detail and may be preferable.

  • The Desaturate stage is very sensitive to the texture resolution and needs to be kept at the same resolution as the output of the Simplify stage, or at full size if simplification is not enabled.

  • The Compose stage is very sensitive to the texture resolution and needs to be kept at the full resolution.
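As an example, a per-stage resolution configuration consistent with these observations could look as follows (the stage names and structure are illustrative; denominators express the fraction of the full resolution):

```csharp
using System.Collections.Generic;

static class NprTextureBudget
{
    // One possible configuration derived from the observations above
    // (not the only valid one; see the restrictions in Table 1).
    static readonly Dictionary<string, int> StageDenominator =
        new Dictionary<string, int>
    {
        ["Mask"]       = 1, // 2 is tolerable only with thick edges
        ["FlowField"]  = 8, // up to 1/8 without visible change
        ["Edges"]      = 1, // 2 only when very soft edges are wanted
        ["Shadows"]    = 2, // or 4 if smoothing and quantization are applied
        ["Simplify"]   = 2, // 3 possible at low smoothing levels, losing detail
        ["Desaturate"] = 2, // must match the Simplify output resolution
        ["Compose"]    = 1, // full resolution required
    };

    static int StageWidth(int fullWidth, string stage)
        => fullWidth / StageDenominator[stage];
}
```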

Table 1 The impact of texture reduction on each step of the method, with columns representing the method stage, the sensitivity of the step to this reduction (possible values: no/somewhat/highly), the supported fraction of reduction with respect to the full size, and some restrictions to be considered
Fig. 7: Effect of texture reduction on the rendering, where from left to right the original image, an image with a reduction, and the optimal reduction are illustrated

Fig. 8: Performance of the system for (a) VR and (b) non-VR scenarios, considering different texture resolutions (full vs reduced), different numbers of layers (from 1 to 5), and the use of NPR effects (a single effect vs all the effects)

Fig. 9: Stability of the system, in terms of the first percentile of the fps relative to the average, for (a) VR and (b) non-VR scenarios, considering different texture resolutions (full vs reduced), different numbers of layers (from 1 to 5), and the use of NPR effects (a single effect vs all the effects)

4.1.2 Performance

The three factors that affect the performance of the proposed system (number of NPR effects (Single vs All); camera stack layers (from 1 to 5); and texture resolution (Reduced vs Full)) have been evaluated for the VR and non-VR scenarios.

Figure 8 shows the fps average for all the tested cases in both the VR and non-VR scenarios. The dotted red line indicates the minimum fps to be considered acceptable (90 fps for the VR scenario and 60 fps for the non-VR one).

Focusing on the No NPR group of Fig. 8, for the VR scenario represented in Fig. 8a, a consistent 433 fps is observed (higher than the minimum of 90 fps) and, for the non-VR scenario represented in Fig. 8b, values range from 906 to 914 fps (higher than the minimum of 60 fps). Note that in a rehabilitation environment, the PC monitor would be limited to 60 fps and the VR headset to 90 fps; in actual rehabilitation, higher numbers would reduce the power usage of the device but not affect the execution speed. With regard to video memory usage, a baseline of 4.5 GB was observed when VR is enabled. While the fps performance is very good in the baseline, the VRAM usage is already high even without NPR effects.

With regard to the number of effects for one layer, it can be seen in Fig. 8a that, as expected, applying only a single effect (Single/Full) has a much smaller impact than applying the whole suite of NPR effects (All/Full) at the same time, showing 116 fps when a single layer is used with a single effect (above the threshold of 90 fps), compared to 42 fps with a single layer with all of the effects (below the threshold). This means that, without other optimizations, only a single NPR effect can be achieved in VR.

With regard to the number of layers, the test has considered scenarios with 1 to 5 layers in the camera stack. The results show that increasing the number of layers has an effect even without VR, but the effects for VR are much more critical due to the higher fps requirements and higher processing needs. As an example, looking at the All/Full case in Fig. 8, the 3-layer group shows 13 fps, compared to 42 fps with only one layer. With a single effect (Single/Full group), three layers show a performance of 45 fps, compared to 116 fps for the single layer. Additionally, the video memory usage grows to 7.7 GB, i.e., 3.2 GB over the baseline, when using three layers with all the NPR effects. While using multiple layers is better for visual quality, the performance without additional optimization is too low, and the video memory requirements are very high.

With regard to reducing the texture size, after applying the reduced texture configuration described in the image quality results (Sect. 4.1.1), it can be seen from Fig. 8a that up to 3 layers are possible with full NPR (All/Reduced) in VR without going under the 90 fps minimum, but more layers cause the average fps to be too low. With a single NPR effect (Single/Reduced), the performance is much higher, and even with 5 layers an average of 96 fps is achieved. Considering the case with all the NPR effects and 3 camera layers, the total video memory usage in the tested scene has been reduced from 7.7 GB down to 5.2 GB, which is only 0.7 GB over the baseline video memory usage. This optimization has a big impact on the performance and video memory usage, and makes it possible to apply multiple layers even when all the NPR effects are used.

Additionally, to evaluate the stability of the frame rate, a first percentile value has been calculated for each of the test results and compared with the average to obtain a measure of the times when the fps is too low. It can be seen in Fig. 9 that, in both the VR and non-VR scenarios, all of the results pass the \(66\%\) threshold, indicating that the fps was stable across all tests.

From these results, it can be seen that the NPR effects described by Magdics et al. retain great visual quality, but this comes at a cost in performance and video memory usage. Multiple optimizations are needed to support the combination of stereoscopic rendering and camera stacking. The reduction of the size of the temporary render textures is an effective solution, even when the reduction is kept small enough to maintain visual fidelity. With regard to the number of layers, to ensure proper performance it is suggested to use at most 3 layers, for example, those corresponding to the background, non-interactive objects, and interactive objects. A fourth layer can be defined, but this requires that not all the NPR effects are enabled in all the layers, or that the size of the textures is reduced further, compromising the resolution.

4.2 Rehabilitators evaluation

Three rehabilitators independently tested the proposed system via the provided NPR authoring tool. This tool allows defining the NPR parameters for an exercise environment that has been previously created. As shown in Fig. 10, the authoring tool provides simplified and advanced options for each NPR effect, and allows defining the preset for each layer (see https://youtu.be/ely6JzyTPqU for more details). After the authoring tool was introduced to the rehabilitators, they were asked to prepare a rehabilitation scene under our observation. None of the rehabilitators asked for help.

Fig. 10: The authoring tool to create and apply NPR effects has three main parts: Layers, in orange, to select the object layer to configure; Settings, in cyan, to configure the effects of the selected layer; and Preview, in yellow, to preview the selected effects on the layer. Settings is further subdivided into four parts, each of which can be in simplified or advanced mode; in green, an example of the advanced mode is shown

The rehabilitators were able to create VR scenes with NPR effects without problems. They very much appreciated the possibility of previewing the created scenes. After creating the scenes, the rehabilitators were interviewed to justify the selection of the applied effects. All of them agreed that the effects were applied according to what is considered the focus of interest for the patient when performing the rehabilitation task. Figure 11 presents some of the created scenes, indicating the focus of interest to which the patient must pay attention. Particularly, Fig. 11a shows a scene of a living room with an exercise area on a table, where three trays are defined as objectives for three cubes of matching colors; the background is set to a desaturate preset, the trays are set to be non-interactive with no NPR effects, and the cubes are set to have a white edge effect. Fig. 11b shows a scene of a kitchen and some objects, where the objective is to place the objects inside the sink; the environment has been set to a comic-book-style NPR preset, the counter and sink have been set to have no NPR effect, and the interactive objects have been set to be outlined. Fig. 11c shows a scene of a room with a desk and some buttons, where the buttons light up one at a time; the environment has been set to have edges on a dark texture, the desk has been set to use a simplification preset, and the buttons have no NPR effect.

Fig. 11: Scenes created by the rehabilitators, with NPR effects selected according to the focus of interest

4.2.1 Usability

With regard to usability, the time needed to set up the NPR effects and the number of clicks used were calculated theoretically and measured from the rehabilitators using a premade scene with three layers: background, non-interactive, and interactive.

Table 2 Theoretical counts collect the required number of clicks, assuming a process that starts by entering the configurator, editing settings, switching layers several times, and editing settings on the new layer, then finally leaving the configurator. First experience rows collect the number of clicks performed by each rehabilitator on their first attempt to configure the system

The user interface (see Fig. 10) consists of four sections, one for each of the NPR effects in the system. Each section has a simplified interface and an advanced interface. The maximum, middle, and minimum numbers of clicks required to use them are collected in Table 2 (Theoretical rows). Considering the maximum-clicks case to be the one where all the advanced options are visited, it can be calculated to require 1 click to open the NPR configurator, 59 clicks to configure the NPR preset, and 1 click to leave the configurator. For the case with 3 layers, this adds up to 178 clicks.

Considering a middle case where all the effects are configured but using the simplified interface, the result is 1 click to open the NPR configurator, 15 clicks to configure the preset, and 1 click to leave the NPR configurator. For the case with three layers, this adds up to 47 clicks.

Considering the minimum clicks case to be the case where a single effect is enabled and used with its default values, it requires only 3 clicks to open the configurator, enable one of the effects, and leave the configurator.

In their first attempt, the rehabilitators spent some time looking at how each option worked, but they avoided opening the advanced view, preferring to stay with the simplified sliders. As collected in Table 2 (First Experience rows), the measured times and clicks were 89 clicks and 173 s for the first rehabilitator, who configured three layers; 41 clicks and 68 s for the second, who chose to configure only one layer; and 67 clicks and 104 s for the third, who chose to configure two layers.

The rehabilitators were then allowed to explore the interface without having each click measured. It was observed that some of the rehabilitators explored the advanced options and the clicks were closer to the worst case, but they all chose to return to the simplified view by the end of their exploration.

When asked, the rehabilitators appreciated the simplified options, as the advanced options would become overwhelming, and expressed that the simplified options would be the ones used most of the time. They considered that the tool has the potential to help the rehabilitation process and improve the experience for some patients. They also considered that the tool can be of particular interest for patients with visual field disorders, as it would help focus attention on specific points of the virtual scene.

In conclusion, the rehabilitators, without any prior experience, were able to navigate the interface and configure the NPR effects with little effort, relying on the simplified options. The rehabilitators rated the interface favorably and thought it would be a useful tool for rehabilitation.

4.3 Final remarks

The proposed approach satisfies the defined requirements and allows NPR effects to be easily integrated into VR scenarios. From a technological point of view, it has been shown that it is viable to use NPR effects for feedback in VR scenarios, and that the performance can be maintained within the required targets when using a medium-high performance gaming PC. However, there are still some points that need further development.

Note that the evaluation of the proposal has been done from a technological point of view and focusing on rehabilitators. For the proposal to be complete, it will be necessary to carry out a new evaluation with patients, focusing on the NPR effects that best fit their needs in the context of HMD-VR rehabilitation sessions. However, the time required to carry out these tests has led us to consider them as future work.

Regarding the application area, the proposed approach has been centered on rehabilitation scenarios, since this is one of the main focuses of interest of our research group. However, our proposal is applicable to any other context where objects of a virtual scenario need to be emphasized to focus the attention of the final user, such as medical simulations or education scenarios. In fact, to apply the method it is only necessary to consider the decomposition of the scene into different layers, even where the particular choice of background, non-interactable, and interactable objects is not applicable.

Finally, although it was not the goal of the evaluation, considering the performance and the observed video memory usage, it has become apparent that a computer with a modern high-performance graphics device with at least 8 GB of video memory should be considered the necessary target. This means that using the NPR effects on a standalone untethered headset would require additional compromises in quality, possibly using less advanced algorithms. Note that this value has been chosen considering a Meta Quest 2 HMD, and it could be higher or lower depending on the screen resolution of the VR hardware.

5 Conclusions and future work

Feedback is a key element of VR-based rehabilitation sessions, aimed at recreating the rehabilitator-patient interaction. Visual feedback refers to the information or stimuli provided to the user through the visual display of the VR system. In this paper, a new approach to integrate NPR effects as a visual feedback technique for VR scenarios has been proposed. The system has been designed as an extension of the Magdics et al. approach to support VR requirements. Particularly, support for multiple layers in a camera stack and for stereoscopic rendering has been added, and optimizations to texture usage and performance have been implemented. In addition, the system provides an authoring tool for rehabilitators to integrate the NPR effects into the scenes. The system has been technically evaluated to demonstrate its good performance. It has been seen that the full suite of NPR effects in each layer can be supported when a maximum of three layers in the camera stack is considered. The first group of rehabilitators that evaluated the system considered it a very valuable tool for rehabilitation sessions and found the authoring tool very simple to use.

As future work, we will focus on the evaluation of the system with a group of post-stroke patients to identify their NPR preferences according to pathology status. We also want to collect more rehabilitators' opinions, in addition to all the tasks presented in the Final Remarks section. Other potential improvements, such as the ability to select feedback strategies in real time during a rehabilitation session, will be considered. On the technical side, additional effects and feedback strategies will be explored.