1 Introduction

Recent advances in immersive display and interaction technologies, such as head-mounted displays (HMD) and three-dimensional (3D) tracking sensors, have led to renewed interest in various research areas, especially outside entertainment-related contexts. Immersive Analytics (IA), concerned with the application of immersive technologies for the purpose of data exploration, analysis, and meaning-making, is one such research area (Skarbez et al. 2019; Dwyer et al. 2018). Among other benefits, the utilization of immersive technologies for data analysis has the potential to increase user engagement (Büschel et al. 2018), promote user mobility (Fruchard et al. 2019), allow for the exploration of new data interaction approaches (Roberts et al. 2014), and enable the creation of virtual 3D data spaces to support collaborative decision making (Hackathorn and Margolis 2016). Within that context, the actual visualization of data in the virtual environment (VE) is arguably just one important aspect. Equal importance in such immersive spaces should be attributed to their interactive features, enabling and encouraging the analyst to actively explore and manipulate the VE rather than just passively consuming the visualization. Based on a recent survey covering IA research from 1991 to 2018, Fonnet and Prié (2021) describe and discuss various aspects of such data interactions, highlighting the need for more guidelines and best practices as well as encouraging researchers to go beyond just basic interactions. The importance of interaction in IA systems has also been highlighted by Ens et al. (2021), who deem it a major topic within current IA research challenges. Several studies across various contexts have shown that aspects of 3D gestural input for the interaction with immersive data visualizations can be generally intuitive, engaging, and easy to learn (Huang et al. 2017; Wagner Filho et al. 2020; Reski et al. 2020). Nevertheless, there is still a need for further investigations, for instance, to more clearly determine what types of 3D gestural interactions users prefer (Fittkau et al. 2015; Streppel et al. 2018), or what kinds of preferred user interactions are feasible to implement given current tracking capabilities (Austin et al. 2020). An underrepresentation of 3D gestural input as a spatial interface modality for the interaction with 3D visualizations has also been recently identified by Besançon et al. (2021), particularly compared to tactile, tangible, and hybrid interaction paradigms. The authors emphasize the need for more research investigating the utilization of 3D gestural input within the context of 3D visualizations and IA, in line with similar interaction-related conclusions described by Fonnet and Prié (2021) and Ens et al. (2021).

This article aims to address some of these research challenges by reporting on the design, implementation, and evaluation of a user interface intended to facilitate engaging interaction with abstract data visualization in an immersive VE based on 3D gestural input and HMD technologies. In particular, our research contributes to the emerging field of interactive IA as follows:

  • We report on the design of a 3D user interface (3D UI) based on 3D gestural input with a focus on hand-based grasping and gestural command techniques, aimed at enabling engaging hand interaction with time-oriented data in immersive virtual reality (VR).

  • We present an applied use case of mapping 3D interaction techniques, data analysis tasks, and aspects of hand posture comfort to the designed 3D UI, following an interdisciplinary research approach that is grounded in the literature to inform and guide the 3D UI design.

  • We present and discuss the results of an empirical evaluation of the developed 3D UI, allowing for reflections and considerations for similar future applications.

1.1 Design process and article outline

Fig. 1

Illustration of the 3D UI’s design process, aligned with references to the respective sections of the article, thus outlining the article’s structure

As illustrated in Fig. 1, the design of the presented 3D UI for gestural interaction with time-oriented data in immersive VR is greatly influenced by relevant related literature, as presented throughout Sect. 2. In particular, we started by examining important aspects of 3D interaction techniques in general, both as a means for feature classification and as a terminological and conceptual foundation for the developed 3D UI. We continued by exploring related works that have examined the utilization of 3D gestural input, i.e., hand interaction or mid-air gestural interaction, as a modality to interact with abstract data in immersive VEs. The examined literature provided inspiration and important findings, both in regard to what worked well and what could be improved, subsequently influencing the design of our 3D UI. Additionally, within the overall visualization context, we also examined existing data analysis task classifications with the objective to further conceptually categorize and describe the respective features of our 3D UI. Based on the gathered insights from the literature, we describe the design and development of our 3D UI throughout Sect. 3. We begin by setting up the overall data context and scenario, focusing on time-oriented data exploration using a 3D Radar Chart approach (Reski et al. 2020). We continue by presenting our adaptation of various data analysis tasks, based on the reviewed classifications, for the specific context of immersive interaction with spatiotemporal data. The remainder of the section describes the feature design and development of our 3D UI. Thereafter, in Sect. 4, we describe the evaluation methodology used to gather empirical insights. Our evaluation is centered around a series of representative analysis tasks performed with the 3D UI within the scope of a laboratory experiment, focusing on subjective evaluation methods and applying a mixture of in situ observations, post-study questionnaires, and semi-structured interviews. The results of the conducted study with a total of twelve participants are presented in Sect. 5. We continue by discussing and reflecting on our work in Sect. 6, particularly highlighting aspects of the 3D UI design related to hand-based grasping, gestural commands, and unintentional commands. Finally, Sect. 7 concludes the article by providing a brief summary as well as directions for future work.

2 Related work

2.1 3D interaction techniques

An extensive overview of 3D interaction techniques is provided by LaViola et al. (2017, Chapters 7–9), describing approaches and metaphors for selection and manipulation, travel, and system control interaction techniques. Particularly relevant in regard to 3D gestural input are grasping metaphors (LaViola et al. 2017, Chapter 7), which allow a user to simply grab, move, and release a virtual artifact as one would do in the real world, as well as gestural commands (LaViola et al. 2017, Chapter 9), which utilize hand postures (static) and gestures (moving) that are associated with features to control the state of the VE. It is important to differentiate between direct and indirect interactions: while direct interaction allows for immediate manipulation of an object, indirect interactions build upon some sort of middle layer for object manipulation, e.g., a representative proxy object or a virtual control widget (LaViola et al. 2017, Chapter 7). Arguably, direct interactions tend to be perceived as somewhat more natural than indirect ones, as they reflect more closely how humans interact in the real world. However, this does not mean that indirect interactions should be avoided; after all, interactions should aim first and foremost to be useful with regard to their intended purpose (Norman 2010). A survey of spatial interfaces for 3D visualizations was recently conducted by Besançon et al. (2021). The authors differentiate between various spatial interaction paradigms (tactile, tangible, mid-air gesture, and hybrid interaction) on the one hand, while examining their application for various high-level visualization tasks on the other. In particular, visualization tasks are categorized as (1) view and object manipulation, (2) defining, placing, and manipulating visualization widgets, and (3) 3D data selection and annotation (Besançon et al. 2021). Based on the works included in the survey, their categorization provides valuable guidance for the description and classification of future 3D UIs within the presented context.

Modern tracking sensors allow for interaction not just with one controller or hand but with two (Bachmann et al. 2018), commonly described as bimanual metaphors (LaViola et al. 2017, Chapter 7) that can be categorized with respect to their symmetry and synchronicity (Ulinski et al. 2009). Pavlovic et al. (1997) reviewed aspects of 3D gestural input for application in human–computer interaction (HCI) in general, describing a gestural taxonomy that (1) differentiates hand and arm movements into gestures and unintentional movements, and (2) divides gestures into communicative and manipulative modalities. Beyond hand and gesture recognition as fundamental prerequisites for any 3D gestural input (Pavlovic et al. 1997), a computing system’s ability to successfully infer intent in regard to subsequent hand interaction is equally important (Nehaniv et al. 2005). For instance, under consideration of the respective in situ context, similar hand postures and gestures may be used for different types of interactions (Nehaniv et al. 2005). As such, Nehaniv et al. (2005) classified gestures to infer human intent as irrelevant and manipulative gestures, gestures as a side effect of expressive behavior, symbolic gestures, interactional gestures, and referential and pointing gestures. Rempel et al. (2014) provided considerations for the design of comfortable hand postures for use in HCI contexts based on insights from sign language, among other reasons to prevent physical fatigue symptoms. The authors recommend the use of comfortable gestures for more frequent tasks, while infrequent tasks may also be performed through slightly less comfortable ones (Rempel et al. 2014).

2.2 3D gestural input for immersive data interaction

LaViola (2000) describes an interface that utilizes a multimodal approach of 3D gestural input and voice commands to interact with a scientific data visualization in stereoscopic 3D. Different analysis tools can be attached to the user’s hands and moved around in the 3D environment. Interestingly, rather than selecting these tools from a graphical menu, they implemented voice commands that allow the user to say aloud the tool they want to interact with, following a “show and ask” metaphor. They also implemented several hand-based grasping configurations to provide navigation features, i.e., user movement as well as translation, scaling, and rotation of the VE. An evaluation indicated that their participants valued the tool’s ease of use after an initial learning phase. Their results also indicated that the voice command interface worked well in single-user scenarios, while having detection problems in collaborative ones that featured more auditory input. In turn, the design and utilization of hybrid interaction paradigms should be carefully considered based on the analysis scenario and potential synchronous collaborative aspects.

Fittkau et al. (2015) explored gestural command design for interaction with an immersive data visualization following the “software cities” metaphor, implementing several unimanual and bimanual hand commands to support translation, rotation, zoom, selection, and reset tasks. The results of their evaluation indicate that the users favored the one-handed gestures (translation, rotation, selection) over the two-handed (zoom) one that was performed through a “rowing” motion. Interestingly, the authors attempted to utilize more elements of embodied interaction for the zooming command, such as rotating the user’s torso or walking back and forth in the VE. However, such movements would inherently result in a change of the user’s point of view, which was not appreciated during early design iterations. Their work points toward the need to carefully consider whether or not whole-body interactions should be integrated in an immersive visualization tool as a means of interactive manipulation; after all, it might be challenging for the user to observe the results of their manipulation since they are actively changing their own field of view at the same time.

Streppel et al. (2018) explored 3D interaction techniques within the “software cities” context similarly to Fittkau et al. (2015), comparing 3D gestural input, physical controllers, and virtual controls. Their results indicate similar preferences for 3D gestural input and physical controllers as opposed to virtual controls. Even though the physical controller condition received better usability scores, participants stated that they would rather use the 3D gestural input in a real-world scenario, as it was perceived as more natural and appropriate for interaction in a VE. The expressed desire for better 3D gestural input controls is quite interesting, indicating that more work in that direction should be undertaken to further improve the usability aspects of 3D gestural input in the context of IA.

Osawa et al. (2000) investigated hand-based grasping and gestural command techniques for interaction with an immersive graph visualization. Their system allowed the user to select and manipulate individual nodes of the 3D network (translate, lock position in space, adjust characteristics), to translate the user’s position in space (move), and to adjust characteristics of multiple nodes through a “spotlight” approach. The latter was operated by pointing one’s hand in the general direction of the desired nodes and creating an arc-like spread by moving the index finger and thumb apart, enabling dynamic control of the included network nodes. While the results of their investigation indicate an intuitive and more appropriate interaction compared to the implemented 2D interaction techniques, the authors also observed certain frustrations with the operation of the gestural interface, particularly in regard to the technology’s precision.

Huang et al. (2017) reported on the design of a 3D gestural interface for interaction with graph visualizations in VR, conceptually similar to the work presented by Osawa et al. (2000), providing gestures to move and highlight nodes and edges (one-handed interaction), to rotate and translate the entire graph, and to group nodes (two-handed interaction). An evaluation, comparing the implemented gestures with more traditional pointer input (mouse), revealed positive trends regarding the participants’ ability to manipulate the 3D graph with the gestures, with participants stating that the interface “was intuitive, easy to learn, and interesting.” While their implemented node/edge movement and graph rotation gestures were appreciated for their learnability, some usability issues were identified for the highlight and group gestures that involved aspects such as holding a specific hand posture or performing a gesture very quickly. Consequently, the results reported by Huang et al. (2017) show that the interface’s technological capabilities as well as hand posture comfort considerations should be carefully taken into account when designing for 3D gestural interaction. This is arguably of special importance when anticipating that analysis tools are applied more frequently and over longer durations than “just a few minutes” experiences.

A VR system developed by Betella et al. (2014) featured 3D gestural input for manipulation and filter operations within a large network visualization. Their interface utilized a hand-based grasping technique and asymmetric bimanual hand interaction, i.e., one of the user’s hands had a cursor function to highlight and select elements in the network, while the other hand was used to operate task parameters such as filter strength and complexity. Their asymmetric feature mapping strategy is interesting insofar as the authors differentiate between left- and right-hand interactions instead of following a symmetric approach where the same features are provided independently of which hand performs the posture or gesture. Consequently, hand posture configurations that are potentially simple to detect and comfortable to hold could easily be reused and mapped to different features, once for the left and once for the right hand.

As part of interacting with an immersive 3D trajectory visualization, Wagner Filho et al. (2020) implemented a mixture of hand-based grasping (scale, translate) and gestural commands (single and double tap via index finger to inspect and select). They evaluated their system in comparison with a desktop one, revealing generally better usability scores for the immersive VE. Participants overall agreed that the 3D gestural input enabled them to easily and comfortably manipulate the data, resulting in an engaging and intuitive experience. Room for improvement was identified in regard to the index finger tapping, which required comparatively high precision, and in regard to the two-handed scale and rotation commands, whose similar operation was sometimes perceived as too constraining.

Austin et al. (2020) investigated common hand gestures for the interaction with large immersive maps that are placed on a virtual floor. In particular, using a participatory design approach, their study participants were asked to come up with hand gestures for typical operations to manipulate the virtual map, such as pan, rotate, zoom, and marker interaction. Their results indicate that the participants most commonly proposed unimanual gestures for interactions such as pan as well as creating and selecting markers on the map, while proposing bimanual gestures for rotate and zoom operations. Austin et al. (2020) reflected on their findings, stating that the identified user preferences for these gestural commands need further investigation in regard to performance-related matters, such as efficiency, accuracy, and physical fatigue. They also reflected on potential feasibility concerns for some of the proposed gestural commands, stating that an accurate and reliable implementation based on current 3D gestural tracking sensors might be difficult. In regard to the feature mapping to unimanual versus bimanual gestures, one could argue that features that are likely to be used less frequently (rotate, zoom) were mapped to bimanual commands. Nevertheless, bimanual gestural commands were also utilized in the interfaces presented by, for instance, Wagner Filho et al. (2020) and Huang et al. (2017), demonstrating feasibility and appropriateness under consideration of some practical design and implementation aspects.

2.3 Data analysis task classifications

Each visualization should be designed to serve a specific purpose and to support the analyst in extracting insights and information by completing desired tasks. Aigner et al. (2011, Chapter 1.1) summarized considerations for the design of information visualizations on a high level with respect to (1) what kind of data are visualized, (2) why the data are visualized, and (3) how the data are going to be visualized. From a user-centered perspective, the specification of the analyst’s tasks when interacting with a visualization is particularly interesting, i.e., with respect to why the data are visualized and what purpose the visualization serves the analyst. Ward et al. (2015, Chapter 1.8) and Aigner et al. (2011, Chapter 1.1) differentiated between three main purposes for the interaction with visualizations:

  • Exploration or Explorative Analysis: The analyst utilizes the visualization and its interactive features to explore an unknown dataset, and extract first insights and relevant information with no hypotheses given (undirected search).

  • Confirmation or Confirmative Analysis: The analyst utilizes the visualization and its interactive features to confirm or reject given hypotheses about a dataset (directed search).

  • Presentation of Analysis Results: The analyst utilizes the visualization and its interactive features to convey and present their findings in the dataset, such as concepts or facts, to an audience.

With respect to the actual design of a visualization’s interactive capabilities, Shneiderman’s (1996) Visual Information-Seeking Mantra of overview first, zoom and filter, then details-on-demand is arguably one of the most famous design guidelines. Based on it, Shneiderman (1996) proposes seven abstract task types that should be supported by the visualization, namely overview, zoom, filter, details-on-demand, relate, history, and extract. Another approach, by Munzner (2014, Chapters 1–3) and Brehmer and Munzner (2013), describes abstract visualization tasks as a multi-level typology, organizing tasks as to why and how they are performed as well as what a task’s input and output parameters are. With respect to why, Munzner (2014, Chapter 3) classifies user actions across four overall groups, i.e., (1) analyze (discover, present, enjoy), (2) produce (annotate, record, derive), (3) search (lookup, locate, browse, explore), and (4) query (identify, compare, summarize). Depending on the scenario and context of the interactive visualization, all these classifications have the potential to inform the development, either in isolation or as a mixed and multimodal approach. This allows for guidance and facilitation of the design process toward purposeful interactions with a visualization, and thus with data. Yi et al. (2007) reviewed a multitude of information visualization taxonomies with respect to their described interaction techniques. Based on their analysis of the literature, they synthesized a set of formal categories (select, explore, reconfigure, encode, abstract/elaborate, filter, connect, undo/redo, change configuration) to describe a user’s intent for the interaction with a visualization in general (Yi et al. 2007). Aigner et al. (2011, Chapter 5.1) further built upon these categories and adapted them to support the more specific context of interacting with time-oriented data, i.e., multivariate data where each data item features at least one data variable related to a temporal context. The utilization of such task categories allows us to conceptually categorize the interactive features of a developed data analysis tool, similar to the approach presented by Büschel et al. (2018), thus aiding the tool’s description accordingly.

3 3D UI design and VR prototype

As seen throughout Sect. 2, there is a multitude of aspects worth considering when setting out to design a 3D UI for immersive data interaction. We begin by describing details about the context and scenario in Sect. 3.1, providing an entry point to our VR prototype. Section 3.2 presents the data analysis task terminology that we adapted for the immersive interaction with spatiotemporal data. Design and motivation for the developed 3D gestural interface are described in Sect. 3.3, presenting an overview of all features with respect to relevant taxonomies. A brief summary of involved technologies and implementation is provided in Sect. 3.4.

3.1 Context and scenario

The focus of this article is to investigate a 3D UI design to support user interaction with abstract data visualization using 3D gestural input (hand interaction, mid-air gestures) within the context of IA. More specifically, we are interested in the interaction with time-oriented data in immersive spaces, a comparatively common IA use case (Fonnet and Prié 2021). For this purpose, we build upon the 3D Radar Chart approach as presented by Reski et al. (2020). Their approach allows for time-oriented data visualization in immersive VR, enabling the user to explore multivariate data in regard to spatial and temporal dimensions. Conceptually, a 3D Radar Chart consists of a central Time Axis with multiple Data Variable Axes arranged radially around it, each depicting a respective time-series visualization. A two-dimensional interactive Time Slice, illustrating the more traditional radar chart-like pattern (Kolence and Kiviat 1973), allows for temporal analysis of the values across the different data variables (Reski et al. 2020). A VE may be populated with multiple 3D Radar Chart instances, each representing a different entity in the data, e.g., a location, thus allowing for spatiotemporal data analysis. Figure 2 presents the described concept of a 3D Radar Chart, providing an excerpt of the VE from the VR user’s field of view.
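
To make the chart composition more tangible for readers who want to experiment with the approach, the following C# sketch outlines one possible data model for such a chart. The type and member names are illustrative assumptions made for this article and do not reflect the actual implementation by Reski et al. (2020).

```csharp
// Illustrative sketch (not the original implementation): one possible data model for a
// 3D Radar Chart with a central Time Axis, radially arranged Data Variable Axes, and a
// movable Time Slice selection.
using System.Collections.Generic;

public class RadarChart3D
{
    public string EntityLabel;                        // e.g., the location this chart represents
    public List<DataVariableAxis> Axes = new List<DataVariableAxis>();
    public int SelectedTimeEvent;                     // index along the central Time Axis
    public (int start, int end)? SelectedTimeRange;   // optional time range selection

    public class DataVariableAxis
    {
        public string Label;          // e.g., "Apples"
        public float[] TimeSeries;    // one value per time event
        public bool FilteredOut;      // temporarily removed from the chart
        public float AngleDegrees;    // radial position around the Time Axis
    }
}
```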

Fig. 2

Excerpt of the VE from the VR user’s field of view, interacting through 3D gestural input with a 3D Radar Chart (incl. Time Slice and juxtaposed Information Window)

The results of their initial study validated the visualization approach in general, indicating that the participants were able to explore and interpret the displayed time-series data using a first set of basic interaction features, such as selecting time events and time ranges. As part of their initial explorative interaction design, Reski et al. (2020) implemented alternatives for the interaction using hand-based grasping as well as system control (via graphical menus attached to the user’s virtual hand) techniques. However, no clear preference for one interaction technique over the other could be identified by Reski et al. (2020).

In comparison with their initial prototype (Reski et al. 2020), we rigorously iterated on the design of the interactive 3D gestural interface, subsequently resulting in various key differences as follows:

  1. Rather than focusing on the validation of the visualization technique in the VE, the main objective of the presented work is concerned with the design and evaluation of the interface’s interactive features and capabilities.

  2. Rather than relying on a basic set of interactions, we extended the interface’s feature corpus to support additional important tasks that are typical for the analysis of time-oriented data (see Sect. 2.3). These include foundational sort, filter, and zoom capabilities, as well as complementary system features to reset the visualization state and pause/resume interactions.

  3. Rather than adopting a variety of replaceable interaction techniques, the presented 3D gestural input design and implementation focuses on hand-based grasping and gestural command techniques with the objective to provide a uniform interface approach, i.e., without the utilization of any alternative graphical menu-based system control techniques.

  4. Based on the initial feedback of their visualization technique validation, various matters were addressed as quality-of-life changes with the intent to improve the 3D Radar Chart approach in general. For instance, to facilitate the user’s time range selection experience with the designed 3D UI, we provide a semitransparent uncolored preview of the data outside the selected time range instead of simply hiding the unselected data.

The 3D UI design process, as outlined in Sect. 1.1, is inherently based on the various insights extracted from the literature presented throughout Sect. 2. Together with the mappings of relevant classifications (data analysis task, interaction technique, hand posture comfort) to the individual interface features as 3D UI descriptors, and the subsequent findings obtained from the conducted empirical evaluation reflected upon throughout Sect. 6, this allows us to contribute further insights to the IA community toward the creation of 3D gestural interfaces for the interaction with time-oriented data in immersive VR.

3.2 Data analysis tasks for immersive interaction with spatiotemporal data

Within the scope of the presented context, i.e., immersive interaction with time-oriented data, we concluded that none of the data analysis task classifications discussed in Sect. 2.3 was directly applicable without at least minor changes. To facilitate the description and classification of the features in our anticipated 3D UI, we decided to adapt the combined work presented by Yi et al. (2007) and Aigner et al. (2011, Chapter 5.1) toward the contexts of IA and the interaction with spatiotemporal data in VEs. We chose these classifications in particular because the work by Aigner et al. (2011, Chapter 5.1) already focuses on time-oriented data, making it the most specific and most closely related to the context and scenario of our investigation (see Sect. 3.1). We further hope that our presented adaptation, listed below and illustrated with a brief code sketch after the list, can be applied as is, or iterated upon, by other IA researchers and practitioners in the future, thus providing additional value to the community.

  1. Select—Mark something as interesting: Select a data entity at a specific spatial location in the VE or modify the displayed temporal context through the selection of a new time event or time range, for instance, with the objective to perform various follow-up interactions, such as to display details-on-demand.

  2. Explore—Show me something else: Look around in the VE with the objective to identify a location/region (spatial) or time event/range (temporal) of interest worthy of further inspection, or move around in the VE in order to reach data entities, either in close proximity or far away (outside the physical real-world boundaries of the VR system’s calibrated safe interaction area), potentially utilizing virtual travel features.

  3. Reconfigure—Show me a different arrangement: Perform an interaction that modifies the visual arrangement of the displayed data entities in the VE, for instance, with respect to their relative location in the VE or in regard to aspects of their individual visual representation (for instance, sorting the order of the displayed data variables).

  4. Encode—Show me a different representation: Modify the visualization technique used to represent a data entity in the VE, i.e., mapping a data item’s data variables onto a new visual representation and in turn creating a different data entity.

  5. Abstract/Elaborate—Show me more or less detail: Aligned with Shneiderman’s Visual Information-Seeking Mantra (Shneiderman 1996), display details-on-demand (elaborate) to show additional information about a selected data entity, or hide the details (abstract) to enable a more overview-like perspective and interaction mode.

  6. Filter—Show me something conditionally: Perform an interaction that modifies the visual representation of one or more data entities in the VE to conditionally hide or add information, for instance, by deactivating entire data entities or aspects of their individual visual representation (for instance, filtering out undesired displayed data variables).

  7. Connect—Show me related items: Perform an interaction in the VE that facilitates the inference of relationships between and the comparison of data entities, both with respect to spatial and temporal contexts.

  8. Undo/Redo—Let me go to where I have already been: With respect to the interaction in the VE in general, enable the user to retrace their previous interactions, for instance, through undo, redo, history, or reset functionalities.

  9. Change Configuration—Let me adjust the interface: Perform an interaction that modifies aspects of the user interface on a system level in general or with respect to the particular in situ interaction mode with one or multiple selected data entities (for instance, temporarily accessing and switching between menus and widgets that assist with the interaction in the VE).
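
As a rough illustration of how such a task classification can be carried into an implementation, the sketch below tags interface features with the adapted categories. The enum follows the list above; the concrete feature-to-task assignments shown are only an illustrative subset, and the authoritative mapping is the one summarized in Table 1.

```csharp
// Hypothetical sketch: tagging interface features with the adapted task categories.
// Only a few example assignments are shown; Table 1 holds the complete mapping.
using System.Collections.Generic;

public enum AnalysisTask
{
    Select, Explore, Reconfigure, Encode, AbstractElaborate,
    Filter, Connect, UndoRedo, ChangeConfiguration
}

public static class FeatureTaskMap
{
    public static readonly Dictionary<string, AnalysisTask> Examples = new Dictionary<string, AnalysisTask>
    {
        { "Travel",             AnalysisTask.Explore },       // move through the VE to reach data entities
        { "TimeEventSelection", AnalysisTask.Select },        // mark a time event as interesting
        { "DataVariableSort",   AnalysisTask.Reconfigure },   // rearrange the radial axis order
        { "DataVariableFilter", AnalysisTask.Filter },        // conditionally hide a data variable
        { "Reset",              AnalysisTask.UndoRedo }       // return a chart to its original state
    };
}
```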

3.3 3D gestural interface design

Following a prototypical approach, we designed a 3D gestural interface for the interaction with 3D Radar Charts in immersive VR under consideration of various theoretical aspects, practical guidelines, recommendations, and lessons learned from related work as described throughout Sect. 2 and as initially illustrated in Fig. 1. We started with an overall task analysis, aiming to identify the particular interactions an analyst would likely perform when exploring time-oriented data. For this purpose, we adopted the data analysis tasks as described throughout Sect. 3.2. The actual design of the interaction was conceptually informed by the various 3D interaction technique classifications according to LaViola et al. (2017, Chapters 7–9), with additional considerations in regard to hand posture comfort as discussed by Rempel et al. (2014).

We envision explorative analysis (Aigner et al. 2011, Chapter 2) as one of the main use cases for such interaction with time-series data, i.e., using the immersive VE for data explorations and observations to extract first insights that can lead to subsequent analysis. In turn, the analyst is arguably going to perform certain task types more frequently than others. This requires keeping in mind hand comfort recommendations such as those reported by Rempel et al. (2014) to avoid the use of uncomfortable hand configurations for anticipated frequent interactions. Under the assumption that the VE is populated with a multitude of data entities (3D Radar Charts), each representing different time-series data, means for spatial exploration are needed, i.e., a Travel feature to enable movement in the VE beyond physical space limitations. This allows for utilization of the virtual 3D space by instantiating many data entities, enabling the user to explore the data in a more overview-like manner (Shneiderman 1996), conceptually similar to “walking among the data” (Ivanov et al. 2019; Streppel et al. 2018). When discovering something of interest, the user is expected to engage in situ with the data to display details-on-demand (Shneiderman 1996), thus entering a closer contextual interaction (Nehaniv et al. 2005). At this stage, we can expect the user to (1) Select Time Events and Time Ranges and potentially (2) Reconfigure (Sort) the order and (3) Filter out individual data variables.

Besides these envisioned frequent tasks, we also considered features for more infrequent ones. Depending on the number of time events in the time series as encoded over the static length of a 3D Radar Chart (its height in the VE), we were interested in providing a Zoom feature. With a time range selected, the user may Zoom In by temporarily “stretching” their time-series selection over the entire virtual length of the 3D Radar Chart, visually cutting off any time events outside that range. Conversely, assuming the entire time series is not already displayed, the user may also Zoom Out from previous Zoom In interactions. We implemented a history feature, allowing for step-wise Zoom Out based on multiple prior Zoom In interactions. It is also important to provide the user with means to reverse selections and manipulations, and we therefore implemented a Reset feature, conveniently reconfiguring a 3D Radar Chart back to its original state. Finally, we wanted to explore the possibility of allowing the user to temporarily pause any kind of interaction, for instance, to avoid unintentional hand movements (Pavlovic et al. 1997) during periods when the user desires to make observations in the VE more passively.

To support these anticipated tasks and interactions, we designed the 3D gestural interface with a focus on hand-based grasping interaction with virtual objects as well as on gestural commands based on the user’s in situ context. Based on our interest and within the scope of this investigation, we deliberately avoided graphical menu-based system control techniques (LaViola et al. 2017, Chapter 9). We kept hand posture comfort recommendations in mind (Rempel et al. 2014), prioritizing seemingly more comfortable hand postures for anticipated frequent tasks in the VE. Figures 3, 4, 5 and 6 demonstrate the 3D gestural interface in the immersive VR environment. Figure 7 illustrates the real-world hand posture configurations we applied within the scope of the presented 3D gestural interface. Table 1 provides a comprehensive overview of all implemented features of the 3D gestural interface, including their data analysis task, interaction technique, and comfort classification. Next, we describe in more detail the rationale behind the individual feature designs of the 3D gestural interface.

Fig. 3

Overview of the Travel, Mode Toggle and Rotation features of the implemented 3D gestural interface (see Table 1), including black dashed arrows as annotations to illustrate the hand movements in the VE. See also video demonstration in Supplemental Information

Fig. 4

Overview of the Data Variable Sort, Data Variable Filter and Time Event Selection features of the implemented 3D gestural interface (see Table 1), including black dashed arrows as annotations to illustrate the hand movements in the VE. See also video demonstration in Supplemental Information

Fig. 5

Overview of the Time Range Selection, Zoom In and Zoom Out features of the implemented 3D gestural interface (see Table 1), including black dashed arrows as annotations to illustrate the hand movements in the VE. See also video demonstration in Supplemental Information 

Fig. 6

Overview of the Reset and Pause/Resume features of the implemented 3D gestural interface (see Table 1), including black dashed arrows as annotations to illustrate the hand movements in the VE. See also video demonstration in Supplemental Information 

Fig. 7

Applied hand posture comfort configurations, adapted from the recommendations by Rempel et al. (2014). To facilitate cross-referencing to their work (Rempel et al. 2014, Figure 5), we apply here the same label coding for convenience: 1–12 = hand posture example; c = comfortable; u = uncomfortable

Table 1 Summary of the 3D gestural interface design to interact with time-series data in the immersive VE as presented in Figs. 3, 4, 5 and 6, utilizing the 3D Radar Chart approach by Reski et al. (2020), including classifications in regard to time-oriented data analysis tasks (see Sect. 3.2), 3D interaction techniques (LaViola et al. 2017, Chapters 7, 8, and 9), and hand posture comfort (Rempel et al. 2014, Figure 5) as illustrated in Fig. 7. Note: * next to a feature indicates a significant change, while + indicates an added feature that was non-existent in the initial interaction prototype by Reski et al. (2020)

3.3.1 Travel

The design of the target-based Travel feature is inspired by an “I want to go there” analogy, i.e., allowing the user to look around in the 3D VE, spot a point of interest, and confirm their travel request through respective pointing (see Fig. 3, left). Similar multimodal gaze-based interactions are well established, among others as “gaze suggests, touch confirms” described by Stellmach and Dachselt (2012). The process of looking at a specific point of interest with subsequent referential pointing feels arguably intuitive and close to a similar real-world referential hand motion. Furthermore, such a pointing hand posture has been classified as comfortable by Rempel et al. (2014), in turn making it a suitable choice for an expected frequently used feature.
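
A minimal Unity-flavored sketch of this “gaze suggests, pointing confirms” pattern is given below. It is not the implementation of the presented prototype; the pointing flag is assumed to be provided by the hand-tracking layer, and the chart anchors are assumed to sit on a dedicated physics layer.

```csharp
// Minimal sketch: gaze (head ray) suggests a travel target, a detected pointing
// hand posture confirms the request and teleports the play-area rig.
using UnityEngine;

public class TargetBasedTravel : MonoBehaviour
{
    public Transform head;                 // HMD camera transform
    public Transform playerRig;            // root of the tracked play area
    public LayerMask travelTargets;        // layer containing 3D Radar Chart anchors
    public bool pointingPostureDetected;   // assumption: set each frame by the hand-tracking layer

    void Update()
    {
        // 1. Gaze suggests: cast a ray from the head to find a potential target.
        if (!Physics.Raycast(head.position, head.forward, out RaycastHit hit, 50f, travelTargets))
            return;

        // 2. Pointing confirms: only travel while the pointing posture is held.
        if (pointingPostureDetected)
        {
            Vector3 destination = hit.point;
            destination.y = playerRig.position.y;   // keep the rig on the floor plane
            playerRig.position = destination;
        }
    }
}
```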

3.3.2 Mode toggle

A mechanism was required to allow the user to confirm their engagement with a 3D Radar Chart, i.e., to enter a closer in situ exploration mode after traveling to it. In the absence of graphical menus, we decided to utilize a minimalistic virtual widget in the form of a 3D sphere, color-coded (red: disengaged; green: engaged) and placed directly above a 3D Radar Chart so as not to obstruct any parts of the visualization artifact (see Fig. 3, center). The associated Mode Toggle feature, which is conveniently triggered by simply touching the sphere itself, serves not just the purpose of engaging and disengaging, but also controls the availability of the Rotation and Data Variable Sort/Filter widgets. These are placed in close proximity to the Mode Toggle widget, allowing the user to easily see them upon change of state.
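
The touch-toggled sphere can be sketched with a standard Unity trigger collider, as shown below. This is an illustrative approximation rather than the prototype’s code, assuming the tracked hand carries a collider tagged “Hand” and that the sphere’s collider is configured as a trigger.

```csharp
// Sketch of the Mode Toggle widget: a sphere that flips the engagement state when
// touched by a tracked hand and recolors itself (red: disengaged, green: engaged).
using UnityEngine;
using UnityEngine.Events;

[RequireComponent(typeof(Collider), typeof(Renderer))]
public class ModeToggleSphere : MonoBehaviour
{
    public UnityEvent onEngaged;      // e.g., show the Rotation and Sort/Filter widgets
    public UnityEvent onDisengaged;   // e.g., hide them again
    bool engaged;

    void OnTriggerEnter(Collider other)
    {
        if (!other.CompareTag("Hand")) return;   // assumption: hand colliders are tagged "Hand"

        engaged = !engaged;
        GetComponent<Renderer>().material.color = engaged ? Color.green : Color.red;
        (engaged ? onEngaged : onDisengaged)?.Invoke();
    }
}
```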

3.3.3 Rotation and data variable sort/filter

The visual Rotation as well as Data Variable Sort/Filter widgets are designed consistently with the affordance of being interacted with through hand-based grasping. The user should be able to intuitively reach out their hand, manipulate the widget by grasping it, and by extension manipulate and observe the subsequent change in the state of the 3D Radar Chart. Similar to the in-place rotation of the entire chart (see Fig. 3, right), the user can rotate the individual axes, thus changing their radial arrangement to sort the axis order (see Fig. 4, left). To filter out an undesired data variable, the user can simply grab its respective widget and (instead of rotating it around for a sort operation) simply “take it away” to temporarily remove it from the 3D Radar Chart (see Fig. 4, center), similar to how someone would separate a physical item from a group of items in a real-world context.
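
As an example of how such a grasp-driven rearrangement can be computed, the sketch below derives a grabbed axis handle’s radial angle around the central Time Axis from the hand position; the chart can then re-sort its axes by these angles. The grasp flag and hand anchor are assumed to be supplied by the hand-tracking layer, and the names are illustrative.

```csharp
// Sketch: while an axis handle is grasped, derive its radial angle around the
// central Time Axis from the hand position; sorting axes by angle yields the new order.
using UnityEngine;

public class AxisSortHandle : MonoBehaviour
{
    public Transform chartCenter;   // central Time Axis of the 3D Radar Chart
    public Transform handAnchor;    // grasping hand position (assumption: set by the tracking layer)
    public bool grasped;            // assumption: toggled by the grasp detection
    public float ringRadius = 0.3f; // distance of the handle from the Time Axis (m)

    public float AngleDegrees { get; private set; }

    void Update()
    {
        if (!grasped) return;

        // Project the hand position onto the horizontal plane around the chart center.
        Vector3 toHand = handAnchor.position - chartCenter.position;
        toHand.y = 0f;
        if (toHand.sqrMagnitude < 1e-6f) return;

        // The signed angle around the vertical axis gives the handle's new radial slot.
        AngleDegrees = Vector3.SignedAngle(Vector3.forward, toHand, Vector3.up);

        // Keep the handle on its ring while it is being dragged.
        transform.position = chartCenter.position + toHand.normalized * ringRadius;
    }
}
```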

3.3.4 Time event selection

Based on the composition of the 3D Radar Chart, the Time Event Selection mechanism is inspired by common timeline components (Aigner et al. 2011, Chapter 3). More specifically, similar to grabbing and dragging the currently selected time event in a 2D interface, for instance, when browsing through a video using the respective video player’s interface, the user can grab the Time Slice and drag it up and down to manipulate the currently selected time event (see Fig. 4, right). Similar to the other visually embodied elements of the 3D Radar Chart (such as the Mode Toggle, Rotation, Data Variable Sort/Filter widgets), the user is encouraged to use their hands to directly grab and manipulate the chart in an intuitive and comfortable manner, while at the same time being able to observe the respective state changes. This should allow them to build up a coherent model of “I can grasp what I can see.”
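
A sketch of this grab-and-drag mapping is shown below: the grasping hand’s height along the Time Axis is converted to the nearest discrete time event, and the Time Slice is snapped to that position. All names and the calling convention are illustrative assumptions, not the prototype’s actual code.

```csharp
// Sketch: map the grasping hand's height along the Time Axis to the nearest time
// event index and snap the Time Slice to it.
using UnityEngine;

public class TimeSliceDrag : MonoBehaviour
{
    public Transform timeAxisBottom;   // world position of the first time event
    public Transform timeAxisTop;      // world position of the last time event
    public int timeEventCount = 150;   // e.g., 150 daily time events as in the study dataset

    public int CurrentTimeEvent { get; private set; }

    // Called each frame while the Time Slice is being grasped.
    public void UpdateFromHand(Vector3 handPosition)
    {
        Vector3 axis = timeAxisTop.position - timeAxisBottom.position;
        float t = Vector3.Dot(handPosition - timeAxisBottom.position, axis.normalized) / axis.magnitude;
        t = Mathf.Clamp01(t);

        // Snap to the nearest discrete time event and place the slice there.
        CurrentTimeEvent = Mathf.RoundToInt(t * (timeEventCount - 1));
        float snappedT = CurrentTimeEvent / (float)(timeEventCount - 1);
        transform.position = timeAxisBottom.position + axis * snappedT;
    }
}
```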

3.3.5 Time range selection

While the Time Event Selection is based on a unimanual technique with one hand, we designed the Time Range Selection with a bimanual (two-handed) gestural command in mind. After all, a time range is commonly composed of a start and an end point. In turn, one of the user’s hands is mapped to the selected start point, while the other one is mapped to the end point (see Fig. 5, left). We decided to utilize a pinch hand posture for this feature for two reasons: (1) the pinch posture serves as a method of differentiation from hand-based grasping; (2) while being conceptually similar to a conventional grasp, e.g., to grab and drag something, the pinch hand posture itself should be distinct enough for the respective hand tracking interface to adequately differentiate between grasp and pinch hand postures, facilitating reliable user input interpretation.
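
The bimanual pinch mapping can be sketched as follows: while both hands hold a pinch, each pinch position is converted to a time event index along the Time Axis, and the lower and upper indices define the selected range. The pinch flags and positions are assumed to come from the hand-tracking layer; the class is illustrative only.

```csharp
// Sketch: both pinching hands define the start and end of the selected time range.
using UnityEngine;

public class TimeRangeSelection : MonoBehaviour
{
    public Transform timeAxisBottom, timeAxisTop;   // endpoints of the central Time Axis
    public int timeEventCount = 150;
    public bool leftPinch, rightPinch;              // assumption: set per frame by the tracking layer
    public Vector3 leftPinchPos, rightPinchPos;     // assumption: pinch positions from the tracking layer

    public int RangeStart { get; private set; }
    public int RangeEnd { get; private set; }

    void Update()
    {
        if (!(leftPinch && rightPinch)) return;   // the range is only adjusted while both hands pinch

        int a = IndexAtHeight(leftPinchPos);
        int b = IndexAtHeight(rightPinchPos);
        RangeStart = Mathf.Min(a, b);   // the lower hand marks the start of the range
        RangeEnd = Mathf.Max(a, b);     // the upper hand marks the end
    }

    int IndexAtHeight(Vector3 position)
    {
        Vector3 axis = timeAxisTop.position - timeAxisBottom.position;
        float t = Mathf.Clamp01(Vector3.Dot(position - timeAxisBottom.position, axis.normalized) / axis.magnitude);
        return Mathf.RoundToInt(t * (timeEventCount - 1));
    }
}
```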

3.3.6 Zoom in/out

The Zoom In mechanism, designed as a symmetric bimanual gestural command, is conceptually inspired by a real-world “diving” motion, putting the hands together with their palms facing each other and then moving them apart (see Fig. 5, center). With a time range selected, we envision that the user makes a vertical motion with their hands to literally “dive in” into a closer examination of that time range. Additionally, as the selected time range is visually stretched over the entire length of the 3D Radar Chart as a result, the visual state change of the chart is somewhat accompanied by the user’s hand movements, ideally allowing them to connect command and expected (feature) outcome. Since a Zoom Out is conceptually the reverse of a Zoom In, we designed the respective gestural command in the interface as the reverse motion (see Fig. 5, right). The utilization of both hands with a symmetric separation/joining motion was also among the common suggestions for a zoom feature within the gesture elicitation study conducted by Austin et al. (2020).
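
A rough sketch of the gesture detection and the zoom history is given below. The thresholds are placeholder values, the hand positions are assumed to come from the tracking layer, and only the Zoom In motion is detected here; the reverse motion would call ZoomOut() to step back through the stored ranges.

```csharp
// Sketch: detect the symmetric "diving" motion (palms together, then vertical
// separation) to Zoom In, and keep prior ranges on a stack for step-wise Zoom Out.
using System.Collections.Generic;
using UnityEngine;

public class ZoomGesture : MonoBehaviour
{
    public float togetherThreshold = 0.10f;   // hands count as "together" below this distance (m)
    public float triggerSeparation = 0.35f;   // vertical separation that triggers the Zoom In (m)

    readonly Stack<(int start, int end)> zoomHistory = new Stack<(int start, int end)>();
    bool armed;   // true once the hands have been brought together

    // Call once per frame with the current hand positions and the active time range selection.
    public void UpdateGesture(Vector3 leftHand, Vector3 rightHand, (int start, int end) selectedRange)
    {
        if (Vector3.Distance(leftHand, rightHand) < togetherThreshold) { armed = true; return; }
        if (!armed) return;

        if (Mathf.Abs(leftHand.y - rightHand.y) > triggerSeparation)
        {
            zoomHistory.Push(selectedRange);   // remember the range for step-wise Zoom Out
            armed = false;
            // ...here the selected range would be stretched over the full chart length.
        }
    }

    // Returns the most recently stored range, or null if the full time series is already shown.
    public (int start, int end)? ZoomOut()
    {
        if (zoomHistory.Count == 0) return null;
        return zoomHistory.Pop();
    }
}
```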

3.3.7 Reset

Under consideration of the various state-changing operations described so far, the need for a Reset feature arose as a means for the user to conveniently revert a 3D Radar Chart to its initial state. Considering that such a reset mechanism can be a comparatively drastic operation, depending on the amount of prior manipulations, we put care into the gestural command design so as to avoid unintentional triggering through the hand tracking interface. We designed Reset as a bimanual gestural command, letting the user perform a cross (x) with their index fingers (see Fig. 6, center). A cross is commonly used in visual interfaces as an indicator to reverse or reset the state of prior operations.
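
An approximate detection of the crossed-index-fingers command is sketched below: the two extended index fingertips must be close together while the fingers point in clearly different directions. The fingertip positions and directions are assumed to be provided by the hand-tracking layer, and the thresholds are placeholder values.

```csharp
// Sketch: treat the Reset command as detected when both extended index fingers are
// close together and roughly crossing (i.e., not pointing in parallel directions).
using UnityEngine;

public static class ResetGesture
{
    public static bool IsCrossed(
        Vector3 leftTip, Vector3 leftDir, Vector3 rightTip, Vector3 rightDir,
        float maxTipDistance = 0.08f,   // fingertips must be within 8 cm of each other
        float maxAlignment = 0.5f)      // |dot| below 0.5 means clearly non-parallel fingers
    {
        bool tipsClose = Vector3.Distance(leftTip, rightTip) < maxTipDistance;
        bool crossing = Mathf.Abs(Vector3.Dot(leftDir.normalized, rightDir.normalized)) < maxAlignment;
        return tipsClose && crossing;
    }
}
```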

3.3.8 Pause/resume

Finally, the design of the Pause/Resume gestural command is inspired by a “stop”-like gestural expression that one might perform in a real-world context (see Fig. 6, right). Our intention with this feature was to provide the user with a means to temporarily “stop” (prevent) the execution of any other operations in the environment through unintentional hand movements, for instance, should the user choose to explore and observe the VE more passively. Similar to the design intent behind the Reset feature, the symmetric bimanual composition of this gestural command is unlikely to be performed by accident, in turn preventing unintentional triggering.

3.4 Technologies and implementation

The VR prototype utilizes an HTC Vive HMD (1080 \(\times\) 1200 pixel resolution per eye, 90 Hz refresh rate) and a Leap Motion controller (10–80 cm interaction zone depth, 120 \(\times\) 150° field of view, Ultraleap Hand Tracking V4 Orion software, attached to the HMD’s front) for the 3D gestural input. Both devices are commercially available. The HTC Vive is configured as room-scale VR with a 2 \(\times\) 2 m area for the user to move freely without any physical obstacles. Unity 2019.3, SteamVR Plugin for Unity 1.2.3, and Leap Motion Core Assets 4.5.1 have been used to develop the prototype.

4 Evaluation methodology

To assess the developed 3D UI design, we conducted an empirical evaluation using a series of representative tasks, questionnaires, and interviews (LaViola et al. 2017, Chapter 11). Allowing human users to go hands-on with the prototype enables us to apply subjective methods to collect quantitative and qualitative data, and thus to evaluate its design. This section describes the setup, task, applied measures, procedure as well as ethical considerations.

4.1 Physical study space and virtual environment

Each study session involved one participant and one researcher, who was moderating the study, collecting data, and ensuring that all hard- and software components were working as intended. Our research group laboratory provided enough space for both to conduct the study, including a dedicated space for the VR user, the researcher’s workstation, several chairs, and a participant desk that was physically partitioned from the researcher’s workstation. The researcher remained at their workstation at all times for the study moderation (introduction, prototype initialization, tasks) and data collection (observation, note taking, interview). The participant was seated twice at their desk to complete the informed user consent form (pre-task) and questionnaires (post-task), while otherwise remaining in the VR area (tasks) and its adjacent chairs (post-task interview).

We set up the VE as a representative IA scenario to allow for spatiotemporal data exploration as follows. European countries are displayed as extruded polygons on the floor. The VE is populated with 39 3D Radar Charts, each, respectively, placed at the center of a country. Each 3D Radar Chart features five data variables, each with a time series of 150 consecutive time events (per day basis). We artificially generated all the time-series data for this 3D UI design evaluation. The data scenario was conceptually designed to be approachable, demanding no specific prior knowledge, allowing for inclusive participant recruitment with no expert requirements. The five data variables were labeled as various types of fruits (Apples, Oranges, Bananas, Berries, Grapes), representing fruit production over time. This scenario allows for spatial (European countries) and temporal (time series at each country) data exploration featuring an easily understandable data context. All implemented features were available to the VR user (see Sect. 3.3). They could freely move within the physical space and interact with 3D Radar Charts in close proximity, or Travel to virtually distant locations.

4.2 Tasks

We created a series of 31 tasks (see Table 2), comprising a mixture of all implemented features, and structured to be representative of a typical analytical session, using the prototype in a walkthrough-like manner. We included definite tasks (e.g., navigate to time event X) as well as indefinite tasks (e.g., select the event X you deem appropriate), enabling participants to partially make their own data observations and interpretations. All participants started at the same location (the eastern border of all 3D Radar Charts). The researcher would read aloud each next task to the participant upon completion of the prior one. The same task series order was applied across all participants, and their spoken-aloud answers were noted by the researcher on a task answer sheet.

Table 2 Series of tasks and their associated interaction feature (see Table 1), completed by each participant within the scope of the empirical evaluation. Annotations: \(^*\) ensure understanding of visualization concept (T06, T07); \(^{**}\) interaction paused demonstration (T26)

4.3 Quantitative and qualitative measures

To make a generalized assessment of the prototype’s usability, we utilized the System Usability Scale (SUS) questionnaire (Brooke 2013). The SUS features ten 5-point Likert scale statements that are filled out post-prototype exposure. The reported answers are calculated into an interpretable score between 0 (negative) and 100 (positive). To further assist the numerical result interpretation, we also consider the adjective ratings as proposed by Bangor et al. (2009). Furthermore, we also intended to make an assessment of the user’s engagement as part of their overall experience when operating the implemented 3D UI. For that purpose, we utilized the User Engagement Scale—Short Form (UES-SF) questionnaire (O’Brien et al. 2018), which was also completed post-prototype exposure. The UES-SF features twelve 5-point Likert scale statements across four factors (three per factor): Focused Attention, Perceived Usability, Aesthetic Appeal, and Reward. Received answers may be scored in regard to the respective factors and as a combined user engagement score. In addition, we integrated a complementary logging system directly within the prototype, enabling the recording of all detected user interactions with a respective timestamp. Such an approach assists with the measurement of the user’s task performance, for instance, by capturing their Time Event and Time Range Selections.
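
For reference, the standard SUS scoring rule (odd items contribute the response minus one, even items contribute five minus the response, and the sum is scaled by 2.5) and the UES-SF factor scores, computed as the mean of each factor’s items, can be expressed compactly as shown below. This helper is our own illustration and not part of the evaluated prototype.

```csharp
// Illustrative scoring helpers: the standard 0-100 SUS score and a UES-SF factor
// score computed as the mean of its (1-5) items.
using System;
using System.Linq;

public static class QuestionnaireScoring
{
    // responses: the ten SUS items in order, each answered on a 1-5 Likert scale.
    public static float SusScore(int[] responses)
    {
        if (responses.Length != 10) throw new ArgumentException("SUS requires exactly 10 items.");

        int sum = 0;
        for (int i = 0; i < 10; i++)
            sum += (i % 2 == 0) ? responses[i] - 1    // odd-numbered items (1, 3, 5, 7, 9)
                                : 5 - responses[i];   // even-numbered items (2, 4, 6, 8, 10)

        return sum * 2.5f;   // scale the 0-40 sum to the 0-100 SUS range
    }

    // items: the three 1-5 responses belonging to one UES-SF factor.
    public static float UesFactorScore(int[] items) => (float)items.Average();
}
```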

During the task completion, the researcher made observations and took notes about the user’s interactions. After task and questionnaire completion, a brief complementary semi-structured interview with each participant was conducted, providing an opportunity for the participant to reflect on their experience operating the interface as well as allowing the researcher to address previously made observations. The semi-structured interview comprised the following steps:

  • Introductory preface: 3D gestural input, or maybe more commonly referred to as “hand interaction,” allows you to interact in a virtual environment, for instance, by directly grabbing and manipulating virtual objects, or by making hand postures and gestures that are associated with certain features.

  • Q1: How do you feel about hand interaction that allows such an interaction in virtual reality?

  • Q2: In regard to the experienced prototype, what is your impression of how the hand interaction was implemented there?

  • Additional open remarks and comments, potentially based on the observations made during the task completion.

4.4 Study procedure

The same procedure of five stages was followed in each study session:

  1. Introduction (10 min);

  2. Warm-up (5 min in VR);

  3. Task (20 min in VR);

  4. Questionnaires (5 min);

  5. Interview (10 min).

The participant first filled out an informed user consent form, after which some demographic information was collected (professional background and prior VR experience). Afterward, the researcher introduced the overall context, scenario, and prototype including all its interactive features (via a pre-recorded video). Each participant was then given some warm-up time with the prototype, i.e., they could get comfortable wearing the HMD, and familiarize themselves with the composition of the VE and the 3D gestural input. Once they felt ready, the researcher initiated the task stage as described in Sect. 4.2. To avoid a potential insights transfer from the warm-up to the task stage, different datasets were used. Participants completed tasks one by one until all were completed (see Table 2). The researcher observed the participant in the physical real-world space as well as in the VE from their HMD point of view as mirrored to a screen on the researcher’s workstation, and took notes. The researcher read aloud the individual tasks, and noted the participant’s answers, likewise stated aloud. Once all tasks were completed, the participant was asked to complete, in order, the SUS and UES-SF questionnaires. Finally, a semi-structured interview was conducted, after which the participant was thanked and sent off.

4.5 Ethical considerations

We followed general ethical guidelines for the work with human participants within the scope of human–computer interaction research (Norwegian National Committee For Research Ethics in Science and Technology 2016; Swedish Research Council 2017). The presented empirical evaluation was conducted between April and June 2021 during the then-ongoing global COVID-19 pandemic, requiring the implementation of additional practical precautions. All national, regional, and local health/safety recommendations according to the respective authorities were closely monitored and followed. Study sessions were only conducted when the researcher and participant were symptom-free. The researcher and participant kept the recommended physical distance at all times during the study session. The researcher was wearing a face mask at all times. Each participant was provided with free access to face masks and hand disinfection gel. All technical equipment was carefully sanitized between study sessions.

5 Results

5.1 Participants

We recruited a total of \(n = 12\) participants with a variety of backgrounds: 5 Computer and Information Science, 5 Linguistics and Language Studies, and 2 Forestry and Wood Technology. Eight participants stated a little, three average, and one a lot of prior experience with VR. None of them reported any visual perception issues when asked during the warm-up phase, e.g., in regard to their ability to differentiate the five Data Variable Axes of a 3D Radar Chart. Figure 8 presents some participant impressions.

Fig. 8

Immersed participants during their task completion, wearing an HMD and interacting in the VE with the designed VR prototype as described throughout Sect. 3

5.2 Task

All participants were able to successfully complete the tasks (see Sect. 4.2) and provide correct answers as pre-determined, or otherwise contextually appropriate based on their own selection choices (tasks T06, T07, T08, T10, T14, T15, T23, T28, and T30). According to the log file analysis, the task durations averaged \(M = 13.95~min\) (\(SD = 3.15~min\); tasks were presented in a swift manner without noticeable breaks; participants were instructed to complete them at their own pace). When the participants were asked to select a time event that they deemed as “interesting” and to briefly describe why (T14 and T28), they made their own observations, generally ending up selecting time events that featured either comparatively high or low data variable values. These time events were visually noticeable, allowing them to make comparisons and to begin speculating about potential reasons. Such participant descriptions included:

  • “Berries are very low, while Bananas and Oranges are high. This could indicate a different season of the year, thus the values across the different dimensions represent a change of season.” (T14, P1, day 75)

  • “Oranges and Bananas appear to be very high, while Grapes and Apples are very low. It seems like there is a relationship between those, maybe a seasonal event.” (T14, P7, day 132)

  • “Berries are very low, and then increasing afterward. This is interesting, what is happening here.” (T14, P9, day 72)

  • “Oranges and Bananas are very high, while the others are very low. This looks like opposite trends.” (T14, P10, day 126)

  • “Oranges appear to be very high compared to the time series before and after the selected time event, maybe this could be because of a seasonal effect.” (T28, P2, day 58)

  • “The values ... seem to be at their dimension’s average at the same time. It’s a perfect overlap.” (T28, P4, day 86)

  • “Peak in the Grapes dimension, and it seems that Grapes are generally rather low overall compared to all other dimensions, therefore this is interesting.” (T28, P6, day 133)

  • “Grapes are high and we are in Italy, so this should be great for the wine season.” (T28, P12, day 145)

5.3 Usability and user engagement

Figure 9 presents the UES-SF and SUS scores.

Fig. 9  Left: The results of the UES-SF, presented according to the different engagement dimensions and the overall user engagement. The medians for each individual factor score (incl. overall engagement) are above average. Right: The results of the SUS, presented including the original numerical scale and the supplemental adjective ratings according to Bangor et al. (2009): worst (25), poor (39), ok (52), good (73), excellent (85), best (100). The mean value (\(M=76.25\), \(SD=9.62\)) is well above the acceptable (68) threshold

5.4 Observations

Overall, the participants appeared to understand the concept and learn the operation of the implemented features rather quickly, allowing them to interact in the VE seemingly naturally and in an enjoyable manner. Nevertheless, some interesting observations were made across the study sessions, summarized as follows.

5.4.1 Usability issues

Most noticeably, the time navigation (Time Event Selection by grabbing, dragging, and releasing a 3D Radar Chart’s Time Slice) appeared comparatively sensitive at the conclusion of the interaction. The participants had seemingly no problems initiating and continuing the grabbing mechanic, navigating back and forth in time while simultaneously interpreting the data and reading the updated labels in the juxtaposed Information Window. However, when asked to select a specific time event (T04, T22, and T29), the Time Slice would at times snap to an adjacent time event upon release of the hand-based grasp. As the participant opened their hand, the hand tracking would first register a Time Slice movement before recognizing the conclusion of the grasping gesture and discontinuing the time navigation. In these cases, participants had to attempt the interaction more than once until the Time Slice remained in the desired position. Such recurring observations were made during nine study sessions.

The Zoom (In/Out) gestural command seemed to require the longest learning phase of all features. Depending on a participant’s hand placement, the tracking sensor would sometimes stop detecting the lower hand, as it appeared to be (partially) occluded by the hand above. Once the participants had developed a more careful understanding and feel for the hand tracking, they were able to perform these gestural commands seemingly fluently. One participant was observed repeatedly attempting the gestural commands in the reverse direction, i.e., moving the hands together to Zoom In and moving them apart to Zoom Out.

Some instances of unintentional commands were observed, i.e., a participant triggered a feature through the 3D gestural input without explicit intent. Most noticeably, this occurred during intended Mode Toggle interactions, resulting in unintended Travel. In these cases, rather than touching the 3D Radar Chart’s Activation and Interaction Toggle with a hand with all fingers extended, the participant would attempt to touch it with only the index finger extended, in a “poking”-like posture. This, however, conflicted with the hand posture configuration used for the Travel feature’s gestural command, thus resulting in unintentional movement.

5.4.2 General operation and interaction

To make data observations, the participants appeared to use a balanced mixture of actively moving around a 3D Radar Chart and in-place Rotation using its Rotation Handle. Even though not explicitly asked to, some participants made, of their own accord, noticeable use of various implemented features to assist their task-solving process, e.g., sorting the data variables before selecting a time range (T08 and T10), or filtering out proclaimed “uninteresting” data variables (T14 and T28). The participants were asked to sort the data variables in ascending (T15) and descending (T30) order. However, at no point were they told what these orders mean within the presented context, as we were curious to observe how the participants themselves would interpret these tasks. The majority associated ascending with a clockwise and descending with a counter-clockwise radial arrangement of the data variables with respect to their visualization in a 3D Radar Chart’s Information Panel. A few participants appeared rather self-critical about their perceived performance operating the 3D UI, but became seemingly more confident over time as they got “a better feeling” for the hand tracking. Sometimes, participants attempted to perform gestural commands rather quickly, while their hands were not yet in the tracking sensor’s field of view. Although their gestural input was correct in concept, the tracking sensor appeared too slow in its initial hand detection, thus preventing the practical execution of the respective interaction. This was frequently observed for features classified as gestural commands, but rarely for the hand-based grasping ones.

5.5 Interview

5.5.1 General hand interaction in VR

When asked how they felt about using their hands as a means of interaction in VR (Q1), the participants expressed a rather positive attitude toward it. They thought that hand interaction has the potential to allow for very natural and intuitive interaction mechanisms. Some of them mentioned their appreciation that no additional sensors needed to be attached to one’s hands. One participant expressed minor concerns about imprecise command recognition: when an interaction is not triggered, even though correct in concept, it might make the user feel insecure, as it is difficult to determine whether the detection problem was due to them or the system. Four participants explicitly expressed their appreciation for simply using their hands instead of physical controllers, which can “sometimes feel weird for the interaction, as one is grabbing a controller and the controller is grabbing a virtual object,” thus creating the impression of an intermediate layer—which, according to them, is not the case with hand interaction.

5.5.2 3D UI of the prototype

When asked how they perceived the hand interaction within the scope of the implemented prototype (Q2), the participants were generally positive about the provided features. The majority stated that the 3D UI felt very natural and easy to operate once one had learned all its possibilities. They acknowledged their impression of learning the various features quickly, with one participant elaborating that at that stage it felt like “riding a bike.” Some of them noted that the 3D UI featured logical and coherent analogies for the different hand postures and gestures. A few were genuinely surprised that seemingly many features relied on the utilization of both hands simultaneously, having expected more one-handed gestures. Participants also addressed some of the encountered usability issues, most prominently mentioning that the precise Time Slice placement appeared to be “fairly tricky” at times (as described in Sect. 5.4.1), making it feel as if the hand tracking was too sensitive in these instances. Some also reflected on experiencing unintentional gestural commands.

5.6 Limitations

The empirical evaluation of immersive interfaces in general, and within the context of IA in particular, is inherently demanding and poses various complex challenges (Stanney et al. 1998; Besançon et al. 2021; Ens et al. 2021). Given the scope of the presented evaluation as well as the number of recruited participants, some limitations need to be taken into account. The reported results allow for the identification of interesting trends and noteworthy considerations rather than definitive conclusions. The described evaluation methodology, particularly in regard to the comprehensive task protocol presented in Sect. 4.2, should allow for an independent replication of the presented study and subsequent future data collection that may reveal additional meaningful insights. The reported results should be interpreted within the presented context of IA, the chosen task scenario and setup, as well as the motivated research focus. Finally, additional limitations are inherent to the applied methodology and data collection methods described throughout Sect. 4, for instance, the self-reporting nature of the questionnaires completed by the participants and the subjectivity of the researcher’s observations.

6 Discussion

Generally, all study participants were able to interact organically and intuitively in the immersive VE using the implemented 3D gestural interface, having a smooth and responsive experience with the prototype. In contrast to the gestural control results reported by Streppel et al. (2018), the majority of our participants managed to learn the features of the 3D UI comparatively quickly, both conceptually and operationally, completing the different tasks they were presented with. Huang et al. (2017) reported similar subjective impressions toward learnability and intuitiveness based on the evaluation of their prototype. When asked to perform a certain action within the task series, our participants were able to quickly associate the correct interaction in VR, i.e., the visual object they had to manipulate or the hand posture/gesture they had to perform. The median and mean scores of the measured usability (SUS) were above the good threshold. Given our focus on hand-based grasping and gestural command techniques, we are overall satisfied with these results, considering that the participants were asked to conduct a multitude of predefined tasks rather than freely exploring the data at their own leisure. The overall user engagement scores (UES-SF; between 3 and 5, median slightly below 4) are also encouraging, indicating positive engagement with the prototype. This aligns with our observations, as participants would often use features such as Rotation, Sort, and Filter even when not explicitly asked to, seemingly naturally engaging with the prototype. A closer examination of the individual engagement factor scores indicates that the participants paid close attention during the task completion, found the prototype aesthetically appealing, and found their experience rewarding (all three with medians around 4)—all in line with the prototype’s general design objective. While the participants were overall excited about the 3D gestural interface and able to intuitively interact with data in the presented context, some overall reflections need to be made in regard to hand-based grasping interaction, gestural commands, and unintentional commands.

6.1 Reflection: hand-based grasping interaction

A major aspect of the 3D gestural interface’s design was concerned with the utilization of hand-based grasping for the interaction with visible virtual objects in the VE, which was appreciated by the participants. They were able to interact with the Axis Spheres of the Reconfigure and Filter Handle as an indirect widget to adjust the configuration of the 3D Radar Chart, similar to the node movement interaction demonstrated in the prototypes by Osawa et al. (2000) and Huang et al. (2017). They could intuitively grab and drag the Time Slice in order to make respective Time Event Selections. While this interaction was valued, some shortcomings were identified when the participants had to place the Time Slice at a specific time event. The tracking and implementation felt “too sensitive,” as the Time Slice would sometimes “snap” to one of the adjacent time events when attempting to release the grab, occasionally resulting in light frustration and requiring some additional interaction to recover from this error—a cost that should not be ignored at a larger scale (Büschel et al. 2018). The Time Slice movement depends on the detected grab position of the hand, i.e., the position where fingers and thumb meet. In the process of releasing the grab, this position is likely to be updated slightly before the grab is detected as discontinued and the time event selection stops updating. Based on the current implementation, this issue is proportionally dependent on the length of the 3D Radar Chart and the number of included time events, i.e., the temporal resolution. As a reference, a 3D Radar Chart was scaled to a total length of 100 cm in the VE with 150 encoded time events, resulting in an effective gap between two time events of 0.67 cm. A lower number of time events over the same length would result in a larger gap between individual time events (as, for instance, when Zoomed In), correspondingly making the Time Slice less prone to snapping to an adjacent time event. Vice versa, including even more events in the time series would further increase the perceived sensitivity. While we expect 3D gestural input technologies to become more precise, we also envision some solutions based on the overall 3D UI design and implementation. For instance, rather than exclusively relying on the finger and thumb positions for grab detection, one could introduce an additional dependency on the position of the hand’s back or palm. In the presented case of grabbing and vertically dragging the Time Slice, the back and palm of the hand are likely to remain relatively static in space during the release of the hand-based grasp, compared to the finger and thumb movements. A threshold could be implemented to prevent Time Slice movement in such instances, enabling the system to “interpret” the user’s intention to discontinue the interaction. Alternatively, this challenge could be addressed through an asymmetric bimanual interaction, similar to the approach presented in the prototype by Betella et al. (2014). For instance, while grasping the Time Slice with one hand, a gestural command made with the other could “lock” the current Time Slice position in place, allowing the user to safely disengage from the interaction without unintentionally moving forward or backward in time.
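To make the palm-anchored release threshold outlined above more concrete, the following minimal sketch illustrates the idea in framework-agnostic Python; the `HandFrame` structure, its field names, and the 1 cm threshold are hypothetical assumptions for illustration and not part of the actual prototype. Per tracking frame, the Time Slice is only moved if the grab point’s vertical shift is accompanied by a comparable palm movement; a shifting grab point over a static palm is treated as a release in progress and ignored.

```python
from dataclasses import dataclass

@dataclass
class HandFrame:
    """One tracking frame of a hand (positions in metres); hypothetical structure."""
    grab_point: tuple   # (x, y, z) where thumb and fingertips meet
    palm_point: tuple   # (x, y, z) centre of the palm
    is_grabbing: bool   # closed-hand posture detected by the tracker


def _distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5


PALM_STILL_THRESHOLD = 0.01  # 1 cm; assumed value that would need tuning per sensor


def time_slice_delta(prev: HandFrame, curr: HandFrame) -> float:
    """Vertical Time Slice movement to apply for this frame.

    Movement is suppressed when the grab point shifts while the palm stays
    (almost) static, i.e., the pattern produced by opening the hand to release.
    """
    if not (prev.is_grabbing and curr.is_grabbing):
        return 0.0
    grab_shift = curr.grab_point[1] - prev.grab_point[1]  # vertical (y) axis
    palm_shift = _distance(prev.palm_point, curr.palm_point)
    if grab_shift != 0.0 and palm_shift < PALM_STILL_THRESHOLD:
        return 0.0  # likely a release in progress: keep the Time Slice in place
    return grab_shift
```

In practice, such a check would likely be evaluated over a short window of frames rather than frame by frame, and the threshold tuned to the noise characteristics of the tracking sensor.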

6.2 Reflection: gestural commands

In addition to interacting with visible objects, we also implemented a set of invisible gestural commands in the 3D UI. Gestural commands such as Travel and Time Range Selection were positively received. The participants appreciated the responsiveness of the Time Range Selection, allowing them to highlight the time ranges they were interested in live. The continuous semitransparent, uncolored visualization of the time series data outside these ranges provided a further preview of the data, which was particularly important when making the cutoff and deciding whether or not to include additional time events in the selection. The two-handed gestural commands generally worked well. However, based on our observations and the received feedback, some improvements can be made with regard to the Zoom (In/Out) feature. In the initial hand posture of holding both hands vertically slightly apart with their palms facing each other, the tracking sensor sometimes did not recognize the lower hand, as it was occluded by the one above. Thus, even though the participants were holding their hands in the correct configuration, they needed to move them around slightly before the sensor tracked and translated them appropriately in the VE. Similar feedback was stated by the participants in the evaluation reported by Huang et al. (2017), expressing a desire for more robust gesture recognition in such instances. Moving both hands together and then apart, or vice versa, for zoom operations was also reportedly preferred as an interaction design approach by the participants in the study by Austin et al. (2020). Both our findings and those described by Huang et al. (2017) thus highlight the importance of a reliable implementation of such bimanual interactions in the future to further satisfy anticipated user preferences.
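As one possible software-side mitigation of the occlusion issue described above, the sketch below (Python; the `TrackedHand` structure, the 0.25 s grace period, and all names are hypothetical assumptions, not the prototype’s actual implementation) keeps the last known pose of a briefly lost hand alive for a short grace period instead of cancelling the Zoom gesture immediately, and maps the change in vertical palm distance to a zoom factor, with hands moving apart zooming in as in the presented design.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TrackedHand:
    palm_y: float        # vertical palm position in metres
    palms_facing: bool   # simplified check: palms oriented toward each other


GRACE_PERIOD = 0.25  # seconds to keep a lost hand "alive"; assumed value


class ZoomGesture:
    """Maps the vertical distance between two palms to a zoom factor (>1 zooms in)."""

    def __init__(self) -> None:
        self._last_lower: Optional[TrackedHand] = None
        self._lost_for = 0.0
        self._start_distance: Optional[float] = None

    def update(self, upper: TrackedHand, lower: Optional[TrackedHand],
               dt: float) -> Optional[float]:
        """Return the current zoom factor, or None if the gesture is inactive."""
        if lower is None:
            # Lower hand occluded: fall back to its last known pose for a short while.
            self._lost_for += dt
            if self._last_lower is None or self._lost_for > GRACE_PERIOD:
                self._start_distance = None
                return None
            lower = self._last_lower
        else:
            self._last_lower = lower
            self._lost_for = 0.0

        if not (upper.palms_facing and lower.palms_facing):
            self._start_distance = None
            return None

        distance = abs(upper.palm_y - lower.palm_y)
        if self._start_distance is None:
            self._start_distance = max(distance, 1e-3)
        # Moving the hands apart increases the factor (Zoom In); together decreases it.
        return distance / self._start_distance
```

Whether such a fallback actually improves the experience would need to be verified empirically, as it trades robustness against the risk of acting on stale tracking data.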

6.3 Reflection: unintentional commands

Cases of unintentional gestural commands (Pavlovic et al. 1997) occurred most noticeably when a user wanted to display details-on-demand by touching a 3D Radar Chart’s Mode Toggle widget, but instead triggered a Travel interaction because their hand posture was detected as pointing forward. While participants were able to travel back and recover from such an error comparatively quickly, it also caused a mixture of light surprise, frustration, and uncertainty toward the Mode Toggle interaction. This is an instructive example of an unintentional command, demonstrating that different users may attempt the same interaction with different hand postures. We envision that such an issue can be addressed within the current implementation in various ways, e.g., by introducing a distance threshold between the virtual hand model and the Mode Toggle widget, i.e., preventing Travel if the user’s hand is detected in close proximity to the widget. Thus, the 3D UI may infer in situ that the user intends to engage with the 3D Radar Chart rather than to Travel. We reflect on this practical example by highlighting again the discussion by Nehaniv et al. (2005) about the importance of a computing system’s ability to infer the user’s intent from their interactions. On the other hand, no unintentional Reset operations were observed, even though the participants were able to perform this command swiftly. Similar to the considerations by Fittkau et al. (2015), we intentionally designed this command to prevent unintentional triggering, as resetting a 3D Radar Chart’s configuration is a comparatively drastic operation.
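A minimal sketch of the proximity-based suppression suggested above follows (Python; the `HandPose` structure, the 20 cm radius, and the function names are hypothetical assumptions rather than the prototype’s code): a forward-pointing posture only triggers Travel when the hand is not within a small radius of any Mode Toggle widget, so that the same posture near a widget is interpreted as an attempt to touch it instead.

```python
from dataclasses import dataclass

@dataclass
class HandPose:
    position: tuple        # (x, y, z) hand position in metres
    points_forward: bool   # index finger extended toward the scene


SUPPRESSION_RADIUS = 0.2  # 20 cm around each widget; assumed value, needs tuning


def _distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5


def should_travel(hand: HandPose, mode_toggle_positions: list) -> bool:
    """Trigger Travel only if the pointing hand is not close to any Mode Toggle widget."""
    if not hand.points_forward:
        return False
    near_widget = any(
        _distance(hand.position, widget) < SUPPRESSION_RADIUS
        for widget in mode_toggle_positions
    )
    return not near_widget
```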

7 Conclusion and future work

The designed and implemented 3D gestural interface allowed our study participants to interact with spatiotemporal data in an immersive VE to complete a series of typical analytical tasks. We described the 3D UI design and its features within the context of IA, informed by relevant foundational work, such as adapted data analysis tasks (see Sect. 3.2), 3D interaction techniques (LaViola et al. 2017, Chapters 7–9), and aspects of hand posture comfort (Rempel et al. 2014). The results of our empirical evaluation with \(n = 12\) participants point toward good usability and an overall engaging experience, where the participants were excited to intuitively use their hands to operate the VR prototype using hand-based grasping and gestural commands to interact with the abstract data visualizations as 3D Radar Charts (Reski et al. 2020). We discussed the results and were able to reflect on the 3D UI design, identifying aspects for improvement related to hand tracking detection and precision as well as a VR system’s ability to infer user intent to avoid unintentional gestural commands. Even though tracking sensors are likely to improve, we envision that most if not all of these aspects can be addressed through careful design and implementation on the software side.

In addition to the study presented here, we also utilized the presented 3D UI design in a hybrid asymmetric collaborative study setup, involving both an immersed and a non-immersed user (Reski et al. 2022). This follow-up study differed in various aspects from the one presented in this article; instead of interacting with the prototype in a walkthrough-like manner, the immersed users interacted on their own accord, in sessions that lasted approximately twice as long, to explore the data and solve confirmative data analysis tasks. Although not directly comparable to this study, among other obtained results, the usability and user engagement were rated similarly positively based on the administered SUS and UES-SF.

While we presented and reflected on the 3D gestural interface design within the context of the 3D Radar Chart visualization technique in an immersive VE, we envision that the demonstrated interactions are not necessarily limited to time-oriented data visualizations. Instead, we see potential for such interactions to be transferred and applied to similarly spatial 3D data visualization artifacts across a variety of IA contexts. The adapted data analysis task classification for the immersive interaction with spatiotemporal data (see Sect. 3.2) has been conceptualized independently of a visualization technique, in turn allowing terminology adoption across a variety of applications. Grabbing and dragging the currently selected time event along a time axis, for instance, by utilizing the Time Slice or an alternative visual representation, could also be applied in Space-Time Cube visualizations, such as demonstrated by Wagner Filho et al. (2020). Requiring contact with a visual artifact, as opposed to simply detecting a closed hand posture in mid-air in the VE, would likely prevent unintentional time manipulations, for instance, when the user simply closes their hands for relaxation without the intent to interact. Similarly, one could rely on symmetric bimanual “pinching” instead of mid-air “grabbing” to zoom or scale time in such an environment, for the same reason—a user is arguably less likely to perform a pinch hand posture unintentionally. The Mode Toggle and Data Variable Sort/Filter widgets resemble visual representations, i.e., spheres, that are commonly utilized in 3D scatterplots (Cordeil et al. 2019) and 3D graph presentations (Osawa et al. 2000; Huang et al. 2017). In turn, we envision that similar hand-based grasping interactions as demonstrated in our interface could be applied in those contexts, for instance, to trigger the display of more details-on-demand (Shneiderman 1996) about a data item/node or to manipulate its position for temporary removal (filter). Lee et al. (2023) recently utilized a “pinch and pull” technique similar to our Data Variable Filter interaction design as a means to change the dimensionality of the bar chart visualization technique in their VE, switching between 2D and 3D representations. They also implemented a “slider” design to change the visual encoding of a geographic scatterplot (Lee et al. 2023), similar to the presented Time Event Selection mechanism, further demonstrating the utility of “grasping and dragging” in 3D space along a conceptual axis—not limited to representing a temporal dimension. It would be interesting to replace the not-so-well-received rowing motion in the software cities prototype by Fittkau et al. (2015) with the presented “dive in” approach to Zoom In/Out, to investigate whether this technique would yield better results. Similarly, we can envision an integration of the target-based travel technique under utilization of the “gaze suggests, point confirms” principle to improve on the minimap-based travel approach described by Streppel et al. (2018). In our opinion, a technique that adheres to suggestion and confirmation could in their case prevent unintentional travel movements based on simple mid-air grasping.

We are generally satisfied with the outcome of the presented work and have some ideas for future iterations. For instance, we are motivated to improve the prototype based on the discussed aspects and to investigate its application within the scope of longitudinal studies. Under the assumption of being immersed in the VE for a longer duration, it would then also make sense to explicitly investigate aspects of the 3D gestural interface’s comfort and physical fatigue, which according to Samini and Palmerius (2017) appear to be less commonly integrated in VR evaluation questionnaires. Even though the participants in our study were able to quickly learn the operation of the 3D UI, it would also be intriguing to investigate learnability aspects more specifically—a topic that is often disregarded and underexplored (Rempel et al. 2014). Furthermore, we see potential for the investigation of other relevant aspects depending on the research focus and the subsequent study and task design. For instance, popular performance metrics for immersive interaction evaluations include task completion time, accuracy, and success rate (Samini and Palmerius 2017). If the tasks were administered to the participants directly from within the VE itself, instead of from “outside” the VE through an external moderator, measuring aspects such as presence may provide further meaningful insights toward the assessment of interaction in immersive (data analysis) environments (Schwind et al. 2019).