1 Introduction

The inspection of a virtual object is common in many 3D applications, which require a strategy to orbit the virtual camera around the object. This is, in general, not a trivial problem [18]. One of the main challenges is how to map the available set of degrees of freedom (DoF) of the interaction devices onto the DoF of the virtual camera. In desktop or web-based applications, we are constrained to use reduced-DoF interaction devices such as touchscreens or mice. Thus, certain assumptions have to be made about what users mean when they perform gestures or actions via these, still ubiquitous, interaction devices.

One family of algorithms assumes that the priority is close examination of the object’s surface, for example when editing small details of an object. The challenge here is defining smooth surfaces adapted to the object’s geometry. A relatively early example is HoverCam [16], which implements a layered set of surfaces around an object via a sphere tree. More recent work, such as the generalised trackball [17], builds on HoverCam with a multi-resolution mesh-less representation of the smooth surface around the object. These strategies have not, in general, been adopted by commercial applications involving 3D inspection of virtual objects. The risk is an excessive dependence on details of the object surface, which might result, for example, in shakiness of the virtual camera motion.

Most applications choose the opposite strategy, which is to constrain the camera to a virtual sphere centred at the object. The camera’s view direction is always perpendicular to the surface of the sphere, and the radius of the sphere depends on a zoom level. Over the years, researchers have reported a number of variations to the ubiquitous Virtual Sphere controller, a mechanism proposed by Chen et al. in an early study [8]. The basic idea of the technique was to rotate the 3D scene by encasing it in a virtual sphere. To orbit the camera, the user had to roll the virtual sphere by dragging its surface in a desired direction: up–down and left–right. This created and accumulated rotations around axes perpendicular to the dragging gesture, axes that were arbitrary with respect to the object, but always parallel to the screen.

The virtual sphere and its subsequent variations, such as Shoemake’s Arcball [24, 25], are ideal to give the user the impression that the object is suspended in an empty space. In fact, they suggest that the user actions rotate the object, rather than suggesting that the user is orbiting the camera. However, if the scene includes a ground reference plane, the approach to mapping user gestures to camera motion is different. In fact, in the family tree of virtual trackballs, these approaches are in a different branch which stems from an earlier contribution, the Number Wheel [27], a two-axis valuator in which user gestures in two dimensions were mapped onto rotations around two specific rotation axes. A variation of this technique, the two-axis valuator (TAV) with fixed up-vector or Fixed Trackball [2, 20], is widely used in 2022 by commercial 3D modelling tools and repository viewers [1, 6, 14, 19, 26]. Here, horizontal gestures are mapped onto rotations around a fixed global vertical axis. This is consistent with scenes in which there is a ground plane which serves as reference.

Recent work [13] has shown that there can be advantages to generalising the fixed trackball to fixing rotation axes other than a global up-vector, such as, for example, a horizontal global rotation axis, when this better fits the mental model that the user has about the object. For example, a wheel or a turbine may have a horizontal rotation axis. If the object is placed with this axis parallel to the reference ground plane, Gonzalez-Toledo et al. suggested that fixing a horizontal global axis may be the best possible approach in terms of usability, performance and fatigue reduction when navigating around these objects. Other axes might be preferred in other contexts, such as, for example, a diagonal rotation axis representing the tilting of the Earth’s rotation axis. This improvement of the ubiquitous TAV with fixed up-vector requires assuming a mental model that users share about an object, or, alternatively, interactive tools that allow selecting which axis to fix during navigation.

Fig. 1

Trackballs work on the assumption that the objects inspected have high sphericity, i.e. their bounding box is similar to a cube (left). But objects with little sphericity (right) may require multiple zoom and/or pan operations for the camera, if the object is to be examined closely

The different members of the trackball family work on the implicit assumption that the objects inspected have high sphericity, i.e. their bounding box is similar to a cube. However, when the object is less spherical in proportions, all trackball approaches share a drawback: multiple zoom and pan (see Footnote 1) operations are required to inspect different areas of the object. Since the camera orbits around the centre of the object, if the camera is too close, the sides of the object will be inaccessible—thus the need for the pan operation. If the camera is farther away, so as to access the sides, then it might be too far from other areas of the object surface (see Fig. 1). The accumulation of orbit, zoom and pan operations can be inefficient and unsatisfactory for the user. On the other hand, trying to adapt the camera motion to the object’s exact shape introduces the aforementioned issues related to excessive detail in camera motion [17].

What we ask ourselves is whether we can take the best of all worlds: keep the simplicity and universality of the trackball, acknowledge the mental model that users have of the rotation axes of the object they are inspecting, and increase the interaction efficiency for objects with low sphericity by automatically deforming the trackball surface to adapt the camera orbit to the object’s bounding box. In the remainder of this paper, we introduce the formulation of the spheroidal trackball, and evaluate whether it improves efficiency, perceived usability and perceived workload in the inspection of objects for which the sphericity assumption is violated.

2 Formulation of the spheroidal trackball

The fixed trackball with generalised fixed vector [13] defines an enclosing spherical surface which constrains the camera orbit. Our goal is to generalise this strategy for objects that depart from the sphericity constraint. In this paper, we consider objects whose bounding box extends or shrinks along only one of the three axes, i.e. two of the dimensions of the bounding box have the same length, which results in a trackball in the shape of an ellipsoid of revolution, or a Spheroidal Trackball. The axis which differs in size coincides with the preferred axis of rotation in the mental model that the user has about the object, the object’s intrinsic axis of rotation [13].

Let us assume, without loss of generality, that this dimension is aligned with a global X-axis (see Footnote 2), and that the bounding box is centred at (0, 0, 0). The bounding box extends \(e_x\), \(e_y\) and \(e_z\) units along each axis, so that the dimensions of the bounding box are \(2e_x \times 2e_y \times 2e_z\), and \(e_y = e_z\).

An ellipsoid is, in general, an affine transformation of a sphere defined as a quadric surface. If we define a Cartesian coordinate system with origin at the centre of the ellipsoid, then for any point \((x, y, z)\) on the ellipsoid Eq. 1 holds:

$$\begin{aligned} \frac{x^2}{a^2}+\frac{y^2}{b^2}+\frac{z^2}{c^2}=1 \end{aligned}$$
(1)

where a, b and c are half the length of the principal axes of the ellipsoid. An ellipsoid of revolution, or simply a spheroid, is an ellipsoid where two of the principal axes have the same length, which simplifies Eq. 1 to Eq. 2:

$$\begin{aligned} \frac{x^2}{a^2}+\frac{y^2+z^2}{b^2}=1 \end{aligned}$$
(2)

This will be the equation that restricts how the camera orbits around the object. In particular, in our proposal:

  1. the camera position will be defined by a longitude and a latitude on the spheroid.

  2. the camera orientation will be always perpendicular to the surface of the spheroid, as in the standard spherical trackball.

Thus, the problem can be formulated as follows: given the bounding box of the object (as previously defined), and a desired position and orientation in global coordinates of a virtual camera, we must solve for:

  • the parameters a, b in Eq. 2, which define the spheroid that bounds the trajectory of the camera,

  • a strategy to map two-dimensional user gestures to variations in latitude and longitude of the camera position across the spheroid,

  • a strategy to map zoom-level user gestures to variations in the size of the spheroid, i.e. to changes in a, b.

Let us suppose an initial position and orientation of the camera that, without loss of generality, will simplify the mathematical formulation of the spheroidal trackball. The camera is at \((0,0,-Z_0)\), and the bounding box of the object is centred at (0, 0, 0) (Fig. 2, left). The orientation of the camera is such that it is pointing towards (0, 0, 0), its up-vector points towards Y, and recall that there is a preferred axis of rotation towards X.

Fig. 2

The virtual camera is at \((0,0,-Z_0)\), and the bounding box of the object is centred at (0, 0, 0). The user performs a horizontal gesture, taking the camera from \((0,0,-Z_0)\) to (a, 0, 0). What does the user mean? What should be the orientation of the camera at every point of the trajectory? And how to choose between a, \(a'\) or \(a''\)?

Any two-dimensional gesture of the user should result in a smooth motion of the virtual camera, such that the camera keeps looking at the object at all times. Let us further simplify the problem and imagine that the user is going to perform only a horizontal gesture to inspect the object towards X, i.e. the dimension along which the object departs from sphericity. The camera should describe a trajectory on the XZ plane, from \((0,0,-Z_0)\) to (a, 0, 0). The initial and final orientation of the camera are both clear, towards (0, 0, 0). We have established that the camera shall always be perpendicular to the spheroidal surface, but how to choose a? (see Fig. 2 right).

We start by remembering that the direction in which the camera is looking is always perpendicular to the surface of the spheroid. Since we simplified our problem, this means that, for a given position of the camera on its elliptical trajectory, the camera is always looking in a direction normal to the ellipse on that point. A fundamental difference with the spherical trackball is that these normals do not meet at (0, 0, 0). Instead, the camera travels almost in an automatic pan of the object, and the normals cross the major axis of the ellipse at points \((X_c,0,0)\) which advance along the X-axis more and more slowly until they reach a limit, a turning point that eventually makes the camera face the bounding box from (a, 0, 0). It so happens that the turning point is a perfectly known point, the cusp of the ellipse’s evolute [29]. The evolute of an ellipse is a stretched astroid, sometimes known as the Lamé curve [28] (Fig. 3). The cusp of the evolute \((E_c, 0, 0)\) can be obtained from the dimensions of the semi-axes of the ellipse, as:

$$\begin{aligned} E_c = a - \frac{b^2}{a} \end{aligned}$$
(3)
Fig. 3

The ellipse represents the trajectory of the camera. The normals to the ellipse are the directions in which the camera points. The normals cross the X-axis at points that do not go beyond \(E_c\), the cusp of the evolute of the ellipse. We choose this point to be on the bounding box of the object to select the size of the ellipse-shaped curve that constrains the horizontal displacement of the camera

This expression allows us to choose a value for a (Fig. 3). For the spherical trackball, \(E_c = 0\). As we make a larger, the panning effect starts to appear. The value of a has to grow enough for the camera to keep looking at the bounding box of the object at all times, i.e. the turning point \(E_c\) should not exceed the half-extent of the bounding box \(e_x\). Moreover, if we make \(E_c = e_x\) (see Fig. 3), the camera will focus on the edge of the object while turning around it, without letting the object go out of frame at any moment. Notice that choosing a larger value for \(E_c\) may make the object go out of frame in that situation. Thus, we finally solve for a by making \(E_c = e_x\) in Eq. 3. The other parameter b in Eq. 2 is \(b = Z_0\), since we only allowed the bounding box to stretch in one dimension.

$$\begin{aligned} \begin{array}{cc} b = Z_0 \\ a = \frac{1}{2}(e_x+\sqrt{e_x^2+4b^2}) \end{array} \end{aligned}$$
(4)
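
As a concrete illustration, the following sketch (in Python, with hypothetical names of our own) computes the semi-axes of the spheroidal trackball from the bounding box half-extent along the preferred axis and the initial camera distance, following Eqs. 3 and 4.

```python
import math

def spheroid_semi_axes(e_x: float, z0: float):
    """Semi-axes (a, b) of the spheroidal trackball (Eq. 4).

    e_x : half-extent of the bounding box along the preferred (X) axis.
    z0  : initial distance of the camera from the object centre.
    By construction, the cusp of the evolute of the resulting ellipse
    coincides with e_x, i.e. E_c = a - b**2 / a = e_x (Eq. 3).
    """
    b = z0
    a = 0.5 * (e_x + math.sqrt(e_x ** 2 + 4.0 * b ** 2))
    return a, b

# Example: an elongated object (e_x = 4) seen from a distance of 10 units.
a, b = spheroid_semi_axes(4.0, 10.0)
assert abs((a - b ** 2 / a) - 4.0) < 1e-9  # E_c equals e_x, as imposed in the text
```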

We have solved for a and b in Eq. 2, which define the spheroid that bounds the orbiting of the camera. To summarise, the spheroid’s shape depends both on the shape of the object and on the distance of the camera from it. To achieve this, we match the cusp of the evolute of the ellipse with the edge of the object on the axis of rotation. As the camera orientation is always normal to the spheroid surface, this makes the camera look at the edge of the object when turning around it, which guarantees a correct visualisation of the object at all points. However, we still need strategies to map two-dimensional user gestures to variations of the camera position across the spheroid, and zoom-level gestures to variations in the size of the spheroid.

At this point, it is convenient to define the movements of the camera across the spheroid in geodesic (i.e. longitude and latitude) coordinates. Remember that there is a preferred axis of rotation, which we have chosen to align with the X-axis, without loss of generality. This means that vertical gestures of the user result in rotations of the camera around a global X-axis, while horizontal gestures of the user result in camera displacements on the elliptic trajectory defined by a and b, a trajectory defined on a plane which may have been previously rotated by a vertical gesture of the user (see Footnote 3). Thus, gestures which are parallel to the preferred axis of rotation are always mapped to changes in latitude, while gestures perpendicular to the preferred axis of rotation are always mapped to changes in longitude (see Fig. 4).

Fig. 4

Device-horizontal gestures are mapped to latitude displacements of the virtual camera on the spheroidal trackball, when the preferred axis of rotation is horizontal (left). Device-vertical gestures are mapped to longitude displacements of the virtual camera (centre), and zoom-level changes vary the size of the spheroid (right). We formulate all expressions in this section using this case, in which the preferred axis of rotation coincides with a dimension of the bounding box which is larger than the other two, resulting in a prolate spheroidal trackball. All expressions are given at the end of the section for the opposite case, i.e. oblate spheroidal trackballs (not depicted here)

2.1 Latitude displacements of the virtual camera

Let us start by mapping changes in latitude. A change in latitude was precisely what allowed us to find values for a and b. In order to map a horizontal gesture of a user to a movement of the camera on the ellipse defined by a and b, we consider an arbitrary initial position of the camera on the ellipse, p(x, z):

$$\begin{aligned} p(x,z)=(a~\cos (t_0), b~\sin (t_0)) ~~t_0 \in [0,2\pi ] \end{aligned}$$
(5)

A gesture of the user of \(\Delta \tilde{x}\) screen pixels is thus mapped to a change in the parameter \(\Delta t\), which takes the camera to a new position on the ellipse \((a~\cos (t_0 + \Delta t), b~\sin (t_0 + \Delta t))\) (Fig. 5). Our problem is to compute \(\Delta t\) as a function of \(\Delta \tilde{x}\) subject to the following requirements:

  • user gestures should result in camera displacements which are independent of the screen resolution.

  • user gestures should result in camera speeds which are perceived as independent of the distance between the camera and the object.

Fig. 5

Incremental user gestures \(\Delta \tilde{x}\) parallel to the preferred axis of rotation (X in the example) result in incremental displacements \(\Delta a_{\varphi }\) that take the camera from \(p(x,z) = (a~\cos (t_0), b~\sin (t_0))\) to \((a~\cos (t_0 + \Delta t), b~\sin (t_0 + \Delta t))\). In the text, we compute \(\Delta t\) from \(\Delta \tilde{x}\), subject to usability requirements

Making gestures independent of the screen resolution is straightforward. For a screen with W pixels, we compute a normalised gesture \(\Delta r\) simply as:

$$\begin{aligned} \Delta r = \frac{\Delta \tilde{x}}{W} \end{aligned}$$
(6)

This normalised gesture \(\Delta r\) has an important property: a gesture which spans the whole screen results in a normalised gesture \(\Delta r = 1\).

Let us imagine that we are inspecting a generic object with the shape of a spheroid that is perfectly inscribed in the bounding box. The intersection of such an object with the horizontal plane is an ellipse with semi-axes \(e_z\) and \(e_x\) (Fig. 6).

Fig. 6

An object with the shape of a spheroid is inspected by the camera. If the camera is close enough to the object, what we see is a fraction of the intersection of the object with the camera plane that can be approximated by \(k_{\varphi }\) (see text)

The camera at p is looking in the direction of \(p'\), a point on the ellipse inscribed in the object, i.e. \(p'\) is the point on the object that we see at the centre of the screen. If the camera is close enough to the object, so that the object spans the whole screen and beyond, what we see is an arc of the inscribed ellipse which can be approximated by \(k_{\varphi }\):

$$\begin{aligned} k_{\varphi } = 2 \cdot \sqrt{(x - x')^2 + (z - z')^2} \cdot \tan (\alpha ) \end{aligned}$$
(7)

where \(\alpha \) is the camera’s horizontal field of view. Since we know p(x, z), the position of the camera, a and b, the semi-axes of the external ellipse on which the camera is moving, and \(e_x\), \(e_z\), the semi-axes of the inscribed ellipse, we can solve for \(x'\), \(z'\) and thus compute \(k_{\varphi }\) from Eq. 7:

$$\begin{aligned} \begin{array}{cc} x' =\frac{e_x^2 \cdot m_{t_0} \cdot D ~\pm ~ e_z \cdot e_x \cdot \sqrt{e_z^2 ~-~ D^2 ~+~ e_x^2 \cdot m_{t_0}^2}}{e_z^2 ~+~ e_x^2 \cdot m_{t_0}^2} \\ \\ z'= \pm ~e_z \cdot \sqrt{1-\frac{{x'}^2}{e_x^2}} \end{array} \end{aligned}$$
(8)

where \(D=(a^2-b^2)/b \cdot \sin (t_0) \) and \(m_{t_0}=a/b \cdot \tan (t_0)\) are the parameters in the equation \(z=m_{t_0} \cdot x - D\) of the line passing through p and \(p'\).

We now make use of the previous property of the normalised gestures of the user \(\Delta r\). For a gesture that spanned the full width of the screen, \(\Delta r\) would be equal to 1. The displacement of the camera projection \(p'\) that the user expects would be exactly the fragment of the object that the user sees on the screen, which we approximated by \(k_{\varphi }\) (Eqs. 7, 8). Smaller displacements of the camera \(\Delta a_{\varphi }\) are thus proportional to normalised gestures, with \(k_{\varphi }\) as the proportionality constant:

$$\begin{aligned} \Delta a_{\varphi } = k_{\varphi } \cdot \Delta r \end{aligned}$$
(9)

Let us suppose that the camera is very close to the object, and let us consider a very small camera displacement. This allows us to approximate \(\Delta t\) by:

$$\begin{aligned} \Delta t \approx \frac{\Delta a_{\varphi }}{ \sqrt{ (e_x \cdot \sin t_0)^2 + (e_z \cdot \cos t_0)^2 }} \end{aligned}$$
(10)

Combining all of the above results in:

$$\begin{aligned} \Delta t = \Delta \tilde{x} \cdot \frac{2 \tan \alpha }{W} \sqrt{\frac{(x-x')^2+(z-z')^2}{(e_x \cdot \sin t_0)^2+(e_z \cdot \cos t_0)^2}} \end{aligned}$$
(11)

where \(x'\), \(z'\) can be computed using Eq. 8, \(\alpha \) is the horizontal field of view of the camera, and W is the width of the screen in pixels.
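
For completeness, here is a minimal Python sketch of the full latitude mapping (Eqs. 5–11). The function name is ours, and the choice of the visible intersection point (the candidate closest to the camera) is our own assumption; the sketch also assumes \(t_0\) away from values where \(\tan (t_0)\) diverges.

```python
import math

def latitude_delta_t(dx_pixels, W, alpha, t0, a, b, e_x, e_z):
    """Map a horizontal gesture of dx_pixels to a change in the ellipse
    parameter t (Eq. 11). alpha is the camera's horizontal field of view
    (radians), W the screen width in pixels, (a, b) the trackball
    semi-axes and (e_x, e_z) the semi-axes of the inscribed ellipse."""
    # Camera position on the outer ellipse (Eq. 5).
    x, z = a * math.cos(t0), b * math.sin(t0)
    # Normal to the ellipse through p: z = m*x - D (parameters of Eq. 8).
    m = (a / b) * math.tan(t0)
    D = (a ** 2 - b ** 2) / b * math.sin(t0)
    # Intersections of the normal with the inscribed ellipse (Eq. 8);
    # keep the candidate closest to the camera, assumed to be the point
    # p' seen at the centre of the screen.
    disc = math.sqrt(max(e_z ** 2 - D ** 2 + e_x ** 2 * m ** 2, 0.0))
    candidates = []
    for sign in (1.0, -1.0):
        xp = (e_x ** 2 * m * D + sign * e_z * e_x * disc) / (e_z ** 2 + e_x ** 2 * m ** 2)
        candidates.append((xp, m * xp - D))  # the point also lies on the normal line
    xp, zp = min(candidates, key=lambda q: (q[0] - x) ** 2 + (q[1] - z) ** 2)
    # Visible fragment of the object (Eq. 7) and normalised gesture (Eq. 6).
    k_phi = 2.0 * math.hypot(x - xp, z - zp) * math.tan(alpha)
    delta_a_phi = k_phi * (dx_pixels / W)  # Eq. 9
    # Small-displacement approximation of the arc length (Eq. 10).
    return delta_a_phi / math.hypot(e_x * math.sin(t0), e_z * math.cos(t0))
```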

2.2 Longitude displacements of the virtual camera

We now follow a similar reasoning to compute how vertical gestures of the user should be mapped to longitude displacements of the virtual camera (Fig. 7). Longitude displacements are simpler to compute since the trajectory of the camera around the preferred rotation axis is circular, rather than ellipsoidal. Our goal is now to compute the motion of the camera \(\Delta \theta \) on its circular trajectory from a vertical gesture of the user of \(\Delta \tilde{y}\) screen pixels, subject to identical usability requirements, i.e. independence of screen resolution and independence of camera speed with distance between camera and object.

In a similar procedure to the one we followed to map latitude displacements, we imagine the camera looking closely at an object with the shape of the spheroid, inscribed in the bounding box. The intersection with the vertical plane is this time a circumference of radius \(e_z\) (Fig. 7).

Fig. 7

The object with the shape of a spheroid is inspected by the user making a small vertical displacement. If the camera is close enough to the object, what we see is a fraction of the intersection of the object with the vertical camera plane that can be approximated by \(k_{\lambda }\) (see text). To compute \(\Delta \theta \), the circular motion of the camera, from the vertical gesture of the user \(\Delta \tilde{y}\), we use the same concept: a gesture that spans the whole screen vertically should result in a movement of \(p'\) on the object which can be approximated by \(k_{\lambda }\)

\(k_{\lambda }\) is the arc of this circumference visible on the screen through the camera, with a simpler expression than the previously computed \(k_{\varphi }\) for latitude displacements, as the distance between p, the position of the camera, and \(p'\), the point on the object, is simply the difference between the radius of the camera trajectory \(r_\textrm{cam}\) and the radius of the object \(e_z\):

$$\begin{aligned} k_{\lambda } = 2(r_\textrm{cam} - e_z) \cdot \tan (\beta ) \end{aligned}$$
(12)

where \(\beta \) is the vertical field of view of the camera.

The radius of the camera trajectory depends on the latitude of the camera. It can be obtained from the parametric equations of the ellipse, giving as a result: \(r_\textrm{cam} = |b \cdot \sin (t)|\).

For a vertical gesture of the user of \(\Delta \tilde{y}\) pixels, we follow the same reasoning as in latitude displacements to find that incremental arc displacements of the camera in longitude are proportional exactly by \(k_{\lambda }\) to the vertical displacements on the screen. Thus, the parameter \(\Delta \theta \) which moves the camera on its circular trajectory can be computed as:

$$\begin{aligned} \Delta \theta = \Delta \tilde{y} \cdot \frac{2 \tan \beta }{H} \cdot \frac{|b \cdot \sin (t)| - e_z}{e_z} \end{aligned}$$
(13)

where H is the height of the screen in pixels.
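
A sketch of this mapping in Python (function and variable names are ours) follows directly from Eq. 13.

```python
import math

def longitude_delta_theta(dy_pixels, H, beta, t, b, e_z):
    """Map a vertical gesture of dy_pixels to a rotation of the camera
    around the preferred axis (Eq. 13). beta is the camera's vertical
    field of view (radians), H the screen height in pixels, t the
    current latitude parameter, b the trackball semi-axis and e_z the
    radius of the circular cross-section of the inscribed spheroid."""
    r_cam = abs(b * math.sin(t))  # radius of the circular camera trajectory
    return dy_pixels * (2.0 * math.tan(beta) / H) * (r_cam - e_z) / e_z
```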

This mechanism adapts the longitude movements to any latitude in the ellipsoid. At the edges (maximum and minimum latitude), a longitude movement is translated into rotations of the camera around its view axis, resulting in an apparent rotation of the object in the screen which will be clockwise or counterclockwise depending on the direction of the mouse movement, but independent of the position on the screen where the mouse cursor is being moved. This leads to a somewhat undesired effect whereby the object may not follow the mouse cursor.

2.3 Zoom-level changes of the virtual camera

We have previously defined mappings between two-dimensional user gestures and variations in latitude and longitude across the spheroid. The only requirement left is the strategy to map zoom-level gestures to variations in the size of the spheroidal trackball, i.e. changes to a, b in Eq. 2. The strategy is straightforward: gestures \(\Delta \tilde{z}\) of the user are mapped to changes \(\Delta b\) in one of the semi-axes of the spheroid. Let the size of the spheroid before the gesture be defined by semi-axes \(a_0\), \(b_0\) in Eq. 2. A gesture \(\Delta \tilde{z}\) of the user expressing a zoom-level change is mapped to a change \(\Delta b\), so that the new semi-axes a, b are simply:

$$\begin{aligned} \begin{array}{cc} b = b_0 + \Delta b \\ a = \frac{1}{2}(e_x+\sqrt{e_x^2+4b^2}) \end{array} \end{aligned}$$
(14)

It is interesting to note that when \(b \gg e_x\), i.e. when the camera is far from the object bounding box, the spheroidal trackball becomes more and more spherical:

$$\begin{aligned} a = \frac{1}{2}\left( e_x + \sqrt{e_x^2 + 4 b^2}\right) \approx \frac{1}{2}\left( e_x + 2b\right) = b + \frac{e_x}{2} \approx b \qquad \text {when } b \gg e_x \end{aligned}$$
(15)

This is a convenient result: from a long distance, the spheroidal trackball defaults to the classical spherical trackball (see Fig. 8).

Fig. 8

Effect of the distance between the camera and the object on the shape of the ellipsoid. The farther the camera is from the object, the more spherical the spheroid becomes. From a long distance, the spheroidal trackball defaults to the traditional spherical trackball

What is left to complete the spheroidal trackball is a strategy to map \(\Delta \tilde{z}\) zoom-level gestures of the user to \(\Delta b\) changes of the spheroid. At this point, we need to find the minimum spheroid that can be defined around the object before the camera trajectory intersects the bounding box.

The minimum spheroid that just intersects the bounding box can be computed by writing the equation of the circle that goes through p and \(p'\) in Fig. 9, as well as that of the ellipse on the horizontal plane going through \(p'\). If we impose the relationship between the spheroid’s semi-axes a, b that defines the spheroidal trackball (Eq. 3), we are left with a cubic equation in one of the semi-axes of the spheroid, with coefficients depending only on the dimensions of the bounding box of the object:

$$\begin{aligned} a^3 - E_c~a^2 - (E_c^{~2} + r^2)~a + E_c^{~3} = 0 \end{aligned}$$
(16)

This cubic equation can be solved using the Cardano–Ferrari method. The analysis is not trivial, but given the geometric characteristics of the solution we are looking for, it is possible to consistently choose one of the solutions of the cubic equation, obtaining the following expression for the semi-axes \(a_{\min }, b_{\min }\) of the minimum spheroid:

(17)

where \(E_c\) and r can be computed directly from the bounding box of the object. This is an elaborate expression, but it only has to be computed once, for instance when the geometry of the object is loaded by the application.
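
If the closed-form expression of Eq. 17 is not at hand, the cubic of Eq. 16 can also be solved numerically. The sketch below (Python, hypothetical names) does so; selecting the largest real root, i.e. the one exceeding \(E_c\), is our reading of the geometric argument in the text.

```python
import numpy as np

def minimum_spheroid(E_c, r):
    """Semi-axes (a_min, b_min) of the smallest admissible spheroid,
    obtained by solving the cubic of Eq. 16 numerically. E_c and r are
    computed from the bounding box of the object (see Fig. 9)."""
    roots = np.roots([1.0, -E_c, -(E_c ** 2 + r ** 2), E_c ** 3])
    real = roots.real[np.abs(roots.imag) < 1e-9]
    # The semi-axis a always exceeds the cusp of its evolute, so the
    # geometrically meaningful solution is the largest real root.
    a_min = float(max(real))
    b_min = float(np.sqrt(a_min * (a_min - E_c)))  # from E_c = a - b**2 / a (Eq. 3)
    return a_min, b_min
```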

Fig. 9

The minimum spheroid trackball will just touch the bounding box at point p, which is at a radius r from the horizontal axis, which can be computed from the size of the bounding box (left). Also at this radius, but on the horizontal plane, is \(p'\), a point on the intersection of the spheroid with the horizontal plane (right)

The minimum spheroid is useful for this mapping because it provides a lower bound to the changes in the semi-axes \(\Delta b\) that we set out to compute at the beginning of the section. For a given initial position of the camera on a spheroid defined by \(a_0, b_0\), one can compute the distance between that position and the minimum spheroid defined by \(a_{\min }, b_{\min }\), then map the variations \(\Delta \tilde{z}\) produced by the interaction device to percentages of that distance. For the kind of spheroids considered throughout this formulation, the expression is simply:

$$\begin{aligned} \Delta b = k_\textrm{scale} \cdot (b_0 - b_{\min }) \cdot \Delta \tilde{z} \end{aligned}$$
(18)

where \(k_\textrm{scale}\) is a scale factor that will depend on the application (see Footnote 4). This and the expressions in Eq. 14 allow us to compute the spheroidal trackball at the new zoom level.
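
Putting Eqs. 14 and 18 together, a zoom step can be implemented roughly as in the sketch below (Python, hypothetical names; the clamping to \(b_{\min }\) and the default gain are our own additions, not part of the formulation).

```python
import math

def apply_zoom(dz, b0, b_min, e_x, k_scale=0.1):
    """Update the trackball for a zoom gesture dz (Eqs. 14 and 18).
    k_scale is an application-dependent gain (0.1 is an arbitrary example).
    Returns the new semi-axes (a, b)."""
    delta_b = k_scale * (b0 - b_min) * dz        # Eq. 18
    b = max(b0 + delta_b, b_min)                 # Eq. 14, kept outside the minimum spheroid
    a = 0.5 * (e_x + math.sqrt(e_x ** 2 + 4.0 * b ** 2))
    return a, b
```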

2.4 Generalisation to oblate spheroidal trackballs

The formulation above is general to scaling the classical spherical trackball along any of the axes. However, in some cases we have given expressions that assume that the semi-axis a is greater than b (aligned, respectively, with the global X- and Z-axes for convenience). This is to say that the formulation above in some cases assumes that the scaled axis is greater than the other two, which are equal but smaller. These are called prolate spheroids, having the shape of a rugby ball. We generalise all expressions in this section to oblate spheroids, where the scaled semi-axis is smaller than the other two, and the shape is that of a Pilates ball (when someone sits on it). Table 1 has a twofold purpose: to generalise the formulation to oblate spheroids, and to have all the practical expressions summarised together, separated from other, secondary expressions used in the formulation.

Table 1 Summary of expressions for prolate and oblate spheroidal trackballs

The formulation of the spheroidal trackball is an alternative framework to the traditional approach of orbiting a virtual camera around an object. In the remainder of the paper, we evaluate this framework in an experiment which tests our hypotheses.

3 Evaluation of the spheroidal trackball

Having formulated the spheroidal trackball, our goal is now to evaluate whether this framework improves interaction, in terms of higher performance, increased perceived usability and reduced perceived workload.

3.1 Overview

We designed an experiment in which participants had to use a software application to complete a series of inspection and docking tasks. The object inspected was a 3D model of a gas turbine, a horizontally elongated object which strongly suggests a horizontal axis of rotation coinciding with its longest dimension. All subjects had to perform a search-and-dock task of a small target on the surface of the object: first they had to find it, then align it with a viewfinder. The task was performed several times using different navigation conditions. In one condition the navigation used a spherical trackball, and in the other a spheroidal trackball adapted to the surface of the object. Our hypothesis was that using the spheroidal trackball would benefit the usability of the application, in terms of increased performance (less time to complete the task), higher perceived usability and lower perceived workload.

3.2 Inspection and docking task

Each participant had to complete a combined inspection and docking task a number of times on the same virtual object: a gas turbine, an elongated object with enough three-dimensional detail to hide a small yellow sphere (the target) from the viewer, depending on the orientation of the camera. The goal for the subjects was thus to (1) find the yellow target sphere and (2) dock the target inside the viewfinder. The viewfinder has two concentric circles. Docking is completed when the projection on the camera view of the yellow target sphere is within the ring-shaped area between the circles (Fig. 10).

Across iterations of the task, the yellow target sphere appeared on manually pre-defined locations on the object. When docking is completed, the viewfinder changes colour for one second, and a sound suggesting success is emitted by the application. To proceed to the next iteration, the subject had to press the space bar. A decreasing counter helps the user understand how many repetitions are left for a given trial. More detail is given in Sect. 3.4.

Fig. 10

a shows the 3D model of the untextured gas turbine with a yellow sphere (target) and the ring-shaped viewfinder. b–d show examples of the three zones in which the target can appear. From left to right, lateral-hidden, central and lateral-visible. The figure also illustrates how the viewfinder changes position and size between trials

3.3 Experimental design

We designed an experiment with two independent variables. The first variable has two within-subject conditions. Condition 1 employs the spheroidal trackball (introduced in this paper) to orbit the virtual camera, while Condition 2 corresponds to the traditional spherical trackball. In spheroidal navigation, the subjects were allowed horizontal rotation, vertical rotation and zoom-level gestures. In spherical navigation, the users were also allowed to make pan gestures. Each of the participants performed the study twice with each condition, in a counterbalanced order. The participants did not receive demonstrations or explanations about the navigation techniques. They were told that they were going to try four variants of a technique to manipulate three-dimensional objects on the computer, and the task they had to perform was explained to them.

The second variable has three levels: the targets were set out to uniformly occupy three types of locations on the object surface. From an initially centred, neutral position of the virtual camera, we classified each area of the object surface into three categories: (1) Central: targets are visible and centred on the object; (2) Lateral-visible: targets are visible but towards the sides of the object; and (3) Lateral-hidden: targets towards the sides of the object and hidden by the blades of the object (the object has enough detail in this area to hide the target, unless the camera orbits or pans towards it). Figure 10 attempts to illustrate this classification with examples. The targets were arranged so that, at each trial, the target was hidden from the current camera position and was located in a different area, according to the previous classification, from the preceding trial. This strategy ensured that the participants always had to navigate the camera around the object to find the following target. Four sequences were manually designed to meet these constraints. Across participants, the mapping between block and sequence was also counterbalanced using Latin squares.

Fig. 11

Experimental design. a Shows the complete procedure of the experiment. b Shows the procedure of a block. c Shows an example of a sequence of the 18 trials performed in one block; it lists the areas where the target is placed in Sequence C. The conditions and sequence order for participant 3 are shown as an example

Figure 11 shows the structure of the full participation of one subject in the evaluation. Once the demographic questionnaire and the informed consent were filled in, subjects completed four blocks, two for each condition. The order in which each condition appeared was counterbalanced using Latin squares. Each block was preceded by a training phase that allowed the users to practise the inspection and docking task for as long as they wished. The word ’Training’ appeared during this stage, instead of the counter showing how many targets were left to find. Once the participants felt they were ready, they clicked on a button in the corner of the screen to signal the end of the training. The scenario illustrated in Fig. 10a was loaded and the block started. Each block includes 18 trials of the inspection and docking task. Each trial starts when the user presses the space bar on the keyboard. The position of the target for each trial was selected by the software using a strategy that forced subjects to always use the navigation technique of the block to look for the target. The strategy is detailed in Sect. 3.4. The participants were asked to complete each trial as fast as possible. At the end of each block, they filled in a usability questionnaire, followed by a perceived-workload questionnaire.

3.4 Experimental apparatus

The evaluation took place in a closed room. The participants used a Lenovo Legion Y530-15ICH laptop with an Intel i5 CPU and 16GB of RAM, running Windows 10. The software application was developed using Unity3D. The core of the software is HOM3R, a 3D viewer for complex hierarchical product models [12]. The application ran at full screen with a resolution of \(1920 \times 1080\) pixels. The interaction was done with the mouse, which rested on the same table as the laptop. The space bar of the laptop keyboard was used to start each trial. There were three gestures that the user could perform with the mouse: (1) Zoom in or out on the object using the mouse wheel. Minimum and maximum zoom levels were set to prevent the user from completely losing the object. (2) Orbit around the global horizontal and vertical axes through dragging gestures with the left button of the mouse. (3) Pan horizontally across the object through dragging gestures with the middle mouse button (the wheel), only for the spherical trackball condition. At each trial, the viewfinder could appear anywhere on the screen, with different sizes (Fig. 10), so as to force participants to use all the available gestures of the interaction device.

Fig. 12

Software used for the evaluation. From the top left image, clockwise: The users are allowed to train with the application for as long as required. At each trial, the viewfinder can appear anywhere on the screen, with different sizes. Once each block is finished, the software presents the user with the usability and workload questionnaires

3.5 Participants

A total of 32 individuals participated in the evaluation, of whom 20 identified as men, 11 as women and one as nonbinary. Ages were between 18 and 49 (\(M=24.16\), \(\hbox {SD}=3.65\)). Thirty-one participants were 18–29 years old and one was 40–49. None of the participants reported any uncorrected visual impairments. All subjects were offered a small 3D-printed gift in return for their participation. No experience with 3D modelling or similar software was required. However, the demographic questionnaire showed that 68.75% of participants had experience with 3D modelling software, while 78% had experience with video games. The participants signed their informed consent, and the whole evaluation procedure was reviewed and approved by the Experimentation Ethics Committee of the University of Malaga.

3.6 Data collection and analysis

Our goal was to evaluate whether there were differences between using the traditional spherical trackball and the spheroidal trackball introduced in this paper. For this we measured performance, perceived usability and perceived workload.

We measured performance in terms of the time that it took participants to find and dock the target in the viewfinder. We averaged the time for each participant and each condition across the 18 trials. We also computed average times for the six trials within each object area (central, lateral-visible and lateral-hidden). We computed geometric time averages to compensate for the excessive time that some users take to complete trials [22]. Perceived usability was measured with the ten-item SUS questionnaire [7], translated to Spanish by Devin [11]. The participants scored each question with a 5-point Likert scale which ranged from ’Strongly Disagree’ to ’Strongly Agree’. The answers were processed to obtain a SUS score [21] of how usability is perceived by the participant. Perceived workload was measured with the Raw TLX [15], in its Spanish translation by [10]. The questionnaire has six questions that are answered on a 21-point scale—from very low to very high. The RTLX score was the average for each participant and condition.
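
As an illustration of how these scores are aggregated (a minimal sketch with our own function names; the actual processing followed [21] and [22]), the geometric mean of completion times and the standard SUS scoring can be computed as follows.

```python
import numpy as np

def geometric_mean(times_s):
    """Geometric mean of trial completion times (seconds)."""
    times_s = np.asarray(times_s, dtype=float)
    return float(np.exp(np.log(times_s).mean()))

def sus_score(answers):
    """Standard SUS score from ten 1-5 Likert answers
    (odd items positive, even items negative), scaled to 0-100."""
    answers = np.asarray(answers, dtype=float)
    positive = answers[0::2] - 1     # items 1, 3, 5, 7, 9
    negative = 5 - answers[1::2]     # items 2, 4, 6, 8, 10
    return float((positive.sum() + negative.sum()) * 2.5)
```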

We computed a within-subject, two-factor ANOVA where the two factors were the navigation technique (spherical or spheroidal) and the position of the target (central, lateral-visible and lateral-hidden), to assess whether the effect of the navigation technique on performance depended on how difficult the target was to search for and dock. In addition, we computed paired-samples T tests to assess whether there were statistically significant differences in the usability and workload perceived by participants.
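
A minimal sketch of such an analysis pipeline, assuming the per-participant averages have been exported to CSV files with the hypothetical names and columns shown (the actual analysis may have used different tooling):

```python
import pandas as pd
from scipy.stats import ttest_rel
from statsmodels.stats.anova import AnovaRM

# Long-format table: one geometric-mean time per participant, technique
# (spheroidal / spherical) and target area (central / lateral-visible / lateral-hidden).
times = pd.read_csv("times_long.csv")  # columns: subject, technique, area, time
res = AnovaRM(times, depvar="time", subject="subject",
              within=["technique", "area"]).fit()
print(res.anova_table)

# Paired comparison of SUS scores (one value per participant and condition).
sus = pd.read_csv("sus_wide.csv")      # columns: subject, spheroidal, spherical
print(ttest_rel(sus["spheroidal"], sus["spherical"]))
```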

3.7 Results

3.7.1 Task performance

To estimate performance, we measured the time that participants took to complete the search-and-dock task detailed in previous sections; thus a smaller value corresponds to a better performance. We found a significant effect of the navigation technique (spheroidal vs. spherical) on task performance. For spheroidal navigation, the mean was \(8.91_{-0.44}^{+0.47}\) s (see Footnote 5), while for spherical navigation the mean was \(14.25_{-0.85}^{+0.923}\) s. The estimated difference between both techniques was \(5.34_{-0.72}^{+0.78}\) s (Fig. 13, All areas).

We computed a two-factor ANOVA to find that there was an effect of the navigation technique (\(F(1,31) = 292.586\), \(p<0.001\)). We also found that there was an effect of position (\(F(2,62) = 97.408\), \(p<0.001\)), and an interaction effect between position and navigation technique (\(F(2,62) = 11.721, p<0.001\)).

We performed separate T tests to assess the effect of the navigation technique on performance in each of the areas of the object. We found an effect in all areas:

  • Lateral-hidden: The t value was \(t(31) = -14.263\), \(p<0.001\). The means were \(9.54_{-1.32}^{+0.66}\) for spheroidal navigation, and \(16.92_{-1.22}^{+1.31}\) for spherical navigation.

  • Central: The t value was \(-10.251\), \(p<0.001\) and the means were \(7.8_{-0.43}^{+0.46}\) and \(11.34_{-0.81}^{+0.88}\), respectively, for spheroidal and spherical navigation.

  • Lateral-visible: The t value was \(-14.590\), \(p<0.001\) and the means were \(9.5_{-0.48}^{+0.52}\) for the spheroidal, and \(15.07_{-1.01}^{+1.07}\) for the spherical.

Figure 13 depicts the means and CIs separately for each object area.

Fig. 13

Estimated means and CIs for the time it took participants to complete the inspection and docking task. The horizontal axis represents the navigation technique, and there are different plots for each of the object areas. All error bars are 95% CIs

3.7.2 Perceived usability and workload

We also found a significant effect on the perceived usability as measured by the SUS score, \(t(31) = 8.486\), \(p<0.001\). The mean score for spheroidal navigation was \(83.79_{-3.68}^{+3.68}\), while for spherical navigation it was \(69.49_{-4.3}^{+4.3}\) (Fig. 14a). Note that this time, a larger value in the SUS score is better.

Fig. 14

a Means and CIs of the usability scores for each navigation technique. A larger score in the SUS test indicates better perceived usability. b Perceived workload in mean and CIs of the points obtained in the Raw TLX questionnaire (less is better). All error bars are 95% CIs

Finally, we also measured a significant effect of the navigation technique on the workload reported by participants via the Raw TLX questionnaire, \(t(31) = -6.631\), \(p<0.001\). When participants used the spheroidal navigation, the mean reported workload was \(29.56_{-4.3}^{+4.3}\) points. When they used the spherical navigation, the mean reported workload was \(41.56_{-5.66}^{+5.66}\) (Fig. 14b). This time, less is better.

4 Discussion

4.1 On some of the choices we made to evaluate the spheroidal trackball

We selected a virtual object that represents a power turbine. The object has two features that make it suitable for evaluating spheroidal navigation: it is elongated, and it has an intrinsic rotation axis. Recent work has shown that, for objects having a clear rotation axis in the user’s mental model, fixed-axis navigation techniques provide a more natural and effective interaction [13]. The rotation axis of the turbine is also horizontal, making the prolate spheroidal trackball the adequate choice for the evaluation.

Spheroidal navigation allows the different areas of the object to be reached by orbiting around the object, while maintaining a more constant distance to the object than spherical orbiting. In contrast, in the traditional way of interaction, when we need to zoom the camera near the object, we do so towards its centre, leaving the sides of the object unreachable. To reach the extremes we need to perform a translation of the centre of the object, which is known as a pan gesture. Spheroidal navigation saves the user from having to combine orbit and pan gestures to inspect the object. This is why we chose to separately compare both techniques for targets at the centre and at the sides of the object.

4.2 On the results in task performance in the different areas of the elongated object

Targets at the centre of the object are accessible by both techniques without panning. This explains the smaller difference in performance between spheroidal and spherical navigation, which was expected. However, the difference is still significant. Subjects seem to be using panning gestures at the central areas of the object even though they do not strictly need them, and paying for it in performance. Firstly, panning is available: if the subjects can use it, they will. And it is not just a matter of availability: panning can simplify the last part of the task, docking, as the correspondence between screen gestures and camera motion is more direct. On the other hand, the closer the camera is to the surface of the object, the more similar spheroidal orbiting is to panning, which makes docking also simpler, without incurring the performance loss of switching between gestures.

At the sides of the object, the difference is even stronger, as panning is indispensable in spherical navigation to reach these areas. For targets that are in the lateral area but not hidden, panning is only required for docking, but it is not imperative to locate them, as a zoom-out gesture can be sufficient to reveal their location. However, for targets hidden in the turbine blades, panning can be required just to find where the targets are. This can explain the different results in performance: a maximum advantage for spheroidal navigation for targets hidden in the lateral blades of the turbine; a slightly smaller, but still large, advantage for lateral, non-hidden targets; and a smaller advantage for targets in the centre of the turbine. Still, in all areas the advantage of spheroidal navigation is significant, showing a consistent improvement in performance over the traditional approach based on spherical navigation.

4.3 On the results in perceived usability and workload and the degrees of freedom of each navigation technique

The reduced degrees of freedom (DoF) of spheroidal navigation can also explain its advantage in perceived usability and perceived workload. These results can be discussed in the context of the classic theory of Bernstein for motor skill learning [3]. In his work, Bernstein formulates the motor problem as that of choosing among a redundant set of DoF. Even for the apparently simplest tasks, such as throwing a ball, an enormous number of DoF of the human biomechanical system have to be not only coordinated but also, somehow, selected, as the set of DoF is redundant, i.e. there are multiple combinations that may produce the same motor result. How is it then even possible that we solve problems with so many variables, and become dexterous at performing tasks through training? There are many theories as to how this is accomplished, but Bernstein suggests that at the early stages of learning we employ different mechanisms to freeze as many DoF as possible in order to solve the motor problem. This hypothesis has been extended by others in human and robotic motor skill learning [4]. Spheroidal navigation constitutes a similar strategy by which some of the navigation DoFs are frozen to simplify the problem of accessing the periphery of objects that depart from sphericity.

Let us think of an example of classical navigation based on a spherical trackball. We are looking from a small distance at the central area of an elongated object such as the turbine employed in the evaluation, and from here we need to travel to the left side of the object. In principle, we have a choice in the gesture that we can make: we can either orbit or pan the camera towards our target. Indeed, these two actions have to be mapped to different buttons of the mouse, or require us to use an additional key on the computer keyboard. There are redundant DoFs to solve the problem, as in Bernstein’s formulation. If we are close to the object and our target is far at the edge of the object, orbiting might altogether be the wrong choice, as it will not put the target within reach of the virtual camera. On the other hand, with spheroidal navigation panning is not required, as spheroidal orbiting will maintain the camera at a close, almost constant distance to the surface of the object, resulting in an efficient interaction that uses a reduced set of DoFs.

We can also frame our proposal in the classic Schema Theory [23]. According to this model, we retrieve a general motor programme from memory in order to solve a new motor problem. To retrieve the programme we use as inputs the initial and the final, desired situation, then we adapt the general motor programme to the specific conditions of the problem. In our example, in the case of spherical navigation, the retrieved schema involves orbiting and panning. In spheroidal navigation, the retrieved programme will involve only orbiting, in a schema which is simpler and thus easier to acquire.

Other strategies mentioned in the introduction, such as HoverCam [16], are designed to maintain a constant distance between the orbiting camera and the virtual object surface. Precisely because they are based on the distance to the actual object surface, they are vulnerable to traversing every feature, resulting in a jittering effect on the camera image during movement. Our approach encloses the object within a smooth surface—the spheroid—providing an adaptation that, while not perfect, consistently ensures smooth camera movements, eliminating any potential shakiness. Finally, our proposal requires a non-uniform motion of the camera. This was specifically designed to approximate a one-to-one movement of the object on the screen with the mouse (this approximation would be exact if the object were a spheroid). Our evaluation shows that, despite this non-uniform motion, this approach yields positive results on the user experience in terms of usability.

5 Towards the ellipsoidal trackball

In this paper, we have extended recent work to generalise the fixed trackball by relaxing the implicit assumption that the objects inspected have high sphericity. We have started by considering objects with a bounding box in which only one of the three dimensions is larger or smaller than the other two. The resulting trackball is spheroidal. A spheroid is a surface constructed by rotating an ellipse around one of its axes. If we rotate it around the longer axis, we obtain a prolate spheroid, but if we rotate it around the shorter axis, we obtain an oblate spheroid. For both cases, we have formulated equations that map device gestures performed by the user to orbital motion on the spheroid (latitude and longitude displacements), and to re-computations of the spheroidal surface, which correspond to zoom-level gestures of the user.

An evaluation performed with 32 subjects in a search-and-dock task strongly suggests that the spheroidal trackball outperforms the classical approach based on the spherical trackball, in terms of performance (time to complete the task), perceived usability and perceived workload.

We have previously discussed [13] practical implications for applications wishing to take advantage of the improvements in performance, usability and perceived fatigue obtained by using a fixed rotation axis that is consistent with the mental model a user has of the object. This can be generalised to the framework presented in this work. One promising option is the use of visualisation widgets [5] to interactively fix an axis for navigation around objects that are arbitrarily oriented in space. Once this axis is fixed, it is possible to compute a transformation of the coordinate system in which the bounding box of the object would fit either the prolate or the oblate spheroid formulations in Table 1.

The next step is to allow for objects that have a bounding box with different sizes in all dimensions. Such objects will require an Ellipsoidal Trackball. An ellipsoid is, in general, a surface that is not obtained by rotating an ellipse around one of its axes, as opposed to a spheroid or ellipsoid of revolution. One can obtain such a shape from a sphere by performing scale deformations in all three dimensions. Mapping user device gestures while constraining the virtual camera to an ellipsoid requires extending the formulations in this paper, and performing a new evaluation on objects that depart from sphericity in all dimensions. There are multiple examples of such objects: vehicles, pieces of furniture, or buildings. In future work, we aim to formulate the ellipsoidal trackball and evaluate its performance against both the traditional approaches based on spherical trackballs and the strategy introduced in this paper, the spheroidal trackball.