1 Introduction

Recent advances in the sensor industry, together with lower production costs, have significantly changed the perspective of virtual reality (VR) technology. Cipresso and colleagues (Cipresso et al. 2018) classified VR systems according to the level of immersion perceived by the user, distinguishing three main groups. Non-immersive and semi-immersive VR systems are the cheapest VR solutions, reproducing a virtual image or a 3D scenario, respectively, using either monitors or desktops. Conversely, immersive VR (IVR) systems provide the user with a full-body simulated experience and a stereoscopic view of the surrounding environment, using a head-mounted display (HMD) and a positional tracking system integrated with audio and haptic devices.

IVR offers a greater degree of realism by concealing the interface between the real and virtual worlds from the user and creating the illusion of presence in 3D scenarios (Lombard and Ditton 1997). One of the main opportunities offered by IVR systems is the possibility of creating ad hoc scenarios to test hypotheses that are otherwise difficult to assess in the real world (Nabiyouni et al. 2017). For this reason, these systems are rapidly expanding into numerous scientific disciplines and applications, such as rehabilitation, education, and medical and surgical procedures (Cipresso et al. 2018; Izard et al. 2018; Campo-Prieto et al. 2021).

Nonetheless, the applicability of IVR for quantitative measurements in research is still under discussion, the main limitation being the trade-off between cost and precision in movement tracking.

Expensive movement analysis systems reported in the literature, such as Vicon and OptiTrack, embed IVR in their platforms, integrating precision motion capture with a 3D immersive virtual experience. However, in recent years, the development of gaming platforms has cut the cost of IVR systems and devices from USD 100k or more to less than USD 1k. Among them, the HTC VIVE system is the most widely used commercial solution for consumer VR applications, owing to its better tracking performance (Ikbal et al. 2021).

Several studies have explored the feasibility and reliability of such IVR systems for precise experimental measurements, but the variability in their results precludes any definitive conclusion. Some works reported high tracking performance, with accuracy and precision values ranging from the millimetre to the submillimetre scale (Spitzley and Karduna 2019a; Ameler et al. 2019; van der Veen et al. 2019a,b; Jost et al. 2019; Ikbal et al. 2021). Jost and colleagues (Jost et al. 2019) stated that the HTC VIVE system has the potential to accurately track human movement in biomechanical and physiotherapy research and in the clinical setting. In contrast, other studies highlighted clear tracking issues with the same system (Niehorster et al. 2017), with errors ranging from centimetres (Hemphill et al. 2020) up to metres for dynamic movements (Borges et al. 2018). This variability stems from the different methodologies, experimental setups and intended applications of the available works. For example, some studies analysed planar movements (Niehorster et al. 2017; Borges et al. 2018), while others focussed on 3D motions (van der Veen et al. 2019a,b), highlighting the lack of consensus on methodologies for comparing performance across applications. In this sense, the work of Ikbal and colleagues (2021) laid a solid foundation towards a standard procedure for evaluating the performance of HTC VIVE tracking in static and dynamic conditions for a custom application. However, that comprehensive study did not consider the effects of external interference in the working area, since the evaluations were conducted in a controlled environment. Such conditions do not fully reflect a standard-use scenario, where occlusions in the acquisition area can affect tracking quality (Niehorster et al. 2017).
Hence, our study aimed at systematically exploring the static-positional accuracy of the HTC VIVE tracking and its robustness against possible tracking occlusions occurring inside the capture volume.

2 Methods

2.1 Overview of the HTC VIVE Pro and the SteamVR tracking system

The VIVE Pro system configuration included a head-mounted display (HMD), two controllers, two lighthouse base stations and two VIVE trackers 2.0 (2018). The HMD is equipped with two AMOLED displays with a resolution of 1440 × 1600 pixels per eye and a refresh rate of 90 Hz, covering a nominal field of view of 110° (2021). The position and orientation of the trackable elements in space (i.e. HMD, controllers or trackers) are obtained through the SteamVR tracking system, which combines inertial and outside-in tracking principles (Borges et al. 2018).

The tracked position and orientation are updated primarily by inertial measurement units through dead reckoning (path integration), which allows high update rates (Niehorster et al. 2017). The base stations limit and correct the intrinsic drift of the inertial measurements by providing additional kinematic data through the solution of the so-called perspective-n-point (PnP) problem (Maciejewski et al. 2020). The system derives position and orientation from a set of IR photodetectors located on the trackable device (i.e. HMD, controller or tracker) and illuminated by the lighthouses. Each lighthouse emits a wide IR synchronisation blink followed by two wide-range IR laser pulses that repeatedly sweep the tracking area within a 120° angle, one axis at a time, first from left to right and then from top to bottom. Knowing the angular velocity of the sweep and the time between the synchronisation blink and the detection of the laser pulse, the system determines the direction in which each photodetector is located. The directions of at least four non-coplanar photodetectors are needed to solve the PnP problem and, consequently, to improve tracking accuracy.
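The timing-to-direction step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not SteamVR's implementation: the rotor rate `ROTOR_HZ = 60` and the convention that the optical axis is crossed a quarter revolution after the sync blink are hypothetical, and the ray is built with the pinhole-like mapping (tan az, tan el, 1), which turns two orthogonal sweep angles into a direction that a generic PnP solver can use.

```python
import numpy as np

# HYPOTHETICAL rotor rate: the text does not state it; 60 rev/s is used only
# for illustration.
ROTOR_HZ = 60.0
OMEGA = 2.0 * np.pi * ROTOR_HZ  # angular velocity of the sweep [rad/s]


def sweep_angle(t_sync: float, t_hit: float, omega: float = OMEGA) -> float:
    """Angle [rad] swept between the sync blink and the laser hit.

    Re-centred so that 0 rad is the base station's optical axis, ASSUMING the
    axis is crossed a quarter revolution after the sync blink.
    """
    return omega * (t_hit - t_sync) - np.pi / 2.0


def direction_from_angles(az: float, el: float) -> np.ndarray:
    """Unit ray from the base station towards a photodetector.

    az: horizontal-sweep angle, el: vertical-sweep angle, both measured from the
    optical axis (z). A point lying on both sweep planes satisfies
    x/z = tan(az) and y/z = tan(el), hence the ray (tan az, tan el, 1).
    """
    v = np.array([np.tan(az), np.tan(el), 1.0])
    return v / np.linalg.norm(v)


# Usage: a hit exactly a quarter revolution after the sync blink, on both
# sweeps, lies on the optical axis (approximately [0, 0, 1]).
quarter = 0.25 / ROTOR_HZ
print(direction_from_angles(sweep_angle(0.0, quarter), sweep_angle(0.0, quarter)))
```

With four or more such rays from non-coplanar photodetectors, and the known photodetector layout on the device, a PnP solver can recover the device pose; that solver is deliberately left out of this sketch.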

2.2 Experimental setup

All experimental sessions were conducted with the HTC VIVE Pro system in a 7 × 5 m room with no reflective surfaces and no exposure to natural light. The floor alignment was verified with a spirit level. Following the HTC Vive Headset User Manual (HTC Corp 2020), the two lighthouses were fixed to the ceiling at a height of \(h\) = 3 m, with a pitch inclination (Y-axis) of \(46\pm 1\) deg and a roll inclination (X-axis) lower than \(1\pm 1\) deg. The axes are shown in Fig. 1. The lighthouses were placed \({d}_{c}\) = 4 m apart and synchronised in “Sync cable” mode; their inclination was set within the maximum limits specified in the same manual, given the installation height. The centre of a Cartesian grid was placed at the midpoint of the line joining the floor projections of the two lighthouses. The grid was a square area of side \(L\) = 2 m subdivided into square cells of side \(l\) = 0.5 m, drawn on the floor with \(d\) = 2 cm wide adhesive tape. This yielded \({n}_{P}\) = 25 points, which were categorised according to their position on the grid (Fig. 2) into centre grid (CG) points, numbered 1 to 9, and limit grid (LG) points, numbered 10 to 25. Point 1 was taken as the origin of the XY plane. The grid was used as a reference to set the virtual gaming space in the SteamVR application.
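The grid geometry above can be reproduced numerically as a sketch. The node order below is row-major and does not follow the 1 to 25 numbering of Fig. 2; the lighthouse coordinates (both on the X-axis at ±\(d_c\)/2) are a hypothetical placement, and the depression angles are computed only to illustrate that the outer nodes nearest each lighthouse are seen at the steepest angles.

```python
import numpy as np

L_SIDE, STEP = 2.0, 0.5   # grid side L and cell side l [m]
H, D_C = 3.0, 4.0         # lighthouse height h and separation d_c [m]

# 5 x 5 grid nodes centred on the origin (point 1). Node ORDER here is row-major,
# not the 1-25 numbering of Fig. 2, which this sketch does not reproduce.
ticks = np.arange(-L_SIDE / 2, L_SIDE / 2 + STEP / 2, STEP)  # [-1, -0.5, 0, 0.5, 1]
xx, yy = np.meshgrid(ticks, ticks)
nodes = np.column_stack([xx.ravel(), yy.ravel()])

# CG: inner 3 x 3 block (|x|, |y| <= l); LG: the 16-node outer ring.
is_cg = np.max(np.abs(nodes), axis=1) <= STEP + 1e-9

# HYPOTHETICAL placement: both lighthouses on the X-axis at x = -d_c/2 and +d_c/2.
lighthouses = np.array([[-D_C / 2, 0.0, H], [D_C / 2, 0.0, H]])


def depression_angles(points: np.ndarray, lh: np.ndarray) -> np.ndarray:
    """Angle below the horizontal [deg] at which floor points are seen from lh."""
    horizontal = np.hypot(points[:, 0] - lh[0], points[:, 1] - lh[1])
    return np.degrees(np.arctan2(lh[2], horizontal))


for lh in lighthouses:
    print(np.round(depression_angles(nodes, lh), 1).reshape(5, 5))
```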

Fig. 1

Rotation axes of a VIVE lighthouse base station

Fig. 2

Top view of the grid setup. The central points (CG) are identified by the red numbers on the grid nodes, the limit points (LG) by the blue ones

2.3 Experimental protocol

To validate the positional accuracy of the tracking system, we evaluated the x, y, z coordinates of the VIVE tracker (2018 version) recorded at all the grid points. In each measurement session, the tracker was moved clockwise along the grid, from the centre to the outer border. The tracker was preferred over the other trackable devices (headset or controllers) because of the convenient position of its reference frame, located at its base in direct contact with the floor, as shown in Fig. 3.

Fig. 3

Dimensions and coordinate system of the 2018 Vive marker (a); the schematic representation of the cardboard base support under the tracker (b) (HTC Corp 2017)

Each point was recorded for 5 s at a sampling rate of \({f}_{S}\) = 40 Hz, using custom software developed in Unreal Engine (v4.24).

To evaluate the positional accuracy of the Lighthouse technology and its robustness against possible tracking occlusion inside the capture volume, we ran three different tests:

  • Accuracy (AC) test: this test evaluated the intra- and inter-tracker measurement variability. We conducted \({n}_{r}\) = 4 acquisitions with two different trackers, identified as T1 and T2. In each repetition, the trackers' positions were acquired at each grid point according to the configuration shown in Fig. 4, with no obstruction in the capture volume.

  • Robustness to total occlusion (RTO): this test evaluated the system's robustness against total occlusions of a fixed duration, \({t}_{o}\) = 5 s. The tracker's position was acquired twice at each grid point, before and after the tracker was covered, following the protocol displayed in Fig. 4. Before the second acquisition, we waited \({t}_{w}\) = 5 s to allow the tracker to reacquire the signal. By monitoring the real-time connection status of the trackers in the SteamVR interface, we verified that 5 s was enough both to lose and to regain tracking. Measurement sessions were repeated four times with a single tracker.

  • Robustness to partial occlusion (RPO): this test evaluated the robustness of the system against partial occlusions, in which the tracker's line of sight to one of the two lighthouses was blocked, alternating between them. The tracker's position was acquired five consecutive times at each grid point, following the protocol displayed in Fig. 4. Measurement sessions were repeated four times with a single tracker.

Fig. 4

Measurement session configurations and different test protocols shown from the top view. Note that the yellow squares represent the boxes used for the tracker occlusions

To ensure repeatable tracker positioning, a cardboard base support was placed underneath the tracker. This support consists of two rectangular elements positioned perpendicular to each other, with the shorter side equal to d, as shown in Fig. 3. This cross-like shape allows both centring the tracker on the nodal point of the grid and precisely aligning its orientation. In each test, the orientation of the tracker was standardised, with the status LED facing the positive direction of the X-axis.

Two simple cardboard boxes, Box A and Box B in Fig. 5, were used to occlude the tracker in the RTO and RPO tests, respectively. Box A covered the tracker entirely, whereas Box B had one open side so as to block the tracker's visibility from one lighthouse at a time. For the RPO test, diagonal lines were added to the grid so that Box B could be oriented repeatably (dashed lines in Fig. 5).

Fig. 5

Schematic representation of Boxes A and B used to occlude the tracker, respectively, for the RTO and RPO tests

Before each measurement session, the lighthouses, the SteamVR and the Unreal Engine software were restarted.

2.4 Data analysis

The \((x,y,z)\) coordinates of the tracker were recorded for 5 s at each of the \({n}_{P}\) grid points, yielding about \(n\) = 200 samples per point and per axis. This was repeated \({n}_{r}\) times. Each sample in the dataset is denoted by:

$${P}_{ji,a}^{(k)}$$
(1)

where \(k=x,y,z\) identifies the axis, \(i = 1,\ldots,n\) identifies the i-th sample at the j-th grid point (\(j=1,\ldots,{n}_{P}\)) and \(a = 1,\ldots,{n}_{r}\) identifies the a-th repetition. The total number of samples is therefore \({N}_{tot}={N}_{a}\cdot {n}_{r}\), where \({N}_{a}\) is the number of samples in the a-th acquisition.
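The indexing of Eq. 1 maps naturally onto a four-dimensional array. The sketch below assumes the axis order (k, j, i, a) and zero-based indices; both are choices of this illustration, and the sample counts are taken per axis, which is our reading of \(N_a\).

```python
import numpy as np

N_AXES, N_P, N_SAMPLES, N_R = 3, 25, 200, 4   # k, j, i, a

# P[k, j, i, a] mirrors P_{ji,a}^{(k)} with zero-based indices; zeros are placeholders.
P = np.zeros((N_AXES, N_P, N_SAMPLES, N_R))

# Example: all X samples (k = x) at grid point 1 in the second repetition.
x_point1_rep2 = P[0, 0, :, 1]

N_a = N_P * N_SAMPLES   # samples per axis in one acquisition (per-axis count assumed)
N_tot = N_a * N_R       # samples per axis over all repetitions
print(N_a, N_tot)       # 5000 20000
```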

All the signal processing and statistical data analysis were carried out using MATLAB (Matlab R2019b).

2.4.1 Pre-processing

A first-order low-pass Butterworth filter was applied to the data of each acquired grid point to denoise the signal. Based on FFT analysis, the cut-off frequency was set to 1.81 Hz so as to retain 85% of the signal power.
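A sketch of this pre-processing step is shown below, using SciPy in place of MATLAB. Two choices are assumptions not stated in the text: the mean is removed before the FFT so that the DC term of a position signal does not dominate, and the filter is applied with zero-phase `filtfilt` rather than forward-only. The toy signal is synthetic.

```python
import numpy as np
from scipy.signal import butter, filtfilt

FS = 40.0  # sampling rate f_S [Hz]


def cutoff_for_power_fraction(x, fs=FS, fraction=0.85):
    """Lowest FFT bin frequency below which `fraction` of the power lies.

    The mean is removed first (our assumption), otherwise the DC term of a
    position signal would dominate the spectrum.
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    cumulative = np.cumsum(power) / power.sum()
    return freqs[np.argmax(cumulative >= fraction)]


def lowpass(x, fc, fs=FS, order=1):
    """First-order Butterworth low-pass. Zero-phase filtfilt is an assumption;
    the text does not say whether forward-only or zero-phase filtering was used."""
    b, a = butter(order, fc, btype="low", fs=fs)
    return filtfilt(b, a, x)


# Toy 5-s recording at one grid point: a 1 Hz oscillation of 1 cm amplitude.
t = np.arange(200) / FS
x = 0.01 * np.sin(2 * np.pi * 1.0 * t)
fc = cutoff_for_power_fraction(x)   # about 1.0 Hz for this toy signal
y = lowpass(x, fc)
```

With 200 samples at 40 Hz the FFT bins are 0.2 Hz apart, so this bin-based sketch cannot return a value such as 1.81 Hz; resolving it would need a finer spectral estimate, for example zero-padding or pooling spectra across grid points.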

A preliminary visual inspection of the dataset then showed that each repetition had a random offset \(\underline{P}_{a}^{(k)}\) (Fig. 6). We therefore subtracted \(\underline{P}_{a}^{(k)}\) from the dataset as follows:

$$P_{ji,a}^{(k)*} = P_{ji,a}^{(k)} - \underline{P}_{a}^{(k)}$$
(2)
$$\underline{P}_{a}^{(k)} = \frac{1}{N_{a}} \sum_{j=1}^{n_{P}} \sum_{i=1}^{n} P_{ji,a}^{(k)}$$
(3)

where \(\underline{P}_{a}^{(k)}\) is the average value of the a-th acquisition along axis k and \(P_{ji,a}^{(k)*}\) is the corrected sample.
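Eqs. 2 and 3 amount to subtracting, for each axis and repetition, the mean over all grid points and samples of that acquisition. A minimal NumPy sketch, assuming the (k, j, i, a) array layout and synthetic data with illustrative offsets:

```python
import numpy as np


def remove_acquisition_offset(P):
    """Eqs. 2-3 on an array P[k, j, i, a] (axis, grid point, sample, repetition).

    For each axis k and repetition a, the mean over all grid points j and
    samples i (N_a values) is subtracted from every sample of that acquisition.
    """
    offset = P.mean(axis=(1, 2), keepdims=True)   # shape (k, 1, 1, a)
    return P - offset, offset


rng = np.random.default_rng(0)
true_offsets = np.array([0.02, -0.01, 0.03, 0.0])   # one per repetition [m], illustrative
P = rng.normal(scale=0.001, size=(3, 25, 200, 4)) + true_offsets
P_star, offset = remove_acquisition_offset(P)
```

Keeping the reduced axes (`keepdims=True`) lets the offset broadcast back over grid points and samples without any explicit loop.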

Fig. 6

AC test dataset processing: plots of the mean and standard deviations calculated on each grid point for each axis of acquisition (X, Y and Z), before (upper plots) and after (lower plots) the offset subtraction for both tracker T1 and T2

All three test datasets were filtered and adjusted by applying the offset subtraction.

2.4.2 Accuracy test: intra- and inter-tracker analysis

To evaluate single-tracker repeatability, we conducted an intra-tracker analysis, calculating the difference between the position measured at each grid point and the corresponding ideal grid point.

Then, we focussed on a detailed inter-tracker analysis to explore possible differences between the two trackers, since many applications require tracking more than one point at a time.

We therefore conducted the Bland–Altman analysis (method in Sect. 2.4.3, results in Sect. 3.1.1) and computed the average deviation between the positional measurements obtained by T1 and T2 in the XYZ volume (Eq. 4), on the XY plane (Eq. 5) and along the Z-axis (Eq. 6).

$${\Delta }_{xyz}^{i}=\sqrt{{\left({\delta }_{x1x2}^{i}\right)}^{2}+{\left({\delta }_{y1y2}^{i}\right)}^{2}+{\left({\delta }_{z1z2}^{i}\right)}^{2}}$$
(4)
$${\Delta }_{xy}^{i}=\sqrt{{\left({\delta }_{x1x2}^{i}\right)}^{2}+{\left({\delta }_{y1y2}^{i}\right)}^{2}}$$
(5)
$${\Delta }_{z}^{i}={\delta }_{z1z2}^{i}$$
(6)

where i = 1,…,25 indexes the grid points and \({\delta }_{x1x2}^{i}\), \({\delta }_{y1y2}^{i}\), \({\delta }_{z1z2}^{i}\) are the absolute differences between the measurements of the two trackers along each axis (Eq. 7 for the X-axis, and likewise for Y and Z).

$${\delta }_{x1x2}^{i}= abs({{x}^{i}}_{T1}-{{x}^{i}}_{T2})$$
(7)
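Eqs. 4 to 7 translate directly into vectorised NumPy. In this sketch each tracker is represented by a (25, 3) array of per-point mean coordinates; the sample SD (`ddof=1`) used for the "mean ± SD" summary is an assumption, since the text does not specify it.

```python
import numpy as np


def inter_tracker_deviation(t1, t2):
    """Eqs. 4-7 for arrays of shape (25, 3): mean (x, y, z) of T1 and T2 per grid point."""
    delta = np.abs(np.asarray(t1, float) - np.asarray(t2, float))  # Eq. 7, all axes
    d_xyz = np.sqrt(np.sum(delta ** 2, axis=1))                    # Eq. 4
    d_xy = np.sqrt(np.sum(delta[:, :2] ** 2, axis=1))              # Eq. 5
    d_z = delta[:, 2]                                              # Eq. 6
    return d_xyz, d_xy, d_z


def mean_sd(v):
    """Mean and sample SD (ddof=1 assumed), as in a 'mean ± SD' table entry."""
    return float(np.mean(v)), float(np.std(v, ddof=1))
```

Passing the ideal grid coordinates as `t2` gives, with the same function, the deviation from the gold standard described next.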

Similarly, we computed the deviation between the positional measurements and the gold standard, represented by the ideal grid points coordinates.

2.4.3 Statistical analysis

The Bland–Altman (BA) analysis (Giavarina 2015) was conducted for the AC test to evaluate the measurement agreement between two trackers, T1 and T2. Acquisitions performed with T1 and T2 were identified as \({P}_{ji,a}^{(k)}\) and \({Q}_{ji,a}^{(k)}\), respectively (following the convention of Eq. 1).

Before performing the BA analysis (Fig. 13), we verified that the difference vector \({D}_{ji,a}^{(k)}\) (Eq. 8) was normally distributed by inspecting histogram plots and skewness–kurtosis values.

$${D}_{ji,a }^{(k)}= {P}_{ji,a}^{(k)} - {Q}_{ji,a}^{(k)}$$
(8)

Moreover, to investigate whether the T1–T2 difference depended on the trackers' position on the grid, a linear regression was performed on the BA data for the X- and Y-axis. The resulting line and its slope are displayed on the BA plots.
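The parametric BA steps described above (Eq. 8, normality screening and the regression check) can be sketched as follows. The limits of agreement use the conventional bias ± 1.96 SD, which the text does not state explicitly, so treat that as an assumption; regressing the differences on the pair means is the standard way to look for a position-dependent bias.

```python
import numpy as np
from scipy.stats import kurtosis, skew


def bland_altman(p, q):
    """Parametric Bland-Altman for paired samples p (T1) and q (T2) on one axis."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    diff = p - q                      # Eq. 8
    mean = (p + q) / 2.0              # BA abscissa, here a position along the axis
    bias = diff.mean()
    sd = diff.std(ddof=1)
    loa = (bias - 1.96 * sd, bias + 1.96 * sd)   # conventional limits (assumed)
    # Regression of the differences on the means: a non-zero slope would
    # indicate that the T1-T2 difference depends on position on the grid.
    slope, intercept = np.polyfit(mean, diff, 1)
    # Normality screening: skewness and excess kurtosis are both 0 for a normal.
    return {"bias": bias, "loa": loa, "slope": slope, "intercept": intercept,
            "skewness": skew(diff), "excess_kurtosis": kurtosis(diff)}
```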

The RTO test evaluated the behaviour of the system before and after the tracker was covered, by quantifying the difference between the grid point acquisitions made before and after the total occlusion.

Pre- and post-coverage data were stored in two matrices, \({P1}_{ji,a}^{(k)}\) and \({P2}_{ji,a}^{(k)}\), respectively, defined as in Eq. 1. The difference matrices \({D12}_{ji,a}^{(k)}\) were then defined as in Eq. 9.

$${D12}_{ji,a }^{(k)}={P1}_{ji,a}^{(k)} - {P2}_{ji,a}^{(k)}$$
(9)

For the partial occlusion robustness test, the tracker position was recorded five times at each grid point, under the different conditions described in Fig. 4. These five recordings were collected in the matrices \(P1, P2, P3, P4, P5\) and compared pairwise, in cyclic order, through the difference matrices \(D12, D23, D34, D45, D51\) defined in Eq. 10.

$$\begin{gathered} D12_{ji,a }^{\left( k \right)} = P1_{ji,a}^{\left( k \right)} - P2_{ji,a}^{\left( k \right)} \hfill \\ D23_{ji,a }^{\left( k \right)} = P2_{ji,a}^{\left( k \right)} - P3_{ji,a}^{\left( k \right)} \hfill \\ D34_{ji,a }^{\left( k \right)} = P3_{ji,a}^{\left( k \right)} - P4_{ji,a}^{\left( k \right)} \hfill \\ D45_{ji,a }^{\left( k \right)} = P4_{ji,a}^{\left( k \right)} - P5_{ji,a}^{\left( k \right)} \hfill \\ D51_{ji,a }^{\left( k \right)} = P5_{ji,a}^{\left( k \right)} - P1_{ji,a}^{\left( k \right)} \hfill \\ \end{gathered}$$
(10)
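The cyclic pairing of Eq. 10, in which each recording is compared with the next and the fifth wraps around to the first, can be written compactly. The function name is ours, for illustration only.

```python
import numpy as np


def cyclic_differences(recordings):
    """Eq. 10: D12, D23, D34, D45, D51 from the five RPO recordings P1..P5."""
    n = len(recordings)
    return {f"D{m + 1}{(m + 1) % n + 1}": recordings[m] - recordings[(m + 1) % n]
            for m in range(n)}
```

The modulo index is what closes the cycle: for the last recording, `(4 + 1) % 5` points back to P1, giving D51 = P5 − P1.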

Since the difference matrices of both the RTO and RPO tests (Eqs. 9 and 10) were not normally distributed, the nonparametric BA (NPBA) analysis (Bland and Altman 2010) was applied. For the NPBA, the upper and lower limits of agreement (ULoA and LLoA) were identified by the 97.5th and 2.5th percentiles, respectively (Bland and Altman 1999), each characterised by a 95% confidence interval (CI) calculated with a percentile bootstrap method based on 10,000 resamples (Davison and Hinkley 1997). The percentile method was chosen for its conservative nature, as it tends to produce wider CIs that are less sensitive to the population value and sample size (Jung et al. 2019). The ULoA and LLoA, together with their CIs, define the limits of agreement (LoA) range. In all BA and NPBA plots, the circles lying above the lower end of the ULoA's CI and/or below the upper end of the LLoA's CI are labelled with their grid point numbers. This helped us identify which grid points contributed most to wider LoA ranges.
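A sketch of the NPBA procedure and of the point-labelling rule described above follows. It uses NumPy's default linear-interpolation percentiles, which may differ slightly from MATLAB's `prctile`; `flag_points` and its 1-based numbering are illustrative choices, and the per-point mean differences it expects are the circles shown in the plots.

```python
import numpy as np


def npba(diff, n_boot=10_000, seed=0):
    """Nonparametric Bland-Altman: median bias, LLoA/ULoA as the 2.5th/97.5th
    percentiles, each with a 95% percentile-bootstrap CI."""
    diff = np.asarray(diff, float).ravel()
    rng = np.random.default_rng(seed)
    lloa, uloa = np.percentile(diff, [2.5, 97.5])
    boot = np.empty((n_boot, 2))
    for b in range(n_boot):
        resample = rng.choice(diff, size=diff.size, replace=True)
        boot[b] = np.percentile(resample, [2.5, 97.5])
    return {"median": np.median(diff), "lloa": lloa, "uloa": uloa,
            "lloa_ci": np.percentile(boot[:, 0], [2.5, 97.5]),
            "uloa_ci": np.percentile(boot[:, 1], [2.5, 97.5])}


def flag_points(point_means, res):
    """1-based numbers of grid points whose mean difference lies above the lower
    end of the ULoA CI and/or below the upper end of the LLoA CI."""
    pm = np.asarray(point_means, float)
    mask = (pm > res["uloa_ci"][0]) | (pm < res["lloa_ci"][1])
    return np.flatnonzero(mask) + 1
```

The bootstrap loop is kept explicit to bound memory use: drawing all 10,000 resamples at once for a few thousand differences would need a very large index array.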

2.4.4 Accuracy computation

The positional accuracy over the XY plane was computed as defined in Eq. 11:

$$Acc = \mathrm{mean}(\mathrm{abs}(\widehat{d} - D))$$
(11)

where \({\widehat{d}}_{i}\) is the geometric distance between the average measurement at the origin, \(({\widehat{x}}_{1},{\widehat{y}}_{1})\), and the average positional measurement \(({\widehat{x}}_{i},{\widehat{y}}_{i})\) at the i-th grid point, computed over all grid points as defined in Eq. 12.

$${\widehat{d}}_{i}=\sqrt{{({\widehat{x}}_{1}-{\widehat{x}}_{i})}^{2}+{({\widehat{y}}_{1}-{\widehat{y}}_{i})}^{2}}$$
(12)

Likewise, D is the geometric distance vector obtained from Eq. 12 by considering the distance between the ideal origin, i.e. (0,0), and all the ideal grid points positions on the plane.

Then, to evaluate whether the accuracy varies with position on the grid, we repeated the calculation separately for the 8 CG points (2 to 9) and the 8 LG points (11, 13, 15, 17, 19, 21, 23, 25) defined above.

$${Acc}_{z} = \mathrm{mean}(\mathrm{abs}(\widehat{z} - Z))$$
(13)

For the Z-axis, we computed the accuracy as the mean absolute difference between the ideal plane points (\({z}_{i}=0\)) and the average positional measurements \({\widehat{z}}_{i}\) (Eq. 13). As for the XY plane, the Z-accuracy was computed over all grid points and separately for the selected 8 CG and 8 LG points.
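Eqs. 11 to 13 can be sketched as below. The row order is a hypothetical convention of this illustration (row p − 1 holds grid point p, so row 0 is the origin), and the SD uses `ddof=1`, which the text does not specify.

```python
import numpy as np

# HYPOTHETICAL row order: rows follow the grid point numbering 1-25 of Fig. 2,
# so point p is row p - 1 and row 0 is the origin (point 1).
CG_IDX = np.arange(1, 9)        # points 2-9
LG_IDX = np.arange(10, 25, 2)   # points 11, 13, ..., 25


def xy_accuracy(measured_xy, ideal_xy, subset=None):
    """Eqs. 11-12: mean and SD of |d_hat - D| on the XY plane."""
    measured_xy = np.asarray(measured_xy, float)
    ideal_xy = np.asarray(ideal_xy, float)
    d_hat = np.linalg.norm(measured_xy - measured_xy[0], axis=1)  # Eq. 12
    D = np.linalg.norm(ideal_xy - ideal_xy[0], axis=1)            # from ideal origin (0, 0)
    err = np.abs(d_hat - D)
    if subset is not None:
        err = err[subset]
    return err.mean(), err.std(ddof=1)


def z_accuracy(measured_z, subset=None):
    """Eq. 13 with the ideal floor plane z_i = 0."""
    err = np.abs(np.asarray(measured_z, float))
    if subset is not None:
        err = err[subset]
    return err.mean(), err.std(ddof=1)
```

Because \(\widehat{d}\) is measured from the measured origin, a rigid shift or rotation of all measurements leaves the XY accuracy unchanged, whereas a scale error does not; the Z-accuracy, by contrast, is sensitive to any offset or tilt of the measured floor.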

3 Results

3.1 Accuracy test: intra- and inter-tracker analysis

Both trackers identified a tilted surface for the floor plane on which the test was conducted, as shown in Fig. 9. The two surfaces were obtained using, for each grid point, the average value of the processed data reported in Fig. 6. The surfaces identified with the two trackers are comparable. The maximum Z-variation, lower than 8 cm, was observed along the diagonal of the 2 × 2 m grid, corresponding to a slope lower than 0.025.
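One way to quantify such a tilt, offered as a sketch and not necessarily the procedure behind Fig. 9, is a least-squares plane fit to the mean Z value at each grid point; the magnitude of the fitted plane's gradient is its steepest slope. The toy tilt values below are illustrative, not measured data.

```python
import numpy as np


def fit_floor_plane(xy, z):
    """Least-squares plane z = a*x + b*y + c through per-point mean heights.

    Returns the coefficients and the steepest slope |grad z| = sqrt(a^2 + b^2).
    """
    xy, z = np.asarray(xy, float), np.asarray(z, float)
    A = np.column_stack([xy[:, 0], xy[:, 1], np.ones(z.size)])
    (a, b, c), *_ = np.linalg.lstsq(A, z, rcond=None)
    return (a, b, c), float(np.hypot(a, b))


# Toy example on the 5 x 5 grid: a floor tilted by 1 cm/m in X and 2 cm/m in Y.
ticks = np.arange(-1.0, 1.25, 0.5)
xx, yy = np.meshgrid(ticks, ticks)
xy = np.column_stack([xx.ravel(), yy.ravel()])
z = 0.01 * xy[:, 0] + 0.02 * xy[:, 1] + 0.005
coeffs, slope = fit_floor_plane(xy, z)   # slope = sqrt(0.01**2 + 0.02**2)
```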

At the scale of the grid, the average values of the processed data acquired with the two trackers at each grid point appear superimposed.

For the intra-tracker analysis, the difference distribution for each axis is reported separately in the boxplots of Fig. 10, where XT1 is the X-axis measurement of tracker T1, and so on. The boxplots show a similar trend for all acquisitions. The outliers in XT1 and XT2 are associated with the 19th and 21st grid points. The outliers in YT1 are associated with the 17th and 24th points, and those in YT2 with the 16th and 17th points. The outliers are therefore located on the boundary of the grid, more precisely at its outer corners (Fig. 2).

Then, we calculated the average deviation between the positional measurements obtained by the two trackers T1 and T2 across the positions on the XY plane. The differences on the XY plane and in the XYZ volume are reported in Table 1.

Table 1 The deviation between T1 and T2 positional measurements (mean ± SD)

Similarly, we computed the deviation between the positional measurements and the gold standard, represented by the ideal grid point coordinates, pooling the acquisitions of both trackers (Table 2).

Table 2 Deviation between positional measurements and the gold standard

3.1.1 Bland–Altman analysis

In some RTO and RPO tests, the tracker lost visibility at the LG points closest to the cameras, i.e. 12, 13, 14, 20, 21 and 22. This issue, which did not occur in all acquisitions, was probably due to the proximity of these points to the vertical visibility limit of one of the two cameras (Fig. 2). Fig. 7 displays the processed RTO dataset; the grey bands on the plots indicate the data from the points closest to the cameras. Even when tracking was not completely lost in these grey areas, the values showed visible artefacts. The same was observed for the RPO test (Fig. 8). Two BA analyses were therefore conducted for the inter-tracker evaluation of the AC test: the first with all grid points and the second excluding those closest to the limit of the camera field of view (FoV). Removing these points reduced all the LoA ranges by about 1 mm. The BA plots obtained excluding the grid points closest to the camera FoV limit are shown in Fig. 11, while those with the complete dataset are reported in "Appendix 1".

Fig. 7

Plot of the mean and standard deviations of both raw data (upper plots) and the processed data (filtered and without offset—lower plots) for the RTO test for X-, Y- and Z-axis

Fig. 8

Plot of the mean and standard deviations of both raw data (upper plots) and the processed data (filtered and without offset—lower plots) for the RPO test for X-, Y- and Z-axis

Fig. 9

Colour map of the recorded Z-axis positions on the grid, both for T1 and T2 (upper plots) and 3D representations of the colour map (lower plots). The Z-axis has a different scale than the other two axes, to better show the slope

Fig. 10

Distribution of the difference between the position measurement in each grid point and the ideal grid point. Boxplots were grouped by tracker, axis and acquisition

Fig. 11

BA plots of the AC test for T1 and T2 for the X-axis (upper plot), Y-axis (middle plot) and Z-axis (lower plot). The blue line is the linear regression of the data. The red and blue circles indicate the means of the n samples corresponding to each of the CG and LG points, respectively

3.1.2 Accuracy computation

The computed accuracy on the XY plane, considering all grid points, was 0.5 ± 0.2 cm. Separating the CG and LG points yielded accuracies of 0.4 ± 0.1 cm and 0.6 ± 0.1 cm, respectively. Consistently, the maximum deviation from the ideal position was observed at the grid points closest to the cameras. The deviation between the tracker's measurements and the ideal grid positions was due both to the intrinsic uncertainty (~1 mm) of the manual grid construction method and to the systematic error introduced when repositioning the tracker on the grid points.

For the Z-axis, the accuracy computed over all grid points was 1.7 ± 1.2 cm, while it was 1.1 ± 0.7 cm and 2.1 ± 1.4 cm for the CG and LG points, respectively.

3.2 Robustness to occlusions test

For both the RTO and RPO tests, the NPBA was conducted excluding the points close to the limit of the cameras' field of view, as described for the AC test.

  • Total occlusion: for the RTO test, the Bland–Altman plots and the values of the respective parameters (ULoA, LLoA, bias) are reported in Fig. 12. In summary, the calculated LoAs were lower than 1.5 cm for the X-axis, 1 cm for the Y-axis and 2 cm for the Z-axis.

  • Partial occlusions: for the RPO test, the NPBA plots are reported in "Appendix 2", and the parameter values (LoA, median and slope) in Table 3.

Fig. 12

NPBA plots of the robustness test before and after the total occlusion, for the X-axis (upper plot), Y-axis (middle plot) and Z-axis (lower plot). Each plot point is the mean of n samples. The red and blue circles indicate the means of the n samples corresponding to each of the CG and LG points, respectively

Table 3 NPBA analysis results for the five different conditions (D12, D23, D34, D45, D51) of the RPO test

4 Discussion

This study aimed to add to the available evidence on the performance of the SteamVR tracking system in terms of static positional accuracy and robustness in suboptimal operating conditions. Specifically, our main efforts were directed towards the systematic evaluation of the effects of partial and total occlusions in the recording volume. This issue is pivotal in motion tracking and has been highlighted as one of the main sources of measurement variability in various applications (Jiménez Bascones et al. 2019; Conconi et al. 2021). Owing to the inherent characteristics of the SteamVR tracking system, which set it apart from its competitors in consumer-grade VR applications (Ikbal et al. 2021), several studies have already explored the capabilities of this technology for research and scientific applications. In particular, most of the literature reports that systems based on SteamVR tracking provide excellent measurement performance, with accuracy values in the millimetre or even sub-millimetre range (Spitzley and Karduna 2019b; Ameler et al. 2019; van der Veen et al. 2019a,b; Jost et al. 2019; Ikbal et al. 2021). For instance, Jost and colleagues (Jost et al. 2019) validated the positional and rotational performance of this technology in tracking the controller in room-scale (i.e. lighthouses 5.6 m apart) and standing (i.e. lighthouses 2.6 m apart) configurations against an optoelectronic motion-capture system. They obtained a sub-millimetric measurement difference for robot-driven motions (0.74 ± 0.42 mm under room-scale calibration and 0.63 ± 0.27 mm for standing calibration trials), which increased up to 3.97 ± 3.37 mm for human movements. Similar evidence was found by Spitzley and Karduna (2019b), who compared the performance of SteamVR tracking with a gold-standard magnetic tracking system and obtained average errors below 0.35 mm.
Other works highlighted the limitations of the system, reporting errors ranging from centimetres up to metres for dynamic movements (Niehorster et al. 2017; Borges et al. 2018; Hemphill et al. 2020). Using an HTC Vive system as a mobile unit for room-to-room therapy, a recent work estimated the mean translational errors of the motion tracker, the controllers and the HMD as (2.43 ± 1.57) cm, (3.63 ± 1.27) cm and (2.10 ± 0.61) cm, respectively (Hemphill et al. 2020). Amid the variability of results reported in the literature, our study supports millimetre-level precision of the tracking system across the XY plane, with an average accuracy of 0.5 ± 0.2 cm. Along the Z-axis, the measurements showed a slope along the direction linking the two lighthouses, which increased the variability with respect to the XY plane, i.e. (1.7 ± 1.2) cm. More interestingly, the further analysis of the different points of the recording area indicated that its central part should be considered the ideal recording position, as shown by the higher accuracy there for both the XY plane and the Z-axis.

Consistent with our findings, Niehorster and colleagues (2017) also reported that the accuracy of the HTC Vive system decreased when identifying the HMD position along the vertical axis. They found good performance in planar measurements but a variability in the recorded height ranging from ~40 cm to ~4 cm across space, following a tilted surface. Conversely, by placing a robotic arm at the centre of the acquisition volume at a height of 0.90 m and tracking its motion with a VIVE tracker, Ameler and colleagues (2019) reported sub-millimetric accuracy along all three coordinate axes. These discrepancies could be due to the strong dependence of SteamVR tracking performance on the experimental setup and on the application in which it is used (Ikbal et al. 2021). In line with that, our results could be influenced by the vertical distance of 3 m, which places the measurement grid at the lower limit of the acquisition volume. A different setup, with the grid positioned at 1 m height, might have led to better accuracy. However, correct identification of the floor level plays a fundamental role in game development, since the floor is the origin of the SteamVR tracking system, defined during the calibration procedure.

Using two different trackers did not substantially affect the overall quality of the measurements, and the (x,y,z) trajectories across the grid followed similar trends regardless of the tracker or acquisition. This was even clearer after the initial offset removal in the pre-processing, indicating high inter-tracker and test–retest reliability. Niehorster and colleagues (2017) highlighted a random offset between different acquisitions and after brief signal losses, stating that it may continuously influence the measurement and that even a calibration procedure could fail to correct it. Our occlusion robustness tests, however, only partially support these conclusions: tracking artefacts occurred specifically at the points closest to the limits of the lighthouses' field of view. Limited visibility from the SteamVR lighthouses resulted in variability and loss of accuracy, and hence in a spatially dependent behaviour of the measurements. This was evident in the Bland–Altman plots, where most of the points falling close to the LoA correspond to the outermost grid points. Particular attention must therefore be paid when setting up the system, selecting a recording volume with the best visibility from both tracking stations. Finally, our results show the robustness of the SteamVR tracking system to short transient occlusions in the capture volume, since the nonparametric Bland–Altman analysis yielded small LoA ranges, comparable to those of the accuracy test (~2 cm).

The empirical findings of the present study should be evaluated in light of two main limitations. First, our systematic approach focussed mainly on an in-depth evaluation of tracking occlusions across the recording area, limiting the analysis to static positional acquisitions on a single plane. Further efforts should aim to fully characterise SteamVR tracking by applying the presented methodology to volumetric and dynamic measurements of position and orientation. Second, SteamVR tracking intrinsically combines the contributions of Inertial Measurement Units (IMUs) and optical sensors. During an occlusion, the IMUs can be used to estimate position and orientation. Position is estimated by double integration of the accelerometer data, a process that accumulates considerable drift (Sitole et al. 2020). Orientation is provided by the gyroscopes, and the inevitable integration drift is partially corrected for pitch and roll using the accelerometer data, whereas yaw correction is more difficult, as it must rely on the magnetometers (Stanzani et al. 2020). The drift generally depends on the IMUs used, their calibration and environmental factors, e.g. temperature (Paternain et al. 2013). Assessing the drift magnitude in all IMU-derived measurements would require a thorough analysis of the effect of occlusion duration during recording. Notably, the new version of the SteamVR tracking system (2.0) has recently been commercialised; by allowing the simultaneous connection of four lighthouses, it could perform better and potentially overcome the issues highlighted in the present work. Future studies could therefore investigate the robustness of these new technologies to occlusions.
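To give a sense of why double integration drifts, the sketch below integrates a constant accelerometer bias \(b\) twice; analytically this produces a position error of \(b t^2/2\), growing quadratically with the occlusion duration. The bias value is hypothetical, and the example models unaided integration only: SteamVR fuses IMU data with optical updates, so this is an idealised illustration rather than a model of the actual tracker.

```python
import numpy as np


def position_drift(accel_bias, duration, fs=1000.0):
    """Position error [m] from double-integrating a constant accelerometer bias
    [m/s^2] for `duration` seconds (rectangle-rule cumulative sums)."""
    dt = 1.0 / fs
    n = int(round(duration * fs))
    acc = np.full(n, float(accel_bias))
    vel = np.cumsum(acc) * dt      # first integration: velocity error
    pos = np.cumsum(vel) * dt      # second integration: position error
    return float(pos[-1])


# Hypothetical 0.01 m/s^2 bias over an occlusion as long as t_o = 5 s:
print(position_drift(0.01, 5.0))   # close to 0.5 * 0.01 * 5**2 = 0.125 m
```

Doubling the occlusion length roughly quadruples the error, which is why the duration of an occlusion matters as much as its occurrence.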

Overall, the presented evidence, together with the observed robustness of the tracking to both partial and total occlusions, supports the use of the SteamVR system for static measurements in the clinical field. The estimated error can be considered clinically irrelevant for exercises aimed at rehabilitating functional movements that involve multiple joints simultaneously, whose motor outcomes are generally measured on the scale of metres (e.g. forward flexion of the back, reaching exercises, etc.), and for which VR systems have previously been studied and deployed (Hemphill et al. 2020). The person immersed in IVR is free to move and to come into contact with trackers, which can be represented as real-world objects in the virtual environment (Maciejewski et al. 2018). These objects can be chosen according to the patient's needs and preferences, making the application tailored and more attractive. Trackers could thus act as real external foci of attention (EFA). Research has reported that EFAs are more effective than an internal focus of attention (IFA) in improving motor performance and learning (Rossettini et al. 2017; Piccoli et al. 2018). Indeed, EFAs provide higher movement efficacy and better kinematics across different types of tasks, skill levels and age groups (Wulf 2013). In line with that, performing an IVR exercise can increase patient engagement, leading to better rehabilitation outcomes (Laut et al. 2015). Examples of possible rehabilitation applications include reaching for objects located on the ground for patients with low back pain, weightlifting exercises and, certainly, the Y Balance Test (Powden et al. 2019).