For each of the primary tasks (stationary, stepping, and walking), this discussion addresses how the average skeleton and joint distances change with different Kinect positions. It is worth noting that Wei et al. [5] only studied the stationary and stepping tasks, with Kinects placed nearly parallel and 45\(^{\circ }\) apart. We compare our results with those of Wei et al. where appropriate.
Stationary
The stationary task shows the best results when the Kinects are parallel to each other and the worst when they are furthest apart (90\(^{\circ }\)). All measures of distances follow the same trend, from \({\Delta } x\) to \({\Delta } z\) and \({\Delta } d\) (Table 1). Skeleton distances in the stationary task increase with increasing angle between the Kinects.
The \({\Delta } y\) values are the smallest both when the Kinects are parallel to each other and when they are 90\(^{\circ }\) apart. The \({\Delta } y\) value in the stepping task is only slightly higher than its \({\Delta } x\) and \({\Delta } z\) values (0.21 cm and 0.42 cm higher, respectively). In general, the skeleton transformation introduces the smallest errors along the y-axis, because the skeletons are rotated around the y-axis and such a rotation leaves y-coordinates unchanged. Furthermore, the heights of both Kinects in the evaluation were fixed, and the participants did not move along the y-axis.
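The effect of rotating a skeleton around the y-axis can be illustrated with a minimal sketch. It assumes a pure rotation by a hypothetical angle `theta_deg`; the actual Out of Sight transformation may include translation or calibration steps that are not shown here, and the function name `rotate_about_y` and the joint coordinates are illustrative only.

```python
import math


def rotate_about_y(point, theta_deg):
    """Rotate a 3D point (x, y, z) about the y-axis by theta_deg degrees.

    Assumption: a pure rotation, with no translation between sensors.
    """
    x, y, z = point
    t = math.radians(theta_deg)
    x_new = x * math.cos(t) + z * math.sin(t)
    z_new = -x * math.sin(t) + z * math.cos(t)
    # The y-coordinate passes through untouched, so any rotation error
    # can only appear in the x and z components.
    return (x_new, y, z_new)


joint = (0.30, 1.20, 2.00)  # hypothetical joint position, in metres
rotated = rotate_about_y(joint, 90.0)
```

Under this assumption, an error in the estimated angle between the Kinects perturbs only the x and z coordinates, which is consistent with \({\Delta } y\) being the smallest component.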
Wei et al. [5] reported lower values than those found in the current work. In their stationary task (average difference before movement) with parallel (\(4.25^{\circ }\) apart) Kinects, the skeleton distances in \({\Delta } x\), \({\Delta } y\), and \({\Delta } z\) were 0.00, 1.00, and 2.00 cm, respectively. They did not report \({\Delta } d\) values. Applying Pythagoras' theorem to these components gives a corresponding \({\Delta } d\) of 2.24 cm, which is also lower than our 3.52 cm (Table 1). In the same task with Kinects \(45^{\circ }\) (\(44.37^{\circ }\)) apart, the skeleton distances in \({\Delta } x\), \({\Delta } y\), and \({\Delta } z\) were 1.00, 1.00, and 1.50 cm, respectively. The calculated \({\Delta } d\) was 2.06 cm, which is also lower than the 6.95 cm reported here. The differences could be accounted for by the larger participant pool found in more realistic environments.
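The \({\Delta } d\) values derived above from the components reported by Wei et al. [5] can be reproduced with a short calculation. The sketch below assumes that \({\Delta } d\) is the Euclidean norm of the three component distances; the function name `combined_distance` is ours.

```python
import math


def combined_distance(dx, dy, dz):
    """Euclidean distance (cm) from per-axis distances (cm)."""
    return math.sqrt(dx ** 2 + dy ** 2 + dz ** 2)


# Stationary-task components reported by Wei et al. [5], in cm.
parallel = combined_distance(0.00, 1.00, 2.00)
apart_45 = combined_distance(1.00, 1.00, 1.50)

print(f"{parallel:.2f} {apart_45:.2f}")  # prints "2.24 2.06"
```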
Stepping
Overall, the skeleton distances in the stepping task are higher than those in the stationary task for every Kinect position tested (see the comparison of averages in Fig. 8a). The increase in skeleton distances is expected, because the task requires the participants to take steps both toward and away from the Kinect sensor, which causes the tracking system to produce larger differences between the skeletons after transformation. As in the stationary task, the stepping task shows the best results when the Kinects are parallel to each other and the worst when they are 90\(^{\circ }\) apart. All measures of distances follow the same trend, from \({\Delta } x\) to \({\Delta } z\) and \({\Delta } d\) (Table 1). For all Kinect positions, the \({\Delta } x\) values are the highest, followed by \({\Delta } z\) and \({\Delta } y\). The skeleton distances also increase with increasing angle between Kinects. This shows that the tracking accuracy of Out of Sight is affected both by increasing angles between Kinects and by increasingly complex human activities.
Wei et al. [5] also reported lower values. In their stepping task (average difference after movement) with parallel (\(4.25^{\circ }\) apart) Kinects, the skeleton distances in \({\Delta } x\), \({\Delta } y\), and \({\Delta } z\) were 2.00, 1.28, and 3.78 cm, respectively. The calculated \({\Delta } d\) was 4.46 cm, which is lower than the 6.87 cm (Table 1) found in the current study. In the same task with Kinects \(45^{\circ }\) (\(44.37^{\circ }\)) apart, the skeleton distances in \({\Delta } x\), \({\Delta } y\), and \({\Delta } z\) were 4.28, 1.64, and 5.28 cm, respectively. The calculated \({\Delta } d\) was 6.99 cm, which is lower than the 12.80 cm found in the current work but in accordance with the differences reported.
Walking
The skeleton distances in the walking task are higher than those in the stationary and stepping tasks for every Kinect configuration (see Table 1 and the averages in Fig. 8a). Since walking involves larger movements than stepping or standing still, a higher error in the walking task is expected. As in the other two tasks, the skeleton transformation works best with parallel Kinects, and the average skeleton joint distance from different fields of view increases with larger angles (Table 1). Likewise, when the Kinects are \(45^{\circ }\) and \(90^{\circ }\) apart, the \({\Delta } x\) values are still the highest, followed by \({\Delta } z\) and \({\Delta } y\).
The average and standard deviation of \({\Delta } y\) are almost invariant to changes from the stationary to the walking task (Fig. 7a and Fig. 8a, b). The standard deviation of \({\Delta } y\) is the lowest compared to that of \({\Delta } x\) or \({\Delta } z\) in all the tasks discussed so far (stationary, stepping, and walking), with all different Kinect positions (parallel, \(45^{\circ }\), and \(90^{\circ }\) apart Kinects), except in the stationary task with \(45^{\circ }\) and \(90^{\circ }\) apart Kinects. The average skeleton distance over all tasks and Kinect positions is smallest in the \({\Delta } y\) component (4.11 cm, s.d. = 1.36 cm), compared to both \({\Delta } x\) (10.04 cm, s.d. = 4.12 cm) and \({\Delta } z\) (9.01 cm, s.d. = 3.35 cm). This finding supports the aforementioned argument that \({\Delta } y\) is steady throughout the tracking process, regardless of tasks and Kinect positions.
Wei et al. [5] did not run their experiments with a walking task as described in the current work, and there is no other similar work in the literature. These results therefore provide new accuracy measurements for multi-Kinect tracking systems in a more realistic scenario.
Scenario and position comparison
The stationary, stepping, and walking tasks can be ordered on a spectrum of complexity, where the stationary task requires no movement and the walking task requires continuous movement. The evaluation so far shows that skeleton distances increase with increasing task complexity (Fig. 8a). The correlation can be attributed to increasing joint movements and turning of the shoulders. There is little variation in the accuracy of the technique between the distance dimensions across different joints, as shown in Fig. 7b. Therefore, skeleton transformation can be applied to all joints with the same confidence of joint positioning.
When testing the correlation of the angle to distance accuracy, a high correlation for \({\Delta } d\) of 0.985 shows that the larger the angle, the larger the distance between estimated skeletons, which is also visible in Fig. 8b. The angle between Kinects is related to the degree of rotation used in the transformation of multiple skeletons. A larger angle between the Kinects means that the skeletons will be rotated more, hence producing larger coordinate differences.
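A correlation of this kind can be computed as sketched below. The text does not state which coefficient was used, so the sketch assumes Pearson's correlation coefficient. The distance values are placeholders chosen only to make the example runnable; they are not measurements from this study, and the name `pearson_r` is ours.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


angles_deg = [0.0, 45.0, 90.0]       # Kinect angles tested
placeholder_dd_cm = [1.0, 2.0, 3.0]  # PLACEHOLDER values, not measured data
r = pearson_r(angles_deg, placeholder_dd_cm)
```

A value of `r` close to 1 indicates that the distance grows almost linearly with the angle, which is how the reported coefficient of 0.985 should be read.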
When varying only either the task complexity or the angle between the Kinects, the results show similar trends (Fig. 8a, b). In short, the distance between two computed skeleton joints increases with either a more complex task or a larger angle between multiple Kinects. The average distance \({\Delta } d\) is smallest in the stationary task with parallel Kinects (3.52 cm), and it is largest in the walking task with 90\(^{\circ }\) apart Kinects (32.38 cm). The overall average across all cases of task complexity and Kinect placement is 16.08 cm (s.d. = 5.84 cm). We believe interactive systems can make use of our Out of Sight tracking infrastructure within this error. This important finding shows the limits of how close people can be and still be distinct from one another when using this technique with multiple Kinects, both to extend coverage and to overcome occlusion.
The least accurate configuration was the one with the Kinects \(90^{\circ }\) apart, where the average over all scenarios was a \({\Delta } d\) of 27.76 cm (Fig. 8b). This boundary is still within personal space, that is, the space that only one person is most likely to occupy, where close personal space can be defined as within 45 cm of the person; for a discussion of personal space, see [12]. The results therefore show preliminary success in tracking people using transformed 3D skeleton joint positions.
Tracking behind an obstacle
The obstacle task demonstrates that the tracking system can acquire joint coordinates for the same person from multiple Kinects, as completely as possible, when the person is occluded in one of the fields of view. Specifically, Out of Sight constructs an average skeleton from the skeletons detected in all available fields of view. This has implications in scenarios where only one of multiple depth-sensing cameras has a clear view of the target. A use case would be a two-player interactive game, where the players are provided with feedback based on the other player’s position behind an obstacle, such as a wall. Another example would be a group of robots collectively searching for a person with particular appearance features in a large, occluded environment. The current system shows that this approach can reconstruct a person’s average skeleton both while they are occluded and when they reappear from occlusion.
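The idea of averaging over the available fields of view can be sketched as follows. This is a minimal illustration, not the Out of Sight implementation: it assumes the skeletons have already been transformed into a common coordinate frame, that each skeleton is a dictionary from joint name to an `(x, y, z)` tuple, and that a joint hidden in one view is marked `None`. The function name, joint names, and coordinates are all hypothetical.

```python
def average_skeleton(skeletons):
    """Average each joint over the views in which it was detected.

    skeletons: list of dicts mapping joint name -> (x, y, z), or None
    when the joint is occluded in that view.
    """
    joint_names = set()
    for skeleton in skeletons:
        joint_names.update(skeleton)

    averaged = {}
    for name in sorted(joint_names):
        # Keep only the views that actually see this joint.
        seen = [s[name] for s in skeletons if s.get(name) is not None]
        if seen:
            averaged[name] = tuple(sum(axis) / len(seen) for axis in zip(*seen))
    return averaged


# Hypothetical joints (metres); the left hand is occluded in view B.
view_a = {"head": (0.0, 1.6, 2.0), "hand_left": (-0.4, 1.0, 2.0)}
view_b = {"head": (0.2, 1.6, 2.2), "hand_left": None}
merged = average_skeleton([view_a, view_b])
```

In this sketch a joint seen by both sensors is averaged, while a joint seen by only one sensor is taken from that sensor alone, so the merged skeleton stays complete as long as at least one view sees each joint.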
Tracking during occlusion
The Out of Sight RESTful API was validated with a toolkit usage scenario in which two users stood side by side and then one user obstructed the other, and vice versa. The API provided the matched skeletons in both scenarios when the participants were visible to both Kinects, and it also worked in 85% of the binary tests where one person obstructed the other person in one field of view. An example of the toolkit is shown in Fig. 3. This shows that the API can be used to track people behind obstructing objects, allowing future integration of an occlusion-free skeleton stream into custom applications.