3.1 Objective Results (Simulation)
A total of 400 targets were programmed and planned for the three rounds and of them 305 (~76%) were executed during the experimental sessions. Hence, none of the teams completed all possible targets. Of the 305 targets, 100 targets out of 128 (78%) were executed within round #1, 152 out of 208 (73%) within round #2 and 53 targets out of 64 (83%) within round #3. Note that round #3 was shorter than rounds #1 and #2.
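The execution rates quoted above follow directly from the planned and executed counts. As a quick check, the short sketch below recomputes them from the figures in the text; the variable and function names are illustrative only.

```python
# Planned and executed target counts per round, as reported in the text.
planned = {"round 1": 128, "round 2": 208, "round 3": 64}
executed = {"round 1": 100, "round 2": 152, "round 3": 53}


def execution_rate(done: int, total: int) -> float:
    """Return the share of planned targets that were executed, in percent."""
    return 100.0 * done / total


# Per-round rates and the overall rate across the three rounds.
rates = {r: execution_rate(executed[r], planned[r]) for r in planned}
total_planned = sum(planned.values())
total_executed = sum(executed.values())
overall = execution_rate(total_executed, total_planned)

for r, pct in rates.items():
    print(f"{r}: {executed[r]}/{planned[r]} = {pct:.0f}%")
print(f"overall: {total_executed}/{total_planned} = {overall:.0f}%")
```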
With regard to viewing angle difference (i.e., where the observer was located relative to the attacker in the ground-ground teams), the distribution across rounds #1, #2, and #3 was 100 targets of narrow angle (up to 30°), 132 targets of mid angle (30–80°), and 73 targets of wide angle (80–150°).
Objective results were analyzed in three aspects: execution (i.e., whether fire was executed or the trial was ceased by the experiment controller), response time to acquire a target (in seconds), and accuracy of execution (in meters, for acquired targets). In some of the attacker device simulations, the implementation of the augmentation on the target turned out to be problematic: there were discrepancies between the position of the augmentation and that of the target itself, which affected objective performance. Therefore, these data were excluded from the statistical analysis of execution and accuracy distance.
Execution Analysis.
The execution analysis includes rounds #1 and #2 and is detailed separately for teams with the Spike-MR (dismounted) attacker and teams with the Tank (mounted) attacker. Execution was defined as a binary variable (1 for fire, 0 for ceasefire).
Dismounted attacker.
Table 1 and Fig. 1 detail the number of executions that ended with either fire or ceasefire (i.e., time ran out) for the Spike-MR as attacker and the different observer types. Different patterns of performance can be seen for the different team combinations. Fire was executed in 73% of the cases on average. A logistic regression within the framework of the generalized linear model (GLM) was chosen for the analysis. The model yielded a marginally significant effect for communication ability (p < .08) only for the Spike-MR + UAS configuration. No other effects were significant. Hence, for the Spike-MR + UAS team there was a trend toward fewer ceasefires with coordinates (the baseline condition) relative to both still image types. In contrast, in the ground-ground teams (Spike-MR + Coral and Spike-MR + tablet), there seemed to be more fire executions with still images (either with or without markings) than with coordinates or augmentation of location, but these differences were not significant.
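To illustrate what a logistic regression on the binary execution variable estimates, the sketch below uses the fact that, with a single categorical predictor, the maximum-likelihood coefficients equal differences in log-odds of fire relative to the baseline condition. The counts are hypothetical and are not the values of Table 1; the sketch also ignores any other terms the actual model may have included.

```python
import math

# HYPOTHETICAL (fire, ceasefire) counts per communication ability.
# These are illustrative only and are NOT the values reported in Table 1.
counts = {
    "coordinates": (9, 1),
    "still images": (6, 4),
    "still images with markings": (6, 4),
    "augmentation": (7, 3),
}


def log_odds(fire: int, cease: int) -> float:
    """Log-odds of fire versus ceasefire for one condition."""
    return math.log(fire / cease)


def log_likelihood(fire: int, cease: int, p: float) -> float:
    """Bernoulli log-likelihood of the counts given fire probability p."""
    return fire * math.log(p) + cease * math.log(1.0 - p)


# Coordinates is the baseline: its log-odds is the model intercept, and
# every other coefficient is a log odds ratio relative to it.
baseline = "coordinates"
intercept = log_odds(*counts[baseline])
coefs = {k: log_odds(*v) - intercept for k, v in counts.items() if k != baseline}

# Likelihood-ratio statistic: model with communication ability versus an
# intercept-only model (one pooled fire probability).
ll_full = sum(log_likelihood(f, c, f / (f + c)) for f, c in counts.values())
total_fire = sum(f for f, _ in counts.values())
total_cease = sum(c for _, c in counts.values())
ll_null = log_likelihood(total_fire, total_cease, total_fire / (total_fire + total_cease))
lr_stat = 2.0 * (ll_full - ll_null)

for name, b in coefs.items():
    print(f"{name}: log odds ratio vs. {baseline} = {b:.2f}")
print(f"likelihood-ratio statistic = {lr_stat:.2f} (df = {len(counts) - 1})")
```

A negative coefficient means lower odds of fire (i.e., more ceasefires) than in the baseline condition. The likelihood-ratio statistic would be compared against a chi-square distribution with the stated degrees of freedom.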
Table 1. No. of executions (fire and ceasefire) for teams with the Spike-MR as attacker, by observer type and communication means.
Mounted attacker. Table 2 and Fig. 2 detail the number of executions that ended with either fire or ceasefire (because time ran out) for the teams with the Tank as attacker and the different observers. As can be seen, more events ended with fire when the Tank was the attacker (92% on average). The logistic regression analysis yielded no significant main effect for communication ability.
Table 2. No. of executions (fire and ceasefire) for teams with the Tank as attacker, by observer type and communication means.
Free Round analysis.
The last round (#3) enabled participants to choose any of the four communication means (coordinates, still images, still images with markings, and augmentation on reality). Here, fire was executed in 43 out of 53 cases (81%), and the other 19% of the cases ended with no fire because time had run out. The results clearly show a preference for the still images communication ability, with 33 fires (77%) using still images with markings and 9 (21%) using still images without markings. The ‘augmentation on reality’ communication was used only once (but recall the earlier comment about its implementation: this type of augmentation is very sensitive to the accuracy of its implementation).
Response time analysis.
The time to acquire a target was measured as the time from the beginning of the trial until execution. According to Parmet et al. (2014), the way the most common response time analysis methods treat cases of no response (i.e., missing data) is inaccurate; they therefore suggested an alternative technique, survival analysis, which leads to more reliable and robust conclusions. Survival analysis is a branch of statistics that deals with death in biological organisms and failure in mechanical systems. Generally, survival analysis involves the modeling of time-to-event data; in our case, the event is the participant’s response (or no response) to a target. Survival analyses are statistical methods and procedures that accommodate censored data, that is, procedures that treat the information gained from uncensored and censored observations differently.
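To make the role of censoring concrete, the sketch below computes a Kaplan-Meier estimate of the probability that a target has not yet been executed by time t. It is shown only as an illustration of how survival methods use censored trials; it is not the model fitted in this study, it assumes that trials without fire are treated as censored at the time they ended, and the trial times are hypothetical.

```python
from collections import Counter


def kaplan_meier(times, observed):
    """Return [(t, S(t))] at each distinct event time.

    times:    time of fire, or time at which the trial ended without fire
    observed: True if fire was executed, False if the trial is censored
    """
    events = Counter(t for t, o in zip(times, observed) if o)
    survival = 1.0
    curve = []
    for t in sorted(events):
        # Trials still running just before t, including those censored at t.
        at_risk = sum(1 for u in times if u >= t)
        survival *= 1.0 - events[t] / at_risk
        curve.append((t, survival))
    return curve


# HYPOTHETICAL trial times in seconds; False marks a trial with no fire.
times = [20, 35, 35, 50, 60, 60]
observed = [True, True, False, True, False, False]

curve = kaplan_meier(times, observed)
for t, s in curve:
    print(f"t = {t:>3} s: P(not yet executed) = {s:.3f}")
```

Censored trials never count as events, but they remain in the risk set until the time they ended, so they still contribute the information that no response occurred up to that point.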
We fitted the Cox proportional-hazards regression model (Cox, 1972), which is the most common tool for studying the dependency of survival time on predictor variables. The initial model included the communication ability (Coordinates, Still images, Still images with annotations, and augmentation on reality), the viewing angle, and the interaction between the two. The interaction was not statistically significant in any of the models and was therefore removed from the analysis. The main effect for communication ability was statistically significant for the Spike-MR + UAS team and the Tank + Coral team (see Fig. 3).
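For reference, the main-effects form of the Cox model can be written as below. The coding is an assumption made for illustration: c_j are dummy indicators for the non-baseline communication abilities (with coordinates as baseline), and a is the viewing angle entered as a single covariate; the source does not state how either predictor was coded.

```latex
h(t \mid c, a) = h_0(t)\,\exp\!\Big(\sum_{j} \beta_j c_j + \gamma a\Big),
\qquad
\mathrm{HR}_j = \frac{h(t \mid c_j = 1, a)}{h(t \mid c = 0, a)} = e^{\beta_j}
```

Here h_0(t) is the unspecified baseline hazard, and the hazard is the instantaneous rate of executing fire at time t among trials still running. A hazard ratio above 1 for a communication ability therefore corresponds to faster target acquisition than with the baseline.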
Accuracy analysis.
The accuracy of target acquisition was measured by the distance between the target and the impact point. A logarithmic scale of the distance was used to display the data (Fig. 4).
A GLM analysis was run on the log of the distance from the acquired target (normally distributed), with communication ability (Coordinates, Still images, Still images with annotations) as the predictor variable. Main effects for communication ability were found for the Spike-MR & Coral team configuration (F(3,37) = 2.25, p < .097) and the Spike-MR & Tablet team configuration (F(3,17) = 5.22, p < .0097). In both configurations, still images only and still images with annotations yielded shorter distances from the target than coordinates.
With regard to the viewing angle, it was not well balanced across communication ability conditions within configurations, as can be seen in Fig. 5. Nevertheless, there is a trend showing that the accuracy distance in trials with still images was less sensitive to viewing angle than with the other communication means. This finding needs to be replicated in future studies before any clear conclusion can be drawn.
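As a simplified illustration of this kind of test, the sketch below computes a one-way F statistic on log-transformed distances. It is not the authors' exact model: the distances are hypothetical, the natural logarithm is an assumption (the base is not stated in the text), and only communication ability is included as a factor.

```python
import math


def one_way_f(groups):
    """Return (F, df_between, df_within) for a one-way layout."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# HYPOTHETICAL miss distances in meters per communication ability.
distances_m = {
    "coordinates": [40.0, 65.0, 25.0, 90.0],
    "still images": [8.0, 15.0, 12.0, 20.0],
    "still images with annotations": [6.0, 18.0, 10.0, 14.0],
}

# Log-transform before testing, as the analysis is run on log distance.
log_groups = [[math.log(d) for d in ds] for ds in distances_m.values()]
f_stat, df_b, df_w = one_way_f(log_groups)
print(f"F({df_b},{df_w}) = {f_stat:.2f}")
```

The log transform compresses the long right tail that distance data typically show, which is what makes the normality assumption of the GLM more plausible.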
3.2 Subjective Results
SUS (usability evaluation).
After completing the entire experimental session and using the system in all four possible modes of communication, participants rated the usability of the system. They were asked to record their immediate response to each of the 15 items on a 5-point Likert scale. SUS yields a single number representing a composite measure of the overall usability of the system. SUS scores ranged from 55 to 100 (out of 100). Participants’ (both attackers and observers) average score was 88 (SD = 9). Hence, overall, participants were satisfied with the communication user interface.
DSSQ (stress evaluation).
This questionnaire is concerned with participants’ feelings and thoughts while performing the task (on a 0–4 scale). The DSSQ measures three aspects of subjective stress: task engagement (related to task interest and focus: energetic arousal, motivation, and concentration), distress (integrates unpleasant mood and tension with lack of confidence and perceived control), and worry (composed of self-focused attention, self-esteem, and cognitive interference). The DSSQ was collected after each experimental round (i.e., 4 times); hence, the percent change over the course of the experimental day could be calculated. The overall averages were 24 (range 9–28, SD = 4), 7 (range 0–17, SD = 4), and 4 (range 0–16, SD = 4) for task engagement, distress, and worry, respectively. The DSSQ scores after each round are detailed in Table 3. Note that the maximum ‘engagement’ score that can be achieved in the DSSQ is 28. Therefore, it seems that, on average, participants were highly motivated and engaged in the task. The highest possible scores for ‘distress’ and ‘worry’ are 28 and 24, respectively. This suggests that participants were fairly relaxed and became progressively less worried as the tasks progressed.
Table 3. Average and SD results for the DSSQ by round (highest possible scores are 28, 28, and 24, respectively)
Quality of Communication Questionnaire.
In view of the problematic implementation, attributable to discrepancies in the position of the augmentation and the targets, the following analysis of communication quality excludes the ‘Augmentation of target location on reality’ mode. Two different versions of a subjective assessment of the communication quality were used; one for the attackers and one for the observers. The questions were divided into three groups – cooperation items (4 items in both the attackers’ and the observers’ versions, three of them were identical), coordination items (4 items for both attackers and observers, three of them were identical), and performance items (4 items in the attackers’ version and 2 items in the observers’ version).
The three shared ‘cooperation’ items were aimed at evaluating teamwork and interaction (i.e., “Team cooperation was good”, “We used the same jargon”, “A unique common language was created between us”). Figure 6 presents the different patterns of evaluation among the different teams and communication means.
The three common coordination items (i.e., “It was necessary to use verbal communication to acquire the target”, “We worked in specific sequential order”, and “The verbal communication was based on still images”; the last item was relevant only for the still images communication means) were aimed at evaluating the advantages (or disadvantages) of the different communication means and the teams’ working techniques. Figure 7 presents the different patterns of evaluation among the different teams and communication means.
Participants scored low (Average = 1.2, STD = 0.9) on the fourth coordination item (attacker version: “I was overloaded and couldn’t use all still images I received”; observer version: “I felt that the attacker did not use the still images I sent but trying to “figure out” by himself”), indicating that they favored the still images communication. The average of all 6 common items, across all teams and across the coordinates and still images (with and without markings) communication means, was 4.1 (on a 5-point Likert scale), which means that the participants were highly coordinated and managed to create a unique form of communication.
As for the self-performance evaluation, the analysis was done for the observers and the attackers separately. The performance items in the observer version were: “I was able to understand the attacker point of view” and “I was able to instruct the attacker based on his point of view”. The results show that the observers preferred the still images (either with or without markings) compared to coordinates (See Fig. 8 top). The average of all observers across all performance items was 3.6 (STD = 0.8). The performance items of the attackers which were included in the analysis (i.e., “It was difficult for me to acquire the target’s surrounding”, “It was difficult for me to acquire the target itself”, and “I felt confident with the target acquisition”) had an average of 3 (STD = 0.4) across all attackers. See Fig. 8 bottom for the detailed results. The attackers’ scores on the fourth performance item - “I felt confident attacking the target based on pictures”, which was relevant only for the still images (with and without markings) communication means, were high (Average = 4.3, STD = 1) with no differences between the still images only and the still images with markings.
3.3 Verbal Communication
The verbal communication channel between the attackers and the observers was available throughout the experiment. Verbal communication was measured by the total number of ‘ping-pong’ transmissions between the attackers and the observers and by the percentage of time verbal communication was in use. A total of 5702 ‘ping-pongs’ took place: 56% in teams with the Spike-MR attacker and 44% in teams with the Tank. By observer, 39% occurred with the Coral, 37% with the tablet, and 24% with the UAS. See Figs. 9 and 10 for the detailed results and the different patterns among the various teams and communication means.
The controllers were asked to rate their impression of the necessity of verbal communication between the observers and the attackers for target acquisition. The results are shown in Fig. 11.