1 Introduction

Collaboration can be defined as interdependent activities between co-located or remote individuals, conducted to achieve shared objectives (Thomas 1996; Kim et al. 2018a, b, 2020; Gervasi et al. 2020).

Many application areas take advantage of collaborative activities, including the medical, educational, and industrial domains, among others (Johnson et al. 2015; Schneider et al. 2017).

Collaboration scenarios have evolved from simpler co-located settings to more complex remote use-cases, encompassing several team members with multidisciplinary backgrounds.

Therefore, the methods required to address collaborative activities have been growing in complexity, scale, and interdisciplinarity, entailing not only the mastery of multiple domains of knowledge, but also a robust degree of expertise in each of them (Arias et al. 2000; Schneider et al. 2017; Marques et al. 2021f).

Remote collaboration implies that on-site collaborators must be able to communicate and interact with remote collaborators in different locations, both effectively and efficiently, to achieve the best task performance (Kim et al. 2020).

In this process, team members must come together in a joint effort to integrate their different activities in a seamless manner.

Technological support for remote collaboration has been addressed by Computer-Supported Cooperative Work (CSCW), focusing on conceptualizing, designing, and prototyping solutions for communication, cooperation, assistance, training, learning, as well as knowledge sharing among distributed collaborators (Grudin and Poltrock 2013; Billinghurst et al. 2015; Jalo et al. 2018; Ens et al. 2019).

One major issue of remote collaboration is that collaborators do not share a common space/world. Providing more natural interaction and spatial connection is therefore of particular interest in computer-mediated collaboration, bringing collaboration between distributed team-members closer to face-to-face collaboration (Billinghurst et al. 1998; Rhee et al. 2020). In this vein, video solutions (considered the most traditional approach), as well as novel paradigms based on Virtual and Augmented Reality (VR & AR) technologies, have been explored (Billinghurst et al. 2015; Jalo et al. 2018; Kim et al. 2018c; Ens et al. 2019; Alves et al. 2021; Gonçalves et al. 2021; Wang et al. 2021; Marques et al. 2022b).

In different ways, all of these empower team members in need of additional know-how from collaborators unavailable on-site, i.e., they help define a common ground, support the discussion, analysis, and resolution of complex activities, facilitate the identification of issues, and make suppositions and hypotheses visible (Wang et al. 2016; Hall et al. 2018; Lee et al. 2020; Ludwig et al. 2021).

These technologies should encourage remote experts to be active and on-site collaborators to be aware of their counterpart's presence (Kim et al. 2018a).

This can be done in several ways, such as showing the expert’s messages in a shared view using visual communication cues like hand gestures, pointing, pre-defined shapes, drawing, or others.

Another common practice is to provide a representation of the remote collaborator, either through a live video/audio stream, projected imagery, realistic volumetric capture, or virtual replicas, e.g., a human avatar (Gerhard et al. 2004; Latoschik et al. 2017; Piumsomboon et al. 2018; Teo et al. 2019b; Yoon et al. 2019; Kim et al. 2020).

Furthermore, research hints that when developing novel interfaces and effective tools for such activities, it is paramount to consider, among other human factors, the sense of social presence, a vital concept that might influence the outcome of remote activities, as well as collaborators' perception, cognition, and behavior (Marques et al. 2021e; Kim et al. 2020; Osmers et al. 2021).

Social presence may be interpreted as the sense of being together and communicating with each other (Witmer and Singer 1998; Biocca et al. 2003; Gerhard et al. 2004; Oh et al. 2018; Yoon et al. 2019). It goes beyond simple information regarding the location or availability of a team member, as it entails a feeling of being 'real and present' during the interaction process, as well as following what is happening during communication (Yoon et al. 2019; Sereno et al. 2020; Marques et al. 2021e).

This is generally distinguished from the sense of presence, which, although defined in multiple ways by various authors, can be collectively referred to as a complete feeling of being there in the environment (Steuer 1992; IJsselsteijn et al. 2000; Oh et al. 2018; Rhee et al. 2020; Liberatore and Wagner 2021; Scavarelli et al. 2021).

Social presence has become a popular measure of collaborative quality, as well as of how successful a given tool is. It is also often associated with physiological, psychological, behavioral, and social effects, e.g., skills training and task performance, among others.

Researchers frequently hypothesize that higher social presence leads to better tools and better collaboration, although studies report contradictory results, i.e., higher levels of social presence translate into good collaborative quality in some studies, while showing no effect in others (Oh et al. 2018; Osmers et al. 2021).

One possible reason is that the sense of social presence is measured in an isolated way, i.e., without also considering other dimensions of the collaborative process, e.g., level of attentional allocation, information understanding, mental effort, and ease of sharing ideas and communicating, among others. In fact, some studies go as far as to consider social presence an indication of usability or user experience, thus avoiding additional measurements during evaluation studies.

This misleads the research community, given that higher social presence may not always yield improved outcomes. In some scenarios, team members must split their attention between the task, the shared information, and the counterpart representation, which may lead to fractured learning experiences (Wang et al. 2021; Marques et al. 2022a).

Hence, the question arises: is the sense of social presence (alone) enough to describe collaborative quality during remote task resolution?

Thus, the main motivation of our research is to explore different visual representations of distributed team members and how these may affect social presence, task resolution, and collaboration quality.

This is an important topic, since it can inform the design of more effective solutions moving forward.

In this paper, motivated by the challenges reported, a user study with 37 participants was conducted, comparing two distinct conditions: 1- video chat, in which the team-member representation was always visible, and 2- AR-based annotations, in which the collaborator representation was never available.

These conditions were selected following insights from a participatory design process, within the scope of a research project with partners from the industry sector (Marques et al. 2021a, 2022a, c, d).

In this context, a real-life maintenance task with four completion stages was used.

The results obtained are described and discussed based on a data analysis, showing that the majority of participants preferred the AR-based condition, despite the absence of a team-member representation, which led to a slightly lower sense of social presence, but significantly higher results for the remaining dimensions of collaboration, as well as faster task resolution.

The remainder of this paper is structured as follows.

First, we describe the experimental setup and the conditions selected, as well as the rationale behind them.

Next, we present the experimental design and the procedure of the user study.

Then, we describe the results obtained through a data analysis and critically discuss the outcomes.

Last, conclusions are drawn and further research opportunities are presented.

2 User study on a remote maintenance scenario

A user study was conducted to explore how the team-member representation affected social presence and other dimensions of the collaborative process (e.g., ease of communicating or expressing ideas, enjoyment, mental effort, information understanding and attention allocation, as well as the overall sense of collaboration), and its impact on task resolution during a remote guidance scenario.

2.1 Experimental setup

Two distinct tools were considered to establish a common ground between remote collaborators: C1 - Video tool (the expert representation was always visible) and C2 - AR tool (the expert representation was never available) (see Fig. 1).

These methods were selected following a participatory design process with partners from the industry sector (Marques et al. 2021a), in which the collaborative realities of distributed team-members and the role of an existing video tool were discussed, and limitations and constraints were identified. In particular, it was reported that collaborators are limited to passively watching video feeds, with no means of interacting with the remote physical environment: assistance is only possible through verbal cues or hand gestures in response to a visual feed, limiting the ability to reference areas of interest or specific objects in the environment. Communication can therefore become ambiguous or vague, leading to confusion and error, since video is not well suited to conveying spatial information. This forces collaborators to adopt time-consuming, complex verbal negotiations to communicate their intended directions and achieve a common goal. Furthermore, using video has proven demanding for some networks, clearly affecting how remote collaborators interact.

These disadvantages motivated the iterative introduction of a different approach, in the form of an AR tool for the daily remote activities of these distributed teams.

The goal was to provide an alternative, without video, and assess whether it could maintain the same level of robustness as the video tool, i.e., whether it could solve the same problems from a different perspective.

Next, a brief explanation of each tool's main features is provided.

To be clear, both methods used the same hardware; only the features of the tools differed.

The remote expert used a MacBook Pro 13-inch, 2017 laptop computer (2.3 GHz Dual-Core Intel Core i5, 8 GB 2133 MHz LPDDR3, Intel Iris Plus Graphics 640 1536 MB, with a 1080p camera) running macOS Monterey 12.3.1.

As for the on-site participants, they used an Android-based (version 11) Lenovo Phab 2 Pro smartphone (Qualcomm Snapdragon 652 MSM8976, 4 GB RAM, with a 16 MP rear camera and an 8 MP front camera).

In addition, the Unity 3D game engine (2018.3.0f2) was used for the development of both tools, based on C# scripts. The augmented content was placed in the real-world environment through the Vuforia library (8.0.11). A dedicated server was responsible for all communication, using WebRTC calls over Wi-Fi through a WRT 1200 AC Dual-Band Router (5 GHz).
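While the paper does not detail the tools' internal message format, the following minimal C# sketch illustrates how annotation data could be serialized with Unity's built-in JsonUtility before being relayed through such a dedicated server. The class and field names are hypothetical and do not reflect the actual implementation.

using System;
using UnityEngine;

// Hypothetical message exchanged between the expert's laptop and the
// on-site smartphone; field names are illustrative only.
[Serializable]
public class AnnotationMessage
{
    public string author;      // "expert" or "on-site"
    public string type;        // e.g., "drawing", "shape", "note"
    public Vector2 position;   // normalized (0..1) coordinates on the captured image
    public string content;     // note text or shape identifier
    public long timestampMs;   // creation time, useful for sorting annotations
}

public static class AnnotationSerializer
{
    // Unity's JsonUtility handles [Serializable] classes and produces a
    // compact payload suitable for a lightweight data channel.
    public static string ToJson(AnnotationMessage msg) => JsonUtility.ToJson(msg);

    public static AnnotationMessage FromJson(string json) =>
        JsonUtility.FromJson<AnnotationMessage>(json);
}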

2.1.1 Video chat tool

The first method shared video/audio to provide assistance during the collaborative process.

On-site team-members were able to point a handheld device at the situation context and share it with the remote expert via a live video/audio stream.

For this method, the expert representation is available at all times, while the on-site collaborator can switch between displaying their own representation or the task context by using the front or back camera of the handheld device, respectively.
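As an illustration of this front/back switching, the short C# sketch below uses Unity's WebCamTexture API to select the desired camera. It is a simplified, hypothetical helper; the actual streaming of the selected feed over WebRTC is not shown.

using UnityEngine;

// Minimal sketch of switching between the front and back cameras of the
// handheld device, used to toggle between the user's own representation
// and the task context. Hypothetical helper, not the tool's real code.
public class CameraSwitcher : MonoBehaviour
{
    private WebCamTexture activeCamera;

    public void UseCamera(bool frontFacing)
    {
        if (activeCamera != null) activeCamera.Stop();

        foreach (WebCamDevice device in WebCamTexture.devices)
        {
            if (device.isFrontFacing == frontFacing)
            {
                activeCamera = new WebCamTexture(device.name);
                activeCamera.Play();   // frames can then be shown locally and fed to the stream
                return;
            }
        }
    }
}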

In addition, text messages are available through the chat, ensuring important information remains plainly visible.

Taking advantage of these features, team-members were able to communicate and discuss the content being captured, express their main difficulties, identify areas of interest or, in the case of the remote expert, indicate where to perform a given action and how to do it.

2.1.2 Annotation tool using AR

The second method uses additional layers of information based on 2D annotations and AR. In this case, the expert representation is never available.

To capture the situation context, on-site team members must point the handheld device camera and capture an image. Then, using audio communication and annotation features such as drawing, placing pre-defined shapes or notes, and sorting annotations, team members can edit the captured image. By doing so, it is possible to express difficulties, highlight relevant areas of interest, or specify questions.

When the remote expert receives the enhanced capture, he/she is able to propose instructions using annotation features identical to those described above.

After the annotations are received by the on-site participant, he/she can point the device at the task context and perform an augmentation by re-aligning the view with the real world. When necessary, the handheld device can be placed on a surface to follow the instructions in a hands-free setting.

At any time, the device can be picked up, and an augmentation can be performed by re-aligning the instructions with the real world.

This can be repeated iteratively until the task is successfully accomplished.
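As a rough illustration of the augmentation step, the C# sketch below shows one possible way of re-projecting a received 2D annotation into the scene once the live view has been re-aligned with the capture: the normalized image coordinate is cast into the world and anchored where it hits the physical setup. This assumes the equipment is represented by a collider (e.g., a tracked plane or scanned mesh) and is not the authors' actual implementation.

using UnityEngine;

// Illustrative placement of received 2D annotations as world-space anchors.
public class AnnotationPlacer : MonoBehaviour
{
    [SerializeField] private Camera arCamera;           // device camera used for the AR view
    [SerializeField] private GameObject annotationPrefab;

    public void PlaceAnnotation(Vector2 normalizedImagePoint)
    {
        // Treat the normalized capture coordinate as a viewport coordinate,
        // valid only after the live view has been re-aligned with the capture.
        Ray ray = arCamera.ViewportPointToRay(
            new Vector3(normalizedImagePoint.x, normalizedImagePoint.y, 0f));

        if (Physics.Raycast(ray, out RaycastHit hit, 5f))
        {
            // Anchor the annotation where the ray meets the physical setup,
            // so it stays registered even if the device is put down.
            Instantiate(annotationPrefab, hit.point,
                        Quaternion.LookRotation(hit.normal));
        }
    }
}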

Fig. 1

Remote guidance supported by different tools: C1- video tool; C2- AR tool

2.2 Experimental design

A within-group design was adopted, considering the null hypothesis (H0) that, for conducting the selected maintenance tasks, both experimental conditions are equally usable and acceptable.

The independent variable was the remote expert representation during collaboration, with two levels corresponding to the experimental conditions:

C1 - Video tool and C2 - AR tool.

The dependent variables were performance measures and participants’ opinion.

As secondary variables, participants’ demographic data, and previous experience with AR, as well as collaborative tools, were considered.

2.3 Tasks

We considered a situation where an on-site collaborator using a handheld device needed to conduct maintenance procedures while being assisted by a remote team-member using a computer.

The task was a synchronous problem with 4 completion stages (Fig. 2): 1- replace interconnected components, 2- plug and unplug some energy modules, 3- remove a specific sensor, as well as 4- integrate new components into the equipment.

Different procedures were used for each condition to avoid learning effects, while ensuring a homogeneous level of difficulty and resources. These were defined based on feedback from our industry partners, taking into account their needs and work activities.

In addition, team-members needed to communicate continuously (alternately, i.e., with reciprocal interdependence) to solve the procedures.

The on-site participant was required to use various hand tools to accomplish the defined procedures, although little physical movement/effort was required.

Fig. 2

Illustration of completion stages of the maintenance task: 1- replace interconnected components; 2- plug and unplug some energy modules; 3- remove a specific temperature sensor; 4- incorporate new parts in the equipment

2.4 Measurements

For both conditions, all data was collected through the CAPTURE toolkit (Marques et al. 2021b, d), a Unity package that can be incorporated into collaboration tools.

It enables contextual data collection and analysis in remote collaboration scenarios, providing an additional perspective on selected aspects of the collaboration.

A central server was responsible for storing all data associated with the remote team-members. A visualization dashboard was used for post-evaluation analysis (Fig. 3), allowing the collaborative process to be reviewed for the selected conditions.

Based on this, the toolkit was added to both an existing video chat tool and an AR-based annotation tool, following prior work (Marques et al. 2021c, a, 2022c).
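The CAPTURE toolkit's actual API is not reproduced here; purely as an illustration of the kind of instrumentation involved, the hypothetical C# sketch below records timestamped collaboration events that could later be pushed to the central server feeding the visualization dashboard. All class, method, and field names are invented.

using System;
using System.Collections.Generic;
using UnityEngine;

// Hypothetical instrumentation illustrating the kind of contextual data a
// toolkit such as CAPTURE might gather; it does NOT reflect its real API.
[Serializable]
public class CollaborationEvent
{
    public string sessionId;   // one id per collaborative session
    public string role;        // "on-site" or "remote expert"
    public string eventType;   // e.g., "capture_taken", "annotation_sent", "task_stage_done"
    public string payload;     // optional extra context (stage number, annotation id, ...)
    public string utcTime;     // ISO-8601 timestamp for later timeline reconstruction
}

public class SessionLogger : MonoBehaviour
{
    private readonly List<CollaborationEvent> events = new List<CollaborationEvent>();

    public void Log(string sessionId, string role, string eventType, string payload = "")
    {
        events.Add(new CollaborationEvent
        {
            sessionId = sessionId,
            role = role,
            eventType = eventType,
            payload = payload,
            utcTime = DateTime.UtcNow.ToString("o")
        });
    }

    // In a real setup the buffered events would be sent to the central server
    // behind the dashboard; here they are only serialized as a placeholder.
    public string ExportJson() => JsonUtility.ToJson(new Wrapper { items = events.ToArray() });

    [Serializable] private class Wrapper { public CollaborationEvent[] items; }
}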

In this vein, standard measures were considered, such as task performance, based on task time, i.e., the duration needed to complete the task successfully in a collaborative manner.

In addition, further measures regarding the collaborative process were also contemplated, e.g., ease of communicating or expressing ideas, level of social presence, enjoyment, mental effort, information understanding and attention allocation, as well as the overall sense of collaboration (taking inspiration from existing studies in the literature (Kim et al. 2018b; Aschenbrenner et al. 2018; Teo et al. 2019b; Piumsomboon et al. 2019; Merino et al. 2020; Kim et al. 2020; Marques et al. 2021f, b, a)).

These were selected given their predominant role in existing research, as relevant dimensions that help illustrate how the collaborative process occurred, in particular when analyzed as a whole.

Participants' emotional state before and after task fulfilment, as well as their preferences and opinions, were also registered.

Fig. 3

Example of remote participants collaborating using different tools instrumented with the CAPTURE toolkit: 1- Expert using the tools to provide remote collaboration; 2- On-site technician reacting to the instructions provided; 3- Distributed multi-user data gathering; 4- Evaluation data storage; 5- Visualization dashboard for analysis of the collaborative process. Adapted from: (Marques et al. 2021b)

2.5 Procedure

Participants were introduced to the study goals, the task, and the experimental setup. After giving their informed consent, they were introduced to the video and AR tools, and an adaptation period was provided.

Participants acted as on-site technicians with condition C1 and then C2, always in this order (since different procedures were used to minimize bias, participants were not affected by learning effects), while the remote counterpart was played by a researcher to ensure the instructions were transmitted correctly each time.

This approach was used to facilitate collaboration, given that having participants act as the remote counterpart would add a level of complexity that was unnecessary for the intended goal of the study.

Given that this role was performed by the same individual, it can be argued that it is not the same as having a participant; however, we argue it still allows a granular view of the collaborative effort, since not all collaborative sessions are similar to one another.

Furthermore, the approach described is aligned with the methods adopted in recent literature (Piumsomboon et al. 2018; Yoon et al. 2019; Teo et al. 2019a; Osmers et al. 2021; Marques et al. 2021f), in which participants act as either the on-site or the remote team-member, while an actor, i.e., a researcher, is responsible for the other side of the collaborative process.

Thus, the researcher was responsible for following the same approach during all sessions of the study, helping to control the data variability associated with how the collaborative process occurred.

Participants started by filling a demographic questionnaire.

Then, the maintenance tasks were completed with both conditions.

Next, participants answered a post-study questionnaire regarding the collaborative process and their preferences about the characteristics of the tools used.

Last, a short interview was conducted to understand participants' opinions regarding collaboration under each condition.

2.6 Participants

In total, 37 participants were recruited (15 female - 40.5%), whose ages ranged from 20 to 63 years old (M = 32.9, SD = 10.6).

Participants had various professions, e.g., faculty members, researchers, and Master's and PhD students from different fields, as well as front-end developers, software engineers, a tourism manager, a housewife, a public administration officer, a physiotherapist, a banker, a manufacturing manager, and an assembly line operator.

Of these, 18 participants had previous experience with AR and 33 with collaborative tools.

3 Results and discussion

Next, the results of the user study are described and discussed. These were obtained through a data analysis using the SPSS and Statistica software. Exploratory, descriptive, inferential (non-parametric tests, due to the ordinal nature of the assessed dimensions, the lack of normality verified by Shapiro-Wilk tests, and the existence of outliers in the time data), and multivariate statistical techniques were used.

3.1 Task time

Figure 4 shows the box-plots for task duration (in minutes) under the two conditions, suggesting a lack of normality (confirmed with the Shapiro-Wilk test, p-value < 0.01 for both samples), also expressed by the kurtosis values of 1.61 for C1 and 0.03 for C2, which show a departure from the normal distribution.

Moreover, it suggests a difference in the central value of the task duration (C2 < C1, and thus a one-tailed, 'less than' alternative hypothesis).

The Wilcoxon non-parametric test (matched samples) rejected the null hypothesis of equality of medians (p-value < 0.001), indicating differences between conditions; in this case, a significant reduction of the task time from condition C1 (median = 15 min) to condition C2 (median = 12 min) (Table 1).
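For reference, and assuming the standard large-sample approximation without a correction for ties, the Z statistic reported for the Wilcoxon signed-rank test (see Table 1) follows from the sum of positive ranks:

\(W^{+} = \sum_{i:\,D_i>0} \operatorname{rank}(|D_i|), \qquad Z = \dfrac{W^{+} - \frac{N(N+1)}{4}}{\sqrt{\frac{N(N+1)(2N+1)}{24}}},\)

where \(D_i\) is the per-participant difference in task time between conditions and \(N\) is the number of non-zero differences.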

Fig. 4

Task time for the two conditions (in minutes). C1 - video tool; C2 - AR tool

Table 1 Summary table for task time; Elements considered: Variable; Median; \(\bar{S}^{+}\) - Average rank (Di>0); \(\bar{S}^{-}\) - Average rank (Di<0); Z - Test statistic (when N>20); p-value; X - reject null hypothesis

Categorizing the task time by gender, there is no significant difference between male and female participants for either condition. Due to the lack of normality and the small size of the two independent samples, the non-parametric Mann-Whitney test was used instead.

It is important to highlight this fact, as it appears to go against trends of recent decades suggesting that, for educational, social, or cultural reasons, it made sense to categorize results based on gender. We argue that, perhaps, for some situations, it no longer makes sense to consider this division, as the line between male and female participants appears to be vanishing, which we consider a quite positive aspect.

Likewise, categorizing the task time by experience with VR-AR, there is also no significant difference between the two groups for either condition. Once again, due to the lack of normality and the small size of the two independent samples, t-tests were not used; instead, the non-parametric Mann-Whitney test was applied.

This suggests that having previous experience with VR or AR is not an asset for faster completion.

This may be considered a benefit, since it supports the idea that individuals with no prior experience are able to quickly use the tool to learn and apply new skills. In fact, considering industrial scenarios, for example, this aspect may mean that untrained workers can perform specialized tasks when assisted by a remote expert, which can have a significant impact on lowering production downtime, as well as reducing the need for travelling, cutting costs, and contributing to important sustainability factors according to the United Nations (UN) development goals.

3.2 Collaborative process overview

Figures 5 and 6 present an overview of participants' assessment of all the dimensions of collaboration considered for both conditions, rated by the participants using a 7-point Likert-type scale (1-Low to 7-High).

Furthermore, Table 2 summarizes the main results of the data analysis. It shows that there were significant differences between the two conditions in all dimensions except social presence.

With respect to the effectiveness in expressing ideas properly (D1), the Wilcoxon test rejected the null hypothesis of equality of medians (p-value < 0.001), indicating differences between conditions, in this case a significant increase from condition C1 (median=5, sum=173) to condition C2 (median=6, sum=235).

As to the level of attentional allocation (D2), the Wilcoxon test rejected the null hypothesis of equality of medians (p-value < 0.001), indicating differences between conditions, in this case a significant increase from condition C1 (median=4, sum=158) to condition C2 (median=7, sum=239).

Regarding the effectiveness in perceived information understanding (D3), the Wilcoxon test rejected the null hypothesis of equality of medians (p-value < 0.001), indicating differences between conditions, in this case a significant increase from condition C1 (median=5, sum=172) to condition C2 (median=7, sum=238).

As for the level of mental effort (D4), the Wilcoxon test rejected the null hypothesis of equality of medians (p-value < 0.001), indicating differences between conditions, in this case a significant decrease from condition C1 (median=5, sum=185) to condition C2 (median=2, sum=76).

Regarding the level of enjoyment (D5), the Wilcoxon test rejected the null hypothesis of equality of medians (p-value < 0.001), indicating differences between conditions, in this case a significant increase from condition C1 (median=4, sum=163) to condition C2 (median=6, sum=234).

Concerning the ease of communication (D6), the Wilcoxon test rejected the null hypothesis of equality of medians (p-value < 0.001), indicating differences between conditions, in this case a significant increase from condition C1 (median=5, sum=184) to condition C2 (median=6, sum=229).

Regarding the level of social presence (D7), the Wilcoxon test did not reject the null hypothesis of equality of medians (p-value = 0.254), indicating no significant difference between conditions, from condition C1 (median=5, sum=188) to condition C2 (median=5, sum=176).

Finally, regarding the overall sense of collaboration (D8), the Wilcoxon test rejected the null hypothesis of equality of medians (p-value < 0.001), indicating differences between conditions, in this case a significant increase from condition C1 (median=5, sum=176) to condition C2 (median=6, sum=233).

Fig. 5

Radar chart representing the collaborative process: D1- effectiveness in expressing ideas properly; D2- level of attentional allocation; D3- effectiveness in perceived information understanding; D4- level of mental effort; D5- level of enjoyment; D6- ease of communication; D7- level of social presence; D8- sense of collaboration. C1: video tool; C2: AR tool. Data displayed using a Likert-type scale: 1- Low; 7- High

Fig. 6

Overview of the collaborative process: D1- effectiveness in expressing ideas properly; D2- level of attentional allocation; D3- effectiveness in perceived information understanding; D4- level of mental effort; D5- level of enjoyment; D6- ease of communication; D7- level of social presence; D8- sense of collaboration. C1: video tool; C2: AR tool. Data displayed using a Likert-type scale: 1- Low; 7- High

Table 2 Summary table for the dimensions considered: D1- effectiveness in expressing ideas properly; D2- level of attentional allocation; D3- effectiveness in perceived information understanding; D4- level of mental effort; D5- level of enjoyment; D6- ease of communication; D7- level of social presence; D8- sense of collaboration. C1 - video tool; C2 - AR tool; Elements considered: Variable; Median; Sum; \(\bar{S}^{+}\) - Average rank (Di>0); \(\bar{S}^{-}\) - Average rank (Di<0); Z - Test statistic (when N>20); p-value; X - reject null hypothesis; \(\checkmark\) - do not reject null hypothesis

Thus, it can be observed that condition C2 was rated higher by the participants for most of the aspects of collaboration considered: effectiveness in expressing ideas properly, level of attentional allocation, effectiveness in perceived information understanding, level of enjoyment, and ease of communication.

As for the level of mental effort, condition C1 was rated significantly higher, perhaps due to the additional discussion needed to understand where to perform some procedures, which may be related to the diminished level of attentional allocation in condition C1.

As such, the results obtained appear to suggest that the video tool was inferior when compared to the AR-based alternative.

Conversely, the level of social presence was slightly higher for condition C1; however, the difference was not statistically significant.

This is possibly because this condition supported live video sharing between remote team-members, i.e., the collaborator's representation was always present in the tool interface throughout the task, which may have had an impact on participants' feeling of togetherness with their collaborative counterparts.

In addition, condition C2 offered AR-based stabilized annotations on top of the captures obtained from the task context. Yet, the team-member representation was not available in the tool interface, which might have affected participants' perception of the level of social presence, although without any major difference, as can be seen in Table 2.

Besides the analysis already presented, an additional effort was made to categorize the collaborative process by age to verify the effect of using the technology, i.e., considering two samples (<=30 and >30, with 19 and 18 participants, respectively). This threshold was selected because it appeared relevant for separating the technological literacy of the participants into two distinct generations. Results show no significant difference for either condition, as shown by the non-parametric Mann-Whitney test.

Moreover, when categorizing the collaborative process by experience with VR-AR, there is also no significant difference between the two groups for either condition according to the non-parametric Mann-Whitney test.

Hence, having previous experience with VR or AR does not appear to be an asset for a better collaborative process, probably for the same reasons described in Sect. 3.1.

3.3 Participants’ preferences and opinions

As for participants' satisfaction, since the data is on an ordinal scale and each user performed both conditions (matched samples), the equality of medians must be tested with a non-parametric test. The Wilcoxon test rejected the null hypothesis of equality of medians (p-value < 0.001), indicating differences between conditions, in this case a significant increase of the rating from condition C1 (median=5, sum=174) to condition C2 (median=6, sum=239).

This occurred despite condition C2 not having a representation of the remote counterpart. The interviews confirmed this fact: condition C2 was preferred by the majority of participants, who stated that it provided non-verbal instructions aligned with the task context, which contributed to their understanding of where to perform given instructions, while also making it easier to express themselves when requesting the expert's guidance.

Regarding participants' experience, 55 reaction cards (a methodology used by Microsoft to obtain an understanding of team members' reactions towards the tool used during collaboration (Barnum 2010)) were selected by the participants to characterize condition C1, and 58 to characterize condition C2.

The following top 10 reaction cards represent the expressions most selected by the participants to characterize each condition (Fig. 7):

C1 - collaborative, time-consuming, accessible, helpful, familiar, simplistic, flexible, usable, expected and predictable;

C2 - helpful, appealing, empowering, collaborative, engaging, flexible, advanced, novel, customizable and easy-to-use.

Fig. 7

Participants’ top 10 reaction cards regarding the conditions used: C1 - video tool; C2 - AR tool. Colors considered: green - positive meaning; gray - neutral meaning; Red (*) - negative meaning, according to (Barnum 2010). (Color figure online)

Although the reaction cards allowed a grasp of participants' responses to the conditions used, this method alone does not make it possible to clearly single out one condition or identify the magnitude of the associated emotion.

Yet, analyzing participants' emotional state provides a clearer perspective.

Regarding condition C1 (see Fig. 8 - top), participants' emotional state before the study varied among joy (15 out of 37), surprise (4 out of 37), excitement (10 out of 37), and contempt (8 out of 37). After the study, it varied among joy (10 out of 37), surprise (2 out of 37), excitement (1 out of 37), contempt (23 out of 37), and disgust (1 out of 37).

As for condition C2 (Fig. 8 - bottom), before the study it varied among joy (17 out of 37), surprise (3 out of 37), excitement (7 out of 37), and contempt (10 out of 37). After the study, it varied among joy (15 out of 37), surprise (9 out of 37), excitement (12 out of 37), and contempt (1 out of 37).

Fig. 8

Participants’ emotional state for each condition, before (left bar) and after (right bar) the tasks: C1 - video tool (top); C2 - AR tool (bottom)

All things considered, a decrease in the number of participants feeling joy, surprise, and excitement can be observed for condition C1 at the end of the study. In consequence, there was a considerable increase in the emotional state of contempt, and one case of disgust was reported.

In contrast, regarding condition C2, there was only one occurrence of contempt, while joy, excitement, and surprise were expressed by a higher number of participants, and no cases of disgust were reported.

Therefore, condition C2 scored higher values for emotions with a positive connotation, e.g., surprise, joy, and excitement, whereas for condition C1 the only higher value was associated with contempt (a neutral connotation).

Next, some comments from the participants are presented to provide extra context to the statements above:

  • Level of social presence and ease of sharing ideas properly:

\(\blacksquare\):

At first, I missed the expert figure when using the AR tool, although as I moved along and focused on the task, I didn't even notice its absence! The instructions on top of the real world and the voice were enough to complete the tasks with success;

\(\blacksquare\):

Given that my job requires me to use remote video solutions on a daily basis, I was anticipating that the absence of a video stream would affect collaboration with the remote counterpart. However, given that the AR tool concentrated further on the assignment itself, I was so involved that not having the expert representation did not influence my performance;

\(\blacksquare\):

I have been working remotely for more than 5 years and never thought the struggle using video could be to explain a practical problem and its context to another person. It was very challenging to properly situate a given component, tool, or area of interest. Having the expert in the interface was not an advantage, since there was still much discussion involved. The AR mode allowed us to express and understand each other quicker and more precisely;

\(\blacksquare\):

I was able to interact more clearly and better comprehend where a given action needed to be performed when using the AR tool. I preferred this to the video alternative, in which I was always obliged to pay attention to ensure what to do. If I blinked for a second, we needed to recapitulate, while the AR instructions were always available, even if the expert portrayal was not;

  • Level of information understanding and attentional allocation:

\(\blacksquare\):

Being more familiar, the video tool was quicker for beginning to communicate with the expert. Yet, when I needed to express myself about the equipment components and the tools that I needed, that's when the lack of support became evident to me. As a consequence, I started to repeat myself over and over in different manners, aiming to properly express the intended objectives in my mind. This also occurred with my counterpart;

\(\blacksquare\):

Having the instructions presented near the equipment where I needed to intervene helped me easily conduct the maintenance procedures, in contrast to the video alternative, since the latter did not provide cues to help me remember what to do, or to confirm my actions, besides the text.

  • Ease of communication and level of mental effort:

\(\blacksquare\):

I felt the video tool had an absence of supporting features, besides the use of voice, which made me prefer the AR alternative, although it was a novelty to me and I had to adapt to it.

\(\blacksquare\):

I had never performed a maintenance task; my area of expertise is tourism. All things considered, I enjoyed the AR tool more. Whenever I took too long to do a procedure, or forgot the next instruction, I just looked at the device again or enhanced the boiler components with the correct information, while the expert described what to do.

3.4 Discussion

Taken together, the results of the data analysis suggest that attempting to increase social presence may not have uniformly positive results, as hypothesized by many works reported in the literature.

Given the reported results, we strongly believe that future studies should take a more holistic view of the sense of social presence, i.e., consider the different dimensions that impact the collaborative process in order to provide a broader view of the work effort, as described in this work; otherwise, it would not have been possible to present a comprehensive view of the collaborative process for each condition.

Special attention should also be paid to individual preferences and intended goals, as well as contextual characteristics, which may affect how different aspects of technology are accepted and used, thus influencing how collaboration occurs.

For example, if the characteristics of the team-members are considered, as well as the task type (e.g., management, advisory, negotiation, psycho-motor, defined problem, ill-defined problem (Marques et al. 2021e)), and its context, it may be possible to customize collaborative tools, i.e., leverage different levels of social presence, which may lead to larger acceptance and adoption.

Some team-members may feel more comfortable with lower levels of social presence than with high ones (e.g., individuals with social anxiety or communication apprehension).

In fact, 13 out of 37 participants suggested making the remote expert representation available not at all times, but as an optional feature, to be used when team-members consider it relevant to increase trust and situation understanding during task resolution.

As a matter of fact, although the video condition was rated as having a higher level of social presence, sharing a video stream may be demanding for existing networks, resulting in latency issues and communication shortages, significantly influencing participants' reactions, e.g., attention allocation, information understanding, as well as the sense of social presence and perception during collaborative activities.

All in all, it appears that having the remote expert present in the user interface might not be helpful in all scenarios, possibly increasing cognitive load and hindering performance and work efficiency in some cases. Altogether, the absence of a team-member representation may not significantly affect task resolution when AR annotations and voice communication are used as an alternative, which can still stimulate collaborators' sense of social presence.

Perhaps the way to go is to enable collaborative tools to support a wider range of representation methods and let team members select how they prefer to collaborate according to their individual profile and task needs. Meanwhile, situations such as parallel or asynchronous activities, where the counterpart acts more sporadically, may be supported with fewer functionalities, given that, most of the time, on-site team-members are focused on the task and require less intervention/assistance.

4 Conclusions and future work

The work presented in this paper was motivated by the distinct results reported in the literature regarding the role of social presence in remote scenarios.

We explored how the collaborator representation affected the level of social presence and other dimensions of collaboration deemed relevant, as well as task resolution during (synchronous) remote guidance, motivated by a collaboration with the industry sector.

Results obtained from the data analysis showed that although the video condition presented a higher level of social presence, the remaining aspects of collaboration considered (e.g., ease of sharing ideas properly and communicating, level of information understanding, attentional allocation, and mental effort) were rated more favorably for the AR-based condition.

In fact, the AR-based condition was preferred by the majority of participants, who emphasized that having the remote expert visible at all times was not necessary to accomplish the task procedures.

Thus, it was suggested that the expert representation could be available only as an optional feature. This way, team-members could customize their collaborative experience according to their individual profile and task needs.

As such, we argue that social presence, alone, cannot be taken as a general predictor for good or bad collaboration, an important aspect to consider when designing future supporting tools for distributed activities.

Further work will be devoted to gaining more knowledge about the role of social presence for remote settings, considering the characteristics of the team-members (e.g., emotions, level of social anxiety, communication apprehension, etc.), as well as the task type and context in order to comprehend if there are patterns that can be identified to guide the research community.

In addition, we intend to support a third condition, i.e., the use of virtual avatars to represent team-members.

In this vein, we plan to conduct a formal user study with target-users and domain experts, also contemplating the use and effect of AR-headsets.