Empirical study of Team Usability Testing: a laboratory experiment

The evaluation of groupware has a long history; several researchers have investigated this research area and made attempts to develop evaluation methods. This paper aims to make a contribution to this research topic by introducing a groupware evaluation method called Team Usability Testing. The goal of this method is to evaluate the usability of real-time distributed groupware. The Team Usability Test consists of a combination of questionnaires, on-screen behaviour recordings and interviews. The data analysis is based on the mechanics of collaboration theoretical framework and involves communication analysis, behaviour analysis, and analysis of post-experiment interviews. A laboratory experiment and a field study constitute the two main phases of the creation of the usability testing method. In this paper, the results of the laboratory experiment and its implications for the field study will be discussed. According to the results, the Team Usability Test is able to explore team-level usability problems and contextual problems. The paper ends with the discussion of future field research considerations related to the possible application of the Team Usability Test in real-life work settings.


Introduction
Although several commonly used single-user evaluation methods have been developed to explore usability problems of individual software, only a few methods have been devised to evaluate collaborative software.
The evaluation of collaborative software is challenging because beyond individual usability problems, collaborative software also has team-level usability problems. Team usability problems refer to collaboration-related usability problems, which can only occur in a collaborative situation, and which cannot be explored using individual usability evaluation techniques (single-user usability tests).
This paper aims to contribute to the research area of groupware evaluation by introducing our Team Usability Testing method through a description of its development process. As a qualitative method, Team Usability Testing aims to explore the team-level usability problems of real-time distributed groupware.
Team usability problems are collaboration/teamwork-related usability problems, which can only occur in a collaborative situation, and which cannot be explored by testing individuals (single-user usability tests).
"In the broadest sense, groupware refers to any computing technology that helps groups work better collaboratively over digital media" (Khoshafian and Buckiewicz 1995 in Yen et al. 1999). Another definition is that "groupware systems support collaborative work of users that share common objectives" (Salomón et al. 2019, p. 11). The time/space classification differentiates between groupware types along two dimensions: whether people work at the same time, and whether they work in the same place. This yields four categories: same time/same space, different time/same space, same time/different space, and different time/different space. According to this classification, the term 'real-time distributed groupware' means that people are working together at the same time, but from different places (Gutwin and Greenberg 2000, p. 413). Typical examples are multi-user editors, such as online whiteboard software or online text editors. In this context, collaborative activities involve multiple users working together (e.g. editing and organizing content) in the same workspace.
This paper presents the results of a laboratory experiment, conducted in an early phase of the method's development, in which Team Usability Testing was used to test a specific piece of real-time distributed groupware. First, the related literature on existing groupware evaluation methods and the theoretical background are presented. Then the creation process of the Team Usability Test is described, focusing on the laboratory experiment and its results. Finally, future field research directions and the considerations involved in applying this method in the field are discussed.
Grudin (1988) summarized the difficulties of evaluating collaborative applications. These difficulties include the challenge of creating a group in a laboratory setting, the complexity of field evaluation and some of the specific problems of complex groupware, such as different user roles having different features. Evaluating collaborative software is also challenging because the methods used for single-user evaluation (e.g. observation, inspection methods, field study) can be difficult to adapt to groupware evaluation (Baker et al. 2001). In addition, as Grudin (1988) highlights, groupware evaluation methods require more time, money and effort than single-user methods.

Diverse approaches in groupware evaluation
Usability evaluation of individual software has a long tradition and it has become a natural part of software development in the last few decades (Buur and Bødker 2000; Nielsen 1994). Despite the challenges outlined above, many researchers have tried to overcome the difficulties of groupware evaluation and have investigated different aspects of this topic since the 1990s. The goal of these intensive research efforts was to make groupware evaluation a part of the software development process in the same way as individual software evaluation. Nevertheless, groupware evaluation is still not an everyday practice in the software development industry.
Understanding the collaborative practices of people in real-life working environments and how they (could) use groupware in the context of work is the background of groupware evaluation. Publications in this research field range from the investigation of the role of space in collaboration (Spinelli et al. 2005) and common information spaces in an airport (Fields et al. 2005) through the relationship between organizational culture and the use of technology in different user groups (Chisalita et al. 2005) and the use of social media in their collaborative design tasks (Cho and Cho 2019) to collaborative design practices of product design studios (Vyas et al. 2013). Yet the goal of all of the abovementioned studies is the same: to define the real needs of users in their working context and how technology could support their work.
The focus of studies into groupware evaluation varies widely. Some studies focus on developing groupware and evaluation methods for a specific organization and investigate the effect of collaborative software on groupwork (Christensen and Ellingsen 2014; Pargman 2003; Pinelle et al. 2003; Silsand et al. 2012; Tang et al. 1994; Van der Veer and Weile 2000). Other research focuses on how specific features, particularly awareness supporting features, influence usability (Gutwin et al. 1996; Greenberg 1998, 1999; Lopez and Guerrero 2017; Ignat et al. 2015; Passos et al. 2017; Romero-Salcedo et al. 2004). Pinelle (2000) reviews and compares groupware evaluation methods in detail, based on several factors, such as the type of groupware, the methods used in the evaluation and the placement of evaluation in the software evaluation process. In his definition, groupware evaluation "implies that the researchers attempted to gather information about the software by measuring its use by a group of people, whether they were end users or paid participants in a lab study" (Pinelle 2000, p. 23). Steves et al. (2001) compared user-based usability evaluation methods (in which the evaluators are users of the software) with inspection methods (in which the evaluators are usability experts) in the context of collaborative software. They claim that the two types of methods are complementary, but not interchangeable. Inspection methods can be used to explore the main usability problems in earlier phases of software development, while user-based studies are able to explore contextual problems related to real-life working usage.

Disambiguation: evaluation methods with somewhat confusing names
Before discussing the different types of groupware evaluation methods in detail, it is worth introducing two more methods: group usability testing by Chen et al. (2013) and team usability testing, developed by Hackman and Biers (1992). Despite their somewhat confusing names, these two methods are not groupware evaluation methods: they test individual software rather than groupware, and are therefore outside the scope of this paper. In a group usability testing situation, several individuals are tested at the same time in the same room by several researchers (Chen et al. 2013). In a team usability testing situation, two people work together as a team, but they have different roles: only one uses the computer and the other is "just" an advisor (Hackman and Biers 1992).

Expert-based inspection methods of groupware
In this section, a few groupware inspection methods will be presented briefly; all of them focus on collaborative software and evaluate it on an organizational level. These are inspection methods since usability experts perform the evaluation; therefore, no real software users are involved in them. The main goal of this section is to demonstrate the structure and the complexity of groupware inspection methods.
The first example is called DUTCH (Designing for Users and Tasks from Concepts to Handles). It is a complex method that focuses on exploring the current task-based work model of the organization; after uncovering the problems with the current model and the needs of the employees, it constructs a new task model. Because it is a lengthy process, it is not a discount usability evaluation method but rather an investment for the company (Van der Veer and Weile 2000).
FrUtEG is a conceptual framework for utility evaluation of groupware. Concept of use and social interactions are investigated by experts (Frías et al. 2019).
Mobile collaboration modelling (MCM) allows experts to model interaction scenarios, visualizing the roles of the users and the interaction between them. With the help of the model, developers can easily define user requirements related to collaborative functions (Herskovic et al. 2019).
The last one is called CUA (collaboration usability analysis). Its authors state that "CUA's main contribution is to provide evaluators with a framework in which they can simulate the realistic use of a groupware system and identify usability problems that are caused by the groupware interface" (Pinelle et al. 2003). This framework includes a description of the mechanics of collaboration, that is, the basic actions which users perform to work together effectively in a shared workspace. The mechanics of collaboration will be described in detail later, because it serves as the theoretical framework of data analysis in the new Team Usability Testing method.

Groupware evaluation methods based on user participation
In empirical, user-based groupware evaluation, real users of the software participate in the evaluation process. Most of these studies are longitudinal (2-3 month) studies, employing a variety of methods: the observation of software usage, the application of usage scenarios, log file analysis and interviewing end-users. We picked the next few examples to demonstrate the variety of these methods.
Observational studies of small groups using collaborative software started in the early 90s. Several of these studies focused on the shared drawing activity of groups (Bly 1988; Tang 1991; Greenberg et al. 1992).
Usage scenarios were used for software evaluation in the work of Haynes et al. (2004). First, typical everyday usage scenarios of the collaborative software were defined. Next, users were interviewed about their experience related to these scenarios. The results of the interviews were summarized as feature enhancements related to scenarios. Yamauchi et al. (2012) conducted a 2-month-long experimental study with four groups using a video-conferencing system. They examined how different psychological concepts influence participants' usability judgements. The authors argue that various psychological concepts can influence how participants interpret usability problems. Gumienny et al. (2013) conducted a 3-month-long study investigating collaborative whiteboard software used by a globally distributed team. The authors analysed system log files (usage statistics) and user interviews investigating how this collaborative system supported teamwork. Marlow et al. (2016) investigated collaborative online distributed meeting habits. After collecting survey and interview data about general meeting habits, they observed three distributed team meetings. This allowed them to identify three types of meetings and to then summarize the sharing and archiving needs of these different meeting types and their design implications.
In the study of Svensson et al. (2019), only interview data of different working roles in an airport were used to recommend design implications.

The novelty of the Team Usability Test in comparison with existing methods
The previous sections reviewed expert-based inspection methods of groupware and evaluation methods based on user participation. The new Team Usability Testing method presented in this paper differs from these methods in that it is a groupware evaluation method that involves real users and explores the usability problems of collaborative software on a team level. It rests on the premise that, beyond individual usability problems, collaborative software also has team-level usability problems. Team-level usability problems cannot be explored using individual (single-user) usability evaluation techniques, because they only occur in collaborative situations. This method explores usability problems of real-time distributed groupware involving real users of the software, working together in real time, on the same task but from different places. The Team Usability Test consists of a combination of questionnaires, on-screen behaviour recordings and interviews. The data analysis is based on a theoretical framework of the mechanics of collaboration and involves communication analysis, behaviour analysis and analysis of post-experiment interviews. Table 1 shows how the Team Usability Test is similar to, and how it differs from, existing methods.

The mechanics of collaboration
The coding scheme selected for the method described in this paper is the mechanics of collaboration (MoC) theory of Pinelle et al. (2003). The authors developed it to offer a general framework for groupware evaluation (detailed in Sect. 2.3). Other popular data analysis techniques in HCI research include open coding, thematic analysis and grounded theory (Strauss and Corbin 1994). We chose the mechanics of collaboration framework because we expect it to provide a comprehensive framework for categorizing data related to collaborative software usage. According to the MoC theory, there are several basic collaborative actions that users should be able to perform in a shared workspace while collaborating (Pinelle et al. 2003).
According to Pinelle, Gutwin and Greenberg, in many cases, the reason for the poor usability of groupware is that basic collaborative actions are not supported properly. "Some usability problems in groupware systems are not inherently tied to the social context in which the system is used, but rather are a result of poor support for the basic activities of collaborative work in shared spaces." (Gutwin and Greenberg 2000, p. 103). Table 2 summarizes these actions, which involve communication (explicit communication and information gathering) and coordination acts (shared access and transfer). To enhance the usability of groupware, the mechanics of
Table 2 The base of our data analysis: the mechanics of collaboration framework (Pinelle et al. 2003, p. 288)
In this context, basic awareness is not a neurological term; it simply means observing who is in the workspace, what they are doing and where they are working.

The mechanics of collaboration framework used in groupware evaluations
The mechanics of collaboration is a comprehensive framework which defines the basic collaborative acts which users need to perform to work effectively in a shared workspace. The authors suggest using it as a toolkit (by selecting the most relevant and useful mechanics) because the importance of each mechanic depends on the goal of the collaborative software. Gutwin and Greenberg (2000) suggest adapting the mechanics of collaboration framework for several usability evaluation methods, such as heuristic evaluation, cognitive walkthrough and observation. Later, the authors adapted the framework and integrated it into several evaluation techniques.
In 2001, Baker, Greenberg and Gutwin created groupware evaluation heuristics by combining the mechanics of collaboration framework with Nielsen's heuristics. These heuristics constitute the basis of heuristic groupware evaluation, a discount evaluation method for shared visual workspaces. In 2002, the same authors tested the heuristics in an empirical study, with 27 inspectors evaluating groupware. They found that even novice evaluators could use them successfully (Baker et al. 2002). Pinelle and Gutwin (2008) successfully adapted the mechanics of collaboration framework in a cognitive walkthrough usability testing scenario for tabletop groupware applications. Twelve participants, both usability experts and novices, took part in the experiment, in which they evaluated low-fidelity paper prototypes of tabletop groupware applications using two methods: an informal expert review and T-CUA (Tabletop Collaboration Usability Analysis, a method based on the mechanics of collaboration framework). One of the main findings was that participants explored more team-related usability problems with the help of T-CUA than in an informal review (Pinelle and Gutwin 2008). Dew et al. (2015) conducted individual usability tests, employing target users to evaluate an early prototype of collaborative healthcare software. The mechanics of collaboration framework was used to categorize usability issues and report findings of collaborative system evaluations. Gutwin and Greenberg (2000) also suggest adapting the mechanics of collaboration framework to usability test situations (testing collaborative software). However, since that suggestion was made in 2000, only one study (Dew et al. 2015) has adapted the framework successfully.
Thus, more research is needed to confirm the successful adaptation of the framework in usability test situations. Although the framework is not widely used, it is sufficiently developed to serve as the theoretical basis of our research. Therefore, in our study, we adapt the framework to a usability test situation, where it serves as the theoretical framework of data analysis in our Team Usability Testing method.

Method: laboratory experiment
Our Team Usability Test is a usability evaluation method for real-time distributed groupware which aims to explore team-level usability problems. It involves real users of the software working together in real time on the same task and from different places. It consists of a combination of questionnaires, on-screen behaviour recordings and post-experiment interviews.
When creating a usability evaluation method, finding a balance between ecological validity and experimental control is a serious dilemma (Kjeldskov and Skov 2014). With this in mind, the method will be applied to two different settings: laboratory experiment and field study. First, we performed a laboratory experiment, with a high degree of experimental control. Subsequently, a field study will be executed in real-life working settings, with a corresponding increase in ecological validity. In this section, the laboratory experiment will be discussed in detail.

Participants
Two pilot teams helped to test and overcome the technical difficulties of the scenario before eight teams participated in the full-scale laboratory experiment. The first pilot helped to clarify the task instructions given to collaborators. The second pilot helped to explore technical difficulties related to the video recording software used in the study, which did not occur in the first pilot.
Each team consisted of three collaborator members and one observer member. Collaborator participants worked together as a three-member team, while observer participants observed the team's collaboration. Teams were formed randomly, but from the same university group: students could sign up for different time slots, and the students who signed up for the same time slot formed a team. Table 3 summarizes the characteristics of the teams. Since the observer participants were not included in the collaborative activities, only the characteristics of collaborators are presented. Also, one team was excluded from the data analysis for reasons which we will discuss later, so Table 3 presents the data from the seven remaining teams. The collaborators were 21-28 years old (mean = 23.57 years) university students, who knew each other. Most of them described themselves as team players: on a 1-7 scale (1 = prefer individual work; 7 = prefer team work), the mean across teams was 4.61, with team means ranging from 3.67 to 5. Nine out of the twenty-one collaborators had previous experience with the collaborative software under test, and there was at least one participant in every team who had used it before.
Figure 1 shows the steps of the laboratory experiment from beginning to end. First, when participants arrived, the researcher instructed them verbally about the details of the 90-min lab session. When all four participants were present, they were randomly assigned the role of collaborator or observer by drawing role cards from an envelope. Each envelope contained three collaborator role cards and one observer role card. If only three participants attended, everyone got the collaborator role automatically.
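The role assignment described above can be illustrated with a short sketch. It is not part of the study materials: the function name `assign_roles`, the participant labels and the optional `rng` argument are assumptions introduced here for illustration, and shuffling a list of cards merely stands in for drawing cards from an envelope.

```python
import random


def assign_roles(participants, rng=None):
    """Assign roles by shuffling role cards, as if drawn from an envelope.

    With four participants the envelope holds three collaborator cards and
    one observer card; with three participants everyone is a collaborator.
    Participant labels are assumed to be unique.
    """
    if rng is None:
        rng = random.Random()
    if len(participants) == 4:
        cards = ["collaborator"] * 3 + ["observer"]
    elif len(participants) == 3:
        cards = ["collaborator"] * 3
    else:
        # The procedure is only described for three or four attendees.
        raise ValueError("expected three or four participants")
    rng.shuffle(cards)
    return dict(zip(participants, cards))


if __name__ == "__main__":
    # Hypothetical participant labels, used only for this example.
    print(assign_roles(["A", "B", "C", "D"]))
```

Passing a seeded `random.Random` instance makes a particular assignment reproducible, which can be convenient when documenting a session.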

Scenario
Participants then watched a 3-min tutorial video about the prezi software to familiarize them with it if necessary. prezi is a "zoomable, canvas-based editor" in which users can create creative presentations on an online platform. Besides individual work, it supports real-time distributed editing, so it is also real-time distributed groupware (Laufer et al. 2011): multiple users can work in the same workspace and edit the same presentation in real time. The collaborative activities prezi can support are related to document editing: users can edit the same presentation in real time, while seeing what the other users are doing (with the help of avatars). Avatars are important for collaboration support because, among other things, they help prevent participants from editing the same part of the workspace. Besides the support for real-time collaboration, we chose prezi for our lab experiment because no special skills are needed to use it, even for first-time users. After the participants had watched the tutorial, the researcher led the three collaborator participants into three different rooms. Each of the three rooms contained a laptop with a mouse, and printed task instructions (covered at the beginning). The observer participant stayed with the researcher in a fourth room.
Fig. 1 The steps of the laboratory experiment
The participants next completed the before-task questionnaire (see Appendix), then the researcher called the collaborators via voice-call software (Skype) and asked them to turn the instruction paper over and read the task. The collaborators were given a chance to ask questions. They began the task by clicking on the collaborative editing surface (they were already logged in to the workspace, so they just needed to click on an already opened tab in the web browser). The task lasted 30 min, during which the three collaborators kept in touch via voice communication. The observer and the researcher observed the collaboration from the fourth room, while making notes. The observer was given verbal instructions to note anything interesting related to the use of the collaborative software.
After the collaborators had completed the task, they were required to respond to an after-task questionnaire, followed by a group interview (see Appendix) aiming to explore their collaborative experience. Altogether, the whole laboratory experiment usually took 90 min.

Task
The three collaborator participants' task was to create a prezi presentation together in 30 min: they had to organize a company social event and create a draft plan of their ideas (a sort of idea map). The details of the task were printed in front of them, so they did not need to switch between browser windows.
The motivation behind choosing a social event organizing task was that it is a relatively simple and specific task, which does not require any special skills at this scale. A simple task was necessary because the whole research session was intended to take under 90 min. The observer's task was to make notes on how the other three collaborators completed the task.

Data
We collected questionnaires, video recordings of the on-screen behaviour of participants and post-experiment group interviews. The before-task questionnaire concentrated on demographic data, software usage experience, teamwork and the experience of the participants in organizing social events, while the after-task questionnaire contained questions related to the collaboration, what factors helped or hindered the collaboration, and how satisfied the participants were with the different aspects of their solution. The video recordings of the on-screen behaviour comprise four videos: three individual videos recorded on the participants' laptops and one team video (where every participant is present at every time interval) recorded on the researcher's laptop. The team video was necessary because of the zooming function of the collaborative software. The post-experiment group interviews were conducted to explore the collaboration and software usage experience of the participants.

Analysis
Of the eight experimental teams, data of seven teams were analysed. One team was excluded from the analysis because their group dynamics prevented successful collaboration. The participants became locked in a heated debate about the design of the presentation, which resulted in a lack of collaboration on the task. This team were not able to participate in the usability test situation, so they did not perform any collaborative activities to analyse. As this did not occur in the other teams, we decided to exclude this team from the data analysis. We think it is natural and inevitable that not everyone can be involved in a usability test situation.
The voice communication was transcribed from the on-screen behaviour videos of the participants. The interviews were also transcribed.
The mechanics of collaboration framework described above formed the basis of our analysis: we used the different mechanics of the framework as separate codes.
Communication analysis and analysis of post-experiment interviews were performed. We analysed the communication transcripts based on the mechanics of collaboration framework: first, we determined whether or not an utterance was related to usability, then we categorized the usability-related utterances by mechanic. Tables 4 and 5 demonstrate examples of this analysis. We performed the same process with the interview transcripts and the questionnaire data. In addition, we used the on-screen behaviour data to interpret ambiguous situations which could not be understood from the communication transcript alone. We did not include the observer's notes in the analysis because their quality varied.
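The two-step coding described above (usability-related or not, then a mechanic code) can be represented with a small data structure and a tally. The following is a minimal sketch, not the tooling used in the study: the class `CodedUtterance`, the function `tally_mechanics` and the third sample utterance are hypothetical and introduced only for illustration. The first two sample rows reuse utterances and codes reported in this paper.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class CodedUtterance:
    speaker: str
    text: str
    usability_related: bool          # step 1: is the utterance about usability?
    mechanic: Optional[str] = None   # step 2: mechanic code, if usability related


def tally_mechanics(utterances):
    """Count usability-related utterances per mechanic code."""
    return Counter(
        u.mechanic for u in utterances if u.usability_related and u.mechanic
    )


sample = [
    CodedUtterance("P2", "I lean closer (to the microphone) or I don't know",
                   True, "Spoken messages"),
    CodedUtterance("P3", "All right", True, "Spoken messages"),
    # Hypothetical, made-up utterance that is not related to usability.
    CodedUtterance("P1", "Let's put the venue ideas on the left", False),
]

if __name__ == "__main__":
    for mechanic, count in tally_mechanics(sample).items():
        print(mechanic, count)
```

Keeping the usability decision and the mechanic code as separate fields mirrors the order of the two coding steps and leaves utterances that are not usability related without a mechanic.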
It is important to note that only some of the mechanics of collaboration could be interpreted, because of the characteristics of the experiment and the software. The mechanics of collaboration is a general framework for any type of collaborative software, and some mechanics apply only when participants work in the same place or at different times.

Results of the lab experiment
This section first presents the factors influencing collaboration: team usability problems and contextual factors. Then each of the different factors, their connection to the mechanics of collaboration and their implications on collaboration in different teams will be discussed.
We would like to emphasize that the focus of this paper is to demonstrate the development of the Team Usability Testing method by presenting the results of a laboratory experiment. Presenting the usability findings for this specific piece of real-time distributed groupware is unavoidable, but their purpose is to illustrate the method. In our subjective experience, the problems explored in this specific software are also typical of other collaborative software. Our paper's contribution is to examine how the Team Usability Testing method could be implemented in a laboratory experiment.
Related to the mechanics of collaboration framework, Table 6 demonstrates the factors influencing collaboration. We categorized team usability problems based on the mechanics of collaboration framework. However, we found important contextual factors which influence collaboration, too. Team usability problems include awareness, avatars, synchronization, saving and verbal communication. Contextual factors include prezi expertise, task division and team mood. Table 6 also presents which collaboration mechanic is related to the different factors.
Example rows from the coded communication transcript (columns: participant | utterance | related to usability | mechanic):
 | "Ok, please speak a little slower, then it will be all right" | Yes | Spoken messages
P2 | "I lean closer (to the microphone) or I don't know" | Yes | Spoken messages
P3 | "All right" | Yes | Spoken messages

Team usability problems identified by the Team Usability Test method
Team usability problems are collaboration/teamwork-related usability problems, which can only occur in a collaborative situation, and which cannot be explored by testing individuals (single-user usability tests). Team usability problems caused flaws in collaboration. Based on the mechanics of collaboration framework, we identified usability problems related to information gathering and explicit communication.

Team usability problems related to the mechanic "basic awareness"
Basic awareness was crucial in relation to collaboration. Basic awareness refers to "observing who is in the workspace, what are they doing, and where are they working" (Gutwin and Greenberg 2002, p. 288). Most teams said that avatars, synchronization, and verbal communication enhanced awareness, while problems with saving and slow synchronization decreased it. The emphasis participants put on the role of awareness in collaboration confirms that it plays a crucial role in real-time distributed groupware. If "basic awareness" features work and are well supported, they enhance the success of the collaboration; if there are awareness problems, the collaboration is seriously impaired.

Avatars
Participants were represented as avatars in the workspace (Fig. 2). The avatars were important elements of the workspace: they informed the participants about where the others were located. Team 2 highlighted that the different colours of the avatars also helped to identify and distinguish the others. In some cases, participants mentioned that avatars obstructed part of the view of the workspace, which was distracting. In other cases, participants said that they would have needed more information related to avatars, such as whether the other participant was just watching a frame of text or editing it.
"It helped that we saw them* (*the avatars) the whole time. But it was disturbing for me, that the icon was relatively big and when it was in front of the text, I couldn't see the text from it. But it was good, that I could see it, because it wasn't necessary for the others to tell (…) instead I could see where they were (…) and not only a cursor flashing but it had a figure." (Interview quote-Team 1).
"P2: I see, that something green with a smile is jumping (in the workspace). Is that you (P1)? P1: Yes." (Communication transcript-Team 4).

Synchronization
Synchronization refers to how quickly changes were propagated across the common workspace. One team mentioned it as a supporting factor of collaboration, because synchronization was fast; however, two teams mentioned that it was very slow and made collaboration difficult. In one case, when the synchronization was slow, participants were forced to rely on verbal communication, because they did not trust that the workspace showed the latest changes. This compensatory behaviour of constant verbal shadowing required a great deal of effort. In the other case, the team complained that the most disturbing aspect was that the timing of the synchronization was unpredictable.
Synchronization is important because in a collaborative editing situation, it is crucial that everyone sees the same state of the common workspace. As the examples show, any issue related to synchronization obstructs collaboration and demands extra effort from the participants.
"The functions worked pretty well, prezi synchronized the changes made by the teammates quickly." (Questionnaire quote-Team 4).

Saving
There were also problems with saving work. In one team, participants did not know whether or not they had to save their work to allow others to see it. Later, they found out that this was not necessary. In another team, when a writing problem occurred, a participant did not refresh the collaborative workspace, because s/he thought that the others' work would be lost. Clear communication of how the saving feature operates is important because it increases participants' trust in the workspace and saves time during the collaboration.
"Yes, I used prezi twice before, but individually. So I didn't know whether the other participant see the changes or not. Should I save the changes or not?" (Interview quote-Team 1). "P3: I think it (the workspace) isn't synchronizing for me. Never mind." (Communication transcript quote-Team 5).

Team usability problems related to the mechanic "spoken messages"
In the mechanics of collaboration framework, spoken messages are "intentional and planned verbal communication" (Gutwin and Greenberg 2002, p. 288). Five out of seven teams stated that verbal communication helped collaboration the most, which confirms its crucial role when using real-time distributed groupware. Consequently, when there were problems with verbal communication, they severely affected the collaboration. In Teams 3 and 7, bad sound quality made collaboration difficult, while in Team 5 the problem lay with a function of the voice-call software which muted the participants who were not currently speaking, to make them concentrate more fully on the participant who was speaking.
In terms of the mechanics of collaboration framework, this result corresponds to the mechanic of spoken messages. The possibility of verbal communication, and proper support for it in the groupware, is essential because it helps collaboration and makes teamwork effective.
"It was difficult because we weren't at the same place … and we need to verbalize things that we wouldn't verbalize in other conditions, and sometimes it made (the task solution) difficult." (Interview quote-Team 5).
"I honestly confess, I didn't really hear what is going on." (Communication transcript quote-Team 3).

Contextual factors
Contextual factors are factors which are not directly related to the groupware but which nevertheless seriously influence the success of the collaboration. The contextual factors are not related to the mechanics of collaboration framework. We identified three factors: prezi expertise, task division and team mood.

Prezi expertise
Six out of the seven teams cited lack of experience with using prezi as a hindrance to collaboration. The teams coped with this situation differently: some overcame it quickly, while others continued to be hindered by it, but were still able to collaborate successfully. Lack of familiarity with the software had the most detrimental effects in Team 5. In this group, everyone mentioned (in the questionnaires and the interview) that one participant's lack of prezi knowledge seriously hampered the completion of the task. Team 7 handled this situation differently: participants mentioned that using prezi was an easy and creative process for them, and they felt really competent because in spite of their lack of prezi knowledge they succeeded in completing the task.
"What obstructed collaboration (if there were such factors)?". "My two teammates used prezi for the first time, and I am not that experienced either." (Questionnaire quote-Team 1).

Team mood
Teams 1, 2, and 4 described the collaboration as a great experience and reported that they enjoyed working together. Although they experienced usability problems ("annoying little things" as Team 2 called them), they easily overcame them together. Team 4 also highlighted that the whole collaboration was a creative and fun process, which they really enjoyed. In this case, one of the participants could not write in the shared workspace. While they tried to address this problem, they did not consider it to be significant; they "accepted" the situation and assigned that participant other tasks during the collaboration.
On the other hand, Team 5 participants mentioned that they enjoyed working together, even though they could not overcome the usability problems and highlighted that one participant's lack of prezi knowledge made the collaboration difficult. Team 1 and Team 5 mentioned that their collaboration process needed some time to "warm up". While Team 1 were eventually able to "warm up", Team 5 did not.
Our quantitative questionnaire data, presented in Table 7, underline this result. We performed a Kruskal-Wallis H test to check whether the teams differed in how they rated their collaboration, and found no statistically significant difference, χ²(6) = 6.55, p = 0.364.
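The reported p-value can be checked against the H statistic. The sketch below is a worked check, not the authors' analysis script. It assumes that the test compared the seven teams as groups, so that the degrees of freedom are the number of groups minus one, and that the p-value comes from the usual chi-square approximation to H. The function name `chi2_sf_even_df` is hypothetical; it uses the closed form of the chi-square survival function, which holds only for even degrees of freedom.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with even df.

    For df = 2m the survival function has the closed form
    exp(-x/2) * sum_{i=0}^{m-1} (x/2)^i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form is valid only for positive even df")
    half = x / 2.0
    term = 1.0    # (x/2)^0 / 0!
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i   # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total


n_teams = 7            # assumption: each team is one group in the test
df = n_teams - 1       # Kruskal-Wallis degrees of freedom: groups minus one
h_statistic = 6.55     # H value reported in the text

p = chi2_sf_even_df(h_statistic, df)
print(f"df = {df}, p = {p:.3f}")
```

Under these assumptions the computed value agrees with the reported p = 0.364, which is well above the conventional 0.05 threshold.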
Team mood had a considerable effect on how teams overcame usability problems or experienced breakdowns. This idea will be discussed in more detail in the next section of the article.

Task division
Six out of the seven teams emphasized that explicit task division (dividing the different parts of the task between team members) helped the collaboration. Since the time limit was 30 min to complete the task, effective task allocation was an important part of the collaborative process. The teams did not receive any assistance with task division. The possibilities of task division support by collaborative software will be discussed later in this paper.
"What facilitated collaboration (if there were such factors)?". "Task division, everyone could edit different parts at the same time…" (Questionnaire quote-Team 3).
This section presented the factors influencing collaboration: team usability problems and contextual factors. We demonstrated how the mechanics of collaboration are connected to team usability problems. The most relevant result is that the contextual factors have a serious effect on how easily teams are able to overcome usability problems.

Results related to the development of Team Usability Testing method
Team Usability Testing explores usability problems of real-time distributed groupware and includes a combination of questionnaires, on-screen behaviour recordings and interviews. The data analysis is based on the mechanics of collaboration theoretical framework and involves communication analysis, behaviour analysis and analysis of post-experiment interviews.

Feasibility of the method
The results of the laboratory experiment confirm that the Team Usability Test is feasible in a common university laboratory setting and permits exploration of team-level usability problems. The mechanics of collaboration (MOC) framework was successfully used to analyse the data. For this specific task and software, only a part of the MOC could be interpreted. The MOC framework can be considered a theoretical set of basic collaborative actions, from which different parts can be applied in different settings.
Organizing the usability test was challenging since three participants needed to be present to conduct the laboratory experiment. Therefore, we invited four participants to every test; when all four attended, one of them was assigned the role of observer. Inviting four participants proved useful, since in two out of seven cases, despite several reminder emails, only three participants attended.

Quality of results explored by the method
Analysing questionnaires, voice communication and post-experiment interviews afforded a comprehensive picture of the collaboration of participants. This type of data helped us to explore team-level usability problems and other factors influencing the collaboration. The added value of on-screen behaviour analysis to the final results was low.
Since on-screen behaviour analysis was the most time-consuming part of the data analysis, we plan to use it differently in the future. Instead of analysing the whole dataset, we will analyse behaviour only when a situation cannot be understood from voice communication alone. However, recording the on-screen behaviour of each participant can still be considered important because it allows problems to be analysed from the participants' different perspectives when necessary.
In conclusion, based on the results of the lab experiment, Team Usability Testing is a feasible method and can be used to explore team usability problems related to the mechanics of collaboration framework.

Discussion and conclusions
To address the limited number of empirical groupware evaluation methods noted in Sect. 1, this paper presented the Team Usability Testing method and summarized our experiences with it. The goal of Team Usability Testing is to explore usability problems of real-time distributed groupware. We demonstrated that our Team Usability Testing method (consisting of questionnaires, on-screen behaviour recordings and post-experiment group interviews) was suitable for exploring team-level usability problems and factors influencing collaboration.
Based on the findings of this study, a relationship can be discerned between team usability problems and contextual factors. Contextual factors have a strong effect on how teams overcome and handle team usability problems and on their evaluation of the collaboration. Contextual factors are also important in relation to the success of the task solution.
As expected, although team-level usability problems occurred in all teams, teams handled them in significantly different ways, depending on the contextual factors. In some teams, when team usability problems occurred, they did not particularly bother the participants and were quickly solved or ignored, without diverting much effort or attention from the collaboration. In other teams, the usability problems caused breakdowns in communication and, regardless of their success in solving the problems, the team could not overcome them satisfactorily.
Usability issues proved to be problematic and caused breakdowns in collaboration in some teams, while in others they had only a minor impact. As a consequence, teams which experienced breakdowns complained frequently about the different features related to the groupware, while other teams were calmer despite experiencing similar problems, solving them differently without complaining as much about the features. For example, situations where a team member was unable to write arose in several teams but the teams' reactions varied widely. One team spent some time trying to solve the problem, then moved on. Another team were unperturbed and did not consider it to be an insurmountable problem; they accepted it and gave different tasks to the non-writing participant. However, in another case, similar issues gave rise to a great deal of frustration as the other two writing participants felt that the non-writing participant had slowed down the completion of the task.
An important finding is that besides the overall usability of the groupware, contextual factors also determine how a team will handle usability problems. This suggests that different features supporting the contextual factors can help teams to overcome usability problems easily and prevent breakdowns. This idea needs to be developed further, because its implications may be important in groupware development.
Summarizing the findings of the Team Usability Test, collaboration was influenced by team usability problems and contextual factors. Team usability problems were connected to, and examined using, the mechanics of collaboration framework. However, only two mechanics ("basic awareness" and "spoken messages") could be connected to the results. We attribute this to the characteristics of this specific task and groupware and to the short time limit; therefore, we plan to continue using this framework in further research.
In summary, considering the limitations described below, we regard the Team Usability Test as a feasible and working method, which is able to explore team-level usability problems of real-time distributed groupware.

Limitations of the Team Usability Test
There appear to be two main limitations related to the Team Usability Testing method, which need to be discussed.
First, according to Schmidt and Bannon (2013), research focusing on real-time distributed groupware with small collaborating groups is limited, because it does not lead to a full understanding of the interdependencies in real-life work settings. While we agree with this statement, we think that usability tests of groupware can be a first step towards achieving a full understanding of the complex real-life teamwork settings of groupware.
Second, while we agree that usability is important, it is not the only metric of the goodness of a piece of software. Kjeldskov and Skov (2014) recommend a broadening of focus from usability evaluations to evaluations of complex digital ecosystems. This is because focusing explicitly on usability may lead to a limited understanding of interaction design. We agree that broadening the focus is important, but suggest that this is a further step. First, routinely applicable groupware evaluation methods are needed. Our Team Usability Testing method is intended to make a contribution to these methods.

Validity and reliability
Investigating the validity and the reliability of Team Usability Testing is the next step of the research.
In the future, it is important to investigate the validity of Team Usability Testing by comparing it to other methods. Our idea is to compare our method to a heuristic evaluation for groupware. As detailed earlier, Baker et al. (2001) developed groupware evaluation heuristics based on the mechanics of collaboration framework. Since both methods rely heavily on the same framework, it would be meaningful to compare them. In this way the validity of Team Usability Testing could be examined.
The reliability of Team Usability Testing is also highly important. Interrater reliability should be measured, which means that at least two other researchers should code the data to test how reliable the coding system is (Fletcher et al. 2011).
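As an illustration of how agreement between coders could be quantified, the sketch below computes Cohen's kappa for a pair of coders. This is an assumption on our part: the text does not name a specific agreement statistic, and Cohen's kappa is only one common choice for two raters (with more coders it would be applied pairwise, or replaced by a multi-rater statistic). The function name `cohens_kappa`, the code labels, and the example codings are all hypothetical and are not data from the experiment.

```python
from collections import Counter


def cohens_kappa(codes_a, codes_b):
    """Cohen's kappa for two coders who labelled the same sequence of items."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("both coders must label the same non-empty set of items")
    n = len(codes_a)
    # Observed agreement: share of items given the same code by both coders.
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    # Chance agreement: derived from each coder's marginal code frequencies.
    freq_a = Counter(codes_a)
    freq_b = Counter(codes_b)
    expected = sum(freq_a[code] * freq_b[code] for code in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used one identical code for every item
    return (observed - expected) / (1.0 - expected)


# Hypothetical codings of six utterances by two coders (illustrative only).
coder_1 = ["awareness", "awareness", "spoken", "spoken", "none", "none"]
coder_2 = ["awareness", "spoken", "spoken", "spoken", "none", "none"]

kappa = cohens_kappa(coder_1, coder_2)
print(f"kappa = {kappa:.2f}")
```

In this made-up example the coders agree on five of six items, and chance agreement is one third, which gives a kappa of 0.75.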

Field study
After the data analysis of the laboratory experiment, the next phase of research is to examine how the Team Usability Testing method works in the field. This would involve performing field studies in software companies developing real-time distributed groupware. This would provide insights into the ways that Team Usability Testing can be used in a software development process.
A field study could be expected to provide a better understanding of the usability needs and problems of teams while collaborating and it would explore the effects and costs of using this method in a company. A field study, then, would allow us to examine the strengths and the weaknesses of the method and also its cost-effectiveness.
Specifying the exact goals before embarking on fieldwork is one of the most important keys to an effective field study. Our goal is "going to the last research mile": a solution not only needs to be developed for an unsolved problem, but it subsequently needs to be tested in a real working environment, evaluated and redesigned if necessary. "The last research mile ends only when practitioners use a solution routinely in the field" (Nunamaker et al. 2015). In our case, this will involve investigating the everyday difficulties of using Team Usability Testing, addressing these problems and redesigning the method if necessary.
Ultimately, our hope is that Team Usability Testing will offer a working solution for evaluating real-time distributed groupware in the everyday working practice of companies.
• What facilitated collaboration (if there were such factors)?
(B) Technical question
• Did you use the mouse or the laptop's touchpad for your work?
(C) Experience
• How much experience do you have organizing task like this?
Post-experiment group interview questions