1 Introduction

A shared understanding of the project vision is paramount to the success of software projects, as its absence can lead to conflicting requirements [1]. Achieving this shared understanding is one of the key challenges in requirements engineering [2]. For this purpose, stakeholders must disclose, discuss, and align their mental models of the intended system [3]. However, stakeholders are often spread across different locations and time zones [4]. In this case, communication is primarily asynchronous, as stakeholders can hardly meet for synchronous in-person or even virtual meetings [5]. One way to achieve a shared understanding in asynchronous communication contexts is to distribute a written specification using standards like ISO/IEC/IEEE 29148:2018 [6]. Nevertheless, reading a written specification can be time-consuming due to its low communication richness and effectiveness [7]. For project visions specifically, a richer and more effective way of achieving a shared understanding is the use of so-called vision videos [8].

Vision videos support the development of a shared understanding, as they provide visual reference points to stimulate active discussions among stakeholders to align their mental models [9]. They are primarily used to support the elicitation, documentation and validation of requirements [8]. Nagel et al. [10] have successfully used vision videos to find misaligned mental models in asynchronous settings. However, simply watching a vision video without the opportunity to discuss its contents makes it harder to resolve misalignments [10]. For this reason, stakeholders need suitable support for their discussions to achieve a shared understanding in asynchronous communication contexts.

The goal of this paper is to develop suitable concepts to support stakeholders in achieving a shared understanding in asynchronous communication contexts. This way, we create an opportunity for the achievement of a shared understanding for stakeholder groups whose circumstances force them to communicate asynchronously.

To reach this goal, we propose five concepts that are designed to solve issues of asynchronous communication extracted from literature. We combine these concepts with vision videos to investigate whether they support stakeholders in achieving a shared understanding. Additionally, we assess the suitability of six categories of asynchronous communication tools, including multimedia platforms and messaging services, for this support. In a workshop, we decided on three representatives of these categories based on their adaptability to our concepts. We also developed a prototype that implements all five concepts to their fullest extent. In an experiment with 30 participants, we evaluated the four software tools and established a baseline. Our results show evidence for the suitability of our concepts. All software tools support the achievement of a shared understanding. In particular, participants supported by our adaptation of the messaging service Discord and our prototype presented a statistically significantly higher level of shared understanding compared to the control group.

The presented paper is an extended version of an earlier work by Nagel et al. [11], which in turn is based on a master’s thesis by Amiri [12]. In this work, we extend their efforts by giving a thorough description of the challenges of asynchronous communication contexts that we extracted from literature. We motivate our concepts with these challenges. Additionally, we provide more detail on the selection process for existing tools, including an overview of categories of online communication tools, a closer description of the conducted workshop and finally the advantages and disadvantages which led to the final selections. Lastly, we include our findings on the social presence experienced by the participants of our experiment in this extension and visualize key elements of our work with further illustrations.

This paper is structured as follows: Sect. 2 discusses related work. We present our research methodology in Sect. 3 and describe our new approach to the support of stakeholders in asynchronous communication contexts in Sect. 4. In Sect. 5, we provide information on our research questions and experiment design, before Sect. 6 presents our selection process of existing tools to include in the experiment. Section 7 details the main experiment, whose results are presented in Sect. 8. Threats to validity are laid out in Sect. 9. Our results are discussed in Sect. 10 before the paper is concluded in Sect. 11.

2 Background and related work

2.1 Shared understanding

Shared understanding is one of the three most important requirements engineering objectives [13]. For effective requirements communication, it is essential to develop and negotiate a shared understanding of the goals, plans, status, and context of a development project among all project partners [3, 14]. Glinz and Fricker [2] discuss the role of shared understanding in software engineering and identify enablers and obstacles. According to their work, achieving shared understanding requires using explicit documentation as far as needed while relying on implicit mental models as far as possible [2]. Therefore, achieving shared understanding requires practices that use suitable communication mechanisms supporting a proactive information exchange (social collaboration) among all project partners. These practices should make mental models tangible by using explicit representations that focus on abstraction and summarization [15].

Braunschweig and Seaman [16] developed a technique to measure the shared understanding achieved by a group of stakeholders using Pathfinder Networks (PFNets). To use this technique, stakeholders fill out a spreadsheet with relatedness ratings of concept pairs. These ratings are then used to create graphs called PFNets as introduced by Dearholt and Schvaneveldt [17]. Shortest paths can be calculated by using the relatedness ratings as edge weights. The PFNets of a pair of stakeholders can be compared by determining the similarity of the neighborhoods of individual concept-nodes. Calculating the average of all concept similarities between two PFNets, a Network Similarity (NetSim) value for a stakeholder pair can be obtained.
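The comparison described above can be illustrated with a minimal sketch. It is not the implementation used by Braunschweig and Seaman [16]; it assumes the common Pathfinder parameters r = ∞ and q = n − 1 (a direct link survives only if no indirect path has a smaller maximum edge weight), a simple conversion of relatedness ratings into distances, and the Jaccard index as the neighborhood similarity. The function names `ratings_to_distances`, `pfnet_edges` and `netsim` are hypothetical.

```python
from itertools import combinations


def ratings_to_distances(ratings, max_rating):
    """Turn relatedness ratings (high = related) into distances (low = related).

    Assumption: distance = max_rating + 1 - rating; the diagonal is 0.
    """
    n = len(ratings)
    return [[0 if i == j else max_rating + 1 - ratings[i][j] for j in range(n)]
            for i in range(n)]


def pfnet_edges(dist):
    """Edge set of a PFNet with r = infinity and q = n - 1 (assumed parameters)."""
    n = len(dist)
    # minimax[i][j]: smallest possible "largest edge weight" on any path i -> j
    minimax = [row[:] for row in dist]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = max(minimax[i][k], minimax[k][j])
                if via_k < minimax[i][j]:
                    minimax[i][j] = via_k
    # Keep a direct link only if no indirect path is better than it.
    return {frozenset((i, j)) for i, j in combinations(range(n), 2)
            if dist[i][j] <= minimax[i][j]}


def neighbors(edges, node):
    """All concepts directly linked to `node`."""
    return {other for edge in edges if node in edge for other in edge if other != node}


def netsim(edges_a, edges_b, n):
    """Average neighborhood similarity (Jaccard, an assumption) over all n concepts."""
    similarities = []
    for node in range(n):
        na, nb = neighbors(edges_a, node), neighbors(edges_b, node)
        union = na | nb
        similarities.append(len(na & nb) / len(union) if union else 1.0)
    return sum(similarities) / n


# Two stakeholders rate three concepts on a 1-5 scale (made-up ratings).
ratings_a = [[0, 5, 1], [5, 0, 5], [1, 5, 0]]
ratings_b = [[0, 5, 5], [5, 0, 1], [5, 1, 0]]
net_a = pfnet_edges(ratings_to_distances(ratings_a, 5))
net_b = pfnet_edges(ratings_to_distances(ratings_b, 5))
print(netsim(net_a, net_b, 3))
```

In this toy example, both networks keep only two of the three possible links, and they agree on one of them, which yields a NetSim value between 0 and 1.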

2.2 Vision videos

One practice to support the achievement of a shared understanding is the use of vision videos presenting the project vision [9]. The term vision video has been defined by Karras et al. [8, p. 2] as a video “that represents a vision or parts of it [(problem, key idea of the solution, improvement of the problem by the solution)] for achieving shared understanding among all parties involved by disclosing, discussing, and aligning their mental models of the future system”. To aid stakeholders in the creation of these vision videos, Schneider et al. [18] introduced the Affordable Video Approach, which looks to minimize the effort required of stakeholders looking to create vision videos. This approach led to the development of a set of guidelines by Karras and Schneider [19] that encompasses the entire video creation process. A further set of recommendations for the use of text in vision videos was created by Nagel et al. [20], who recommend the insertion of texts below the video content to portray specific details that cannot be shown in the video itself.

2.3 Synchronous communication

Several researchers proposed different approaches that focus on the use of vision videos to achieve shared understanding in local and primarily synchronous communication contexts.

Creighton et al. [21] introduced the use of videos to visualize scenarios by presenting workflows that are not yet implemented. This approach combines the produced videos with Unified Modeling Language (UML) diagrams to trace videos and requirements in later development phases. Brill et al. [22] expanded on this idea by investigating potential uses of videos in various phases of requirements engineering. Their experimental results showed that videos help to avoid misunderstandings and clarify requirements better than textual use cases. Pham et al. [23] proposed an interactive storyboard to support requirements engineers in eliciting, validating, and documenting the requirements and visions of stakeholders. The interactive storyboard enables the production of a special kind of video that is enhanced by multimedia technologies such as overlays of hand-drawn sketches. Karras et al. [24] developed an approach to generate videos as a by-product of RE practices. They applied their approach to digital prototyping and demonstrated how interaction sequences on hand-drawn and digitally created mockups can be used to generate videos as additional support for textual scenarios. They found that such videos allow a slightly faster understanding of textual scenarios by developers compared to static mockups. Schneider et al. [18] investigated the use of videos in combination with text to support the elicitation of feedback. They found that the combined use of videos and text resulted in more feedback than the use of either medium alone.

2.4 Asynchronous communication

Besides the approaches focusing on local and more synchronous communication contexts, some researchers already considered the use of vision videos in global and asynchronous communication contexts. The potential use of vision videos on multimedia platforms like YouTube has already been discussed by Schneider and Bertolli [25]. Karras et al. [26] investigated the use of vision videos on social-media platforms for CrowdRE. They analyzed over 4500 comments on a vision video from YouTube and found that these comments provide relevant information in the CrowdRE context. They conclude that vision videos can help to motivate stakeholders to actively participate in a crowd and solicit numerous video comments as a valuable source of feedback. Despite the so far limited use of vision videos in global and asynchronous communication contexts, videos are frequently used in e-learning, especially for asynchronous settings. Skylar [27] investigated the performance of students in synchronous and asynchronous online courses and found both to be effective. Palsole and Awalt [28] transferred team-based learning in a course into an asynchronous online setting. They found that the teams performed well and approximately 90% of all participants completed the course.

2.5 Challenges of asynchronous communication contexts

Asynchronous communication contexts present distinct advantages when compared to their synchronous counterparts. These advantages present opportunities for stakeholders to achieve a shared understanding by participating in the discussion on their own time. However, the use of asynchronous communication contexts also introduces a number of drawbacks that could hamper the accomplishment of this task. We examined existing literature on asynchronous communication to find the challenges that should be taken into account.

One challenge that affects asynchronous discussions differently than synchronous ones is the differing level of domain knowledge among stakeholders [2, 10]. Some stakeholders might have a false understanding of a piece of domain knowledge and may only realize their misunderstanding after discussions with their peers. In synchronous meetings, such situations can quickly be resolved through discussion since all relevant stakeholders are present and can provide their expertise on the domain. In asynchronous communication contexts, stakeholders have to wait an uncertain amount of time before receiving an answer to a question regarding the domain. In other cases, their misunderstanding might remain unnoticed since it cannot be guaranteed that all stakeholders read all messages.

Another challenge that is more prevalent in asynchronous communication contexts is the fact that misunderstandings are harder to detect and resolve [10, 29, 30]. As with differing levels of domain knowledge, misunderstandings can be detected and resolved more quickly in synchronous discussions. Additionally, asynchronous discussions lack key communication elements like facial expressions, gestures and tone of speech, all of which can reduce the likelihood of misunderstandings. Without these cues, misunderstandings are more likely to occur in asynchronous communication contexts.

Discussions of vision videos could potentially revolve around multiple different details of the project vision. This diversity in discussion topics is especially impactful in asynchronous communication contexts, as multiple different topics might be discussed in the same chain of messages by different stakeholders. An abstract illustration of this challenge is presented in Fig. 1. In such cases, there is no sequential ordering of messages as messages concerning different topics are interwoven in the message chain [29]. This interwoven nature can make it hard for stakeholders to follow a topic of interest in the asynchronous communication.

Fig. 1 Abstract illustration of the missing sequential ordering of messages. A stakeholder wanting to gather a complete picture of a specific topic needs to filter all messages for the relevant comments while keeping earlier messages in mind

Asynchronous communication contexts also present a higher risk for valuable ideas being missed [10, 29]. Requirements engineers cannot guarantee that all stakeholders read all messages written by their peers. This could lead to valuable ideas or important misunderstandings remaining undetected. For example, a stakeholder could raise a valid concern regarding a part of the implementation of the project vision while their peers are discussing other details presented in the vision video. If their concern is not recognized and discussed by other stakeholders, it might only be detected during the implementation process when the necessary changes are costly.

In synchronous meetings, a moderator is present who guides the discussion through the relevant topics. The moderator can also make sure that all meeting participants are attentive and actively take part in the discussion. This is not given in asynchronous communication contexts. Therefore, there is a higher chance of so-called Free-Riders not properly participating in the discussion and taking on a passive role [31]. This could once again lead to valuable ideas or concerns being missed as stakeholders who might have pointed them out do not participate in the discussion properly.

Furthermore, it can be hard to find an end point of a discussion in asynchronous communication [31]. All stakeholders can participate in the asynchronous discussion when their schedule allows them to do so. This can lead to some stakeholders giving their thoughts at a much later time than their peers. Requirements engineers holding asynchronous discussions are therefore faced with the challenge of finding a suitable end point of the discussion at which a final conclusion can be reached for a certain topic. Moreover, it can be difficult to gauge how aligned the opinions of stakeholders are on a certain topic only based on messages.

Stakeholders participating in asynchronous discussions according to their own schedule can also lead to vastly differing times at which they perform the different steps asked of them [32, 33]. For example, stakeholders participating at a later time than their peers will find an already progressed discussion. This could tempt them to give their thoughts on individual discussion topics before watching the vision video. This way, the missing coordination of steps that would be enforced by a moderator in synchronous discussions could lead to further misunderstandings.

Lastly, asynchronous communication contexts also lessen the opportunities for discussion participants to feel socially present [34, 35]. A discussant’s social presence is defined as their ability to project themselves socially and emotionally and thereby to feel recognized as a real person in media-supported communication. Stakeholders experiencing a lower social presence are less inclined to answer questions raised by their peers and therefore provide less feedback that might be valuable to the project.

The presented challenges are valid concerns that diminish the suitability of asynchronous communication contexts to the task of achieving a shared understanding of a project vision in a group of stakeholders. However, the opportunities introduced by the asynchronicity cannot be dismissed. Therefore, the research presented in this paper looks to minimize the impact of the presented challenges to support asynchronous discussions between stakeholders as much as possible. To the best of our knowledge, our work is the first attempt to achieve a shared understanding in asynchronous communication contexts that explicitly makes use of vision videos and examines the suitability of existing online communication tools to this task. Using this novel basis, we hope to turn asynchronicity into an opportunity for groups of stakeholders who cannot meet synchronously.

3 Research methodology

In this paper, we look to support stakeholders in achieving a shared understanding in asynchronous communication contexts in two ways.

First, we examine the challenges of asynchronous communication extracted from literature. Through this examination, concepts can be created that either eliminate the sources of these challenges or reduce their impact. Suitable concepts supporting the asynchronous communication on project visions can make such contexts accessible to more groups of stakeholders. This may allow new stakeholders to more easily contribute to the discussion leading to the inclusion of new insights.

Second, we examine the different categories of existing online communication tools. Finding a suitable tool that stakeholders are already familiar with further reduces their barrier to entry. Such a reduction not only enables more stakeholders to take part in the discussion but also frees mental capacity that stakeholders can use for their participation instead of having to familiarize themselves with a new tool. Therefore, we investigate the different categories and also perform a preselection of potentially suitable tools. This preselection is then used as the base set for a candidate workshop in which we look to identify the three most suitable candidates.

In our main experiment, we look to combine these three candidates with our concepts to gather insights on their suitability to our research goal. We also test a prototype that represents the full implementation of our concepts. This way, we collect evidence for the suitability of our concepts and are also able to determine the best suited existing online communication tool as well as the size of the differences between the candidates.

An overview of our research methodology can be found in Fig. 2.

Fig. 2 Overview of our research methodology

4 A new approach to support shared understanding in asynchronicity

In order to reach our research goal, we look to create concepts that lessen the impact of the common challenges of asynchronous communication we extracted from literature.

Questions of Understanding We adopt the concept of Questions of Understanding from related work by Nagel et al. [36]. These questions ensure that all stakeholders understand the presented content of an artifact correctly and clarify domain-specific terminologies. Differing from prior research, we propose to force stakeholders to answer Questions of Understanding before being allowed to take part in a discussion. In this way, we can ensure that all discussion members have a basic understanding of the presented content.

The inclusion of Questions of Understanding can unveil differing levels of domain knowledge. Stakeholders who otherwise might not have questioned their domain knowledge can resolve these disparities instead of introducing them to the discussion. Additionally, they also receive feedback on their answers by the tool including an indication of which option would have been correct. This might resolve their differing domain knowledge immediately. In a similar vein, our concept can unveil and resolve misunderstandings that could have been left undetected.

In order to include Questions of Understanding in their asynchronous communication context, requirements engineers need to examine the vision video in use to identify terms or pieces of domain knowledge that might not be trivial to all stakeholders. For this purpose, they can draw on misunderstandings and points of contention unveiled during the earliest phases of stakeholder communication, which led to the creation of the vision video. This way, Questions of Understanding can be created with limited effort and can reduce the stakeholder effort spent on clearing up disparities in domain knowledge.

Requirements Engineers as Facilitators Synchronous meetings are often held under the guidance of a moderator who guides the participants [37, 38]. A traditional moderator role cannot be present in asynchronous communication. However, the active and collaborative participation of all stakeholders, that can be motivated by a moderator [37], is still vital for achieving a shared understanding [39]. We therefore propose to have requirements engineers play a facilitating role in asynchronous communication. This can be done by providing some initial questions or reacting to comments made by stakeholders to motivate them to participate even more. However, requirements engineers should remain neutral in discussions so that stakeholders can reach final conclusions on their own.

Employing requirements engineers as facilitators for asynchronous discussions can lead to the detection of valuable ideas that might otherwise have been missed. Requirements engineers can point stakeholders to a valuable idea and ask them to discuss it explicitly. Furthermore, this role allows requirements engineers to detect potential Free-Riders, whom they can then contact privately to encourage them to participate. In terms of the effort necessary to employ this concept, requirements engineers should monitor the entire discussion anyway, instead of simply waiting for final results, in order to detect their own misconceptions. Therefore, a more active facilitation of the discussants requires only a moderate increase in their effort.

Message Frames A logical and sequential ordering of individual sentences is important to enable humans to reach conclusions from conversations [29]. Our concept of Message Frames looks to apply this idea to asynchronous communication, where such sequential orderings are hard to follow [29]. In such contexts, the order of messages does not necessarily follow the order of discussion topics. Stakeholders can start a topic and return to the discussion after other stakeholders have commented with ideas on other topics. When messages regarding the same topic are located in widely different positions in the ordering of messages, it is hard for stakeholders to follow a discussion [29]. This challenge is especially prevalent when the number of discussants and messages increases. Message Frames combat this challenge by summarizing comments dealing with the same topic in a logical order. They are summaries of the discussion topics created by requirements engineers. Multiple sets of Message Frames can be created at different points in time to restore the logical order of messages not only within a summary but also between multiple iterations of a discussion. For example, a requirements engineer could summarize all comments regarding the topic of “security” in one Message Frame. This makes it easier for stakeholders to finalize their thoughts on any given topic. Message Frames can thereby lead to more explicit shared understanding. An abstract illustration of this process is visualized in Fig. 3.

Fig. 3 Abstract illustration of the creation process of message frames. A requirements engineer summarizes the messages written by the discussants and creates a message frame for each topic

In addition to the creation of a more explicit shared understanding, Message Frames can also minimize the impact of multiple other challenges of asynchronous communication contexts. One such challenge is the potential of misunderstandings occurring. By having requirements engineers summarize the contents of a discussion and asking the discussants for feedback on these summaries, misunderstandings can be unveiled and resolved. Additionally, the summaries might point stakeholders to valuable ideas they might have missed. Asking stakeholders to give feedback on a summary containing an idea they had previously missed can allow them to discuss the idea with their peers. Lastly, Message Frames can solve the challenge of a missing sequential ordering of messages. In the case of a lengthy and convoluted discussion of different topics, the creation of multiple Message Frames at different points in time can enable stakeholders to keep an overview. In this way, stakeholders can recognize changes in the opinions of the discussants over time, as each Message Frame consists of the summary of each discussion topic up to a specific point. Therefore, the sequential ordering of messages is restored on the summary level.
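The bookkeeping behind Message Frames can be sketched as a small data structure. This is only an illustration under assumptions: the topic label of each message is assumed to be assigned manually by the requirements engineer, timestamps are plain integers, and the names `Message` and `build_message_frames` are hypothetical. The sketch only groups and orders the messages; the summary text of each frame would still be written by a requirements engineer.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    timestamp: int  # e.g. hours since the discussion started (assumption)
    author: str
    topic: str      # label assigned manually by the requirements engineer (assumption)
    text: str


def build_message_frames(messages, cutoff):
    """Group all messages up to `cutoff` by topic, each group in chronological order."""
    frames = defaultdict(list)
    for message in sorted(messages, key=lambda m: m.timestamp):
        if message.timestamp <= cutoff:
            frames[message.topic].append(message)
    return dict(frames)


# An interwoven message chain: "security" and "login" alternate.
chain = [
    Message(1, "A", "security", "Is the data encrypted?"),
    Message(2, "B", "login", "I would prefer single sign-on."),
    Message(3, "C", "security", "Encryption at rest is required."),
    Message(5, "A", "login", "Single sign-on works for me."),
]

# Two sets of frames created at different points in time.
first_iteration = build_message_frames(chain, cutoff=3)
second_iteration = build_message_frames(chain, cutoff=5)
print([m.text for m in second_iteration["login"]])
```

Creating frames for several cutoffs mirrors the idea of multiple sets of Message Frames: comparing `first_iteration` with `second_iteration` shows how the discussion of a topic developed between two iterations.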

Message Frames are by far the most cost-intensive of our concepts. Requirements engineers looking to create Message Frames need to read the full discussion, identify the different topics and summarize the relevant comments. This is likely to take considerable time and effort; however, the concept offers important benefits in return.

Polls Polling is one possibility to reach definitive conclusions at the end of a discussion [31]. Polls can turn implicit shared understanding into explicit shared understanding [2]. We recommend using the Paraphrasing Method [2] to create the polling questions. By paraphrasing the comments made by the participants and asking for their feedback before enabling the polls, requirements engineers can ensure that there are no misunderstandings [2]. Additionally, we propose that stakeholders should be encouraged to suggest additional polling questions themselves. This allows them to directly ask their peers about unresolved uncertainties. A potential side benefit of the use of polls is that they can also be used to gather an initial indication of a group’s level of shared understanding. Groups of stakeholders giving the same answer to a polling question are likely to have a higher level of shared understanding than other groups giving more diverse answers.
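As a rough illustration of how poll answers could give an initial indication of agreement, the following sketch computes the share of participants who chose the most frequent answer. This is an assumed, deliberately crude indicator and not the measure of shared understanding used in this paper; the function name `poll_agreement` is hypothetical.

```python
from collections import Counter


def poll_agreement(answers):
    """Share of participants who gave the most frequent answer (1.0 = unanimous)."""
    if not answers:
        raise ValueError("a poll needs at least one answer")
    most_common_count = Counter(answers).most_common(1)[0][1]
    return most_common_count / len(answers)


unanimous_group = ["yes", "yes", "yes", "yes"]
divided_group = ["yes", "yes", "no", "unsure"]
print(poll_agreement(unanimous_group), poll_agreement(divided_group))
```

A group whose value is close to 1.0 gave largely the same answer, whereas lower values point to more diverse answers that may be worth discussing further.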

The creation of polls can lead to stakeholders detecting valuable ideas they would have missed if no poll was created for the idea. Additionally, the polls can be used to reach final conclusions. If a decision is required, polls can be created to gauge the opinion of the group of stakeholders. The creation of these polls should not take too much time as a requirements engineer monitoring an asynchronous discussion of a vision video can identify points of contention based on the amount of stakeholder comments that disagree with one another.

Step-by-Step Design Another drawback of asynchronous communication is the difficulty of coordinating the stakeholders [32]. Important steps could be performed in different orders, thereby creating a chasm between individual knowledge bases. Providing an explicit process is one way to counteract this phenomenon [32]. Therefore, we propose an enforcement of such a process. At first, stakeholders are only allowed to familiarize themselves with the content of the presented artifact. Their next step is to answer Questions of Understanding, thereby ensuring that they have a common knowledge base. Stakeholders are only allowed to contribute to the discussion once they have answered all Questions of Understanding correctly. Furthermore, our concepts also include fixed time frames for the existing steps. One task of moderators in synchronous meetings is to lead participants through the phases of the agenda within a given time [37]. We incorporate this aspect by providing fixed time frames for each step of the process. Stakeholders are thereby kept from delaying their participation. Simultaneously, the fixed time frames also provide requirements engineers with a concrete time at which feedback regarding the presented content will be available. This way, our concept of a Step-by-Step Design introduces a coordination of steps that was previously missing.
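The enforced order of steps can be sketched as a small state machine. The sketch is an assumption-laden illustration rather than a description of our prototype: the class `StakeholderSession`, its method names, the representation of answers as dictionaries and the time window given in hours are all hypothetical.

```python
from enum import Enum


class Step(Enum):
    WATCH_VIDEO = 1
    QUESTIONS_OF_UNDERSTANDING = 2
    DISCUSSION = 3


class StakeholderSession:
    """Tracks which step a single stakeholder has reached (hypothetical design)."""

    def __init__(self, correct_answers, discussion_window):
        self.correct_answers = correct_answers      # question id -> correct option
        self.discussion_window = discussion_window  # (start, end), e.g. in hours
        self.step = Step.WATCH_VIDEO

    def finish_video(self):
        if self.step is Step.WATCH_VIDEO:
            self.step = Step.QUESTIONS_OF_UNDERSTANDING

    def submit_answers(self, answers):
        """Return the ids of wrongly answered questions; unlock the discussion if none."""
        if self.step is not Step.QUESTIONS_OF_UNDERSTANDING:
            raise RuntimeError("answers are only accepted during the question step")
        wrong = [question for question, option in self.correct_answers.items()
                 if answers.get(question) != option]
        if not wrong:
            self.step = Step.DISCUSSION
        return wrong

    def may_post(self, now):
        """Posting requires the discussion step and the fixed time frame."""
        start, end = self.discussion_window
        return self.step is Step.DISCUSSION and start <= now <= end


session = StakeholderSession({"q1": "b", "q2": "a"}, discussion_window=(24, 72))
session.finish_video()
print(session.submit_answers({"q1": "b", "q2": "c"}))  # feedback on wrong answers
print(session.submit_answers({"q1": "b", "q2": "a"}))  # all correct, discussion unlocked
print(session.may_post(now=30))
```

Returning the wrongly answered questions reflects the immediate feedback that stakeholders receive on their answers, while `may_post` combines the two gates described above: the completed Questions of Understanding and the fixed time frame of the discussion step.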

The inclusion of this concept in an asynchronous communication context requires careful planning of which steps the discussants should perform in which order. During the discussion, the effort for requirements engineers largely depends on how strictly they want to enforce the order of steps. A stricter enforcement will lead to more effort being required than in a more relaxed discussion environment.

The presented concepts are created to minimize the impact of the challenges of asynchronous communication contexts we found in the literature. Table 1 presents an overview of which concepts are designed to solve which challenges.

Table 1 Overview of our concepts and the challenges they look to solve

5 Empirical evaluation

We conducted a user study to evaluate the suitability of our concepts to our research goal of developing suitable concepts to support stakeholders in achieving a shared understanding in asynchronous communication contexts.

5.1 Research questions

To reach this goal, we seek to answer the following two research questions:

[Figure: first research question]

The first research question focuses on the evaluation of the concepts introduced in this paper. Our concepts are designed to help stakeholders overcome various challenges of asynchronous communication contexts. Answering this research question provides insights on whether or not our concepts are successful in this task.

[Figure: second research question]

The second research question regards the examination of existing online communication tools. By answering this research question, we hope to find a suitable tool that can be adapted to our concepts. Such a tool could support various groups of stakeholders with the necessary means to achieve a shared understanding in asynchronous communication contexts without having to familiarize themselves with any specialized software.

5.2 Experiment design

We carefully designed our empirical evaluation to minimize the presence and impact of potential threats to validity. One major design decision was to conduct the evaluation as a between-subjects study. A within-subjects design would have introduced a number of threats, such as a learning effect between iterations of the discussion and a higher mental strain on our participants. Since the forced asynchronicity of our experiment resulted in a rather lengthy schedule for our participants, we looked to avoid any other influences that could have introduced further mental strain.

A second important design decision was the inclusion of multiple treatment groups in our experiment. We decided to select a number of already existing online communication tools to evaluate their suitability to the support of stakeholders communicating asynchronously. For each of these existing tools, we examined their adaptability to our concepts. Each tool was adapted as closely to our intended concepts as possible. Additionally, we created a prototype of our own to obtain results on the value of a full implementation of all of our concepts.

Moreover, we decided to evaluate the shared understanding of a control group as a point of comparison for the results of our treatment groups. The participants of the control group simply watched the vision video without any means of communication. This way, we were able to isolate the impact of the vision video for this group, thereby establishing a baseline for the level of shared understanding that emerges from the video itself.

A further design decision was the scheduling of appointments with our participants to ensure a strictly asynchronous discussion. All participants were asked to engage in the discussions at multiple specific points in time over the course of a week. We made sure that the schedules of different participants did not overlap at any point during the experiment. This way, our results were guaranteed to not be influenced by incidental occurrences of synchronous communication.
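The non-overlap requirement described above can be checked mechanically. The following is a minimal sketch and not the scheduling procedure used in the study: the participant labels, the date, and the one-hour slot length are illustrative assumptions.

```python
from datetime import datetime, timedelta
from itertools import combinations


def overlaps(slot_a, slot_b):
    """Return True if two half-open (start, end) slots share any time."""
    return slot_a[0] < slot_b[1] and slot_b[0] < slot_a[1]


def find_conflicts(slots):
    """Return all pairs of participant ids whose slots overlap."""
    return [
        (p, q)
        for (p, slot_p), (q, slot_q) in combinations(sorted(slots.items()), 2)
        if overlaps(slot_p, slot_q)
    ]


# Hypothetical schedule: six consecutive one-hour slots on an arbitrary day.
day = datetime(2022, 1, 10, 9, 0)
slots = {
    f"P{i + 1}": (day + timedelta(hours=i), day + timedelta(hours=i + 1))
    for i in range(6)
}
conflicts = find_conflicts(slots)  # empty when the schedule is strictly asynchronous
```

Treating slots as half-open intervals means that one participant may start at the exact moment another one ends without this counting as an overlap.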

6 Selection of tools for the experiment

The concepts presented in this paper are novel approaches to diminish the impact of the challenges of asynchronous communication settings. In order to gauge their suitability for this goal, we implemented them in a prototype. Additionally, we decided to investigate the applicability of our concepts to existing online communication tools. By including multiple existing tools in our research, we can examine both the impact of our concepts and the suitability of existing communication tools. We aim to collect evidence to form a recommendation of which existing tool is best suited to supporting stakeholders communicating asynchronously.

There are many online communication tools that could potentially be suited to the support of stakeholders looking to achieve a shared understanding. We decided to narrow down the selection of available online communication tools through multiple steps. First, we examined which categories of online communication tools are available. Then, we examined the categories based on their suitability to the discussion of vision videos by a group of stakeholders. This suitability was determined by investigating the categories on whether they enable a dialogue or simply broadcast information [40]. Only tools that allow a dialogue between stakeholders are suited for our research. Additionally, it is important for the tools to offer means to display the vision video as we are looking for stand-alone communication tools to reduce the amount of software stakeholders have to familiarize themselves with. Once we evaluated the categories and selected representative candidates, we presented them in a small workshop to three participants. In this workshop, the candidates were discussed and a list of the top three candidates was created. These top three candidates were then adapted to our concepts and evaluated in our study.

6.1 Categories of online communication tools

The initial step of our search for existing online communication tools suited to our research goal was to gather an overview of the available tool categories. We based this search on existing literature.

One category of online communication tools is the category of multimedia-platforms [41]. These platforms offer large amounts of video, photo or audio files that can be streamed to the user’s device on demand. Multimedia-platforms are suitable candidates for the support of online discussions of vision videos between stakeholders since they offer means to share the vision video and enable multiple users to communicate on the same web page. Examples of tools in this category include Instagram, Facebook and YouTube.

Another category is represented by instant-messaging services [41]. They enable their users to connect with one another asynchronously by transmitting messages. These messages can also consist of video, photo or audio files. The services also allow for the creation of groups in which users can communicate with multiple other users at the same time. Therefore, they fulfill both criteria for suitability by allowing a dialogue in a group and an easy sharing of the vision video. Exemplary tools for this category include Discord, Skype and WhatsApp.

Wiki systems are a further category to be recognized [41]. They consist of a system of hypertext-based web pages that users can view and edit in a web browser. Some of these wiki systems offer functionality to include videos on their web pages. Comment systems can also be included. Wiki systems that contain both of these aspects, like Confluence, are suitable candidates for the support of stakeholders discussing a vision video since stakeholders can discuss the video in the comment section of the web page that the video is displayed on.

Weblogs are another category of online communication tools [41]. Using weblogs, users can create and manage publicized web pages. The pages allow for the inclusion of video content and can also have a comment section. However, we decided against including a candidate of the weblog category in our workshop, as the functionality of a comment section and video presentation on a web page was too similar to the more popular category of wiki systems. As of this paper’s publishing, Confluence is used by over 235,000 companies (Footnote 1).

A further category of online communication tools is represented by E-Mail based tools. E-Mails are the most important computer-mediated communication tool and one which any given group of stakeholders can reasonably be assumed to be familiar with [42]. For use with vision videos, video files can theoretically be sent as E-Mail attachments. However, conversations between groups are hard to follow when messages are sent in quick succession. Communication partners might be in the middle of writing a response while receiving a new message and therefore miss the incoming content before sending out their own. Therefore, we excluded the category from further examination.

Lastly, the category of ticketing systems is available for organizations that look to communicate only with an internal group of stakeholders. These systems benefit from a high degree of clarity for large amounts of information, as tickets are structured within the system [43]. Videos can be attached to tickets, which can also offer a commenting system. Additionally, different tickets can theoretically be used to organize diverse topics of discussion. The example of a ticketing system examined in our workshop was JIRA.

6.2 Selection of candidates

Based on the categories of online communication tools, we conducted a workshop in which three participants discussed the potentially suitable candidates. All participants were currently enrolled in a master’s degree program at a German university (Footnote 2). The workshop consisted of a brief introduction to shared understanding and asynchronous communication, before each participant was asked to explain their thoughts regarding the advantages and disadvantages of each candidate. Ultimately, we asked participants to decide on the top three candidates they thought should be investigated further in the main experiment of our research.

A total of eight different online communication tools were discussed in the workshop, namely the three multimedia platforms of Instagram, Facebook and YouTube, the three instant-messaging services Discord, Skype and WhatsApp, the wiki system Confluence and finally the ticketing system JIRA. Our participants agreed on YouTube, Discord and Confluence as the three most suited candidates for the main experiment. In the following, we present the advantages and disadvantages of these three candidates.

6.2.1 YouTube

One of the main advantages of YouTube as rated by our participants is the easy-to-use interface. The popularity of YouTube further amplifies its suitability. With over 2.1 billion worldwide users (Footnote 3), it is likely that most stakeholders are familiar with the platform. YouTube’s comment section is organized in an uncomplicated structure, as comment chains only consist of one parent comment and replies that are situated underneath. This makes it easier for stakeholders to figure out which comment chain is of relevance to them. Additionally, users can reference other discussants in their comments. Other advantages of YouTube can be found in its Analytics service, which requirements engineers can use to ensure that all stakeholders have watched the video, and in the variable video quality, which can enable stakeholders with slower internet connections to watch the video.

Fig. 4 Exemplary screenshot of a YouTube page. While the left side of the screen is dedicated to the video, the right side might distract stakeholders from the relevant content

As for disadvantages of YouTube, it is important to recognize that a sizeable part of the available screen space is used to recommend other videos to the user, as is marked in Fig. 4. This could distract stakeholders who are more interested in the recommended videos than the comments of the vision video. An additional disadvantage is the fact that the platform does not provide built-in functionality to hold polls. While the Like and Dislike functionality of comments can be used to prioritize them, it is not suited for polls, which would be limited to yes or no questions. YouTube does not display the exact number of votes for each option and instead only presents an overall score. Furthermore, Questions of Understanding can only be asked in the description of a video and might therefore be ignored by stakeholders. YouTube offers no functionality barring users from commenting before having watched the video. Third party tools are required for Polls and the answering of Questions of Understanding.

6.2.2 Discord

Requirements engineers can create Discord servers for free at their leisure. These servers consist of text and voice channels only available to invited users. Voice channels can be joined for conference calls. Text channels offer functionality to write messages, upload files, and embed images. Users can reference other messages or other users. Discord also offers the possibility of including various bots that can extend its functionality. For example, Polls could be implemented in Discord using a bot. Additionally, Discord enables administrators to create different channels to distribute the topics of the discussion among multiple threads. Such a distribution keeps a general channel from being cluttered with too many discussion topics. Another advantage that separates Discord from other instant-messaging services is the ability for administrators to pin messages to the top of the screen. This way, stakeholders can easily find important messages like general announcements.

While Discord is a popular instant-messaging service, with 140 million monthly active users in 2021 (Footnote 4) it is not quite as popular as YouTube. Therefore, some stakeholders might be unfamiliar with the interface and the larger amount of communication-related functionality available to individual users. This disadvantage is emphasized further by the inclusion of bots within a server, as each bot has its own interaction techniques, some of which might seem like cryptic commands to unfamiliar users. Lastly, Discord offers no direct functionality to automatically enforce the Step-By-Step Design. A manual enforcement is theoretically possible by using Discord’s permission system. For example, requirements engineers could create one role for stakeholders who are yet to answer the Questions of Understanding and therefore do not have access to the discussion channel. A second role with access to the discussion could then be assigned to stakeholders who have provided correct answers. However, such an approach would require a large amount of effort from the requirements engineers moderating the discussion. A manual enforcement of the Step-By-Step Design could also hinder stakeholders using Discord, as they would have to wait for the requirements engineer to assign them a new role with the rights to view the discussion channel once they complete the Questions of Understanding.

6.2.3 Confluence

The participants of the workshop rated Confluence’s user interface as particularly user friendly. Similarly to Discord, various plugins and bots can be used to extend the functionality of Confluence. The wiki system also offers a larger variety of text editing functions to design and structure comments. Another advantage of Confluence is the intuitive page system. Different pages for each of the steps of the Step-By-Step design can be created. While the concept cannot be enforced this way, it encourages stakeholders to follow the steps in the desired order.

Disadvantages of Confluence include the dependence on third party plugins or bots to hold Polls. In general, Confluence is not designed as a media platform, which introduces unnecessary functionality. For example, the initial page seen by participants of our experiment included further menus and options that could potentially confuse stakeholders. However, once participants selected the first page of the experiment, the user interface reverted to a more minimalistic and therefore beginner friendly design.

6.3 Adapting the selected tools

We assessed the adaptability of existing software tools for asynchronous communication, as preexisting familiarity with these tools could reduce the barrier of entry for stakeholders. Table 2 presents an overview of the concepts and the manner in which they were implemented for each tool including a new prototype we developed. The following paragraphs present the implementation of basic video playing and commenting functionality as well as the implementation of our concepts Questions of Understanding, Polls and Step-By-Step Design. The concepts Requirements Engineers as Facilitators and Message Frames were not implemented as technical adaptations of the tools, but as manual tasks of the requirements engineer’s role.

Table 2 Overview of the applicability of our concepts to each tool. Applicability: \(\checkmark\) fully, \(\bigcirc\) partially, and only manually. * For YouTube, Polls had to be applied using a third party tool

6.3.1 YouTube

YouTube provides built-in functionality for the presentation of video content. The multimedia-platform offers a comment system which provides functionality to answer previously made comments and to reference other users. YouTube also includes a description section in which more context can be given. This description section can be used to outline the order of steps and the Questions of Understanding. There was no way to enforce the Step-By-Step Design or to hold Polls. While the Like and Dislike functionality of comments could be used, YouTube does not display the exact votes. Using these functions would also limit polls to yes or no questions. Third party tools are required for other Polls and for the answering of Questions of Understanding.

6.3.2 Confluence

Confluence includes functionality to organize knowledge on pages and a comment system. Videos can be embedded directly on these pages, while the comment section can be used to hold the discussions between stakeholders. One page can be created to view the video, one page to answer Questions of Understanding, one page for the comment section, and one final page for polling questions. In this way, the Step-By-Step Design can be implemented partially, as the order of steps cannot be enforced. There also is no built-in functionality for Polls. Instead, a suite of plugins is available within Atlassian’s marketplace. Requirements engineers can use the space on the individual pages to include texts explaining the step itself and what actions are expected of stakeholders.

6.3.3 Discord

In Discord, videos can be shared within a text channel. To keep the message containing the vision video easily available at all times, we used Discord’s pin function to pin it to the top of the screen. The same text channel can be used for discussions between stakeholders. Additionally, the threads function can be used to create new environments for especially important topics or to implement our concept of Questions of Understanding, which were asked in a separate thread. For the Step-By-Step Design, a message can be pinned detailing the order of steps. However, the compliance with this order cannot be enforced. Discord also does not offer built-in functionality for polling. For this reason, free plugins must be used to enable our concept of Polls.

6.4 Building a dedicated prototype

The existing tools evaluated in this paper offer functionality suited to some of our concepts. However, none of them could be adapted to include all concepts to their full extent. For this reason, we developed a prototype that implements all five concepts. For the purposes of this paper, we call this new prototype Vision Video Platform for Asynchronous Discussions or ViViPAD for short.

Fig. 5 Screenshot of the ViViPAD prototype presenting the Experiment Procedure page

ViViPAD was implemented as a single page application, a screenshot of which can be found in Fig. 5. ViViPAD always displays the vision video at the top of the screen (1). This means that participants have access to the main medium being discussed at all times. Stakeholders can click through the pages of the prototype (2), which represent the Step-By-Step Design. Some pages only unlock after performing prior steps, thereby making sure that users of our prototype perform the steps in the correct order without for example skipping a question. The main area of ViViPAD displays the selected page’s content (3). This content was either the set of Questions of Understanding that needed to be answered before being able to participate in the discussion, or the discussion itself. As ViViPAD was specifically developed for use in our evaluation, we also included a page detailing the procedure of our experiment. Lastly, the polls created by the requirements engineer could be accessed as the final page. When providing new comments, stakeholders are required to give a headline to assist requirements engineers in the creation of Message Frames.
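The unlocking behaviour described above can be modelled as a small prerequisite check. The following is a minimal sketch and not the ViViPAD implementation: the page names, their prerequisites, and the `Session` class are hypothetical and only illustrate how an enforced Step-By-Step Design might be expressed.

```python
from dataclasses import dataclass, field

# Hypothetical prerequisites: a page unlocks once all listed steps are completed.
# Which pages ViViPAD actually gates is not specified in detail in the text.
PREREQUISITES = {
    "procedure": set(),
    "questions": {"procedure"},
    "discussion": {"questions"},
    "polls": {"discussion"},
}


@dataclass
class Session:
    """Tracks which steps a single stakeholder has completed so far."""

    completed: set = field(default_factory=set)

    def is_unlocked(self, page):
        """A page is unlocked when all of its prerequisites are completed."""
        return PREREQUISITES[page] <= self.completed

    def complete(self, page):
        """Mark a page as completed, refusing pages that are still locked."""
        if not self.is_unlocked(page):
            raise PermissionError(f"page '{page}' is still locked")
        self.completed.add(page)


session = Session()
session.complete("procedure")
session.complete("questions")  # answering the Questions of Understanding
can_discuss = session.is_unlocked("discussion")
```

Keeping the prerequisites in one table makes the order of steps explicit, which is the property that could only be encouraged, but not enforced, in the adapted tools.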

7 Main experiment

Our main experiment was conducted to evaluate the benefits and drawbacks of our adaptations of the existing online communication tools and ViViPAD. This way, we look to obtain evidence regarding the suitability of our concepts.

7.1 Material

To facilitate a discussion with the prototypical implementations of our concepts outlined in Sect. 6, we made use of a vision video (Footnote 5) published by Hyundai on YouTube as the basis for all discussion topics. The video presented a vision on the future of urban mobility with autonomously moving vehicles and hubs for these vehicles to converge to. We chose to use this video for our study as it is easy to grasp and relevant to all participants of modern traffic. This way, the usage of this particular video enables a large population of potential participants to easily relate to the role of a stakeholder.

We handed out the credentials of newly created e-mail addresses and user accounts for the adapted tools to our participants in order to provide uninhibited access to the tools. This also preserved the privacy of their personal accounts and lowered the barrier of entry. Additionally, we provided them with a link to an online spreadsheet editor, in which a spreadsheet was already prepared for each user. This spreadsheet was filled out by participants to allow the use of the PFNets method laid out in Sect. 2 and is available on Zenodo [44]. A link to an online questionnaire was also distributed at the end of the study. This questionnaire consisted of questions on the suitability of asynchronous communication for the support of a shared understanding and participants’ preference between synchronous and asynchronous communication contexts. Lastly, we also asked our participants to answer the social presence dimension of the community of inquiry questionnaire introduced by Garrison and Arbaugh [34], which was validated in studies by Arbaugh et al. [45] and Swan et al. [46].

7.2 Participant selection

We used convenience sampling to recruit the participants for our study. Participation was not mandatory. A total of 30 participants took part in the study. All participants were active university students in Germany (Footnote 6). Our only requirement for our participant selection was a functioning computer on which to watch the vision video, answer the questionnaire and fill out the PFNets spreadsheet. Based on the contents of the vision video by Hyundai, we were looking to include potential stakeholders for the topic of future mobility. Therefore, anyone participating in modern traffic is a viable participant.

7.3 Experiment procedure

The study was conducted online over a total of five days, with each group participating on a single day. Participants were assigned to groups based on personal availability. Our only influence on these assignments was limited to the selection of time slots for participants whose availability was suited to multiple groups. The study was performed strictly online due to the Covid-19 pandemic. We performed an experiment session with a control group of six participants to establish a baseline. Members of this control group were asked to view the vision video on their own and had no support to discuss with any other group members. They were also explicitly asked to work on the spreadsheets on their own to ensure the validity of their answers. We designed the control group without any means of communication to measure the level of shared understanding that is created by simply watching the same vision video. To the best of our knowledge, no methodology for the achievement of a shared understanding in asynchronous communication contexts exists. Therefore, our study was designed to create a baseline of shared understanding when watching vision videos while also investigating the differences between supporting communication tools.

For members of the treatment groups, the study consisted of two distinct time windows. To ensure a strictly asynchronous setting, no participants were scheduled to take part at the same time. Participants were asked to perform the same set of steps during the two time windows. However, there were some differences in terms of the available functionality as outlined in Sect. 6.

In the first time window, participants were asked to watch the vision video for the first time before answering six Questions of Understanding. Participants were explicitly asked to answer these questions first before proceeding. However, this requirement could only be enforced in ViViPAD. Lastly, participants were allowed to leave comments and add to existing parts of the discussion.

Between the two time windows, the experimenter scanned through the comments and created Message Frames. Polling questions were also determined. This was done using the paraphrasing method, meaning that the experimenter repeated the stakeholders’ requirements using their own words. Stakeholders could then give feedback on whether or not the extracted requirement was understood correctly.

The second time window started by providing the Message Frames before participants answered the polling questions. For the treatment group supported by YouTube, this was done via telephone. Next, each participant was asked to read the submitted comments and respond to them. After all participants had finished the second time window, they were asked to review the results of the Polls before answering the questionnaire and filling in the PFNets spreadsheet.

7.4 Data analysis procedures

To answer our research questions, we created two sets of hypotheses. Each set is designed to answer one research question. The first set of hypotheses aims at finding differences between each of the four treatment groups and the control group:

figure l

The second set deals with the differences between the different supporting tools. For example, we look to find a difference between the treatment group communicating via YouTube and the one being supported by ViViPAD:

figure m

Figure 6 visualizes the individual hypotheses of the two sets.

Fig. 6 Visualization of the two sets of hypotheses. The first set (a) revolves around a comparison of our control group with the treatment groups. The second (b) concerns the comparison of the treatment groups with one another

To find data on which to base a potential rejection of these null hypotheses, we analyzed the PFNets spreadsheets filled out by our participants according to Braunschweig and Seaman [16]. Their technique resulted in network similarity (NetSim) values for all participant pairs. These were then used to calculate average NetSim values for each group and to calculate the statistical significance of differences in the achieved shared understanding between the groups. The statistical significance was determined by first testing for normal distribution using the Shapiro-Wilk test before applying the Mann–Whitney U test or the t-test, depending on the presence of a normal distribution. We also applied the Bonferroni-Holm correction. In addition, we extracted the results of the Polls and gathered answered questionnaires. For the Polls, we determined which choice was made by the majority of participants, before averaging the number of participants who were part of this majority for each poll performed in the respective treatment group. This resulted in the average size of the majority vote for each group. We analyzed the answers to the questionnaires descriptively.
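The Bonferroni-Holm step mentioned above can be sketched in a few lines. This is an illustrative implementation of the standard step-down procedure and not the analysis script of the study; the raw p-values are made up, and the normality test and the t-test or Mann–Whitney U test that would produce them are assumed to come from a statistics library and are omitted here.

```python
def holm_bonferroni(p_values):
    """Return Holm-adjusted p-values in the same order as the input."""
    m = len(p_values)
    # Visit the raw p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is multiplied by m - k + 1.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Adjusted values must not decrease along the sorted order.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical raw p-values for four group comparisons (not study data).
raw = [0.010, 0.040, 0.030, 0.005]
corrected = holm_bonferroni(raw)
significant = [p < 0.05 for p in corrected]
```

A comparison would then be reported as statistically significant only if its corrected value stays below the chosen significance level, here assumed to be 0.05.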

8 Results

Our study focuses on measurements for the shared understanding within each group of the experiment. Furthermore, we also obtained information on participants’ thoughts on the suitability of asynchronous communication contexts and their general opinion on the software tool they were supported by.

8.1 NetSim

We measured the shared understanding within the groups of our experiment using the aforementioned PFNets method. The results are available on Zenodo [44]. As our results were normally distributed for all groups, we used the t-test. The results of the Shapiro-Wilk test can be found in Table 3.

Table 3 Results of Shapiro-Wilk tests. Note that the sample size for a group of six participants is 15 as we obtained similarity values for each participant pair
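The sample size of 15 follows from the number of unordered pairs among six participants, \(\binom{6}{2} = 15\). The sketch below illustrates this by computing one similarity value per pair. The `link_overlap` function (shared links divided by all links) is an assumed stand-in and not necessarily the NetSim measure of Braunschweig and Seaman [16], and the networks are invented for illustration.

```python
from itertools import combinations
from statistics import mean


def link(term_a, term_b):
    """An undirected link between two terms of a participant's network."""
    return frozenset((term_a, term_b))


def link_overlap(net_a, net_b):
    """Assumed stand-in similarity: shared links divided by all links."""
    union = net_a | net_b
    return len(net_a & net_b) / len(union) if union else 1.0


def pairwise_similarities(networks):
    """Return one similarity value per unordered participant pair."""
    return {
        (p, q): link_overlap(networks[p], networks[q])
        for p, q in combinations(sorted(networks), 2)
    }


# Invented networks for six participants (not study data).
vh, hc, vc = link("vehicle", "hub"), link("hub", "city"), link("vehicle", "city")
networks = {
    "P1": {vh, hc},
    "P2": {vh, vc},
    "P3": {vh, hc},
    "P4": {hc},
    "P5": {vc},
    "P6": {vh, hc, vc},
}
similarities = pairwise_similarities(networks)
group_average = mean(similarities.values())
```

Because every participant appears in five of the 15 pairs, the pairwise values within a group are not independent of one another, which is worth keeping in mind when reading the test results.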

To test the set of hypotheses HC, we compared the values calculated for the control group with the values measured for each treatment group. We found statistically significant differences between the control group and the treatment groups supported by Discord and ViViPAD (cf. Table 4).

Table 4 Results for Hypotheses HC. The column Corrected p presents the p-values resulting from the Bonferroni-Holm correction

Hypotheses HT were tested by determining the statistical significance of differences between the treatment groups. Such differences were found between the group supported by ViViPAD and all other treatment groups (cf. Table 5).

Table 5 Results for Hypotheses HT. The column Corrected p presents the p-values resulting from the Bonferroni-Holm correction

To gain a better understanding of the magnitude of the differences between the examined groups, we calculated the effect sizes for all comparisons that were positively tested for statistical significance. The results of these calculations can be found in Table 6.

Table 6 Effect sizes for statistically significant differences between groups. We interpret the calculated values according to Cohen [47] and Sawilowsky [48]
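As an illustration of how such effect sizes can be computed and labelled, the following sketch assumes Cohen's d with a pooled standard deviation, since the exact formula is not stated here. The thresholds follow the commonly cited rules of thumb by Cohen and Sawilowsky; the label "negligible" for values below 0.01 and the sample values are our own illustrative additions.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d for two independent samples using the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


def interpret(d):
    """Map the magnitude of d to a verbal label (rule-of-thumb thresholds)."""
    thresholds = [
        (2.0, "huge"),
        (1.2, "very large"),
        (0.8, "large"),
        (0.5, "medium"),
        (0.2, "small"),
        (0.01, "very small"),
    ]
    size = abs(d)
    for threshold, label in thresholds:
        if size >= threshold:
            return label
    return "negligible"


# Invented values for two groups (not study data).
effect = cohens_d([2, 3, 4], [1, 2, 3])
effect_label = interpret(effect)
```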

8.2 Polls

Polls were created based on the discussion of each group. The groups supported by YouTube, Confluence and ViViPAD were asked eight polling questions each, while the group supported by Discord answered seven. We found average majority sizes of 72.6% for YouTube, 78.8% for Confluence, 71.1% for Discord and 76.8% for the group supported by ViViPAD.
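The average majority size can be derived as sketched below. This is a minimal illustration of the calculation described in Sect. 7.4 with invented votes; how ties between options were handled in the study is not stated, so the sketch simply takes the size of the most common option.

```python
from collections import Counter
from statistics import mean


def majority_share(votes):
    """Share of voters who picked the most common option in one poll."""
    top_count = Counter(votes).most_common(1)[0][1]
    return top_count / len(votes)


def average_majority_size(polls):
    """Average the majority shares over all polls of one group."""
    return mean(majority_share(votes) for votes in polls)


# Invented polls of a six-person group (not study data).
polls = [
    ["yes", "yes", "yes", "yes", "yes", "no"],
    ["a", "a", "a", "b", "b", "c"],
]
average = average_majority_size(polls)
```

Multiplying the result by 100 yields a percentage in the same form as the values reported above.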

8.3 Questionnaire

The questionnaire consisted of questions regarding the general suitability of asynchronous communication contexts for discussing an artifact and the social presence dimension extracted from the community of inquiry questionnaire introduced by Garrison and Arbaugh [34]. The first question asked participants how suitable they thought asynchronous communication was for the discussion of a vision video’s content. No statistically significant differences could be found between the groups. Out of 24 participants, six answered neutrally. All other 18 participants indicated that they agreed or strongly agreed that asynchronous communication is suitable. An overview of these results can be found in Fig. 7.

Fig. 7 Answers to the questionnaire regarding the suitability of asynchronous communication

A second question addressed the preference between asynchronous and synchronous communication. Once again, no statistically significant differences could be found. The answers were diverse for all treatment groups. In total, no participant strongly preferred synchronous communication, while five participants indicated that they preferred synchronous communication and five participants answered neutrally. A total of nine participants preferred asynchronous communication, with an additional five participants strongly preferring asynchronous communication. A visual representation of these results can be found in Fig. 8.

Fig. 8 Answers to the questionnaire regarding participants’ preference between asynchronous and synchronous communication

In addition to questions answered on Likert scales, we also asked open questions regarding positive and negative aspects of asynchronous communication. The most often mentioned positives were having enough time to think, developing ideas and the temporal flexibility. Negative aspects included delayed answers and missed comments, as well as the longer time required for final conclusions.

The final question asked for opinions on a statement indicating Questions of Understanding as valuable. Once again, no statistically significant differences could be found between the treatment groups. Only a single participant strongly disagreed, while two other participants gave neutral answers. Twelve participants agreed with the statement, and a further nine participants agreed strongly.

As for the social presence experienced by our participants, we aggregated the responses of our participants along the subcategories affective expression, open communication and group cohesion to enable an easier comparison between the treatment groups. The aggregation resulted in the combination of our participants’ answers to three questions per subcategory. Therefore, 18 responses are presented per subcategory and treatment group. Differences between the groups are only marginal and not statistically significant. Answers to the subcategory affective expression were rather diverse for all groups as can be found in Fig. 9. The least diverse answers were given by the group supported by Confluence. In contrast, as visible in Fig. 10, responses to the questions of the subcategory open communication were more consistent. Only two answers by members of the group supported by YouTube indicated negative levels of open communication. All other responses were either neutral or positive. The most positive values were indicated by the treatment group supported by Discord. Lastly, we obtained mostly positive, but also diverse answers for the subcategory group cohesion as is shown in Fig. 11.

Fig. 9 Responses to the social presence dimension of Garrison and Arbaugh’s [34] community of inquiry questionnaire regarding the affective expression

Fig. 10 Responses to the social presence dimension of Garrison and Arbaugh’s [34] community of inquiry questionnaire regarding the open communication

Fig. 11 Responses to the social presence dimension of Garrison and Arbaugh’s [34] community of inquiry questionnaire regarding the group cohesion

9 Threats to validity

We report the threats to validity of our results according to the classification by Wohlin et al. [49].

The conclusion validity of our results is threatened by the small sample size. Having only six participants per treatment group increases the risk of statistical noise impacting the results. However, we chose to include three existing software tools in our evaluation rather than increase the sample size for only one or two, as we obtained three clear favorites in the workshop. Another threat to the conclusion validity is the fact that we asked participants who had only discussed the vision asynchronously about their preference between synchronous and asynchronous communication. Nevertheless, it is easy for participants to imagine synchronous discussions and the answers to the open questions of the questionnaire gave concrete reasons for this preference.

One threat to the internal validity of our study is the potential of exhausted participants giving incomplete answers. Participants of our study were asked to work in two time windows and asked to fill in multiple documents over the course of a day. We chose this type of study to reliably simulate an asynchronous setting and also gave participants a lengthy break between the time windows. Furthermore, participants could in theory have interacted with one another outside of the asynchronous communication tools. We minimized this threat by creating new accounts without any identifying information for all participants on all software tools used in the study.

A threat to the construct validity is the mono-method bias. We chose not to include further metrics to avoid an even higher potential for participant exhaustion. Another threat is that participants might understand the same term differently when filling in the PFNets spreadsheet. We only included terms that were short and clearly recognizable in the vision video to minimize this threat. An additional threat is posed by the fact that we only simulated the presence of different time zones by assigning distinct time frames to all participants. An experiment including multiple time zones would have been preferable, but was not feasible.

The external validity of our results is threatened by participants’ knowledge of the fact that they were taking part in an experiment. A study with practitioners in a real-world use case would have been preferable. Another threat is the potential that we might have missed a suitable existing tool. However, we tried to minimize this threat by conducting the workshop and discussing the results with multiple researchers. Furthermore, the experiment was conducted over the course of a single day while a real-world application would likely be performed over the course of multiple days. We accepted this threat as the threat of participant exhaustion might have been increased further had we conducted a multi-day study.

Additionally, our findings originate from a study exclusively populated with university students due to the use of convenience sampling. This impacts the generalizability of our results, especially to industry. We attempted to limit the impact of this threat by choosing a vision video of a domain for which our participants were able to reasonably assume the role of stakeholders, namely the domain of future mobility. All of our participants engage with urban traffic on an everyday basis which means that a vision of future mobility in urban contexts is directly relevant to them.

The use of students as participants might have led to a less professional use of the communication tools. This poses a threat to the validity of our results as it could have increased the importance of individual concepts like the Step-by-Step Design, as students participating in an empirical study might be more likely to gloss over hard-to-answer Questions of Understanding than real stakeholders in a corporate environment. However, we did not find any evidence for an impact of this threat while observing the discussions over the course of our study.

A further threat to the external validity of our results is the use of only a single video as the basis for discussions amongst our participants. This limits the discussion to the domain of future mobility. Some participants might be more or less interested in this domain and therefore discuss with more or less interest. We decided against presenting multiple videos to all groups of stakeholders to focus the discussions and not overload our participants.

The suitability of the social presence dimension of Garrison and Arbaugh’s [34] community of inquiry questionnaire for measuring a user’s experienced social presence is debated in related research. Nevertheless, we included it in our questionnaire, as the questions raised by Garrison and Arbaugh encapsulate aspects of an online communication tool that in our eyes should be considered, evaluated and improved on with scientific research. For example, even if a rating of agreement with the statement “I was able to form distinct impressions of some course participants.” is not perfectly suited to measuring a user’s social presence, the responses still provide valuable insights into what could be improved in future work. If users’ answers to this statement are particularly negative, future research should focus on ways to improve this aspect rather than others that are already rated positively. Furthermore, the responses of users of different communication tools can also be compared to find which tool is best suited to which aspect and to determine which elements of a communication tool should be included in a future implementation.

10 Discussion

The results of our study show clear differences between the achieved level of shared understanding among the participants of the five groups. In particular, we found that all treatment groups supported by one of the four software tools (YouTube, Confluence, Discord, and ViViPAD) achieved a higher average level of shared understanding than the control group. This finding is indicated by the higher average NetSim values, as a higher NetSim value indicates a higher level of shared understanding [16]. When comparing the results of the treatment groups, we found that the group supported by ViViPAD achieved a statistically significantly higher level of shared understanding than every other treatment group (cf. HT3.1, HT5.1 and HT6.1).
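To make the comparison of NetSim values more tangible, the following sketch shows one common way to compute the similarity of two networks: the number of links they share divided by the number of links that occur in either of them. We assume this definition here for illustration only; the exact computation is the one given in [16]. The terms, the helper names `net_sim` and `mean_pairwise_net_sim`, the averaging over all pairs of participants, and the handling of two empty networks are likewise assumptions of this sketch.

```python
from itertools import combinations


def net_sim(links_a, links_b):
    """Shared links divided by all links of two undirected networks.

    Each network is an iterable of (term, term) pairs. The order of the
    two terms within a link is ignored.
    """
    a = {frozenset(link) for link in links_a}
    b = {frozenset(link) for link in links_b}
    union = a | b
    if not union:
        return 0.0  # assumption: two empty networks count as dissimilar
    return len(a & b) / len(union)


def mean_pairwise_net_sim(networks):
    """Average similarity over all pairs of networks in one group."""
    pairs = list(combinations(networks, 2))
    if not pairs:
        raise ValueError("at least two networks are required")
    return sum(net_sim(x, y) for x, y in pairs) / len(pairs)


# Hypothetical networks of two participants (terms are placeholders).
alice = [("car", "road"), ("car", "app")]
bob = [("road", "car"), ("app", "bus")]
similarity = net_sim(alice, bob)  # one shared link out of three distinct links
```

Under this definition, a value of 1 means that two participants linked exactly the same pairs of terms, while a value of 0 means that their networks have no link in common.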

These results substantiate the suitability of our concepts to support stakeholders in achieving a shared understanding in an asynchronous communication context. First, all software tools, even adapted with only a partial implementation of our concepts, result in a higher level of shared understanding than the control group. In accordance with the results of Nagel et al. [10], our results show the importance of enabling discussions between stakeholders in asynchronous settings. Even partial concepts already help to achieve a better understanding, as they improve stakeholders’ capabilities to communicate with each other. Second, implementing all concepts to their full extent (as in ViViPAD) provides a solid basis for achieving a higher level of shared understanding. In all four software tools, we tried to implement each concept as fully as possible. However, for the three existing tools, we had no access to their source code and thus had to make compromises, such as using plugins, to enable the concept as intended. In contrast, ViViPAD allowed us to implement and combine the concepts to reach their full potential. For this reason, the main difference between ViViPAD and the adapted software tools is the degree to which the concepts could be implemented.

Similar to Karras et al. [50, 51], we consider this difference from a cognitive science perspective. We use the theory of cognitive load by Sweller et al. [52] to explain why our prototype performs better than the other three tools.

First, we briefly summarize the cognitive load theory. According to Sweller et al. [52], cognitive load determines the working memory resources that a human brain requires to process the information of given materials. If the cognitive load exceeds the available working memory resources, a human will fail, at least in parts, to process the materials. The total cognitive load imposed by the materials used is the sum of the intrinsic and extraneous cognitive loads. While the intrinsic cognitive load depends on the nature of the materials in terms of their content difficulty and complexity, the extraneous cognitive load depends on the representation and design of the materials. Both cognitive loads and thus the total cognitive load are mainly influenced by the element interactivity of the materials. Sweller et al. [52, p. 58] explain that those “interacting elements are defined as elements that must be processed simultaneously in working memory because they are related”. The higher the element interactivity, the higher the cognitive load, as more working memory resources are necessary to keep the related information in mind. According to Sweller et al. [52], a human needs to process all related information simultaneously in order to understand the overall content. Processing the individual materials successively only enables an understanding of each single material, but not of their interrelationships. Therefore, interrelating all provided materials is essential to understanding and processing them.
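The additive relationship summarized above can be written compactly. The symbols are our own shorthand and do not appear in Sweller et al. [52]: $L_{\text{total}}$ denotes the total cognitive load, $L_{\text{intrinsic}}$ and $L_{\text{extraneous}}$ its two components, and $W$ the available working memory resources.

```latex
\begin{align}
  L_{\text{total}} &= L_{\text{intrinsic}} + L_{\text{extraneous}} \\
  L_{\text{total}} &> W \;\Longrightarrow\; \text{the materials are processed only in parts}
\end{align}
```

Read this way, two representations of the same materials share the same $L_{\text{intrinsic}}$, so any difference in $L_{\text{total}}$ between them stems from $L_{\text{extraneous}}$.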

Considering the cognitive load theory [52], we use the same materials in terms of vision video, questions, explanations, etc. in all tools, all of which have a high element interactivity as the materials are strongly associated with each other. However, depending on the tool and its degree of implementing our concepts, the presentation of the materials is split to a varying degree. As a consequence, the intrinsic cognitive load is similar among the tools as we use the same materials, but the extraneous cognitive load varies due to the different representations. YouTube has the smallest degree of implementing our concepts, as polls required a third-party tool, splitting the video and the questions that should be considered together. This splitting considerably increases the cognitive load, as video and question can no longer be processed directly together and simultaneously. Confluence and Discord present a better integration of the concepts and thus a lower cognitive load, as in particular the use of bots and plugins for the polls mitigates the splitting of the materials. Comparing Confluence and Discord regarding their representation of the materials, Discord is slightly better suited as it is intended as a communication platform and videos can be more easily integrated into and combined with the text channels. Confluence is more complex because it is more than just a communication platform. The many additional functions and the division of materials into different pages lead to a higher level of cognitive load. In contrast, we can assume that ViViPAD has the lowest cognitive load compared with the other three tools as it is purposefully designed to fully implement our concepts. As a result, the splitting of the materials is lower, resulting in a lower cognitive load. Based on this argumentation using the cognitive load theory, we can rank the tools regarding their suitability for asynchronous communication as follows: 1) ViViPAD, 2) Discord, 3) Confluence, and 4) YouTube. This ranking also matches the ranking of the tools based on the average NetSim values (cf. Table 4).

While the results show that even the partial implementations lead to a higher shared understanding than the control group, ViViPAD achieved the best results overall with effect sizes ranging from large to huge [47, 48]. We assume that the main reasons for these results are the concepts Step-By-Step Design and Polls, as their full implementation is the main difference between ViViPAD and the other communication tools supporting the treatment groups. The Step-By-Step Design provides a structured framework for all other concepts. For example, ViViPAD enforces the answering of Questions of Understanding before participants can access the comment section. In this way, the full implementation of the Step-By-Step Design emphasized the importance of these questions and ensured that the participants were familiar with the video content before writing any comment. The Polls in turn require a closer look at all materials together to give an answer, which is easier or more difficult depending on the degree of splitting of the materials in the respective tool. As a consequence, these concepts were better integrated and combined in ViViPAD, resulting in a higher level of shared understanding among stakeholders. These assumptions are supported by participants of the treatment groups supported by our adaptations of the three existing communication tools, who stated that the implementation of polls was lacking. Based on these insights, we provide the following answers to our research questions:

figure n
figure o

Besides the analysis of the shared understanding among the stakeholders in the respective groups, we also investigated the participants’ attitude towards the idea of being supported in achieving a shared understanding in asynchronous communication contexts. According to our results, most of them preferred asynchronous communication contexts over synchronous ones. They justified this preference with a higher flexibility to take their time to think about the presented vision and to develop questions, answers, and ideas for the discussion with the other stakeholders. This finding is in line with the advantages of asynchronous communication contexts found by Dowling and Lewis [29]. Additionally, our results show that participants of our study were able to openly state their opinion, disagree with other participants and react to comments made by their peers. This is evident in the answers to the questionnaire on social presence, more specifically the subcategory of open communication. The corresponding statements were rated especially positively by members of the treatment group supported by Discord. A possible cause could be that they were the only group including emojis in their messages. While most participants of our experiment also rated the subcategory group cohesion positively, responses to the affective expression subcategory were more mixed. An explanation for these results might be the fact that while the groups seemingly experienced a feeling of working together, they still only communicated via text. A strong social presence experienced by a stakeholder can lessen the impact of language barriers or cultural differences. While our concepts are not explicitly designed to counteract these issues, they still help stakeholders overcome them by enabling them to communicate more openly.

Future work could include concepts that focus on dealing with cultural differences and enabling stakeholders to work more closely together, creating a stronger sense of cohesion among the participants. A first simple solution is the joint creation of artifacts, such as a glossary. In this way, each participant can not only make their own contribution, but also recognize the contributions of others to a particular result. The creation of such specific results in the form of concrete artifacts represents a targeted collaboration that can help to strengthen the sense of community and cohesion among the participants. For more sophisticated solutions, the consideration of gamification is a promising direction, as shown by Kolpondinos and Glinz [53]. However, the integration of gamification into the existing set of concepts requires a more comprehensive approach. Such an approach must take into account the development of a suitable and, at best, collaborative game with appropriate motivational concepts and a reward system that help generate the necessary contributions to discussions, such as questions, answers, or ideas. A collaborative game, where the participants have to work towards a common goal such as reaching a shared understanding, would be even more advantageous, because participants are not only encouraged to play but can only win by working together. With these future measures, we expect an improvement in the experienced affective expression.

The generalizability of our results is limited. The groups of participants supported by each software tool are probably smaller than in a real-world setting. In addition, the participants had no real stake in understanding the presented vision due to the fictitious experimental context. Nevertheless, our concepts are a promising starting point for future research. On the one hand, future work needs to investigate how each concept individually contributes to a shared understanding, as we only investigated all concepts together. On the other hand, we observed difficulties in the experiment, such as language barriers and terminology issues. While so far we only have the partial solution of a Step-By-Step Design combined with mandatory Questions of Understanding to address these difficulties, the aforementioned idea of jointly creating artifacts, especially a glossary, is a promising solution for overcoming language barriers and terminology issues.

In summary, our results reveal the value of asynchronous communication contexts. Stakeholders are able to disclose, discuss and align their mental models within an asynchronous context to achieve a shared understanding. An even higher level of shared understanding can be accomplished when using the full extent of our concepts. We conclude that the concepts described in this paper fulfill our goal. In this way, we developed suitable concepts to support stakeholders in achieving a shared understanding in asynchronous communication contexts.

11 Conclusion

A shared understanding between stakeholders is vital for successful software projects. The discussion of vision videos presents one possible way to achieve such a shared understanding, even in asynchronous settings. However, these discussions depend on asynchronous communication methods. In this paper, we presented support for the achievement of a shared understanding between stakeholders in asynchronous communication contexts. To create suitable support methods, we first collected common challenges of asynchronous communication from existing literature. Then, we conceptualized ideas to minimize the impact of these challenges. To test the suitability of our concepts, we developed a prototype called ViViPAD. In addition, we also considered different categories of existing communication tools. From these categories, YouTube, Discord and Confluence were picked as three representatives to include in a user study. We adapted these tools to our concepts as closely as possible. In an experiment with 30 participants, we compared the adapted tools and ViViPAD to a control group and to each other. Participants of the treatment groups were able to achieve heightened levels of shared understanding. The group supported by our adaptation of Discord and the one supported by ViViPAD achieved especially large increases. Therefore, the results of our study are evidence for the suitability of our concepts for supporting shared understanding in asynchronous communication contexts. This way, we created an opportunity for stakeholder groups who cannot meet for synchronous discussions to achieve a shared understanding.

In future research, we plan to increase the sample size of our study to obtain more reliable results. We also plan to evaluate our concepts in isolation and to compare our results to the shared understanding created in a synchronous meeting. For the concepts Requirements Engineers as Facilitators and Message Frames, we seek to investigate how requirements engineers can be supported while performing the associated tasks. For example, one opportunity for the concept of Message Frames could be the use of generative AI to create the summaries. Through such research, the effort required to implement these concepts can be reduced. Furthermore, the PFNets spreadsheet could be extended with terms relating to the topics discussed by the groups and we plan on including new concepts that focus on lessening the impact of cultural differences.

The findings of this paper indicate the potential of our concepts. Further research efforts might lead to a definitive tool supporting the achievement of a shared understanding among stakeholders in asynchronous settings.