There are reasons to consider virtual reality (VR) as a newly arrived communication medium that ought to be differentiated from all other forms of mediated communication, since it is the first and only medium with the potential to enable incorporation of the full spectrum of both verbal and non-verbal cues. The present paper is part of a broader scheme investigating potential differentiations in interpersonal communication between the physical world and VR. Our experimental design builds upon the existing knowledge base of forced compliance experiments; the set-up involved a comparative study of two groups (N = 46) performing tasks under the authoritative influence of a researcher who applied persuasion techniques. Results indicate that VR-mediated communication is as intricate as face-to-face communication, since subjects were equally or more compliant, with the nature of information exchanged (e.g. fact-based, morality-based, etc.) being a contributing factor, whilst exemplifying under-development and future applications of VR collaborative environments.
Virtual reality (VR) refers to a process of mental transcendence into synthetic, three-dimensional (3D) virtual environments (VEs) with the use of immersive technologies (Ellis 1994; Zhao 2009). Even though the first steps toward developing said technologies were taken in the mid-1960s through the pioneering work of Ivan Sutherland (Sutherland 1968), it took almost five decades for VR to reach its turning point and a long-awaited commercial breakthrough. In 2014, the first affordable head-mounted displays (HMDs) established the rise of a new industry, and within a few years VR has indeed proven to be a thriving concept, currently overwhelming related markets. Its growth is expected to continue (Footnote 1). Naturally, complementary technological developments have been accelerating accordingly to provide users not just with state-of-the-art solutions but, more importantly, with ones that are easily integrable within the mainstream culture of social media, networking and smart devices. Therefore, it comes as no surprise that, despite its infancy, VR has already been adjusted to allow remote synchronous interaction through web-based VEs, solidifying the emergence of social VR (Dzardanova et al. 2018b).
The concept of VEs facilitating communication has been suggested and analysed since the 1990s (Biocca 1992; Biocca and Levy 1995a, b; Ellis 1991), as certain characteristics of the medium render it suitable for mediated interaction. VR oftentimes comes with practical challenges in regard to technological feasibility, but aside from the overall progression rate of VR in recent years, the otherwise slow-burning process of its development as a communication medium may now gain rapid momentum. The Covid-19 pandemic and the importance of social distancing are pushing several industries to explore face-to-face (FtF) alternatives that enable remote communication and collaboration. Such developments also generate the need for analytical and research-based dissemination of the VR-mediated communication process, especially vis-à-vis FtF.
This requires breaking VR-mediated communication down into components similar to those proposed in the present study, namely types and topics of interpersonal interaction, relational dynamics, medium-induced renegotiation of self and other, level of distortion of cues due to the medium’s involvement, deeper examination and effects of locomotive freedoms and interface interactivity, and many more. The first objective of the present study is to explore VR’s potential for mediated communication whilst maintaining a pragmatic approach to current technological limitations. The second objective is to unravel experiential differentiations between FtF and VR-mediated communication.
One of the key aspects that differentiates FtF from any other communication medium is the fact that it encompasses all possible non-verbal signals, which, as Social Psychology and Linguistics exemplify, make up the largest share of any given interaction (Birdwhistell 1970; Feldman et al. 1991; Matsumoto et al. 2012; Mehrabian 1972). Non-verbal cues not only supersede, clarify, complement or enhance the meaning of verbal communication, but also provide indications of social status and types of interpersonal relationship (Argyle 1973; Hargie 1997; Patterson 1991). A number of those non-verbal cues (e.g. surrounding space, proxemics, kinesics, etc.) can be de facto provided by most VR experiences, whilst technical solutions for supporting others, especially facial expressions, are currently under development. However, it has been fairly well established that virtual experiences converse with cognitive processes to overtake direct stimuli and/or empirical understandings of the physical world (PW), thus causing embodied experiences (Kilteni et al. 2012; Slater et al. 2010) and affective responses (Dzardanova et al. 2017) within a synthetic realm. It is therefore safe to hypothesize that, as a medium of communication and despite its technical limitations in regard to some non-verbal cues, VR may not just format messages and information, as computer-mediated communication (CMC) would, but actively interfere during the primary cognitive construction and interpretation of said messages and information.
Therefore, a requirement arises for examining to what extent and in what ways the synthetic environment may alter interlocutors’ overall stance or management of the communicative process. For instance, would individuals be more or less submissive? More or less prone to obedience? More or less authoritative? Establishing a blueprint of such interferences and other VR-induced effects would lead the way for a closer examination of the underlying causes of any cognitive and behavioural fluctuation in comparison with the PW during the communicative process.
To this end, we have conducted a between-groups comparative experiment (once in the PW, meaning FtF, and once in a shared virtual setting), examining differentiations in the level of subject compliance to an authority figure. The experimental scenario has been formulated based on the existing knowledge base of subject compliance, specifically on examples of forced compliance experiments; thus, a researcher monitored experimental sessions and applied a series of persuasion and soft-tactics techniques. In total, 46 participants (23 per group) took part in reviewing and answering a number of questions/tests which were separated into three question sets (QS): factual, moral and sensorial.
The above distinction strictly directed each type of interaction to a process of reasoning (factual), emotional engagement (moral) and stimuli processing (sensorial), respectively, whilst also allowing the researcher, who was acting as the authority figure, to apply tailored persuasion techniques and effectively monitor and control the interaction. For all questions presented, we measured level of compliance (effectively, the number of responses altered under the researcher’s influence). Our research hypothesis has been that the PW group would exhibit a higher level of compliance than the VR group, particularly since the latter comes with a number of limitations (e.g. limited eye contact, lack of touch, etc.). Results indicate that VR-mediated communication is as intricate as FtF, since subjects were equally or more compliant, with the nature of information exchanged (e.g. fact-based, morality-based, etc.) being a contributing factor, whilst exemplifying under-development and future applications of VR collaborative environments.
The remainder of the paper is structured as follows: Sect. 2 examines FtF components that can be simulated during VR-mediated interaction, with particular emphasis on non-verbal communication. It also presents the emergence of social VR as the second generation of social networking, and potential practical applications of VR-mediated communication for a number of fields and industries. Section 3 introduces key concepts relating to relational dynamics and interpersonal communication. Our study’s experimental design and methodology are presented in Sect. 4, and the experiment’s technical set-up for the VR sessions is detailed in Sect. 5. Results are presented in Sect. 6, and main findings are discussed in Sect. 7. Finally, Sect. 8 concludes the study and suggests future research directions on the topic.
The rise of VR-mediated communication
Non-verbal communication comprises: (a) the surrounding space (as social context), (b) the characteristics of the communicator (physiology, clothing, etc.), as well as (c) her body and its kinesics (locomotion, posture, orientation, gestures, proxemics, facial expressions, eye contact, eye gaze and sound) (Knapp et al. 2013). Non-verbal cues not only supersede, clarify, complement or enhance the meaning of verbal communication, but also provide indications of social status and types of interpersonal relationship (Argyle 1973; Feldman et al. 1991; Hargie 1997). It is worth elaborating at least on the most evident traits that confirm VR’s ability for non-verbal cue inclusion.
More specifically, it is today possible to allow at least two remotely based individuals to engage in synchronous interaction in a shared VE (Kasapakis et al. 2018a, b, c). They may be provided with full-body motion support which allows accurate tracking of the body’s position, orientation and locomotion in real time (Roth et al. 2019, 2017; Roth et al. 2016a, b; Spanlang et al. 2014). These data may be solved onto their avatars, which in turn are highly customizable, therefore providing interlocutors either with real-life representations of themselves, for example through photogrammetry (Waltemate et al. 2018), or with realistic-looking 3D human models bearing any characteristic found in the PW, including false height perspectives (Banakou et al. 2013). It is also possible to provide interlocutors with accurate, real-time hand and finger tracking, which is somewhat distinct from body tracking. Eye-tracking technology can now be fitted to the HMD, therefore solving users’ real-time eye-gaze direction onto their avatars. Of course, intercommunication is available. On top of these solutions, the VEs too can reach extreme levels of realism, with detailed 3D modelling and photorealism. Absolutely any type of environment can be simulated in VR. Finally, interlocutors can interact with and manipulate real objects which are virtually represented and tracked in real time through mixed reality (MR) solutions (Kasapakis et al. 2018a). The complete combination of these solutions has been under examination for some time (Roth et al. 2015), since it is a challenging task, yet technologically feasible.
Based on the above, there are only two things currently missing from VR-mediated communication to regard it as equivalent to FtF communication with respect to transmission of non-verbal signals:
(a) tracking of facial expressions, a solution already in existence but challenging to integrate because HMDs conceal part of the face and in fact limit facial movements, and
(b) synchronous manipulation of the same real object by both interlocutors, since that is physically impossible. Although this affordance is not in itself a non-verbal sign, it may be used for the expression of other signals (e.g. grabbing an object from someone’s hand).
Facial expressions are extremely important for non-verbal communication, to say the least; therefore, their absence is not taken lightly. Facial-tracking solutions compatible with HMDs (Footnote 2), or HMDs that come with both facial and eye tracking, are already being developed, and prototypes can be found and obtained (Footnote 3).
In 2014, we witnessed the long-awaited commercial breakthrough of HMDs. By taking a closer look at the timeline of events, it is almost astounding how quickly the industry exploited this fresh out-of-the-box set of equipment and proceeded to tap into an established market: social networking (Dzardanova et al. 2018b). In fact, the first known attempt in immersive social networking chronologically almost coincides with the contemporary HMD release initiated by Oculus’s Kickstarter campaign and its DK1 model (2012). The software company AltSpace (Footnote 4) began developing its homonymous platform in 2013. Starship, a VR/AR (augmented reality) innovation company, launched vTime (Footnote 5) in December 2015 (Dzardanova et al. 2018b). In April 2017, Facebook introduced Facebook Spaces (succeeded by Facebook Horizon in 2019; Footnote 6) and essentially solidified social VR, since, unlike other platforms, it is structured upon users’ existing accounts, uploaded media and network of friends (Dzardanova et al. 2018b). Other social VR platforms include Oculus Rooms, VRChat and Sansar, developed by Linden Lab, the company best known as the creator of Second Life (Dzardanova et al. 2018b).
Social VR is a “combination of web-based social networking and an egocentric flow of information, a distinctive trait of both immersive VR and users’ profile structure in social media. It could be regarded as the second generation of social networking, incorporating immersive technologies, which allow users to engage in synchronous, interpersonal interaction with friends or strangers, in pre-designed, web-based, 3D VEs” (Dzardanova et al. 2018b). “Basic functionalities of social VR platforms entail use of immersive technologies such as HMDs and data gloves, avatars, and virtual worlds where users can interact amongst each other in a variety of ways; for instance, AltSpace allows private messaging” (Dzardanova et al. 2018b). Some platforms support playing games with one another, attending social events and, in general, participating in a variety of shared activities (Jonas et al. 2019; Tanenbaum et al. 2020).
Based on the above, and considering the gravitas of non-verbal cues, what would our findings be, if we were to filter all probable interactions through a synthetic space, where bodies, objects and environments can be simulated, transformed, deformed and so forth? Out of all the communication mediums, VR is by far the most mind bending, as it interplays with the cognitive and communication processes in ways that we do not yet grasp, because it does something that none of the other mediums can do at all or as successfully; it overtakes all other data and directly converses with cognitive processes, even at times when those are simultaneously affected by the physical reality or undeniable truths (e.g. one’s natural physiology; see, for example, Banakou et al. 2013; Banakou and Slater 2014; Kilteni et al. 2013; Osimo et al. 2015; Yee et al. 2009).
VR as a standardized communication medium is not yet showing signs of mass appeal, so it is reasonable to reflect on whether it ever will, as well as on whether studying it as such is of any importance. The chances of VR being used as a day-to-day personal communication medium depend on a multitude of factors that have less to do with usability and more with social shifts, commerciality, marketing, hype and so forth. For instance, arising public-health concerns, due to the Covid-19 pandemic, have already triggered a search for FtF alternatives to limit large-group gatherings (Avdiu and Nayyar 2020; Carrillo and Flores 2020; Mantovani et al. 2020; Riva et al. 2020). VR-mediated communication addresses public safety issues, but can also potentially be more engaging, especially for e-learning and collaborative applications, by providing individuals with an array of FtF benefits. VR is also the only medium with the ability to generate creative platforms that mimic “hands-on” functions and thus allow collaborative, interactive and responsive design, even with MR solutions. For several fields and industries, VR-mediated communication is an upgrade that provides a new niche for modelling, engineering, simulation and collaboration.
Some more specific examples are architecture and engineering as fields which would largely benefit from VR-mediated interaction; parties could review and discuss designs and mechanics that are virtually, and thus visually, represented, but more importantly allow gestures such as pointing, on-the-spot virtual manipulation, and, of course, locomotion through spaces which provide realistic grasp of size, distance, placement and lighting. Additionally, any field investing in simulations would also benefit from, either individual or collaborative, full-body motion tracking solutions which raise the standards of execution but have none of the associated risks. Fields such as medicine and physiotherapy, which in actuality already come with both intense research (Slater and Sanchez-Vives 2016; Cipresso et al. 2018) and practical examples of, for example, VR-simulated surgeries (Basdogan et al. 2007; Larsen et al. 2009), could additionally exploit VR-mediated environments for collaborative medical diagnosis and remote treatment.
Training-oriented military simulation is a well-known field where VR-enabled technologies have received a fair share of funding over the years, especially for flight simulation (Lee 2017). Likewise, civil protection and urban development can benefit greatly, since they come with a long list of scenarios which require collaboration between multiple parties, crowd and space management, site scanning or modelling, as well as practical training, simulations and probability tests. Education is another field that can greatly expand in a variety of ways by incorporating VR-mediated teaching. In fact, education has already heavily invested in online classrooms, and transitioning into collaborative VEs could not only empower existing curricula but also re-envision the learning process; this, too, is a field that has been intensively researched (Baka et al. 2018).
A final example is the emergence of companies that will specifically provide this type of VR-based services (in the same manner that all media and web-based platforms, and more recently social VR platforms, allow communication and networking for individual users) and may even be employed (Footnote 7) as integrated systems by various departments (e.g. human resources, sales, etc.), to substitute all activities currently conducted through teleconferencing (e.g. interviews, meetings, webinars, product presentations, etc.).
All of these examples come with interpersonal interactions that should be presumed to, in turn, come with issues of power relations just as they would in the PW. Whether an authoritative figure is present or not, any collaborative environment will reveal relational dynamics. Hitherto, it has been fairly established that VR-mediated interpersonal interactions should not be dismissed as yet another example of CMC; however, they cannot be assumed to be a direct transference of FtF either.
The study presented here is part of a broader scheme to unravel differentiations between the PW and VR, with emphasis on cognitive and psychosocial parameters of interpersonal communication. There are few paradigms of interpersonal interaction in commercial immersive VR, coming mostly from social VR applications such as those presented above, and out of those existing solutions and platforms none provides full-body motion support. On the other hand, there are numerous studies in CMC, and a significant part of those focuses on power relations, especially in regard to authority and compliance in collaborative environments (Spears and Lea 1994). Given that those are well-researched, extensive subfields of interpersonal communication in both FtF and mediated communication, they may be considered a significant subject of investigation for VR-mediated communication as well.
Forced compliance, cognitive dissonance, authority and obedience
Spreading over several decades, there is a vast amount of literature regarding compliance and obedience to authority, with some experiments producing their results through extreme scenarios (Hofling et al. 1966; Milgram 1974; Milgram and Gudehus 1978). Those may be fascinating in their own right; however, our day-to-day conformity to social cues, norms, authoritative figures, and so on, is far more subtle. We do indeed follow through with our natural tendencies for liking, reciprocation, social proof or actions based on judgemental heuristics (Cialdini 2009), yet those are heavily dictated by the underlying social relationship between parties and the experimentally elusive context of each separate interaction. Or, simply put, people communicate in the most utilitarian way possible for their individual well-being. It goes without saying that they are often mistaken, or they are guided by certain ethical or conceptual hierarchies that do not apply to other individuals. For example, worrying about one’s public image amongst certain peers more than about doing the objectively moral thing could result in immoral choices even when they are recognized as such. The individual utility would lie in the relational dynamics if, for instance, maintaining peer approval, let alone authority approval, provides other types of benefits (e.g. employment, money, status, support, friendship, and so forth), which are deemed greater or more urgent than peace of mind.
Countless examples can be found historically or through field research; however, researchers have also attempted to generate situations in controlled environments, as is the case in Milgram’s study (Milgram 1974; Milgram and Gudehus 1978). Forced compliance is a term that refers to such experiments, typically conducted under social psychology. It is a type of experiment in which subjects are induced to perform a counter-attitudinal behaviour, which will have them express attitudes that are in better accordance with that behaviour (Paulhus 1982). This attitudinal shift was first documented and associated with Festinger’s cognitive dissonance theory (Festinger 1962), according to which individuals cannot tolerate inconsistencies between their beliefs/attitudes and their behaviour; therefore, they will always look for ways to either eliminate or minimize dissonance. Several complementary theories have been proposed; for example, self-presentation theories state that attitudinal shifts are individuals’ attempts to present themselves as favourably as possible (Paulhus 1982), which implies that cognitive dissonance may be more tolerable privately (Cooper 2007).
When it comes to authority, there are various incentives for compliance or obedience, which are not one and the same. As Cialdini and Goldstein explain (2004), there is a differentiation between authority based on expertise and authority derived from one’s relative position in a hierarchy; the former would cause mere compliance, whereas the latter would cause obedience. In most cases of forced compliance, researchers would rely on the expertise that subjects presume them to have and therefore strive for compliance. Part of that presumption would be a well-known social construction according to which "scientists know better". Another part, however, is dependent on strategies which employ expert power, rather than hierarchy-based power, based on a class called soft tactics (Cialdini and Goldstein 2004).
Soft tactics describe a set of techniques employed by the influencer, who attempts to elicit compliance based on traits of integrity, credibility or charisma (Nahai 2012). These could be inspirational in nature and include use of rational persuasion and personal or inspirational appeals (Nahai 2012). Cialdini and Goldstein provide great insight into the underlying reasons for which individuals tend to comply, some of those being existing social norms, the desire to affiliate with others through liking, the need to maintain a positive self-concept and so forth (Cialdini and Goldstein 2004). Finally, Harjunen et al. (2018) reference a small number of studies that demonstrate the influence of non-verbal cues (e.g. smiling, touching, etc.) on interlocutors’ decision-making and mention two cases that study how these real-world findings are confirmed to also occur in online settings (Haans et al. 2014; Mussel et al. 2013).
The above information has been studied in depth for the design of an experiment on the basis of forced compliance methods. It is, however, important to stress that our study is not concerned with the magnitude or even the emergence of, for instance, cognitive dissonance in subjects, since the experiment does not aim to cross-check attitudinal shifts or level of compliance. Rather, and upon reviewing the related literature, we wished to ensure that the authoritative figure introduced (i.e. the researcher) would apply appropriate techniques in a consistent manner between two groups of participants. The experimental specifics presented in the following section discuss some of the techniques used during experimental sessions.
Related experimental studies
The line of research that mostly relates to the present study comprises experimental studies that juxtapose behaviours pertaining to relational dynamics in the PW and VR. To the best of our knowledge, no other study introduces an authority figure, explores issues of compliance or obedience, or in general reviews relational dynamics between simultaneously immersed interlocutors who share a VE and engage in VR-mediated remote synchronous interaction whilst provided with full-body motion support and real-object interaction via MR solutions.
There are a significant number of studies that use juxtaposition between the two settings to address other research questions. This type of comparative study regards the virtual setting as the differentiating variable and thus examines possible fluctuations against the same tasks and/or behaviours that occur in the PW. For example, there are studies examining users’ spatial perception (Arthur et al. 1997; Grechkin et al. 2010; Kort et al. 2003; Witmer and Sadowski Jr 1998) and differentiations between the PW and the VR experience (Kuliga et al. 2015) in view of exploiting VR for architectural and psychological research.
Apart from comparative studies, there are also a substantial number of experiments exploring all kinds of aspects relevant to communication in general. For example, a study examining social anxiety during public speaking revealed that the virtual audience to which participants were exposed may cause emotional responses similar to those in the PW (Pertaub et al. 2001). Roth et al. (2018) examined augmentation of social behaviours in multi-user VR applications, with particular emphasis on eye contact, joint attention, and grouping, and their results indicate that these parameters can impact social presence behaviour. Another experiment, which relied on pre-existing social norms between interlocutors, was conducted by Dzardanova et al. (2017, 2018a), reviewing feelings of embarrassment and discomfort when the participant’s avatar is unexpectedly left naked in the public space of a virtual clothing store. Three groups of participants experienced three different conditions: being alone, being in the presence of an NPC salesman, and being in the presence of a researcher who controlled the salesman avatar via motion capture. The results indicated that the presence of a second character may influence users’ behavioural choices and emotional state in a manner similar to that of the PW. This study, along with those by Pertaub et al. (2001) and Roth et al. (2018), explored experimental conditions, but none of them involves a PW-based control group for juxtaposition between the two settings, and subject locomotion is not solved onto avatars through full-body motion tracking.
In regard to free locomotion, the present study differs largely from similar experimental scenarios with respect to the technical aspects of the experimental set-up. There are many technical options for implementing virtual experiences and even more so in regard to full-body motion support, which is a key aspect when studying interpersonal communication and transferability of non-verbal cues. Kasapakis et al. (2018a, b, c) suggested that relevant studies should be examined with caution in regard to their stated technical implementations, because, for instance, the VR equipment used in the experimental set-up might be outdated, substantially different, or simply not validate any in-study claims of full-body motion support. Therefore, we have excluded from our related research overview studies that do not make use of immersive VR and instead rely on other types of display systems.
Nonetheless, as an example, it is worth mentioning the study conducted by Slater et al. (2006), which directly reviews issues of subject obedience by replicating the Milgram experiment. Their set-up relies on a CAVE system that did not provide options of subject intercommunication as exemplified in the present study. It is confirmed, however, by the authors, that participants who had visual and auditory feedback of the virtual human that was being shocked, as per the Milgram experiment, showed psychological tendencies as if an actual individual was being tormented. In another example, the study conducted by Kyrlitsias and Michael-Grigoriou (2018) replicates the Asch conformity experiment (Asch and Guetzkow 1951). However, their experimental set-up involves participants interacting with virtual agents and not a real person, whilst the technological set-up again does not provide participants with full-body locomotive freedom. Their study confirms that participants demonstrate signs of conformity in the presence of virtual agents. Likewise, the study conducted by Harjunen et al. (2018) confirms that participants were persuaded to accept unfair economic offers after being exposed to manipulative haptic and facial cues coming from a virtual agent. These studies likewise do not juxtapose their findings with PW-replicated experimental set-ups.
Therefore, individual aspects of our experimental scenario and set-up can certainly be associated with a great number of past studies. However, the specific methods and materials used, which are also directly correlated to the research’s objectives and the experimental scenario presented, do not seem to have been executed elsewhere in a similar manner.
Experimental design and methodology
Drawing inspiration from studies conducted under social psychology, we designed an experiment that was executed once in the PW (23 participants, 12 F), hereafter physical world group (PWG), and then replicated in a virtual setting with a different group of participants (23 participants, 11 F), hereafter virtual reality group (VRG). Our research objective was the discovery of possible differentiations in level of compliance between the two groups. Both groups met with the researcher, hereafter R1, who acted as the authority figure and was responsible for conducting individual, one-on-one sessions. The PWG sessions were of course conducted FtF; however, the VRG subjects never met R1 in a FtF setting, since they were remotely located. Instead, they were greeted and briefed by another researcher, hereafter R2, who was responsible for the VR equipment, ensuring subject comfort and safety, and who was also monitoring the technical aspects of the VRG sessions. Therefore, R1 and R2 greeted and briefed the PWG and the VRG, respectively.
The overall task for both groups was to submit one of two possible answers (true/false, yes/no, A/B) for 35 questions in total presented through an interactive display. The 35 questions were divided into three sets (QS; Footnote 8), consisting of 20 (QS1), 5 (QS2) and 10 (QS3) questions. Each set addressed different aspects of cognitive processing (knowledge/reasoning, emotion/empathy, and sensorial input, respectively).
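The structure of the questionnaire described above can be written down compactly. The following is a minimal Python sketch, assuming only the set sizes and focus areas stated in the text; the class, function and identifier names (e.g. `QuestionSet`, `QS1-01`) are hypothetical and are not taken from the software used in the experiment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuestionSet:
    label: str        # set label as used in the text, e.g. "QS1"
    focus: str        # aspect of cognitive processing addressed by the set
    n_questions: int  # number of two-choice questions in the set


# Set sizes and focus areas as reported in the text.
QUESTION_SETS = (
    QuestionSet("QS1", "knowledge/reasoning", 20),
    QuestionSet("QS2", "emotion/empathy", 5),
    QuestionSet("QS3", "sensorial input", 10),
)


def total_questions(sets):
    """Total number of questions across all sets."""
    return sum(qs.n_questions for qs in sets)


def question_ids(sets):
    """Hypothetical identifiers such as 'QS1-01', listed set by set."""
    return [
        f"{qs.label}-{i:02d}"
        for qs in sets
        for i in range(1, qs.n_questions + 1)
    ]


print(total_questions(QUESTION_SETS))  # 35
```

Keeping the set sizes in one place makes it easy to check that the three sets add up to the 35 questions reported.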
The experiment had been advertised around the university campus. No prerequisites were set, and no script-related specifics were made public. The final sample consisted of 46 volunteers (23 females; mean age 27 years), mostly undergraduate and post-graduate students, and a smaller number of individuals unrelated to the institution. The vast majority of participants had no previous experience with VR technology and no social or professional association with the research team or the laboratories where the experiments were executed. The research team’s institution does not require ethical approval for experiments with human participants who are of legal age; however, participants were still required to sign a consent form ensuring their anonymity but also stating that they understood the implications of their participation and released all relevant data to the research team for academic, research and educational purposes. Finally, subjects did not receive payment or credit for their participation. Our research has been supported by public funding, yet there is no conflict of interest, as no specific party directly benefits from it in an exploitative manner.
As soon as participants, of both the PWG and the VRG, arrived for their individual sessions, they were briefed (by R1 and R2, respectively) on the objective of the experiment, which, supposedly, was an evaluation of cognitive processing speed when subjects are presented with information in variant forms (text, image, speech and their own mind’s projections). They were then left by themselves to review only the questions of QS1 and QS2, and, on their own pace, submit their answers privately onto the interactive display. Once they were done, R1 directly printed a sheet of those answers in the PWG sessions or waited for R2 to send them for the VRG sessions. During one-on-one sessions, R1 would be consulting subjects’ initial answers by looking at their individual printed sheets.
Participants were falsely led to believe that the program into which they had previously submitted their initial answers would calculate their processing time against whether their responses were false or correct to produce the top 10 QS1 questions that they would have to review again. It was explained to them that not all 10 questions that would re-appear during the one-on-one sessions should be assumed to have been wrongly answered. Therefore, during the official experimental session, they were required to re-examine and re-submit answers for those 10 questions; this time, each question would be accompanied by imagery and verbal information provided by the research to assist, supposedly, the re-evaluation and submission of final answers.
Of course, for the actual objective of the experiment none of this was true or mattered. In other words, participants of both the PWG and the VRG were required to re-submit 10 out of 20 questions of QS1, thinking those were individually selected based on their initial performance, but the exact same 10 questions were programmed to re-appear for all participants. It should be noted that participants were not instructed to answer as fast as possible, since the program was supposedly calculating their individual average processing time and not in-group performance in comparison with other participants. However, the time factor was used as a manipulative technique and allowed R1 to apply pressure or cause mild performance anxiety if participants tended to overthink their answers during one-on-one sessions.
For QS2, participants were aware that all 5 questions would re-appear for evaluation, with length of processing time, supposedly, still being a factor. During resubmission in the one-on-one sessions, they were informed that they had to provide a justification for their initial response, which would then be followed by additional information; upon reflecting on said information, they had to decide whether they would alter their initial choice. Finally, subjects addressed QS3 for the first time during the one-on-one sessions.
Question set #1: 20 questions (10 analysed)
QS1 consisted of what we labelled as 20 factual questions, since participants had to decide whether various statements were true or false, or they had to select between two options for questions based on general knowledge and trivia relating to nature, physics and so on. Each question had a correct answer; however, they were all carefully selected/phrased to cause doubt or confusion. For example, one question asked participants whether clouds have weight; another stated that there are no penguins residing on the North Pole and another that the volume of the average human male brain is bigger than that of the average female. Since those were random trivia, we successfully predicted that the majority of participants would either not know most of the answers or could be easily made to doubt their knowledge or presumption on each subject; however, no expertise was required to have an idea regarding the probable answer to those questions, since they were more or less elementary. Out of the 20 factual questions, subjects had to re-examine 10, which had been randomly pre-selected and were the same for all participants in both groups. Only the 10 re-submitted answers were included in the final statistical analysis.
Question set #2: 5 questions (10 analysed)
Questions of QS2 were brief narratives presenting a social or moral dilemma. For example, in one of the questions, participants were required to place themselves in the position of a university professor who has to decide whether to fail or pass (with a perfect mark) two students who submitted the exact same paper. According to the narrative, after meeting a few times with the students, the professor knows definitively that one of them stole the other one’s paper; therefore, amongst them there is a cheat and a hard-working individual who deserves a perfect grade. If they both fail the course, a hard-working individual has been punished, and if they both pass, a cheat has been rewarded and managed to get away with a perfect grade. Participants were led to believe that there is a correct answer for each of the questions of QS2, which of course is not true, since dilemmas of this type are part of mind-bending philosophical inquiries on the topic of morality and decision-based consequence. Due to the nature of those questions, being dilemmas with a greater amount of information, in combination with the fact that participant answers supposedly reflect their own beliefs and experiences, we apply a weight coefficient to the second set of questions. Variables produced from QS2 are multiplied by 2, thereby normalizing the values so that they can be compared to those of QS1 and QS3.
Question set #3: 10 questions (10 analysed)
Finally, QS3 questions were based on sensorial information, and participants addressed these questions only once, during the official experiment sessions. Allowing participants to review these exercises prior to the one-on-one sessions would be problematic, since those were carefully designed so that the correct answer would probably be detected quickly; thus, for QS3 the number of wrong answers is considered, rather than altered responses. Specifically, the first 6 out of 10 questions required participants to lift real objects (based on MR solutions for the VRG) and determine which was heavier/lighter, estimate by sight the distance between objects and determine which one was positioned farther/closer, and compare sounds to establish which one was louder/quieter. The last 4 questions were pictures of animals and human portraits requiring participants to determine the depicted emotional state (e.g. picture of chimpanzees: are they playing or fighting? portrait of a woman: is she sad or determined?). All questions of QS3 had a correct answer which the majority of participants should detect easily enough, as differences in weight, distance, etc., were based on the JND (just-noticeable-difference) formula (see, for example, Mills 1960; Wu et al. 2019). R1 applied manipulative techniques (presented below) or provided confusing information to elicit as many false responses as possible. The following section presents in more detail the processes followed.
In summary, subjects were presented with 35 questions in total. Out of those, we subtract the 10 factual questions that never re-appear, and we are left with 25. Since the resulting variables of QS2 are multiplied by 2, we are normalizing values as if that set comprised 10 questions as well. Therefore, 30 questions in total should be assumed for the statistical analysis presented in a later section. The activity diagrams of the experimental protocols corresponding to PWG and VRG are illustrated in Fig. 1.
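The question accounting described above can be checked with a short sketch. The set sizes and the QS2 weight of 2 come from the text; the function names and the example participant are hypothetical and serve only as illustration.

```python
# Set sizes and weights as stated in the text; all names are illustrative.
PRESENTED = {"QS1": 20, "QS2": 5, "QS3": 10}   # questions shown per set
ANALYSED = {"QS1": 10, "QS2": 5, "QS3": 10}    # questions entering the analysis
WEIGHT = {"QS1": 1, "QS2": 2, "QS3": 1}        # QS2 counts double


def analysed_items():
    """Number of weighted items assumed by the statistical analysis."""
    return sum(ANALYSED[qs] * WEIGHT[qs] for qs in ANALYSED)


def weighted_score(counts):
    """Weighted number of altered (QS1, QS2) or wrong (QS3) answers for one participant."""
    for qs, n in counts.items():
        if not 0 <= n <= ANALYSED[qs]:
            raise ValueError(f"{qs}: {n} is outside 0..{ANALYSED[qs]}")
    return sum(counts[qs] * WEIGHT[qs] for qs in counts)


# Hypothetical participant: 6 altered in QS1, 2 altered in QS2, 1 wrong in QS3.
example = {"QS1": 6, "QS2": 2, "QS3": 1}
print(sum(PRESENTED.values()), analysed_items(), weighted_score(example))
```

Under these assumptions, a participant's combined score is expressed out of 30 weighted items rather than out of the 35 questions presented.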
R1 was responsible for the one-on-one sessions of both groups, which took place over a number of days but grouped together. Therefore, all one-on-one sessions of PWG took place back to back over the course of a week and the same applied for the VRG sessions, held on a different week later on.
Sessions themselves were largely the same with regard to context and the placement of experiment-related objects. Some differences had to do with the inability to recreate the exact same environments, due to time restrictions and the limited availability of spaces for conducting the experiment at both real locations. In addition, for the creation of the VE it would have been extremely time-consuming to recreate the real setting of the PW. We deemed that the overall environment where briefings and experimental sessions took place was not a defining factor, as those were rather neutral in nature and typical university locations and laboratories. For the VRG, we kept the virtual room stimulating enough but with, overall, little visual information (e.g. there is a window with a view but very few objects and decorations).
Another difference, as already mentioned, had to do with the briefing prior to the one-on-one sessions. In short, the PWG met and interacted with R1 before the official sessions, which was not the case for the VRG. When one meets someone for the first time and immediately starts off the acquaintance by providing instructions, there is a certain control of the interaction that may enhance the gravitas of R1, compared to meeting participants directly in the virtual setting. We shall return to this point. Finally, during the PWG sessions, R1 would hold a notepad with the printed sheet placed on top and take notes not visible to participants. During the VRG sessions, R1 would likewise consult the sheet, by glancing under the HMD. The notepad served a dual purpose. First of all, R1 did in fact have to consult participants’ initial answers, but for the PWG it would also be the equivalent of a lab coat. Undisclosed notes, taken in reference to participant statements or choices, could induce a sense of uncertainty or anxiety. The notepad was not recreated virtually. This point, in conjunction with the briefing, will also be addressed in a later section of the paper.
Once participants submitted their initial answers for QS1 and QS2, and R1 was provided with the printed sheet, the PWG would follow R1 to a secluded area with a similar interactive display and the VRG would follow R2 to the motion capture equipped room where they would put on the VR equipment. The VRG was aware that another researcher would be reviewing their answers with them within the VE and through an avatar. For the VRG sessions, R1 would put on the VR equipment in a remote motion capture equipped location at the same time as the subjects and was ready for the sessions before they entered the virtual space. During both the PWG and VRG sessions, R1 would stand next to the subjects, across the interactive display and then the one-on-one sessions would begin.
Each session lasted 30 to 45 min, since some subjects would take more time to reflect on the process and the questions. As questions of QS1 and QS2 would appear on the screen, R1 would first consult each participant’s sheet, thereby checking their initial answer. From that point on, R1 would, on the spot, decide what sort of techniques suited each participant. Depending on how the interaction with each participant progressed, in both groups, R1 would settle into techniques that worked better, that is, techniques that pushed participants toward compliance by causing doubts that had them alter their initially submitted responses; or, alternatively, R1 would abandon techniques to which participants were not responding as desired, to avoid being exposed for having an agenda.
In summary, R1 would interrupt the process intermittently, but at random for each participant and depending on how each interaction progressed, to require justification for their answers, ask for clarifications or raise subtle concerns regarding selected answers (e.g. by saying “Are you sure?”, by frowning or raising the eyebrows in surprise and so forth). Persuasion techniques included eye contact or lack thereof, touching, pointing, non-verbal cues of disapproval (e.g. with a smirk or a light sigh), or pleasantries with over-the-top smiling, joking, nodding, etc. The majority of these signals were subtle, relying heavily on non-verbal cues rather than direct verbal comments. However, verbally complimenting their performance was crucial upon completion of QS1, if they had in fact complied enough (e.g. how many out of the 10 initial answers did they alter?). Therefore, if participants had complied with the researcher’s intentions and directed their answers accordingly, they were met with both verbal and non-verbal praise (e.g. “This is going great!”). If the opposite was true, R1 would choose to appear concerned or not very pleased, but in general would avoid negative verbal remarks. For the VRG, where non-verbal signals were limited, R1 would rely on sighs or awkward pauses, and, in general, a shift in voice tone and body movement opting either for excitement or disappointment.
The "vibe" of QS1 would in reality affect R1’s approach for QS2, since presumption of expertise is easier on factual information, whereas moral topics rely on core beliefs for which experience, rather than expertise, is a far more influential factor. The process for QS2 would be, for the majority of the cases, the presentation of an important counterargument, of which participants were aware, meaning that they already knew from the briefing sessions that they would have to further justify their answers after being provided with additional information. What they did not know is that this information, when R1 chose to provide it, was always meant to weaken their initial justification. For example, in one of the QS2 questions, participants have found out who has been stealing money from their workplace’s register, an issue that has caused great tension and problems amongst co-workers. They know for a fact, however, that that person is in dire need of that money due to health expenses. Participants must choose whether to tell on that person or not. Whatever their initial response to that question, R1 would twist the argument and either raise issues of fairness toward all individuals involved in the scenario, or the probable guilt of having another person’s fate in one’s hands.
During QS2, again, techniques were applied intermittently; therefore, sometimes R1 would just praise their initial answer and make no effort to alter their response. This would depend on their overall stance, as any attempt to question their answer when it comes from a place of certainty could risk a defensive attitude. Ensuring two altered answers out of five is better than none, so the overall method of manipulation was not to antagonize participant ideas, knowledge or beliefs, but rather to induce performance anxiety, confusion or need for approval.
During QS3, where participants had to, for instance, estimate which of two objects was closer to them, the researcher would frequently point toward the wrong object in a casual manner, as if simply presenting the objects, stand next to it, or have them rethink their initial answer by again applying the same techniques of non-verbal and occasionally verbal approval or disapproval. The premise of QS3 is that, since this is sensorial information, subjects’ own senses should be more trustworthy than their need for approval. As mentioned, full-body motion support and intercommunication allowed the majority of those techniques to be applied during the VRG sessions as well, with the greatest shortcoming for the VRG being the absence of eyebrow/forehead-based facial expressions. Finally, during the QS3 preparation, the JND (just-noticeable-difference) formula was taken into account. JND cannot ensure that every subject perceives a difference, which is why the differences in weight, proximity and decibel level between objects and events were in fact raised slightly above the standard JND. Therefore, at least 50% of subjects (in actuality more, since the JND was elevated) should be able to tell the difference, and thus the probability that false answers were influenced by R1’s manipulation techniques is increased.
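The idea of raising a stimulus difference slightly above the JND can be sketched as follows. The text does not state which JND formula or values were used, so this sketch assumes Weber's law (JND proportional to the reference intensity); the Weber fraction, the margin and the 200 g reference are hypothetical numbers chosen only for illustration.

```python
def comparison_level(reference, weber_fraction, margin=1.25):
    """Return a comparison stimulus that exceeds the reference by margin x JND.

    Assumes Weber's law: JND = weber_fraction * reference.
    margin = 1.0 sits exactly at the JND; values above 1.0 raise the
    difference slightly above it, as described in the text.
    """
    if reference <= 0 or weber_fraction <= 0:
        raise ValueError("reference and weber_fraction must be positive")
    if margin < 1.0:
        raise ValueError("margin below 1.0 would fall under the JND")
    jnd = weber_fraction * reference
    return reference + margin * jnd


# Hypothetical example: a 200 g reference box and an assumed 5 % Weber fraction.
heavier_box = comparison_level(200.0, 0.05)
print(heavier_box)
```

The same helper could be applied to distance or sound level, provided a suitable (and here unspecified) Weber fraction is available for each modality.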
VR environment technical set-up
Unity 3DFootnote 9 is a game engine which allows the coherent unification of different technologies into a VE, not to mention the design of the environment itself or on the spot manipulation of imported 3D models. Therefore, the common denominator between all equipment, software and elements for the creation of the virtual space where the VRG interacts with R1, whilst provided with simultaneous motion tracking of bodies and objects, is Unity. 3D models of objects were created with Cinema 4D,Footnote 10 whereas Adobe FuseFootnote 11 and the 3D avatar rigging tool, Adobe Mixamo,Footnote 12 allowed the generation of three avatars suitable for full-body motion capture solutions.
Motion capture, on both locations during the VRG sessions, was achieved using the Vicon Motion CaptureFootnote 13 system and Final IK (Inverse Kinematics).Footnote 14 Inverse Kinematics (IK) is currently the most appropriate solution available with respect to workload and intrusion during an MR experience which supports full-body motion (Kasapakis et al. 2018b; Roth et al. 2016a, b). Intercommunication between R1 and participants of VRG was supported by Photon Voice.Footnote 15 Finally, SALSA Lip-SyncFootnote 16 allowed us to apply real-time lip-syncing and automated eye blinking onto the avatars.
Experimental set-up—system architecture
PWG: The only requirement for the PWG was the development of an application running on an interactive display (see Fig. 2) projecting questions of QS1, QS2 and QS3, along with two buttons containing the possible answers. The application was developed in Unity and was designed to collect and save in a relational database participants’ answers, which were then printed by the researcher, as well as demographics (e.g. age, occupation, etc.) and data related to timing.
VRG: Fig. 3a illustrates the final VE where the VRG sessions were conducted. It is important to note that, for the accurate recreation of the QS3 elements, aspects such as physical dimensions of the objects, along with proximity between them, were based on the real objects used and placed during the PWG sessions to attain similar experimental conditions. As mentioned above, the 3D modelling took place in Cinema 4D and the models were then imported into Unity. The interactive display used during the PWG sessions was virtually recreated in the VE, allowing participants to submit their answers using the same interface and with the same gestures (e.g. by touching the virtually recreated screen).
3D Avatars: Three avatars were created as shown in Fig. 3b. Starting from the left, the first avatar embodied R1, who was female. The male and female 3D models, who functioned as the avatars of male and female participants, respectively, were designed as neutrally as possible, making use of the customization functionalities of Adobe Fuse.
Motion Capture: The most challenging aspect for the VRG sessions was the simultaneous tracking and transference, into the shared VE, of 2 avatars (R1 and participant) and 4 objects (two boxes and two cylinders of QS3). R1 and each participant were in two remote locations, laboratory spaces equipped with the Vicon Motion Capture system. Systems of this kind are based on the optical tracking of retroreflective markers using infrared light. Usually, a large number of such markers are attached onto a full-body suit. However, as related research confirms (Roth et al. 2016a, b), motion capture suits are time-consuming as they require individual and fine-tuned calibration. In addition, participants’ or users’ perceived task load is increased, since, being full-body, the suits can be invasive, disruptive and uncomfortable. Inverse kinematics (IK) represents the most suitable technological option, especially for experimental sessions such as the one presented here, where several participants attend and placing the equipment should be quick, easy and as unnoticeable as possible. Specifically, contrary to the motion capture suit, where markers are located on several spots across the individual’s body, IK uses 5 props in total. Each prop contains a defined number of markers and is placed on the feet, hands and head of the participant. In fact, the head prop is constructed directly upon the HMD, which will be worn anyway, further minimizing attached equipment. IK can estimate, and thus compute, the position of the unmarked areas of the body based on where the props are located. Therefore, if the hand props are close to the feet props, the subject is bending in some manner and the body form can be computed according to its physical capabilities (e.g. a human body would not fall on itself).
Since during experiments such as the one presented here participants are not required to perform any complex physical activity and tend to mostly walk, turn, lean slightly and make basic hand gestures, IK is ideal with little to no error in estimating body poses during tracking and transference. More specifically, regarding the implementation of IK, markers are grouped per location as a prop in Vicon’s software system, Vicon Blade, and its SDK is used to transfer motion onto 3D objects in Unity, which would be the hands, feet and head of the avatar (see Fig. 4). The same process is followed for object tracking (see Fig. 4).
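To illustrate how IK can infer an untracked joint from tracked end points, the sketch below solves the classic planar two-segment case (e.g. estimating an elbow from a shoulder at the origin and a tracked hand). It is a generic analytic solver based on the law of cosines, not the algorithm used by Final IK or Vicon, whose internals are not described here; the function name and the segment lengths are hypothetical.

```python
import math


def two_bone_ik(target, upper_len, lower_len):
    """Place the middle joint of a two-segment limb rooted at the origin.

    Returns (joint_xy, end_xy). If the target is out of reach, the limb is
    stretched (or folded) as far as its segment lengths allow.
    """
    tx, ty = target
    dist = math.hypot(tx, ty)
    # Law of cosines gives the bend at the middle joint; clamping keeps
    # acos defined when the target cannot be reached exactly.
    cos_bend = (dist ** 2 - upper_len ** 2 - lower_len ** 2) / (2 * upper_len * lower_len)
    bend = math.acos(max(-1.0, min(1.0, cos_bend)))
    # Rotate the first segment so that the chain end points at the target.
    root_angle = math.atan2(ty, tx) - math.atan2(
        lower_len * math.sin(bend), upper_len + lower_len * math.cos(bend)
    )
    joint = (upper_len * math.cos(root_angle), upper_len * math.sin(root_angle))
    end = (
        joint[0] + lower_len * math.cos(root_angle + bend),
        joint[1] + lower_len * math.sin(root_angle + bend),
    )
    return joint, end


# Hypothetical limb with two unit-length segments and a tracked hand at (1, 1).
elbow, hand = two_bone_ik((1.0, 1.0), 1.0, 1.0)
print(elbow, hand)
```

A full-body solver additionally has to choose between mirrored solutions and respect joint limits, which is where constraints such as "a human body would not fall on itself" come in.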
Since Vicon Blade streams the motion capture data online, it was possible to fuse together streams from two different, remotely located, motion captures. Starting from the left, Fig. 5 shows R1 wearing the VR and IK equipment in Location A, a top view of both avatars in the shared VE in front of the interactive display, what the participant is seeing in that moment, and, finally, the participant wearing the VR and IK equipment in Location B.
Figure 6 shows R1’s view perspective through the HMD as she observes the participant who is picking up objects simultaneously in the real world (in Location B), and in the shared VE, therefore implementing MR.
Architecture: The above-described solutions were adjusted accordingly for the present study; however, the overall development of the system’s architecture is enabled by SEaMlESS (SharEd Mixed rEality Social Space) (Kasapakis et al. 2018a, b, c). SEaMlESS is a room-scale MR space, which enables remote, multi-user, synchronous, social interaction, and supports free full-body user movement, accurate real-object virtual representation (also with motion support) as well as interaction with both real and virtual objects. In addition, the VR-ready PCs used featured a GeForce GTX 1060 (3 GB) graphics card, 8 GB of RAM, and an Intel Core i7-8700 CPU, generating 90 frames per second (FPS), with the lowest frame rate recorded during the experimental sessions being 85 FPS. Moreover, motion-to-photon latency, according to the Oculus Diagnostic Tool, was ~ 21 ms, the latency of the Vicon motion capture system was < 10 ms, and network round-trip time (RTT) was ~ 20 ms. The institutional infrastructure that facilitated the experimental sessions provided a high-speed fiber-optic Internet connection. Those results show that the application was well inside the optimal corridor of latencies for visual feedback (40-70 ms) (Waltemate et al. 2015), further validated by the fact that there was not a single occurrence of cybersickness amongst the 23 participants of the VRG.
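One way to read the latency figures is as a simple budget. The text does not state how the three measurements combine, so the sketch below assumes, purely for illustration, that they are additive contributions and treats the reported "< 10 ms" as an upper bound of 10 ms; the names are hypothetical.

```python
# Reported figures (ms); the additive model itself is an assumption.
MOTION_TO_PHOTON_MS = 21.0   # reported as ~21 ms
MOCAP_MS = 10.0              # reported as < 10 ms, upper bound used here
NETWORK_RTT_MS = 20.0        # reported as ~20 ms
CORRIDOR_MS = (40.0, 70.0)   # corridor cited from Waltemate et al. 2015


def total_latency(*components_ms):
    """Sum latency contributions under the additive assumption."""
    return sum(components_ms)


def within_corridor(value_ms, corridor=CORRIDOR_MS):
    """True if the value lies inside the closed corridor interval."""
    low, high = corridor
    return low <= value_ms <= high


budget = total_latency(MOTION_TO_PHOTON_MS, MOCAP_MS, NETWORK_RTT_MS)
print(budget, within_corridor(budget))
```

Under this reading the estimate is an upper bound; components that overlap in practice would lower the effective end-to-end figure.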
Intercommunication: The HMD used in this experiment was an Oculus Rift,Footnote 17 which comes with built-in microphone and headphones. R1’s version was not the same; therefore she was equipped with regular headphones. The real-time transference of speech between R1 and participants was realized through the Photon Voice cloud service, whereas Unity allowed its management as 3D sound, meaning that if R1 was standing to the left of a participant, her voice would be enhanced through the left earphone, thus localizing sound sources. A very interesting implementation was that of real-time lip-syncing achieved with SALSA Lip-Sync. In simple terms, Photon Voice transfers spoken sounds to SALSA, which then generates lip-syncing accordingly, giving the impression that the avatar is indeed speaking (see Fig. 7). SALSA also provides pre-programmed eye blinking, thus breathing some life into the avatar as shown in Fig. 7.
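The localization effect described above (a voice on the listener's left sounding louder in the left ear) can be approximated with constant-power stereo panning. This is a simplified, hypothetical sketch in a 2D top-down view; it is not Unity's spatializer, and it ignores distance attenuation and front/back cues.

```python
import math


def stereo_gains(listener_pos, listener_forward, source_pos):
    """Return (left_gain, right_gain) for a source, using constant-power panning.

    Positions and the forward vector are (x, y) pairs in a top-down view.
    """
    fx, fy = listener_forward
    f_norm = math.hypot(fx, fy)
    if f_norm == 0:
        raise ValueError("listener_forward must be non-zero")
    dx = source_pos[0] - listener_pos[0]
    dy = source_pos[1] - listener_pos[1]
    d_norm = math.hypot(dx, dy)
    if d_norm == 0:
        pan = 0.0  # source at the listener's position: keep it centred
    else:
        # Listener's right-hand direction is the forward vector rotated clockwise.
        rx, ry = fy / f_norm, -fx / f_norm
        pan = (dx * rx + dy * ry) / d_norm  # -1 = fully left, +1 = fully right
    angle = (pan + 1.0) * math.pi / 4.0     # maps pan to 0 .. pi/2
    return math.cos(angle), math.sin(angle)


# Listener at the origin facing +y; speaker standing directly to the left.
left, right = stereo_gains((0.0, 0.0), (0.0, 1.0), (-1.0, 0.0))
print(left, right)
```

Because the two gains are the cosine and sine of the same angle, their squares always sum to one, so perceived loudness stays roughly constant as the speaker moves around the listener.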
We conducted a comparative analysis of the frequency of altered responses between PWG and VRG for:
all three question sets (QS123_PWG vs QS123_VRG), and
each question set independently (e.g. QS1_PWG vs QS1_VRG).
Each question of QS1, QS2 and QS3, and all contributing variables, is dependent per participant and vertically independent.
The variable values denote the absolute percentage of altered answers per participant for each QS. As previously explained, we consider QS2 to bear higher weight due to providing richer information and requiring far greater processing time. The interaction between R1 and subjects during QS2 is also richer in dialogue and finally the very topic, being moral dilemmas, has greater gravitas and cognitive demands than QS1 and QS3.
Frequency rate of altered answers
A standard frequency test across all categories was performed. The test results indicate a low frequency rate of altered answers in all three QSs (QS123 sum), cumulatively for groups PWG and VRG. Specifically, the reviewed questions were 30 (QS2*2) in total; therefore, the mean value of 12.04 and the maximum number of altered answers (23/30) indicate that participants have been influenced by R1 across all categories (see Fig. 8).
However, upon running frequency tests per QS it appears that participants have been mostly influenced in QS1, which were the factual questions, and QS2, the moral dilemmas, as shown in Table 1.
Frequency rates of altered answers per group
The most important aspect would be the between-group comparison, as that would indicate differences between the two experimental conditions. Figure 9 illustrates mean, median and mode values for each QS separately as well as for the three sets cumulatively (QS123) across PWG and VRG. The first remark is that the PWG participants have not been influenced more than their VRG peers, and in fact there is an indication of a slight increase for VRG when examining all three sets of questions between the two groups. Values per QS across both categories (PWG and VRG) are represented in Fig. 9b. Evidently, although QS1 produces the highest results with regard to the overall number of altered answers, there are no significant between-group differences. Results for QS2 indicate a substantial increase in compliance for the VRG. Finally, some differentiation appears in QS3, yet the sensorial QS also produced the lowest number of altered answers. Statistical significance of these values is discussed through the Mann–Whitney test presented below.
Independent samples Mann–Whitney U test
The null hypothesis (H0) is that the median measurement amongst the compared samples associated with the PWG and VRG groups is equal, and it is tested using the Mann–Whitney test (Mann and Whitney 1947). The Mann–Whitney test compares the medians of two groups of ordinal nonparametric data to determine whether they are statistically different. Unlike the t-test, the Mann–Whitney test does not assume normal distribution of data. The results presented in Table 2 suggest that the null hypothesis is rejected only for QS2, i.e. the result obtained for the moral questions is statistically significant (p = 0.047 < 0.05). No statistical significance is indicated for QS1 and QS3, nor for the combined QS123 samples.
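For readers unfamiliar with the procedure, the following is a minimal sketch of a two-sided Mann–Whitney U test using midranks for ties and a normal approximation without continuity correction. It is not the software used for Table 2, statistical packages may report slightly different p-values (for example, when exact methods or continuity corrections are applied), and the two small samples at the end are invented for illustration only.

```python
import math
from collections import Counter


def midranks(values):
    """Map each distinct value to its average 1-based rank in the pooled sample."""
    ordered = sorted(values)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        ranks[ordered[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j
    return ranks


def mann_whitney_u(x, y):
    """Return (U, two-sided p) via the tie-corrected normal approximation."""
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    ranks = midranks(pooled)
    rank_sum_x = sum(ranks[v] for v in x)
    u1 = rank_sum_x - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    u = min(u1, u2)
    n = n1 + n2
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    sigma = math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1))))
    if sigma == 0:
        return u, 1.0  # all observations identical: no evidence of a difference
    z = (u - n1 * n2 / 2) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return u, p


# Invented altered-answer counts for two small groups (not the study's data).
group_a = [2, 4, 4, 6, 3]
group_b = [6, 8, 4, 10, 6]
print(mann_whitney_u(group_a, group_b))
```

The null hypothesis would be rejected at the 0.05 level whenever the returned p-value falls below 0.05, which mirrors the decision rule applied to each QS in the text.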
Average duration of sessions between groups
We documented and juxtaposed the average duration for each QS between the PWG and the VRG, as presented in Table 3. The first notable indication is the overall session duration for the two groups, which reached or exceeded 20 min on average. In general, some individual sessions in both groups exceeded 30 min, which is of particular importance for virtual settings. Per QS, the VRG participants spent less time during QS1 sessions and more time in QS2 sessions. Noticeably, there appears to be a substantial between-group difference during QS3 sessions, with the PWG spending 3′38″ less on average.
During the conception and design of the experimental set-up, and in relation to the main objectives of the overall research scheme, we firstly juxtaposed VR-mediated communication against FtF through a scenario complex enough to reveal the degree of deviation. The chosen QSs were selected to exemplify dominant types of conversation in a literal, and thus simplified, manner. These scenario-based choices inform our understanding of potential real-world applications where interactions are naturally more intricate. For example, the field of education often requires an authoritative figure, and the type of information may, in varying ratios, combine factual knowledge and emotional engagement. Relational dynamics in practical fields (e.g. engineering), by contrast, are less evident, even if an authority figure is present, as interactions are more collaborative and emphasize sensorial input relating to spatial cognition and subsequent management of spaces and objects.
Finally, interpersonal interactions are unique events that leave little room for quantification and comparison. They are also challenging to control variable-wise and in a non-biased manner. At an exploratory stage, the need for clear indicators is greater than a qualitative analysis, since the former should inform the latter. Therefore, upon review, we concluded that forced compliance could alleviate a considerable number of issues, since the researcher might have better control over the qualitative aspects of the interaction compared to having two distinct participants interacting freely (Cooper 2007). Forced compliance is not only a highly documented method with several examples of experimental set-ups to study and learn from, but also straightforward enough from a researcher’s standpoint in regard to quantifiable data and controllable variables (Joule 1991; Girandola 1997; Renard et al. 2007; Cooper 2007).
PWG versus VRG
Our main research hypothesis was that PWG would "outperform" VRG; meaning, frequency of compliance in PWG would be higher at all points. Results show not only that the VRG easily matched the PWG’s values but were increased for QS2 (statistically significant) and QS3.
A few aspects of FtF, as they occurred for the PWG, could not be reproduced in VRG. The first is the briefing sessions for the PWG, which were conducted by the same researcher who would act as the authoritative figure during one-on-one sessions. The second was her holding a notepad and taking notes intermittently as subjects were submitting their answers. The third and perhaps the most serious handicap of the VR sessions was the inability of transmitting important non-verbal cues, such as intentional eye contact—with alive-looking gaze, since basic eye contact is achieved by default if avatars are looking at one another’s face—and all facial expressions, particularly smiling and frowning at the eyebrows and forehead. These are distinct and intense signals that can convey a number of emotions and reactions. In view of these limitations in the VRG sessions, along with having no previous basis for hypothesizing in favor of VRG, our expectation was that we would explore how closely VRG values could reach PWG.
Therefore, the first, and most important result confirms that there are equal possibilities for compliance in VR-mediated communication as there are in FtF communication, further enhancing the potential of virtual interactions being as rich as FtF, yet still bearing unknown effects of VR as a medium, meaning that we cannot inconsiderately presume that matching variables refer to perfectly replicated effects or even effects of the same nature. The extent of the medium’s involvement is beyond our current understanding. Due to its limitations, especially in non-verbal signalling, individuals should be less compliant. Therefore, in parts there are indications that VR-mediated communication carries aspects of FtF, making it as close of a simulation as possible, compared to any other medium, and at the same time, the medium itself interplays with the communication process due to attributes that require in-depth investigation under CMC. Based on our results, we consider that, first of all, a medium that is experienced as a novelty may cause some form of confusion and/or hinder confidence levels, affects which also make for more compliant subjects/users. In addition, a rich interaction, even equivalent to FtF, does not equal ease or familiarity, and may in fact have the opposite effect, making individuals feel over-exposed and under scrutiny—in VR-mediated communication in general and not just during an experimental set-up, which is already a staged and therefore an emotionally and cognitively precarious situation to be in.
Apart from the above-mentioned aspects that were lacking in the VRG and should have resulted in enhanced compliance levels in the PWG (and not the opposite), further points are raised when examining the QSs separately. The three experimental conditions, based on different types of conversation, provided us with substantial information for the design of future experiments that should address communication in field-specific VR settings, and for CMC to acknowledge VR as a complex communication medium and examine it under its current framework.
In regard to the first QS (factual), no significant difference was found between the two experimental conditions; subjects were equally compliant. In addition, compared to the other two question sets, QS1 produced the highest number of altered answers in both groups. It is reasonable to assume that subjects regard the authoritative figure, not only as a researcher but as the individual who prepared the questions, as having absolute expertise on the subject, regardless of the medium of interaction. Therefore, neither the number of altered responses in both groups nor the matching values of compliance are that surprising. QS1 exemplifies the extent to which interlocutors may be as compliant or as dominant in VR as they would be in the PW when the topics of conversation and the information exchanged have a factual basis.
The above-mentioned finding informs possible VR applications wherein interlocutors frequently or by principle exchange fact-based information and data. There are practical reasons for opting for a VR-mediated interaction, but there are also incentives in the many additional solutions it can provide on top of the FtF simulation, which elevate communication by giving interlocutors access to augmentation tools and on-the-spot retrieval and manipulation of information. For instance, emerging MR technologies enable applications wherein interlocutors may project media within the virtual space and cross-check information instantly, without disengaging from an otherwise interpersonal conversation. Mind-maps and digital files may turn into customizable virtual memory palaces that allow individuals to stroll through their own notes. Such options may, in turn, enhance confidence and the overall management of a discussion or debate, in regard to both presentation time and richness of information, and therefore reconfigure relational dynamics. Completely eliminating occurrences of compliance is improbable, and compliance, as discussed elsewhere, is differentiated from obedience. Moreover, complete elimination of compliance, for instance in an educational setting, might diminish the authoritative figure’s influence in a counterintuitive manner. The points raised therefore relate more to the necessity of studying the medium’s overall implications for relational dynamics. Pending such in-depth research, providing individuals with tools that elevate their managerial and decision-making skills is reason enough for investing in VR as an alternative to FtF and to other media currently allowing remote collaboration.
Similar findings were obtained for the third QS (sensorial), for which VRG results present slightly higher frequency rates of compliance than those of the PWG. QS3 was based on sensorial information, as participants received visual, auditory and haptic input. A single study is in no position to address all these inputs, as they are part of a broad inquiry into cognitive processes and the human perceptual system. However, since the present study does not examine any of these aspects, but simply juxtaposes, in an exploratory manner, performance and compliance between two distinct experimental conditions, we may discuss the significance of the comparative results in regard to future applications and their possible modifications for performance enhancement.
QS3 examines exactly this kind of sensorial information processing and exemplifies how a virtual setting may benefit users by providing them with pre-processed information. More specifically, all physical properties of objects, spaces and sensory-based events are pre-determined during 3D construction and can thus be superimposed or retrieved at any moment. Providing all relevant data to professionals or users, who either deal with physical properties or could benefit from knowing them, not only minimizes error probability, but also accelerates training, education, and even diagnostic and therapy processes. Everything, from natural disasters to micro facial expressions, may become part of the information pool available to professionals and users. This possibility could alter the very nature of a collaborative setting and thus completely modify the social play-out of a great number of profession-based interactions, again mostly in synchronous collaborative environments.
Notably, session duration measurements, presented in Table 3, show that VRG participants spent approximately 3 min more than their PWG peers during the QS3 sessions. There is a clear-cut reason for this deviation, which relates to free movement. Locomotion comfort is decreased during most VR experiences (Janeh et al. 2019), since users have to navigate in both the virtual and the physical environment. In addition, the VR equipment attached to their bodies elevates perceived task load and compromises proprioception. Naturally, this prolongs the duration, and occasionally affects the quality, of performances that require coordination.
The second QS (moral), where the VRG was found more compliant and results were statistically significant, consisted of only five questions. Contrary to QS1 and QS3, where subject compliance had more to do with probability of error and overall performance, QS2 called forth personal beliefs and appealed to emotions. Aspects of cognitive dissonance were also more evident during QS2, since participants took the narratives quite personally, referencing their own family members and personal experiences. They were even eager to provide detailed and elaborate justifications for each of their choices, regardless of whether they were invited to do so or not. QS2’s averages in session duration per group (PWG, 6′23″/77″; VRG, 7′33″/91″) are also indicative of subject uncertainty, which allowed for prolonged negotiation and provided opportunity for the researcher to intensify persuasion techniques.
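The duration pairs reported for QS2 can be checked with simple arithmetic. The sketch below assumes that the second figure in each pair is the mean time per question, obtained by dividing the mean session duration by the five QS2 questions; this reading is an interpretation of the notation rather than something stated explicitly, and the names `to_seconds`, `QS2_QUESTIONS` and `session_means` are introduced here for illustration only.

```python
# Arithmetic check of the QS2 duration figures quoted in the text.
# Assumption (ours): the value after the slash is the mean time per
# question, i.e. mean session duration divided by the five QS2 questions.

def to_seconds(minutes: int, seconds: int) -> int:
    """Convert a minutes/seconds duration into whole seconds."""
    return minutes * 60 + seconds

QS2_QUESTIONS = 5  # QS2 "consisted of only five questions"

# Mean session durations per group as (minutes, seconds), taken from the text.
session_means = {"PWG": (6, 23), "VRG": (7, 33)}

# Mean time per question, rounded to the nearest second.
per_question = {
    group: round(to_seconds(m, s) / QS2_QUESTIONS)
    for group, (m, s) in session_means.items()
}

# Difference between the two group means, in seconds.
gap_seconds = to_seconds(*session_means["VRG"]) - to_seconds(*session_means["PWG"])

print(per_question)
print(gap_seconds)
```

Under this assumption the computed per-question values match the reported 77″ and 91″, and the VRG sessions ran 70 s longer on average than the PWG sessions.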
More specifically, in regard to QS2 results, topics and interpersonal communication that pertain to emotions, morality, social life, human relations and so forth also relate to the formation of individual and group identity. It has been shown that anonymity and isolation in CMC do not always benefit the expression of “personal identity”, but in fact increase the chances of conformity to the norms of a group with a salient social identity (Spears and Lea 1994). Of course, the experimental condition of the present study does not capture in-group aspects or virtual communities; it does, however, point towards the aspect of isolation. The overall setting may simulate an FtF interaction; however, virtual representation with an avatar that bears no resemblance to participants’ own physiology, along with the overall synthetic aesthetic of both the environment and the interlocutor, may contribute not to loss of identity, but to limitations in expressing or enhancing said identity through interpersonal engagement, sense of self and physical movement, making individuals more susceptible. In general, QS2 raises the question of individual gullibility and of the overall ability of a virtual setting to be as impactful as a physical one, if not more, both emotionally and psychologically.
Conclusions and future research
As rich as VR-mediated communication may be, it is still computer-mediated, and CMC research has repeatedly confirmed that mediation generates unexpected shifts in social engagement, self-awareness and identity construction. Apart from understanding the newly found medium in itself, the examination of VR-mediated communication reinvigorates the field of communication in general and sheds further light on both FtF and CMC. Those benefits relate to our overall research objective, which is to detect and, subsequently, understand the interplaying factors during VR-mediated communication. As shown by our experimental results, the latter not only has the capacity to be as rich and as intricate as an interpersonal interaction during FtF, but it can be further complicated due to the medium’s extensive involvement.
More specifically, our study’s hypothesis, that FtF interaction would result in more compliant participants, was refuted, since VR subjects were just as or more compliant. Our findings thus indicate that VR-mediated communication is as complex as FtF and requires in-depth investigation as an intricate medium of communication. The study also exemplifies the great number of real-world VR applications as (a) an alternative to FtF collaborative meetings, to limit large-group gatherings and ensure public health safety, (b) a substitute for e-learning and teleconferencing systems that limit meaningful engagement and (c) a creative platform through collaborative MR environments that mimic "hands on" functions and thus allow interactive and responsive design, construction, engineering and simulation. As Harjunen et al. conclude (2018, p. 26), such technological contributions are not limited to human–computer interaction cases, but also extend to “marketing behavioural economics, personality research, and social psychology as they bring new insight into the dynamics between economic decision-making, personality and nonverbal communication”. This renders our findings particularly important to the discourse on VR’s future as a communication medium that may very well overtake particular industries.
Future research directions require further system-level performance optimization with respect to network latency experienced in remote interaction, investigation of participant task load based on appropriated locomotion tracking techniques, and incorporation of facial and eye-tracking technologies for deeper examination of non-verbal transference in VR-mediated communication. Other sub-topics grounded in social psychology, such as social cognition (Fiske and Taylor 1991), should also be further investigated to inform the growing social VR knowledge base and other emerging applications for VR-mediated conferencing and collaborative environments.
Even though there are several teleconferencing solutions available, larger establishments, such as universities, are better accommodated by outsourcing to all-encompassing platforms that provide a standard and integrated solution for their entire community.
Questions can be seen at: https://drive.google.com/file/d/11F7Vs01hchi5XrxVR5VwX2fQM8xzACKa/view.
Argyle M (1973) The syntaxes of bodily communication. Linguistics 11(112):71–92
Arthur E, Hancock P, Chrysler S (1997) The perception of spatial layout in real and virtual worlds. Ergonomics 40(1):69–77
Asch SE, Guetzkow H (1951) Effects of group pressure upon the modification and distortion of judgments. Organizational influence processes, pp 295–303
Avdiu B, Nayyar G (2020) When face-to-face interactions become an occupational hazard: jobs in the time of COVID-19. Econ Lett 197:109648
Banakou D, Groten R, Slater M (2013) Illusory ownership of a virtual child body causes overestimation of object sizes and implicit attitude changes. Proc Natl Acad Sci 110(31):12846–12851
Banakou D, Slater M (2014) Body ownership causes illusory self-attribution of speaking and influences subsequent real speaking. Proc Natl Acad Sci 111(49):17678–17683
Baka E, Stavroulia KE, Magnenat-Thalmann N, Lanitis A (2018) An EEG-based evaluation for comparing the sense of presence between virtual and physical environments. In Proceedings of Computer Graphics International (CGI), pp. 107–116
Basdogan C et al (2007) VR-based simulators for training in minimally invasive surgery. IEEE Comput Graph Appl 27(2):54–66
Biocca F (1992) Communication within virtual reality: Creating a space for research. J Commun 42(4):5–22
Biocca F, Levy MR (1995a) Communication in the age of virtual reality. Lawrence Erlbaum Associates, Hillsdale
Biocca F, Levy MR (1995b) Virtual reality as a communication system. Communication in the age of virtual reality, pp 15–31
Birdwhistell RL (1970) Kinesics and context. University of Pennsylvania Press
Carrillo C, Flores MA (2020) COVID-19 and teacher education: a literature review of online teaching and learning practices. Eur J Teach Educ 43(4):466–487
Cialdini RB (2009) Influence: science and practice, 4, Pearson education Boston
Cialdini RB, Goldstein NJ (2004) Social influence: compliance and conformity. Annu Rev Psychol 55:591–621
Cipresso P, Giglioli IAC, Raya MA, Riva G (2018) The past, present, and future of virtual and augmented reality research: a network and cluster analysis of the literature. Frontiers in psychology 9:2086
Cooper J (2007) Cognitive dissonance: 50 years of a classic theory. Sage Publications
Dzardanova E, Kasapakis V, Gavalas D (2017) Affective impact of social presence in immersive 3D virtual worlds. In: Proceedings of the 22nd symposium on computers and communications (ISCC), pp 6–11
Dzardanova E, Kasapakis V, Gavalas D (2018a) On the effect of social context in virtual reality: an examination of the determinants of human behavior in shared immersive virtual environments. Consum Electron Mag 7(4):44–52
Dzardanova E, Kasapakis V, Gavalas D (2018b) Social Virtual Reality. Encyclopedia of Computer Graphics and Games. Springer International Publishing, pp 1–3
Ellis SR (1991) Nature and origins of virtual environments: a bibliographical essay. Comput Syst Eng 2(4):321–347
Ellis SR (1994) What are virtual environments? Comput Graph Appl 14(1):17–22
Feldman RS, Rimé B (1991) Fundamentals of nonverbal behavior. Cambridge University Press
Festinger L (1962) A theory of cognitive dissonance. Stanford University press
Fiske ST, Taylor SE (1991) Social cognition. Mcgraw-Hill Book Company
Girandola F (1997) Double forced compliance and cognitive dissonance theory. The Journal of social psychology 137(5):594–605
Grechkin TY, Nguyen TD, Plumert JM, Cremer JF, Kearney JK (2010) How does presentation method and measurement protocol affect distance estimation in real and virtual environments? Trans Appl Percept 7(4):26
Haans A, de Bruijn R, IJsselsteijn WA (2014) A virtual midas touch? Touch, compliance, and confederate bias in mediated communication. J Nonverb Behav 38(3):301–311
Hargie O (1997) The handbook of communication skills. Psychology Press
Harjunen VJ, Spapé M, Ahmed I, Jacucci G, Ravaja N (2018) Persuaded by the machine: the effect of virtual nonverbal cues and individual differences on compliance in economic bargaining. Comput Hum Behav 87:384–394
Hofling CK, Brotzman E, Dalrymple S, Graves N, Pierce CM (1966) An experimental study in nurse-physician relationships. J Nerv Ment Dis 143(2):171–180
Janeh O, Katzakis N, Tong J, Steinicke F (2019) Infinity walk in vr: effects of cognitive load on velocity during continuous long-distance walking. In: Proceedings of the symposium on applied perception (SAP), pp 1–9
Jonas M, Said S, Yu D, Aiello C, Furlo N, Zytko D (2019) Towards a taxonomy of social VR application design. In: Proceedings of the annual symposium on computer-human interaction in play companion extended abstracts (CHI-PLAY), pp 437–444
Joule RV (1991) Double forced compliance: A new paradigm in cognitive dissonance theory. The Journal of social psychology 131(6):839–845
Kasapakis V, Dzardanova E, Gavalas D, Sylaiou S (2018) Remote synchronous interaction in mixed reality gaming worlds. In: Proceedings of the 10th international workshop on immersive mixed and virtual environment systems (MMSys), pp 13–15
Kasapakis V, Dzardanova E, Paschalidis C (2018) Conceptual and technical aspects of full-body motion support in virtual and mixed reality. In: Proceedings of the 5th international conference on augmented reality, virtual reality and computer graphics (AVR), pp 668–682
Kasapakis V, Gavalas D, Dzardanova E (2018) Mixed reality. Encyclopedia of computer graphics and games. Springer International Publishing, pp 1–4
Kilteni K, Bergstrom I, Slater M (2013) Drumming in immersive virtual reality: the body shapes the way we play. Trans vis Comput Graph 19(4):597–605
Kilteni K, Groten R, Slater M (2012) The sense of embodiment in virtual reality. Presence Teleoper Virtual Environ 21(4):373–387
Knapp ML, Hall JA, Horgan TG (2013) Nonverbal communication in human interaction. Cengage Learning
Kort YAd, Ijsselsteijn WA, Kooijman J, Schuurmans Y (2003) Virtual laboratories: comparability of real and virtual environments for environmental psychology. Presence Teleoper Virtual Environ 12(4):360–373
Kuliga SF, Thrash T, Dalton RC, Hölscher C (2015) Virtual reality as an empirical research tool—exploring user experience in a real building and a corresponding virtual model. Comput Environ Urban Syst 54:363–375
Kyrlitsias C, Michael-Grigoriou D (2018) Asch conformity experiment using immersive virtual reality. Comput Anim Virtual Worlds 29(5):e1804
Larsen CR, Soerensen JL, Grantcharov TP, Dalsgaard T, Schouenborg L, Ottosen C, Schroeder TV, Ottesen BS (2009) Effect of virtual reality training on laparoscopic surgery: randomised controlled trial. BMJ 338:b1802
Lee AT (2017) Flight simulation: virtual environments in aviation. Routledge
Mann HB, Whitney DR (1947) On a test of whether one of two random variables is stochastically larger than the other. Ann Math Stat 50–60
Mantovani E, Zucchella C, Bottiroli S, Federico A, Giugno R, Sandrini G, Chiamulera C, Tamburin S (2020) Telemedicine and virtual reality for cognitive rehabilitation: a roadmap for the COVID-19 pandemic. Front Neurol 11
Matsumoto D, Frank MG, Hwang HS (2012) Nonverbal communication: science and applications. Sage Publications
Mehrabian A (1972) Nonverbal communication. Transaction Publishers.
Milgram S (1974) Compliant subjects. Science 184:667–669
Milgram S, Gudehus C (1978) Obedience to authority. Ziff-Davis Publishing Company
Mills AW (1960) Lateralization of high-frequency tones. J Acoust Soc Am 32(1):132–134
Mussel P, Göritz AS, Hewig J (2013) The value of a smile: Facial expression affects ultimatum-game responses. Judgm Decis Mak 8(3):381–385
Nahai N (2012) Webs of Influence: the psychology of online persuasion. Pearson.
Osimo SA, Pizarro R, Spanlang B, Slater M (2015) Conversations between self and self as Sigmund Freud—a virtual body ownership paradigm for self counselling. Sci Rep 5:13899
Patterson ML (1991) A functional approach to nonverbal exchange
Paulhus D (1982) Individual differences, self-presentation, and cognitive dissonance: their concurrent operation in forced compliance. J Pers Soc Psychol 43(4):838
Pertaub D, Slater M, Barker C (2001) An experiment on fear of public speaking in virtual reality. Studies in health technology and informatics, pp 372–378
Renard E, Bonardi C, Roussiau N, Girandola F (2007) Forced compliance, double forced compliance and experimental dynamics in social representations. Revue internationale de psychologie sociale 20(2):79–130
Riva G, Mantovani F, Wiederhold BK (2020) Positive Technology and COVID-19. Cyberpsychol Behav Soc Netw 23(9):581–587
Roth D, Bente G, Kullmann P, Mal D, Purps CF, Vogeley K, Latoschik ME (2019) Technologies for social augmentations in user-embodied virtual reality. In: Proceedings of the 25th symposium on virtual reality software and technology (VRST), pp 1–12
Roth D, Kleinbeck C, Feigl T, Mutschler C, Latoschik ME (2018) Beyond replication: augmenting social behaviors in multi-user virtual realities. In: Proceedings of the 25th symposium on virtual reality software and technology (IEEEVR), pp 215–222
Roth D et al (2015) Hybrid avatar-agent technology—a conceptual step towards mediated “social” virtual reality and its respective challenges. i-com, 14(2):107–114
Roth D, Lugrin J-L, Büser J, Bente G, Fuhrmann A, Latoschik ME (2016) A simplified inverse kinematic approach for embodied VR applications. In: Proceedings of the virtual reality conference (IEEE VR), pp 275–276
Roth D, Lugrin J-L, Galakhov D, Hofmann A, Bente G, Latoschik ME, Fuhrmann A (2016) Avatar realism and social interaction quality in virtual reality. In: Proceedings of the virtual reality conference (IEEE VR), pp 277–278
Roth D, Waldow K, Latoschik ME, Fuhrmann A, Bente G (2017) Socially immersive avatar-based communication. In: Proceedings of the virtual reality conference (IEEE VR), pp 259–260
Slater M, Antley A, Davison A, Swapp D, Guger C, Barker C, Pistrang N, Sanchez-Vives MV (2006) A virtual reprise of the Stanley Milgram obedience experiments. PLoS ONE 1(1):e39
Slater M, Spanlang B, Sanchez-Vives MV, Blanke O (2010) First person experience of body transfer in virtual reality. PLoS ONE 5(5):e10564
Slater M, Sanchez-Vives MV (2016) Enhancing our lives with immersive virtual reality. Frontiers in Robotics and AI 3:74
Spanlang B, Normand JM, Borland D, Kilteni K, Giannopoulos E, Pomés A, González-Franco M, Perez-Marcos D, Arroyo-Palacios J, Muncunill XN, Slater M (2014) How to build an embodiment lab: achieving body representation illusions in virtual reality. Front Robot AI 1:9
Spears R, Lea M (1994) Panacea or panopticon? The hidden power in computer-mediated communication. Commun Res 21(4):427–459
Sutherland IE (1968) A head-mounted three dimensional display. In: Proceedings of the fall joint computer conference. Part I (AFIPS), pp 757–764
Tanenbaum TJ, Hartoonian N, Bryan J (2020) How do I make this thing smile? An inventory of expressive nonverbal communication in commercial social virtual reality platforms. In: Proceedings of the conference on human factors in computing systems (CHI), pp 1–13
Waltemate T, Gall D, Roth D, Botsch M, Latoschik ME (2018) The impact of avatar personalization and immersion on virtual body ownership, presence, and emotional response. Trans vis Comput Graph 24(4):1643–1652
Waltemate T, Hülsmann F, Pfeiffer T, Kopp S, Botsch M (2015) Realizing a low-latency virtual reality environment for motor learning. In: Proceedings of the 21st symposium on virtual reality software and technology, pp 134–147
Witmer BG, Sadowski WJ Jr (1998) Nonvisually guided locomotion to a previously viewed target in real and virtual environments. Hum Fact 40(3):478–488
Wu J, Shi G, Lin W (2019) Survey of visual just noticeable difference estimation. Front Comput Sci 13(1):4–15
Yee N, Bailenson JN, Ducheneaut N (2009) The Proteus effect: Implications of transformed digital self-representation on online and offline behavior. Commun Res 36(2):285–312
Zhao Q (2009) A survey on virtual reality. Sci China Series F Inf Sci 52(3):348–400
The research work has been supported by the Hellenic Foundation for Research and Innovation (H.F.R.I.) under the “First Call for H.F.R.I. Research Projects to support Faculty members and Researchers and the procurement of high-cost research equipment grant” (Project Number: HFRI-FM17-1168).
Dzardanova, E., Kasapakis, V., Gavalas, D. et al. Virtual reality as a communication medium: a comparative study of forced compliance in virtual reality versus physical world. Virtual Reality (2021). https://doi.org/10.1007/s10055-021-00564-9
- Virtual reality
- VR-mediated communication
- Computer-mediated communication
- Face-to-face communication
- Forced compliance
- Cognitive dissonance