The metaverse is here, and it’s not only transforming how we see the world but how we participate in it.

– Satya Nadella, Microsoft CEO and chairman (2021)

The metaverse has caught the attention of some of the world's largest technology companies, including Facebook (which has fittingly rebranded itself as Meta) and Microsoft, which are investing billions of dollars in virtual-reality hardware and software (Bass & Chang, 2021; Byford, 2021). Originating from science-fiction writer Neal Stephenson (1992), the metaverse concept still lacks consensus on some of its defining aspects, but core elements have emerged. We define the metaverse as a new computer-mediated environment (Hoffman & Novak, 1996) consisting of virtual “worlds” in which people act and communicate with each other in real time via digital representatives referred to as avatars (Miao et al., 2022). Though not the only interface technology for accessing the metaverse, virtual-reality headsets are often considered the most powerful (e.g., Ball, 2021; Kannan & Singh, 2021; Metz, 2021).

The social nature of the metaverse and particularly its ability to host real-time multisensory social interactions (RMSIs), defined as interactions between two or more people that occur synchronously and involve multiple senses (e.g., sight, hearing, touch), have captured the attention of global executives. For example, Meta CEO Mark Zuckerberg considers the metaverse “the holy grail of social interactions” (Newton, 2021). Executives envision the metaverse as an environment in which RMSIs can provide consumers and companies more value than two-dimensional (2D) computer-mediated environments such as Zoom, Skype, and Discord. Financial analysts agree with this logic; expecting a massive shift of business from the 2D internet to the metaverse, Goldman Sachs predicts that the metaverse will generate several trillion dollars annually (Sheridan et al., 2021).

Yet whether the metaverse can actually provide users such superior value is an open question, as are the mechanisms through which such additional value would arise (Balis, 2022). The value of the metaverse is closely tied to people’s reactions to the specific and complex hardware used to access it (i.e., virtual-reality headsets), and these reactions are little understood. As RMSIs in the metaverse accessed via virtual-reality headsets require substantial investments in such hardware (e.g., equipping its new consultants with 60,000 Meta Quest 2 headsets costs consulting firm Accenture approximately $30 million; Greener, 2021), providing answers to these questions is essential for all who consider the metaverse an alternative environment to the 2D internet for RMSIs, whether they involve meetings between employees and/or customers.

This research investigates the value potential of RMSIs in the metaverse. Specifically, we examine whether RMSIs in the metaverse accessed via virtual-reality headsets help firms achieve desired outcomes in terms of better interaction performances (e.g., more creative solutions by work teams), more positive evaluations (e.g., of a service provided to customers), and more positive emotions of interactants (e.g., employees in workshops, customers in service experiences), as well as the underlying psycho-physiological mechanisms. We address the innovative nature of the phenomenon under scrutiny (i.e., RMSIs in the metaverse) with a three-step approach. First, we develop a tentative theoretical framework of the metaverse’s value potential, investigating how RMSIs in the metaverse, when it is accessed through state-of-the-art virtual-reality technology that enables high-fidelity experiences, affect interaction outcomes, in comparison with RMSIs on the 2D internet. Second, we carry out extensive empirical probes to glean initial insights into the framework’s proposed paths. Third, we synthesize theory and insights gathered through the empirical probes and offer a refined framework that serves as a roadmap for further scholarly and managerial exploration of RMSIs in the metaverse. Our endeavor is in response to recent calls for a richer understanding of how virtual-reality technology influences users’ behaviors and the mechanisms through which it does so, particularly in the context of social exchanges (Kumar, 2018; Wedel et al., 2020).

To develop our theoretical framework, we combine fundamental insights of research on virtual reality (e.g., Oh et al., 2018) and RMSIs on the 2D internet (e.g., Brucks & Levav, 2022). We propose that interactants’ social presence, exhaustion, and physical mobility serve as intermediate conditions during RMSIs that differ when RMSIs take place in the metaverse via virtual-reality headsets versus on the 2D internet, leading to differences in interaction outcomes between the two computer-mediated environments. We then probe this framework in a series of five field experiments in which 328 business students participated, resulting in a cumulative sample size of 1,363. In each experiment, participants take part in RMSIs either in the metaverse via high-fidelity state-of-the-art virtual-reality headsets (i.e., Meta Quest 2) or, as a comparison, on the 2D internet (e.g., via Zoom) or in a “non-virtual-reality metaverse” setting in which interactants access the metaverse remotely in 2D via a computer. Reflecting the broad scope of the metaverse, the experimental studies feature RMSIs across three basic life contexts: work (i.e., employee–employee RMSIs), joint consumption (i.e., consumer–consumer RMSIs), and the interface between customers and frontline employees (i.e., customer–employee RMSIs).

Our empirical probes support central arguments of the tentative framework, demonstrating the metaverse’s power to add value to RMSIs through a higher level of social presence when accessed through virtual-reality headsets. At the same time, we do not find the metaverse to systematically outperform RMSIs on the 2D internet across the five experimental settings with regard to any of the studied outcomes. While we attribute this in part to the proposed detrimental role of greater interactant exhaustion when using virtual-reality headsets (which we find diminishes with habituation to the usage of such headsets), we also use the empirical insights to further advance our theorizing and offer a refined framework that can serve as a roadmap for future scholarly explorations of RMSIs in the metaverse.

Foundations of the metaverse

The metaverse as a new virtual computer-mediated environment

When the stationary 2D internet connected computers and enabled digital exchanges between users in the 1990s, Hoffman and Novak (1996, p. 53) introduced the idea of computer-mediated environments, defining them as distributed networks with “associated hardware and software.” Some ten years later, the stationary 2D internet was complemented by the mobile internet as another computer-mediated environment that connected users via smartphones and tablets (with unique operating systems) and introduced apps as a specific kind of software.

Drawing from Hoffman and Novak’s (1996) pioneering work, we conceptualize the metaverse as a new computer-mediated environment that consists of virtual “worlds” in which people can act and communicate with each other via avatars (Miao et al., 2022). The metaverse involves a distinct set of hardware (i.e., head-mounted virtual-reality devices, or headsets) that serves as a main gateway, along with unique operating systems (e.g., Meta/Oculus) and distinctive apps that offer virtual “worlds” (e.g., Altspace by Microsoft, Horizon by Meta). The metaverse, as an “inherently social place” (Novaquark executive Sébastien Bisch, qtd. in Batchelor, 2021), provides room for a large spectrum of joint human activities, from entertainment (e.g., bowling, watching movies in a virtual theater; Baker, 2021) to work collaborations (e.g., interacting with co-workers, having business meetings; CBS News, 2021), in addition to individual activities.

Virtual-reality headsets as main gateway to the metaverse

The idea of virtual reality refers to a synthetic environment that involves real-time simulation and multi-sensorial (e.g., visual, auditory, tactile) interactions between a (human) user and a computer (Burdea & Coiffet, 2003). Attempts to create the technology required for such environments date back to the 1960s, when initial conceptualizations and prototypes of head-mounted virtual-reality devices were presented (Sherman & Craig, 2019).

From the beginning, research on user-computer interactions in virtual reality has appealed not only to computer scientists but also to academics in multiple other disciplines, including education, tourism, medicine, and media. Marketing scholars, though, long showed only muted interest in virtual reality, mirroring the decade-long limited relevance of virtual-reality hardware for consumers and companies. This has changed recently, with both conceptual and empirical contributions emerging.

New conceptual contributions in marketing include the identification of value-creation opportunities of virtual reality along the customer journey (Hoyer et al., 2020; Meißner et al., 2020; Wedel et al., 2020) and for retailers (Grewal et al., 2017, 2020). Researchers have also conceptually compared virtual reality with other digital contexts regarding the nature of communication (Moffett et al., 2021) and social media (Appel et al., 2020). Empirical marketing studies show that virtual reality can improve predictions about product adoption (Harz et al., 2022) and that it can affect specific aspects of consumer behavior, such as the haptic (when consumers “touch” products; Luangrath et al., 2022) and auditory (when consumers hear a product sound in virtual reality; Ringler et al., 2021) sensing of products. Meißner et al. (2020) reveal that virtual reality affects certain aspects of consumer choice but does not necessarily trigger more satisfaction than websites.

The focus of all these studies is on the interaction between a single user and his or her synthetic environment, while social interactions between multiple users in virtual reality have received little attention from scholars, including those in marketing, despite virtual reality’s frequent conceptualization as a “social technology” (Chen, 2022). However, technological limitations, which had largely suppressed this social nature, have been overcome in recent years. Since then, virtual-reality technology has seen massive performance increases in motion tracking (which transmits the movements of the user from the physical to the virtual environment), processing power, screen resolution, battery life, and usability (Bailenson, 2018), which have enabled features such as “room scaling” (i.e., allowing users to move freely in their physical space when exploring virtual worlds) and high fidelity in terms of settings and avatars’ gestural and mimic expressiveness (Han et al., 2022).

Such advances in virtual-reality technology have been crucial, as they not only facilitate high levels of perceived (spatial) presence for users (Al-Jundi & Tanbour, 2022), as the “fundamental characteristic” of effective virtual-reality applications (Bailenson, 2018, p. 19), but also enable social interactions between avatars in real time that involve multiple senses and thus pave the way for the metaverse as a virtual environment in which such interactions take place. The only scholarly studies so far that have empirically investigated such social interactions using more powerful virtual-reality headsets are by information systems scholars, who have mostly used dyads as groups (Wei et al., 2022). They have shed light on group processes in virtual gatherings and the role of avatars for users’ responses (e.g., Han et al., 2022), along with non-verbal communication patterns (Abdullah et al., 2021). In addition, studies have highlighted similarities of interactants’ behaviors in virtual environments to those of the physical world (Dzardanova et al., 2022), including a high level of “social presence”, a key concept of media and communications research (Short et al., 1976; Smith & Neff, 2018).

RMSIs on the 2D internet

Researchers have also investigated RMSIs on the 2D internet (e.g., videoconferencing via Skype or Zoom), noting deficiencies of this digital environment that we expect to apply less to RMSIs when they are hosted in the metaverse and accessed via virtual-reality headsets.

Using face-to-face settings as reference, one group of scholars has attributed the relative underperformance of RMSIs on the 2D internet to users’ lower perceived social presence in this environment. Among them are Basch et al. (2021), who find lower ratings by job interviewees for videoconferencing versus face-to-face meetings, and Andres (2002), who finds software development teams that met face-to-face to be more productive and interact better.

Another research stream deals with the physical behaviors associated with RMSIs via 2D videoconferences and their consequences. Specifically, Brucks and Levav (2022) find that such RMSIs generate fewer creative ideas than face-to-face meetings and blame the static nature of an environment that focuses “communicators on a screen, which prompts a narrower cognitive focus.” Similarly, Bailenson (2021), in his conceptual analysis of “Zoom fatigue,” argues that RMSIs on the 2D internet will be less effective because of interactants’ lack of physical mobility due to the restricted range of motion resulting from the use of computer cameras, along with related factors such as an unusual amount of constant eye gaze.

In summary, the metaverse accessed via high-fidelity virtual-reality headsets constitutes a radically novel computer-mediated environment that extends consumers’ and (marketing) managers’ options with regard to RMSIs. While some of the world’s largest technology companies are investing billions of dollars in the metaverse as a powerful environment for RMSIs, scholarly findings on its value potential are still limited. The focus of research on virtual reality has been on user-computer interactions rather than RMSIs because of previous hardware generations’ limitations, with some recent studies offering initial insights. Moreover, research on RMSIs via 2D internet applications such as Zoom stresses certain limitations inherent to that technology.

Step 1: A tentative theoretical framework of RMSIs in the metaverse

Our theoretical framework focuses on RMSIs in computer-mediated environments. We theorize how the metaverse accessed by interactants via virtual-reality headsets affects the value created by RMSIs in terms of interaction outcomes, compared with the 2D internet (e.g., Zoom) as the current de facto standard for RMSIs in computer-mediated environments.

Reflecting the heterogeneous nature of the value concept and the various contexts in which RMSIs take place (for a typology of interactions in computer-mediated environments, see Yadav & Pavlou, 2014, 2020), we consider a broad range of interaction outcomes, specifically interactants’ performance, their evaluations, and their emotional responses. Performance outcomes such as the level of creativity of work team solutions (e.g., Brucks & Levav, 2022) are of particular relevance for RMSIs in a work context. Evaluation outcomes (e.g., customers’ service quality perceptions) are essential for RMSIs that take place at the customer–frontline employee interface. Emotion outcomes (e.g., positive affect of interactants) are critical for RMSIs as part of joint consumption experiences, such as watching a movie together with friends.

All three basic outcome categories have established ties to the financial performance of firms, which serve either as employer of those who interact or as provider of products and services to customers who interact with employees or other customers. Performance outcomes such as work team creativity can influence firms’ market and financial performance (e.g., Im & Workman, 2004), as can evaluative metrics by customers (e.g., service quality) and employees (e.g., work satisfaction) (e.g., Hogreve et al., 2017). The same applies to the emotions of customers and employees (e.g., Hennig-Thurau et al., 2006).

The foundational logic of our framework is that RMSIs in the metaverse, when accessed via virtual-reality headsets, differ systematically from those on the 2D internet, with those differences affecting RMSI outcomes through intermediate conditions. To identify these intermediate conditions, we draw from literature dealing with core aspects of the two computer-mediated environments which are at the center of this research, namely research on virtual reality and on RMSIs on the 2D internet. Specifically, from virtual-reality research we derive the concepts of social presence and exhaustion. While we propose that social presence serves as the main competitive advantage for RMSIs in the metaverse accessed via virtual-reality headsets over RMSIs on the 2D internet, we argue that social presence’s positive effects will be mitigated by exhaustion associated with the use of virtual-reality headsets. We complement these two virtual-reality-related intermediate conditions with one we draw from research on RMSIs on the 2D internet, namely physical mobility, which is argued to be a main deficiency of this environment for hosting RMSIs. At this point, we limit our framework to these three intermediate conditions, prioritizing thoroughness over an attempt at comprehensiveness at this infant stage of metaverse exploration; we will discuss potential extensions based on empirical probes as part of our research roadmap. Figure 1 summarizes the tentative framework and its proposed relationships.

Fig. 1 Tentative theoretical framework

Social presence as link between computer-mediated environments and interaction outcomes

Presence, defined as a person’s perception of “being there” or being immersed in a medium, is an established psychological condition for all kinds of experiences mediated by a computer (Nowak & Biocca, 2003; Schuemie et al., 2001) and, as such, is considered the main difference between experiences in virtual-reality headsets and those in other kinds of media environments (Bailenson, 2018). For social interactions between two or more people via media, media and communication scholars (e.g., Oh et al., 2018) have argued that the related concept of social presence plays a similarly fundamental role. While (spatial) presence refers to a place’s geography, social presence refers to a person’s perception of “being (somewhere) together” with other people (Biocca et al., 2003); it is thereby determined by the number and intensity of social cues transmitted by others (Short et al., 1976).

Scholars have drawn on media richness theory (Daft & Lengel, 1986) to argue that because virtual-reality environments are “rich” (i.e., provide more room for cues than 2D media), interactants will be able to exchange not only social cues in the form of text and audio cues but also multidimensional visual and sometimes even haptic sensations (e.g., Schroeder, 2002). Accordingly, virtual reality enables high levels of social presence (e.g., Oh et al., 2018; Wedel et al., 2020), which Smith and Neff (2018) even relate to gatherings in the physical world.

We adopt this logic and argue that RMSIs via the metaverse will produce higher levels of social presence for interactants than RMSIs via the 2D internet. Current virtual-reality headsets provide realistic and vivid illusions of environments and people in those environments (Wedel et al., 2020). The 360-degree nature of metaverse settings in which people can move around should add to the number and intensity of social cues that can be exchanged (Oh et al., 2018). Together, these characteristics should contribute to an “illusion of non-mediation” (Lombard & Ditton, 1997), in which interactants have limited perception (if any) of intervening technologies (Yadav & Varadarajan, 2005), and which should evoke strong feelings of “togetherness” between interactants (Bailenson, 2018; Grewal et al., 2020).

By contrast, RMSIs in 2D internet settings suffer from “sensory disadvantages” (Steinhoff et al., 2019, p. 375), which should limit this computer-mediated environment’s ability to stimulate high levels of social presence. In line with this, Andres (2002) and Basch et al. (2021) also blame the 2D internet environment’s inferior outcomes (compared with face-to-face RMSIs) on its lower level of social presence. As a result, social cues should appear more authentic and “real” in the metaverse when accessed via virtual-reality headsets than on the 2D internet, triggering higher levels of social presence for interactants (Hudson et al., 2019; Sra et al., 2018). Drawing on these theoretical arguments, we propose the following:

P1: RMSIs in the metaverse when accessed via virtual-reality headsets are associated with higher levels of social presence than RMSIs on the 2D internet.

The basic argument for a positive impact of social presence on interaction outcomes is that interactants’ perception of “being together” facilitates the exchange of arguments, thoughts, and feelings in an open and honest way, which then should result in a variety of interaction outcomes. Social presence resembles key social relationship concepts, including relational closeness, a perceptual state associated with the sharing of innermost feelings and thoughts (Aron et al., 1992; Hennig-Thurau et al., 2012). Like closeness, social presence has been linked with more intimate exchanges, as high levels of social presence allow interactants “to act out and express their sense of ‘closeness’ or intimacy” (Baldassar, 2008, p. 261). In line with this logic, scholars have suggested that intimacy constitutes a facet or dimension of social presence (Bente et al., 2004; Short et al., 1976; Sung & Mayer, 2012).

Both social psychologists (Altman & Taylor, 1973; Schaubroeck et al., 2011) and marketing scholars (Morgan & Hunt, 1994; Yim et al., 2008) have found extensive evidence that perceived closeness and intimacy influence interaction behavior and outcomes, respectively. We adapt these findings to social presence in RMSIs in computer-mediated environments, arguing that a higher level of social presence will lead to more positive outcomes of RMSIs across life contexts (see also Grewal et al., 2020).

Specifically, a high level of social presence should be positively associated with performance outcomes, given more open and richer conversations between interactants. In support of this logic, Roberts et al. (2006) find that groups of information-systems students who experience higher levels of social presence perform better (i.e., participate in more discussions and cooperate more) when assessing the usability of computer interfaces.

Social presence should also influence interactants’ evaluation of their gatherings for similar reasons. Regardless of whether external elements (e.g., services offered by a frontline employee, products consumed jointly) or the group of interactants itself are evaluated, the closeness and intimacy associated with a high level of social presence during RMSIs should unearth otherwise hidden thoughts and feelings and subsequently affect the interactants’ evaluation in a positive way. In line with this argument, Russo and Benson (2005) find positive correlations between students’ social presence and their attitudes toward their class as well as their satisfaction with their own learning in an educational service setting.

Finally, social presence should also lead to more positive emotions during RMSIs not only because it facilitates open and deeper exchange but also because of more focused attention to focal stimuli (e.g., a shown movie) as a result of the closely shared experience (Boothby et al., 2014). These arguments lead us to propose the following:

P2: The higher the level of interactants’ social presence during RMSIs, the more positive are the interaction outcomes.

Exhaustion as link between computer-mediated environments and interaction outcomes

While our logic for social presence suggests that RMSIs in the metaverse when accessed via virtual-reality headsets are superior to those on the 2D internet, virtual-reality research also points to some negative effects associated with the use of headsets, which are echoed by reports of uncomfortable feelings and disorientation, headaches, eye strain, and nausea by users (e.g., TheDon2016, 2017) and journalists (Nunn, 2021). While such effects vary in their details, they all involve certain forms of exhaustion, a broad concept that describes a person’s physical, psychological, and emotional drain (Wright & Cropanzano, 1998). We thus use “exhaustion” as an umbrella term for the different, but related, negative states users experience with virtual-reality headsets.

Scholarly evidence of the virtual reality–exhaustion link includes findings of “cybersickness,” a state of physical discomfort associated with the use of virtual-reality headsets (Weech et al., 2019; also referred to as “motion sickness” or “virtual-reality sickness,” Kim et al., 2018). Accordingly, users of such devices can suffer from a mismatch between visual stimuli and corresponding sensory feedback (Gavgani et al., 2018), which triggers feelings of discomfort. A separate research stream attributes exhaustion due to virtual-reality usage to cognitive processes induced by the new computer-mediated environment’s higher richness. Accordingly, as virtual-reality technologies are more complex and offer more sensory cues than those of the 2D internet, users may struggle to process the information properly and, eventually, become exhausted (Gao et al., 2018). Assuming that virtual-reality technology is more complex than the 2D internet, technostress theory (Shu et al., 2011) offers a similar logic, holding that high complexity levels of computer-mediated environments trigger users’ feeling of losing control over their time or space (Lee et al., 2014) and increase exhaustion (Tarafdar et al., 2007). In addition, exhaustion is argued to be reinforced by the relative heaviness and tightness of virtual-reality headsets (Wei et al., 2022). These arguments lead us to propose the following:

P3: RMSIs in the metaverse when accessed with virtual-reality headsets are associated with more exhaustion than RMSIs on the 2D internet.

With regard to the link between exhaustion and interaction outcomes, we refer to the fundamental argument that humans need cognitive, emotional, and physical resources to complete tasks successfully (Fredrickson, 2001). During RMSIs, exhaustion due to a lack of such resources causes people to turn from their fellow interactants to an internal focus, as they try to self-regulate their energy, and thus neglect the challenges they are confronted with externally (Demerouti et al., 2005).

Empirical support for such a negative link between people’s exhaustion and their performance exists for all three basic outcome categories we consider in this research, though most studies investigate exhaustion in contexts other than RMSIs or the broader concept of social interactions. In connecting exhaustion with performance outcomes in a work context, Wright and Cropanzano (1998) find that exhausted social welfare workers receive less positive job performance ratings. Focusing on frontline workers, Hur et al. (2015) report that exhausted bank employees serve customers less effectively. Findings are not fully uniform though; for example, Babakus et al. (1999) survey the sales force of a business-to-business service provider and find no significant effect of exhaustion on performance. For evaluation outcomes, Hur et al. (2015) report that exhaustion worsens employees’ evaluation of their job satisfaction. Finally, with regard to emotion outcomes, exhaustion prevents people who play games from reaching a flow state and from experiencing the positive emotions associated with it (e.g., Weibel & Wissmath, 2011). Similarly, examining exhaustion in an education context, Goetz et al. (2015) find that exhausted teachers feel more situational anger, anxiety, shame, and also boredom. These arguments lead us to propose:

P4: The higher the degree of the interactants’ exhaustion during RMSIs, the less positive are the interaction outcomes.

Physical mobility as link between computer-mediated environments and interaction outcomes

In addition to the features of virtual-reality technology, the effectiveness of RMSIs in the metaverse relative to those on the 2D internet is also influenced by characteristics of the latter environment. Specifically, researchers studying videoconferences and related apps on the 2D internet have noted that RMSIs on the 2D internet are inherently limited in the degree of interactants’ physical mobility (e.g., Bailenson, 2021), which we argue is not the case for RMSIs in the metaverse when interactants access it via virtual-reality headsets.

According to Bailenson (2021), such lack of physical mobility results from the computer-mediated environment’s requirement to use predefined (and fixed) camera settings and the constant need of the interactants to stay near their computer and in reach of the keyboard and mouse. By contrast, today’s high-fidelity virtual-reality headsets offer people more physical mobility during RMSIs (e.g., Bailenson, 2018). Users of a virtual-reality headset can now move freely when conversing with others, which includes the free movement of their head, arms, and body during an interaction. Furthermore, interactants in the metaverse using virtual-reality headsets can perform more radical physical movements, such as walking around in a predefined space, as a result of stand-alone room-scaling technology (Freina & Ott, 2015).

Moreover, the metaverse should stimulate such physical mobility in social interactions, as it enables forms of nonverbal communication, such as patting, handshakes, or fist bumps between interactants. In addition, while the audio of all participants of a 2D videoconference is frontal and steady regardless of the speaker’s position, motivating no change in position from those who listen, the spatial audio element of the virtual-reality metaverse (i.e., sounds and voices are locked in to their “geographic” origins) facilitates turning and moving toward the source.

We assume that interactants will make use of this physical mobility potential when partaking in RMSIs in the metaverse via virtual-reality headsets, using gestures and body language to express (dis)agreement or emotions such as excitement (De Stefani & De Marco, 2019) and also varying their position in a (virtual) room during RMSIs. Such behavior should then result in more physical mobility (Lindley et al., 2008) when participating in RMSIs in the virtual-reality metaverse versus on the 2D internet. We thus propose:

P5: RMSIs in the metaverse when accessed with virtual-reality headsets are associated with more physical mobility than RMSIs on the 2D internet.

Our final proposition links interactants’ physical mobility during RMSIs with positive effects on interaction outcomes, drawing on embodied cognition theory, which states that the human body’s interaction with its environment contributes to and helps shape the mind (Wilson, 2002). Accordingly, a person’s environment can stimulate his or her mind by providing access to additional cues through interactions with it. Such interactions can involve almost all human senses, including seeing, hearing, and touching. Embodied cognition is not limited to interactions with physical environments but also applies to digital settings (e.g., Mueller & Gibbs, 2007).

Higher levels of physical mobility during RMSIs involve higher levels of interaction with the environment, such as more head movements, implying people’s exposure to additional visual, auditory, and haptic stimuli, which consequently trigger more processing, both cognitive and emotional, and should contribute to a better understanding of their surroundings (Dove, 2011; Spaulding, 2014). We argue that because of the increased level of interactions with the environment and their impact on processing activities, more physical mobility during RMSIs should contribute to more positive interaction outcomes in general.

Regarding performance outcomes, Oppezzo and Schwartz (2014, p. 1142) find higher levels of creative ideation for (individual) consumers who walk versus those who sit, as movement “opens up the free flow of ideas.” The effects of this higher processing should not be restricted to creative tasks in RMSIs, but also affect other facets of performance (e.g., volume of exchange between interactants, productivity).

Physical mobility should also lead to more positive evaluations of RMSIs by triggering a higher level of situational involvement (Richins et al., 1992) among interactants (Arts et al., 2011). Specifically, we argue that the larger number of cues perceived when interactants’ physical mobility is high should lead to greater involvement and, consequently, more positive evaluations (Pierro et al., 2006). This effect should emerge independent of the context, affecting work teams’ evaluations as well as those by consumers.

Finally, higher levels of physical mobility during RMSIs should also contribute to more positive emotions, as exposure to additional environmental cues should reduce the sensory monotony associated with being exposed to a constant set of stimuli (e.g., fellow interactants’ Zoom tiles; Boletsis & Cedergren, 2019). A higher level of physical mobility should also trigger positive emotions through improved social connections with fellow interactants because of more vivid interactions (e.g., Mueller et al., 2003). These arguments lead us to propose the following:

P6

 The higher the level of interactants’ physical mobility during RMSIs, the more positive are the interaction outcomes.

Step 2: Enriching the tentative theoretical framework with empirical probes

The fundamental nature of our theoretical framework, along with the phenomenological broadness of the metaverse, prevents a comprehensive empirical testing of the framework. To still take an initial step beyond a solely theoretical contribution and to advance our conceptual logic, we ran a series of empirical studies that probed the effects of RMSIs in the metaverse on interaction outcomes through intermediate conditions across different contexts (i.e., work, consumption, and the customer–employee interface), tasks, and activities.

The insights of these studies advance our understanding of the phenomenon under scrutiny, without the ambition of formal hypothesis tests. Instead, we combine theoretical logic and initial empirical results to offer a refined version of our framework, which then should provide a solid basis for guiding future scholarly explorations of RMSIs in the metaverse.

A series of five studies: Timeline, settings, and study designs

We conducted a series of five experimental studies over a four-month period in the second quarter of 2021. We designed the studies to shed light on the value (in terms of our types of interaction outcomes) that the metaverse adds to RMSIs in different basic life contexts, namely work, consumption, and the customer–employee interface. Each study resembles an important type of real-life interaction in its respective life context.

Specifically, the first two studies involved how employees accomplish tasks in teams at work. Groups of participants were asked to find solutions for a creativity-related task (Study 1) and a productivity-related task (Study 2). Study 3 was then situated in the context of joint consumption, with groups of participants watching films together. Watching movies is a prominent pastime for consumers all over the world, with “more than 90% of movie visits occur[ring] with others” (Hamilton, 2021), and thus is a suitable context for observing RMSIs in joint consumption situations. The remaining two studies dealt with customer–frontline employee interface constellations, reflecting how service and sales interactions currently take place in digital environments. In Study 4, the participants took part in a customer feedback session in an educational service context, and in Study 5, a salesperson offered the participants a movie ticket (sales context). We limited the scope of the studies to a specific RMSI constellation of broad practical relevance across the studied contexts—namely, a meeting of a small, predetermined group of people (between two and four people; the average group size was three) who had met before with a clearly defined task or activity.

The five studies were preceded by a prestudy, in which we asked participants to create a group name and a creative team logo together. The prestudy enabled participants to familiarize themselves with and acclimate to the computer-mediated environment to be used during the main studies, something we considered particularly important for the usage of virtual-reality headsets in the metaverse setting (for a similar approach, see Qorbani et al., 2021). This approach should also minimize potential ordering effects due to unfamiliarity and insecurity with the headsets. Web Appendix A shows the timeline of the studies.

All five studies were of a field-experimental nature, in line with our intention to enhance our tentative framework with relevant insights having high external validity. Participants in all studies were business students who accomplished various kinds of group work as part of their course of study and received credit for their performance in the tasks and activities. Each study as well as the prestudy consisted of a metaverse setting in which participants took part in RMSIs via virtual-reality headsets (hereinafter “virtual-reality metaverse” setting) and a 2D internet setting. In addition to these two main settings, we included a “non-virtual-reality metaverse” setting, in which participants took part in RMSIs in the virtual metaverse worlds not via virtual-reality headsets but via their computers’ keyboard and mouse. Such a setting, in which users maneuver their avatar in three-dimensional spaces through their 2D computers’ monitors, has been argued to act as a “fast track” for accessing the metaverse without the substantial costs of virtual-reality hardware (e.g., Keach, 2022), often with reference to the popularity of social internet platforms such as Roblox and Fortnite (e.g., Amenabar & Lee, 2022; Hollensen et al., 2022).

When developing the study designs, we employed leading commercial applications in all cases (i.e., Zoom and Watch2Gether for the 2D internet setting; Glue, Bigscreen, and Altspace for the virtual-reality metaverse setting; and the same or similar applications in the non-virtual-reality metaverse setting). We preferred this approach to developing proprietary solutions, as it ensures that our results reflect the full potential of metaverse and 2D internet technology for RMSIs at the time we carried out the studies. In all cases, we had full control over the experimental situation without any external distraction (high internal validity). In the virtual-reality metaverse setting, we provided all participants with high-fidelity state-of-the-art virtual-reality headsets (i.e., Meta Quest 2) on which we preinstalled the respective applications. We asked students to design their personalized avatars in the respective apps, allowing for a high level of expressiveness (Han et al., 2022). Participants used their own computers in the 2D internet and the non-virtual-reality metaverse setting; in both settings, the software was either provided by the first author’s university (e.g., Zoom) or free to access (e.g., Altspace).

Table 1 shows the designs and software programs/apps used in the five studies for the three settings. We provide a detailed description of each study’s design and procedure along with stylized photos of all experimental settings in Web Appendix B.

Table 1 Overview of settings of empirical studies and software programs/apps used

Participants and groups

Participants in all studies were final-year undergraduate business students at a large public German university. We randomly assigned 328 students to one of the three settings. Ninety-six students participated in the virtual-reality metaverse setting and were assigned to 32 groups. These students met remotely from their respective locations (note that spatial separation is important for the effectiveness of RMSIs in the virtual-reality metaverse; Born et al., 2019) and carried out their group tasks and activities in the respective virtual-reality apps (e.g., the app Glue in Study 1) while wearing a Quest 2 headset. For each of the apps and studies, we created a separate virtual room for each group (e.g., 32 rooms in Glue in Study 1), with all rooms being identical “digital twins” of one another. The students could only enter the room assigned to their specific group.

Regarding the other settings, we assigned 123 students to 39 groups in the 2D internet setting and 109 students to 36 groups in the non-virtual-reality metaverse setting. The number of people in the virtual-reality metaverse setting was slightly smaller because of capacity restrictions in terms of hardware and licenses. Participants remained in the same groups and settings across all five studies, which allowed us to determine how repeated usage of virtual-reality technology affects interactants’ reactions over time. Repeated use over the course of several months resembled real-life adjustment processes of interactants with regard to technology usage (“habituation”; Diemer et al., 2014).Footnote 3

The different tasks and activities were an integral part of an innovation management class tutorial; students received extra class credit for their participation in the different tasks and activities and were debriefed after the final study. The metaverse, virtual reality, and related topics were not discussed in class to avoid potential confounds. All studies took place during a fully digital semester in the summer of 2021; the RMSI tasks and activities via computer-mediated environments were thus seamlessly combined with the other class elements. Web Appendix C lists all sample information in the different conditions on both the individual and group level.

Outcomes, measures, and method

While we included all three basic outcome categories (i.e., performance, evaluation, and emotions) in each study, we varied the specific kinds of outcomes, selecting outcomes most relevant to each study’s specific context (e.g., creativity of solution as a performance outcome in the creativity-task work context of Study 1, fun as an emotional outcome in the movie-watching context of Study 3). In Appendix Table 5, we show which specific interaction outcomes we included in which study, along with exemplary studies that demonstrate the links between outcome constructs and financial value for firms.

Immediately after each study, we asked the participants to fill out a short questionnaire, in which they rated the outcome variables of the respective study and the three intermediate conditions. In addition, we asked participants to provide information on several other variables serving as controls in the analyses; these included situational variables (e.g., weather, internet quality), group variables (e.g., group size, familiarity with group members before class), and participant characteristics such as gender, age, and grade point average (please see Web Appendix D for details on the controls in each study). While our prestudy ensured that all participants had experienced virtual reality before Study 1, we nevertheless also measured their ownership of virtual-reality headsets to capture potential differences in pre-experimental virtual-reality experience; we left this out of the analyses, however, as none of the participants owned a device.

We used reflective 7-point multi-item scales for most constructs of the framework, drawing from established prevalidated scales when possible. For some of the variables, we created new items based on the literature to match the context of our studies. We report the individual items along with their respective sources and their reliability in terms of Cronbach’s alpha values (above 0.70 for all framework constructs) in Appendix Table 5. To assess the creativity of the group solutions in Study 1 and the productivity of the group solutions in Study 2, we hired independent coders.
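For readers less familiar with the reliability criterion mentioned above, the following minimal sketch shows how Cronbach’s alpha can be computed for one multi-item scale. The function name `cronbach_alpha` and the ratings are hypothetical illustrations, not the study’s items or data.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for one scale.

    items: list of per-item score lists, one score per respondent,
    all lists of equal length.
    """
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    # Total scale score per respondent (sum across items).
    totals = [sum(scores) for scores in zip(*items)]
    # Sum of the individual item variances (sample variances).
    item_var_sum = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_var_sum / variance(totals))


# Hypothetical 7-point ratings: three items, five respondents.
ratings = [
    [6, 5, 7, 3, 4],
    [6, 4, 7, 2, 5],
    [5, 5, 6, 3, 4],
]
alpha = cronbach_alpha(ratings)
print(f"alpha = {alpha:.2f}")
```

A value above 0.70, the threshold reported for all framework constructs, is conventionally read as acceptable internal consistency.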

We analyzed all five studies separately with partial least squares structural equation modeling (SmartPLS 3 with 10,000 bootstrapping samples), which allowed us to estimate all proposed relationships simultaneously in a single estimation procedure and to harvest the full information for each item instead of using mean scores (Collier, 2020). Our main level of analysis was the individual participant; in each of the individual-level analyses, we included the computer-mediated environment (i.e., virtual-reality metaverse or 2D internet) as the independent variable, the three intermediate conditions (social presence, exhaustion, and physical mobility) as mediators, and the interaction outcomes that matched the respective study’s task or activity as dependent variables. In addition, we estimated a model on the group level in Study 1, in which we used the creativity of the groups’ solutions as the dependent variable, and also in Study 2, in which we used the productivity of the groups’ solutions as the dependent variable.

In all estimations, we also included a direct path from the computer-mediated environment to the interaction outcomes, which allowed us not only to determine whether the intermediate conditions in the model serve as full mediators of the RMSIs–outcomes link or if additional mediators exist, but also to assess the total effect of the computer-mediated environments on outcomes. We linked the controls with all intermediate conditions and all interaction outcomes of the respective model.
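The logic of combining mediated paths with a direct path can be illustrated with a deliberately simplified sketch. It is not the PLS-SEM procedure used in the studies: it assumes a single mediator, observed (not latent) variables, ordinary least squares, and synthetic data with made-up coefficients, and it uses fewer bootstrap draws than the 10,000 reported above. All function and variable names are hypothetical.

```python
import random


def mean(values):
    return sum(values) / len(values)


def cov(u, v):
    """Sample covariance; cov(u, u) is the sample variance."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)


def paths(x, m, y):
    """OLS paths for x -> m -> y: returns (a, b, direct, total)."""
    vx, vm, cxm = cov(x, x), cov(m, m), cov(x, m)
    cxy, cmy = cov(x, y), cov(m, y)
    denom = vx * vm - cxm ** 2
    a = cxm / vx                               # environment -> mediator
    b = (cmy * vx - cxy * cxm) / denom         # mediator -> outcome, holding x fixed
    direct = (cxy * vm - cmy * cxm) / denom    # environment -> outcome, holding m fixed
    return a, b, direct, direct + a * b        # total = direct + indirect


def bootstrap_indirect(x, m, y, draws=1000, seed=1):
    """95% percentile interval for the indirect effect a * b."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    while len(estimates) < draws:
        idx = [rng.randrange(n) for _ in range(n)]
        xs = [x[i] for i in idx]
        if len(set(xs)) < 2:  # skip resamples containing only one setting
            continue
        ms = [m[i] for i in idx]
        ys = [y[i] for i in idx]
        a, b, _, _ = paths(xs, ms, ys)
        estimates.append(a * b)
    estimates.sort()
    return estimates[int(0.025 * draws)], estimates[int(0.975 * draws) - 1]


# Synthetic data (assumed coefficients): x = 1 for the virtual-reality
# metaverse setting, 0 for the 2D internet setting.
rng = random.Random(0)
x = [i % 2 for i in range(200)]
m = [0.8 * xi + rng.gauss(0, 1) for xi in x]
y = [0.5 * mi - 0.3 * xi + rng.gauss(0, 1) for xi, mi in zip(x, m)]

a, b, direct, total = paths(x, m, y)
low, high = bootstrap_indirect(x, m, y)
print(f"indirect={a * b:.3f} direct={direct:.3f} total={total:.3f}")
print(f"95% bootstrap interval for the indirect effect: [{low:.3f}, {high:.3f}]")
```

In this simplified setup the total effect equals the slope from regressing the outcome on the environment alone, which is why a positive mediated effect and a negative direct effect can cancel into a nonsignificant total effect.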

Main results: Comparing RMSIs in the virtual-reality metaverse and on the 2D internet

Table 2 shows the results of our comparisons of the virtual-reality metaverse setting and the 2D internet setting for all five studies (for detailed results of each study, see Web Appendix E). In terms of total effects, which take both mediations and direct effects into account and reveal the overall influence of RMSIs in the virtual-reality metaverse (vs. on the 2D internet) on interaction outcomes, we find that RMSIs in the virtual-reality metaverse neither systematically outperform nor underperform those on the 2D internet. Instead, total effects are mostly nonsignificant; the virtual-reality metaverse has evaluation and emotion advantages in Study 1, whereas the 2D internet generates more interactions in Study 4 and a more positive affect in Study 5. These results suggest that the value of RMSIs in the new virtual environment is not generally superior to meetings via 2D environments (e.g., Zoom) and underscore the need for a more fine-grained investigation.

Table 2 Overview of PLS analyses of virtual-reality metaverse versus 2D internet: Significant effects

Crucial for a deeper understanding of the value-creation potential of RMSIs in the virtual-reality metaverse are the findings of the theoretically proposed paths of our tentative framework. For social presence and its proposed role as an intermediate condition, we consistently find that participants experience a higher level of social presence when their RMSIs take place in the metaverse via virtual-reality headsets rather than on the 2D internet, in line with P1. This higher level of social presence occurs across all experimental contexts, regardless of whether people gathered for creativity work tasks, to jointly watch a movie, or to participate in an educational service.

Moreover, this higher level of social presence translates into more positive outcomes in our studies in many of the experimental contexts and for several different outcome categories as sources of value, consistent with P2. Consequently, the social presence of the metaverse, when accessed via virtual-reality headsets, is a potential source of value. Specifically, we find that higher levels of social presence result in a higher amount of interaction in most settings; the level of social presence also positively influences interactants’ evaluations and emotions in most of the settings. Noteworthy exceptions are the group-level findings of Study 1 for creativity and those of Study 2 for productivity, neither of which is significantly affected by social presence. We also find that social presence does not affect outcomes in Study 5, which might be due to the brevity of the RMSIs in that context.

The results for exhaustion are consistent with P3, as interactants who meet in the virtual-reality metaverse indeed experience more exhaustion than those who participate in RMSIs on the 2D internet. Exhaustion is greater in the virtual-reality metaverse setting in all studies but Study 5, which suggests that exhaustion caused by virtual-reality headsets may require a certain time to unfold, and the short Study 5 did not pass that threshold. We also find that exhaustion negatively affects several interaction outcomes, as we propose in P4, though the effect is most pronounced for emotion outcomes. In the joint movie-watching context of Study 3, exhaustion also worsens interactants’ evaluations of the films and the atmosphere of the experience. The finding that exhaustion is positively associated with interactants’ group identification evaluation in Study 1 is noteworthy, as it might indicate a “bonding” nature of jointly experienced exhaustion in the metaverse in some situations.

The results for physical mobility, the third proposed mediator, are only partially in line with both P5 and P6. We find that interactants’ physical mobility is greater in both work-related contexts (Studies 1 and 2), as theoretically argued, while it is about the same in the two customer–employee interface studies (4 and 5) and lower in the joint consumption context of Study 3. Thus, rather than increasing interactants’ physical mobility in all situations, the high-fidelity nature of virtual-reality headsets might resemble real-world behavior, which in a movie-watching setting would induce participants to focus on the screen and the films shown, while reducing their mobility. We find some effects of physical mobility on interaction outcomes, but in the majority of constellations, more physical mobility does not lead to more positive outcomes in our studies.

The direct effects of the setting on interaction outcomes provide additional insights. For four of the five studies, we find negative direct effects of the metaverse accessed via virtual-reality headsets (vs. the 2D internet), in addition to the effects mediated by the intermediate conditions in the theoretical framework, while we find no positive direct effects at all. Negative direct effects hurt the amount of interaction in three studies (Studies 2–4), interactants’ positive affect in two studies (Studies 4 and 5), and their anticipatory positive emotions in one (Study 4). We find the largest number of negative direct effects in the service setting of Study 4. Two key insights from these direct effects are that the RMSI environment influences users’ responses in more ways than captured by the intermediate conditions in our tentative framework and that these additional effects are more pronounced in some contexts than in others.

Additional analyses

Habituation

 Our series-of-studies design enables us to shed some light on people’s habituation to RMSIs in the virtual-reality metaverse. While we capture potential abnormal effects related to the first-time usage of the new technology (e.g., initial excitement, knowledge deficits) with our prestudy, we still expected habituation to occur as a result of the technology’s continued use (see also Han et al., 2022).

To determine whether such habituation might affect the value contribution of the virtual-reality metaverse for RMSIs, either positively or negatively, we pooled the data of all five studies. Because we were interested only in habituation effects for virtual-reality technology (RMSIs on the 2D internet were already an inherent part of students’ lives when we administered the tasks and activities), we used only the virtual-reality metaverse subsample for this analysis. However, to isolate the habituation effect from other effects occurring over time, we subtracted the average value of the 2D internet subsample for each dependent variable from the rating of each individual participant of the virtual-reality metaverse subsample for that dependent variable in the respective study. Thus, our value for the dependent variable is the deviation of the value in the virtual-reality metaverse setting from the variable’s baseline level in the 2D setting.
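The baseline adjustment described above can be sketched as follows. The record layout, the setting labels, and the ratings are hypothetical; only the subtraction of the study-specific 2D mean from each virtual-reality rating follows the text.

```python
from collections import defaultdict

# Hypothetical records: (participant, study, setting, rating on one variable)
records = [
    ("p1", 1, "vr", 6.0), ("p2", 1, "vr", 5.0),
    ("p3", 1, "2d", 4.0), ("p4", 1, "2d", 5.0),
    ("p1", 2, "vr", 5.0), ("p2", 2, "vr", 4.5),
    ("p3", 2, "2d", 5.0), ("p4", 2, "2d", 5.5),
]


def baseline_deviations(records):
    """Return (participant, study, deviation) for virtual-reality participants.

    The deviation is the participant's rating minus the mean rating of the
    2D internet subsample in the same study.
    """
    ratings_2d = defaultdict(list)
    for _, study, setting, rating in records:
        if setting == "2d":
            ratings_2d[study].append(rating)
    baseline = {study: sum(v) / len(v) for study, v in ratings_2d.items()}
    return [
        (pid, study, rating - baseline[study])
        for pid, study, setting, rating in records
        if setting == "vr"
    ]


deviations = baseline_deviations(records)
for row in deviations:
    print(row)
```

Because the baseline is recomputed per study, changes that affect both settings alike over time (e.g., task differences between studies) are removed from the dependent variable.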

As the independent variable, we used the number of the respective study (1–5) for each observation. As dependent variables, we used the three intermediate conditions and the four RMSI outcomes we measured in at least three of the five studies (i.e., amount of interaction, group identification, group atmosphere, and anticipatory positive emotions). To acknowledge the hierarchical nature of our data set, which contained repeated observations for each participant, we followed the established approach of Allenby and Rossi (1998) and specified a linear mixed-effects (LME) model, in which we also included the length of each study as a covariate. LME models are especially suitable to control for the nested structure of our data by accounting for the multiple observations of each participant with the help of subject-specific fixed effects and subject-specific random intercepts (for more details, see Kupfer et al., 2018). Sample sizes for the analyses ranged from 281 to 403, reflecting the number of studies in which a dependent variable was used and the number of study participants who took part in these studies. We ran the estimations with the lme4 package in R (Bates et al., 2015).
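One way to write down a model of this kind is the random-intercept specification below. The notation is ours and is an assumption for illustration; it shows only the study number, the study length, and a participant-specific random intercept, and it may omit terms of the specification actually estimated.

```latex
% i indexes participants in the virtual-reality metaverse subsample,
% s indexes studies (s = 1, ..., 5).
\begin{align}
  \tilde{y}_{is} &= y^{\mathrm{VR}}_{is} - \bar{y}^{\mathrm{2D}}_{s}
  && \text{(deviation from the 2D baseline in study } s\text{)} \\
  \tilde{y}_{is} &= \beta_0 + \beta_1\,\mathrm{StudyNumber}_{s}
                  + \beta_2\,\mathrm{Length}_{s} + u_i + \varepsilon_{is},
  && u_i \sim \mathcal{N}(0, \sigma_u^2),\quad
     \varepsilon_{is} \sim \mathcal{N}(0, \sigma^2)
\end{align}
```

Under this notation, a negative estimate of $\beta_1$ means that the advantage (or a smaller disadvantage) of the virtual-reality metaverse over the 2D baseline shrinks as participants accumulate experience across studies.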

We report the results in Table 3. We find habituation effects for two of the three intermediate conditions and for all four interaction outcomes. For the intermediate condition constructs, we find that the increase in social presence that interactants gain from meeting in the virtual-reality metaverse versus on the 2D internet becomes lower over the course of the studies while virtual-reality-specific exhaustion tends to decrease with the number of studies. Interactants’ physical mobility does not change systematically.

Table 3 Overview of generalized linear mixed model analyses for habituation effects on relevant outcome variables

For interaction outcomes, we find a systematic pattern that is not in favor of the virtual-reality metaverse: all outcome variables, namely the amount of interaction (performance outcome), group atmosphere and identification (evaluation outcomes), and interactants’ anticipatory positive emotions (emotions outcome), decrease with the number of past experiences. Although this pattern is clear and holds when we control for the studies’ lengths and participant fixed effects, these findings should be taken only as early indications given the heterogeneous nature of the study tasks and activities.

Non-virtual-reality metaverse setting

 When comparing the 2D internet with the non-virtual-reality metaverse setting in which interactants access their groups via a PC (vs. virtual-reality headset), we find that the results differ quite strongly from those of our main analyses. Importantly, we find that the non-virtual-reality metaverse setting is inferior to the 2D internet in terms of total effects in multiple constellations. Nine total effects are negative across four of the five studies (vs. only two such effects for the virtual-reality metaverse), while the non-virtual-reality metaverse does not outperform the 2D internet in a single constellation.

Why is this the case? While the virtual-reality metaverse dominates the 2D internet in terms of social presence and physical mobility during RMSIs, we find lower levels of social presence (in two of the five studies) and of physical mobility (in four of the five studies) when RMSIs in the metaverse are accessed via a PC instead of a virtual-reality headset. That exhaustion is greater in only one study in this setting (vs. in four in the virtual-reality metaverse setting) does not compensate for the absence of social presence–induced positive effects in this setting, which appears to strictly limit its value.

Finally, we also compared the two metaverse settings. The results suggest that under the conditions of our studies, accessing the metaverse via virtual reality is largely superior in terms of value creation to doing so via the PC. While total effects are higher for the virtual-reality metaverse setting than the non-virtual-reality metaverse in seven constellations, in no constellations does the non-virtual-reality metaverse setting lead to a more positive outcome than the virtual-reality metaverse. Again, this is because of a higher level of social presence in the virtual-reality metaverse setting (all studies) and more physical mobility (four of the five studies, with the exception of the movie-watching context of Study 3). The finding that the metaverse when accessed via virtual-reality headsets causes more exhaustion in all studies except Study 5 does not compensate for these effects. We report the detailed parameters for both comparisons in Web Appendix G.

Our empirical probes into RMSIs in the virtual-reality metaverse are not formal tests of the tentative framework. Among other things, the specific contexts and corresponding tasks and activities used in the experiments and the homogeneous composition of participants limit the generalizability of the reported results. Nevertheless, by giving more support to some proposed framework paths than to others, the findings help deepen our understanding of the value-creating potential of the metaverse as a new environment for RMSIs. We now make use of these insights by offering a refined version of our theoretical framework to provide a roadmap for future research on RMSIs in the metaverse.

Step 3: A refined theoretical framework and research roadmap

While financial analysts argue that the metaverse will become a multitrillion-dollar industry, mainly as a substantial share of human activities shifts from the 2D internet to the new computer-mediated environment accessed by virtual-reality headsets (Sheridan et al., 2021), our extensive empirical probes suggest that value creation in the metaverse is not trivial. Drawing on the results reported above, we now synthesize theoretical logic and empirical insights to provide guidance for the future exploration and use of the metaverse. Figure 2 depicts a refined version of our theoretical framework, in which we adjusted the mediating forces, added moderators, and assigned a particular role to the format of social interaction.

Fig. 2 Refined theoretical framework as foundation for research roadmap

This refined framework serves as the foundation for a roadmap for future research. In addition to research avenues that correspond with the different elements of the framework, we also broaden our perspective beyond the factors that influence value creation and suggest business areas for which research on RMSIs in the metaverse could be particularly powerful, as well as societal aspects of particular importance that might result from a shift toward the metaverse. Table 4 depicts our roadmap and lists illustrative research questions. Beyond inspiring scholarly research, we consider our roadmap to be informative also for (marketing) managers and policy makers.

Table 4 A roadmap for future research on RMSIs in the metaverse

Theorized and additional mediating forces

Our tentative theoretical framework proposed three user-sided conditions that mediate the impact of the virtual-reality metaverse on interaction outcomes. Our empirical results underscore the critical role of one of them (i.e., social presence), which both theoretical logic and empirical probes stress as a major differentiator of the metaverse when accessed via virtual-reality headsets compared with either the 2D internet or approaches to access the metaverse with a computer. We suggest that social presence is the pivotal construct for future research on computer-mediated RMSIs and call for future studies to shed light on its determinants in a metaverse context. Such studies might include people’s previous exposure to technology as well as personality and meeting characteristics (e.g., length). As our probes show that social presence is more closely linked with some outcomes than others, better understanding these variations is desirable. For example, why do we find no (direct) links with groups’ creativity and productivity performance outcomes? Is there a threshold in terms of meeting time for social presence to unleash its power?

Our framework also proposes that exhaustion is greater for those interacting in the metaverse when accessing it via virtual-reality headsets (vs. those who do so on the 2D internet, e.g., via Zoom), which should in turn hurt interaction outcomes. Our empirical probes indeed show that the use of headsets is associated with higher exhaustion, but regarding outcomes we find exhaustion to matter only in some of the constellations, particularly those involving emotional outcomes. At the same time, we uncover constellations in which exhaustion exists among interactants but does not affect outcomes. Moreover, we find signs that interactants’ habituation regarding virtual-reality headsets affects their exhaustion. Thus, further research on the role of exhaustion is certainly warranted. For example, knowing the sources of exhaustion could also help predict how it will affect RMSIs when virtual-reality technology advances further. Can digital “teleportation,” when interactants change the position of their avatars by using the buttons or joysticks of their controller, or software-based solutions (e.g., HyperJump; Hector, 2022), which aim to mitigate users’ mismatch between visual stimuli and corresponding sensory feedback, reduce user exhaustion? To what degree does exhaustion differ between headsets, and how will more powerful hardware iterations affect it?

Regarding our third proposed mediator, we find tentative evidence that the degree of interactants’ physical mobility is indeed higher in the virtual-reality metaverse than on the 2D internet. However, few links between physical mobility and interaction outcomes are significant in our probes, despite the solid foundation of embodied cognition theory. It seems that factors exist that prevent physical mobility from exerting an influence on interaction outcomes and limit its value-creating role for RMSIs in the virtual-reality metaverse. Perhaps physical mobility during RMSIs comes with certain downsides for interactants, such as the additional stimuli they perceive when moving or looking around instead of sitting stationary in front of their PC monitors, which may distract them from their task, or the limited ability to use a second screen when being mobile in the metaverse with virtual-reality headsets. Scholars and managers might search for strategies that help interactants harvest the potential of physical mobility in RMSIs while suppressing its “dark side.” However, until more is known about the role of physical mobility for RMSIs in both the metaverse and on the 2D internet, other aspects and concepts might deserve more attention.

We found direct negative effects of the virtual-reality metaverse setting in several of our studies, which indicates the existence of additional intermediate conditions. Shedding light on these would advance the understanding of barriers and limitations of using the metaverse for RMSIs. Informal feedback from our participants during debriefing and from colleagues suggests that interactants’ separation from their physical worlds and self-presentation issues are intermediate conditions that can contribute to the understanding of RMSI effects. While virtual-reality technology now provides high-fidelity depictions of simulated worlds, headsets usually fully separate users from their physical worlds, which certain interactants might consider detrimental. While some consequences might be functional (restricted access to “real-world” resources such as computers and smartphones, but also to beverages and food), others might be psychological, such as a perceived loss of control over what is happening in an interactant’s physical environment. Can virtual-reality technology (e.g., by offering virtual keyboards and “pass-through” visibility modes) mitigate such detrimental effects?

Another limitation of the effectiveness of RMSIs in the metaverse that we do not address in our initial version of the framework might stem from the use of avatars versus the display of the actual interactant on 2D videoconferences. Does being represented by avatars conflict with people’s need to present themselves in a desired way, with identity being “a function of the story that [they] construct about [themselves]” (Battersby, 2006, p. 27)? What role does self-presentation through avatars play in different contexts such as gaming (Vasalou & Joinson, 2009) versus work? Do avatars restrict interactants’ social relations in terms of building trust and rapport, which require emotion contagion processes (e.g., Hennig-Thurau et al., 2006)?

Moderators

The aim of this research is to facilitate the understanding of the main effects and fundamental mechanisms at play when RMSIs take place in the metaverse instead of on the 2D internet. Beyond that, our empirical probes into RMSIs in different computer-mediated environments reveal contextual differences in how RMSIs in the metaverse accessed via virtual-reality headsets affect intermediate conditions and interaction outcomes. We find empirical indications of the moderating role of four categories of variables that we believe deserve particular attention.

Life context

 Our probes show that interactants’ physical mobility is mostly higher in the virtual-reality metaverse, but we observe less physical mobility in a movie-watching context; in this context, physical mobility is also associated with fewer, not more, positive emotions. Exhaustion also seems to affect outcomes most strongly in the movie-watching context. Both findings stress the role of hedonic versus utilitarian activities for user responses to RMSIs in the metaverse. The results also suggest that effect patterns differ across work tasks: physical mobility triggers positive emotions when the interactants’ task is to be creative, while the link is nonsignificant for a productivity task.

Time/habituation

 Our probes suggest that interactants’ responses to the metaverse change with time, which echoes findings of information systems scholars (e.g., Han et al., 2022). However, while habituation might be beneficial for RMSIs in the metaverse when accessed with virtual-reality headsets, as exhaustion appears to decrease over time, our results also indicate negative habituation effects on interaction performance, evaluations, and emotions. As most firms and consumers would likely plan for longer-term, repeated use of virtual-reality headsets, such developments would be cause for concern. Thus, future research on habituation and its role for value creation in the metaverse is certainly warranted, particularly as we cannot rule out that the order in which we conducted the different experiments might have influenced habituation.

Technology

 Although all the virtual-reality apps selected for this research were market leaders in the respective study context and similarity exists among them in multiple respects, they differed in several facets, including avatars, aesthetics, and functionality. We assume that these differences might have shaped some of the findings and that our findings should not be generalized to other apps without closer investigation. Regarding avatars, which vary in their respective level of realism and emotional expressiveness, research in information systems (e.g., Yoon et al., 2019) and also marketing (Miao et al., 2022) offers a good starting point for understanding how avatar features might influence the perception, evaluation, and behaviors of those who maneuver them and also those with whom avatars interact. A related issue is avatar interoperability across apps. While cross-platform avatar systems such as Ready Player Me have benefited from high investment over the past years (Fink, 2022), uncertainty remains about whether users like to have a single virtual identity across metaverse contexts or instead prefer context-specific avatars when participating in RMSIs at work and in their leisure time, similar to people dressing and styling differently depending on the occasion in the physical world (Preda & Jovanova, 2013).

The apps we use in our probes also differ in terms of aesthetics, which pertain to the design of the environment and its perceived attractiveness to interactants. Marketing research has extensively examined the behavior-inducing role of aesthetics in physical (Turley & Milliman, 2000) and digital (Vilnai-Yavetz & Rafaeli, 2006) environments. Which of these learnings can be transferred to virtual environments is not yet clear, nor is it clear which unique aesthetic dimensions influence the value creation of RMSIs in the metaverse. Relatedly, metaverse apps vary in their functionality, which is characterized by the potential of avatar-to-avatar and avatar-to-environment interactions. The information systems literature on human–computer interactions might offer a good starting point for research on how the different functionalities influence the attractiveness of virtual-reality technology for users (e.g., Dix et al., 2004).

Furthermore, our additional analyses stress the role of the hardware technology interactants use as a gateway to the metaverse, as we show that results differ substantially between virtual-reality headsets and 2D computer monitors as interface technologies for the metaverse. This finding should sensitize metaverse scholars to the role of hardware in general. In the past, researchers have used a variety of devices when studying virtual reality, most of which lack the characteristic features of high-fidelity, room-scale hardware (e.g., Meta Quest 2). We strongly urge metaverse scholars, as well as reviewers and editors, to contextualize metaverse-related findings with regard to the technology used and avoid misleading generalizations of findings. For example, we propose and find empirical support for the notion that interactants’ social presence is higher in meetings in virtual environments than when meeting on the 2D internet (e.g., Zoom), something that studies using low-fidelity hardware have not found. The explanation for this new insight lies in our empirical design: We find that while virtual-reality headsets outperform Zoom in terms of social presence, the non-virtual-reality metaverse setting (which has often been used in research as a proxy for headset usage, labeled “desktop virtual reality”) tends to trigger less social presence than Zoom.

Interactants

 In addition to mean effects, our empirical probes reveal substantial heterogeneity among interactants with regard to their response to RMSIs in the virtual-reality metaverse. For example, we find that, despite the demographic homogeneity of our student sample, the standard deviation for exhaustion across studies is 1.7 (on a 7-point scale) among participants who accessed the metaverse via virtual-reality headsets. While 13% report very high levels of exhaustion (average score of 6 or higher), 25% experience very low levels (average score of 2 or lower). For other framework constructs, including social presence, we find similar levels of heterogeneity among participants. How do these interpersonal differences affect the paths of our framework, and what are their drivers? For example, technology acceptance and readiness research has stressed the importance of technology users’ attitudes toward technology. Given the relative newness of virtual reality and its complexity, we believe that this attitude will influence value creation in RMSIs in the metaverse. In this context, we encourage metaverse scholars to use more diverse samples to learn about the role of interactant characteristics for value creation, something that would also contribute to the further development of a robust and generalizable theory of RMSIs in the metaverse.

In addition to these categories of potential moderators, one could argue that the intermediate conditions of our framework may also moderate other paths of the framework. For example, interactants’ exhaustion might limit their capability to experience social presence when participating in RMSIs. However, when we ran additional OLS regressions using the data of our experimental probes (with one intermediate condition serving as DV in each analysis and interaction terms of the computer-mediated environment and the respective other two intermediate conditions as IVs), we found no empirical support for such moderating effects of our framework mediators.
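As an illustration of the kind of moderation check described above, the following minimal sketch regresses one intermediate condition on the environment, the other two intermediate conditions, and their interactions with the environment. It is not the analysis code used in our probes: the function name `fit_moderation_ols`, the variable names, the inclusion of main effects alongside the interaction terms, and the simulated data (generated without any true moderation) are all assumptions made for illustration.

```python
import numpy as np


def fit_moderation_ols(dv, env, m_a, m_b):
    """OLS of one intermediate condition (dv) on the environment dummy,
    the other two intermediate conditions, and their interactions with
    the environment. Hypothetical helper, not the authors' code."""
    dv, env, m_a, m_b = (np.asarray(x, dtype=float) for x in (dv, env, m_a, m_b))
    names = ["intercept", "env", "m_a", "m_b", "env_x_m_a", "env_x_m_b"]
    # Design matrix: main effects plus the two interaction terms.
    X = np.column_stack([np.ones_like(env), env, m_a, m_b, env * m_a, env * m_b])
    coef, *_ = np.linalg.lstsq(X, dv, rcond=None)
    # Classical (homoskedastic) standard errors.
    resid = dv - X @ coef
    dof = X.shape[0] - X.shape[1]
    sigma2 = resid @ resid / dof
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    return {n: {"coef": c, "se": s, "t": c / s} for n, c, s in zip(names, coef, se)}


# Simulated data (assumption): env = 1 for the virtual-reality metaverse,
# 0 for the 2D internet; mediators are standardized; no true moderation.
rng = np.random.default_rng(0)
n = 400
env = rng.integers(0, 2, size=n)
exhaustion = rng.normal(size=n)
mobility = rng.normal(size=n)
social_presence = (1.0 + 0.5 * env - 0.3 * exhaustion + 0.2 * mobility
                   + rng.normal(scale=0.5, size=n))

result = fit_moderation_ols(social_presence, env, exhaustion, mobility)
for name, est in result.items():
    print(f"{name:>10}: b = {est['coef']: .3f}, t = {est['t']: .2f}")
```

In such a setup, the coefficients on `env_x_m_a` and `env_x_m_b` carry the moderation test; the analysis would be repeated with each intermediate condition serving as the DV in turn.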

Interaction formats

In our empirical probes of RMSIs in computer-mediated environments, we focused on a single constellation: interactants always addressed a clearly defined task or activity and gathered with a small and predetermined group of others whom they had met before. While this constellation is common across life contexts, other prominent constellations also exist for RMSIs. We assume that the RMSI constellation can both act as another moderator of framework paths and shape the functioning of RMSIs in the virtual-reality metaverse far beyond such a moderating role.

Consider, for example, RMSIs that take place when people meet randomly, without a clear purpose or task, unlike in our empirical probes. Such RMSIs occur in workplace cafeterias, in the hallways, or at the watercooler, but they also do so on the street or the subway. Such unplanned encounters are considered a source of value creation (Lin & Kwantes, 2015), as they provide social value for interactants, but also because of the “serendipity effects” they carry by introducing innovative thinking and ideas. The value potential of the 2D internet appears limited with regard to facilitating such unplanned meetings, while the addition of a spatial dimension could help the virtual-reality metaverse do just that. Given the often limited net value advantages we encounter in our probes, we wonder if the added value of RMSIs in the virtual-reality metaverse is systematically higher for such kinds of interaction formats. As such, understanding when and how the metaverse can create value through unplanned meetings would be a worthy extension of this research.

For the specific interaction format we chose, our probes indicate that certain characteristics of the task/activity, such as its length, affect the framework paths. Specifically, we find few links between mediators and outcomes in the case of the shortest activity (Study 5); even social presence, though higher in the virtual-reality metaverse, does not affect outcomes here. Thus, a systematic understanding of how RMSI characteristics (e.g., meeting length) influence the framework paths is desirable.

Another exciting avenue in the field of interaction formats is the meeting of “artificial others.” While our framework and probes build on the assumption that all interactants are humans, how would value creation in RMSIs in the metaverse differ if some interactants are AI-powered, something known as “non-playable characters” to gamers? Does their mere presence increase the attractiveness of metaverse worlds (as in preferences for restaurants that are not empty), or does it cause feelings of eeriness? To address this issue, scholars might draw on recent findings on interactions between humans and AI-powered chatbots and robots (Huang & Rust, 2021a, 2021b).

Beyond the framework: Business areas and societal impact

Finally, the radical newness and potentially disruptive nature of RMSIs in the metaverse also raise questions that go beyond the elements of our framework and their contribution to a rich understanding of the value creation process. Specifically, the identification of promising business areas for RMSIs in the metaverse as well as their societal impact should warrant particular attention. All three basic life contexts we consider in this research include business areas that we consider particularly well-suited for shifting RMSIs from the 2D internet to the metaverse (Hennig-Thurau & Ognibeni, 2022). In the work context, many pioneering applications are linked to team building and employee onboarding, drawing on the social presence and emotions potential of metaverse gatherings, while others use the metaverse for employee training for similar reasons. We also envision creativity-targeted innovation tasks, such as design thinking meetings, as well as making reliable predictions, particularly for “social products,” as promising areas.

In the context of joint consumption, entertainment offerings such as the movie-going setting we studied and gaming are natural choices for research on RMSIs in the metaverse. We furthermore consider joint shopping as a core business area, given the enormous economic relevance of retailing and the prominent, but somewhat underresearched, role of companions for consumers’ shopping behavior. As in the physical world, brands, branded products, and branded environments (such as “Nikeland,” a virtual world in the Roblox metaverse) will play a major role for consumers’ metaverse behaviors, something we need to understand much better. At the customer-employee interface, we consider personalized service encounters to be promising business areas in several industries where deep exchanges take place, including education, health care, and financial services. Moreover, we suspect that three-dimensional presentations of products in the metaverse can enrich sales interactions, something that might be particularly attractive for complex business-to-business products that prospects can virtually experience via headsets together with a salesperson.

For all these areas, weighing the added value of metaverse engagements against their costs (e.g., for equipping employees with headsets, for building or renting virtual spaces) will be a crucial task for scholars and managers. While hardware accessibility is mostly a matter of costs for work contexts, RMSIs that involve consumers also have to account for the (lack of) availability of virtual-reality headsets among the target group. Thus, understanding headset adoption among consumers should be of interest for all metaverse initiatives in the area of joint consumption and those at the customer-employee interface; such research needs to account for the social nature of the metaverse, which suggests that network effects determine headset diffusion (Gustafsson, 2022). Do “hybrid” approaches, which allow accessing the metaverse from different devices (e.g., headsets as well as PCs), overcome the adoption challenge? While our findings on non-virtual-reality metaverse settings are not encouraging, some metaverse apps that employ such a “hybrid” strategy (e.g., VRChat and Rec Room) appear to be quite successful with it.

Research should also take a thorough look at how RMSIs in the metaverse will affect our societies. Scholars might use social media and its developments as a starting point for understanding how RMSIs in the metaverse will influence society. Large-scale negative outcomes, such as privacy violations, harassment, and other unethical, violent, or abusive practices, certainly exist, and the new environments’ 360-degree nature might only add to their intensity and impact. Especially with the rapid advancement of artificial intelligence and the possibilities to develop deep fakes, virtual reality threatens to become the next playground for disinformation campaigns, in which people use other people’s avatars to participate in harmful activities as part of RMSIs. With the enormous amount of data being generated when people stroll through virtual worlds, the risks of data misuse grow exponentially in the metaverse.

Scholars should explore how such developments can be prevented or at least mitigated. What safety mechanisms should be implemented to prevent the metaverse from becoming a three-dimensional dark web? What role should metaverse builders such as Mark Zuckerberg’s Meta play versus the role of governmental actors? Would open standards and interoperability be helpful, or would they rather support the monopolization trends inherent in the network economy of the metaverse? Must hardware (e.g., headsets) and software (e.g., apps) operations be split? By probing these issues, marketing academia can help societies’ decision makers develop a clear understanding of the risks associated with the metaverse.

Meanwhile, RMSIs in the metaverse may also provide unique chances for societies. They could be helpful in overcoming national borders and gaining the opportunity to become part of fruitful exchanges with people from different cultures, backgrounds, and nationalities in various virtual “locations,” without the need to travel. When exploring the metaverse ourselves, we have had inspiring encounters and met wonderful people. To fully develop such opportunities, future research could shed greater light on how the metaverse can be designed to help foster tolerance and understanding and build and maintain “virtual” relations.

In summary, this research takes a first significant step toward a theory of value creation of RMSIs in the metaverse, as a new computer-mediated environment accessible via virtual-reality technology that is set to challenge RMSIs on the 2D internet along with activities in the physical world. We enriched a tentative theoretical framework of the effects of RMSIs in the metaverse when accessed via virtual-reality headsets, versus those on the 2D internet, on interaction outcomes with extensive field-experimental probes. Doing so enabled us to shape our theoretical considerations and to develop a refined version of the theoretical framework. We use this refined framework to lay out a roadmap for future research on RMSIs in the metaverse, which includes mediators, moderators, and interaction formats, but also suggests selected business areas of particular interest and societal issues that we believe deserve the attention of those who study the metaverse.