Introduction

Aim and motivation of this study

The quality of a mediated social communication experience depends on the extent to which one feels physically together with another person (spatial presence) and has an affective and intellectual connection with that person (social presence). Unfortunately, measuring the quality of a mediated social communication experience is not straightforward, since the concepts of spatial presence [1,2,3] and social presence [4,5,6] are both ill-defined. Both concepts have been operationalized in many different ways, adopting various aspects like involvement, engagement, attention, transportation (co-location), social richness (salience, mutual awareness), realism, and social actors [2, 5]. While there are many different spatial and social presence questionnaires [4, 7,8,9], most are constructed ad hoc for particular research purposes and contain context-, content-, and user-dependent items. Only very few have been validated and contain neutral questions [10]. To achieve unification and standardization of concepts and measures, calls have recently appeared for community efforts to validate existing questionnaires [6] and to perform a meta-analytic review to identify the relative contribution of different items to the sense of social presence [5]. In response to these calls, we propose a general and unifying multilevel scale approach that addresses each of the relevant psychological processing levels in the human brain that contribute to the senses of spatial and social presence. For each level, we suggest associated items formulated in a general manner. In contrast to our holistic approach, existing questionnaires typically address only a subset of these levels, mostly in an ad hoc and task-specific manner. In turn, we call upon the community to apply (and possibly update) our proposed multilevel scale in combination with existing (spatial and social) presence questionnaires.
In this way, it will become possible to establish the relative contribution of the individual items from existing questionnaires to each of the different subdimensions of the multilevel scale proposed here. A full validation, including the convergent validity of our proposed scale, may ultimately lead to the unification and generalization of the different assessment tools, where items of existing scales may become subitems of the more generally formulated scales of our holistic social presence questionnaire.

Mediated social communication

Humans have an inherently social and personal need for communication to maintain their interpersonal relationships and mental wellbeing [11]. In our digital age, human social communication is often mediated. Technologies like videoconferencing software (e.g., Zoom, Microsoft Teams, Skype) are becoming increasingly popular as they afford a new form of virtual togetherness by facilitating shared and synchronous social activities, thereby substituting face-to-face (F2F) interactions [12, 13]. New immersive (VR, AR, or MR-based) communication systems extend regular video- or audio-conferencing tools by affording social experiences that more closely approximate the experience of F2F meetings. Sophisticated capturing, modeling and rendering techniques afford high-fidelity shared mediated experiences of remote communication partners and their physical environment [14,15,16,17,18,19]. For instance, VR-based collaborative communication systems can represent their users either as computer-generated avatars or as photorealistic point clouds and place them in shared virtual spaces in which they can interact and communicate [20]. The same holds for systems that occupy other positions on Milgram’s reality-virtuality continuum [21], like AR, MR and augmented virtuality (AV) platforms that afford the blending of high-fidelity representations of remote users into shared collaboration spaces in which they can interact with the local users. Extended reality (XR, i.e., AR, VR, or MR) based communication systems attempt to merge the physical world with digital information (e.g., the mediated representation of the communication partners, elements from their own environment or computer-generated objects) while preserving the (multisensory) coherence and plausibility of the overall representation. These systems can give local hosts the impression that their remote communication partners are actually present in their immediate (shared) environment [22, 23].
Systems stimulating multiple sensory channels (mulsemedia systems: [24]) can be particularly effective in eliciting a strong feeling of a shared space.

To develop and optimize social communication systems, there is a need for metrics that allow an efficient and comprehensive evaluation of the Quality of Experience (QoE; [23, 25]) of mediated social communication. To enable a reliable comparison of user experiences across systems, contexts and users, these QoE measures should quantify the intrinsic capability of a communication system to provide a compelling social communication experience that feels coherent, realistic, and plausible (see Table 1 for the working definitions of the concepts and constructs used in this study) at all psychological processing levels. They should also be independent of secondary (mediating) factors like context, content and user state and personality. Questionnaires are typically the preferred way to measure the quality of mediated social interactions since they can efficiently be applied to almost any system in any condition [7, 26]. Recent studies have argued that perceived realism, plausibility and coherence are the primary central outcomes of the sensory, perceptual, and cognitive processing layers in the human brain that determine the quality of a mediated experience [2, 3, 27]. However, most currently used QoE questionnaires predominantly measure secondary (content-, context- or user-dependent) factors like attention, involvement, enjoyment, and the sense of “being there” in the mediated (shared) environment (see Table 2). The sense of “being there” is strongly associated with secondary factors like attentional allocation [28], and is an inherently ambiguous concept [2, 3] that becomes even more ill-defined for systems situated closer to the real environment along Milgram’s reality-virtuality continuum [29]. Hence, to reliably measure the QoE of mediated communication experiences, there is still a need for questionnaires that quantify the degree to which systems can provide experiences that are coherent, plausible, and realistic [2, 3, 9].

Table 1 Working definitions for concepts and constructs used in this study
Table 2 Some of the most influential social and spatial presence questionnaires and the concepts they address

Quality assessment of mediated social communication

Effective mediated shared social communication experiences involve a sense of social presence together with a sense of spatial presence. The sense of social presence consists of two components: copresence [30]: the sense of being physically together with one’s communication partner in the same environment (physical proximity), and social interaction: the sense of having an affective and intellectual interaction with one’s communication partner [8, 31, 32]. The sense of spatial presence [33] also consists of two components: telepresence [28]: the feeling of being located in the mediated (shared) environment, and agency [34]: the feeling of being able to act within that environment. The difference between these two concepts is that social presence primarily deals with human–human relations, whereas spatial presence only pertains to human-object relations. Since physical proximity is a factor of social presence [4], feelings of spatial presence may enhance the perception of social closeness and intimacy to others [35, 36]. At the same time, since social information is a powerful driver of attention [37] and attention to the environment is a precondition for spatial presence [38], feeling the presence of others might also lead to increased spatial presence. As a result, spatial and social presence are typically correlated [39, 40]. A valid QoE metric for mediated social communication should quantify both social and spatial presence and their subcomponents.

Social interaction is inherently bidirectional, involving a sense of mutual awareness. A valid QoE assessment tool should therefore also be able to measure both the internal (‘one’s own’) and external (‘the other’s’) assessment perspectives.

The interaction with our environment and the people therein activates different (sensory, emotional, cognitive, reasoning, and behavioral) processing levels in our brain that all contribute to the subjective quality of the experience [41,42,43]. A valid QoE metric should therefore describe how a mediated social communication experience affects our brain at each of these different processing levels, and should link these levels to relevant perceptual, affective, cognitive, reasoning, and behavioral outcomes.

Attempts to link QoE to quality of service (QoS) parameters have had only very limited success because QoE is inherently a subjective, multidimensional, and multisensorial construct [44,45,46]. ITU-T [47] Sect. 6.212 defines QoS as “[The] Totality of characteristics of a telecommunications service that bear on its ability to satisfy stated and implied needs of the user of the service.” Note that QoS is defined from a system’s perspective, in contrast to QoE, which is defined entirely from the user’s perspective. QoS evaluations therefore typically rely exclusively on system performance parameters and metrics, such as bandwidth, latency, jitter, throughput, transmission delay, and packet loss [48]. Hence, it is still not clear how QoS parameters relate to the affective, behavioral, and cognitive aspects of a mediated communication experience.

Next to the fidelity of the representation of a mediated environment and the persons therein, the experienced quality of a mediated social communication experience may also depend on highly subjective secondary factors like its personal relevance [49] and the user’s context (e.g., task, available information: [50, 51]), current (mental and physical) state, personality [52,53,54], engagement and involvement (e.g., enjoyment, flow, and mental absorption or attention, [55]). A QoE metric for social communication should primarily address the experiential fidelity [56] of social presence experiences to ensure that its outcomes are relatively independent of such secondary factors. In other words, a QoE metric should quantify the intrinsic capability of a communication system to provide a compelling social communication experience that feels realistic or natural at all psychological processing levels. In agreement with the media richness theory [57, 58] (see also [5]), this requirement is based on the hypothesis that the fidelity of the experience increases with the quality and the capability of the communication medium.

To summarize, a QoE metric for mediated social communication should satisfy the following four requirements:

  1. The metric should measure both social and spatial presence and their subcomponents (copresence + social interaction and telepresence + agency),

  2. The metric should assess both the internal (‘one’s own’) and external (‘the other’s’) assessment perspectives,

  3. The metric should address each of the relevant psychological processing levels (sensory, emotional, cognitive, reasoning, and behavioral), and

  4. The metric should measure a communication system’s experiential fidelity, i.e. the system’s intrinsic capability to provide a realistic or natural mediated social communication experience.
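
As an illustration, the four requirements can be read as a coverage check over a small taxonomy. The following is a minimal sketch, assuming a hypothetical annotation scheme in which each questionnaire item is tagged with a subcomponent, an assessment perspective, a processing level, and a flag stating whether it targets experiential fidelity. The class `Item`, the function `check_requirements`, and the toy instrument are illustrative assumptions and do not describe any existing questionnaire. The check is deliberately coarse: it tests perspectives and levels over the instrument as a whole rather than per subcomponent.

```python
from dataclasses import dataclass

# Taxonomy taken from the four requirements listed above.
SUBCOMPONENTS = ("copresence", "social_interaction", "telepresence", "agency")
PERSPECTIVES = ("own", "other")
LEVELS = ("sensory", "emotional", "cognitive", "reasoning", "behavioral")


@dataclass(frozen=True)
class Item:
    """Hypothetical annotation of one questionnaire item."""
    subcomponent: str
    perspective: str
    level: str
    measures_fidelity: bool


def check_requirements(items):
    """Map each requirement number (1-4) to True if the item set meets it."""
    subcomponents = {item.subcomponent for item in items}
    perspectives = {item.perspective for item in items}
    levels = {item.level for item in items}
    return {
        1: subcomponents == set(SUBCOMPONENTS),
        2: perspectives == set(PERSPECTIVES),
        3: levels == set(LEVELS),
        4: bool(items) and all(item.measures_fidelity for item in items),
    }


# Toy instrument (illustrative only): one own-perspective copresence item per level.
toy = [Item("copresence", "own", level, True) for level in LEVELS]
print(check_requirements(toy))
```

A stricter variant could require every combination of subcomponent, perspective, and level to be covered, which is closer to the per-cell comparison of questionnaires made later in the paper.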

A wide range of methods has been developed to measure the sense of (social) presence [7,8,9]. The methods can be classified as objective (instrumental) and subjective (perception-based) measures [25]. Objective measures include biomarkers (e.g., heart rate, EEG and EMG measures, eye tracking [59], skin conductance and skin temperature), behavioral measures (e.g., gaze behavior [60], reflexive responses, postural sway), or measures related to social behavior, task performance and choice-making in the mediated environment [61,62,63]. Objective measures are generally costly and complex and have methodological limitations that do not allow their application in all conditions, while their interpretation is not unequivocal [5, 6]. Subjective measures are typically obtained through questionnaires, self-report ratings, or interviews. Presence questionnaires are still the preferred method of investigation since they are cheap and easy to administer and apply to almost any condition [7, 26]. Also, it has been argued that the use of presence questionnaires remains firmly grounded and legitimized because the sense of presence is the outcome of spatial cognitive processes and determines our reasoning and behavior [64].

In the next section, we first discuss the most widely used and related questionnaires for mediated social and spatial presence, and we identify their limitations for measuring the quality of mediated social communication experiences. In particular, we identify the need for QoE questionnaires that are independent of technology and of secondary factors like context, content and user personality. Then, we propose a new conceptual multiscale quality assessment approach that meets our requirements, and we propose an associated multiscale measurement tool (questionnaire). Next, we present the results of some initial studies investigating the content and face validity of the proposed questionnaire. Finally, we draw some conclusions and discuss the limitations of the new conceptual method in its current form. Although it has not yet been rigorously validated, we make our draft questionnaire available to the community for further evaluation studies and to stimulate the discussion on this topic.

Spatial presence questionnaires

A wide range of methods has been developed to measure the sense of telepresence in a mediated (possibly virtual) environment (for reviews, see [7,8,9]). The most widely applied telepresence questionnaire is the Presence Questionnaire (PQ: [65, 66]). Other frequently used methods are the Slater-Usoh-Steed Questionnaire (SUS: [67]), the Measurement, Effects, and Conditions Spatial Presence Questionnaire (MEC-SPQ: [68]), and the Igroup Presence Questionnaire (IPQ: [34]). While most questionnaires aim to quantify the same underlying construct (typically spatial presence), they differ widely in their scope (since they are based on different definitions of presence) and details (their items and subscales differ considerably; for a review, see [7]). The SUS and PQ tap into different aspects of presence. The SUS addresses the user’s sense of being in the represented environment, the extent to which the represented environment replaces the user’s physical environment, and the extent to which the represented environment is remembered as an actual place. The PQ, IPQ, MEC-SPQ and Place Probe [69] also measure the user’s involvement. The PQ is more sensitive to factors related to technology and interaction, while the SUS is more sensitive to personal factors [70]. However, both questionnaires are insensitive to variations in the internal consistency or plausibility of a represented environment [71], which is an essential factor contributing to the sense of spatial presence [38]. The IPQ also measures the experienced realism of the environment. The MEC-SPQ and the Place Probe also measure the amount of attention users devote to the represented environment and the quality of their mental spatial model of that environment.

The sense of agency in the mediated environment is typically measured through questionnaire items asking users to rate the extent to which their actions in the mediated space appear natural. Only a few existing presence questionnaires address the sense of agency: the PQ [66] includes six items related to agency, the MEC-SPQ [68] three items, and the Igroup Presence Questionnaire (IPQ: [34]) only one item.

Social presence questionnaires

Next to making strong assumptions about the technology that is used [31], most existing social presence questionnaires only implicitly and incompletely address the different processing levels in the human brain that are involved in mediated social communication experiences [10, 23, 72, 73]. An exception is the Virtual Experience Test (VET, [74]) that provides a more holistic measure of a mediated social presence experience by including affective, cognitive, active, and relational dimensions in addition to its sensory dimension. However, the instrument is designed for the development of virtual environments and games and is not sufficiently general for the evaluation of multisensory social communication systems. While the VET measures the experience of the environment at the sensory, emotional, and cognitive levels, it measures the experienced quality of social interaction only at the behavioral and reasoning levels. The Multimodal Presence Scale (MPS, [75]) measures three components of presence in a mediated environment: physical presence (the experience of the environment), social presence (the experience of the social actors in the environment), and self-presence (the extent to which the virtual representation of oneself is experienced as the actual self). Like the VET, this instrument was designed for the assessment of virtual environments and games, but not for mediated social communication (MSC) systems. Also, the MPS does not address the quality of social interaction at the emotional and reasoning processing levels. The Networked Minds Social Presence Inventory (NM-SPI, [10]) was specifically designed to measure social presence in mediated communication. It measures social interaction at the sensory, emotional, and behavioral processing levels from both the internal and external assessment perspectives, but contains no items related to the cognitive and reasoning levels. Also, its items measuring copresence do not relate to the sense of physical proximity (being in the same environment).
The Social Presence Survey (SP Survey, [72]) measures social interaction from the ‘own’ perspective, explicitly at the sensory level and only implicitly at the emotional and cognitive levels. The Sense of Being Together questionnaire (SBT, [73]) measures social interaction only from the ‘own’ perspective, explicitly at the sensory level and only implicitly at the emotional and behavioral levels. The Social VR Questionnaire (SocialVR-Q, [23]) was designed to investigate photo-sharing experiences in immersive environments. It addresses social presence only from the ‘own’ perspective. Also, it contains no items that tap into the cognitive processing level of social interaction. The ITC Sense of Presence Inventory (ITC-SOPI: [76]) was developed as a standard cross-media presence measurement tool, intended to be usable across different media types, such as television programs or movies. Two of its four subscales (Sense of Physical Space and Naturalness) contain items related to spatial and social presence, while the other two subscales (Engagement and Negative Effects) only address secondary factors (e.g., appeal of the environment, tiredness, headache, eyestrain). The ITC-SOPI measures the experience of the environment explicitly at both the cognitive and behavioral levels and implicitly at both the sensory and emotional levels, but contains no items tapping into the reasoning level. Regarding social interaction, it measures the experienced quality of copresence both from the ‘own’ and the ‘other’ perspectives, but it has no items that tap into any of the other four processing levels (Tables 3, 4).

Table 3 The Holistic Mediated Social Communication Questionnaire (H-MSC-Q) for measuring the quality of mediated social experiences. The item numbers and their identifiers are enclosed in square brackets. C = construct describing the experience that is to be assessed, Q = questionnaire item used to assess the associated construct
Table 4 Concise version of the Social Presence part of the Holistic Mediated Social Communication Questionnaire (H-MSC-Q). C = construct describing the experience that is to be assessed, Q = questionnaire item used to assess the associated construct

Limitations of existing questionnaires

In this section, we systematically discuss the extent to which existing social and spatial presence questionnaires meet the four requirements for a mediated social communication QoE metric formulated in the "Quality assessment of mediated social communication" section (see Table 5). Table 5 shows how ten of the most widely used presence questionnaires tap into each of the five relevant (sensory, emotional, cognitive, behavioral and decision making) processing levels for multisensory environmental stimuli [42], for both Spatial Presence and Social Presence and for both (‘one’s own’ or ‘the other’s’) assessment perspectives. This table also shows whether the items in these questionnaires explicitly (filled circles in Table 5) or implicitly (open circles) address each of these constructs.

Table 5 The relation between some of the most influential presence questionnaires and each of the five relevant (sensory, emotional, cognitive, behavioral and decision making) processing levels for Spatial Presence and for Social Presence

Requirement 1: measure both social and spatial presence

The MPS is the only questionnaire that measures both social and spatial presence and their subcomponents (copresence + social interaction and telepresence + agency). All other questionnaires measure either only spatial presence (SUS, PQ, IPQ, MEC-SPQ, Place Probe) or only social presence (SP Survey, SBT, NM-SPI, SocialVR-Q).

Requirement 2: measure both internal and external assessment perspectives

The NM-SPI measures both copresence and social interaction from both assessment perspectives. Social interaction is measured explicitly at the emotional level and only implicitly at the behavioral level.

The MPS explicitly measures copresence from both assessment perspectives. It measures social interaction implicitly and only from the ‘own’ perspective at the cognitive and behavioral psychological processing levels.

The SocialVR-Q measures copresence explicitly from the ‘own’ perspective and implicitly from ‘the other’s’ perspective. It measures social interaction explicitly at the emotional level from both perspectives, and only from the ‘own’ perspective at the reasoning and behavioral levels.

Requirement 3: measure all relevant psychological processing levels

For Spatial Presence, only the PQ-v.3 and the MEC-SPQ address all five processing levels. However, the PQ-v.3 only explicitly addresses agency and telepresence at the emotional processing level, while the MEC-SPQ only implicitly addresses telepresence at the reasoning level.

For Social Presence, none of the questionnaires measures all relevant psychological processing levels. All social presence questionnaires (SP Survey, SBT, NM-SPI, MPS and SocialVR-Q) measure copresence (typically explicitly, except the NM-SPI). Most social presence questionnaires also measure social interaction at the emotional (except the MPS) and behavioral (except the SP Survey) processing levels.

The SocialVR-Q measures social interaction at three processing levels (all except the cognitive level) from the ‘own’ perspective. The SP Survey, SBT, NM-SPI, and MPS each measure social interaction at two processing levels from the ‘own’ perspective.

Requirement 4: measure a communication system’s experiential fidelity

All questionnaires listed in Table 5 that tap into the cognitive processing level measure the fidelity of the (telepresence or social interaction) experience at this level. The PQ also measures the fidelity of spatial presence at the behavioral level (i.e., the fidelity of agency), while the SocialVR-Q measures the fidelity of social interaction at the behavioral level. None of the existing questionnaires measures the fidelity of a social communication experience at the sensory or reasoning levels.

Towards a holistic multiscale quality assessment method for mediated social communication

We adopt the feeling that one actually experiences a natural social interaction in a realistic shared environment (i.e., the experiential fidelity) as the overarching (holistic) quality construct for a mediated social communication experience. Thus, we explicitly exclude social communication experiences in simulated settings that afford super-human abilities (e.g., super-hearing, super-vision, teleportation) to their users. A high-quality mediated social presence experience then implies that the communication system provides both a natural sense of spatial presence (with subcomponents telepresence and agency) and a natural sense of social presence (with subcomponents copresence and social interaction), without introducing any idiosyncrasies (sensory distortions) due to system limitations or abnormalities in the mediated representations of the environment and the persons therein.

In the next section, we will first discuss an established conceptual holistic framework that describes how multisensory stimulation affects our brain at five different processing levels (sensory, emotional, cognitive, decision making, and behavioral), and we will link these levels to relevant perceptual, affective, and cognitive outcomes. Then, in the following two sections, we will show how this holistic framework can be used to characterize the overall quality of mediated social communication based on social and spatial presence (the Holistic Mediated Social Communication or H-MSC quality assessment method). We also propose an associated tool (the Holistic Mediated Social Communication Questionnaire or H-MSC-Q) that measures the quality of mediated social communication by tapping into each of the five relevant processing levels as defined in the conceptual framework. The H-MSC-Q measures the quality of social communication through (1) the sense of spatial presence (telepresence and agency) in the mediated environment and (2) social presence (copresence and social interaction) with the other person(s) therein. The items in the H-MSC-Q can, for instance, be scored on 5-, 7-, or 9-point Likert scales. In practice, a 7-point scale is preferred since it is near-optimal in terms of reliability, validity, discriminating power, and respondent preferences [77,78,79].
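
To make the scoring step concrete, the following minimal sketch shows one way to aggregate 7-point Likert ratings into subcomponent scores. It rests on assumptions that are not part of the H-MSC-Q itself: the item identifiers in `ITEM_KEY` are hypothetical placeholders rather than the actual questionnaire items, and averaging ratings per subcomponent is only one possible aggregation rule.

```python
from statistics import mean

SCALE_MIN, SCALE_MAX = 1, 7  # 7-point Likert scale, the preferred option in the text

# Hypothetical item identifiers mapped to (subcomponent, processing level).
ITEM_KEY = {
    "tp_sens": ("telepresence", "sensory"),
    "tp_emo": ("telepresence", "emotional"),
    "si_sens": ("social_interaction", "sensory"),
    "si_emo": ("social_interaction", "emotional"),
}


def score(responses):
    """Average the ratings per subcomponent (assumed aggregation rule)."""
    buckets = {}
    for item_id, rating in responses.items():
        if not SCALE_MIN <= rating <= SCALE_MAX:
            raise ValueError(f"{item_id}: rating {rating} is off the scale")
        subcomponent, _level = ITEM_KEY[item_id]
        buckets.setdefault(subcomponent, []).append(rating)
    return {name: mean(values) for name, values in buckets.items()}


print(score({"tp_sens": 6, "tp_emo": 4, "si_sens": 7, "si_emo": 5}))
```

Grouping by processing level instead of by subcomponent only requires using the second element of each `ITEM_KEY` entry as the bucket name.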

A multiscale approach to multisensory perception

The new multiscale approach to the quality assessment of mediated social communication proposed in this paper is based on a holistic model that describes how multisensory stimulation affects our brain at the sensory or perceptual, emotional, cognitive, behavioral, and decision-making levels [42]. This holistic model distinguishes two assessment perspectives, related to the object of focus that is assessed and responded to: an external perspective in which individuals only assess and respond to information in their environment, and an internal perspective in which the internal reaction of the individual to the environmental information is assessed and responded to. For instance, if a person is asked to describe an experience, an internally focused assessment and response follows, e.g.: “I felt excited/stressed”. If a person is explicitly asked to provide an affective evaluation of an object or environment, an externally focused assessment and response follows, e.g.: “This conversation or environment is stimulating/boring”. Both assessment perspectives tap into different processes, as we will discuss next.

The first processing steps of environmental stimuli are mediated automatically and unconsciously through our senses and the primary sensory areas in our brain. In both assessment perspectives, this processing level results in the sensation of environmental stimuli. In these early processing stages, one can, however, already distinguish different processing routes, which are later linked to the different assessment perspectives [80, 81]. One route (that goes through the sensory cortices where feature extraction and sensory integration take place) serves to guide the external focus and performs an assessment of environmental stimuli (‘external assessment perspective’). This processing level involves a subtle interplay of lower-order and top-down processes, steering attention and resource allocation [82, 83]. A second route runs via the limbic structures, prominently including the amygdala, which affects the arousal level (‘internal assessment perspective’).

The second processing level involves both conscious and unconscious processing. From the external assessment perspective, the integration and interpretation of the sensory information results in a holistic percept (Gestalt) of an object or environment [84, 85], while it results in an emotional experience from the internal assessment perspective [86,87,88]. In this paper, we define an emotional experience or emotion as a short-term state that is directly related to the environmental stimuli.

The third processing level involves higher-order processes for cognitive processing. From the external assessment perspective, the primary outcome is an evaluation or appraisal of the percept [89]. Depending on the task, this appraisal can be affective (like or dislike of a percept) or functional (evaluation of the characteristics of a percept such as strength, size). From the internal assessment perspective, the cognitive processing may result in an emotional response (e.g., conscious feelings or behavioral intentions [90]).

The fourth processing level involves both conscious and unconscious behavioral responses. From the external perspective, environmental appraisals may trigger either highly trained (automated) reflexive behavior or more deliberate (externally motivated) behavioral responses [91]. From the internal assessment perspective, emotions and appraisals may elicit (unconscious or deliberate) approach and avoidance behaviors [92].

The fifth processing level involves decision-making processes. From the external assessment perspective, appraisals trigger cognitive functions such as working memory, reasoning, and planning [93]. From the internal assessment perspective, emotions and feelings drive our judgments and choices [94, 95].
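
The five levels and two assessment perspectives described above can be summarized as a simple lookup table. The sketch below is only a condensed restatement of the preceding paragraphs: the short labels are our paraphrases, and the assignment of the level names (sensory, emotional, cognitive, behavioral, decision making) to the first through fifth level assumes that they follow the order in which the model lists them.

```python
# Processing level -> primary outcome per assessment perspective.
# Labels are shortened paraphrases of the descriptions in the text.
PROCESSING_LEVELS = {
    "sensory": {
        "external": "assessment of environmental stimuli",
        "internal": "arousal",
    },
    "emotional": {
        "external": "holistic percept (Gestalt)",
        "internal": "emotional experience",
    },
    "cognitive": {
        "external": "appraisal of the percept",
        "internal": "emotional response",
    },
    "behavioral": {
        "external": "reflexive or deliberate response",
        "internal": "approach or avoidance behavior",
    },
    "decision making": {
        "external": "working memory, reasoning, and planning",
        "internal": "judgments and choices",
    },
}


def outcome(level, perspective):
    """Look up the outcome for one level and one perspective."""
    return PROCESSING_LEVELS[level][perspective]


for level, outcomes in PROCESSING_LEVELS.items():
    print(f"{level:>15}: external = {outcomes['external']}; "
          f"internal = {outcomes['internal']}")
```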

In the next two sections, we will identify the characteristics of a mediated social communication experience that determine its perceived quality, by decomposing the experience into quality features [96] at each of the five relevant processing levels in the human brain [42]. Here we distinguish between quality factors and quality features [25]. A quality factor can be defined as “Any characteristic of a system, whose actual state or setting may influence the QoE for the user” [97]. A quality feature can then be defined as “A perceivable, recognized and nameable characteristic of the individual’s experience of a service which contributes to its quality” [98]. Thus, a feature can be seen as a dimension of a multidimensional perceptual event. A feature becomes a quality feature when it is relevant for the experienced quality of the event. For the experience of social presence, we will identify associated quality factors and features at each of the five processing levels and formulate questionnaire items that can be used to rate the quality features. Since the new multiscale approach to the quality assessment of mediated social communication proposed in the next two sections is based on experiential fidelity, its associated quality factors lie between objective factors related to a system’s quality of service (QoS: system characteristics) and highly subjective context-, task-, and mood-dependent secondary features like enjoyment, engagement, flow, and mental absorption or attention. This allows the formulation of an associated QoE questionnaire with items that are relatively insensitive to variations across conditions and personalities.

According to the neuroscientific theory of predictive encoding [99,100,101], the brain generates models at each level of perceptual and cognitive processing to predict what information it should be receiving from the level below it (i.e., top-down). The brain then compares the actual bottom-up sensory information with the model predictions. Only discrepancies between both (referred to as prediction errors or surprises) are passed to higher levels where they are used to update the current model or activate an alternative one. Model activation and updates are both directed at minimizing or suppressing prediction errors at a lower level [101, 102]. Note that the order between the different processing levels need not be fixed, and levels may even be skipped [42].

Quality of spatial presence

In this section, QoE will refer to the quality of the spatial presence component (i.e., the environment in which the social communication experience takes place) of a mediated social communication experience.

Sensory level

At the sensory level, the relevant quality factor for telepresence is the perceptual or sensory fidelity of the experience, i.e., the extent to which users fail to perceive or acknowledge the fact that (part of) their sensory input is mediated. Users should preferably experience the feeling that their sensory input originates directly from the represented environment (the illusion of non-mediation: [103, 104]). In other words, they should experience a natural and acute awareness of the (partially) mediated environment. At this level, quality features are related to individual sensory channels, such as visual features, auditory features or tactile features, and may also be linked to the perception via multiple senses in parallel (e.g., audio-visual features; [98]). Example quality features for the visual channel include color naturalness, sharpness, darkness (of black areas), brightness, contrast, flicker, blur, geometrical distortion, and coding and packet-loss induced degradations such as blocking, freezing, and slicing. Examples for the auditory channel include audio-streaming quality parameters like localization and timbre, and speech-transmission quality features like coloration, noisiness, or loudness [98]. At this level, QoE is directly related to the QoS or fidelity of the system mediating the remote or simulated environment [44]. Note that the fidelity of an experience can differ greatly between the different sensory modalities. Such inconsistencies can lead to a strong sense of presence in one modality but not in another [105]. For services that address multiple sensory channels simultaneously, relevant features include balance and synchrony, and a QoE assessment should address the extent to which one feels like being in direct contact with the environment (one’s impression that one directly sees, hears, feels, or smells the environment).
At this level, the overall QoE can be assessed by rating a statement like: “I feel in direct contact with the environment” (item 1 in Table 3).

Affective/emotional level

At the affective or emotional level, the relevant quality factor for telepresence is the internal plausibility or sensory congruity [106] of the experience, i.e. the extent to which users have the feeling that their multisensory input is coherent [71] and agrees (is congruent and consistent) with their mental model (expectations or memories) of the represented environment [38, 106, 107]. Hence, internal plausibility refers to the extent to which an experience is consistent within itself or with respect to the expectations raised by its genre [107]. The relevant quality feature at this level is the semantic consistency and congruency between all sensory signals, and the QoE can be quantified by rating a statement like: “My sensations are consistent and agree with the represented environment” (item 2 in Table 3).

Cognitive level

At the cognitive level, the relevant quality factor for telepresence is the external plausibility or environmental and thematic congruity [106] of the experience, i.e., the perceived fidelity [108], realness [3, 61] or illusion that the represented environment is authentic [109] and a place that can actually be visited [105, 110]. Hence, external plausibility refers to how consistent an experience is to the users’ real-world knowledge [107]. At this level, the QoE can be quantified by rating a statement like: “The represented environment appears real” (item 3 in Table 3).

Reasoning level

At the reasoning level, the relevant quality factor for telepresence is the degree of realism of the multisensory representation of the mediated environment [2]. A multisensory representation of the mediated environment with a high degree of fidelity and realism is expected to influence one’s reasoning in a similar way as its unmediated counterpart. At this level, the QoE can be quantified by rating a statement like: “The environment affects my thoughts as its real counterpart would” (item 4 in Table 3).

In the Spatial Presence subscale of the H-MSC-Q we collapsed both environmental assessment perspectives into a single item at each processing level. For this subscale, maintaining a distinction between the items tapping into the internal and external assessment perspectives on the environment would have resulted in items with only slight nuances in their formulation (asking people to assess either the capability of the environment to evoke their response or to assess their actual response to the environment on different processing levels). This would make these items hard to distinguish and would, therefore, most likely yield similar responses (not understanding the difference between the items, people would probably give the same answer to both items). Since the different perspectives are so closely linked, we believe this reduction in the number of items will not result in a significant loss of information on the experience of mediated social communication.

Behavioral level

At the behavioral level, the relevant quality factor for agency is the degree to which the mediated environment affords natural behavior without any limitations or restrictions, i.e., the feeling that one can interact with objects and persons in the represented environment as in reality. At this level, the QoE can be quantified by rating a statement like: “My interaction with the represented environment feels realistic” (item 5 in Table 3).

Quality of social presence

In this section, QoE will refer to the quality of the social presence component of a mediated social communication experience. Social presence inherently involves a bidirectional exchange of physical and emotional signals. Since the difference between the internal (‘own’) and external (‘the other’) assessment perspectives can be clearly formulated for social interaction, the distinction in both perspectives is maintained for the social presence subscale of the H-MSC-Q (see Table 3). However, by emphasizing the bidirectionality in the formulation of the items of this subscale, both assessment perspectives can also be collapsed into a single one to obtain a more concise version of this subscale (see Table 4).

Sensory level

At the sensory level, system factors should not affect the sensory impression that people have of one another, i.e., users should have the impression that they are in direct contact with each other (physical immediacy or the illusion of non-mediation [111]). At this level, the relevant quality factor for copresence is the feeling that the represented individuals are in one’s physical proximity or direct influence sphere (the feeling that one can make direct physical contact). The QoE can then be quantified from one’s own perspective by rating a statement like: “I feel the presence of the other person(s)” (item 6 in Table 3), and from the other’s viewpoint by rating a statement like: “The other person(s) appear to feel my presence” (item 7 in Table 3). Both perspectives can be assessed simultaneously by rating a statement like: “We feel each other’s presence” (first item in Table 4).

Affective / emotional level

At the affective or emotional level, the mediation process should not degrade the feeling of intimacy [111], i.e., the mediated representation of an individual should convey and evoke similar emotions as its unmediated counterpart. At this level, the relevant quality factor for social interaction is the feeling that one has an emotional and intellectual connection with the represented individual(s) [112,113,114]. The QoE can then be quantified from one’s own perspective by rating a statement like: “I feel an emotional and intellectual connection with the other person(s)” (item 8 in Table 3), and from the other’s viewpoint by rating a statement like: “The other person(s) appear to feel an emotional and intellectual connection with me” (item 9 in Table 3). Both perspectives can be assessed simultaneously by rating a statement like: “We feel a mutual emotional and intellectual connection” (second item in Table 4).

Cognitive level

At the cognitive level, the mediation process should not affect the natural appearance of the represented individuals (the credibility of their representation). At this level, the relevant quality factor for social interaction is the feeling that the represented individuals look as they do in normal life. The QoE can then be quantified from one’s own perspective by rating a statement like: “The appearance of the other person(s) feels normal” (item 10 in Table 3), and from the other’s viewpoint by rating a statement like: “My appearance seems normal to the other person(s)” (item 11 in Table 3). Both perspectives can be assessed simultaneously by rating a statement like: “Our appearance feels normal” (third item in Table 4).

Reasoning level

At the reasoning level, the mediation process should not affect the reasoning processes of the communication partners. At this level, the relevant quality factor for social interaction is the feeling that the communication system represents individuals in such a way that they affect one’s thinking as they would in normal life. The QoE can then be quantified from one’s own perspective by rating a statement like: “While communicating, my reasoning feels normal” (item 12 in Table 3), and from the other’s viewpoint by rating a statement like: “While communicating, the reasoning of the other person(s) feels normal” (item 13 in Table 3). Both perspectives can be assessed simultaneously by rating a statement like: “While communicating, our mutual reasoning feels normal” (fourth item in Table 4).

Behavioral level

At the behavioral level, the mediation process should not restrict the natural interaction between individuals. At this level, the relevant quality factor for social interaction is the feeling that one’s interaction with represented individuals is the same as in normal life. The QoE can then be quantified from one’s own perspective by rating a statement like: “While communicating, my behavior feels normal” (item 14 in Table 3), and from the other’s viewpoint by rating a statement like: “While communicating, the behavior of the other person(s) feels normal” (item 15 in Table 3). Both perspectives can be assessed simultaneously by rating a statement like: “While communicating, our mutual behavior feels normal” (fifth item in Table 4).

Content and face validity

Validity is the extent to which an instrument measures what it purports to measure [115, 116]. A full and rigorous validation of the H-MSC-Q requires the assessment of its criterion and construct validity, as well as its sensitivity and test–retest reliability. This will for instance involve (1) repeated application of the questionnaire using the same systems in similar scenarios to assess its reliability, (2) application to different social communication systems to assess its sensitivity, (3) comparison with related questionnaires to assess its convergent validity, etc.

Validation studies are typically performed in an iterative fashion, involving several rounds of review and revision. The H-MSC-Q presented here evolved from an initial version that was reviewed in a previous study [117]. This initial version underwent several rounds of revisions, using input from various user and expert groups. In this study, we evaluated the content and face validity of the final version of the H-MSC-Q, to assess whether the instrument is comprehensive enough regarding conciseness, completeness, and clarity to establish its credibility.

A full and rigorous validation of this questionnaire will be a major and time-consuming effort, consisting of several phases [78]. In this paper we only performed the first phase, involving content and face validity assessment. Therefore, we make the initial draft questionnaire available to the community in the hope that it may be used in future studies for further review, development, and testing.

Measures

Content validity

Content validity refers to the extent to which an instrument measures all relevant aspects of a given construct (in this study: social presence). Content validity is typically assessed by a panel of experts familiar with the construct of interest. In this study, content validity was estimated both at the item level and at the overall scale level.

At the item level, content validity was rated for each sub-construct in the H-MSC-Q (C’s in Table 3) on a 4-point Likert scale (1 = “not relevant”, 2 = “somewhat relevant”, 3 = “quite relevant”, 4 = “very relevant”: [118, 119]). By classifying ratings of 1 and 2 as “not essential” and ratings of 3 and 4 as “essential”, the four ordinal responses were collapsed into two dichotomous response categories (‘content valid’ and ‘content invalid’). An item-level content validity index (I-CVI) was calculated by dividing the number of experts who rated an item as “essential” by the total number of experts [120,121,122]. Values of I-CVI range between 0 (the item is rated “not essential” by all experts) and 1 (the item is rated “essential” by all experts). For a panel consisting of 18 experts (this study), items with I-CVI values below 0.40 are considered “unacceptable”, those in the range of 0.40–0.59 are considered “questionable (in need of further improvement)”, those in the range of 0.60–0.74 are considered “good”, and those with values of 0.75 or higher are considered “excellent” [120].
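The I-CVI calculation described above can be illustrated with a minimal Python sketch. The function names and the example ratings are hypothetical and are not taken from the study data. The handling of values that fall between the stated ranges (e.g., between 0.59 and 0.60) is also an assumption.

```python
def item_cvi(ratings):
    """Return the I-CVI: the fraction of experts rating the item 3 or 4."""
    if not ratings or any(r not in (1, 2, 3, 4) for r in ratings):
        raise ValueError("ratings must be a non-empty list of values 1-4")
    essential = sum(1 for r in ratings if r >= 3)  # 3 and 4 count as "essential"
    return essential / len(ratings)


def cvi_category(i_cvi):
    """Map an I-CVI value to the verbal categories quoted in the text.

    Assumption: the gaps between the stated ranges are closed by using
    0.40, 0.60, and 0.75 as lower bounds.
    """
    if i_cvi < 0.40:
        return "unacceptable"
    if i_cvi < 0.60:
        return "questionable"
    if i_cvi < 0.75:
        return "good"
    return "excellent"


# Hypothetical panel of 18 experts: 15 rate the item 4, 3 rate it 2.
example = [4] * 15 + [2] * 3
print(round(item_cvi(example), 2), cvi_category(item_cvi(example)))
```

The same two functions would apply unchanged to clarity ratings, since the face validity indices use the same dichotomization and thresholds.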

Scale level content validity indices (S-CVI’s) were computed both for the Spatial Presence, Internal Perspective, External Perspective, and Social Presence subscales of the H-MSC-Q (see Table 3) and for the overall H-MSC-Q, as the average over the individual I-CVI’s in each (sub-)scale (i.e., the sum over all I-CVI’s divided by the total number of items: [121, 122]). Scales with S-CVI values exceeding 0.80 are considered to have good content validity, while values larger than 0.90 reflect excellent content validity [120].
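As a hedged illustration of the averaging step, the sketch below computes an S-CVI from a list of I-CVI values and applies the two thresholds quoted above. The function names and the label for values at or below 0.80 are assumptions, since the text does not name that category. The input values are hypothetical.

```python
def scale_cvi(item_cvis):
    """Return the S-CVI: the mean of the I-CVI values in a (sub-)scale."""
    if not item_cvis:
        raise ValueError("at least one I-CVI value is required")
    return sum(item_cvis) / len(item_cvis)


def scale_validity_label(s_cvi):
    """Apply the thresholds from the text: > 0.90 excellent, > 0.80 good."""
    if s_cvi > 0.90:
        return "excellent"
    if s_cvi > 0.80:
        return "good"
    # Assumption: the source gives no label for values of 0.80 or lower.
    return "below the threshold for good"


# Hypothetical I-CVI values for a five-item subscale.
subscale = [1.0, 0.95, 0.90, 0.85, 0.90]
print(round(scale_cvi(subscale), 2), scale_validity_label(scale_cvi(subscale)))
```

Because both thresholds are strict ("exceeding", "larger than"), a value of exactly 0.90 is labelled good rather than excellent in this sketch.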

Face validity

Face validity is the degree to which a measure appears to be related to a given construct in the judgement of both experts and non-experts. Thus, a test has face validity if its content appears relevant, reasonable, unambiguous and clear to the target population. In this study, the clarity of each item in the H-MSC-Q (Q’s in Table 3) was rated on a 4-point Likert scale (1 = “not clear”, 2 = “somewhat clear”, 3 = “quite clear”, 4 = “very clear”; e.g. [123]). By classifying ratings of 1 and 2 as “not clear” and ratings of 3 and 4 as “clear”, two dichotomous response categories were obtained (‘valid’ and ‘invalid’). Item-level Face Validity Indices (I-FVI’s) and scale Face Validity Indices (S-FVI’s) were then computed in a similar way as the I-CVI’s and S-CVI’s. Items with I-FVI values below 0.40 are considered “unacceptable”, those in the range of 0.40–0.59 are considered “questionable (in need of further improvement)”, those in the range of 0.60–0.74 are considered “good”, and those with values of 0.75 or higher are considered “excellent” [120]. Scales with S-FVI values exceeding 0.80 are considered to have good face validity, while values larger than 0.90 reflect excellent face validity [120].
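Written out explicitly, and assuming the same construction as for the content validity indices, the two face validity indices could be expressed as follows. The symbols are introduced here for illustration only: N is the number of raters, n_i^clear is the number of raters who gave item i a rating of 3 or 4, and m is the number of items in the (sub-)scale.

```latex
\[
\text{I-FVI}_i = \frac{n_i^{\text{clear}}}{N},
\qquad
\text{S-FVI} = \frac{1}{m} \sum_{i=1}^{m} \text{I-FVI}_i
\]
```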

Open remarks

A free text box on the score sheet gave respondents the opportunity to comment on each of the H-MSC-Q items regarding their grammatical construction, simplicity, representativeness, comprehension, or ambiguity, and to suggest modifications and/or additions or deletions.

Interrater agreement

The interrater reliability was quantified through the intraclass correlation coefficient (ICC) with its associated 95% confidence intervals, based on a mean-rating (k = 3), consistency, 2-way mixed-effects model [124, 125], using IBM SPSS Statistics 26 (www.ibm.com). ICC values less than 0.5 are indicative of poor agreement, values between 0.5 and 0.75 indicate moderate agreement, values between 0.75 and 0.9 indicate good agreement, while values greater than 0.9 indicate excellent agreement [125].
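For readers without access to SPSS, the point estimate of a consistency-type, mean-rating ICC under a two-way model can be sketched in plain Python from the two-way ANOVA mean squares as (MS_rows − MS_error) / MS_rows. This is an illustrative sketch, not the procedure used in the study. It assumes a complete table with one row per rated target (taken here to be a questionnaire item) and one column per rater. It does not compute confidence intervals. The boundary handling in the interpretation helper is also an assumption.

```python
def icc_consistency_mean(ratings):
    """ICC point estimate: two-way model, consistency, mean of k raters.

    `ratings` is a list of n rows, each holding the k ratings of one target.
    """
    if not ratings:
        raise ValueError("ratings must not be empty")
    n, k = len(ratings), len(ratings[0])
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete n x k table with n >= 2 and k >= 2")

    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares (one observation per cell).
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    if ms_rows == 0:
        raise ValueError("no variance between targets; ICC is undefined")
    return (ms_rows - ms_error) / ms_rows


def icc_label(icc):
    """Verbal interpretation quoted in the text (boundary cases assumed)."""
    if icc < 0.5:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc <= 0.9:
        return "good"
    return "excellent"


# Hypothetical data: the second rater is always one point higher than the
# first, so the raters are perfectly consistent and the ICC equals 1.0.
print(icc_consistency_mean([[1, 2], [2, 3], [3, 4]]))  # 1.0
```

A constant offset between raters does not lower a consistency-type ICC, which is the property that distinguishes it from an absolute-agreement ICC.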

Procedure

The content and face validity of the H-MSC-Q were assessed through anonymous online surveys. A cover letter explained the aim of the survey, along with clear and concise instructions on how to rate each item, both for content and face validity. The online survey started with a brief description of a use case, asking the participants to imagine that they had just experienced a meeting with a remote friend whom they had not seen for a while, using a novel multisensory communication system. This procedure served to provide all participants with a similar and clear frame of mind about a possible setting in which the H-MSC-Q can be applied. After reading the introduction, the participants continued their evaluation of the H-MSC-Q by rating either content or face validity for each of its 15 constructs and associated items. Participants in the content validity study were 18 colleagues with expertise in different technologies across the reality-virtuality continuum (14 males, 4 females; mean age 38.4 years, ranging from 23 to 70 years). Participants in the face validity study were 21 students and colleagues of the authors (14 males, 7 females; mean age 26.4 years, ranging from 19 to 50 years).

Results and discussion

The ICC values for the content (N = 18) and face (N = 21) validity ratings returned by all participants were 0.88 [0.78, 0.95] and 0.76 [0.55, 0.89], respectively, indicating good agreement between the different raters.

The I-CVI and I-FVI values of all items in the H-MSC-Q exceed the critical level of 0.75. Hence, the underlying constructs of all items appear to be essential while their associated questions appear to be clearly formulated.

Three items (numbers 4, 11 and 14 in Table 3) obtained a minimal I-FVI value of 0.81. For item 4, four participants remarked that although they understood the construct, they found it hard to imagine how the representation of an environment could distract them from their conversation or otherwise affect their thinking. For item 14, two participants remarked that the distinction between the content (reasoning) and mode (behavior) of the communication could be more clearly formulated. For items 10 and 11, several participants remarked that the word “normal” should be replaced by “familiar”. This suggestion was probably inspired by the use case scenario presented in the introduction of the survey (a virtual meeting with a remote friend). We intentionally used the word “normal” in the H-MSC-Q to make it also applicable for the evaluation of mediated social communication between people who are not well acquainted (or even strangers).

The S-CVI value of the Spatial Presence scale (0.92) exceeds the critical level of 0.90, reflecting excellent content validity. The S-FVI value of this scale (0.89) exceeds the critical level of 0.80, indicating good face validity.

The content validity of Internal Perspective, External Perspective and Social Presence subscales is excellent (all S-CVI values exceed 0.90). The face validity of External Perspective and Social Presence subscales is also excellent, while the Internal Perspective subscale has a good face validity (S-FVI = 0.90).

To summarize, all items of the H-MSC-Q appear to be essential and clearly formulated. All subscales of the H-MSC-Q have excellent content validity, while their face validity ranges from good (Spatial Presence, Internal Perspective) to excellent (External Perspective, Social Presence).

The H-MSC multiscale method and an initial draft of the H-MSC-Q were presented as a poster at the EuroVR 2020 conference [117]. The final version of the H-MSC-Q presented here (see Table 3) evolved from this initial version after an iterative refinement process that involved several rounds of evaluations and discussions that served to improve the relevance and clarity of its questions. As a result, most items in the final version presented here are formulated (slightly) differently from those in the initial draft version. The main difference between both versions is the replacement of the term “natural” by “real” (item 3) or “realistic” (item 5), and the term “normal” by “real” (in item 4) in the spatial presence subscale, and the replacement of the term “natural” by “normal” (items 10, 11, 14 and 15) in the social presence subscale. Also, the term “engaged” in items 8 and 9 was replaced by “an emotional and intellectual connection”, since “engaged” refers to a mind-state or intrinsic motivation of the user and does not reflect the system’s capability to afford a true emotional connection.

Conclusions

There is a need for efficient, validated, and standardized measures that fully characterize the QoE of mediated social communication experiences provided by systems at any position along the reality-virtuality continuum, in a way that is independent of secondary factors like context, content and user personality factors. To this aim, we propose a new multiscale approach to the quality assessment of mediated social communication (H-MSC) and suggest an associated questionnaire (the H-MSC-Q). The approach is based on an established conceptual framework for multisensory perception developed by Schreuder, van Erp, Toet and Kallen [42]. Since the multiscale H-MSC approach is based on experiential fidelity, the associated measurements are largely independent of context, media content, and personal factors. It is also technology-independent and can therefore be applied to a wide range of multisensory (visual, auditory, haptic, and olfactory) communication systems along the reality-virtuality continuum. The approach agrees with the latest theoretical insights that perceived realism, plausibility and coherence are the central outcomes of the sensory, perceptual, and cognitive processing layers in the human brain that determine the quality of a mediated experience [2, 3, 27]. In contrast to existing questionnaires, the H-MSC-Q does not rely on ambiguously formulated presence items that have no clear relation to VR/AR/MR experiences [2, 126, 127]. The H-MSC-Q is complete and parsimonious, using only a single item to tap into each of the relevant processing levels in the human brain: sensory, emotional, cognitive, reasoning, and behavioral. It measures the quality of Spatial Presence (i.e., the perceived fidelity, internal and external plausibility, and cognitive, reasoning and behavioral affordances of an environment) and the experience of Social Presence (i.e., perceived mutual proximity, intimacy, credibility, reasoning and behavior of the communication partners).
Initial (Phase 1: [78]) validation studies confirm the content and face validity of the H-MSC-Q.

Limitations

Scale development consists of three phases [78]. In the first or item development phase, items are generated, and their content and face validity are assessed. In the second or scale development phase, items are pretested and exploratory factor analysis is used to reduce the number of items and establish the number of factors. In the third or scale evaluation phase, the dimensionality is tested with confirmatory factor analysis and the scale reliability and validity are assessed. The multiscale questionnaire proposed in this study has just passed its first stage of development and is therefore not yet fully validated. A full validation of this questionnaire will be a major and time-consuming effort that involves (1) repeated application of the questionnaire using the same systems in similar scenarios to assess its reliability, (2) application to different social communication systems to assess its sensitivity, (3) comparison with related questionnaires to assess its convergent validity, etc. By making the initial draft of our questionnaire available to the community we hope to further its validation and development in a joint effort and to stimulate the discussion on this topic.

In its current form, the multiscale H-MSC quality assessment approach and the associated H-MSC-Q only apply to social communication in (simulated) real-world settings. For certain thematic environments, such as those associated with science fiction or fantasy (that often involve fictional worlds and attribute superpowers to their users), several items in the questionnaire (e.g., external plausibility and agency) may need to be adapted.

To keep the questionnaire concise, high-level formulations were adopted for each of its items. Also, each of the individual constructs of the H-MSC-Q is measured by a single-item scale. Although it has been shown that single-item presence scales can be sensitive, valid, and reliable tools for measuring presence [128,129,130], additional subscales with items that for instance zoom in on each of the individual sensory modalities (visual, auditory, haptics, olfactory) will be required to analyze the different factors underlying the quality of experience at each of the processing levels in more detail. Such subscales may result from an analysis of the convergent validity of our proposed scale with existing scales.

The H-MSC-Q does not contain items explicitly addressing secondary (content, context or user dependent) factors like appeal of the environment, attention, involvement, engagement, enjoyment, personal relevance, personality, mood, tiredness, headache, eyestrain etc. The H-MSC-Q scales measuring the experienced quality of the sensory fidelity, internal plausibility and agency of the simulation implicitly address each of these issues. For instance, factors contributing to cybersickness are distortions in the mediated representation (e.g., low frame rate, jitter, delay), information mismatches across sensory streams, and conflicts between observed and expected sensory cues (particularly with respect to visual-vestibular cue conflict; [131]). A full validation, including the convergent validity of our proposed scale, can show how existing assessment tools for each of these secondary factors may become subitems of the more generally formulated scales of our holistic social presence questionnaire.

As is the case with any questionnaire-based assessment tool, demand characteristics (implicit and explicit cues that may communicate the aim of the experiment: [132]) may bias user responses. To minimize response bias due to demand characteristics, the H-MSC-Q should preferably be applied in naturalistic settings where people are minimally aware of being observed. The experimental procedure should be such that it stimulates a natural conversation between participants. A discussion about the characteristics of the system(s) that are to be judged should be avoided. The system(s) should be presented in a neutral manner (e.g., as an early prototype of an alternative communication mode); it should in no way be advertised as an improved, enhanced, modern, updated communication mode. Experimenters should show no involvement with the new system(s), so that participants have no need to please the observer. Preferably, multiple (versions of) systems are tested so that participants will not be biased toward one or the other system. Overall, we expect that the questionnaire is not very sensitive to demand characteristics, since it only involves rating the perceived intrinsic capability of a communication system to provide a compelling social communication experience, and the associated task (having a social interaction) does not require any specific behavior or performance on the part of the users.

The H-MSC-Q (Table 6) assesses the perceived quality of a mediated social presence experience through self-report or introspection. Although people are not able to directly observe their cognitive processes (metacognition: [133]), they are quite able to provide introspective reports on their conscious experiences and feelings [134]. Recent hierarchical Bayesian models of multisensory perception even suggest that human observers can introspect not only the final integrated (coherent) multisensory percept but also its constituting (unisensory) estimates and their causal relationships [135].

Table 6 The draft Holistic Mediated Social Communication Questionnaire (H-MSC-Q) as provided on the Open Science Framework (OSF) repository (osf.io/9qkhr)

Availability of the questionnaire

The draft Holistic Mediated Social Communication Questionnaire (H-MSC-Q) is publicly available (both in Microsoft Word and interactive PDF format) from the Open Science Framework (OSF) repository at osf.io/9qkhr with https://doi.org/10.17605/OSF.IO/9QKHR under the CC-By Attribution 4.0 International license. Use is only allowed after complying with the following two conditions: (1) a credit line in publications and presentations reading: “The Holistic Mediated Social Communication Questionnaire (H-MSC-Q) is available from the OSF repository at https://osf.io/9qkhr,” and (2) a citation to the current article in any publication in which the H-MSC-Q is used.