1 Introduction

Audiovisual technology has advanced drastically over the last decade. Spatial audio in conjunction with advanced visual technologies such as enhanced color reproduction and greater dynamic range is witnessing wide scale adoption for domestic audiovisual applications (e.g., gaming, entertainment, broadcast). In addition to the technological progress, the emergence of virtual reality (VR) and augmented reality (AR) is swiftly changing the paradigm for domestic audiovisual experiences.

The vocabulary for describing new audiovisual experiences unlocked by these technologies has evolved as well. Immersion has emerged as the predominant term for describing audiovisual experiences. Nevertheless, the concept of immersion is poorly understood. Immersion is studied in a variety of different fields such as film [52, 55, 68], video games [1, 8, 31, 53, 58], virtual reality [24, 41, 46, 51], and music [4, 14]. It is used to describe a large array of experiences, which contributes to the ambiguity surrounding the term. Immersion is often considered synonymous with presence and envelopment, which further dilutes the concept. A lack of definitional consensus and the interchangeable use with terms such as presence have reduced immersion to an “excessively vague, all inclusive concept” [39]. A formal definition of immersion is a prerequisite for communicating the idea effectively and conducting research on the topic. Thus, the first half of this chapter attempts to formalize the meaning of immersion by proposing a definition that has been synthesized from a non-exhaustive literature review of the subject. A wide perspective has been adopted for the proposed definition such that it can be easily adapted for different applications as well as interactive and non-interactive activities.

As technologists, we are interested in enabling experiences with a greater degree of immersion on the premise that more immersive experiences are preferable. This can be achieved by developing a deeper understanding of the various factors that influence immersion and subsequently harnessing their capabilities for delivering more immersive experiences. However, the fundamental challenge in investigating immersion is a lack of methodologies for measuring immersion. To this end, an exploratory study was conducted for quantifying immersion in audiovisual experiences as a first step. The experimental framework detailed in the latter half of this chapter can form the basis for developing experimental paradigms aimed at investigating the impact of immersion’s influencing factors.

2 Conceptualizations of Immersion

Immersion is a complex subject that can have a different meaning depending on the context and the field of study. While the origin of immersion’s conceptualization is unknown, it is agreed that it is a metaphorical term derived from the physical experience of being surrounded by a completely different medium. Murray [43] has provided the following description of immersion:

Immersion is a metaphorical term derived from the physical experience of being submerged in water. We seek the same feeling from a psychologically immersive experience that we do from a plunge in the ocean or swimming pool: the sensation of being surrounded by a completely other reality, as different as water is from air, that takes over all of our attention, our whole perceptual apparatus ([43], p. 99).

The analogy of “experience of swimming underwater” has been open to interpretation as some researchers have approached the topic from a physical perspective (i.e., being surrounded by a different reality) while others view it from a psychological viewpoint (i.e., similar to the metaphorical derivation described by Murray [43] where attention is a factor). The descriptions of immersion appearing in literature can be largely classified into two perspectives: immersion as a psychological experience and immersion as an objective property of the system or the technology that facilitates the experience. A brief introduction to these perspectives and a visual summary of the literature review in this chapter is provided by Fig. 11.1.

Fig. 11.1 Structure of the proposed literature review. Adapted from [3]

2.1 Psychological Perspective

The psychological perspective on immersion states that immersion is the psychological state of the individual when they are mentally involved in an activity [37]. It argues that attention is at the heart of immersion and de-emphasizes the role of the system or the technology that mediates the experience. Instead, significance is placed on the narrative and its presentation along with the individual participating in the experience. The idea of psychological immersion can be illustrated through the example of reading books. Books provide limited sensory stimulation to the reader in comparison to multisensory audiovisual experiences; nevertheless, the narrative content presented by books and its relevance to the reader can lead to a psychologically immersive experience.

The three recognized reasons that can lead to psychological immersion are the sense of being surrounded, absorption in the narrative or its depiction, and absorption when facing challenges. While these are often viewed as different types/dimensions of immersion, we believe that conclusive evidence is required to determine if the experiences they lead to are fundamentally different to warrant the classification of psychological immersion. An overview of the three reasons is presented in the following subsections.

2.1.1 Sense of Being Surrounded or Experiencing Multisensory Stimulation

Immersion is often viewed as a perceptual experience that is directly dependent on the capabilities of the rendering system. The sense of being surrounded or experiencing multisensory stimulation is a prevalent conceptualization of immersion. Biocca and Delaney [7] dubbed this perceptual immersion: the extent of submersion of the user’s perceptual system in the environment. It is believed that perceptual immersion can be measured objectively by “counting the number of the user’s senses that are provided with input and the degree to which inputs from the physical environment are shut out” [36]. McMahan [39] stated that perceptual immersion can be achieved by blocking the external world and constraining the user’s perception to the presented stimulus.

The role of sensory information in immersive gaming experiences was recognized by Ermi and Mäyrä [16] for the development of a gameplay experience model (sensory, challenge-based, and imaginative immersion model or SCI model). The authors called it sensory immersion: an overpowering of the sensory information from the real environment through large screens and powerful sounds to focus the user entirely on the stimulus. In their study on presence, Witmer and Singer [70] made the distinction between immersion and involvement such that the former is the subjective experience of being enveloped in an interactive environment and the latter is a psychological state which results from directing attention to the stimulus.

It may appear that what many researchers call perceptual or sensory immersion is a completely different perspective on immersion compared to psychological immersion. However, it is better understood as a facilitator of psychological immersion: overpowering sensory information or blocking the stimuli from the immediate environment does not guarantee psychological immersion, but it can prevent “an exogenous shift of attention” [45] away from the activity and thereby help lead to psychological immersion. The current attempts to create supposedly immersive audiovisual experiences are based on this idea of eliciting psychological immersion. It is assumed that augmenting the sensory information (e.g., in spatial audio reproduction) and/or attempting to reduce the inputs from the physical environment (e.g., virtual reality) will lead to the users focusing on the stimulus and experiencing psychological immersion.

2.1.2 Absorption in the Narrative or its Depiction

The role of the narrative is considered to be an important dimension of the immersive experience. Mental absorption in the story or the mediated world is the definition of being immersed on a diegetic level. Adam and Rollings [1] called it narrative immersion: “the feeling of being inside a story, completely involved and accepting the world and events of the story as real.” A similar description has been provided by Thon [45]: “narrative immersion refers to the player’s shift of attention to the unfolding of the story of the game and the characters therein as well as to the construction of a situation model representing not only the various characters and narrative events, but also the fictional game world as a whole.” The idea of narrative immersion has been echoed in the context of video games under imaginative immersion [16] and as fictional immersion [5] for all narrative forms.

It has been suggested that an exciting story and interesting characters are prerequisites for experiencing narrative immersion [1]. Ryan [57] classified the causes that lead to narrative immersion as temporal, spatial, and emotional immersion. Temporal immersion is experienced when one is curious to know how the story unfolds. Spatial immersion refers to the experience of having a sense of space and enjoyment in exploration. Lastly, emotional immersion occurs when one is emotionally invested in the story and/or emotionally attached to the characters. It can also be observed when the narrative elements remind the individual of emotionally relevant instances or characters.

2.1.3 Absorption When Facing Challenges

The idea of being absorbed when facing challenges stems from the work conducted on immersion in gaming experiences. Absorption in the activity due to challenges occurs when a balance is achieved between ability and the perceived challenge [16]. These challenges can be mental challenges or sensorimotor challenges. Ermi and Mäyrä [16] believed that the challenges encountered will often be a combination of mental and sensorimotor challenges to a certain extent. Thus, the individuals must have attentional surplus to face the challenges simultaneously or the overlap between the challenges must be brief to avoid attentional overload [44]. The nature of the challenge was used by Adam and Rollings [1] to divide challenge-based immersion into strategic and tactical immersion. Strategic immersion is experienced when one is preoccupied with strategizing and making choices mentally to conquer the task at hand. Tactical immersion refers to the state of mental absorption when one is fully concentrated on an activity that has a stream of demands for swift tactile movements (e.g., when playing action-packed video games).

Challenge in the current view refers to active hurdles encountered in participatory activities. Arsenault [5] argued that challenges are not required to experience immersion and suggested substituting challenge-based immersion with systematic immersion: immersion in the activity where one accepts the mechanics (e.g., rules, physical movement, etc.) of the mediated experience instead of the mechanics from the unmediated reality. The idea of systematic immersion can be applied to non-participatory activities such as a screening of a fictional film where one readily accepts the existence of magic and flying sea mammals, for instance.

2.2 Physical Perspective

A substantial portion of the work on immersion has been performed in the context of media consumption for interactive applications (e.g., video games, virtual environments). This has supposedly led to the notion of immersion being an objective property of the system or the technology that facilitates the experience. In Slater and Wilbur’s [60] words, “Immersion is a description of a technology, and describes the extent to which the computer displays are capable of delivering an inclusive, extensive, surrounding and vivid illusion of reality to the senses of a human participant.” In this regard, immersion is seen as the capability of the system/technology to support the different modalities, deliver sensory information, and provide interaction capabilities. Slater rejects the idea of immersion being a subjective experience. Instead, he views immersion as an objective property of the system that consists of reproduction fidelity of the different modalities, isolation from the physical world, and behavioral fidelity among others [62]. According to him, these properties of the “immersive system” can lead to the different subjective experiences of place illusion and plausibility illusion [62].

It is important to state that approaching immersion as an objective property fails to consider the perceptual limits, context, and individual factors such as mood, preference, expertise, and expectations. It has been established that an improvement in the technical specifications of the system does not necessarily lead to a proportional perceptual change (as evidenced by non-linear psychophysical curves). Limiting immersion to the physical domain removes the sensory and cognitive filters that play an active part in determining the overall experience. It has been appropriately suggested that the term system immersion [61] should be used when referring to this perspective on immersion.

The ideas of immersion being an objective property and perceptual or sensory immersion are closely related. An improvement in the technical specifications of the system such as an increase in the number of loudspeakers can increase the sensory information leading to psychological immersion as explained in Sect. 11.2.1.1. This can give the impression that it is the system or the sensory information that leads to psychological immersion. While the system is a factor that can influence immersion, it is not the only factor as the physical perspective suggests.

3 Immersion: A Cognitive Concept

It is clear from the preceding section that we must organize the usage of the term immersion to communicate the intended ideas and conduct research on the subject. We use the filter model [6, 33] to differentiate and categorize the ideas conveyed by the common term. The model (depicted in Fig. 11.2) has been used for sensory analysis in food science, sound and image quality evaluations, and to study the spatial characteristics of sound among others [6].

Fig. 11.2 Filter model and the suggested terms for referring to the perspectives on immersion in the different domains

The model starts with the physical domain which houses a physical stimulus (e.g., a music signal played by a loudspeaker). The stimulus is characterized by the physical measurements of the audio frequency content, spatial audio channels, etc. The stimulus is perceived after passing through the sensory filter, where it is transformed by the sensory system (e.g., auditory system) to neural energy. The result is an auditory event which comprises attributes of sound (e.g., loudness, envelopment). The elicitation of the attributes and their strength depends upon the characteristics of the physical stimulus and the sensory system. The auditory event can be evaluated by perceptual measurements in the perceptual domain. Finally, to form an overall impression of the auditory event, the perception passes through the cognitive filter which accounts for emotional state, expertise, expectations, mood, context, etc. The cognitive factors and the individual attributes from the perceptual domain contribute to the overall impression, which requires an integrative frame of mind. The resulting affective or hedonic measurements include assessment of quality, degree of liking, annoyance, acceptance, etc.

The filter model is simple yet powerful as it allows us to evaluate the influence of physical parameters of the system and signal on affective measurements by linking the two domains through the perceptual domain. The primary ideas conveyed by the use of the term immersion can be categorized using the filter model. First, we have Slater’s idea of immersion as being an objective property of the technology/system. Slater [62] has stated: “Let’s reserve the term ‘immersion’ to stand simply for what the technology delivers from an objective point of view. The more that a system delivers displays (in all sensory modalities) and tracking that preserves fidelity in relation to their equivalent real-world sensory modalities, the more that it is ‘immersive’.” Slater [61] has suggested the term system immersion to denote his understanding of immersion, which is in the physical domain. Second, immersion is used to refer to the sense of being surrounded by a stimulus (see Sect. 11.2.1.1). This is the perception of the stimulus and thus exists in the perceptual domain. We recommend that the term perceptual immersion should be used when referring to the feeling of being surrounded. Surrounding the user with a stimulus is often done in the hope of eliciting psychological immersion, since both perceptual attributes and cognitive factors contribute to affective measurements as explained in the preceding paragraph. Finally, the idea of psychological immersion (see Sect. 11.2.1) or involvement/absorption in the activity can be explained in the cognitive domain. The user (their personal characteristics) plays an important role in the experience of psychological immersion but the perceptual attributes (e.g., envelopment, naturalness) can influence psychological immersion.

Our motivation for studying immersion is to identify the influencing factors so that they may be tuned to enhance experiences. The role of the individual is of utmost importance since experiences are, by their very nature, subjective. Thus, it is important to consider the holistic experience instead of focusing on individual parts that contribute to the experience. Assessment of audiovisual experiences has been historically driven from a bottom-up approach beginning from the technical specifications of the system that facilitates the experience. However, improvements in the technical capabilities of the system may not always lead to a perceptual difference (e.g., when the improvements are smaller than the just noticeable difference or beyond the thresholds of the human sensory system), rendering them insignificant for the goal of improving experiences. Therefore, we advocate a top-down approach where the idea is first studied holistically (in the cognitive domain) and then empirical relationships are forged to the technical parameters of the system (physical domain).

We view immersion from a psychological perspective (similar to Sect. 11.2.1). For the remainder of this chapter, our usage of the term immersion refers to psychological immersion unless noted otherwise. Synthesizing from the descriptions of immersion appearing in literature, we propose the following definition of immersion that can be applied to a wide range of applications:

Immersion is a phenomenon experienced by an individual when they are in a state of deep mental involvement in which their cognitive processes (with or without sensory stimulation) cause a shift in their attentional state such that one may experience disassociation from the awareness of the physical world.

We consider immersion to be a normal occurrence of focused attention (on the activity) during waking consciousness. During immersion, the mind is absorbed in the current motivated activity and conscious attention is focused on the features of the situation that are related to the achievement of the intended goal. Still, during most normal circumstances the mind can easily be disturbed by extrinsic factors (e.g., noise in the environment), intrinsic dynamic tendencies (e.g., unfinished tasks or obligations), and random noise. Unlike hallucinations and dreaming during sleep states, the mind is still attentive or watchful (to some degree) to the occurrences in the world and monitors the present state of the body when immersed in a construction built by intrinsic factors. When something of significance for the maintenance of the subject’s life and well-being occurs, the resulting perturbations can usually destabilize the current state rather easily, change the focus of attention, and propel the mind into another and more stable attractor of orientation and search for the nature of the disturbance. For detailed discussions of consciousness, the reader is referred to [15, 18, 20, 38, 50].

Involvement in the current view necessitates an interaction between the subject and the system not only in a physical sense (the completion of a series of actions and operations upon the system) but also in a psychological sense (the interaction between the subject’s motives for the interaction with the system and the system’s objective capabilities for the pursuit of the subject’s motives). Based on the proposed definition, immersion is a mental state which is why sensory stimulation is not required to experience immersion (e.g., daydreaming can be an immersive experience).

It is imperative to consider all sensory modalities for determining immersion since the presented stimuli may stimulate only a few senses but we continue to receive input from all the senses that can influence immersion. Therefore, all factors that can either facilitate or disrupt immersion must be considered. It is unreasonable to merely examine the stimulus or the system to determine immersion. While the system and the content can affect immersion, they are not immersive independent of the human subject. The idea of immersive potential can add clarity to the above explanation.

  • Immersive potential: The potential of a system or content to elicit immersion.

For a given piece of content presented by a system which does not change, the immersive potential remains constant. It does not simply increase with the betterment of the system’s technical specifications. Instead, it depends on the ability of the system and content to elicit immersion. The immersive potential is bounded by the human perceptual limits, and changes to a system must lead to a discernible perceptual change to alter its immersive potential.
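The point that a change in specifications matters only when it is perceptible can be illustrated with a minimal sketch. It assumes, purely for illustration, that the just noticeable difference for a system parameter is a fixed fraction of its current value (a Weber-type rule). The function name is_perceptible and the 10% fraction are hypothetical and are not taken from this chapter.

```python
def is_perceptible(old_value: float, new_value: float, jnd_fraction: float) -> bool:
    """Return True if the change from old_value to new_value exceeds the assumed JND.

    Assumption (not from the chapter): the just noticeable difference is
    jnd_fraction * old_value, i.e., a Weber-type proportional threshold.
    """
    if old_value <= 0 or jnd_fraction <= 0:
        raise ValueError("old_value and jnd_fraction must be positive")
    return abs(new_value - old_value) > jnd_fraction * old_value


# Hypothetical example values: a 10% JND on some system parameter.
small_upgrade = is_perceptible(100.0, 105.0, jnd_fraction=0.10)  # below the assumed JND
large_upgrade = is_perceptible(100.0, 120.0, jnd_fraction=0.10)  # above the assumed JND
```

Under these assumptions, only the second change would be a candidate for altering immersive potential. Whether it actually does so still depends on the content, the individual, and the context.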

In addition to the system and the content, immersion also depends on the state of the individual at the moment in time as well as their immersive tendency.

  • Immersive tendency [70]: An individual’s predisposition to experience immersion.

The immersive tendency can be determined with the help of questionnaires [69, 70] to learn if certain individuals can get immersed relatively easily compared to others. It can be assumed to stay constant over the course of an experiment which is conducted within a short duration of time.

The five factors that can influence immersion are (1) the system (physical properties of the reproduction system and the content), (2) narrative (content), (3) environment (physical environment around the individual and the contextual conditions), (4) individual factors (affective states, mood, preference, skills, previous knowledge, expertise, goals, motivation, etc.), and (5) interaction between the individual and the experience (significance of the content to the individual, acceptance of the task, alignment of goal and motivation). These are similar to those which affect the quality of experience (QoE) [54] since immersion is an experience that is dependent on an individual’s cognitive state and preference for the content. Nonetheless, there is a noteworthy distinction between the concept of QoE and immersive experiences. This is explained in the following subsection.

3.1 Quality of Experience (QoE) and Immersion

The concept of quality of experience (QoE) was introduced in the field of telecommunication and multimedia services. It is the successor to quality of service experienced (QoSE) which is the successor to quality of service (QoS). The progression from QoS to QoE has shifted the approach to quality from technology-centric to user-centric. It is important to note that this shift is consistent with the widespread acknowledgment that only the end users are capable of judging quality [49]. Although several definitions of QoE are in use, the following definition by Raake and Egger [49] (based on the definition proposed in the Qualinet white paper [47]) provides a complete and functional description of the concept:

Quality of experience (QoE) is the degree of delight or annoyance of a person whose experience involves an application, service, or system. It results from the person’s evaluation of the fulfillment of his or her expectations and needs with respect to the utility and/or enjoyment in the light of the person’s context, personality, and current state.

The act of experiencing does not constitute quality judgement [49]. Evaluating quality requires cognitive processes in addition to those engaged during the act of experiencing [49]. Please refer to [30, 48] for additional information on the quality formation process. QoE is a two-step process comprising experiencing and forming a quality judgement. This is a major point of distinction between immersion and QoE. Immersion is the state of being mentally absorbed in an experience whereas QoE is the evaluation of quality for any experience, immersive or not.

An immersive experience is an experience where immersion is elicited. The quality of such an experience may be determined by methodologies inspired by QoE evaluations. Thus, we place immersion on a level below QoE in the hierarchy. Immersion may be a factor that can influence QoE but the scientific evidence is yet to emerge.

4 Differentiating Immersion from Interchangeably Used Terms

The preceding section presented a detailed explanation of immersion that is synthesized from a non-exhaustive literature review. To establish the terminology and add clarity to the concept of immersion, a brief review of interchangeably used terms is presented in the following subsections and the ideas are differentiated from immersion.

4.1 Presence

Presence has been an important research topic for technology-mediated experiences. Initially, presence referred to the experience of perceiving the physical environment and did not entail the use of technology [64]. However, presence is used in a much broader sense today. It is generally understood as “a psychological state or subjective perception in which even though part or all of an individual’s current experience is generated by and/or filtered through human-made technology, part or all of the individual’s perception fails to accurately acknowledge the role of the technology in the experience” [17]. This definition refers to what is known as physical presence (also called place illusion). Presence is also classified as social presence (the experience of being together with others) and co-presence (being together in the same physical space). The discussion here is limited to physical presence since it is the one that is often confused with immersion.

Place illusion (physical presence) and plausibility illusion are required for realistic behaviors in virtual environments [63]. Place illusion is a technology-mediated illusion where the user has the feeling of being in a real space which is not the actual physical space they are in. Slater [62] views place illusion as a subjective response to system immersion. He explains that “if immersion [system immersion] is analogous to wavelength distribution in the description of color then ‘presence’ is analogous to the perception of color.” In this sense, presence is a perceptual attribute that is directly influenced by the properties of the system. To extend this in the context of the filter model, liking or the quality of presence would represent the overall impression of the experience in the cognitive domain.

When explaining why people often report the sense of “being there” when engaging with systems possessing low system immersion, Slater hypothesized that the reported presence experiences were qualitatively different from those encountered due to objectively better systems (higher system immersion) [44]. He [63] asserted that presence due to superior systems is caused by exposure to the sensory stimuli, while presence experienced due to relatively inferior systems requires focused attention and deliberate learning. Slater [63] goes on to state that “[the feeling of presence due to low system immersion] it is not simply a function of how the perceptual system normally works, but is something that essentially needs to be learned, and may be regarded as more complex.” This explanation is at odds with the psychophysics-based description he has provided using the analogy to color perception. Although it has been argued that cognition plays a role in determining presence [59], the sensory information delivered by the system is paramount [62]. Please refer to [44] for an overview of presence theories.

At this stage, it is important to distinguish between place illusion and our definition of immersion which was presented in Sect. 11.3. Foremost, immersion is mental absorption in the activity whereas presence is the feeling of being in an unmediated environment even when the contrary is true. It follows that immersion resides in the cognitive domain whereas descriptions of presence suggest that it is a perceptual attribute. Secondly, presence requires technologically mediated experiences whereas immersion can be experienced even without sensory stimulation from the system.

We follow Jennett et al.’s [31] notion that the two concepts are independent and a double dissociation exists between immersion and presence. For participatory activities, immersion can be experienced when playing abstract games such as Pac-Man on a mobile phone but it is unlikely that the user will feel that they are present in the game environment. Similarly, a high fidelity audiovisual reproduction of an uninteresting movie in virtual reality can deliver the illusion of being in an alternate environment but will fail to deliver an immersive experience. Nonetheless, it is important to note that presence and immersion can coincide as is often the case for engaging virtual reality experiences, for example.

4.2 Flow

The concept of flow was developed in the 1960s through a series of studies conducted to understand why people pursue arduous and often dangerous activities in the absence of discernible extrinsic rewards [12]. Multiple definitions and descriptions of flow have been presented including, “the holistic sensation that people feel when they act with total involvement” [10]; “a subjective state that people report when they are completely involved in something to the point of forgetting time, fatigue, and everything else but the activity itself” [12]; and “the state in which people are so involved in an activity that nothing else seems to matter; the experience itself is so enjoyable that people will do it even at great cost, for the sheer sake of doing it” [11]. Csikszentmihalyi [12] identified eight components of flow: clear goals, direct and immediate feedback, altered sense of time, loss of self-consciousness, concentration, balance between ability and challenge, sense of control, and escape from everyday life. However, researchers have not yet established the conditions that must be fulfilled for an experience to qualify as flow [66].

There is an evident overlap between flow and immersion, but the two are not synonymous. Immersion is a graded experience [2, 8] whereas flow is an “all-or-nothing” experience [9]. Flow is an optimal experience that is always enjoyable whereas enjoyment is not mandatory for immersion, i.e., an individual can experience negative emotions when immersed but it will not qualify as a state of flow since it is not pleasant. Additionally, the concept of flow is limited to interactive activities because flow components such as clear goals and immediate feedback are not applicable to passive activities. It has been argued that immersion is a precursor to flow, but flow is not simply the highest degree of immersion [31]. For instance, a passive, unpleasant experience can be highly immersive but will fail to qualify as flow due to a lack of enjoyment and the interactive components that constitute flow.

4.3 Envelopment

Envelopment is an important topic in concert hall acoustics and spatial audio. It is classified as listener envelopment (sense of being surrounded by the reverberant sound field) [56] and source-related envelopment (envelopment by sounds placed around the listener) [19]. It is clear from the literature [33] that envelopment is strictly a perceptual attribute. However, it continues to be confused with immersion. Two reasons can explain the interchangeable usage: (1) the common analogy of the “feeling of swimming underwater” is used to illustrate immersion as well as envelopment, and (2) approaching immersion as perceptual immersion (see Sect. 11.2.1.1) makes the two synonymous.

The predominant difference between envelopment and immersion is that the former is perceptual while the latter is affective since it is an integrative measure that accounts for cognitive factors. A double dissociation exists between immersion and envelopment. For example, monophonic reproduction in a non-reverberant environment will not elicit the feeling of envelopment but it can be immersive. Conversely, an accurate reproduction of a soundscape is unlikely to be immersive due to the lack of an engaging narrative but will be reported to be highly enveloping. Nevertheless, it should be noted that envelopment and immersion can coexist. Further, envelopment can lead to immersion in an experience since the sense of being surrounded is one of the reasons that can lead to psychological immersion (see Fig. 11.1).

5 Subjective Assessment of Immersion: An Exploratory Study

Quantification of immersion is the immediate step following the theoretical conceptualization of the topic. Nevertheless, a lack of established experimental paradigms for the assessment of immersion is the greatest challenge in developing our understanding of the topic. A lack of consensus on the idea of immersion, the fragile nature of immersive experiences [43], and limited information about the factors and their influence on immersion add to the complexity of quantifying immersion. Methodologies for assessing immersion can be classified as subjective and objective measures (see Footnote 9). An outline of these is presented below.

5.1 Subjective and Objective Measures

Subjective measurement paradigms ask the participants to reflect on their experience and form a conscious judgement. Questionnaires, focus groups, think-aloud paradigms, and interviews are examples of subjective measures. These are conducted post-experience in order to avoid infringing on the experience. Thus, they are less susceptible to emotional and physiological idiosyncrasies. Subjective measures are attractive as they are non-invasive and easy to interpret for the participants. They allow researchers to explore multiple facets of immersion (e.g., emotions, mental and physical awareness, liking, etc.) as each area of interest can be covered by separate items on a questionnaire or by questions in an interview. Moreover, subjective measures are excellent for determining individual differences as the responses can be directly compared and analyzed.

The simplicity and effectiveness of subjective measures are appealing but the drawbacks must be considered to select suitable experimental paradigms. Foremost, the post-experience nature of these measures can lead to inaccurate recall and recency effects. These can be particularly problematic when longer stimuli are used for evaluations. The retrospective recall also restricts the evaluation of temporal variations in immersion. Finally, for subjective measures that are based on a set of predefined questions (e.g., questionnaires), there is a risk of failing to capture the aspects of the immersive experience that are beyond the scope of the listed items.

In contrast to subjective measures, objective measures attempt to record the user’s response without requiring conscious evaluation and correlate those responses to immersion and/or its attributes. Behavioral and physiological measures are the two types of objective methods used for assessing immersion. The former includes measures such as secondary task reaction time (STRT; see Footnote 10) while the latter involves the use of biological sensors to measure the physical response to the stimulus (e.g., electroencephalography, eye tracking, and electrodermal activity). These methods do not allow for the direct measurement of immersion. Instead, the recorded response is correlated to immersion or to its suspected attributes.

The objective and non-intrusive aspect of objective measures yields an accurate, time-variant measurement of the concept under evaluation. Since, unlike subjective measures, the deliberate judgement formation process is eliminated, the measurements are not influenced by the various biases associated with subjective evaluations. The single most important criticism of objective measures is the lack of established relationship(s) between immersion and what is measured. Hence, there is a risk of measuring an idea that may not be related to immersion or is related differently than assumed. In addition to the lack of a one-to-one mapping, physiological signals can be highly sensitive, require specialized equipment in controlled environments, and may need extensive data analysis procedures.

5.2 Research Questions

An experiment was conducted to develop and test a suitable methodology for assessing immersion as a necessary first step (see Footnote 11). Answers to the following research questions were sought in the study:

RQ1: How can immersion in an audiovisual experience be quantified through subjective testing?

RQ2: Is immersion a binary (all-or-nothing) or graded experience?

RQ3: What is the influence of immersive tendency on immersion ratings?

5.3 Experimental Strategy and Design

Subjective and objective tests each have their advantages and disadvantages as discussed in Sect. 11.5.1. The fundamental issue with physiological measures is the lack of established links between what is measured and immersion. Thus, one cannot be certain if what is being measured is immersion or is related to immersion in a quantifiable way. Experimental designs that incorporate behavioral responses such as STRT are potential alternatives but have failed to yield conclusive results. These limitations of objective measures restrict us to the subjective assessment of immersion [40].

Subjective assessment of immersion has been predominantly conducted using questionnaires. However, since the questionnaires often have in excess of 25 items, administering them for each experience adds to the experimental time and the workload for the participants. Further, questionnaires fail to capture the unexpected aspects of immersion or those that are unaccounted for in the set of questions [71]. Jennett et al. [31] compared the results from a questionnaire to that of a single question on the immersion experienced by the participants. Their experiments revealed that “people can reliably reflect on their own immersion in a single question” when grading immersion on a categorical 10-point scale. This is an important finding as it implies that immersion experiments may be conducted as rating experiments. Since rating experiments are the norm for audiovisual assessment and participants are familiar with the general paradigm, it was decided to conduct the experiment as a rating experiment.

Before the experimental design could be developed, it was necessary to outline the theoretical implications on the experimental paradigm. First, the participants cannot be permitted to switch between stimuli for making comparative judgments as it will destroy the state of immersion [3]. Similarly, the evaluations must be made post-experience. Second, it is hypothesized that individuals require time to return to their base or initial state after an immersive experience. Distractor tasks can be incorporated to shift attention away from the experience between consecutive presentations. Third, the experiment must be completed in a single session since participants experience fatigue faster in non-participatory tasks [29] and time can alter individual factors such as mood. Finally, each participant should be limited to one instance of any stimulus due to limited information regarding the effect of repetition on immersion.

With the implications in mind, a pilot test was conducted as a randomized complete block design to aid with the selection of stimuli and to test the protocol. Six participants each graded the same set of 5 stimuli. The pilot test results suggested that the session should be limited to 75–80 min in order to avoid participant fatigue. Since a large number of stimuli had to be tested (particularly for RQ2) and repetitions were prohibited, a balanced incomplete block (BIB) design was determined to be the most appropriate choice for the main study. A major drawback of a simple BIB design is that as the number of stimuli to evaluate increases, the number of participants required increases drastically (provided the number of evaluations each participant performs does not change). Thus, precision had to be traded to reduce the number of participants required for the experiment [35].

The simple BIB design was reduced to a BIB design with 21 blocks (participants) and 15 treatments (stimuli). Every participant evaluated a subset of 5 stimuli from the set of 15. The stimuli were allocated such that each pair of stimuli (e.g., A and F) would only appear together in two blocks (i.e., only two participants would get both A and F). In total, there were 7 instances of each of the 15 stimuli that yielded 105 total observations as 21 participants graded 5 stimuli each. The allocation of the stimuli to the different blocks is shown in Table 11.1.
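The counts quoted above follow from the two standard identities that any balanced incomplete block design satisfies, \(bk = vr\) and \(\lambda (v-1) = r(k-1)\), where \(v\) is the number of treatments, \(b\) the number of blocks, \(k\) the block size, \(r\) the number of replications per treatment, and \(\lambda \) the number of blocks in which each pair of treatments appears together. The Python sketch below is illustrative only and is not the procedure used to construct the allocation in the study. The function names `bib_parameters` and `is_balanced` are hypothetical helpers introduced here, and the seven-block design is a textbook example (the Fano plane), not the study's allocation.

```python
from collections import Counter
from itertools import combinations


def bib_parameters(v, b, k):
    """Return (r, lam) implied by v treatments, b blocks and block size k.

    Uses b*k = v*r and lam*(v-1) = r*(k-1); raises if either is not an integer,
    in which case no balanced incomplete block design with these counts exists.
    """
    if (b * k) % v != 0:
        raise ValueError("replication count r is not an integer")
    r = b * k // v
    if (r * (k - 1)) % (v - 1) != 0:
        raise ValueError("pair concurrence lambda is not an integer")
    lam = r * (k - 1) // (v - 1)
    return r, lam


def is_balanced(blocks, v, k):
    """Check block size, equal replication and equal pair concurrence."""
    if any(len(set(block)) != k for block in blocks):
        return False
    reps = Counter(t for block in blocks for t in block)
    pairs = Counter(
        frozenset(pair)
        for block in blocks
        for pair in combinations(sorted(block), 2)
    )
    every_treatment_used = len(reps) == v
    every_pair_used = len(pairs) == v * (v - 1) // 2
    return (
        every_treatment_used
        and every_pair_used
        and len(set(reps.values())) == 1
        and len(set(pairs.values())) == 1
    )


# Counts reported for the main study: 15 stimuli, 21 participants, 5 stimuli each.
r, lam = bib_parameters(v=15, b=21, k=5)
print(r, lam, 21 * 5)

# Textbook example (Fano plane) used only to exercise the checker.
fano = [(1, 2, 3), (1, 4, 5), (1, 6, 7), (2, 4, 6), (2, 5, 7), (3, 4, 7), (3, 5, 6)]
print(is_balanced(fano, v=7, k=3))
```

With the study's counts this gives \(r = 7\) and \(\lambda = 2\), matching the 7 instances per stimulus, the two shared blocks per pair of stimuli, and the 105 total observations. A checker such as `is_balanced` could be applied to any candidate allocation before running a session.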

Table 11.1 Allocation of stimuli to the experimental blocks for the balanced incomplete block (BIB) design used in the study. Reproduced from [2]

5.4 Methods

5.4.1 Program Material

There are various implications for selecting the program material for assessing immersion (see [3]). Foremost, the relevance of the program material to the participant plays a role in determining immersion and can vary among participants. Thus, it should not be assumed that any given stimulus can immerse all participants. Additionally, since knowledge and expectations may change with every trial of a stimulus, an assessor may not experience immersion during repeated presentations of the same stimulus.

It is important to select audiovisual excerpts with lengths sufficient to elicit immersion. It has been recommended that stimuli that are at least 10 min long must be used [29], but there is limited information regarding the temporal nature of immersion. The recommendation is focused on participatory activities, and we suspect that the length of the stimulus can be lower and is dependent on the narrative. Thus, excerpts ranging from 4 to 12 min were selected for this study.

Given the lack of knowledge regarding the effect of familiarity on immersion, it is suggested that content that is unfamiliar and that does not require additional background information must be selected. However, this stipulation limits the amount of content that can be selected. Therefore, it was decided to provide the participants with a short synopsis (1–2 sentences) regarding the narrative before each presentation. These were constructed only to include any relevant information required to make sense of the story and did not disclose any additional information.

Finally, to select the technical specifications of the excerpts, an informal survey of the domestic media landscape suggested that ultra high-definition (UHD), high dynamic range (HDR) visuals and spatial audio are emerging for domestic consumption. These are being incorporated by broadcasters, streaming platforms, and movie studios alike. Thus, it was decided to use a 7.1.4 audio rendering system coupled with a UHD HDR-enabled screen. The 7.1.4 audio reproduction system was chosen as it was found to be the most common spatial audio reproduction system beyond traditional surround sound for domestic applications.

Audiovisual excerpts of different lengths and narratives that can elicit spatial, emotional, and temporal immersion were chosen for the experiment. An active effort was made to select stimuli that were distributed across the immersion scale as the results are directly dependent on the stimuli. The selection was made based on the pilot experiment and comments received from the pilot test participants because the technical specifications could not be used to choose the excerpts. A list of the excerpts and the genres is presented in Table 11.2.

The fundamental challenge with selecting stimuli that have UHD HDR visuals coupled with spatial audio was a lack of freely available content. Hence, commercially available content with Dolby or DTS audio had to be used for this experiment. Fifteen audiovisual excerpts that fulfilled the above-stated conditions were selected. The resolution, native aspect ratio, and chroma sub-sampling were not changed for reproduction. The audiovisual signals were not processed at any stage.

Table 11.2 Audiovisual excerpts used in the experiment. Reproduced from [2]

5.4.2 Reproduction Setup

The audiovisual excerpts were presented directly from the Blu-ray player to every participant due to legal limitations. An HDCP compliant video switcher and the Genelec loudspeaker manager (GLM) were used to control the video and the audio respectively. The complete audiovisual signal chain is depicted in Fig. 11.3.

A 7.1.4 audio rendering system was used for audio reproduction. The audio was decoded by the Marantz AV7704 and the decoded channels were mapped to the corresponding loudspeaker channels. A phantom reproduction of the center audio channel was used since it was not feasible to have a physical loudspeaker due to the screen. The Genelec loudspeakers were distributed on a hemisphere with a 2 m radius around the listening position. The placement of the loudspeakers was in accordance with Dolby guidelines [13].

The loudspeakers were level calibrated and time aligned with respect to each other. To achieve approximately equal loudness among the stimuli, the level was varied such that the audio was equally loud at the listening position as determined by ear. All excerpts were auditioned by two experienced listeners to ensure that the stimuli were at comfortable loudness and that the audio was intelligible during the quieter segments.

Fig. 11.3 Audiovisual reproduction signal flow. The different line types refer to: HDMI 2.0 and HDCP 2.2 connection (―), analog audio feed over XLR (- - - -), remote loudspeaker control over Ethernet (. . . . . . . ), and HDMI 1.4 connection (•)

A 65-inch LG C9 OLED screen was used to reproduce the visuals. The screen was centered with respect to the participants to obtain a zero degree viewing angle horizontally and vertically. It was placed at a distance of 2 m (same as the loudspeakers) following the design viewing distance in Recommendation ITU-R BT.2022 [26]. To balance the judder of the 24p video signal while exploiting the high dynamic range (HDR) capabilities of the screen, the screen brightness was lowered to nearly 120 nits and the environmental illuminance was kept below 10 lux. The screen settings were tuned by two experienced viewers in part to get the chromaticity coordinates closer to the D65 value [27] and in part based on experience.

The audiovisual reproduction took place in an IEC 60268-13 [25] compliant listening room. All equipment except the screen and the loudspeakers was placed outside the room. The loudspeakers were hidden behind acoustically transparent curtains to limit visual influence.

5.4.3 Distractor Tasks

It was hypothesized that some time is required to return to the initial or base psychological state after an engaging experience. Presentation of stimuli in quick succession may lead to the preceding stimulus biasing the result of the following stimulus. In the absence of formal guidelines and conclusive evidence regarding the gap in time between presentation of stimuli, distractor tasks were incorporated in an attempt to shift attention away from the preceding stimulus by requiring active participation.

An 11-piece LEGO® unicorn puzzle (the only instruction was to create a unicorn), an image for free interpretation (see Footnote 12), a matchstick rearrangement puzzle, and a memory puzzle were the four distractor tasks. One task was chosen at random to be completed within four minutes between each successive presentation. The assessors were made aware of the correct solution for the matchstick and memory puzzle tasks before proceeding (Fig. 11.4).

Fig. 11.4 The four distractor tasks: (a) matchstick rearrangement puzzle, (b) memory task (7 \(\times \) 6 tiles), (c) LEGO® puzzle, (d) image for free interpretation (obtained from the New York Times). All images reproduced from [2]

5.4.4 Immersive Tendency Questionnaire

Questionnaires are the primary tool for gauging immersive tendency. A reduced version of Witmer and Singer’s [70] widely used [23, 32, 34, 42] immersive tendency questionnaire (ITQ) was used for this study (see Table 11.3 for questionnaire items). A few modifications were made to the existing questionnaire. The seven-point categorical scale was substituted by the graphic line scale (also used for rating immersion) to obtain continuous data; the middle verbal anchor from the categorical scale was dropped as it has been shown that scores can cluster around the verbal anchor [72]; and the terminal verbal anchors were modified to be perfect antonyms (a similar modification was made in [23]). The order of questions was randomized for the participants. All assessors answered the ITQ.

Table 11.3 Witmer and Singer’s [70] Immersive Tendency Questionnaire (ITQ). The items in the reduced version of the questionnaire and the corrected item-total correlations from the present study are shown below. Please refer to [2] for analysis of the questionnaire data. Taken from [2]

5.4.5 Assessors

The participants were considered a blocking factor (see Sect. 11.5.3) in the experimental design. Twenty-one assessors (blend of experienced and inexperienced) were each assigned to a block at random. Audiovisual assessment expertise was not required since immersion is a cognitive concept. For this study, experienced assessor refers to participants who had experience participating in audiovisual tests, were under continuous weekly training and evaluation exercises, and participated in product development or research activities at Bang & Olufsen. Inexperienced assessors may have participated in audiovisual tests before but were not familiar with subjective evaluation, did not have formal training, and were not actively focused on the technical aspects of audiovisual products or experiences. In total, fifteen males and six females participated in the experiment. The mean age of the participants was 37.7 years (SD \(=\) 14.28). Auditory and visual acuity was self-reported by the participants.

5.5 Procedure

The experiment included two phases: a rating phase and the administration of the immersive tendency questionnaire. Both were completed in a single session of approximately 90 min.

The participants were introduced to the experimental procedure and asked to confirm visual and auditory acuity before participating. The instructions were delivered verbally and in writing. For the rating phase, the participants were given the following description of immersion as stated in [2]:

“Immersion, also known as deep mental involvement, can be described as being mentally lost (absorbed) in the experience. Immersion is encountered when the experience is involving and absorbs you mentally by capturing your attention. For example, immersion may be experienced when reading a book, playing video games, watching a movie, etc.”

The participants were asked to rate overall immersion on a graphic line scale. The motivation for the scale is found in sensory analysis. It is a 15 cm long line scale on which the participants are instructed to draw an intersecting line to denote their perception. The distance from the left end of the scale is taken as the score (e.g., 6.8 cm would equal a rating of 6.8). The scale was chosen as it offers the participants infinite steps (in theory) to indicate the intensity of the idea under evaluation. The lack of numbers and verbal anchors (other than those near the endpoints) reduces the bias associated with them. The scale used for the test is shown in Fig. 11.5.

Fig. 11.5 Graphic line scale for evaluating immersion. The same scale with different verbal anchors was used for the immersive tendency questionnaire (from [2])

In addition to rating, familiarity with the content was documented by asking assessors to state if they had experienced the excerpts previously. An excerpt that could elicit immersion was shown before the test in an attempt to exemplify immersion. However, it was explicitly mentioned that it was only an attempt to illustrate immersion, should not be used as a reference, and may not lead to immersion for the participants. The participants were notified that there were no correct responses and that the use of the entire scale was not mandatory.

A synopsis was provided before each experience. A distractor task was chosen at random to be performed between successive presentations of excerpts. The immersive tendency questionnaire was administered at the end of the rating phase. The experiment was conducted as a pen-and-paper test and the data was collected within three weeks.

5.6 Results

Ratings from both phases of the experiment were converted to scores between 0 and 15 (up to one decimal). The converted scores were used for analyzing the data.

5.6.1 Effect of Stimuli and Differences Between Stimuli Pairs

Data from the rating part of the experiment was analyzed using analysis of variance (ANOVA). Since the scale usage effects were confounded in the collected data and it was not feasible to account for and remove these effects, the estimated marginal means were used to estimate the effect [35]. A mixed effects model ANOVA with stimuli as a fixed factor and participants (blocks) as a random factor was used for analysis. The trials were independent of each other and the assumption of homogeneity of variances was upheld. The residuals did not deviate significantly from the normal distribution as per the Shapiro-Wilk test, W \(=\) 0.99, p \(=\) 0.710.
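For orientation, a mixed effects model of this kind is commonly written in the following form. The notation is an assumption introduced here for illustration and is not taken from the original analysis:

\[
y_{ij} = \mu + \alpha_i + b_j + \varepsilon_{ij}, \qquad b_j \sim \mathcal{N}(0, \sigma_b^2), \qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
\]

Here \(y_{ij}\) is the immersion score given to stimulus \(i\) by participant \(j\), \(\mu \) is the overall mean, \(\alpha_i\) is the fixed effect of stimulus \(i\) (\(i = 1, \ldots, 15\)), \(b_j\) is the random effect of participant (block) \(j\) (\(j = 1, \ldots, 21\)), and \(\varepsilon_{ij}\) is the residual. Because of the incomplete block design, \(y_{ij}\) is observed only for the 5 stimuli allocated to participant \(j\). No stimulus-by-participant interaction term appears since each participant graded a stimulus at most once.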

Fig. 11.6 Visualization of the raw data (not adjusted for scale usage) from the rating phase of the experiment. The significant stimuli pairs as determined by the pairwise comparisons are shown above the box plots

The ANOVA showed that the effect of the stimuli on immersion scores was significant, F(14, 74.82) \(=\) 3.32, p \(\,<\,\) 0.001. This indicates that immersion scores differed between at least some of the stimuli and that the participants were able to distinguish between them. The blocking factor (participants) was not found to be statistically significant (p \(>\) 0.05). However, the purpose of the blocking factor is to control for the differences between the participants, and its effectiveness cannot be judged by statistical significance alone. Due to a lack of repetitions, the interaction between the stimuli and participants factors could not be investigated.

Pairwise comparisons were made between all pairs of stimuli on the basis of the estimated least-squares means. Of the 105 pairs of stimuli, five were found to differ significantly (Tukey’s adjustment). These pairs are marked in Fig. 11.6 above the box plots. The results from the pairwise comparisons suggest that the stimuli fall into one of three groups: those where participants experienced high immersion (sB, sE, sL, and sM), low immersion (sA and sG), and moderate immersion (all remaining stimuli).

5.6.2 Nature of Immersion: Binary or Graded?

The distribution of raw immersion scores can reveal whether immersion is a binary or a graded concept. When a large number of stimuli are evaluated, the scores should cluster toward the ends of the scale if immersion is a binary concept, i.e., the distribution of scores should be bimodal. Hartigan’s dip test was used to determine if the distribution of immersion scores was unimodal or multimodal.

Hartigan’s dip test is based on Hartigan’s dip statistic (HDS). This statistic is the maximum difference between the empirical distribution function (EDF) and the unimodal distribution function that minimizes this maximum difference. The uniform distribution is used as the reference distribution under the null hypothesis as it is the least favorable unimodal distribution [21]. A larger difference leads to a higher HDS value and signals movement away from unimodality. To compute the p-value, bootstrapped samples are generated and their dip statistic values are compared iteratively to the dip statistic value obtained from the empirical distribution. Please refer to [21, 22] for an in-depth explanation of the mathematical calculations. The distribution of the dip statistic values for the bootstrapped samples and the empirical distribution function is shown in Fig. 11.7.
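In symbols, the dip statistic and a bootstrap estimate of the p-value can be sketched as follows. The notation, and in particular the number of bootstrap samples \(B\), is an assumption introduced here; the exact procedure is described in [21, 22]:

\[
D(F_n) = \min_{G \in \mathcal{U}} \, \sup_{x} \left| F_n(x) - G(x) \right|, \qquad \hat{p} = \frac{1}{B} \sum_{b=1}^{B} \mathbf{1}\left[ D\left( F_n^{(b)} \right) \ge D(F_n) \right]
\]

Here \(F_n\) is the EDF of the \(n\) observed immersion scores, \(\mathcal{U}\) is the class of unimodal distribution functions, and \(F_n^{(b)}\) is the EDF of the \(b\)-th sample of size \(n\) drawn from the uniform reference distribution. A large \(\hat{p}\) means that dip values at least as large as the observed one are common under unimodality, so unimodality cannot be rejected.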

Fig. 11.7 The distribution of the Hartigan dip statistic values for the bootstrapped samples and the empirical distribution function

The average p-value was 0.862 (\(\sigma \) \(=\) 0.04) for 100 calculations, well above the 5% significance level. The null hypothesis that the distribution of data is unimodal could therefore not be rejected. This result suggests that immersion is a graded concept.

5.6.3 Influence of Immersive Tendency on Immersion Ratings

RQ3 was designed to study whether the susceptibility to become immersed in an experience has a direct influence on the immersion ratings. To this end, it was hypothesized that immersion ratings for any stimulus should increase with an increase in the ITQ total scores. Kendall’s rank order correlation was chosen to investigate if a monotonic relationship existed between the immersion and ITQ total scores. The value of Kendall’s \(\tau \) ranges between \(-1\) and \(+1\), where \(-1\) signifies complete disagreement between the two rankings (a perfectly decreasing monotonic relationship) and \(+1\) points to a perfectly increasing monotonic relationship. A value of 0 means that there is no monotonic relation between the two variables, but other relationships may exist.
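As an illustration of how such a coefficient is obtained, the following Python sketch counts concordant and discordant pairs directly. It computes the tau-b variant, which corrects for ties; which variant was used in the study is not stated, so this choice is an assumption. The function name `kendall_tau_b` is a hypothetical helper, and the example values are invented for illustration and are not data from the experiment.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b: rank correlation with a correction for tied values."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0:
            ties_x += 1  # pair tied on the first variable
        if dy == 0:
            ties_y += 1  # pair tied on the second variable
        if dx * dy > 0:
            concordant += 1  # both variables order the pair the same way
        elif dx * dy < 0:
            discordant += 1  # the variables order the pair in opposite ways
    n_pairs = len(x) * (len(x) - 1) // 2
    denom = sqrt((n_pairs - ties_x) * (n_pairs - ties_y))
    if denom == 0:
        raise ValueError("tau is undefined when one variable is constant")
    return (concordant - discordant) / denom


# Invented values for a single stimulus (7 ratings, since each stimulus
# was graded by 7 participants); NOT data from the study.
itq_total = [52.1, 60.4, 47.9, 71.3, 65.0, 58.2, 49.5]
immersion = [6.8, 9.1, 5.2, 12.4, 8.7, 10.3, 7.5]
print(f"tau_b = {kendall_tau_b(itq_total, immersion):.2f}")
```

With only 7 observations per stimulus, the coefficient can take few distinct values, which is one reason why individual correlations of this kind have limited statistical power.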

The data and Kendall’s rank order correlation coefficients are shown in Fig. 11.8. The values of Kendall’s \(\tau \) were largely not statistically significant: only 2 correlations (for stimuli sD and sJ) were found to be statistically significant. This result suggests that there is no direct influence of immersive tendency on immersion ratings. This inference is based on the critical assumption that the scale usage for the rating phase and the questionnaire items is identical and that immersive tendency is captured and reflected appropriately by the ITQ total score.

Fig. 11.8 Kendall’s rank order correlation between immersion scores and the immersive tendency questionnaire’s total score. Kendall’s \(\tau \) was significant for stimuli D and J. Regression lines (in red) are plotted only to aid the reader. Participants’ familiarity with the content is denoted by the shape of the data points. Adapted from [2]

5.7 Discussion

There is a growing interest in studying immersion for enhancing audiovisual experiences that have been enabled by technologies such as spatial audio and virtual reality. The primary challenge in investigating immersion is a lack of suitable methodologies for assessing immersion. In this study, we explored an experimental paradigm inspired by rating experiments for the subjective quantification of immersion in audiovisual experiences.

In subjective testing, the instructions provided to the participants are critical since the assessors make deliberate judgments based on the provided descriptions. It is challenging to communicate the intended idea for cognitive concepts such as immersion due to a lack of standardized definitions and the inability to demonstrate the perceptual differences between stimuli. The results from the experiment show that participants were able to comprehend the provided description of immersion and distinguish between the different stimuli accordingly. The pairwise comparisons show that there were obvious differences between the statistically significant pairs of stimuli even when statistical power is limited. Additionally, the assessors did not report issues with understanding the description before or during the experiment. These results confirm that participants can reflect on the immersion they experience and convey it using a unidimensional scale as suggested by Jennett et al. [31].

It is important to understand the nature of immersion (i.e., binary or graded) to develop the conceptual understanding of the topic. Qualitative studies [8] and theoretical interpretations have conceptualized immersion as a graded concept but empirical tests have not been conducted. Results from Hartigan’s dip test show no evidence that the distribution of immersion scores is multimodal, suggesting that immersion is a graded concept. This is consistent with the conceptual understanding of the topic. Immersion being a graded concept implies that direct comparisons can be made between experiences and systems on an interval scale.

A direct influence of immersive tendency ratings on immersion scores could not be detected in this study. Only 2 out of the 15 correlations were found to be statistically significant. However, it is interesting to note that one of those correlations was negative, implying that individuals with a higher degree of immersive tendency found the stimulus to be less immersive. We are unable to explain this finding but believe that analyzing the contents of the excerpt and the comments provided by the participants can be helpful. The correlation of scores is based on the assumption that the participants use the scale in an identical manner for the rating task and the questionnaire exercise. Although this is a reasonable assumption, it has not been tested. Additionally, we assume that the equally weighted sum of scores from the ITQ reflects immersive tendency accurately. Given the lack of internal consistency [2] and the unexplained theoretical grounds for including items on the questionnaire, the assumption may be violated. While it is difficult to draw conclusions about the ITQ due to the limited number of observations, the questionnaire must be examined, compared with other existing questionnaires, and/or new questionnaires should be developed to assess immersive tendency.

6 Summary and Future Work

The primary focus of this chapter has been to present the different perspectives on immersion and address the inconsistent and interchangeable usage of the term. The conceptualizations of immersion gathered from the literature are categorized and clarified using the filter model. We advocate for a top-down approach to study immersion and have synthesized a definition from the psychological standpoint. The definition presented below is intentionally broad and application agnostic to aid adaptability for different applications. Although it has been used for a non-interactive application in this chapter, it is applicable to interactive activities as well.

Immersion is a phenomenon experienced by an individual when they are in a state of deep mental involvement in which their cognitive processes (with or without sensory stimulation) cause a shift in their attentional state such that one may experience disassociation from the awareness of the physical world.

This definition was used as the foundation for drawing distinctions between immersion and commonly confused terms such as envelopment and presence. An exploratory experiment was performed by outlining the implications for the experimental paradigm and appraising the benefits and drawbacks of objective and subjective measures. A paradigm inspired by rating experiments was chosen for evaluating immersion. The results of the experiment show that the participants were able to discriminate among stimuli even with limited statistical power. This is an important result as it demonstrates that the assessors were able to comprehend the task and reflect on the overall immersion in an experience. Another important result is that immersion appears to be a graded concept, which lends empirical support to the theoretical conceptualizations of immersion.

The motivation to study and evaluate immersion is ultimately to improve the experience for the users. A key assumption in the quest to study immersion is that positive immersive experiences are preferred by users. It is critical to test this assumption before exploring the different avenues for future work. Efforts should focus on validating and optimizing the experimental paradigm presented in this chapter in addition to overcoming the limitations of the current work stated above. Although the method was applied in the context of domestic audiovisual experiences, adapting the method for virtual and augmented reality applications can be beneficial for optimizing the general methodology. Future work should be focused on quantifying the influence of the physical characteristics of the audiovisual rendering systems on immersion. The results could then be used to improve experiences for the users. For example, determining the influence of audio spatialization can be helpful in designing appropriate sound systems for enabling immersive experiences. The filter model described in Sect. 11.3 is particularly useful for establishing relationships between the physical and the cognitive domains. Inspiration can be drawn from descriptive analysis techniques such as free elicitation [6] and open profiling of quality [65] to determine the key attributes of immersive experiences and acquire knowledge about the central ideas of immersion from the user’s perspective.