1 Introduction

Cartography has always had the purpose of generating insights from geospatial data through representation. As already indicated by the terms cartography and geovisualization (cf. Çöltekin et al. 2017), graphical representation is the method of choice. For centuries, graphical representation was not just the preferred option, but rather the only option, since non-visual communication (e.g., aural, haptic or olfactory) to complement analog mapping was practically unavailable. These conditions changed with the dawn of digital cartography, when mapmakers began to understand cartographic products no longer just in terms of static paper-print maps, but rather as dynamic multimedia interfaces (Cartwright et al. 2007).

Multimedia cartography has significantly shaped the frontline of research into geovisualization over the last two decades, with audiovisual, three-dimensional (3D) and animated maps as already well-established topics on the cartographic research agenda. In this paper, we will focus on another representation technology arising from digital cartography, which has as yet been little studied and is most consistently described as immersive virtual environments (IVE; Slater and Usoh 1993; Schnabel and Kvan 2003; Heydarian et al. 2015).

We will begin the discussion by establishing a terminological framework regarding the key concepts of immersion and spatial presence. Following the maxim of “generating insights”, these concepts will then be linked within a process model that offers an explanation of how IVE can facilitate cartographic communication. While this model was proposed only recently in a rather general form by Hruby et al. (2018), the focus of this paper will be on the technical and cognitive possibilities of representing geospatial phenomena in IVE using auditive (together with visual) input.

2 Geovisualization Immersive Environments

Virtual reality (VR) has become a frequently emphasized factor in literature with a cartographical research agenda (Çöltekin et al. 2017; Griffin et al. 2017; Virrantaus et al. 2009; cf. also Huang et al. 2018). However, VR is currently defined and operationalized in ambiguous ways, e.g., as virtual globes (Yang et al. 2018), VRGIS (Boulos et al. 2017) or Virtual Geographic Environments (VGE; Chen and Lin 2018), applying different understandings of virtuality in each case. To avoid confusion with existing concepts, we refer to geovisualization immersive environments (GeoIVE; Hruby et al. 2018; Edler et al. 2018; cf. also MacEachren and Kraak 2001) in the following explanation, indicating explicitly the highly immersive visualizations of geospatial data we are interested in.

2.1 Immersive VR

In this document, and in line with current research (Cummings and Bailenson 2016; Skarbez et al. 2017), immersion is understood as a technological, hence objective, quality of media describing “[…] the extent to which the computer displays are capable of delivering an inclusive, extensive, surrounding and vivid illusion of reality to the senses of a human participant.” (Slater and Wilbur 1997, p. 3).

To create such an illusion, IVE consist of at least three hardware components (Schulze et al. 2011): firstly, a 3D stereo head-mounted display (HMD), which provides the user with stereoscopic depth perception in virtual space; secondly, a tracking system to monitor the physical movement of the user; and thirdly, a high-performance graphics processing unit (GPU), which translates this movement into corresponding stereo images and renders them onto the HMD at a suitable frame rate (Fig. 1). What exactly counts as suitable in this context, with respect to realism, motion blur and sickness in IVE, is a separate topic beyond the scope of this paper; interested readers are referred to Kim et al. (2018) for further reading.

Fig. 1

Basic hardware components of an immersive VR system (Schulze et al. 2011)

While the aforementioned definition of immersion rather relies on stereoscopic vision, it can be easily extended to other forms of perception. In the case of spatialized sound, for instance (cf. Sect. 3), stereophonic input (e.g., from headphones integrated in the HMD) needs to be provided in real time depending on the user’s position in VR space.
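The dependence of stereophonic input on the user's position can be illustrated with a minimal sketch. It assumes a flat 2D scene, an inverse-distance attenuation law and constant-power amplitude panning; the function name `stereo_gains` and all parameters are hypothetical and chosen for illustration only. This is deliberately simpler than the HRTF-based rendering discussed in Sect. 3.3 and is not taken from any particular VR audio toolkit.

```python
import math

def stereo_gains(listener_xy, heading_rad, source_xy, ref_dist=1.0):
    """Return (left_gain, right_gain) for one sound source.

    listener_xy, source_xy: (x, y) positions in scene units (assumed metres).
    heading_rad: direction the listener faces, counterclockwise from the +x axis.
    ref_dist: distance at which the source is heard at full level (assumption).
    """
    dx = source_xy[0] - listener_xy[0]
    dy = source_xy[1] - listener_xy[1]
    # Clamp the distance so that very close sources do not exceed full level.
    dist = max(math.hypot(dx, dy), ref_dist)
    # Bearing of the source relative to the facing direction
    # (positive values lie to the listener's left).
    rel = math.atan2(dy, dx) - heading_rad
    # Lateral position: -1 = fully left, 0 = centre, +1 = fully right.
    lateral = -math.sin(rel)
    # Constant-power pan: map lateral position to an angle in [0, pi/2].
    pan = (lateral + 1.0) * math.pi / 4.0
    atten = ref_dist / dist  # inverse-distance attenuation
    return atten * math.cos(pan), atten * math.sin(pan)

# Re-evaluated once per tracking update, the gains follow the user's movement:
# a source directly to the left of a listener facing +x is heard on the left only.
left, right = stereo_gains((0.0, 0.0), 0.0, (0.0, 1.0))
```

Because only the lateral component of the bearing is used, a source straight ahead and a source directly behind yield identical gains. This limitation of purely level-based cues is one reason for the HRTF approach described in Sect. 3.3.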

2.2 Spatial Presence

Spatial presence is the user’s feeling of “being there” within a virtual environment (Skarbez et al. 2017). Two subdivisions of spatial presence can be distinguished, i.e., self-location and action possibilities. Consequently, spatially present IVE users have a subjective experience of being situated in virtual space and being able to interact directly with the virtual environment (Hartmann et al. 2015). These two aspects can be formalized in the two-step model of the formation of spatial presence proposed by Wirth et al. (2007) and enhanced by Schubert (2009) and Wirth et al. (2012): Within this theoretical framework, a mental spatial model of the IVE is first developed as a function of user-specific (e.g., interest, attention, spatial ability) and application-specific features (e.g., spatial cues, consistency). Secondly, users must accept this mental model of the virtual environment as their “primary egocentric reference frame” (Wirth et al. 2007), so that spatial presence can emerge. Measures to assess self-location and action possibilities were presented by Hartmann et al. (2015).

Immersive technology has been a driving force in presence research since the 1990s (Steuer 1992), based on the observation that IVE users have a stronger sense of spatial presence than users of low-immersive media types such as videos or desktop applications (Seibert and Shafer 2018; Kim et al. 2014). Moreover, it has been argued that an increased feeling of presence improves the effectiveness of VR applications (e.g., as tools for learning and training, entertainment or therapy; Cummings and Bailenson 2016; Hartmann et al. 2015; Makowski et al. 2017). This effectiveness is considered to be particularly high when users experience the IVE from the perspective of an avatar, i.e., as a functional part of the represented environment (Bailey et al. 2016). In a series of tests, Ahn et al. (2016) demonstrated how such body transfer (Shapiro 2014) produces higher levels of identification and engagement with the phenomena being visualized, compared to when the subjects simply watch a video. To measure the interconnectedness between (the user’s) self and (virtually represented) nature, Schultz’s (2001) Inclusion of Nature in Self (INS) scale proved suitable (Ahn et al. 2016).

2.3 Geovisualization Immersion Pipeline

To integrate the aforementioned concepts, Hruby et al. (2018) proposed a general model of what they called a geovisualization immersion pipeline (GIP). Basically, the GIP argues that geovisualization immersive environments (GeoIVE) can further the formation of spatial presence and body transfer, thus facilitating the generation of insight into, and involvement in, geospatial phenomena. GeoIVE are defined as a subcategory of IVE that models geographic environments in a generalized but realistic form at a 1:1 scale. While the original GIP focuses on the relationship between VR system and VR user, we propose to extend the pipeline to consider also the aforementioned 1:1 ratio between physical and virtual reality. This relationship between physical reality and virtual environment can be described through multisensory stimuli, so that a GeoIVE may not only look, but also sound like the true environment being represented and, in the future, probably also feel, smell and even taste accordingly (Fig. 2).

Fig. 2

Geovisualization immersion pipeline (modified from Hruby et al. 2018): from geospatial reality to the spatial presence and user involvement via multisensory 1:1 representation with GeoIVE

Visualization at a 1:1 scale is as yet atypical and conceptually challenging for cartography, but is a logical consequence of immersion and spatial presence, defined above as a first-person experience. At a 1:1 ratio between virtual and real space, users perceive a VR representation of a real place at a level of detail comparable to that which they would experience by physically being there, allowing them to unambiguously (i.e., 1:1) assign virtually experienced models to the corresponding real objects. This implies that GeoIVE are essentially qualitative visualizations, where additional, e.g., quantitative, information can be added via text- or sound-based input, just as in real space where one would use measuring tools or even just a smartphone to obtain data that cannot be immediately perceived, e.g., air pollution or degrees of temperature (Fig. 3).

Fig. 3

1:1 ratio between real (left) and virtual space (right) as a defining criterion of GeoIVE. An example from an audiovisual GeoIVE of a Caribbean coral reef. Indicates spots, where additional information (about the species visualized) can be accessed

The idea of a 1:1 scale in GeoIVE applies to space, but also to the visualization of temporal changes (Magallanes et al. 2018). Moreover, perceptual data from different human senses must be considered to fully emulate the real world in VR, so that “the scenario corresponds to reality to the maximum extent possible” (Skarbez et al. 2017, p. 96:5). The rest of this paper will focus on the importance of sound, both for the realistic representation of physical space in VR and for the formation of the spatial presence among GeoIVE users, thus examining the GIP in the audio channel.

3 Sound in GeoIVE

3.1 Sound in Cartography

In the 1990s, mapmakers began to communicate geospace not just in the visual, but also in the auditive communication channel, leading to the emergence of audiovisual cartography (MacEachren 2004). Audio and vision have been used, on the one hand, in a rather reciprocal way, producing maps that graphically visualize acoustic features of the environment (e.g., noise: Kornfeld et al. 2011) and applications, where data attributes (e.g., elevation: Schito and Fabrikant 2018; Thebpanya 2010) or uncertainty (Ballatore et al. 2018) are translated into sound. The majority of audiovisual products, however, combine visual and audio information, either in a redundant form with graphics and sound transmitting the same information or in a complementary manner (Edler et al. 2012).

Probably the most influential theoretical framework for low-immersive (generally speaking: desktop-based) cartography has been built around a set of nine sound variables, presented by Krygier (1994), and following Bertin’s (2010) concept of visual variables. These sound variables comprise location, loudness/volume, pitch, register, timbre, duration, rate of change, order and attack/decay. Moreover, Krygier (1994) draws a distinction between abstract and realistic sounds, where the latter can be further divided into sounds of anthropophonic (of human activities, incl. speech), biophonic (of non-human animate beings) and geophonic (e.g., the rushing sea) origins (Papadimitriou et al. 2009).

A wide range of topics has been covered with audiovisual maps, both by cartographers and mapmakers from neighboring domains, using sound to transmit both quantitative and qualitative data. Since the representation of geospatial qualities is the primary focus of GeoIVE (cf. Sect. 2.3), quantitative aspects of sonic maps will be excluded from further discussion; the interested reader is referred to Edler et al. (2012) and Brauen (2014) for further reading.

3.2 High- vs. Low-Immersive Audiovisual Cartography

The main differences between low- and high-immersive audiovisual applications can be exemplified by the scenario of an audiovisual hiking map (Laakso and Sarjakoski 2010): In scenario A, a digital large-scale 2D base map showing topographic features via signs is supplemented by a series of play-button symbols, which give interactive access to georeferenced soundscapes recorded on site, e.g., singing birds in forest areas or traffic noise along roads. If we now, hypothetically, translate scenario A into a GeoIVE (scenario B), users could perceive the main topographic features three-dimensionally at a 1:1 scale, as they would by physically being there. Navigating through the virtual environment, they would perceive all of the georeferenced soundscapes within hearing range.

While oversimplified in terms of a wide variety of additional setting options, the scenario described above allows us to establish some fundamental differences between low- and high-immersive audiovisual products in cartography. Firstly, as already indicated in Sect. 2, maps displayed on non-stereoscopic computer monitors offer an external, i.e., third-person perspective on a spatial representation that is scaled down to a 1:x (with x > 1) level (cf. Kraak and Fabrikant 2017). In contrast, GeoIVE offer a first-person, thus 1:1, experience where the user’s physical environment is completely excluded in favor of the virtual reality displayed on an HMD.

With respect to sound, the difference between the two exemplary settings is less obvious, since scenario A already uses realistic soundscapes, sonifying the environment at a 1:1 scale just as the user would experience by actually being there. Consequently, these sound recordings comply with the requirements established for GeoIVE in Sect. 2. In addition, the use of stereophonic sound is possible in both low- and high-immersive applications with appropriate audio equipment (e.g., stereo headphones). However, even when sound is transmitted in stereo, the correspondence between 3D sound space and map space will be limited if the map is shown on a non-stereoscopic display. For instance, you could hear birds singing above in a tree without being able to raise your head to see them. GeoIVE, by contrast, align sound space and map space such that, for example, you can turn your head to see an approaching car that has already been located aurally by the engine noise.

3.3 Sound Localization

As we stated in Sect. 3.2, alignment of map space and sound space is a defining characteristic of audiovisual cartography with GeoIVE. We can trace this issue back in the GIP (Fig. 2) to the ability of the user to locate a sound source in terms of direction and (to a certain extent) distance. As noted by Schnupp et al. (2011), “audition is particularly useful […] because it can convey information from any direction relative to the head, whereas vision operates over a more limited spatial range”. The basic mechanisms of sound localization are differences in arrival time and intensity between the two ears. For example, engine noise coming from your left will reach the left ear both earlier and at a higher intensity than the right ear, which is further from the sound source and lies in the acoustic shadow of the head (Fig. 4). While sound orientation derived from interaural delay and level differences is of particular utility for right–left discrimination, front–back and vertical localization rely instead on outer ear spectral cues (Carlile et al. 2005).

Fig. 4

Interaural delay ∆t as a basic mechanism of sound location in real and virtual space: Birdsong arriving earlier at the right ear (t1) than at the left ear (t2) indicates that the bird is on the right side of the user
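To give a sense of the magnitude of the interaural delay ∆t, the following sketch applies Woodworth's classical spherical-head approximation, ∆t = (r/c)(θ + sin θ), for a distant source in the horizontal plane. This formula is not part of the discussion above and is used here only as an illustrative assumption; the head radius and the speed of sound are typical textbook values, and the function name `interaural_delay` is hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 °C (assumed)
HEAD_RADIUS = 0.0875    # m, a commonly used average head radius (assumed)

def interaural_delay(azimuth_deg, head_radius=HEAD_RADIUS, c=SPEED_OF_SOUND):
    """Approximate interaural delay in seconds for a distant source.

    azimuth_deg: source direction in the horizontal plane, 0 = straight ahead,
    +90 = directly to the right, -90 = directly to the left.
    A positive result means the sound reaches the right ear first.
    """
    if not -90.0 <= azimuth_deg <= 90.0:
        raise ValueError("the approximation covers the frontal half-plane only")
    theta = math.radians(azimuth_deg)
    # Woodworth's spherical-head approximation.
    return (head_radius / c) * (theta + math.sin(theta))

# Delay in milliseconds for a few directions to the user's right.
for azimuth in (0, 30, 60, 90):
    print(azimuth, round(interaural_delay(azimuth) * 1000.0, 3))
```

Under these assumptions, the delay for a source directly to one side is roughly 0.66 ms. An audio engine for a GeoIVE therefore has to reproduce timing differences well below one millisecond to convey direction.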

Audiovisual cartography with GeoIVE has to reproduce the aforementioned perceptual constellation (cf. Fig. 4) to facilitate sound localization within a VR system. A typical characteristic of physical reality, and thus of GeoIVE, is the blending of different sound sources reaching the ear from different directions. Surround sound, i.e., sound emitted by a series of loudspeakers located around the user, is a well-established technique to rebuild a given sound space. In immersive VR systems, however, headphones prove more convenient than loudspeakers to simulate sound localization, as they allow the complete exclusion of noise from the physical environment for the benefit of a fully controlled soundscape in the GeoIVE.

Since headphones produce sound already within or very close to the ear, the aforementioned interaural cues of sound localization no longer arise naturally. To overcome this drawback, sound played over headphones can be filtered with so-called head-related transfer functions (HRTF), which “aim to accurately reproduce the waveform at the listener’s eardrum that would normally be produced by an external sound source” (Steadman et al. 2017). Consumer-oriented VR systems (e.g., Oculus Rift, HTC Vive) already include audio software development kits to include HRTF, thus approximating an average form of the user’s head as closely as possible.
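In the time domain, applying an HRTF amounts to convolving the source signal with a pair of head-related impulse responses (HRIR), one per ear. The sketch below shows this principle only; the two impulse responses are toy values invented for illustration (a pure delay and attenuation for the far ear), whereas real HRIRs are measured or modeled per direction and are far longer. The function names are hypothetical and do not correspond to any vendor SDK.

```python
def convolve(signal, impulse_response):
    """Direct-form discrete convolution of two sample sequences."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out

def render_binaural(mono, hrir_left, hrir_right):
    """Filter a mono signal with one impulse response per ear."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)

# Toy HRIR pair for a source on the listener's right (illustrative values):
# the right ear receives the sound immediately and at full level,
# the left ear two samples later and at half the level.
hrir_right = [1.0, 0.0, 0.0]
hrir_left = [0.0, 0.0, 0.5]

click = [1.0, 0.0, 0.0, 0.0]  # a single-sample test impulse
left, right = render_binaural(click, hrir_left, hrir_right)
print(left)   # [0.0, 0.0, 0.5, 0.0, 0.0, 0.0]
print(right)  # [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```

Even this toy filter reintroduces an interaural delay and level difference into a headphone signal. Measured HRIRs additionally encode the outer-ear spectral cues on which front–back and vertical localization rely.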

In this section, we have provided a very rough framework of key terms of sound localization to better inform the following discussion of empirical research on audio in IVE. Readers with a particular interest in sound localization are referred to recent textbooks on this subject, e.g., Moore (2012), Moore et al. (2010), Schnupp et al. (2011).

4 Empirical Research on Audio in IVE

We argued in Sect. 2 that immersive VR systems can facilitate spatial presence, and gave an overview of sound and sound localization in GeoIVE in Sect. 3. We now link these aspects through a meta-analysis of empirical studies on the impact of sound on the formation of spatial presence. Following a top–down approach, we will begin with studies on the basic question of sound vs. no sound, followed by a discussion of tests on more specific auditory conditions in virtual environments.

4.1 Sound vs. No Sound in IVE

To date, the test setting closest to the idea of GeoIVE was built by Poeschl et al. (2013), who investigated a highly detailed IVE, where users were asked to cross a realistically modeled forest clearing on a predefined path (Fig. 5). A no-sound condition was compared with a sonified set-up of ten sound sources assigned to the scene in both a static (e.g., a rushing waterfall) and a dynamic (e.g., croaking frogs) manner. The results indicate that users develop a significantly stronger feeling of spatial presence under the sound condition.

Fig. 5

High-immersive IVE test scene from Poeschl et al. (2013; kindly provided by the authors)

Another realistic, but indoor, IVE was designed by Larsson et al. (2007) to study the effect of auditory space on the spatial presence on the basis of a 3D model of Örgryte New Church in Gothenburg, Sweden. Participants were asked to find and cross five (numbered) parts of the nave under different auditive conditions (including no sound). The results of this experiment also showed the significant positive impact of sound on the users’ sense of spatial presence.

At a lower level of immersion, but with a rather strong reference to geovisualization, Lindquist et al. (2016) compared different visual and sound conditions based on 3D scenes taken from Google Earth (Fig. 6). The findings of this study suggested “that coupling the appropriate sound with a corresponding visualization can be an effective way to more accurately simulate environmental experience when using 3D landscape visualization”. It should be noted that the low-immersive visualizations taken from Google Earth received only low realism ratings from participants in this experiment, suggesting that high-immersive applications in terms of GeoIVE may have a stronger impact on spatial presence formation.

Fig. 6

Low-immersive test scene from Lindquist et al. (2016; published under CC BY-NC-ND 4.0)

While we have mentioned only a few studies showing a relatively close connection to the subject matter of GeoIVE, the general findings regarding the positive effect of sound on spatial presence are supported by a series of other experiments, e.g., Fryer et al. (2013), Pettey et al. (2010), Salski and Whitbred (2010), Serafin and Serafin (2004), Hendrix and Barfield (1995).

4.2 Sound Parameters in IVE

Beyond the principal question of whether to implement audio in IVE, the importance of using spatialized vs. non-spatialized sound has been a recurrent issue of empirical research. A study by Larsen and Pilgaard (2015), whose design came relatively close to what we have defined as GeoIVE, compared stereo vs. 3D (i.e., HRTF-based) audio within the setting of an immersive computer game. While no significant differences between the two conditions were observed in a self-evaluation presence questionnaire, 3D audio produced a significantly increased phasic electrodermal activity in the IVE users, indicating a stronger feeling of spatial presence at least on a subconscious level.

While Larsen and Pilgaard (2015) provided both stereo and 3D audio via headphones, other experiments tested sound spatialization effects via external speakers only, which we have excluded from a typical GeoIVE setting because of their reduced level of sound control and immersion (cf. Sect. 3.3). For instance, Salski and Whitbred (2010), again using a gamification approach, compared 5.1 surround against two-channel stereo sound from external speakers, concluding that surround sound “almost universally impacted outcomes of interest, including several dimensions of presence and enjoyment”. However, the transferability between results obtained from external loudspeakers and headphones remains an open question (cf. Lindquist et al. 2016).

4.3 Ancillary Studies on Sound in IVE

To conclude this section on sound-related studies in the realm of IVE, several experiments are worth mentioning to include additional aspects of audiovisual cartography with VR that have not been covered above.

Mapmakers usually approach audiovision from the visual viewpoint, understanding sound as a rather complementary element of the graphics. However, people with impaired vision represent an important user segment of spatial information, who completely or partly rely on non-visual media. Audio description is a well-established method to make audiovisual content available to users with special needs, by means of narrative description (Piety 2004). While pure audio is a low-immersive medium, audio description can provide a visually impaired audience with the same level of presence as an audiovisual stimulus (without narration) does among sighted people (Fryer and Freeman 2012). However, adding sound effects (e.g., animal noises) to audio description has not been shown to affect spatial presence significantly for users with or without impaired sight (Fryer et al. 2013). The inclusion of users with special needs is, therefore, challenging not only for VR-based cartography in particular, but also for digital geovisualization in general (Hennig et al. 2017; Thebpanya 2010). The incorporation of other senses (e.g., touch; Tatsumi et al. 2015) into next-generation IVE may facilitate the feeling of being there among the broad range of geovisualization users.

Another aspect of sound in audiovisual applications is the congruity of the visual and the auditive environment. It is assumed that the feeling of being there in mediated space will be intensified by matching inputs of sound and vision (Larsson et al. 2007). Focusing on a geospatial scenario, Lindquist et al. (2016) attached, in different combinations, recordings of human speech, road traffic and bird call sounds to the scenarios shown in Fig. 6. The results of the study indicate that presence ratings are significantly influenced by the congruence of the visual and auditory stimuli so that, for example, scenes showing primarily vegetation will generate stronger presence when paralleled by birdsongs than by traffic noise. Associated with congruity, the utility of spatialized sound for navigation purposes has also been investigated: Bormann (2005) found 3D audio to influence navigation performance (but not spatial presence) positively in a low-immersive (desktop) VR application, while Ruminski (2015) observed spatial sound to be an important cue for searching tasks within indoor augmented reality (AR) scenes.

5 Conclusion

In a much-quoted survey, Balmford et al. (2002) found UK primary pupils to be more familiar with Pokémon characters than with local fauna and flora. Since then, distance and estrangement from nature, with people spending less time in natural environments and ever more time indoors, have been a frequently emphasized research topic (Capaldi et al. 2014), prominently framed as “nature-deficit disorder” by Louv (2005; for a critical review of the connection-with-nature discussion, cf. Fletcher 2017). The present article shows how the distance between user and physical realities can be reduced by spatial presence, i.e., the feeling of “being there” in a mediated environment. GeoIVE have been discussed as a conceptual model to adapt VR systems to the particular needs and objectives of cartography.

While IVE in general, and GeoIVE (just like other geovisualization products) in particular, have focused on vision, we directed our attention to how auditory perception can facilitate the spatial presence within VR systems. In both high- and low-immersive environments, user studies on sound effects indicate the positive effect of a combined audiovisual vs. exclusively visual presentation. This makes audiovisual cartography with IVE a very rich field of research, raising issues of not only spatial presence, but also a variety of other cognitive aspects, e.g., how sound affects information overload (cf. Makransky et al. 2017), collaboration tasks (cf. Khadka et al. 2016) or the accuracy of object-location memory (cf. Lammert-Siepmann et al. 2017).

However, it should not be overlooked that many of the experiments on audio in IVE conducted to date are in fact ad hoc studies using different scenarios at different levels of immersion and with different measures of spatial presence. This diagnosis is symptomatic of the state of empirical cartography in general, which still lacks a research agenda that can effectively coordinate the often fragmented current research projects.

In this paper, we analyzed the potential of sound in immersive environments, which is quite distinct from research on low-immersive audiovisual cartography, to stress the potential and particularities of a recently adopted technology for geovisualization purposes. However, it should be noted that, while adopting this position, we do not claim that GeoIVE provide a generally deeper or “better” understanding of spatial phenomena than desktop-based mapping applications. Instead, we argue that mapmakers should be aware of all of the relevant available technologies to select the most appropriate tools in each case to generate insights from geospatial data through representation. We have shown that audiovisual GeoIVE represent one such tool.

6 Final Note

Due to the format of publication chosen for this paper, we approached immersive VR and sound only in a non-immersive and soundless manner. In addition to the bibliography below, interested readers are referred to the following GeoIVE application: http://www.biodiversidad.gob.mx/region/descargas/L_2017jl25.7z. Provided in an executable, but also generic, file format, this application may serve not only as an example with which to experience the feeling of “being there” but also as a test material for further studies on the spatial presence with GeoIVE.