1 Introduction

When talking about augmented reality (AR), most people, and also the press, have a visual version in mind (Behrendt 2015), such as Google Glass, which displays an information overlay on top of a live view of the surroundings. Other examples are Microsoft HoloLens, Vuzix Blade AR and many more. Locative audio, or placed sounds, can be considered the acoustic equivalent of visual AR. We would like to argue that this kind of audio AR can achieve the same or even a higher degree of immersion in everyday situations than its visual equivalent. We understand immersion primarily in the sense of Witmer and Singer (1998), as a psychological state of perceiving the environment's steady stream of stimuli as enveloping and including oneself. At the same time, the perspective of Slater (1999), who defines immersion as an objective, technological characteristic of virtual reality systems, is essential for facilitating the experience of immersion in the sense of Witmer and Singer (1998).

Admittedly, in real situations no sense functions alone: humans are multi-sensory beings, and everyday sensory experiences are formed by a combination of sound, smell, sight, taste and touch. While visual AR offers the most dramatic effect, it is easy to tell the real and the augmented parts apart. This could be different with augmented auditory information. Of course, audio cannot augment reality to the same extent as visual AR, but subtle changes, when done correctly, can lead to a seamless experience, as geographical locations are far more than points on a map: they are defined by the subjective perception of the human agents (Popplow and Scherffig 2013).

Therefore, it makes sense to combine the positioning capabilities of smartphones with their audio playback capabilities, which provides the ability to deliver location-specific audio content. This approach can be applied to tour guides, treasure hunts and more, as already realized in several professional global web- and mobile-enabled platforms (mainly tourism oriented) as well as in experimental art and academic research projects. In addition, several platforms allow the production and reproduction of georeferenced sound files. Nevertheless, all of these locative media platforms merely organize the audio data based on location; linear or branching narrative structures are only partially supported.

To create something with originality and added value, other aspects besides geolocation and audio playback have to be implemented. This work attempts to achieve just that by developing a prototype that combines media playback, geolocation and various sensor capabilities of smartphones to allow the creation of interactive geolocated storytelling, and at the same time to increase the narrative potential of the application. Interactive storytelling allows a gaming experience in which the form and content of the story are adapted to the player in real time, providing a sense of control over the course of the narrative (Bostan and Marsh 2010). This kind of media can be described as a non-linear location-based audio book or an interactive audio-based augmented reality platform.

The linkage of cartography and sound is not a novelty, especially in the context of navigation for blind or visually impaired people (e.g., Golledge et al. 1998). Beyond that, recent research pursues this connection, termed audiovisual cartography, for further purposes. Bartie et al. (2018) developed a virtual tour guide that gives hands- and eyes-free route guidance to tourist destinations via landmarks visible from the user's current location, with the help of a spoken dialogue user interface. Schito and Fabrikant (2018) transformed discrete and continuous digital elevation models into acoustic information by applying three parameter mapping sonification methods. Schiewe (2015) investigated the auditory options for sound maps representing quantitative data, and Krygier (1994) introduced nine sound variables enlarging the visual capabilities of cartography for representing nominal and ordinal data. Conversely, the visualization of sound in space is also an object of research: Schiewe and Kornfeld (2009) displayed the spatiality of city acoustics, and Kornfeld et al. (2011) presented solutions for the visualization of sound in large-scale environments using visual encodings and mappings of acoustic parameters. However, audiovisual design is not firmly established in cartography yet (Edler et al. 2012; Brauen 2013).

2 Locative Sound Media

The term locative media was established in the early 2000s, associated with the locative art movement, as a descriptor for devices and technologies embedded at or in a particular location (Leorke 2014). Locative media can be described as the convergence of geographic space and data space (Hemment 2006) and can be seen as a new field worth exploring, with mobile, networked, location-aware devices helping to create new social interfaces to places and, through artistic interventions, transforming geographical space into a canvas for experimental interactions. Thus, locative sound media create a soundscape as defined by Schafer (1977), i.e., an acoustic ‘envelope’ that surrounds people in certain places.

2.1 Classification and Taxonomy

A comprehensive taxonomy of locative sound art and media has been developed by Behrendt (2010). The taxonomy is based on a thorough survey of about 200 mobile sound works and consists of four categories:

  • Placed sounds: In this category, artists place sounds in an environment, and participants effectively create their own version or remix of the piece when they choose a certain route, thus playing back the sounds in different orders or combinations. Different themes or sub-genres can be distinguished within this category: more narrative ones such as historical, touristic, educational, fictional and game-based pieces, and less narrative ones such as music and experimental sound.

  • Sound platforms: This category gathers mobile sound art works with a specific platform provided for the audience to contribute, edit and place sounds in space. That means, compared to the previous category, a different kind of interaction takes place: it is the task of the audience to choose or record sounds and assign them to locations.

  • Sonified mobility: This category describes artworks that utilize the audience's mobility to influence the sound the audience is listening to. If the audience is not moving, it cannot experience the piece. While artists in the placed sounds category augment space with audio and are mainly concerned with the location of the audience, examples from the sonified mobility category are primarily concerned with the audience's mobility, i.e., its trajectory through space.

  • Musical instruments: This is a category that uses existing mobile media—and especially mobile phones—as musical instruments.

2.2 Sound and Context

Another aspect to consider is the context in which the mobile locative media experience takes place. Usually, this would be an urban setting, with users navigating through public spaces. The interactions between the users and their urban surroundings, including non-participants, on the one hand, and the locative media content on the other hand, present some interesting aspects to examine.

2.2.1 Sound and Space

As the visual sense constitutes 83% of human perception (Steiner 2011), we mostly perceive space through our vision. Nevertheless, the role of hearing in our spatial perception of the environment should not be underestimated. Indeed, our everyday perception of space has a strong sonic component, because we are surrounded by natural sounds, rural noises or music (Schafer 1977). The direction, volume and fading of sounds constantly and unconsciously give us information about the location, distance and direction of the different objects in our surroundings (Schnupp et al. 2011). Even large objects that do not actively emit sound can be detected via echoes. Quite often, just by listening to the background noise, we can determine in what kind of place or landscape we are, and what kind of objects or persons are there and where they are in relation to us. The most important characteristics in the scope of this paper are the spatial characteristics of sound and its pervasiveness. Sounds are spatial elements: sound waves emanate from a central source and, as long as they are not obstructed, dissipate in a radial fashion. This has implications for the visualization of sound and for the impact of distance from the sound source (increasing or decreasing volume). Using these tenets, it should be possible to spatialize abstract audio data.
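To make the distance effect concrete: under the standard free-field assumption of acoustics (a textbook relation, not a model taken from the cited works), the intensity I of a point source with acoustic power P decays with the square of the distance r,

I(r) = P / (4πr²),

which corresponds to a drop in sound pressure level of roughly 6 dB per doubling of distance. Any volume–distance model for locative audio, such as the one proposed in Sect. 4, can be regarded as a (usually simplified) approximation of this relation.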

2.2.2 Sound and Information

Not only the spatial properties, but also the information content of audio can have a strong influence on the subjective perception of space. The subjective perception of space never corresponds to its technical presentation as a map (Popplow and Scherffig 2013). In a hybrid space [i.e., a digital information layer bound to a physical location and thus intertwining the digital and real worlds (de Souza e Silva 2006)], changes in the digital layer also change the perception of the real or spatial layer. Although the visual, tactile, olfactory and gustatory sensory inputs stay the same, a superimposed audio information layer will make the user perceive the world differently, depending on the content of the audio information. For example, the emotional background might change, or the attitudes towards certain landscape objects or even other non-participants might be augmented by music, which is the most emotionally powerful auditory signal (Steiner 2011). The narratives and attitudes conveyed will stay embedded in the cognitive processing of the users and can potentially define not only the present, but also the future perception of a specific place or landmark.

2.2.3 Sound and Interactivity

Yet another way in which mobile locative audio media can influence the perception of space is through the necessity of increased audience participation. A move from passive reception to active participation (Behrendt 2010) is necessary to enable the consumption of locative audio media. The breaking down, or at least diminishing, of the barrier between the audience and the narrative is another crucial implication to consider when discussing locative sound art. Users themselves become actively involved in the unraveling of the narrative, co-forming their own experiences. We would like to argue that, when done correctly, this could be used to provide a much more immersive narrative that connects at a personal level and thus transforms the subjective perception of space.

2.2.4 Narrative Structures

Sound can tell a story with a narrative structure, i.e., the framework that determines the order and manner in which a story is presented to the reader, viewer or (in our case) listener. Following that structure, narratives describe cause-and-effect relationships between events within a particular time period, impacting certain characters (Dahlstrom 2014).

Interactive locative audio media make it possible to develop semi-linear narratives that have a progressing but branching storyline, allowing users to go through events in different orders with plenty of choices, unlocking story nodes, and thus places, step by step. Semi-linear narration parallels contemporary video games; likewise, the techniques used in these games to animate and visualize maps reflect a mutually influencing and converging development with cartography, also in terms of increased realism and virtual worlds (Ahlqvist 2011; Edler and Dickmann 2017; Edler et al. 2018). Moreover, maps can be another means to represent a story's relationships with places and its spatio-temporal structure (Caquard and Cartwright 2014).

Indeed, several users of previous locative sound media prototypes have suggested in interviews (conducted in the scope of user testing) that the technology could be used to implement stories (Nyre et al. 2017). Usually, locative audio media applications are not able to implement linear or semi-linear narratives. It is also a challenge for authors, as branching narratives are much more complex to implement (Röber et al. 2006). A typical structure is illustrated in Fig. 1: the story starts at the root (start) node and then traverses down the branches until a terminal node is reached. Theoretically, this allows the construction of complex narratives. In practice, it might be best to focus on a few main, intertwining and closely connected storylines. Special attention should be paid to the interactive nodes where the storyline splits. These nodes should present the user with a clearly defined choice and enough incentives and information to make a decision. As with other game elements of interactive locative media, the storyline branching should always be meaningful and rooted in the narrative. A minimal code sketch of such a structure is given after Fig. 1.

Fig. 1 Branching narrative structure (based on Röber et al. 2006)
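To illustrate how such a branching structure with playback constraints can be represented in software, consider the following minimal Python sketch. It is not taken from any of the cited works or from the prototype described later; all class and field names are hypothetical:

```python
from dataclasses import dataclass, field

@dataclass
class StoryNode:
    """One node of a branching narrative (illustrative names)."""
    node_id: str
    audio_file: str
    children: list = field(default_factory=list)  # possible next nodes
    requires: set = field(default_factory=set)    # nodes that must have been played
    excludes: set = field(default_factory=set)    # nodes that must not have been played

def playable(node: StoryNode, played: set) -> bool:
    # A node may be played only if all prerequisites were heard and
    # no node of a mutually exclusive (rival) branch was heard.
    return node.requires <= played and not (node.excludes & played)

# A root that splits into two mutually exclusive branches:
start = StoryNode("start", "intro.mp3")
a = StoryNode("branch_a", "a.mp3", requires={"start"}, excludes={"branch_b"})
b = StoryNode("branch_b", "b.mp3", requires={"start"}, excludes={"branch_a"})
start.children = [a, b]

print(playable(b, played={"start", "branch_a"}))  # False: branch A locks out branch B
```

The two constraint sets correspond directly to the two node-ordering conditions for structured narratives proposed in Sect. 4.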

3 Existing Audio-Locative Mobile Applications

There are multiple mobile locative audio media applications currently on the market. These applications use different platforms, such as web and native applications, offer different content, or allow users to produce their own location-based stories by means of authoring tools. Some of these are paid services, while others are free of charge and/or non-commercial. Two major types of services can be differentiated: content providers and platform providers. The first group provides a ready-made application with defined content for user consumption. The second group provides the tools to either (a) participate in an open platform where location-based content can be shared, or (b) create one's own applications using the authoring interface provided. There are also a number of software development companies specializing in customized mobile location-based audio solutions.

Most of the time, the content of the content provider group is either professionally made or edited. Many location-based audio guide applications fit this category. The main audience for these applications are usually tourists who want to experience quality locative content. Three selected locative audio media solutions (content providers) currently on the market are:

  • OnSpotStory¹: OnSpotStory is a Scandinavian company specializing in locative audio-based guide development which markets its products as mobile storytelling. They offer an OnSpot community app that allows users to experience many of their guides, although the company focuses on custom solutions. Most of the clients of OnSpotStory are museums; nevertheless, they have a sizable collection of outdoor locative audio media: historic, fictional and touristic.

  • PocketGuideApp²: PocketGuideApp is a traditional service that offers location-based audio tours. The respective mobile app currently covers more than 150 cities worldwide, commissioning local guides to provide the content.

  • Zombies, Run!³: This gamified jogging application serves as a good example of a slightly different approach to location-based audio storytelling. Technically speaking, it should not be labeled location-based but rather kinetically aware storytelling. As long as the player is moving (in this case: jogging), the story advances. Occasionally, the player has to increase his speed to evade the zombies chasing him. Although different landmarks are frequently mentioned in the narrative, they are not associated with features in the real world. Instead, they are purposefully kept generic (e.g., school, hospital, square), so as to make use of the user's subjective interpretation. The movement of the user alone creates the link between storyscape and landscape.

The second type of services, the platform providers, has less influence on the locative audio that is being published. Thus, a high quality standard of the published material cannot be guaranteed. On the other hand, these services provide effective possibilities for spatial documentation, either by facilitating crowdsourcing of location-based content or by providing authoring tools. The platforms that offer authoring (and not just publishing) and application development tools enable a multitude of content types to be developed and published individually, allowing users to create mobile applications themselves. The audience of these services are users with a distinct interest in locative audio, possibly even a professional interest in the case of authoring tools. Two selected examples of platform providers are:

  • Radio Aporee⁴: Radio Aporee is a worldwide platform for collecting, archiving and mapping sound. A broad community of phonographers, artists and individuals working with sound and field recording has crowdsourced a comprehensive corpus of sounds from all over the world. Radio Aporee is primarily web based, but also has a mobile interface for playback. Miniatures for Mobiles is a separate web-based authoring tool to create location-based sound walks with a common theme.

  • Motive⁵: Motive is a professional toolkit for location-based AR. It is a comprehensive authoring tool, giving its clients the possibility to rapidly develop and publish their own customized AR applications. Motive does not limit itself to audio-based AR, but primarily offers more conventional visual interfaces as well as many gamification elements. It is a professional-grade application development tool aimed at clients that want to develop AR applications but do not have the resources to develop them from scratch.

4 Linking Sound, Space and Time Aiming at Immersion

The proposed application is based on two core ideas: (1) maximizing the interactivity and immersiveness of locative audio media and (2) enabling semi-linear narratives. Thus, the following list of ideas and features has been generated as a first attempt to suggest a catalog of design proposals:

  • The application should play back sounds based on the user's location and the location of a virtual sound point. This is the most basic tenet of locative in situ media. Sound always travels over space in time, emanating from the source, distributed over space and eventually fading (Behrendt 2012). Thus, notional coordinates need to be associated with a sound piece, which is played back when the user is in the vicinity of these coordinates in the real world (Behrendt 2015; Nyre 2015); that is, the user's distance from the sound source triggers playback (a minimal sketch of such proximity-triggered playback is given after this list). This will enable the creation of background soundscapes. Extremely large radii could cover a whole city, to set the general mood of the narrative, while smaller radii could serve as background soundscapes for different neighborhoods. Still smaller radii can then be used for the narrative elements of the story. While some sounds might need to be played only once, others might need to be played in a loop. Examples would be the explosion of a bomb or the pop of a balloon, played only once, versus the chirping of birds or the sound of rain, played as constant background sounds. A delay between repetitions aims to create more dynamic and natural background or environmental sounds. If looping is implemented, some sounds might benefit from regular intermissions (e.g., train station announcements repeated at regular intervals), whereas other sounds might benefit from irregular intervals, such as sounds of nature or other random environmental sounds. A feature of sound movement enables sounds that move over time on predetermined routes and at predetermined speeds. For example, a sound recording of a tram could be set to follow the actual tram tracks. This would also open up possibilities of moving narratives, forcing the user to move in the environment, following the sound nodes.

  • Playback should be determined based on modifiable parameters. To enable linear- and context-based narratives, the sound playback should depend on extrinsic factors. These factors will ensure the interactivity of the application and simultaneously enable the implementation of game-like elements in the application. Research suggests that especially story-based and playful elements are facilitating the engagement in exploratory and participative experiences (Hutzler et al. 2017).

  • The decision about the playback of certain sounds could depend on the speed of the user: if the user's speed is not within a certain range, the sound is not played. This enables narratives that are directly influenced by the movement speed of the user. For example, a narrative might request the user to run, and the user then hears different sounds based on his movement speed. The direction from which the user approaches the sound could also play a role in determining whether and which sounds are played. This can be used to direct the narrative and/or the user. For example, the narrative develops differently depending on the path the user has chosen to approach the sound node. It is easy to imagine a crime or spy story in which the user has to approach a certain sound node from a defined direction in order not to be seen. If the user succeeds, the narrative continues; otherwise, the story might follow a less fortunate path. More immersive soundscapes could be enabled by the possibility of setting the timespan during which a certain sound piece can be heard. Sounds that change dynamically according to not only spatial, but also temporal parameters will blend more seamlessly with the natural sonic background. The most obvious example would be the day/night rhythm of urban landscapes: hectic during the day, but much quieter and more mysterious during the night. This example has been suggested by Nyre (2015). Another use of temporal playback limitation would be the possibility to create daytime-dependent quests, i.e., pieces of the narrative that are only available during the night. To enable the possibility of a linear or semi-linear narrative, the order in which the different sound nodes are played needs to be determined. Therefore, two conditions should be considered: (1) sound nodes that must be played before the current node and (2) sound nodes that must not be played before the current node (cf. the sketch in Sect. 2.2.4). This feature is an essential element of the proposed application and allows the creation of structured narratives. A sketch of these speed, direction and time checks is given after this list.

  • Sounds should be visualized on a map, realizing audiovisual cartography. Although the main focus of the proposed locative audio media prototype is on audio interfaces, a visual graphical user interface is indispensable, for example, to show the user where sounds are located or to provide play/pause controls. Sound nodes should be graphically displayed on a map on the user's device to facilitate easier orientation; this is a standard feature of most location-based services. Depending on their different roles, sound nodes could also be displayed differently. The visualization should represent the actual sound node trigger radii (as a kind of sound bubble). Different colors could be assigned to the representations of the sound nodes, providing a simple and fast graphical overview of the different sound groups. For example, if a narrative presents several options for the user to choose from, they should be color coded to facilitate easier and faster orientation and decision-making. One can imagine a case in which the complete narrative is told from the viewpoint of several distinct protagonists, each being represented with markers of a certain color on the graphical user interface. An example would be an application addressing the Monday demonstrations in Leipzig (Germany) in 1989, told from the viewpoints of the demonstrators or the regime, each side distinctly color coded. Not all sound nodes should be marked on the graphical user interface; especially background or ambient sounds should not clutter the interface. These sounds contribute to the immersion, but since they are not directly relevant for the storyline, they should not be visible to the user.

  • The playback volume should depend on the distance to the audio source. To increase the subjective spatial perception of the sound nodes, a realistic sound volume–distance model is necessary: sounds that are closer to the user should be louder. The changing sound volume will convey a sense of distance to the user, thus placing the sounds in the actual landscape (see the volume and panning sketch after this list). On the other hand, it should be possible to disable this feature for some sounds to achieve uniform volume over the whole area.

  • Sounds with 3D qualities let the user tell from which direction a sound is coming. Sound directionality can be established based on the direction the user is facing. Assuming stereophonic sound reproduction (e.g., headphones), the volume of the left or right headphone speaker could change according to the user's bearing relative to the relevant sound node (Fig. 2; a simple panning model is sketched after this list). This feature will further serve to establish the spatiality of the placed sounds and contribute to the immersiveness of the whole experience. This aspect was explored and implemented early in the development of locative audio media by Helyer et al. (2009).

    Fig. 2 Direction-based binaural volume gradients

  • A choice between instantly and manually initiated playback should be possible, to distinguish between automatic, purely proximity-based playback and proximity-and-user-initiated playback. User-initiated playback should provide the usual controls available on common audio media players. In information-dense narratives, the users might want to rewind some of the audio pieces and listen to some parts again. Such an option should be provided with an interface that is already familiar to the user.

  • To create dynamic soundscapes, the ability to playback several independent sounds simultaneously is essential. This will enable a multi-layered experience (Bradley 2012). As the user moves through space, new sound nodes should seamlessly come into the aural perception sphere of the user and others should fade away. This overlapping swelling and fading away also forms an important part of the spatialization of digital audio.

  • It should be possible to save the progress of the narrative, enabling a return to the same point in the narrative if the user has to interrupt the current session.
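The following sketches illustrate how the mechanisms proposed above could be implemented. They are minimal Python illustrations under simplifying assumptions, not the actual code of the prototype, and all function and parameter names are hypothetical. The first sketch covers the basic geofence test and the regular or irregular intermissions between loop repetitions:

```python
import math
import random

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two WGS84 points in meters."""
    r = 6371000.0  # mean Earth radius
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def inside_bubble(user_lat, user_lon, node):
    # Geofence test: is the user inside the node's trigger radius?
    return haversine_m(user_lat, user_lon, node["lat"], node["lon"]) <= node["radius_m"]

def next_delay_s(node):
    # Intermission before the next repetition of a looping sound:
    # a fixed value gives regular loops (e.g., station announcements),
    # a range gives irregular, more natural loops (e.g., sirens, birds).
    lo, hi = node.get("delay_range_s", (0.0, 0.0))
    return random.uniform(lo, hi)

city_siren = {"lat": 51.03, "lon": 13.73, "radius_m": 250.0, "delay_range_s": (30.0, 60.0)}
print(inside_bubble(51.0301, 13.7302, city_siren))  # True: roughly 18 m away
```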
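The second sketch covers the contextual playback conditions: user speed, approach direction and time of day. Angle arithmetic wraps around 360°, and the time window also handles spans that cross midnight:

```python
import math
from datetime import time as dtime

def bearing_deg(lat1, lon1, lat2, lon2):
    # Initial bearing from point 1 to point 2 (0 = north, clockwise).
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlam = math.radians(lon2 - lon1)
    x = math.sin(dlam) * math.cos(p2)
    y = math.cos(p1) * math.sin(p2) - math.sin(p1) * math.cos(p2) * math.cos(dlam)
    return (math.degrees(math.atan2(x, y)) + 360.0) % 360.0

def speed_ok(speed_ms, min_ms, max_ms):
    # Kinetic constraint, e.g., "only while running".
    return min_ms <= speed_ms <= max_ms

def approach_ok(user_course_deg, required_deg, tolerance_deg=45.0):
    # Is the user's course within the tolerance of the required
    # approach direction? The difference is wrapped to [0, 180].
    diff = abs((user_course_deg - required_deg + 180.0) % 360.0 - 180.0)
    return diff <= tolerance_deg

def time_ok(now, start, end):
    # Temporal window; the 'else' branch handles windows past midnight.
    return start <= now <= end if start <= end else (now >= start or now <= end)

# Example: a node audible only between 1 p.m. and 7 p.m.
print(time_ok(dtime(15, 30), dtime(13, 0), dtime(19, 0)))  # True
print(approach_ok(350.0, 10.0))                            # True: 20 deg apart
```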
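The third sketch combines a simple volume–distance model with direction-based stereo panning as depicted in Fig. 2. As a simplification, a linear rolloff inside the trigger radius is assumed instead of the physical inverse-square decay (see Sect. 2.2.1), and a constant-power pan distributes the signal between the left and right channels:

```python
import math

def distance_gain(dist_m, radius_m, rolloff=1.0):
    # Full volume at the source, silent at the edge of the bubble.
    if dist_m >= radius_m:
        return 0.0
    return (1.0 - dist_m / radius_m) ** rolloff

def stereo_gains(user_heading_deg, bearing_to_node_deg):
    # Relative angle of the node maps to left/right channel gains;
    # left^2 + right^2 == 1 keeps the perceived loudness constant.
    rel = math.radians((bearing_to_node_deg - user_heading_deg + 180.0) % 360.0 - 180.0)
    pan = math.sin(rel)  # -1 = fully left ... +1 = fully right
    left = math.cos((pan + 1.0) * math.pi / 4.0)
    right = math.sin((pan + 1.0) * math.pi / 4.0)
    return left, right

# A node 100 m away inside a 250 m bubble, 90 deg to the user's right:
g = distance_gain(100.0, 250.0)
left, right = stereo_gains(0.0, 90.0)
print(round(left * g, 2), round(right * g, 2))  # 0.0 0.6
```

Note that a pure left/right pan cannot distinguish sounds in front of the user from sounds behind; a full binaural rendering would require head-related transfer functions, which is beyond the scope of this sketch.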

Table 1 compares the features of the existing apps introduced in the previous chapter with the features suggested above. It reveals that all of them contain a map, as they are location based, but the other features are only partially implemented or not at all. The ideas of a delay between repetitions, moving sounds, playback dependent on the approach direction and color-coded sound nodes on the map cannot be found in any of the apps.

Table 1 Summary of features of existing locative media applications

5 Application Prototype

To realize some of the ideas for interactive locative audio media developed conceptually and outlined above, a test application was implemented. The goal of this first implementation was to test the basic functionality of the features and to facilitate debugging. Thus, a simplistic narrative structure (with no specific narrative content) was designed, with two intertwining storylines, represented in blue and green in Fig. 3. Each storyline possesses several interactive elements. The following interactive constraints were implemented:

Fig. 3 Narrative structure of the prototype application

  1. The blue story point number 2 consists of two sound nodes. The approach direction determines which one is played back, thus achieving different narrative developments based on the user's approach direction.

  2. The knot story point also consists of two sound nodes. Playback is determined by the user's speed.

  3. The end story point number 2 is only visible between 1 PM and 7 PM.

  4. Furthermore, there are two background audio tracks with different properties: siren sounds at irregular intervals of 30–60 s and the sound of rain, looping constantly.

Altogether, 12 audio tracks were created for the test prototype. Different sound node radii and visualization colors were implemented, and different delays between playbacks as well as different volume modifiers were applied. The sound nodes were positioned on the TU Dresden campus to facilitate easy access for testing purposes.

The objective of the prototype's user interface is to enable user-friendly access to the full functionality of the application. Furthermore, since the proposed mobile locative audio media prototype is centered on sound, the visual interface should be as unobtrusive as possible: the less the user has to interact with the screen of the mobile device, the better. A map is provided to help the user orientate himself/herself in the landscape and find the relevant positioned narrative points. A first sketch of the design is proposed in Fig. 4. It consists of only one screen page with a map, to keep the user interface structure as minimalistic as possible. Access to a list of narrative points is enabled through a side drawer, and clicking on any entry in the list displays information about the point. This provides a tactile and smooth interaction, which should guarantee a fast and intuitive first experience with the application. For the same reason, it is a good idea to emulate the look and feel of Google Maps, as most users are already familiar with the corresponding interface.

Fig. 4 First draft sketch of the user interface

After launching the application, the user sees only the yellow start sound node. This serves as an introductory node, launching the narrative. After the user has entered the yellow sound bubble, the first narrative element launches. Simultaneously, the next two narrative options become unlocked (Fig. 5a), and by approaching one of them, the user determines which storyline will unfold (Fig. 5b, c). The storylines then branch as indicated in Fig. 3 and the user determines the audio content that is delivered based on the sound nodes he decides to visit, but also on his speed and approach direction. The narrative ends as soon as one of the end nodes is reached (Fig. 5d).

Fig. 5 Screenshots of the prototype application

The following list specifies implemented methods that are crucial for the prototype application and intertwine location, time and sound (a sketch combining them follows the list):

  • Verifying if the user is within the playback radius of the sound node.

  • Determining if the user speed is within the necessary speed range for playback.

  • Verifying if the user is approaching the sound node from the correct direction.

  • Adjusting volume for simulating the spatial properties of sounds depending on two factors: user distance and bearing.

  • Verifying if sound should be played based on the time of day.
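Assuming helper functions like those sketched in Sects. 2.2.4 and 4 (again with hypothetical names, not the prototype's actual code), these five checks can be composed into a single playback decision:

```python
def should_play(node, user, now, played):
    # Combine the spatial, kinetic, directional, temporal and
    # narrative checks into one gate for sound node playback.
    dist = haversine_m(user.lat, user.lon, node.lat, node.lon)
    return (dist <= node.radius_m
            and speed_ok(user.speed_ms, node.min_speed_ms, node.max_speed_ms)
            and approach_ok(user.course_deg, node.approach_deg)
            and time_ok(now, node.start_time, node.end_time)
            and playable(node, played))
```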

Regarding the four categories of locative sound media described in Sect. 2.1, the category of placed sounds matches the developed mobile locative audio media prototype best. However, aspects of the category sonified mobility are also present, albeit to a much lesser degree.

6 Summary and Outlook

As shown previously, there are many location-based audio applications on the market today. Most of them are intended for the tourism industry, providing a ‘tour guide in your pocket’. Tours for most of the bigger cities can be found on different platforms. There are also several art-centered locative sound projects with a rather large audience. In addition, quite a lot of more or less experimental ‘one-shot’ locative audio projects have been completed. However, a comprehensive platform for audio-based locative storytelling is lacking. The main obstacle is the inability to implement linear and semi-linear narrative structures in otherwise unstructured locative audio data.

In the scope of this work, a prototype of a purpose-built interactive locative audio storytelling application has been developed. The main goal was to design a media prototype that would allow the implementation of location-based audio books: linear or semi-linear narratives that can be experienced only in certain localities. Additional interactive elements were implemented to allow for more narrative branching. The resulting prototype could aptly be described as a story game with spatial properties, allowing an aural embedding of virtual objects into our perceived reality and creating an audio-based augmented reality experience. To further develop this locative storytelling medium, it is indispensable to conduct user tests and to produce quality content. For this, collaboration between authors, screenwriters and cartographers would be beneficial.