1 Introduction

Sensations accompanying walking on natural ground surfaces in real world environments are rich, multimodal and highly evocative of the settings in which they occur [110]. For these reasons, foot-based human–computer interaction represents a new means of interacting in Virtual Reality (VR), with potential applications in areas such as architectural visualization, immersive training, rehabilitation or entertainment. However, floor-based multimodal (visual, auditory, tactile) information displays have only recently begun to be investigated [108]. Research work has remained limited as there has been a lack of efficient interfaces and interaction techniques capable of capturing touch via the feet over a distributed display. Related research on virtual and augmented reality environments has mainly focused on the problem of natural navigation in virtual environments [56, 81, 96]. A number of haptic interfaces for enabling omnidirectional in-place locomotion in virtual environments have been developed [46], but known solutions either limit freedom in walking, or are highly complex and costly.

The rendering of multimodal cues combining visual, auditory and haptic feedback has rarely been exploited when walking in a virtual environment. Many aspects of touch sensation in the feet have been studied in prior scientific literature, including its roles in the sensorimotor control of balance and locomotion over different terrains. However, considerably less is known about how the nature of the ground itself is perceived, and how its different sensory manifestations (touch, sound, visual appearance) and those of the surroundings contribute to the perception of properties of natural ground surfaces, such as their shape, irregularity, or material composition, and our movement upon them. Not surprisingly, then, in the human–computer interaction and virtual reality communities, little research attention has been devoted to the simulation of multisensory aspects of walking surfaces in ways that could parallel the emerging understanding that has, in recent years, enabled more natural means of human–computer interaction with the hands, via direct manipulation, grasping, tool use, and palpation of virtual objects and surfaces.

The present chapter reviews recent interactive techniques that have contributed to the development of multimodal rendering of walking in virtual worlds by reproducing virtual experiences of walking on natural ground surfaces. These experiences are enabled primarily through the rendering and presentation of virtual multimodal cues of ground properties, such as texture, inclination, shape, material, or other affordances in the Gibsonian sense [37]. The related work presented in this chapter is organized around the hypothesis that walking, by enabling rich interactions with floor surfaces, consistently conveys enactive information that manifests itself through multimodal cues, especially via the haptic and auditory channels. In order to better distinguish this investigation from prior work, we adopt a perspective in which vision plays a primarily integrative role linking locomotion to obstacle avoidance, navigation, balance, and the understanding of details occurring at ground level. For this reason, we do not detail the visual rendering of walking over virtual grounds itself.

This chapter includes the presentation of (1) multimodal rendering techniques for the interactive augmentation of otherwise neutral (i.e., flat, silent, and visually homogeneous) ground surfaces; (2) multisensory effects and cross-modal illusions, involving the senses of touch, kinesthesia, audition, and vision, that were made possible by novel interfaces.

The chapter is organized as follows. Section 12.2 is dedicated to auditory rendering. Section 12.3 begins with the description of haptic rendering approaches, before presenting multimodal systems and cross-modal approaches. Section 12.4 concludes this chapter.

2 Auditory Rendering

2.1 Introduction

A walking task can be said to be intimately linked to a corresponding auditory task. Not only do walkers constantly hear most of their own footsteps and foot movements, but they are typically also aware of other persons walking in a shared auditory scene. In parallel, the same scene may be populated by passive listeners who, while standing or sitting, and not necessarily visually attending to pedestrians in their surroundings, may nevertheless perceive the footsteps as part of the ambient soundscape.

These simple considerations already say much about the importance of auditory cues in informing one's perception and action loop during a walking task and, furthermore, in conveying information that can have social relevance when it contributes to form a soundscape shared by several listeners.

As with any other type of non-visual, ecological feedback, footstep sounds can occupy the periphery of attention. In other words, we do not consciously attend to this feedback unless it brings salient cues to our ears, either familiar or unexpected. A similar process happens, for instance, when a car driver's attention is triggered by an almost imperceptible change in the sound of the engine signaling a potential malfunction, even after hours on a long trip along a monotonous highway [75]. We do not need much quantitative science to establish these observations empirically: the use of footstep sounds as an auditory warning has long been recognized by movie directors, who would ask their Foley artists to prepare the right walking sound when a new character entered the scene, or to sonify the night-time chases typical of the “noir” genre.

It should be clear, at this point, that walking sounds are expressive. Through long familiarity with our own and others' footsteps, we build subjective mental maps linking such sounds to corresponding physical attributes and gestures of the walking person. Some of these links are obvious, and have been exploited, for instance, in early computer game designs. All vintage electronic game players probably remember the use of iconic footstep sounds to render the number and moving speed of the enemies in Space Invaders™, a popular computer game of the late 1970s: the designers of that game arrived at a successful design by making effective use of extremely simple sound elements, whose static repetitiveness well expressed the martial attitude of the adversarial squadron.

2.1.1 Psychoacoustic Measurements

The expressivity of footsteps has been analyzed from a scientific perspective as well. On the experimental side, Pastore et al. have adopted an ecological approach to the auditory perception of footsteps. Their experiments investigated the ability of listeners to recognize walkers' gender from walking sounds [61], as well as different gait kinematics in people walking with either a normal upright or a stooped posture [78]. Experiments have also been conducted on the recognition of familiar individuals from their footstep sounds [28]. In all such investigations, effort has been devoted to identifying the acoustic invariants responsible for the subjective decisions. Arguably, such invariants span a multiplicity of auditory cues. In particular, the demonstrated dependency of these cues on specific spectral features, such as spectral slopes, moments, and centroids, can make this perceptual research especially informative for auditory rendering purposes.

A parallel thread in the acoustic analysis of footsteps has concerned their recognition with respect to specific characteristics of the ground. Although starting from an engineering perspective, this thread has introduced even deeper arguments in favor of an ecological approach to these experiments. Cress measured, and then modeled, the acoustic response of outdoor ground sites to individuals who were crawling, walking, and running: not only did he establish the dependence of the response spectra on the ground characteristics of the site; he also showed the relative invariance across frequency of the bands of spectral energy with respect to the walking activities [17]. These conclusions did not contradict earlier assessments made by Watters, who had found that impact force values measured from a single hard-heeled female footstep depend on the floor type [113]. Stimulated by these experiences, Ekimov and Sabatier searched for broad-band components of footstep sound signatures for different floor materials and walking styles: although the high-frequency band of these signatures contains most of the information about the frictional (i.e., tangential force) components giving rise to footstep sounds, the same band was shown to be relatively invariant with respect to changes in both floor covering and walking style [25]. Irrespective of their specific conclusions, these studies have together called for introducing the floor dimension into the psychophysics of footstep recognition.

Research in this area has, consequently, begun to reveal the mechanisms underlying the active recognition of footsteps over different grounds. In such cases, subjects are engaged in a perception and action (walking) task, i.e., they are not just passive listeners, and thus the recognition process also involves the tactile sensory channel. In one such investigation, by masking the tactile channel using active shoes capable of generating vibrational noise at sole level, Giordano et al. were able to study walkers' abilities to identify different ground surfaces comprising both solid materials (e.g., marble, wood) and granular media (e.g., gravel, sand) when alternately auditory, haptic, or audio-haptic information was available. The authors found that walkers could perform this perceptual task through a variety of different sensory modalities [39].

2.1.2 Premises for an Auditory Rendering of Grounds

The latter experiment is even more interesting, since the walkers' experience was augmented with elements of synthetic feedback, specifically to mask tactile cues of real ground materials. This design strategy opens new scenarios, in which the non-visual “ground display” (as it is perceived by walkers) is contaminated with synthetic cues that mix with the rest of the floor feedback. Although that experiment clearly shows the limits on the salience of auditory feedback when it does not match the simultaneous (in that case noise-masked) tactile cues, it nevertheless leaves room for sound as a means of enriching the information brought by these cues. Specifically, one may think of moulding an otherwise neutral tactile feedback, such as that experienced while walking on a silent, homogeneous, flat and solid floor, using auditory cues that report a different type of ground; likewise, one may try to bias a multimodal stream of ground cues by altering some of their auditory parameters through the use of virtual sounds, without breaking the overall coherence of the feedback. In both cases, however, an artificial perturbation of the auditory feedback can shape the recognition of a floor without disrupting the perceived realism of the multimodal percept only if it elicits some form of cross-modal (specifically, audio-tactile) illusion.

Several cross-modal tactile effects induced by auditory cues have been discovered [8, 47]. In the following, we report on recent studies that have investigated partial or total sensory substitutions of ground attributes in walkers, who were presented with virtual auditory cues of the ground using different techniques, reproduction methods, experimental setups, methodologies and tasks. Before these studies, the state of the art in footstep sound rendering models is surveyed, from the early systems to current developments. The section concludes with guidelines for sound designers interested in realizing interactive floors that include the auditory modality as part of their multimodal feedback.

2.2 Footstep Sound Synthesis

As the preceding discussion suggests, the acoustic reproduction of walking requires auditory information to be rendered at two levels at least: a low level, accounting for the sonic signature of a single footstep, and a high level, describing the frequency of the walking cycle and its fluctuations over time. Further cues would be needed to render spatial movement across a walking area: although necessary to define a realistic soundscape, such cues are closely related to the spatialization features of sound reproduction, an issue that raises questions of 3D audio, a research and application field whose specific links to the rendering of walking sounds are treated in Sect. 12.2.3.

High-level cues are intuitively not too difficult to render, provided that a sufficiently large collection of data is made available for inferring a convenient statistical model of the walking cycle of a homogeneous population. More interesting are the constraints among instances of such cycles taking place in collective contexts, giving rise to entrainment effects [104]: the exact role of sound in these effects is currently unknown, in spite of a considerable number of works dealing with the relationships between the gait cycle and rhythmic (especially musical and dance) sonic patterns [93].
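
As a minimal illustration of how such high-level cues might be driven once timing statistics are available, the following Python sketch samples inter-step onset times from a simple Gaussian model of the walking cycle; the parameter values are placeholders rather than fitted data.

```python
import random

def gait_onsets(mean_interval=0.55, jitter_sd=0.03, n_steps=20, seed=None):
    """Generate footstep onset times (in seconds) by sampling inter-step
    intervals from a Gaussian model of the walking cycle. The mean cadence
    and jitter are illustrative placeholders; in practice they would be
    fitted to recorded gait data for the population of interest."""
    rng = random.Random(seed)
    t, onsets = 0.0, []
    for i in range(n_steps):
        onsets.append((t, "left" if i % 2 == 0 else "right"))
        # each interval fluctuates slightly around the mean cadence
        t += max(0.1, rng.gauss(mean_interval, jitter_sd))
    return onsets
```

Each onset would then trigger one instance of the low-level footstep model discussed next.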

Low-level cues represent an even more challenging design issue. By carrying information about the interactions taking place during the contact between the foot and the ground, they mainly report the materials of which the floor and the shoes are made. For this reason, the accuracy of their reproduction depends on the ability to embed this information within a sound synthesis model. Normally, these models must keep parametric control of both the temporal and the spectral features of the synthesis: as we will see in Sect. 12.2.2.2, the former are especially important for determining the correct particle density during the reproduction of aggregate grounds, such as those made of crumpling materials, ice, snow, or creaking wood; the latter, conversely, provide a unique color to the contact events, and hence become crucial in interactions with solid floors, where the entire footstep sound is represented by one or very few contact events.

Further information, concerning several characteristics of a walker (weight, height, age, sex), results from the interplay of low- and high-level cues and the information they provide about the foot gestures, postural habits and locomotion style of the walking person: a credible rendering of footstep sounds must also account for this interplay, for which a comprehensive collection of kinematic and biomechanical data is not yet available [22]. This and other knowledge gaps currently make the design of interactive walking sound synthesizers a difficult task.

2.2.1 Early Models

The first systematic attempt to synthesize walking sounds was proposed by Cook in 2002 [14]. In this pioneering system, built on an STK-based sound engine known as Bill's Gait, the author introduced research elements that remain stimulating today. In particular, Bill's Gait successfully implemented a number of solutions that are still largely state-of-the-art in real-time sound processing: it included an analysis procedure based on Linear Predictive Coding for the extraction of footstep color, a wavelet analysis for estimating the particle density, and envelope following of the gait sequence for informing the higher-level statistics on the amplitude and frequency of the walking cycle. The model could store footstep signatures extracted from sound signals recorded during foot interactions with diverse floors. The same signatures could be reproduced online essentially by reversing this procedure, i.e., by mapping the predictor onto the coefficients of a parametric re-synthesis filter, and by feeding this filter with signals having the temporal density and envelopes calculated during the analysis.

Especially innovative and rewarding in this modeling approach was its tight interactivity with non-musical sound events. Not only did this system allow straightforward connection of floor interfaces such as sensing mats; it also made a palette of controls available to users, who could manipulate the synthesis parameters to trim the results of the analysis and, furthermore, introduce their own taste into the footstep sounds. A similar interaction design approach was followed by Fontana and Bresin one year later, in the form of C external code for the Puredata real-time environment, limited to the interactive simulation of aggregate grounds [30]: unlike Cook's, their model was completely independent of pre-recorded material, relying instead on a physics-based impact model simulating a point-wise mass colliding against a resonant object through a nonlinear spring. This model was employed to generate bursts of micro-impacts in real time, whose individual amplitudes and temporal density followed stochastic processes taken from physical descriptions of crumpling events. Such descriptions expose macro-parameters (of amplitude and temporal density, respectively) that, for the purpose of this model, could be used for user control. Finally, an amount of potential energy could be set which was progressively consumed by the micro-impacts during every footstep: this feature made it possible to trigger a footstep on a specific floor directly, i.e., with no further information needed, and allowed the authors to reproduce slow-downs taking place at the end of a run, based on assumptions about human movement that have links to musical performance.
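
The following Python sketch illustrates the gist of such a crumpling process, not the authors' actual implementation: micro-impact arrival times and amplitudes follow simple stochastic laws (the exponential inter-arrival times and power-law-shaped amplitudes are assumptions here), and the footstep terminates once a preset budget of potential energy has been consumed.

```python
import random

def crumpling_footstep(energy=0.1, density=300.0, amp_scale=0.05,
                       amp_exponent=1.5, sr=44100, seed=None):
    """Generate (sample index, amplitude) pairs for the micro-impacts of one
    footstep on an aggregate ground. Arrival times are Poisson-like with
    `density` events per second, amplitudes follow a power-law-shaped
    distribution, and the process stops when the potential `energy` budget
    is spent. All distributions and constants are illustrative."""
    rng = random.Random(seed)
    t, events = 0.0, []
    while energy > 0.0:
        t += rng.expovariate(density)           # stochastic temporal density
        amp = amp_scale * rng.random() ** amp_exponent
        cost = amp * amp                        # energy spent by this micro-impact
        if cost > energy:
            amp, cost = energy ** 0.5, energy   # spend whatever energy remains
        events.append((int(t * sr), amp))
        energy -= cost
    return events
```

Each pair would then trigger one instance of the low-level impact model, so that the duration and density of the burst emerge from the energy budget rather than being fixed in advance.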

Both models brought the closed-loop interaction paradigm to the specific area of interactive walking simulation. This paradigm is even more constraining in the case of acoustic rendering, as the system has only a few milliseconds to display the response signal following an action of a foot that is in contact with a sensing floor or wearing an instrumented shoe. Since then, further work has aimed at refining the mappings linking foot actions to the synthesized sound. In particular, an attempt to integrate some biomechanical parameters of locomotion, particularly the ground reaction force, in a real-time footstep sound synthesizer was made by Farnell in 2007 [27]. The result was a patch for Puredata, intended also to provide an audio engine for computer games in which walking is interactively sonified.

2.2.2 Current Approaches to Walking Sound Synthesis

The synthesis of walking sounds has recently centered on multimodal, interactive contexts in which users are engaged in a perception and action task. Indeed, given the aforementioned lack of robust mappings between biomechanical data of human walking and dynamic contact laws for grounds with different properties, if the listener is not physically walking then the synthesis model can conveniently be replaced by a good dataset of footstep sounds recorded over a multiplicity of grounds, managed by an intelligent agent capable of understanding the context. This happens, e.g., in recent videogames, where the scenarios and situations in which the game characters are engaged provide the ground parameters and kinematic data needed to select appropriate elements from a knowledge base. Extremely accurate collections of walking sounds, suitable for creating such a dataset, exist in commercial repositories such as http://sounddogs.com.
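
In such a data-driven setting, the selection logic can reduce to a simple lookup from context to recorded sample. The sketch below assumes a hypothetical directory layout and file naming; it is only meant to illustrate the idea, not any particular game engine's API.

```python
import random
from pathlib import Path

class FootstepBank:
    """Select a recorded footstep sample from a dataset organized as
    <root>/<ground>/<pace>/*.wav (a hypothetical layout). Immediate
    repetition of the same file is avoided to reduce the 'machine-gun'
    effect of identical consecutive samples."""

    def __init__(self, root):
        self.root = Path(root)
        self.last = None

    def pick(self, ground="gravel", speed_mps=1.4):
        pace = "run" if speed_mps > 2.0 else "walk"
        candidates = sorted((self.root / ground / pace).glob("*.wav"))
        if not candidates:
            raise FileNotFoundError(f"no samples for {ground}/{pace}")
        choices = [c for c in candidates if c != self.last] or candidates
        self.last = random.choice(choices)
        return self.last
```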

Somewhat closer to an interactive synthesis paradigm, a hybrid model has recently been proposed based on a simplified version of Cook's method, relying on the temporal envelope control of filtered noise [31]. As Fig. 12.1 shows, every footstep results from weighting the output of a series of linear filters with a temporal envelope function. Both the filter coefficients and this function encode a characteristic locomotion style on a specific ground material, whose sonic signature is extracted from a set of recorded samples: the former are obtained by Linear Predictive Coding of these samples, the latter by defining a force-dependent stochastic process on top of the same recorded information.

Fig. 12.1 Hybrid synthesis of footstep sounds [31]
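
A compact sketch of this hybrid scheme is given below, with placeholder analysis data standing in for the LPC coefficients and the envelope that would normally be derived from recordings; it illustrates the structure of Fig. 12.1 rather than reproducing the published implementation.

```python
import numpy as np
from scipy.signal import lfilter

def hybrid_footstep(lpc_coeffs, envelope, gain=1.0, seed=None):
    """Hybrid footstep synthesis: white noise is shaped by an all-pole filter
    whose coefficients come from offline Linear Predictive Coding of recorded
    footsteps on the target material (spectral signature), and the result is
    weighted by a temporal envelope derived from the same recordings
    (temporal signature). Both inputs are assumed to exist already."""
    rng = np.random.default_rng(seed)
    excitation = rng.standard_normal(len(envelope))
    colored = lfilter([1.0], lpc_coeffs, excitation)   # spectral coloring
    return gain * envelope * colored                   # temporal weighting

# Placeholder example: a mild resonance and a raised-cosine envelope stand in
# for the quantities that the analysis stage would provide.
sr = 44100
n = int(0.25 * sr)
env = 0.5 * (1 - np.cos(2 * np.pi * np.arange(n) / n)) ** 2
footstep = hybrid_footstep([1.0, -1.6, 0.81], env)
```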

The approach based on datasets or hybrid generation becomes less attractive when the auditory feedback must tightly follow the locomotion and foot gestures of the walkers. As noted above, in this situation users are not passive listeners: they are engaged in a perception and action task. In this case, also in the light of the psychoacoustic experiments described previously, the applicability of interactive sound rendering is necessarily limited, since real walking cannot be substituted with a virtual experience, nor can the auditory cues contradict the tactile perception through the feet. For this reason, the synthesis models currently receiving most attention are those capable of rendering aggregate grounds. The related cues, in fact, can conveniently “overwrite” the feedback provided by flat, homogeneous and sufficiently silent floors, such as those found in ordinary buildings and other urban spaces. For these floors, interesting augmentations can be realized especially if companion vibrotactile cues of aggregate ground material are provided underfoot, simultaneously with the corresponding auditory feedback.

The Natural Interactive Walking EU project, active until fall 2011, put major emphasis on the audio-tactile augmentation of otherwise neutral floors through the use of active tiles as well as instrumented shoes. Both interfaces, detailed in Sect. 12.3.3, were designed on the fundamental hypothesis that a credible, yet informative, augmentation of a flat, solid floor could be realized via the superposition of virtual audio-tactile cues. As noted in Sect. 12.2.3, in practice these cues had to guarantee an especially “strong” characterization to walkers with normal sensory abilities, mainly to counterbalance the unavoidable bias caused by the visual appearance of a ground surface: silent floors, then, were augmented so as to sound either like aggregate grounds or like strongly coloring (e.g., wooden) surfaces.

Effective audio-tactile simulations of aggregate and resonant ground categories have been obtained through physically based sound synthesizers, whose low-level core makes use of the same dynamic impact model as that used by Fontana and Bresin [30]. In a phenomenological sense, physics-based models have the fundamental advantage of providing coherent multimodal feedback: since they reproduce force and velocity signals, their response can be used directly to mechanically excite the resonant body, in our case a floor; once this excitation is known, along with the resonance properties of that floor, it is not difficult to obtain both sounds and vibrations from it. Specifically, a footstep sound can be considered the result of multiple micro-impacts between a shoe and a floor. In the case of solid materials, these converge to form a unique percept consisting of a single impact; in the case of aggregate materials, they result in a more or less dispersed, yet coherent, burst of impulsive sounds.

An impact involves the interaction between two bodies: an active exciter, i.e., the impactor, and a passive resonator. Sonic impacts between solid surfaces have been extensively investigated, and results are available which describe relationships between physical and perceptual parameters of the objects in contact [52, 103]. The simplest approach to the synthesis of such sounds is based on a lumped source-filter model, in which a signal \(s(t)\) modeling the excitation is passed through a linear filter with impulse response \(h(t)\) modeling the resonator, resulting in an output expressed by the linear convolution of the two signals: \(y(t) = s(t)*h(t)\). A more accurate reproduction of the contact between two bodies can be obtained by simulating the nonlinear dynamics of this contact: a widely adopted description considers the force \(f\) between them to be a function of the compression \(x\) of the exciter and the velocity of impact \(\dot{x}\), depending on the elasticity parameters of the materials, the masses, and the local geometry around the contact surface [3]:

$$\begin{aligned} f(x,\dot{x}) = \left\{ \begin{array}{ll} -k x^\alpha - \lambda x^\alpha \dot{x}, & \quad x > 0 \\ 0, & \quad x \le 0 \end{array} \right. \end{aligned}$$
(12.1)

where \(k\) accounts for the material stiffness, \(\lambda\) represents the force dissipation due to internal friction during the impact, and \(\alpha\) depends on the local geometry around the contact surface. When \(x \le 0\) the two bodies are not in contact.
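
The following sketch shows how Eq. (12.1) might be integrated numerically to obtain a contact-force signal that can then excite a resonator model of the floor; the explicit Euler scheme and all parameter values are illustrative assumptions, not calibrated data.

```python
import numpy as np

def impact_force(m=0.05, k=1e6, lam=100.0, alpha=1.5, v0=1.0,
                 sr=44100, duration=0.01):
    """Explicit-Euler integration of the nonlinear impact law of Eq. (12.1):
    a point mass m strikes the resonator with velocity v0, and x denotes the
    compression (x > 0 while the bodies are in contact). The returned force
    signal can be used to excite a model of the floor's resonances."""
    n = int(sr * duration)
    dt = 1.0 / sr
    x, v = 0.0, v0                  # compression and compression velocity
    force = np.zeros(n)
    for i in range(n):
        if x > 0.0:
            f = k * x**alpha + lam * x**alpha * v   # elastic + dissipative terms
        else:
            f = 0.0                                  # bodies not in contact
        force[i] = f
        v -= (f / m) * dt            # contact force decelerates the impactor
        x += v * dt
    return force
```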

Friction is another crucial mechanism underlying footstep sound generation [36]. This phenomenon has been synthesized as well, by means of a dynamic model in which the relationship between the relative velocity \(v\) of the bodies in contact and the friction force \(f\) is represented as a differential problem [4]. Assuming that friction results from a large number of microscopic elastic bonds, also called bristles, the velocity-to-force relationship \(f(\ldots ,v,\ldots )\) is expressed as:

$$\begin{aligned} f(z,\dot{z},v,w) = \sigma _0 z + \sigma _1 \dot{z} + \sigma _2 v + \sigma _3 w \end{aligned}$$
(12.2)

where \(z\) is the average bristle deflection, the coefficient \(\sigma _0 \) is the bristle stiffness, \(\sigma _1 \) the bristle damping, and the term \(\sigma _2 v \) accounts for linear viscous friction. The fourth component \( \sigma _3 w\) relates to surface roughness, and is simulated as fractal noise.
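
A minimal sketch of how Eq. (12.2) can be evaluated sample by sample is given below. The bristle state z is updated here with the standard LuGre law dz/dt = v - sigma_0 |v| z / g(v), a simplification of the elasto-plastic formulation cited in the text, and the roughness term w is approximated by white noise rather than fractal noise; parameter values are illustrative.

```python
import numpy as np

def friction_force(v, sr=44100, sigma0=1e4, sigma1=10.0, sigma2=0.5,
                   sigma3=0.05, fc=1.0, fs=1.5, vs=0.05, seed=None):
    """Evaluate the friction output equation (12.2) over a velocity signal v.
    The bristle deflection z follows the standard LuGre update
    dz/dt = v - sigma0*|v|*z/g(v), with g(v) a Stribeck curve between the
    Coulomb level fc and the stiction level fs; the sigma3*w roughness term
    is approximated as white noise. Constants are illustrative."""
    rng = np.random.default_rng(seed)
    dt = 1.0 / sr
    z = 0.0
    out = np.zeros(len(v))
    for i, vi in enumerate(v):
        g = fc + (fs - fc) * np.exp(-(vi / vs) ** 2)   # Stribeck function
        zdot = vi - sigma0 * abs(vi) * z / g
        z += zdot * dt
        w = rng.standard_normal()                       # roughness noise
        out[i] = sigma0 * z + sigma1 * zdot + sigma2 * vi + sigma3 * w
    return out
```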

2.3 Walking Sounds and Soundscape Reproduction

The algorithms described in the previous section provide faithful simulations of walking sounds on different surfaces. In order to achieve realistic simulations of virtual environments, it is important to provide a context for such sounds, i.e., to be able to render them as occurring in specific locations.

“Spaces speak, are you listening?” asks the title of a book by Blesser and Salter, which explores the topic of aural architecture from an interdisciplinary perspective spanning audio engineering, anthropology, human perception and cognitive psychology [7]. Indeed, listening to a soundscape can provide useful information about the size of a space, the location, and the events happening there. The sounds associated with a place can also evoke emotions and memories. Moreover, when exploring a place by walking, at least two categories of sounds can be identified: the person's own footsteps and the surrounding soundscape. Soundscape studies originated with the work of Murray Schafer [87]. Among other ideas, Schafer proposed soundwalks as an empirical method for identifying the soundscape of a specific location. During a soundwalk it is important to pay attention to the surrounding environment from an auditory perspective, while physically blocking input from a stronger sensory modality such as vision, for example by walking blindfolded. Schafer claimed that each place has a soundmark, i.e., sounds with which that place is identified.

When reproducing real soundscapes in laboratory settings, several challenges arise, from both the designer's and the technologist's points of view. From the designer's point of view, the main challenge is how to select the different sonic events that, combined together, produce a specific soundscape. From this perspective the scientific literature is rather scarce, and the approach usually adopted relies mainly on the artistic skills and intuitions of the sound designer. An exception is the work of Chueng [11], who suggested designing soundscapes based on users' expectations. Her methodology consists of asking people which sounds they associate with specific places, and then using their answers as a starting point for creating soundscapes. Chueng also proposes discrimination as an important parameter in soundscape design, defined as the ability of a soundscape to present a few easily identifiable soundmarks. In her approach, this is also called minimal ecological sound design.

Studies have shown the importance of auditory cues in virtual reality simulation, and how they can lead to measurable enhancement of what is called the feeling of presence. In [86] it is reported how sound contributes to the user's sense of presence, as evidenced by electrodermal activity and temperature measurements as well as questionnaire scores. Moreover, significant differences were found between sound delivered through headphones and through a 5.1 surround loudspeaker system. Other studies show that ratings of presence are enhanced by either the addition of bass or an increase in volume, whereas an increase in the number of channels does not increase ratings of presence [32]. The role of self-produced sounds in enhancing the sense of presence in virtual environments has also been investigated. By combining interactive footstep sounds generated by ego-motion, using the techniques described in the previous section, with static soundscapes, it was shown that the sensation of motion in a virtual reality environment is significantly enhanced when moving sound sources and ego-motion are rendered [74].

Footstep sounds can be conveyed to the walker by means of different hardware devices, such as headphones or loudspeakers, or through bone conduction. The choice of delivery method depends on several factors, for example whether the soundscape is part of a mobile or augmented reality installation, or of a virtual reality laboratory setting. An ecologically valid solution consists of placing loudspeakers at shoe level, since this faithfully reproduces the equivalent situation in real life, where footstep sounds originate at the level of the interaction between a shoe and a floor. As an alternative, sounds can be conveyed by means of a multichannel loudspeaker system. In this case a problem arises regarding how footstep sounds can be rendered in 3D space, how many loudspeakers should be used, and where they should be placed.

Sound rendering for virtual environments has reached a level of sophistication such that most of the phenomena that appear in the real world can be rendered in real time [34]. 3D spatialized audio in immersive virtual environments nevertheless remains challenging. When delivering sound through multichannel loudspeakers, the choice of rendering algorithm is fundamental. Three types of soundscape can be distinguished: static, dynamic and interactive. Static soundscapes are composed without rendering the spatial positions of the sound sources: the same content is delivered to every channel of the surround sound system. The main advantage of this approach is that the user exploring the virtual environment does not need to be tracked, since the same content is sent to every loudspeaker no matter where the user is placed. The main disadvantage is that the simulation does not represent a real-life scenario, in which different sonic cues are received depending on where a person is placed. Dynamic soundscapes take into account the spatial position of each sound source, as well as any movements along three-dimensional trajectories. Finally, interactive soundscapes extend dynamic ones by additionally letting the user interact with the simulated environment, generating auditory feedback as a result of their actions. This last situation best matches the scenario of augmented footstep sounds, where each step of the user must be tracked and rendered without any perceivable latency while the user walks in the virtual environment, in order, for example, to recreate the illusion of walking on a surface different from the one actually stepped upon, or to allow the user to interact with objects of the virtual environment.
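
As a rough illustration of the difference between static and dynamic rendering (interactive rendering adds user-triggered sources such as footsteps on top of the dynamic case), the sketch below computes per-loudspeaker gains for a single source; it uses a crude distance attenuation and cosine pan as placeholders for a proper spatialization algorithm.

```python
import math

def channel_gains(source_xy, listener_xy, speaker_angles_deg, mode="dynamic"):
    """Per-loudspeaker gains for one sound source.
    'static': the same level on every channel, so no user tracking is needed.
    'dynamic': inverse-distance attenuation plus a crude cosine pan towards
    the loudspeakers closest to the source direction. This is only an
    illustration, not a production panning method."""
    n = len(speaker_angles_deg)
    if mode == "static":
        return [1.0 / n] * n
    dx = source_xy[0] - listener_xy[0]
    dy = source_xy[1] - listener_xy[1]
    dist = max(1.0, math.hypot(dx, dy))
    azimuth = math.degrees(math.atan2(dy, dx))
    gains = [max(0.0, math.cos(math.radians(azimuth - a))) / dist
             for a in speaker_angles_deg]
    total = sum(gains) or 1.0
    return [g / total for g in gains]
```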

Sound delivery using headphones follows two general approaches: simple mono or stereo delivery, and binaural synthesis. One of the main issues in combining footstep sounds with soundscape design is finding the right amplitude balance between the two. One approach is empirical: subjects walk freely while interactively producing the simulated footstep sounds and hearing the reproduced soundscape through multichannel loudspeakers, and adjust the volume of the footstep sounds until they find a level they consider satisfactory. Having described the possibilities offered by hardware technologies, the next section presents available software packages for footstep sound design.

2.4 Footstep Sound Design Toolkits

A specific treatment of the use of the above models for foot-floor interaction has been presented by Serafin et al. [88], along with pointers to sources of software, sound, and other documentation material. Implementing such models is not straightforward, but real-time software modules realizing impact and friction interactions are available that are open and flexible enough for inclusion in more general architectures for footstep sound synthesis. In particular, the Sound Design Toolkit (SDT) [21] contains a set of physically consistent tools for designing, synthesizing and manipulating ecological sounds [36] in real time. SDT consists of a collection of visual programs (patches) and dynamic libraries (externals) for Puredata, which is publicly available, and Max/MSP, which is easier to work with although commercial. SDT also provides examples, allowing users to launch these patches and see them at work in both visual environments.

Public software is also available that implements ready-to-use footstep sound synthesis models. Farnell accompanied his work with a patch and an external for Puredata, both referenced in the related paper [27]. Fontana's crumpling model for Puredata has been integrated into SDT; examples of this model at work can be found, among others, on the Natural Interactive Walking project website. The same website collects sound examples resulting from alternative instantiations of the physically based approach, based on a sound synthesis engine that has not been made publicly available [102]. It also contains footstep sounds generated with the aforementioned hybrid model derived from Cook's synthesis technique.

3 From Haptic to Multimodal Rendering

3.1 Introduction

3.1.1 Walking and Haptic Feedback in Virtual Environments

Virtual reality applications aim at simulating digital environments with which users can interact and, as a result, perceive through different modalities the effects of their actions in real time. Current VR applications draw primarily on vision and hearing. Haptic feedback—which aims to reproduce forces, movements and other cutaneous sensations felt via the sense of touch—is rarely incorporated, especially in those VR applications where users are enabled to walk.

A heightened sense of presence can be achieved in a VR simulation via the addition of even low-fidelity tactile feedback to an existing visual and auditory environment, and the potential gains can, in some cases, be larger than those obtained by improving feedback received from a single existing modality, such as the visual display [91].

High-frequency information in mechanical signals often closely links the haptic and auditory modalities, since both types of stimuli have their origin in the same physical contact interactions. Thus, during walking, individuals can be said to be performing simultaneous auditory and haptic probing of the ground surface and environment. As demonstrated in recent literature, walkers are capable of perceptually distinguishing ground surfaces using either discriminative touch via the feet or audition [39]. Thus, approaches to haptic and auditory rendering like those reviewed in this chapter share common features, while the two types of display can be said to be partially interchangeable.

An important component of haptic sensation is movement. Walking is arguably the most intuitive means of self-motion within a real or virtual environment. In most research on virtual environments, users are constrained to remain seated or to stand in place, which can have a negative impact on the sense of immersion [90]. Consequently, there has been much recent interest in enabling users of such environments to navigate by walking. One feasible, but potentially cumbersome and costly, solution to this problem is to develop motorized interfaces that allow the use of normal walking movements to change position within a virtual world. Motorized treadmills have been extensively used to enable movement in one dimension, and this paradigm has been extended to allow omnidirectional locomotion through an array of treadmills revolving around a larger one [49]. Another configuration consists of a pair of robotic platforms beneath the feet that are controlled so as to provide support during virtual foot-ground contact while keeping the user in place. Yet another consists of a spherical cage that rotates as a user walks inside of it [46]. The reader may refer to the chapter by Iwata in this volume for further discussion of these scenarios. The range of motion, forces, and speeds required to simulate omnidirectional motion make these devices intrinsically large, challenging to engineer, and costly to produce. In addition, while they are able to simulate the support and traction supplied by the ground, they cannot reproduce the feeling of walking on different materials.

Lower-cost methods for walking in virtual environments have been widely pursued in the VR research community. Passive sensing interfaces have been used to allow for the control of position via locomotion-like movements without force feedback [94]. Walking in place is another simple technique, in which movements of the body are sensed, and used to infer an intended movement trajectory [96]. For virtual environments that are experienced via an audiovisual head mounted display, a user’s locomotion can be directly mapped to movements in a virtual environment. The real walkable workspace is typically much smaller than the virtual environment, and this has led to the development of techniques, such as redirected walking [81], that can engender the perceptual illusion that one is walking in a large virtual space.

The auditory and tactile experience of walking on virtual materials can be simulated by augmenting foot-ground interactions with appropriate sounds or vibrations. Although vibrotactile interfaces are simpler and lower in cost to implement than haptic force feedback devices [62], they have only recently been used in relation to walking in virtual environments. Auditory displays have been more widely investigated, and walking sounds are commonly used to accompany first-person movements in immersive games, although they are rarely accompanied by real foot movements. Cook developed a floor interface (the Pholiemat) for controlling synthesized walking sounds via the feet, inspired by Foley practice in film [14, 15], and other researchers have experimented with acoustically augmented shoes [77]. Research on the use of vibrotactile displays for simulating virtual walking experiences via instrumented shoes [89] or floor surfaces [107] is still in its infancy.

Although tactile displays have, to date, been integrated in very few foot-based interfaces for human–computer interaction, several researchers have investigated the use of simple forms of tactile feedback for passive information conveyance to the feet. Actuated shoe soles have been used to provide tactile indicators of meaningful computing events [85, 105], and rhythmic cues supplied to the feet via a stair climber have been found effective at maintaining a user's activity level during exercise. In automotive settings, tactile warning cues delivered via the accelerator pedal have been studied for many years [67], and eventually appeared in production vehicles. Tactile stimulation of the feet has also been explored as an additional feedback modality in computer music performance [84].

3.1.2 Haptic and Acoustic Signals Generated by Walking Interactions

Walking involves several kinds of touch interaction with the ground. Stepping onto a natural or man-made surface produces rich multimodal information, including mechanical vibrations that are indicative of the actions and types of materials involved. Stepping on solid floors in hard-soled shoes is typified by transient signals associated with the strike of the heel or toe against the floor, while sliding can produce signals such as high-pitched squeaking (when surfaces are clean) or textured noise. In indoor environments, the operation of common foot-operated switches, used for lamps, dental equipment, or other machines, is often marked by transient clicks produced by the engagement of solid mechanical elements. The discrete quality of these mechanical signals contrasts with the more continuous nature of those generated by a step onto natural ground coverings, such as gravel, dry sand, or branches. Here, discrete impacts may not be as apparent, and can be accompanied by both viscoelastic deformation and complex transient, oscillatory, or noise-like vibrations generated through the inelastic displacement of heterogeneous materials [45]. A few of the processes that can be involved include brittle fracture and the production of in-solid acoustic bursts during rapid micro-fracture growth [1, 2, 45], stress fluctuations during shear sliding on granular media [5, 19, 72, 73], and the collapse of air pockets in soil or sand.

A series of mechanical events can be said to accompany the contact of a shod foot with the ground. There may be an impact, or merely a soft landing, according to the type of shoe, the type of ground, and the stride of the walker. Once the initial transitory effects have vanished, and until the foot lifts off the ground, there may be crushing, fracturing, or little movement at all if the ground is stiff. There may also be slipping if the ground is solid, or soil displacement if the ground is granular. There may be other mechanical effects, such as the compacting of a compressible ground material (e.g., soil, sod, snow). The question of what form of haptic signal to reproduce in virtual reality applications is therefore not so simple to answer. The sense of touch is nearly as refined in the foot as it is in the hand. It has, in fact, great discriminative acumen, even through a shoe sole [39]. However, like vision or audition, depending on the perceptual task it may be satisfied by relatively little input. In the case of the foot, our habit of wearing shoes plays in our favor, since shoes filter out most of the distributed aspects of the haptic interaction with the ground, save perhaps for a distinction between the front and back of the foot at the moment of impact. In that sense, wearing a shoe is a bit like interacting with an object through a hand tool. The latter case, as is well known, is immeasurably easier to simulate in virtual reality than direct interaction with the hand. When it comes to stimulating the foot, the options are intrinsically limited by the environmental circumstances. While it is tempting to think of stimulating the foot by the same methods as those used to stimulate the hand [43, 44], this option must be discarded in favor of approaches that are specific to the foot. In particular, options involving treadmills, robot arms and other heavy equipment will remain confined to applications where the motor aspects dominate over the perceptual aspects of interacting with a ground surface [10, 20, 48, 79].

Fig. 12.2 Walking in real environments produces rich, step-dependent vibromechanical information. Shown: vibration spectrogram \(a(t,f)\) and low-frequency normal foot-ground force \(F(t)\) measured at the hard sole of a men's shoe during one footstep of a walker onto rock gravel, together with the corresponding foot contact states within the gait cycle (author's measurements). The dark vertical stripes in the spectrogram correspond to discrete impact or dislocation events that are characteristic of dynamic loading of a complex, granular medium

Broadly speaking, then, foot-ground interactions can be said to be commonly accompanied by mechanical vibrations with energy distributed over a broad range of frequencies (see Fig. 12.2). High-frequency vibrations can originate with a few different categories of physical interaction, including impacts, fracture, and sliding friction. The physics involved is relatively easy to characterize in restricted settings, such as those involving homogeneous solids, but becomes more complex to describe when disordered, heterogeneous materials are involved.

3.2 Touch Sensation in the Feet

The sense of touch in the human foot is highly evolved, and is physiologically very similar to that in the hand, with the same types of tactile receptor populations as are found in the hand, including the fast-adapting (FA) type I and II and slow-adapting (SA) type I and II cutaneous mechanoreceptors [50, 100], in addition to proprioceptive receptors, including Golgi organs, muscle spindles, and joint capsule receptors in the muscles, tendons, and joints. The sole is sensitive to vibrotactile stimuli over a broad range of frequencies, up to nearly 1000 Hz [109], with FA receptors comprising about 70 % of the cutaneous population. Several differences between tactile sensation in the foot and hand have been found, including an enlargement and more even distribution of receptive fields in the foot, and higher physiological and psychophysical thresholds for vibrotactile stimuli [50, 114], possibly related to biomechanical differences between the skin of the hands and feet [115]. Further comparisons of the vibrotactile sensitivity of the hand and foot were performed by Morioka et al. [68].

Self motion is the key function of walking, and most of the scientific research in this area is related to the biomechanics of human locomotion, and to the systems and processes underlying motor behavior on foot, including the integration of multisensory information. During locomotion, sensory input and muscular responses are coordinated by reflexes in the lower appendages [83, 95, 116], and prior literature has characterized the dependence of muscular responses on both stimulus properties and gait phase. The vibrotactile sense in the foot has been less studied in this regard, presumably because it is not a primary channel for directly acquiring information about forces and displacements that are required for the control of locomotion and balance.

Perceptual abilities of the foot are essential to the sensorimotor loop involved in the control of locomotion, but have been less studied than those of the hand. Prior literature has emphasized perceptual-motor abilities related to the regulation of locomotion and balance on slippery, compliant, or slanted surfaces [23, 29, 41, 51, 63, 66, 69, 70]. The stepping foot is able to discriminate materials distinguished by elasticity [53, 82] or by raised tactile patterns [16, 54], as demonstrated in research aimed at evaluating the utility of these features for aiding visually impaired people in walking or navigating safely and effectively.

Although walking on natural ground surfaces generates rich haptic information [25, 35, 110], little research exists on the perception of such materials during locomotion. Giordano et al. investigated a setting in which walkers were tasked with identifying man-made and natural walking surface materials in different non-visual sensory conditions, while wearing shoes [38]. Better-than-chance performance was observed in all conditions in which tactile information was unmodified. Performance was worse when tactile information was degraded by a vibrotactile masking signal supplied to the foot sole. Although the latter could have affected haptic information in multiple ways (by perturbing high- and low-frequency cutaneous tactile information and/or information from deeper joint and muscle proprioceptors), subsequent analyses indicated that this information was highly relevant for discriminating walking grounds. Furthermore, the results suggested that similar high-frequency information was communicated through both the auditory and tactile channels.

3.2.1 Vibrotactile Rendering of Footsteps

Due to the highly interactive nature of the generation of haptic stimuli in response to foot-applied pressure, the display of haptic textures, in the form of high-frequency vibrations simulating the feel of stepping onto heterogeneous solid ground materials [107], is a significant challenge in the multimodal rendering of walking on virtual ground surfaces. During a step onto quasi-brittle porous natural materials (e.g., sand or gravel), one evokes physical interaction forces that include viscoelastic components, describing the recoverable deformation of the volume of ground surrounding the contact interface; transient shocks from the impact of the foot against the ground; and plastic components from the collapse of brittle structures or granular force chains, resulting in unrecoverable deformations [24, 92]. Combinations of such effects give rise to the high-frequency, texture-like vibrations characteristic of the feel of walking on different surfaces [26]. Figure 12.3 presents an example of force and vibration data acquired by the authors from one footstep on a gravel surface.

Fig. 12.3 Vibration spectrogram \(a(t,f)\) and normal force \(F(t)\) measured from one footstep onto rock gravel (authors' recording). Note the discrete (impulsive) broadband impact events evidenced by vertical lines in the spectrogram

3.2.2 Stepping on Disordered Heterogeneous Materials

Due to the continuous coupling of acoustic and vibromechanical signals with force input in examples such as that described above, there is no straightforward way to convincingly use recorded footsteps for acoustic or vibrotactile rendering, although more flexible granular sound-synthesis methods could be used [6, 18]. For the modeling of simpler interactions, involving impulsive contact with solid materials, recorded transient playback techniques could be used [55].

A simple yet physically-motivated approach that can be used in the haptic synthesis of interaction with complex, compressible surfaces is based on a minimal fracture mechanics model [108]. Similar approaches have proved useful for modeling other types of haptic interaction involving damage [42, 64]. Figure 12.4 illustrates the continuum model and a simple mechanical analog used for synthesis.

Fig. 12.4 Normal force texture synthesis. a A fracture mechanics approach is adopted. A visco-elasto-plastic body undergoes shear sliding fracture due to applied force \(F_e\). b A simple mechanical analog for the generation of slip events \(\xi (t)\) in response to \(F_e\). c For vibrotactile display, each slip event is rendered as an impulsive transient using an event-based approach [108]

In the stuck state, the surface has stiffness \(K = k_1 + k_2\), effective mass \(m\) and damping constant \(b\). It undergoes a displacement \(x\) in response to a force \(F\), as governed by:

$$\begin{aligned} F(t) = m \ddot{x} + b \dot{x} + K (x - x_0), \; \; \; x_0 = k_2 \xi (t) / K \end{aligned}$$
(12.3)

In the stuck state, virtual surface admittance \(Y(s) = \dot{x}(s) / F(s)\) is given, in the Laplace-transformed (\(s\)) domain, by:

$$\begin{aligned} Y(s) = s (m s^2 + b s + K)^{-1}, \; \; \; K = k_1 k_2 \xi / (k_1 + k_2) \end{aligned}$$
(12.4)

where \(\xi (t)\) represents the net plastic displacement up to time \(t\). A Mohr-Coulomb yield criterion is applied to determine slip onset: when the force on the plastic unit exceeds a threshold value (which may be constant or noise-dependent), a slip event generates an incremental displacement \(\varDelta \xi (t)\), along with an energy loss of \(\varDelta W\) representing the inelastic work of fracture growth.

Slip displacements are rendered as discrete transient signals, using an event-based approach [55]. High frequency components of such transient mechanical events are known to depend on the materials and forces of interaction, and we model some of these dependencies when synthesizing the transients [110]. An example of normal force texture resulting from a footstep load during walking is shown in Fig. 12.5.
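
The following Python sketch captures the stick-slip logic just described: the load carried by the plastic unit builds up with the applied force; when it exceeds a noisy Mohr-Coulomb threshold, a slip event of incremental displacement is produced and rendered as a short decaying transient. It is an illustration under assumed constants, not the implementation of [108], and the transient shape stands in for the event-based synthesis of [55].

```python
import numpy as np

def fracture_texture(force, sr=44100, k1=4e4, k2=4e4,
                     yield_mean=20.0, yield_noise=5.0, seed=None):
    """Stick-slip texture synthesis driven by a normal-force signal (in N).
    While stuck, the plastic unit loads with a fraction of the force
    increments; when its stress exceeds a noisy yield threshold, a slip of
    incremental displacement d_xi occurs, relieving the stress, and is added
    to the output as a short decaying transient. Constants are illustrative."""
    rng = np.random.default_rng(seed)
    out = np.zeros(len(force))
    t = np.arange(200) / sr
    transient = np.exp(-t * 2000.0) * np.sin(2 * np.pi * 300.0 * t)  # placeholder shape
    stress, prev_f = 0.0, 0.0
    for i, f in enumerate(force):
        stress += (f - prev_f) * k2 / (k1 + k2)   # plastic unit loads with the force
        prev_f = f
        if stress > yield_mean + yield_noise * rng.random():
            d_xi = stress / k2                    # incremental slip displacement
            stress = 0.0                          # slip relieves the stored stress
            j = min(len(out) - i, len(transient))
            out[i:i + j] += d_xi * transient[:j]  # event-based transient rendering
    return out

# Example: a half-sine footstep load of about 600 N over 0.6 s produces a
# burst of slip transients during the loading phase of the step.
sr = 44100
load = 600.0 * np.sin(np.pi * np.arange(int(0.6 * sr)) / (0.6 * sr))
vibration = fracture_texture(load, sr=sr)
```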

Fig. 12.5 Example footstep normal force and synthesized waveform using the simple normal force texture algorithm described in the text. The respective signals were captured through force and acceleration sensors integrated in the vibrotactile display device [108]

3.3 Multimodal Displays

Several issues arise in the rendering of multimodal walking interactions, which combine visual, auditory and haptic rendering to enable truly multimodal experiences. A model of the global rendering loop for interactive multimodal experiences is summarized in Fig. 12.6. Walking over virtual grounds requires the use of specific hardware devices that can coherently present visual, vibrotactile and acoustic signals. Several devices dedicated to multimodal rendering of walking over virtual grounds are described below, in Sect. 12.3.4, while examples of multimodal scenarios are discussed in Sect. 12.3.5. Multimodal rendering is often complicated by hardware and software constraints. In some cases, cross-modal perceptual effects can be exploited to allow one modality (for example, vision) to render sensations that would normally be presented via another modality that may not be feasible to reproduce (e.g., haptic force feedback). Some of these approaches are described at the end of Sect. 12.3.5.

Fig. 12.6 Global loop for multimodal rendering of walking over virtual grounds. The user (left) interacts with the virtual world (right) through specific hardware devices. Rendering techniques provide multimodal feedback by taking into account the user's input and the virtual environment. The feedback is provided through visual, acoustic and haptic interfaces

3.4 Display Configurations

In this section, we discuss two types of device capable of generating multimodal cues for interaction with virtual grounds, corresponding to two different approaches: actuated floors, i.e., arrays of sensors and actuators laid out over a given space that transmit the different cues to the user stepping on them; and actuated shoes, i.e., mobile devices worn by the user, with sensors and actuators embedded in the shoes.

Both approaches stimulate the foot with simulated high-frequency mechanical feedback, viz. 30–800 Hz, arising from foot-ground interactions. As it turns out, a wide variety of sensations can be produced this way, including some that would normally be ascribed to kinesthesia [109]. Auditory feedback is also generated by the resulting prototypes, as a by-product of the vibrotactile actuators aboard them or via associated loudspeaker arrays, and visual feedback may be supplied via top-down video projection systems. One approach is to tile a floor and actuate each tile according to the movement and interaction of the walker or user. Another approach is to provide the walker with shoes augmented with appropriate transducers. In addition to the devices described in other sections of this chapter, the vibrotactile augmentation of touch surfaces has been widely investigated for HCI applications [33, 71, 80], although design issues affecting their perceptual transparency have often been neglected. As case studies, two approaches are described below, starting with a floor-based stimulator and continuing with a shoe-based one.

3.4.1 Actuated Floors

Floor-based systems for providing multimodal feedback to the foot offer the advantage of easy accessibility, since users are not required to wear any special footwear or equipment in order to use them. Furthermore, they can be readily designed with an extensible architecture, which allows them to be networked and powered easily, as they can be integrated within existing room infrastructures. However, on the negative side, such systems can be said to be somewhat invasive, since they require modifications to the existing floor infrastructure of a building, thus requiring a comparatively permanent installation space. The workspace available to users—that is, the amount of real space within which they can interact—depends on the size of the actuated floor, with a larger workspace inevitably entailing higher costs and complexity.

The vibrotactile floor tile interface developed by Visell et al. [106, 107, 110, 111] represents the first systematically designed device of its type for haptic human–computer interaction. Passive floor-based vibrotactile actuation had previously been used to present low-frequency information in audiovisual display applications, for special effects (e.g., vehicle rumble), in immersive cinema or VR settings [99]. The fidelity requirements that must be met by an interactive haptic display are, however, higher, since users are able to actively sample its response to actions of the feet. The device of Visell et al. is based on a high fidelity vibrotactile interface integrated in a rigid surface, with visual feedback from top-down video projection and a spatialized, eight-loudspeaker auditory display. The main application for which it was envisioned is the vibrotactile display of virtual ground surface material properties for immersive environments. The device consists of an actuated composite plate mounted on an elastic suspension, with integrated force sensors. The structural dynamics of the device were designed to enable it to accurately reproduce vibrations felt when stepping on virtual ground materials over a wide range of frequencies. Measurements demonstrated that it is capable of reproducing forces of more than 40 N across a usable frequency band from 50 to 750 Hz. In a broader sense, potential applications of such a device include the simulation of ground textures for virtual and augmented reality simulation [112] or telepresence (e.g., for remote planetary exploration), the rendering of abstract effects or other ecological cues for rehabilitation, the presentation of tactile feedback to accompany the operation of virtual foot controls, control surfaces, or other interfaces [111], and the study of human perception. In light of the latter, an effort was undertaken to ensure a high fidelity response that would not distort the reproduced vibromechanical stimuli.

Fig. 12.7

Vibrotactile floor interface hardware for a single tile unit. Left: view showing the main components. Right: side view with top dimensions

The interface of the device (Fig. 12.7) consists of a rigid plate that supplies vibrations in response to forces exerted by a user's foot, via the shoe. The total normal force applied to the plate by a user is measured. It can be assumed to consist of two components: isolated transients with high frequency content, generated by foot impacts with the plate, and low-frequency forces generated by active human motions, limited in bandwidth to no more than 10 Hz [9, 110]. A haptic simulation provides feedback approximating the vibration response felt during interaction with a virtual object. The rendering algorithms are of admittance type, computing displacements (or their time derivatives) in response to forces applied to the virtual object. Force sensing is performed via four load cell force transducers (Measurement Systems model FX19) located below the vibration mounts under each corner of the plate. Although the cost of outfitting a single-plate device with these sensors is not prohibitive, potential applications of this device to interaction across distributed floor surface areas may involve two-dimensional \(m\times n\) arrays of tiles, requiring \(N = 4 m n\) sensors. As a result, in a second configuration, four low-cost resistive force sensors are used in place of load cells. After conditioning, the response of these sensors to an applied force is nonlinear, and varies by up to 25 % from part to part (according to manufacturer ratings). A linearization and calibration of force sensing is therefore performed [112], ensuring a response accurate to within a few percent. Analog data from the force sensors is conditioned, amplified, and digitized, then used as input to drive a physically based simulation of a ground surface such as sand, snow, or ice. Vibromechanical feedback is provided by a single Lorentz-force type inertial motor (Clark Synthesis model TST429) with a usable bandwidth of about 25 Hz to 20 kHz, which is driven using standard digital and analog audio hardware. Fig. 12.8 provides an overview of the system.
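As a rough illustration of the processing chain just described, the following Python sketch splits a sampled normal-force signal into its slow component (active human motion, below about 10 Hz) and a transient residual, and lets the residual excite a single damped vibration mode of admittance type whose acceleration output would be sent to the inertial actuator. The sample rate, cutoff, and modal parameters are placeholders and do not correspond to the actual device or its simulation software.

```python
import numpy as np
from scipy.signal import butter, lfilter

FS = 2000  # force-sensing sample rate in Hz (illustrative)

def split_force(force, fs=FS, cutoff=10.0):
    """Separate total normal force into a slow component (active motion,
    < ~10 Hz) and a transient residual (foot impacts with the plate)."""
    b, a = butter(2, cutoff / (fs / 2), btype="low")
    slow = lfilter(b, a, force)
    return slow, force - slow

def admittance_response(transient, fs=FS, f0=120.0, zeta=0.05, mass=1.0):
    """Admittance-type rendering: integrate the displacement of one damped
    mode excited by the transient force, and return its acceleration as
    the drive signal for the vibrotactile actuator (placeholder values)."""
    w0 = 2.0 * np.pi * f0
    x = v = 0.0
    dt = 1.0 / fs
    accel = np.zeros_like(transient)
    for i, f in enumerate(transient):
        a = (f - 2.0 * zeta * w0 * mass * v - mass * w0 ** 2 * x) / mass
        v += a * dt
        x += v * dt
        accel[i] = a
    return accel
```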

Fig. 12.8

Photo of an actuated tile with a large men's shoe, showing representative size. The model shown is based on the low-cost force sensing resistor option. The cable in the foreground interfaces the sensors with the data acquisition unit

Fig. 12.9

Photo of the actuated shoes [77] with loudspeakers mounted on the top. The cable in the foreground interfaces the sensors in the shoes with the data acquisition unit

3.4.2 Actuated Shoes

Actuated shoes provide a mobile alternative to foot-floor interaction setups, as they do not require large floors laid out in dedicated spaces. However, realizing a mobile device that delivers the same cues as a static actuated floor raises serious technical questions. While the size of the device needs to remain small enough not to impair the natural walking gait of the user, the intensity of the signals it delivers must allow the rendering of perceivable interaction cues. The power supply and the computation units might also be too large and cumbersome to be located on the wearable device itself, requiring them to be offloaded to other parts of the body.

Papetti et al. [77] addressed the design of multimodal actuated shoes through a first prototype delivering vibrotactile and acoustic signals. This device is illustrated in Fig. 12.9. Force data acquisition is performed through two force sensing resistors (Interlink FSR model 402) located under the insole, one at the toe and one at the heel position. Vibrotactile feedback is produced by two vibrotactile transducers embedded in the front and the rear of the shoe sole, respectively [16] (Haptuator, Tactile Labs Inc., Deux-Montagnes, Qc, Canada). Two cavities were made in the soles to accommodate these broadband vibrotactile actuators. These electromagnetic recoil-type actuators have an operational, linear bandwidth of 50–500 Hz and can provide up to 3 G of acceleration when connected to light loads. They were bonded in place to ensure good transmission of the vibrations inside the soles; when activated, vibrations propagated well in the light, stiff foam. In addition to vibrations, each shoe emits sounds from one Goobay Soundball Mobile battery-powered loudspeaker mounted on the top buckle. These devices are provided with on-board micro-amplifiers, hence they can be connected directly to the audio card. As with any small, low-power loudspeaker, they exhibit unavoidable performance limits, both in the emitted sound power (2.4 W RMS) and in the low-frequency cutoff (about 200 Hz).
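The role of the two force sensing resistors in this architecture can be sketched, in hedged form, as a simple event router: each sensor reports ground-contact onsets that trigger synthesis on the matching actuator. The threshold, the `ShoeChannel` class and the `synthesize` callback below are hypothetical names introduced for illustration, not part of the prototype's software.

```python
THRESHOLD = 0.15  # normalized force threshold (illustrative)

class ShoeChannel:
    """Tracks one force sensing resistor (heel or toe) and reports
    contact onsets, which trigger synthesis on the matching actuator."""
    def __init__(self, name):
        self.name = name
        self.in_contact = False

    def update(self, force):
        onset = force > THRESHOLD and not self.in_contact
        self.in_contact = force > THRESHOLD
        return onset

heel, toe = ShoeChannel("heel"), ShoeChannel("toe")

def process_frame(heel_force, toe_force, synthesize):
    # Route each detected impact to the rear or front actuator channel.
    if heel.update(heel_force):
        synthesize(channel="rear", intensity=heel_force)
    if toe.update(toe_force):
        synthesize(channel="front", intensity=toe_force)
```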

An evolution of this shoe concept made use of vibrotactile exciters, such as those capable of making an entire desk sound and vibrate like a musical soundboard once firmly attached to it. In the case of the actuated shoes, two Dayton Audio DAEX32 exciters were secured inside the sole of each sandal, respectively under the toes and the heel: together, they provided more coherent audio-tactile feedback beneath the respective areas of the feet, furthermore eliciting some low-frequency resonant energy from the floor that was otherwise impossible to obtain using small speakers such as those mentioned previously. Moreover, by employing lightweight power amplification (in this case a pair of Class T battery-powered digital stereo amplifiers) and a low-latency connection to and from the host, respectively to transmit force data and to receive the audio-tactile signals, a good compromise between realism of the feedback and wearability of the prototype could be achieved, at least for some materials such as frozen ponds, muddy soil, aggregate grounds and, if supported by headphones providing the necessary auditory spaciousness to a walking listener, also metal grates [76].

3.5 Interactive Scenarios

3.5.1 Description

We now briefly present examples of multimodal rendering of ground materials. The examples correspond to two categories of ground materials that exhibit strong high-frequency components: granular materials and fluids. Footsteps onto granular (aggregate) ground materials, such as sand, snow, or ice fragments, reveal a common temporal process originating in the transition toward a minimum-energy configuration of an ensemble of microscopic systems, via a sequence of transient events. The latter are characterized by energies and transition times that depend on the characteristics of the system and the amount of power it absorbs while changing configuration; they dynamically capture macroscopic information about the resulting composite system through time. Liquid-covered ground surfaces, on the other hand, such as water puddles and shallow pools, have an important kinesthetic component due to pressure and viscosity forces within the fluid, and may, at first, seem to lack high-frequency mechanical responses. However, important high-frequency components do exist, generated by bubble and air cavity resonances, which are responsible for the characteristic sound of moving fluids.
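To make the granular case concrete, the sketch below renders a footstep on an aggregate ground as a stochastic sequence of short transients whose density and energy scale with the instantaneous power injected into the ground, in the spirit of the temporal process described above. It is a minimal, illustrative model in Python; all constants (event rate, decay time, amplitudes) are made up rather than fitted to any particular material.

```python
import numpy as np

FS = 44100  # audio/vibrotactile sample rate in Hz

def granular_ground(power, fs=FS, rate_per_watt=200.0, decay=0.004,
                    rng=np.random.default_rng(0)):
    """Stochastic sequence of micro-impact transients; `power` holds one
    (float) value per sample of the power absorbed by the ground."""
    out = np.zeros(len(power))
    n = int(decay * fs)
    envelope = np.exp(-np.arange(n) / (decay * fs / 5.0))  # fast exponential decay
    for i, p in enumerate(power):
        # Probability of a micro-impact in this sample grows with power.
        if rng.random() < min(1.0, rate_per_watt * p / fs):
            burst = p * rng.uniform(0.5, 1.5) * envelope * rng.standard_normal(n)
            end = min(len(out), i + n)
            out[i:end] += burst[: end - i]
    return out
```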

The two examples presented in this section exploit the fact that vibrotactile and acoustic phenomena share a common physical source, designing the vibrotactile models from existing knowledge of sound rendering. Both types of ground materials exhibit very interesting high-frequency features well suited to reproduction through an actuated vibrotactile floor: as opposed to rigid surfaces, the overall signal is not reduced to transients at the moment of impact, but persists during the entire foot-floor contact. Although mainly focused on the vibrotactile modality, the approaches described here are multimodal: the synthesis models are also capable of generating acoustic feedback, owing to common generation mechanisms and physical sources. The visual modality is a requirement in its own right, since interacting with virtual environments without visual feedback is of little interest, except in very specific cases.

3.5.2 Frozen Pond and Snow Field

In a multimodal scenario, Law et al. [57] designed a virtual frozen pond demonstration in which users may walk on the frozen surface, producing patterns of surface cracks that are rendered and displayed via audio, visual and vibrotactile channels. Audio and vibrotactile feedback accompany the fracture of the virtual ice sheet underfoot. The two are derived from a simplified mechanical model analogous to that used for rendering basic footstep sensations (see Sect. 12.3.2.2).

The demonstration is built upon the floor tile interface described in Sect. 12.3.4.1. An advantage of this scenario is that plausibly realistic visual feedback could be rendered without detailed knowledge of foot-floor contact conditions, which would have required a more complex sensing configuration.

Vibrotactile and acoustic feedback are generated through the simplified fracture model described in Sect. 12.3. The visual rendering of cracks in the ice is generated with sequences of line primitives drawn over the ice texture. Cracks originate at seed locations determined by foot-floor contact, as illustrated in Fig. 12.10. In another application [57] using the same interface, the authors simulated a snow field, also shown in Fig. 12.10. Users could leave footprints in the virtual snow, with acoustic and vibrotactile feedback similar to the sensation of stepping onto real snow.
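For the visual side, the crack patterns can be thought of as polylines grown from the contact seed; the following sketch generates such a polyline from a seed position and an impact energy. The growth rule (segment length, angular jitter, energy dissipation) is purely illustrative and is not the algorithm used in [57].

```python
import math, random

def grow_crack(seed_xy, energy, max_segments=12, rng=random.Random(1)):
    """Return a list of ((x1, y1), (x2, y2)) line segments radiating from
    the foot-floor contact point (illustrative growth rule)."""
    x, y = seed_xy
    heading = rng.uniform(0.0, 2.0 * math.pi)
    segments = []
    for _ in range(max_segments):
        length = 0.02 + 0.05 * energy * rng.random()     # metres
        heading += rng.uniform(-0.6, 0.6)                # jitter the direction
        nx = x + length * math.cos(heading)
        ny = y + length * math.sin(heading)
        segments.append(((x, y), (nx, ny)))
        x, y = nx, ny
        energy *= 0.8                                    # cracks shorten as energy dissipates
    return segments
```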

Fig. 12.10

Example of a multimodal foot-floor interaction. (Left) The frozen pond scenario generates vibrotactile, acoustic and visual feedback. (Right) The snow field is modified according to the user's steps, providing multimodal feedback [57]

3.5.3 Walking on Fluids

Cirio et al. [13] proposed a physically based vibrotactile fluid rendering model for solid-fluid interaction, allowing "splashing on the beach" scenarios. Since fluid sound is generated mainly through bubble and air cavity resonance, they developed a physically based simulator generating real-time bubble creation and solid-fluid interaction, and synthesizing vibrotactile feedback from interaction and simulation events. The vibrotactile model proposed by Cirio et al. is divided into three components, following the physical processes that generate sound during solid-fluid interaction [12]: (1) an initial high frequency impact, (2) small bubble harmonics and (3) a main cavity oscillation. A real-time fluid simulator based on Smoothed-Particle Hydrodynamics and enhanced with bubble synthesis handles the physical simulation process.
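The bubble component can be illustrated with a standard textbook model: each bubble rings as an exponentially damped sinusoid at its Minnaert frequency, and a splash can be assembled from an impact transient, many small bubbles, and one large cavity. The Python sketch below follows this idea with illustrative damping and pitch-drift constants; it is not the model of [12, 13], only a hedged approximation of the three components listed above.

```python
import numpy as np

FS = 44100

def bubble(radius, fs=FS, duration=0.05):
    """Damped sinusoid at the Minnaert frequency of an air bubble in water."""
    gamma, p0, rho = 1.4, 101325.0, 998.0                 # air, atm pressure, water (SI)
    f0 = np.sqrt(3.0 * gamma * p0 / rho) / (2.0 * np.pi * radius)
    t = np.arange(int(duration * fs)) / fs
    damping = np.exp(-0.2 * f0 * t)                       # illustrative decay constant
    freq = f0 * (1.0 + 0.1 * t / duration)                # slight upward pitch drift
    phase = 2.0 * np.pi * np.cumsum(freq) / fs
    return damping * np.sin(phase)

def splash(fs=FS, rng=np.random.default_rng(3)):
    """(1) broadband impact, (2) population of small bubbles, (3) main cavity."""
    n_imp = int(0.005 * fs)
    out = np.zeros(int(0.3 * fs))
    out[:n_imp] += rng.standard_normal(n_imp) * np.exp(-np.arange(n_imp) / (0.001 * fs))
    for _ in range(30):                                   # small bubbles, radii 0.5-3 mm
        s = bubble(rng.uniform(0.5e-3, 3e-3), fs)
        start = int(rng.integers(0, len(out) - len(s)))
        out[start:start + len(s)] += 0.2 * s
    cavity = bubble(0.03, fs, duration=0.15)              # ~3 cm cavity: low 'bloop'
    out[:len(cavity)] += cavity
    return out
```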

Based on the fluid vibrotactile model [12] and using the floor tile array described in Sect. 12.3.4, Cirio et al. [13] designed two multimodal scenarios generating haptic, acoustic and visual feedback. An active shallow pool scenario allowed users to walk about a virtual pool filled with 20 cm of water, splashing water as they stepped in the pool. A passive beach front scenario allowed users to stand still and feel waves washing up at their feet on a sandy beach. The floor rendered the vibrotactile feedback to the user's feet through the appropriate vibrotactile transducers. Acoustic feedback was also provided through speakers or headphones. The user's feet were modeled as parallelepiped rigid bodies and tracked through the floor pressure sensors. Visual feedback was generated by a GPU meshless screen-based technique optimized for high frequency rendering [12], appropriate to the underlying particle-based simulation. Fig. 12.11 shows the two scenarios.

Fig. 12.11

Interacting with water with multimodal feedback. (Left) A shallow water pool. (Right) A wave washing up on a beach [13]

3.5.4 Augmenting Footsteps with Simulated Multimodal Feedback

The enhancement of walking sensations over virtual grounds is not necessarily limited to immersive virtual reality setups. Some applications should be able to run in desktop mode, i.e., with the user seated at a conventional computer. This includes training applications that need to be massively deployed, as well as video games. To convey the sensation of walking, video games make intensive use of auditory feedback, relying on footstep sounds to simulate steps. Visual information can also be used to enhance the sensation of walking.

In this desktop VR context, Terziman et al. [98] introduced a set of cues to augment virtual footsteps with artificial (exaggerated) multimodal feedback, called "King-Kong Effects". These sensory cues are inspired by special effects in movies in which the approach of a gigantic creature is suggested by adding visual vibrations/pulses to the camera at each of its steps. Visual, tactile and acoustic signals artificially enhance each footstep detected (or simulated) during the virtual walk of the user sitting in front of the computer. The system leverages the tiles presented in Sect. 12.3.4.2, located under the user's feet, for vibrotactile rendering of foot-floor impacts, in addition to the visual camera vibrations and the acoustic rendering of footsteps. The authors studied the use of different kinds of feedback cues based on vertical or lateral oscillations, physical or metaphorical patterns, and one or two peaks to simulate heel-toe contacts. They showed that, for a seated user, the sensation of walking increases when the different modalities are combined, and strongly recommend the use of such a multimodal simulation for improved user immersion.
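The visual component of such effects can be pictured as a short damped oscillation applied to the camera after each footstep, optionally with a second, weaker peak for the toe contact. The sketch below is a hypothetical rendition of this idea; amplitude, frequency, decay and heel-toe delay are illustrative values, not those evaluated in [98].

```python
import math

def camera_offset(t_since_step, amplitude=0.02, freq=12.0, decay=8.0,
                  two_peaks=True, heel_toe_gap=0.12):
    """Vertical camera offset (metres) as a function of time since the
    last detected footstep (illustrative parameters)."""
    def pulse(t):
        if t < 0.0:
            return 0.0
        return amplitude * math.exp(-decay * t) * math.sin(2.0 * math.pi * freq * t)
    offset = pulse(t_since_step)
    if two_peaks:  # second, weaker peak for the toe contact
        offset += 0.5 * pulse(t_since_step - heel_toe_gap)
    return offset
```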

3.5.5 Pseudo-Haptic Rendering of Virtual Grounds

Pseudo-haptic feedback leverages the crossmodal integration of visual and kinesthetic cues to give rise to an illusion of force feedback [58]. Pseudo-haptic feedback was initially obtained by combining the use of a passive input device with visual feedback, simulating haptic properties such as stiffness or friction [59]. For example, to simulate the friction occurring when inserting an object into a narrow passage, researchers proposed to artificially reduce the speed of the manipulated object during the insertion. Assuming that the object is manipulated with an isometric input device, the user has to increase the pressure exerted on the device to make the object advance inside the passage. The coupling between the slowing down of the object on the screen and the increasing reaction force coming from the device gives the user the illusion of force feedback, as if a friction force were generated.
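In code, this classical pseudo-haptic friction effect amounts to scaling the control-display ratio while the object is inside the constrained region, as in the following one-function sketch (the ratio value is illustrative):

```python
def pseudo_haptic_displacement(input_delta, in_narrow_passage, cd_ratio=0.3):
    """Scale the on-screen displacement relative to the device input while
    the object is inside the narrow passage, so the user must push harder
    on the isometric device and perceives a resisting (friction) force."""
    return input_delta * (cd_ratio if in_narrow_passage else 1.0)
```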

Marchal et al. [65] brought the concept of pseudo-haptic feedback to walking interaction in immersive VR, inspired by the use of virtual camera motions [60, 97] to improve the sensation of walking in a virtual environment. The modification of the subjective visual feedback of the user, combined with the real kinesthetic cues of walking on the real ground surface, gives rise to the illusion of walking on uneven terrain. The authors based their study on the modification of the user's viewpoint, changing the height, speed and orientation of the virtual subjective camera according to the slope of the virtual ground. While the user walks on the flat real ground, these camera effects are injected in the virtual environment and rendered through a head-mounted display. Experimental results showed that these visual effects are very efficient for the simulation of two canonical shapes: bumps and holes located on the ground. Interestingly, a strong "orientation-height illusion" was found, as changes in pitch viewing orientation produce a perception of height changes (although the camera's height remains strictly the same in this case).
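A hedged sketch of this kind of camera manipulation is given below: from a virtual ground height profile, it derives a camera height, an advance speed and a pitch angle, mimicking the height, speed and orientation effects studied by the authors. The gains and the speed law are placeholders, not the values or equations of [65].

```python
import math

def camera_params(x, ground_height, walk_speed=1.3, k_speed=0.5,
                  k_pitch=0.35, eye_height=1.7, dx=0.05):
    """Camera height (m), advance speed (m/s) and pitch (rad) for a user
    walking on a flat real floor while the virtual ground has the height
    profile ground_height(x). All gains are illustrative."""
    h = ground_height(x)
    slope = (ground_height(x + dx) - ground_height(x - dx)) / (2.0 * dx)
    height = eye_height + h                                         # camera follows the terrain
    speed = walk_speed * min(1.5, max(0.5, 1.0 - k_speed * slope))  # slower uphill, faster downhill
    pitch = -k_pitch * math.atan(slope)                             # tilt the view along the slope
    return height, speed, pitch
```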

Other pseudo-haptic effects could be envisioned to improve the sensation of walking over virtual grounds. One promising direction would be the simulation of pseudo-haptic materials with the King-Kong Effects: the current simple visual vibration patterns could give way to physically based patterns representing the impact on different materials (wood, rubber, metal), as demonstrated in previous work in a hand-based interaction context [40]. The pseudo-haptic walking approach has also been extended to auditory rendering by Turchet et al. [101], for simulating bumps and holes on different ground surfaces.

4 Conclusion

The present chapter has reviewed interactive techniques related to the multimodal rendering of walking over virtual ground surfaces. We successively detailed existing auditory, vibrotactile and then multimodal rendering approaches. As of today, high-end VR setups and devices dedicated to multisensory walking in virtual environments can succeed in providing realistic acoustic and haptic feedback corresponding to complex scenarios. It is indeed becoming possible to walk over snow, beaches, or dead leaves, and to hear and feel the corresponding walking sensations using sonic shoes or haptic floors. Moreover, some crossmodal effects make it possible to fool the senses and perceive changing ground properties.

Through the description of different rendering approaches, the chapter provided concrete examples of how sensations accompanying walking on natural ground surfaces can be rich, multimodal and highly evocative of the settings in which they occur. We believe that including multimodal cues when exploring virtual environments could bring major benefits to various applications, such as medical rehabilitation for gait and postural exercises, training simulations, and entertainment, through improved immersion within rich virtual environments and compelling interaction with realistic virtual grounds.