Virtual Reality, Volume 10, Issue 1, pp 4–10

Affordances in the design of enactive systems


  • Thomas A. Stoffregen
    • School of Kinesiology, University of Minnesota
  • Benoît G. Bardy
    • Faculty of Sport and Movement Sciences, Université Montpellier I
  • Bruno Mantel
    • Faculty of Sport and Movement Sciences, Université Montpellier I
Original Article

DOI: 10.1007/s10055-006-0025-7

Cite this article as:
Stoffregen, T.A., Bardy, B.G. & Mantel, B. Virtual Reality (2006) 10: 4. doi:10.1007/s10055-006-0025-7


Enactive interfaces must incorporate the intuitive activity that characterizes naturalistic perception. However, the manner in which information is presented is no more important than its content, that is, what information is presented. In this contribution, we address the contents of perception. We argue that people perceive affordances, that is, the possible actions that are available in any given situation. We further argue that enactive interfaces should be designed to optimize presentation of information about the possible actions that are available to a person using the enactive interface. The design of enactive interfaces might be guided by an extension of the theory of ecological interface design (Vicente in Hum Factors 44:62–78, 2002) to include multimodal information that is accessed through fast, intuitive exploratory movement. We review two empirical studies that illustrate our arguments. Careful analysis of affordances, together with our increasing understanding of the enactive perception of affordances, should influence the design of enactive interfaces.


Keywords: Affordances, Perception-action, Multimodal perception, Enactive perception

1 Introduction

In this paper, we focus on the content of perception. We make a claim about what it is that people perceive, and we use this claim as the basis for an argument about how enactive interfaces may best be designed.

2 How versus what

Historically, interface design has been strongly influenced by developments in display technology. In effect, technology has driven the design process. One example is the design of multimodal enactive interfaces, which depend upon the existence of technologies that permit closed-loop interaction with hardware and software that can stimulate multiple perceptual systems. One negative consequence of the focus on display technology has been a relative lack of interest in the content of interfaces: what they display, rather than how they display it.

It is possible to argue that what is displayed and how it is displayed cannot be separated. Ultimately, we believe that such an argument is correct. In this context, we believe that the appropriate contents of displays (what information is presented) will dictate the appropriate technology and design of displays (how information is presented).

3 Affordances

We begin by asking what it is that is perceived. When we look out the window, or stroll through the park, or drive a sports car, or play tennis, what are the contents of our perception?

Classical theories have assumed that the contents of perception are reductionist elements, such as lines, colors, pitches, pressures, and so on. These contents are meaningless, and classical theories have assumed that meaning is assigned through cognitive processing. In many classical theories, the world itself is meaningless, and meaning is a de novo creation of mentation.

An alternative view of the contents of perception has been developed within the Ecological Approach to Perception and Action (Gibson 1986; Sanders 1997; Stoffregen 2003b; Turvey 1992). In the Ecological Approach, meaning is not a construct of the mind. Rather, meaning exists outside the mind, as a real part of the animal–environment interaction. For example, whether an object is within reach depends upon the position of the object relative to the person, and upon the length of the arm. If the object is within arm’s length, then it can be touched. Object position is an objective fact (i.e., not a construct); the same is true of arm length. Similarly, the higher-order relation of object position and arm length is itself an objective fact. If the person can perceive this higher-order relation, then they will have veridical information about whether the object can be reached. This information is meaningful in the sense that it has consequences for behavior: Information about “reachability” can be used to organize behavior in the successful furtherance of behavioral goals.

Reachability is an example of an affordance. Affordances are potential behaviors that are available to a given animal in a given situation or environment. The availability of behaviors is determined by relations between properties of the animal (e.g., arm length) and properties of the environment (e.g., object position).
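The relational character of this definition can be made concrete with a minimal sketch. The function name, the parameter names, and the reduction of reachability to a single distance comparison are illustrative assumptions, not part of the cited work.

```python
def is_reachable(arm_length_m, object_distance_m):
    """Return True when the object lies within arm's length.

    Neither argument alone determines the result: the affordance is a
    property of the relation between animal and environment.
    Both quantities are hypothetical, expressed in meters.
    """
    return object_distance_m <= arm_length_m


# The same object affords reaching for one person but not for another.
print(is_reachable(arm_length_m=0.70, object_distance_m=0.65))
print(is_reachable(arm_length_m=0.60, object_distance_m=0.65))
```

Changing either the arm length or the object position can change the outcome, which is the sense in which the affordance belongs to neither the animal nor the environment alone.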

As another example, consider a ball that is in flight, and a person who wants to catch the ball before it hits the ground (Oudejans et al. 1996). The ability to catch the ball will be influenced by the amount of time that the ball will be in the air and the place it will fall (both of these are properties of the environment), and by the speed with which the person can run to the landing place (a property of the animal). The affordance for “catchability” emerges from relations between these three properties. If running speed is sufficient to get to the landing place in the time remaining before the ball hits the ground, then catching is possible, that is, catching is afforded. The ball’s flight time and trajectory and the person’s running abilities are all non-mental properties of reality. Thus, it might be possible, in principle, to perceive the relations between these parameters that determine whether the ball can be caught. As in the previous example, information about this ability is meaningful.
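A rough sketch of the same relation in code may help. It assumes, purely for illustration, that the catcher runs at a constant maximum speed (ignoring acceleration and reach); all names and units are hypothetical.

```python
def is_catchable(distance_to_landing_m, flight_time_remaining_s,
                 max_running_speed_m_s):
    """Catching is afforded if the runner can cover the distance in time.

    Two arguments describe the environment (where and when the ball lands)
    and one describes the animal (how fast the person can run).
    """
    if flight_time_remaining_s <= 0:
        return False  # the ball has already landed
    required_speed = distance_to_landing_m / flight_time_remaining_s
    return required_speed <= max_running_speed_m_s


# Same ball, two runners with different (hypothetical) top speeds.
print(is_catchable(10.0, 2.0, max_running_speed_m_s=6.0))
print(is_catchable(10.0, 2.0, max_running_speed_m_s=4.0))
```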

Affordances are not properties of the animal as such, nor are they properties of the environment as such. Affordances are properties of relations between the animal and its environment. Accordingly, affordances are emergent properties of the animal–environment system (Stoffregen 2003b).

Within the Ecological Approach to Perception and Action, it is generally agreed that the contents of perception include affordances (e.g., Chemero 2003; Gibson 1986; Stoffregen 2003b; Turvey 1992). Some scholars have gone further, explicitly claiming that affordances are the sole contents of perception (Sanders 1997; Stoffregen 2003a). If this is true, then an affordance-based analysis of display content will be imperative.

4 Affordance perception in daily life

Perception of affordances would be highly adaptive, given their strong and direct relation to the success of behavior. Because they are aspects of reality, affordances might be perceived, rather than being constructed in the head (for discussions about how affordances might be perceived, see Stoffregen 2003b; Stoffregen et al. 1999, 2005). Research has shown that people often exhibit accurate knowledge of their action capabilities and that in many cases this knowledge appears to be perceptual, that is, based on immediate perceptual information, rather than being based on secondary, cognitive operations. This is true for healthy adults (e.g., Warren 1984), for the elderly (Konczak et al. 1992), and even for infants (Yonas and Hartman 1993). We are also capable of perceiving affordances for other people, that is, we can look at a person and detect what they can do (Stoffregen et al. 1999). The existing research suggests that knowledge of action capabilities is acquired simultaneously with the action capabilities themselves (Yonas and Hartman 1993).

Enaction is a term originated by Varela, who argued that perception exists primarily for the guidance of action, and that cognition emerges from recurrent patterns of sensorimotor stimulation that enable action to be perceptually guided (Varela et al. 1991). We endorse this concept, with two important qualifications. The first concerns the contents of perception: As discussed above, in the Ecological Approach to Perception and Action the contents of perception are affordances, rather than properties of the animal or the environment, as such (Chemero 2003; Gibson 1986; Sanders 1997; Stoffregen 2003a, b). Second, in the Ecological Approach, perceptual guidance of action and the resultant emergence of cognition are possible because properties of the animal–environment system are specified in potential sensory stimulation, where specification consists of lawful, 1:1 relations between static and dynamic properties of the animal–environment system and patterns in ambient energy arrays (Gibson 1986; Stoffregen and Bardy 2001). The existence of recurrent sensorimotor patterns is a consequence of (and, therefore, is logically secondary to) these lawful relations. For us, enaction is a term that describes the intimate, fundamental linkage that exists between how we move and what we perceive (Gibson 1986; Varela et al. 1991). Enactive perception is an instance of embodied cognition (Thelen and Smith 1994; Varela et al. 1991).

Enaction plays an important role in the perception of affordances. In fact, the documented role of enaction provides evidence that affordances are perceived rather than derived through cognitive operations. Perception of one’s ability to catch a falling object by running to its landing point is improved when observers are permitted to begin running before making their judgments, as opposed to watching the flight of the ball while standing still (Oudejans et al. 1996). Research by Mark et al. (1990) indicates that movement is central to perception even for action capabilities that, presumably, are well-practiced and relatively stable. Mark et al. showed that knowledge of one’s ability to sit on chairs depends upon the ability to engage in subtle exploratory actions while viewing the chair. Stoffregen et al. (2005) quantified this effect, revealing that postural motion when subjects are instructed to perceive affordances differs from postural motion when they are not. Bingham et al. (1989) documented exploratory patterns of hand and arm movement that are used in assessing one’s ability to throw a ball to a specified distance.

Research relating movement to the perception of affordances is in its infancy, but already it seems clear that in daily life, in both novel and well-practiced situations, the perception of affordances is enactive.

5 Human–machine affordances

Human–machine systems are a subset of animal–environment systems. Human–machine systems have affordances, and these need to be perceived and acted upon by users. For simulation, virtual reality, and teleoperation systems, these affordances fall into at least two classes: affordances of the simulated system and affordances of the simulation system itself. Each system has both types of affordances, and users need to perceive and act upon both.

6 Affordances of simulated systems

Affordances of a human–computer interface system can replicate those in the simulated system (e.g., a driving simulator can replicate some of the affordances of driving a car). A good example is research on the perception of reaching abilities in a simulated environment (Mantel et al. 2005). Here the intent is to mimic, in the virtual environment, the multisensory patterns of stimulation that naturally occur in real reaching environments. Human–computer interface systems can also provide action capabilities that are not available (for physical or social reasons) in the simulated system (e.g., touching sculpture, as in a virtual museum).

In both of the above cases, affordances that are to be presented to users are known in advance and are, to some extent, independent of the simulation system that is used. Affordances of the simulated system are the main point or purpose of simulation and virtual environment systems.

7 Affordances of simulation systems

These affordances relate to the interface system as a thing in itself, separate from the simulated system. Affordances of simulation systems are concerned with how to operate the human–machine system, as such. The tasks that can be accomplished using a human–machine system often differ from the method of operating the system. For example, using a computer mouse to click on a spot on the screen is one way to send an email message, but the activity of grasping, guiding, and clicking the mouse differs qualitatively from the actual sending of the message. Similarly, pressing one’s foot onto a pedal on the floor can be used to stop an automobile, but the action of the foot on the pedal differs qualitatively from the generation of accelerative force that actually stops the car.

Thus, in addition to providing information about what tasks can be executed with a system (e.g., sending emails, stopping a car), interfaces need to provide information about how the human–machine system can be operated to achieve these tasks (e.g., clicking a mouse, pressing on the brake pedal).

8 Ecological interface design

Enactive interfaces should make it possible for users to obtain information about the affordances of the interface, and the affordances of the overall system. One well-established theory of interface design is based on the idea that interfaces can and should be designed to provide information about affordances available to users. This theory is Ecological Interface Design, or EID (Vicente 2002).

EID has been developed (and implemented) primarily in the context of the monitoring and control of physical systems, such as nuclear power plants and aircraft. To date, there have been few attempts to use the principles of EID in developing interfaces for simulations and virtual environments. There also have been few attempts to use EID in the design of enactive interfaces. Finally, the application of EID to multimodal displays is in its infancy. Further theoretical and applied development is urgently needed in each of these areas.

EID has been applied to systems with long timescales (e.g., the control of industrial processes). Research has shown that people can perceive affordances from existing EIDs, as reviewed by Vicente (2002). This leaves open the question of whether people can perceive affordances from displays that involve processes with short timescales, and intuitive movement. This question is addressed in the two following examples.

9 Perceiving affordances from fast, intuitive movement

Affordances are opportunities for action. In ordinary life, affordances are perceived, and this perception is enactive. Human–machine systems often attempt to simulate users’ interactions with the environment. Beyond this, human–machine systems have affordances of their own. These considerations suggest that affordances should be of central interest to designers of human–machine systems in general, and of enactive interfaces, in particular.

As suggested above, human–machine systems have many of the affordances of non-technical systems but also have affordances that are peculiar to technical systems. One example of direct relevance to interface design is a study by Stoffregen et al. (1999), who showed that affordances can be perceived, with great accuracy, from dynamic video images. They created kinematic displays, in which reflective markers were attached to major body joints of tall and short actors who were filmed near an experimental chair that was also outfitted with reflectors (Fig. 1). The camera aperture was adjusted so that only the reflective markers were visible in an otherwise black environment (Fig. 2). In different experiments, the actors were filmed walking in place next to the chair, or walking back and forth along the camera’s line of sight, with the chair at the mid-point of the walk. As the actor moved, the seat pan of the chair was slowly raised or lowered. Participants viewing the videotapes were asked to judge the maximum or preferred sitting height of each actor (tall and short actors were viewed separately) by stopping the videotape when the seat pan was at what they judged to be the correct height. In these experiments, judgments of maximum and preferred sitting height were accurate. One might suppose that accurate judgments could be made simply, based on the position of a reflective marker on the hip relative to the placement of a reflective marker on the seatpan. Such a strategy might have worked for judgments of maximum sitting height, in which there was a direct relation between the reflective marker on the hip and the marker on the seatpan. However, it is unlikely that subjects simply set the seatpan at the height of the hip marker. For most healthy adults, maximum sitting height is about 90% of leg length. Thus, the simple strategy described above would tend to lead to consistent overestimations of maximum sitting height. Such overestimations did not occur. 
Moreover, a strategy based on the vertical location of the hip marker is even less plausible for judgments of preferred sitting height. Preferred sitting height, as defined by Stoffregen et al. (1999), was approximately 63% of leg length, yet judgments of preferred sitting height were accurate and were not biased in the direction of the reflective marker on the hip (or of another marker on the knee).
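The proportions quoted above can be turned into a small worked example. The two ratios come from the text; the leg lengths are hypothetical, and the sketch assumes, for illustration only, that the hip marker sits at a height equal to leg length.

```python
MAX_SITTING_RATIO = 0.90        # "about 90% of leg length"
PREFERRED_SITTING_RATIO = 0.63  # "approximately 63% of leg length"


def sitting_heights_cm(leg_length_cm):
    """Return (maximum, preferred) sitting height for a given leg length."""
    return (MAX_SITTING_RATIO * leg_length_cm,
            PREFERRED_SITTING_RATIO * leg_length_cm)


def hip_marker_error_cm(leg_length_cm):
    """Overestimate of maximum sitting height if the seat pan were simply
    aligned with the hip marker (assumed here to be at leg length)."""
    max_height, _ = sitting_heights_cm(leg_length_cm)
    return leg_length_cm - max_height


for leg_cm in (80.0, 100.0):  # hypothetical short and tall actors
    max_h, pref_h = sitting_heights_cm(leg_cm)
    print(f"leg {leg_cm:.0f} cm: max {max_h:.1f}, preferred {pref_h:.1f}, "
          f"hip-marker overestimate {hip_marker_error_cm(leg_cm):.1f}")
```

Under these assumptions, aligning the seat pan with the hip marker would overshoot maximum sitting height by about a tenth of leg length, and would overshoot preferred sitting height by much more.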
Fig. 1

The experimental set-up used by Stoffregen et al. (1999) in creating kinematic displays. The figure shows the experimental chair and the relative position of the chair and the human actor. Reproduced by permission from Stoffregen et al. (1999), JEP:HP&P
Fig. 2

Sample frame from the videotape stimulus used by Stoffregen et al. (1999). Reflective dots attached to the experimental chair apparatus appear on the left. On the right appear reflective dots attached to the human actor

Motions of the actor along the line of sight produced dramatic changes in the scale of the pattern of dots corresponding to the actor, relative to the pattern of dots corresponding to the experimental chair. At his or her closest approach to the camera, the dots corresponding to the actor filled the vertical height of the monitor screen, while at the actor’s greatest distance from the camera the dots extended over only 55% of the vertical height of the screen. Judgments of both maximum and preferred sitting height were accurate (separately for tall and short actors) despite these dramatic changes in on-screen dimensions. This finding indicates that judgments were not based on any simple scalar relation between the angular or on-screen size of the dot patterns related to the actor and chair.

Of critical importance, Stoffregen et al. found that this robust perception occurred when the displays included information about relations between the person and the environment (that is, when the actor and chair occupied the same physical space), but not when these relations were absent from the displays. In their Experiment 6, the videotape showed only the actor. The tape was shown on a monitor that was next to the actual experimental chair, which was raised and lowered as in the other experiments (Fig. 3). In this experiment, dynamical properties of the chair were apparent in the physical device that was present, and dynamic properties of the actor were preserved in the videotape in the same way as in previous experiments. Scalar relations between the actor and chair were changed (the angular size of the dot pattern corresponding to the actor was always smaller than the pattern corresponding to the chair), but this difference was simple and constant. Despite the simplicity and constancy of the scalar relation between the actor and the chair, judgments of maximum sitting height were grossly inaccurate. Being able to see the person, as such, and the environment, as such, was not sufficient. Being able to see higher-order relationships between the person and the environment (that is, being able to see the person–environment system, as such) was sufficient for accurate perception of affordances.
Fig. 3

Experimental set-up used by Stoffregen et al. (1999; Experiment 6). On the left, the experimental chair apparatus. On the right, a monitor showing a point-light display of the human actor. Participants were asked to adjust the height of the seatpan on the (actual) chair to the maximum sitting height of the actor in the kinematic display. Reproduced by permission from Stoffregen et al. (1999), JEP:HP&P

10 Enactive EID

The displays used by Stoffregen et al. (1999) were uni-modal (visual only), and they were open loop, that is, they were not modulated by actions of the viewer. The movements in the displays were intuitive (biological motion), but the displays were not enactive because they did not respond to intuitive movements of the viewer. There is, thus, an essential step remaining in the effort to demonstrate that principles of EID can be applied to enactive displays. That step has been taken by Mantel et al. (2005).

We studied the perception of an affordance in the context of multimodal perceptual stimulation in a virtual environment. Seated participants viewed virtual objects and were asked to judge whether each object was within reach. The central premise of our study was that enactive, multimodal displays provide information about affordances that is revealed through intuitive exploratory movement. In our analysis, the multimodal nature of the displays was essential, rather than optional. This was because information about reachability existed only in the global array. The global array, introduced by Stoffregen and Bardy (2001), is the set of patterns that extend across multiple forms of ambient energy. These patterns are lawfully related to properties of the animal–environment interaction, such as affordances. The global array is superordinate to and qualitatively different from patterns that exist in individual forms of ambient energy, such as the optic array.

In the case studied by Mantel et al., information about reachability existed in a higher-order relation extending across optics and gravitoinertial force. The global array parameter that specifies egocentric object distance exists only when there is inertial displacement of the head within an illuminated environment; thus, the opportunity to detect this information is available only in the context of an enactive perceiver. Participants sat in front of a simple virtual environment that was presented on a projection screen. The visual display depicted an object (a pink, filled circle) on a uniform black background. The participant’s task was to look at the virtual object and state (yes or no) whether it was within reach. The simulated distance of the objects varied between 90 and 110% of each participant’s actual reaching distance (measured prior to the judgment session). Participants were permitted to look at the simulated object for as long as they wished, prior to making their judgments.
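Why optics alone cannot specify egocentric distance, while a relation between optics and head displacement can, may be illustrated with a deliberately simplified geometric sketch. It is not the global array parameter formalized in the cited work: it assumes a purely lateral head shift, a stationary object initially straight ahead, and hypothetical function names.

```python
import math


def optical_direction_change(head_shift_m, object_distance_m):
    """Change in the object's visual direction (radians) produced by a
    lateral head shift. This is all that the optics alone provides."""
    return math.atan2(head_shift_m, object_distance_m)


def distance_from_relation(head_shift_m, direction_change_rad):
    """Recover distance from the relation between the head displacement
    (available through gravitoinertial stimulation) and the optical change."""
    return head_shift_m / math.tan(direction_change_rad)


# Two different situations yield practically the same optical change...
near = optical_direction_change(head_shift_m=0.05, object_distance_m=0.5)
far = optical_direction_change(head_shift_m=0.10, object_distance_m=1.0)
print(math.isclose(near, far))

# ...but combining each optical change with its head shift separates them.
print(round(distance_from_relation(0.05, near), 3))
print(round(distance_from_relation(0.10, far), 3))
```

In this toy geometry the optical pattern is ambiguous with respect to distance until it is related to the displacement of the head, which is available only to a perceiver who moves.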

Of several conditions in the study (Mantel et al. 2005), three are relevant here. In the Vision + Movement condition, participants were permitted free movement of the head and torso as they looked at the virtual environment (Fig. 4a). Head motion was tracked and used to update the optical display online. Thus, head movement yielded changes in both gravitoinertial and optical stimulation, just as would occur outside the laboratory. In the Stationary Vision condition changes in gravitoinertial force were eliminated by minimizing head movements (Fig. 4b). In this condition, participants were asked to sit as still as possible, specifically avoiding head movements. We collected data on head motion, but these data were used solely for later analyses. The Stationary Vision + Movement Playback condition was a yoked control (Fig. 4c). Participants were again asked to sit still, and the optical display was modulated based on a recording of their own head movements from the Vision + Movement condition. Optical stimulation was essentially identical in the Vision + Movement and Stationary Vision + Movement Playback conditions, but information about the egocentric distance of the virtual object was available only in the Vision + Movement condition.
Fig. 4

Experimental set-up used by Mantel et al. (2005). a Vision + Movement condition. b Stationary Vision condition. c Stationary Vision + Movement Playback condition. See text for details
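The logic of the three conditions can be summarized as a choice of which head-position signal drives the optical display. The sketch below is an interpretation of the description in the text, not the software used in the study; the names, the tuple representation of head position, and the fixed reference viewpoint are assumptions.

```python
from enum import Enum


class Condition(Enum):
    VISION_MOVEMENT = "Vision + Movement"
    STATIONARY_VISION = "Stationary Vision"
    STATIONARY_PLAYBACK = "Stationary Vision + Movement Playback"


# Hypothetical fixed viewpoint used when the display is not modulated.
REFERENCE_VIEWPOINT = (0.0, 0.0, 0.0)


def rendering_viewpoint(condition, tracked_head, recorded_head):
    """Return the head position used to render the current frame."""
    if condition is Condition.VISION_MOVEMENT:
        return tracked_head        # closed loop: live head tracking
    if condition is Condition.STATIONARY_PLAYBACK:
        return recorded_head       # open loop: replay of an earlier recording
    return REFERENCE_VIEWPOINT     # stationary: display is not modulated


def optics_coupled_to_own_movement(condition):
    """Optical change and gravitoinertial stimulation covary only when the
    participant's current movement drives the display."""
    return condition is Condition.VISION_MOVEMENT


live, replay = (0.02, 0.0, 0.0), (0.05, 0.01, 0.0)
for cond in Condition:
    print(cond.value, rendering_viewpoint(cond, live, replay),
          optics_coupled_to_own_movement(cond))
```

The playback condition is the informative one: the optical input can match the Vision + Movement condition frame for frame, yet it is decoupled from the participant's current movement.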

The results revealed that judgments were essentially accurate in the Vision + Movement condition: The switch from “yes” to “no” answers occurred at approximately 105% of arm length (that is, targets within a simulated egocentric distance of 105% of arm length were judged to be reachable, while targets beyond this range were judged to be unreachable). In the Stationary Vision condition, judgments were essentially random (i.e., the yes/no judgments were not influenced by target distance), confirming the widely replicated finding that stationary visual displays do not contain scalar information. The critical result was in the Stationary Vision + Movement Playback condition. Here, judgments were also essentially random, despite the fact that visual motion in this condition was the same as in the Vision + Movement condition. Mantel et al. concluded that both static and dynamic optical displays were inadequate for perception of the reachability of the virtual objects. Perception of reachability was accurate only when participants had access to relations between optics and gravitoinertial stimulation, that is, to the global array parameter that was specific to the egocentric distance of the virtual objects. Here, we note that the relevant global array parameter was made available through participants’ head and body movements. These active, intuitive movements created simultaneous, coordinated changes in stimulation of the visual, vestibular, and somatosensory systems, that is, intuitive movements brought into existence the information that was required. Accordingly, the displays were enactive (due to the role of intuitive movement over short timescales), and multimodal. The displays were also Ecological, because they provided information about what the user could do within the human–machine system.

11 Potential applications

Enactive EID has many potential applications. Here, we briefly discuss two that suggest the range of applications in which enactive EID might be useful. First, enactive EID might be useful for any situation in which a quantity is naturally unstable and must be balanced through operator control. As one example, consider the flight engineer’s task of balancing the weight of fuel in aircraft wing tanks (the wings must have approximately equal weight to maintain aerodynamic stability of the aircraft). By clicking on a mouse, the flight engineer could activate a function by which lateral motion of the head or torso could be used to control the pumping of fuel between tanks (i.e., from one wing to the other). Subtle body movements would then be mapped to control of the system in an intuitive manner. An alternative approach might be to scale parameters of a controlled system to bodily affordances of the user. For example, control of a parameter (through a mouse or any other input device) could change the perceived distance of a display object. A desired value (e.g., optimal speed of a vehicle, or optimum flow rate of a liquid) would be scaled in a display to the length of the user’s arm, such that when the controlled parameter was at its target value, the display object would be seen to be just within reach.
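The second proposal, scaling a controlled parameter to a bodily affordance, can be sketched as a mapping from parameter value to simulated object distance. The linear form of the mapping and all names are assumptions made for illustration; the text specifies only that the object should appear just within reach when the parameter is at its target value.

```python
def displayed_distance_m(value, target_value, arm_length_m):
    """Map a controlled parameter onto the simulated distance of a display
    object, scaled to the user's arm length.

    At value == target_value the object sits exactly at arm's length.
    A linear mapping is assumed here for simplicity.
    """
    if target_value <= 0:
        raise ValueError("target_value must be positive")
    return arm_length_m * (value / target_value)


def appears_within_reach(value, target_value, arm_length_m):
    """True when the displayed object is at or inside arm's length."""
    return displayed_distance_m(value, target_value, arm_length_m) <= arm_length_m


# Hypothetical flow-rate target of 50 units for a user with a 0.70 m reach.
for flow in (45.0, 50.0, 55.0):
    print(flow, appears_within_reach(flow, target_value=50.0, arm_length_m=0.70))
```

With this mapping, a value above target places the object beyond reach and a value at or below target places it within reach, so deviation from the target becomes a perceivable change in an affordance.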

12 Conclusion

Optimum design of enactive interfaces will begin with study of the affordances of the relevant systems. These affordances will dictate the information that should be displayed in enactive interfaces. Theoretical approaches that are based on the concept of affordances, such as Ecological Interface Design, can be adapted to guide the design of multimodal enactive interfaces. The central utility of enactive EID is to permit users to perceive and control affordances through rapid, intuitive movement. Enactive EID requires “closing the loop,” that is, the interface must monitor the intuitive movements of the user, and must be able to use those movements to update the display in real time. But closing the loop is not sufficient. Enactive EID also requires (1) that designers understand the affordances of the controlled system, (2) that they be able to present information about those affordances in the display, and (3) that information permitting detection and exploitation of affordances be linked to real-time data about the user’s intuitive movements. Requirements (1) and (2) are related mainly to EID, while requirement (3) requires a marriage of EID with a sophisticated understanding of the natural integration of movement and perception, that is, of enaction.

We have shown that affordances can be perceived from configural displays that preserve the rapid movements that characterize naturalistic human–environment interactions (Stoffregen et al. 1999). This result can be interpreted as an extension of Ecological Interface Design beyond the realm of relatively slow industrial events and processes. In addition, we have demonstrated that affordances can be perceived through intuitive, multimodal exploration of a virtual environment (Mantel et al. 2005). Our demonstration can be used as a “proof of concept” for the hypothesis that Ecological Interface Design can be extended to enactive perception in multimodal virtual environments.


Supported by Enactive Interfaces, a network of excellence (IST contract #002114) of the Commission of the European Community, the National Science Foundation (BCS-0236627), the University of Paris XI, The University of Montpellier 1, and the Institut Universitaire de France.

Copyright information

© Springer-Verlag London Limited 2006