Introduction

Any mobile agent capable of autonomously navigating through its world must be able to find its bearings in order to do so. Researchers have developed a number of paradigms to investigate this ability, the foremost of which is the reorientation task (Cheng, 1986). In a reorientation task, agents are placed inside a controlled arena that contains a specific set of cues and are trained to search a particular location for reinforcement. Following training, the arena is reconfigured, and the changes in agent search patterns are recorded.

A typical reorientation task uses a quadrilateral arena, usually rectangular, with four locations of interest (one in each corner). While some exceptions exist (Cheng, 1986; Newcombe, Ratliff, Shallcross, & Twyman, 2010), the overwhelming majority of reorientation experiments conform to this structure (Cheng & Newcombe, 2005). The emphasis on corners has some desirable properties; for instance, in a rectangle, corners that are diagonally opposite are geometrically identical. Observe, in Fig. 1a, that Locations 1 and 3 have an identical geometry (long wall on the left, short wall on the right, both walls joined at a 90° angle). Because of this, if an agent were choosing a location on the basis of geometry alone, it would have the same likelihood of visiting Location 1 as it would of visiting Location 3. This led to the discovery that rats processed geometric cues even when they could be completely ignored (Cheng, 1986; Gallistel, 1990). In a typical study (for instance, Wall, Botly, Black, & Shettleworth, 2004), Location 1 in Fig. 1a would be reinforced when visited, to indicate that it was the “correct” location. Furthermore, a unique landmark—a nongeometric identifier—would be placed at that location. This landmark provides sufficient information for an animal to learn the “correct” location; geometric information is not required and is, in fact, less reliable, because the geometric information at Location 1 is also present at the nonreinforced Location 3. Nevertheless, if, after training, the rat is placed in a new arena in which the “correct” landmark has been moved to Location 2, its behavior will typically indicate that geometric cues were encoded. That is, it will have a high likelihood of visiting Location 2 in the new arena (indicating that nongeometric cues were learned) but will also visit Locations 1 and 3 (indicating that geometric cues were also learned). Such results led to the development of the geometric module theory (Cheng, 1986; Gallistel, 1990), which is based on the assumption that geometric cues are processed independently of nongeometric cues and that the processing of geometric cues is mandatory.

Fig. 1

Possible configurations of targets in a rectangular arena. Many tasks concern themselves solely with corners (a), but similar relationships (i.e., long wall to the left of the target, short wall to the right of the target) can occur in other locations, such as along walls (b). Such arenas therefore need not be limited to four locations of interest (c)

The geometric module theory is an example of an insight into navigation provided by the reorientation task. This fairly straightforward task has been a fruitful source of information about navigation and has been used to study a wide variety of organisms, including ants, fish, rats, birds, and humans (review in Cheng & Newcombe, 2005; Cheng, 2008). While this informative task is straightforward to describe and provides data that are easily analyzed, it is nevertheless expensive to conduct. Particularly when using animal subjects, an experiment requires a considerable commitment of resources, because subjects must be run individually and a fair amount of training is needed for a subject to learn the “correct” location before being placed in a novel arena. It would be convenient if there were a less expensive medium in which to explore the reorientation task, with the aim of discovering interesting hypotheses that could then be tested with a traditional (and more expensive) experiment.

Computer simulations are one less-expensive medium that can be used to explore domains of interest. Lewandowsky (1993) has pointed out that computer simulations can provide several advantages for theory development in cognitive science. These include formalizing a theory in such a way that rigor is improved and providing more precise tools for studying concepts of interest. Additionally, implementing a working computer simulation can reveal tacit assumptions hidden within a theory. Finally, a computer simulation can itself lead to serendipitous findings, particularly when it is presented with novel situations. It would seem that if one had a plausible computer simulation for the reorientation task, it could be used to explore new situations with ease and could possibly generate unexpected predictions. These new simulation-based predictions could then be tested with a traditional experiment, particularly if a researcher felt that the predictions were interesting enough to warrant the experiment’s expense.

Fortunately, a plausible computer simulation for reorientation has been proposed (Dawson, Kelly, Spetch, & Dupuis, 2010) in the form of a simple artificial neural network called a perceptron (Rosenblatt, 1962). In its standard form, a perceptron consists of a single bank of input units that numerically encode patterns of stimuli; these input units are linked via weighted connections to an output unit, which transforms this weighted net input signal into response behavior. The strength of the connection weights is then updated following a specified learning rule, designed to minimize the difference between the output unit’s activity and the desired response to that particular input pattern. A perceptron trained with a standard learning algorithm has been shown to generate most of the interesting regularities found in reorientation task behavior (Dawson, Kelly, et al., 2010). An operant perceptron model of reorientation, which uses a more psychologically plausible learning algorithm (in which the perceptron has a chance of not visiting a location, based on the total associative strength at that location, and connection weights are adjusted only when the perceptron chooses to investigate a location), has also been proposed and has shown some promising results (Dawson, Dupuis, Spetch, & Kelly, 2009; Dupuis & Dawson, in press). The purpose of the present article is to illustrate how an operant perceptron can be used to explore reorientation by observing the model’s behavior when novel reorientation paradigms are simulated. We demonstrate that this kind of computer simulation can generate interesting predictions that can then be tested using more traditional experimental methodologies.

Part of the power of computer simulations is that, by revealing tacit assumptions about the processes responsible for phenomena of interest (Lewandowsky, 1993), they also provide the means to challenge, or even negate, those assumptions. In the case of reorientation, while the geometric module theory was highly influential for many years, it has recently been questioned, with some researchers arguing for it to be abandoned completely (Cheng, 2008; Twyman & Newcombe, 2010). If reorientation is accomplished without the use of a geometric module, what mechanisms might instead be responsible? One alternative to the geometric module is an appeal to the general principles of associationist learning (Miller & Shettleworth, 2007). According to this view, there is no geometric module; geometric and nongeometric information are both treated in the same manner, as valid cues. Agents use standard learning procedures to associate the various available cues (both geometric and nongeometric) at a location with the likelihood of being rewarded at that location. A model employing this approach has been shown to be capable of simulating many reorientation task regularities without an appeal to the geometric module (Miller, 2009; Miller & Shettleworth, 2007, 2008). However, serious empirical and theoretical problems with this model have been identified (Dawson, Kelly, Spetch, & Dupuis, 2008; Dupuis & Dawson, in press). The standard perceptron avoids these problems and also models reorientation regularities (Dawson, Kelly, et al., 2010). The operant perceptron captures the same regularities as the standard perceptron but does so with a more realistic conception of learning in the reorientation paradigm (Dupuis & Dawson, in press). Despite their mathematical differences, both the purely associationist and perceptron models were developed in the spirit of challenging the assumptions about the processes underlying reorientation and as reactions against geometric modularity.

Exploring assumptions used to define the reorientation task

Because perceptrons are plausible computer simulations for studying reorientation, they enable the exploration of other challenges, concerning not just assumptions about reorientation processes, but also assumptions about the reorientation paradigm itself. For example, although the prototypical reorientation paradigm (Cheng, 1986) employed a grid spanning the entire arena for measurement, it was noted earlier that a common feature of most modern reorientation experiments is the emphasis on corners as the only locations of interest. Might performance on the reorientation task be affected if agents were trained to go to locations that are not at corners? Are corner locations special in some way?

According to associationist theories of reorientation, corner locations should not be special. These theories, based on the work of Rescorla and Wagner (1972), posit a view of learning largely based on cue competition. Applied to the reorientation task, this suggests that a location is merely a collection of cues that can be exploited as signals of potential reinforcement. From this perspective, there should be no fundamental behavioral difference between a location of interest at a 90° corner and one along a 180° wall. In effect, a location under consideration along a single wall “divides” the wall into a left and a right segment, exactly as a corner does, with only the angle of intersection distinguishing them. Under this associative viewpoint, it would appear that locations of interest need not be constrained to corners.

Another tacit assumption that guides experimental studies of reorientation concerns the number of target locations. The vast majority of reorientation studies have used quadrilateral arenas—typically, rectangles or squares (review in Cheng & Newcombe, 2005; Cheng, 2008) and, less commonly, kites (Dawson, Kelly, et al., 2010; Pearce, Good, Jones, & McGregor, 2004) or parallelograms (Lubyk & Spetch, 2012; Tommasi & Polli, 2004). In these studies, the corners of these arenas have been used as target locations (even if noncorner information is part of the study; e.g., Ratliff & Newcombe, 2008), and therefore, these experiments have studied reorientation using four different locations. Furthermore, nonkite quadrilateral arenas make available only two different instances of geometric cues (long wall on the left and short wall on the right of a corner, short wall on the left and long wall on the right of a corner). Only a handful of studies have used arenas that are not quadrilaterals (i.e., Newcombe et al., 2010; Sturz & Bodily, 2011) and have, therefore, made available more than four potential target locations (again assuming that locations of interest are always positioned at arena corners). To our knowledge, no experimental studies of reorientation have explicitly compared situations in which the number of target locations has been systematically varied.

However, it is important to study the effect of varying the number of target locations, because some theories predict that this variable should affect learning in the reorientation task. For example, Miller and Shettleworth’s (2007, 2008) associative model scales the rate of learning in its learning equations by a measure of the probability of an agent visiting a particular location. This measure is expressed as the net attractiveness of the location as a proportion of the total net attractiveness of every location. This proportion will obviously be affected by the number of locations that are summed in its denominator. All else being equal, this predicts (at least initially) one half the normal rate of learning for tasks with eight locations, relative to tasks with four locations. That is, the Miller–Shettleworth model predicts that learning the reorientation task will slow down as the number of possible locations of interest increases. Do other models, such as the perceptron, also make this prediction?
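To make this scaling concrete, the sketch below computes the initial increment in associative strength under a Miller–Shettleworth-style update, in which the change at the reinforced location is weighted by that location’s attractiveness as a proportion of the summed attractiveness of all locations. The function names, parameter values, and equal initial strengths are illustrative assumptions, not the model’s published parameterization.

```python
# Minimal sketch of Miller-Shettleworth-style scaling by number of locations.
# All names and the starting values below are illustrative assumptions.

def visit_probability(strengths, target):
    """Net attractiveness of the target as a proportion of all locations."""
    return strengths[target] / sum(strengths)

def initial_increment(n_locations, alpha=0.2, lam=1.0, start=0.1):
    """Change in associative strength at the target on the first reinforced visit."""
    strengths = [start] * n_locations          # equally attractive at the outset
    p_visit = visit_probability(strengths, target=0)
    return alpha * (lam - strengths[0]) * p_visit

print(initial_increment(4))   # 0.045
print(initial_increment(8))   # 0.0225: half the four-location increment
```

Doubling the number of equally attractive locations doubles the denominator of the visit probability, so the initial learning increment is halved.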

Furthermore, as more locations are added to a standard reorientation arena (Fig. 1c), a greater variety of geometric cues must be processed. For instance, there are four different geometric configurations that can be distinguished in Fig. 1c (as opposed to two in Fig. 1a and b), which again would be expected to slow reorientation learning in a theory like Miller and Shettleworth’s (2007, 2008) associative model. Does varying the number of target locations, and the number of possible geometric configurations, affect human learning in the reorientation paradigm, and does it do so in a fashion predicted by computer simulations? The present article represents an attempt to begin the exploration of such questions.

The purpose of the present work is to investigate the two main issues raised above. First, we attempt to evaluate the role that the nature of a location—at a corner or at a wall—has on reorientation behavior. Second, we investigate the impact that changing the number of locations of interest has on reorientation behavior. These distinctions are depicted in Fig. 1. Figure 1a illustrates the typical position of possible target locations in a standard reorientation task that uses a rectangular arena. Figure 1b provides an analogous arena, but one in which the locations of interest are not found at corners. Figure 1c shows how one can combine the first two arenas into a third that has eight locations of interest, instead of the typical four.

This article proceeds as follows. It begins by using an operant perceptron (Dupuis & Dawson, in press) to simulate the reorientation task in the various arenas illustrated in Fig. 1. These simulations are used to make predictions about the effects of type of location and of number of locations on reorientation behavior. The results of these simulations provide two key predictions: (1) Corner locations are not inherently special, and (2) doubling the number of target locations has a negligible effect on the speed at which the model learns to reorient. Underpinning these predictions is evidence that the mechanism at work may be inhibitory rather than excitatory, which has important theoretical implications. Next, we report the results of testing these predictions, using human subjects in a virtual world. Finally, we explore the similarities and differences between the associationist model and the human data. We argue that the operant perceptron is a useful source of predictions that can be supported by experimental data. As a result, the operant perceptron appears to provide a medium in which reorientation can be plausibly explored for the purpose of seeking surprising and interesting results that can later become the focus of traditional experimentation.

Simulation

From the perspective of theories of reorientation that appeal to a geometric module (Gallistel, 1990), angle information present at a corner is typically viewed as a global, geometric property. However, from an associationist perspective, a “corner” could be perceived as a visually salient “focal point” that serves as a reference, with the angle simply being a (local) feature of that location. The intersection between walls provides a distinct boundary, from which the length of a wall can be measured. For instance, in Fig. 1b, Location 1 sits at the junction of a short wall on its left and a long wall on its right, with an intersection angle of 180°.

With this in mind, we devised a method of representing any location along the edge of an arena that treats angle information as a feature. This representation permitted us to present locations to perceptrons even when these locations were not at a corner in a reorientation arena.

The perceptron

Perceptron reorientation

As was noted earlier, a perceptron (Rosenblatt, 1962) is a simple connectionist network in which a set of input units are directly connected to an output unit via weighted connections. The input units represent stimuli; their activation causes signals to be sent through the weighted connections to produce a response in the perceptron’s output unit. Feedback can be provided to the network about its response so that it can modify its connection weights. This permits the perceptron to learn to generate a desired response to each stimulus in a set of training patterns.

To simulate the reorientation task, each location of interest in an arena is represented as a stimulus in the set of training patterns. For each of these patterns, input unit activity is used to represent which cues (geometric and nongeometric) are present at a particular location. If a location is deemed to be “correct,” the perceptron is reinforced when that location’s cues are presented. If a location is not deemed to be “correct,” the perceptron is not reinforced when that location’s cues are presented. In other words, the perceptron is trained to produce an activity of 1 to sets of cues corresponding to “correct” locations and an activity of 0 to sets of cues corresponding to “incorrect” locations.

In order to train the perceptron to learn to reorient in a particular arena, one must make design decisions about how to represent the available cues and about the learning rule that is used to modify the network’s connection weights. The details of these design decisions are provided below.

Defining the task: Stimuli

Each location identified in Fig. 1 can be defined as a collection of properties, which are presented to the perceptron as a pattern of unary-coded inputs. That is, each of the perceptron’s input units encodes the presence or absence of a specific cue. Each of these units is turned on (activated with a value of 1) when the property it encodes is present and is turned off (activated with a value of 0) when that property is absent. In the present simulation, each location of interest is defined by three types of cues: the length of the walls on either side of the location, the angle between the walls where they join, and the kind of local landmark that can be present at the location. Seventeen different input units were used to represent the possible values of these cues, as is summarized in Table 1.

Table 1 Encoding of a location’s properties and the agent’s response using an operant perceptron

The angle units (1–2) identify the angle of intersection at the location. These units have one value for locations at corners (90° angle) and another value for locations along walls (180° angle). The feature units (3–11) represent the collection of nongeometric properties present at a given location. For consistency with the experiments described later in the article, these units are named after colors; as such, these units can be thought of as representing the color of an object at the location.

The length configuration units (12–17) represent the specific set of wall length properties present at a location. For example, one unit is turned on for a location at the intersection of a wall of length three with a wall of length six, while another might be turned on if the location lies between walls of length two and one. This is an extension of Miller and Shettleworth’s (2007, 2008) representation for specific geometries that allows for a number of possible configurations—up to six in the current simulation. This is required when more than four locations of interest are used (Fig. 1c).

When put together, this encoding can represent any possible location of interest as a string of 0s and 1s reflecting the absence or presence of the corresponding cue at that location. For instance, if Location 1 in Fig. 1b (180°, two-length wall to the left, four-length wall to the right) contained a “blue” feature, the location would be presented to the perceptron as “10000100000000100.”
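To make this encoding concrete, the sketch below builds such a 17-element vector. The unit ordering, the particular feature labels, and the set of wall-length pairs are illustrative assumptions; only the two-angle, nine-feature, six-configuration layout and the presence/absence scheme are taken from Table 1 and the text, so the printed vector need not match the example string above bit for bit.

```python
# Sketch of the unary (presence/absence) location encoding described in Table 1.
# The specific labels and their ordering are illustrative assumptions.

ANGLES = ["90", "180"]                                    # units 1-2
FEATURES = ["green", "red", "yellow", "blue", "brown",
            "purple", "black", "orange", "white"]         # units 3-11
LENGTH_CONFIGS = [("2", "4"), ("4", "2"), ("3", "6"),
                  ("6", "3"), ("2", "1"), ("1", "2")]     # units 12-17 (left, right)

def encode_location(angle, feature, left_wall, right_wall):
    """Return a 17-element 0/1 vector marking which cues are present."""
    vector = [0] * 17
    vector[ANGLES.index(angle)] = 1
    vector[2 + FEATURES.index(feature)] = 1
    vector[11 + LENGTH_CONFIGS.index((left_wall, right_wall))] = 1
    return vector

# A wall location (180 degrees) with a blue feature, a wall of length two on
# its left and a wall of length four on its right:
print(encode_location("180", "blue", "2", "4"))
```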

This particular set of design decisions defines this encoding as a purely local code: Each pattern contains only information present at the location it represents and no information from any other location. Similarly, this encoding contains no global representation of the arena, either explicitly (i.e., a principal axis, Cheng & Gallistel, 2005; or relative dimensions, Huttenlocher, 2003) or implicitly (as in Miller & Shettleworth’s [2007, 2008] model summing across all locations), save for the number of patterns presented. Indeed, the perceptron is unable to distinguish to which arena a duplicated pattern belongs; the above example location code would also be seen when Location 1 in Fig. 1c was presented, for instance, since this location is geometrically and featurally identical to Location 1 in Fig. 1b. The only information available to the perceptron at any given time is found in the cues present at the location under consideration; in order to detect additional unrepresented features in this encoding, the perceptron would require hidden units (Dawson, 2004; Rumelhart, Hinton, & Williams, 1986) or a similar architectural adjustment.

Defining the task: Response

This simulation includes a single output unit that uses the logistic activation function (Dawson, 2008) to convert the total weighted signal coming from the input units into a response that can range between 0 and 1. For locations that are reinforced, the perceptron is trained to turn on (output activity = 1); for locations that are not reinforced, the perceptron is trained to turn off (output activity = 0). Because, during learning, perceptron activity falls in the continuous range between 0 and 1, at any moment in time, the perceptron’s output can be interpreted as its estimation of the conditional probability of reinforcement at a location, given the cues at that location (Dawson et al., 2009; Dupuis & Dawson, in press).

Training method

A perceptron’s response to particular patterns of stimuli is not perfect; each generated response differs from a desired response by some error amount. This error is then used by a learning rule to adjust the perceptron’s connection weights such that subsequent presentations of that pattern of stimuli produce a smaller error. Here, we employ the gradient-descent learning rule (Dawson, 2004, 2008), which has desirable properties when working with a logistic perceptron response.

This output response provides a critical distinction between neural network models and traditional associative models in the style of Rescorla and Wagner (1972). The perceptron’s output activity allows it to convert the associative strength of assorted cues into a model of behavior. This stage is absent from traditional associative models, which describe only the indirectly observable associative strength. Not only does this difference allow perceptrons to produce different predictions from formally equivalent associative models (Dawson, 2008), but it also allows us to adjust the model’s learning to reflect different patterns of behavior.

In a standard perceptron model, the connection weights would be updated after presenting any location to the network; that is, there would be no model of choice behavior during learning, much like the definition of classical conditioning. In the present model, we use network output as a measure of behavior to adjust the perceptron’s learning from classical conditioning to operant conditioning, where it is allowed to “choose” whether or not to investigate a particular location, and this investigation (rather than rote presentation) governs its learning. Instead of updating connection weights after every pattern of cues is presented, the perceptron’s output response to that pattern is used as the conditional probability of updating weights on this presentation, given the cues presented. With each presentation, a random number between 0 and 1 is generated and compared with the output response; if the random number exceeds the output response, the connection weights are not updated, and the next pattern is presented. In effect, the perceptron will choose whether or not to visit a location with a probability based on how attractive the cues at that location are, and it will learn only from locations it chooses to visit. This algorithm is detailed at length in Dupuis and Dawson (in press) and in a more abbreviated form in Dawson et al. (2009).
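As a concrete illustration of this training scheme, the following sketch implements a single operant presentation: the logistic output described above serves both as the response and as the probability of visiting, and the weights are adjusted by gradient descent only when the location is visited. This is our reading of the algorithm described in Dupuis and Dawson (in press) and Dawson et al. (2009), not their code; the function names and the squared-error form of the gradient-descent update are assumptions.

```python
import math
import random

def logistic(net):
    """Logistic activation: squashes net input into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-net))

def output(weights, bias, pattern):
    """Perceptron response to one location's cue pattern."""
    net = bias + sum(w * x for w, x in zip(weights, pattern))
    return logistic(net)

def operant_step(weights, bias, pattern, reinforced, rate=0.1):
    """One presentation of a location: 'visit' it with probability equal to
    the current output, and learn (gradient descent on squared error with a
    logistic output unit) only if the location is actually visited."""
    response = output(weights, bias, pattern)
    if random.random() > response:            # perceptron declines to visit
        return weights, bias                  # no weight change on this trial
    target = 1.0 if reinforced else 0.0
    delta = (target - response) * response * (1.0 - response)
    weights = [w + rate * delta * x for w, x in zip(weights, pattern)]
    bias = bias + rate * delta
    return weights, bias
```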

Simulation specification

Training

The present simulation includes two experimental conditions: one with four locations of interest, and one with eight locations of interest. Each location is present at either a wall or a corner and contains a unique feature cue (i.e., a colored object). Since there can be up to eight possible locations within one condition, this demands eight unique feature cues. Within each condition, networks are trained to investigate just one location; this location is reinforced, while all others are not reinforced. The reinforced location could be present at a wall or a corner, producing a 2 (four-vs.-eight) × 2 (corner-vs.-wall) design.

All networks were initialized with all biases and connection weights equal to zero and were trained with a learning rate of 0.1. Five networks in each condition were trained to convergence. For counterbalancing, two possible reinforcement locations were used in each simulation; for example, in the four-location, corner-goal task (Fig. 1a), one group of networks is reinforced at Location 1, while another is reinforced at Location 2. No appreciable difference was found between these groups within a particular task, so their results are reported together here. (That is, each value is averaged from 10 networks.)
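Under these specifications, the full training regime can be sketched as repeated random sweeps through the location patterns until the network's responses are acceptably close to their targets. The convergence tolerance of 0.05 below is an illustrative assumption; the text specifies only zero initial weights, a learning rate of 0.1, and training to convergence. The sketch reuses output and operant_step from the previous listing.

```python
def train_to_convergence(patterns, reinforced_flags, rate=0.1, tolerance=0.05):
    """Train one operant perceptron on an arena's location patterns.
    Returns the learned weights, the bias, and the number of sweeps needed."""
    weights = [0.0] * len(patterns[0])        # all weights and the bias start at zero
    bias = 0.0
    sweeps = 0
    while True:
        sweeps += 1
        order = list(range(len(patterns)))
        random.shuffle(order)                 # one random sweep of all locations
        for i in order:
            weights, bias = operant_step(weights, bias, patterns[i],
                                         reinforced_flags[i], rate)
        errors = [abs((1.0 if reinforced_flags[i] else 0.0)
                      - output(weights, bias, patterns[i]))
                  for i in range(len(patterns))]
        if max(errors) < tolerance:           # assumed convergence criterion
            return weights, bias, sweeps
```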

Testing

Testing the perceptron involves presenting patterns of cues corresponding to transformed arenas and measuring the perceptron’s output response to these novel patterns. Due to the operant nature of its training algorithm, the perceptron’s output response is both its estimation of the conditional probability of reward at the location given the cues at that location and its likelihood of choosing to visit that location.

There are two types of transformed arenas common to reorientation studies: affine transformations and “featureless” transformations. Affine transformations place feature cues and geometry cues in conflict with each other: a chosen location could be consistent with the geometry present during training or the features present during training. For instance, in Fig. 1a, a subject might find reinforcement at Location 1, along with a unique feature. When placed in an arena with an affine transformation, that unique feature might now be present at Location 2. Location 2 is consistent with training in terms of features, while Locations 1 and 3 are consistent with training in terms of geometry. Meanwhile, a featureless transformation replaces all unique feature cues with indistinguishable ones, forcing the model to base its decisions solely on encoded geometry. In the present simulation, we perform a featureless transform by simply turning off all feature units that were present during training and activating a novel “white” feature unit in their place.
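A hypothetical sketch of how such test arenas can be generated from the training arena, reusing the encode_location helper from earlier: the featureless transform swaps every feature for a common "white" one, while the affine transform rotates the features around the arena while the geometry stays put. Representing an arena as a clockwise list of (angle, feature, left wall, right wall) tuples, and the one-slot shift, are our assumptions for illustration.

```python
def featureless_arena(locations):
    """Replace every trained feature with the common 'white' feature."""
    return [encode_location(angle, "white", left, right)
            for (angle, _feature, left, right) in locations]

def affine_arena(locations):
    """Shift the features one slot clockwise while the geometry stays in place."""
    features = [feature for (_angle, feature, _left, _right) in locations]
    shifted = features[-1:] + features[:-1]   # rotate the feature list by one slot
    return [encode_location(angle, new_feature, left, right)
            for (angle, _old, left, right), new_feature in zip(locations, shifted)]
```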

With four locations, we can also test for generalization across angle cues by observing a corner-trained network’s response to a wall-locations-only arena (that is, a network trained in Fig. 1a but tested in Fig. 1b) and vice versa. In this scenario, each location now appears with novel angle and length configuration cues, as opposed to an affine transformation, which has novel length configurations but consistent angles. Since these two conditions do not share exact wall lengths, no choice can be consistent with wall length geometry from training.

With eight locations, one can also perform a partial transformation. While affine-transformed arenas have consistent angle information (targets that were present at corners during training are still present at corners during testing), a partial transformation places angle cues in conflict as well. Both transformations have novel wall length geometries, as compared with the training condition.

Following training, each network was presented with probe trials in three transformed arenas. For four locations, these were affine, generalized, and featureless arenas. For eight locations, these were affine, partial, and featureless arenas. Each network’s output responses were recorded for each of these locations; these responses were averaged across the networks trained in each condition.

In the reorientation task literature, it is common to report responses in terms of the frequency with which each location is chosen. However, the perceptron responds to each location individually, producing the probability of choosing to act at that specific location; these probabilities need not sum to 1 across all locations within an arena. In order to convert the former into the latter, we divided the response to a specific location by the sum of responses to all locations within a given arena; this method has previously been used to successfully predict several key reorientation behavior regularities (Dawson, Kelly, et al., 2010).
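This normalization is straightforward; a minimal sketch (the example numbers are invented for illustration):

```python
def choice_frequencies(responses):
    """Normalize per-location response probabilities into choice frequencies."""
    total = sum(responses)
    return [response / total for response in responses]

# For example, per-location responses of [0.9, 0.5, 0.5, 0.1] yield
# choice frequencies of [0.45, 0.25, 0.25, 0.05].
```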

Results

Across all conditions, networks converged after an average of 4,810 presentations of the training set (a single presentation of each pattern [location] in the training set, in a random order, is called a sweep), with the fastest network converging after 4,614 sweeps and the slowest requiring 4,973 sweeps. Given the perceptron’s learning rate parameter of 0.1, training times of this magnitude are not uncommon for problems of this size; the more pertinent observation is that the range of training times is quite narrow and is not significantly different across training conditions, F(3, 36) = 1.95, p = .14.

Network responses

The network model’s responses to each location in each transformed arena, expressed both as response activity and as choice frequencies, are reported in detail in the tables presented in the Appendix. Network activity refers to the activity in the network’s output unit given the cues presented at a particular location; this is interpreted as the network’s estimate of the conditional probability of reinforcement at a location given the cues present at that location. These conditional probabilities are converted into network frequency through normalization within each condition (Dawson, Kelly, et al., 2010). The tables also include a summary of human responses in similar conditions in experiments that were inspired by the simulation results. The human responses in the table are covered in more detail when the human experiments are discussed, below.

The first major prediction generated by the model is that there does not appear to be a significant difference in reorientation behavior between networks trained with locations in corners and networks trained with locations along walls. In conditions with four locations of interest and eight locations of interest, whether reinforced at a corner or at a wall, the perceptron converged after a similar number of sweeps of training. Furthermore, in all cases, the same broad pattern of behavior holds: The perceptron responds most strongly to locations containing the (unique) feature cue present during training, but that cue did not prevent the encoding of either geometric cue. That is, even within the featureless arena, the perceptron still estimates that locations with the same wall length configuration and/or angle amplitude as the training location have a greater likelihood of reward than do locations missing those cues.

Additionally, the perceptron produces the characteristic rotational error behavior common to reorientation tasks (Cheng, 1986); that is, where features and geometry conflict in the same arena, the perceptron responds to the feature more frequently than to any other single location, but taken as a whole, locations with correct geometry are chosen with higher frequency. This pattern appears in both the four-location and eight-location tasks, as illustrated for the affine transformation in Fig. 2 (for more detail on the other conditions, please refer to the Appendix). Furthermore, it occurs even if the angle information changes between conditions; for instance, the generalized arenas in the four-location task still produce this pattern, even though the exact configurations of geometry present in this condition are novel.

Fig. 2

Example perceptron response frequencies in affine-transformed arenas. The white boxes indicate where features (and reinforcement) were found during training; the black boxes indicate their locations after an affine transformation. Perceptrons were reinforced in corners (a and c) or walls (b and d), and with four (a and b) or eight (c and d) total locations from which to choose. Response frequencies are normalized from perceptron activity in each arena to allow for comparison with extant animal reorientation data, including human experiments reported later in this article. Note the characteristic rotational error in all four conditions. Refer to the Appendix for complete response data in all conditions

Connection weights

To understand why these networks behave in this manner, we turn next to their connection weights. Since 10 networks completed each training condition, their connection weights were averaged to produce a summary of how a typical network solved that particular problem. This summary is presented in Table 2.

Table 2 Operant perceptron connection weights for each simulation

An examination of this table reveals that, within the four-location task, the bias and reinforced angle units assume negative values, while nonreinforced angle units assume a value of 0. This informs us that, before considering wall length configuration or feature information, the network initially tends to turn off (output activity and, thus, probability of investigating a location approaching 0) at any given location. In the eight-location task, however, this is slightly different: While the bias remains negative, the reinforced angle assumes a 0 weight, while the nonreinforced angles assume a strong negative weight. Despite this difference, this pattern of weights, in the absence of other cues, produces behavior identical to that of the four-location network.

It is only after the network considers other cues that it begins to overcome this negative association and develop a moderate probability of investigating a given location. Within the four-location task, the wall length configuration corresponding to the reinforced location assumes a positive value with magnitude slightly larger than the magnitude of the bias and the angle at that location. A similar result occurs in the eight-location task, where the correct wall length configuration and the bias effectively cancel out and the correct angle has a weight of 0 (the only situation where a cue present during training assumes a 0 weight). In both of these cases, the net input is close to 0; the output unit’s logistic function translates this into a .5 probability of acting, given those cues. In other words, for both the four-location and eight-location tasks, if the networks encounter a location with the correct geometry but lacking any feature, they are as likely as not to choose to investigate that location. The overall choice frequency behavior this produces will vary depending on the number of locations (see the tables in the Appendix); however, the underlying mechanism is identical. It is interesting to note that, ignoring features, the “correct” wall length configuration is reinforced on 50 % of its presentations (the reinforced location and its nonreinforced rotational equivalent), while the “incorrect” configurations present in any condition are reinforced 0 % of the time, and the perceptrons’ responses converge to match these probabilities. The operant perceptron has already been established to match probabilities in classical choice-behavior tasks (Dawson et al., 2009); for it to exhibit this behavior in a reorientation context reinforces Miller and Shettleworth’s (2007) conceptualization of reorientation as an operant task.

The feature cue connection weights tell an unsurprising story in both the four-location and eight-location tasks. The feature that was reinforced during training assumes a very strong positive weight, while the feature rotationally opposite the reinforced location (i.e., at the other location with identical geometric cues) assumes an equally strong negative weight. The positive magnitude of the weight given to the correct feature far exceeds the negative value of the bias plus any incorrect geometric cue; that is, the network has a high probability of acting when presented with the correct feature, even if both the angle and wall length configuration cues are incorrect. Meanwhile, all other features take on a moderate negative weight. In the context of the geometric cues discussed above, this informs us that the network is inherently hesitant to investigate any location but that the presence of the correct feature is sufficient to overcome this hesitancy. Furthermore, the feature present at the rotational equivalent of the reinforced location during training assumes a negative strength sufficient to overcome correct geometric cues—in effect, becoming a reliable indicator of no reinforcement.

Discussion

The operant perceptron’s behavior on these simulations allows us to generate novel empirical predictions. To begin, the network used the same encoding for all conditions (four locations or eight locations and wall reinforcement or corner reinforcement) and was able to converge in all of these conditions without difficulty with the same amount of training. Therefore, the operant perceptron predicts that similar mechanisms are at work regardless of the global shape of the arena and that changing the number of locations of interest will have a negligible effect on the difficulty of the task. These predictions are broadly compatible with previous empirical work on multiple-location reorientation (Newcombe et al., 2010) but are incompatible with theories that include an implicit representation of the global environment (Miller & Shettleworth, 2007, 2008).

Furthermore, the operant perceptron does not predict any real difference between tasks where the locations of interest are found within corners and tasks where such locations are not found at corners. In both cases, networks were able to learn the task, encoding sufficient geometric cues to reorient and producing comparable behavior when presented with transformed arenas. This behavior persisted even if the cue types were completely novel, suggesting some degree of generalization—although the network predicts that the mechanism behind this generalization is inhibitory.

We can elaborate on this inhibitory mechanism by examining the connection weights in Table 2. Specifically, the networks learned that particular wall length configurations signaled that a location was not reinforced and that a particular color’s rotational opposite was a reliable indicator of no reinforcement. When the networks were presented with the transformed arenas, the novel geometry did not influence their responses at all, because they had never learned that such configurations signaled no reinforcement. Instead, the network responds at chance values to each location, except for the two locations containing the “correct” feature and its rotational opposite. Rather than developing an explanation of what the agent may be searching for in these cases, a study of connection weights suggests that we might instead focus on what the agent is avoiding. This tendency to emphasize excitation at the expense of inhibition when explaining learning is a tacit assumption present in many different theories of learning (Rescorla, 1967); the operant perceptron model reinforces this point and reminds us of the need to check such assumptions.

Experiments

The operant perceptron has generated some interesting predictions on novel permutations of the reorientation task. Specifically, the operant perceptron makes two broad claims: first, that there is no appreciable difference in reorientation behavior among groups trained with locations in corners or along walls, and second, that there is no appreciable change in difficulty when the number of salient locations changes. Do these predictions hold under laboratory conditions with humans? To test these predictions, we conducted a series of basic reorientation experiments using human subjects.

Our experiments are organized into three studies. Study 1 involves two groups of subjects trained on a four-location reorientation task; one group is trained on corner locations, and the other on wall locations. The locations in these tasks correspond to those shown in Fig. 1a and b.

Study 2 is analogous to Study 1, except that the training arenas have eight locations of interest, as in Fig. 1c.

Immediately after completing Study 1 or Study 2, each participant also completed the task described in the other study; Study 1 participants completed the four-location task and then immediately progressed through the eight-location task exactly as described in Study 2, and vice versa. This allows a direct comparison of the difficulty of reorientation in arenas with four locations and eight locations. Furthermore, this manipulation can test for order effects: Did subjects learn either task faster, and did the first task facilitate learning the second? These comparisons are the focus of Study 3.

Study 1: Four locations

Method

Subjects

Subjects were 36 University of Alberta undergraduates (30 female), who received course credit for participation. Recruitment criteria required subjects to have normal color vision.

Apparatus

The environment was constructed using the fAARS-Lite platform (Dupuis, 2012; Gutiérrez, 2012; Lubyk, Dupuis, Gutiérrez, & Spetch, 2012), which simulates first-person 3-D movement in a virtual world. This virtual world contained a number of rectangular arenas (17.2 × 8.6 m), consisting of matte-gray walls and floors with black, visually obvious edges. The walls were high enough to extend beyond the default field of vision in all possible subject locations and orientations.

Subjects arrived at the center of an arena, facing a random direction (one of the eight cardinal or ordinal directions, chosen at random on each arrival). Subjects could move their perspective through these arenas with the arrow keys.

Stimuli

Attention was called to locations of interest through brightly colored cylinders (1.5-m radius, same height as surrounding walls) placed against the walls of the arena. These locations were placed in two possible configurations, with the locations of interest being set at corners or along walls. These configurations correspond to Fig. 1a and b. During training, these cylinders had one of four colored textures placed over a white background: green checkerboard, red diagonal stripes (upper-left to lower-right), yellow diamonds, or blue horizontal bars, listed in the order in which they appeared when one looked clockwise from the center. Figure 3 presents examples of these stimuli.

Fig. 3

Example stimuli from Study 1, from the perspective of a subject standing in the center of a rectangular arena with visually salient locations of interest placed at the corners (a) or the walls (b). Compare with Fig. 1. These locations could have many possible colors and patterns (c), including matte white, although some colors appear only in Study 2

During testing, the positioning of these locations shifted into one of three possible configurations: affine, generalizing, or featureless. In the affine condition, each location had been shifted one “slot” clockwise; in Fig. 1a, the green location had been present at Location 1 during training and would be at Location 2 in an affine-transformed arena. In the generalizing condition, the targets were shifted to a novel geometry—that of the other group’s training condition. That is, a subject trained in Fig. 1a would experience Fig. 1b as its generalizing condition. Finally, the featureless transform removed all distinguishing information from the cylinders; in place of their brightly colored patterns, subjects simply saw blank, white pillars.

General procedures

Participants were pseudorandomly divided into four groups on the basis of two binary factors (counterbalancing for gender). One factor was target location: Half the subjects would be trained with locations along walls, and half would be trained with locations at corners. The other factor split subjects into two reinforcement groups before the experiment began: those who would receive reinforcement at a location with a long wall on its left (group A) and those who would receive reinforcement at a location with a long wall on its right (group B). The division between group A and group B also counterbalances for distance from the start location to the reinforced location (where applicable); for instance, in the wall group (see Fig. 1b), half the subjects would be reinforced at Location 1, which is close to the start location, and half would be reinforced at Location 4, which is further away.

Upon arrival, subjects were instructed on how to move around in the virtual world and were given time in a “welcome” room (a curved hallway with arrows pointing to a door at its end) to practice movement before the experiment began. Instructions were given to find a “correct location” inside each new room they saw; these instructions deliberately avoided the words “corner” and “wall.” Subjects were told that they made a choice by walking into a location, at which time they would see a display informing them whether their choice was correct or incorrect. Occasionally, they were told, the display would say “no feedback” regardless of the accuracy of their choice. To encourage a consistent strategy over time, subjects were told that they would be awarded points for correct choices (even if the display said “no feedback”) and that they should maximize their score.

Training

Training consisted of blocks of 10 presentations of the training arena. Each presentation allowed the subject any amount of time to move freely about the enclosure but ended when they moved into one of the target locations, receiving feedback as described above (lasting for 3 s) before appearing in the center of the training enclosure again. Training continued until the participant made eight “correct” choices in a single block (if, at the end of a block, they had failed to make eight “correct” choices, they restarted the block instead), at which point they progressed into a nonreinforced training phase. During this phase, subjects had a 50 % chance of seeing a “no feedback” message after making a choice. After making eight correct choices in a single block during this phase (those scoring less instead restarted the nonreinforced training block), testing began.

Testing

Subjects received five test trials in each of the three transformation conditions (affine, generalized, and featureless). The test trials were presented in random order. Subjects were aware that their choices would always receive a “no-feedback” response during this phase.

Following testing, subjects completed a posttesting retention test—10 no-feedback trials in their original training enclosure. Subjects had to make at least seven correct choices during this test to be included in the analysis. Following this test, subjects were not immediately debriefed; instead, they proceeded to Study 3.

Results and discussion

Five subjects failed the posttesting retention test, resulting in 31 subjects (26 female) included in the analysis. Preliminary analysis indicated that the data did not conform to normal distributions. Therefore, subjects’ choice data were analyzed using randomization tests, employing bootstrapping methods to obtain confidence intervals on Cohen’s d′ measure of effect size (Edgington, 1995; Efron & Tibshirani, 1994).
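For readers unfamiliar with this analysis strategy, the sketch below illustrates the general form of a two-group randomization test and a percentile bootstrap confidence interval on Cohen's d. It is not the analysis code used here; the resample counts, the pooled-standard-deviation form of the effect size, and the percentile method are assumptions.

```python
import random
import statistics

def cohens_d(a, b):
    """Standardized mean difference using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled = (((na - 1) * statistics.stdev(a) ** 2 +
               (nb - 1) * statistics.stdev(b) ** 2) / (na + nb - 2)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / pooled

def randomization_p(a, b, n_resamples=10000):
    """Two-tailed p value: reshuffle group labels and compare effect sizes."""
    observed = abs(cohens_d(a, b))
    combined = a + b
    extreme = 0
    for _ in range(n_resamples):
        random.shuffle(combined)
        if abs(cohens_d(combined[:len(a)], combined[len(a):])) >= observed:
            extreme += 1
    return extreme / n_resamples

def bootstrap_ci(a, b, n_resamples=10000, alpha=0.05):
    """Percentile bootstrap confidence interval on Cohen's d."""
    ds = []
    while len(ds) < n_resamples:
        resampled_a = [random.choice(a) for _ in a]
        resampled_b = [random.choice(b) for _ in b]
        try:
            ds.append(cohens_d(resampled_a, resampled_b))
        except (statistics.StatisticsError, ZeroDivisionError):
            continue                      # degenerate resample; draw again
    ds.sort()
    lower = ds[int((alpha / 2) * n_resamples)]
    upper = ds[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper
```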

This analysis showed no difference in the number of training blocks between the genders (males, M = 2.0, SD = 0.71; females, M = 1.88, SD = 1.34; d′ = 0.1, 95 % CI [−0.56, 1.32], p = .86) or between the reinforcement groups (group A, M = 1.6, SD = 0.63; group B, M = 2.2, SD = 1.6; d′ = −0.48, 95 % CI [−1.03, 0.24], p = .19). Similarly, subjects did not exhibit any significant difference in the number of blocks required to learn the task if they were reinforced at walls or at corners (walls, M = 2.25, SD = 1.61; corners, M = 1.53, SD = 0.52; d′ = 0.60, 95 % CI [−0.02, 1.11], p = .08).

Within the affine test arena, there were no significant differences between groups in terms of the proportion of choices made that were geometrically consistent (wall, M = 0.22, SD = 0.29; corner, M = 0.09, SD = 0.14; d′ = 0.58, 95 % CI [−0.09, 1.15], p = .11) or featurally consistent (wall, M = 0.74, SD = 0.33; corner, M = 0.89, SD = 0.18; d′ = −0.58, 95 % CI [−1.18, 0.03], p = .11) with training. There were also no significant differences between groups in terms of featurally consistent choices within the generalized arena (wall, M = 0.91, SD = 0.16; corner, M = 0.87, SD = 0.25; d′ = 0.17, 95 % CI [−0.53, 0.76], p = .63). Finally, when subjects were tested in featureless arenas, both groups made similar proportions of geometrically consistent choices (wall, M = 0.58, SD = 0.35; corner, M = 0.57, SD = 0.34; d′ = 0.03, 95 % CI [−0.59, 0.77], p = .96).

Since neither corner-reinforced nor wall-reinforced subjects showed any differences in choice behavior, they were pooled together to test whether their geometric or feature choices were significantly different from chance. Within the affine arena, subjects’ choices followed the feature at a rate significantly higher than chance (95 % CI [0.73, 0.90], p < .05), and they made choices consistent with training geometry significantly less often than chance (95 % CI [0.08, 0.24], p < .05). Feature-consistent choices were made in the generalized arenas more often than chance (95 % CI [0.82, 0.95], p < .05). Within the featureless arena, however, subjects’ choices did not significantly differ from chance (95 % CI [0.46, 0.68], p > .05).

Taken as a whole, these results lead us to two conclusions. First, “corner” locations do not appear to be special in a reorientation context; subjects trained to visit wall locations produced behavior statistically indistinguishable from the more classic corner-visiting group. Second, that behavior suggests that subjects rely on features far more than on geometry in this particular reorientation paradigm, to the point where they almost fail to encode geometry altogether. This result is consistent with other reorientation research that has shown that, in some conditions, feature cues can overshadow geometric cues (e.g., Bodily, Eastman, & Sturz, 2011; Horne & Pearce, 2009; Pearce et al., 2004), but such results are not at all universal.

Study 2: Eight locations

Method

Subjects

Subjects were 35 University of Alberta undergraduates (22 female), who received course credit for participation. As in Study 1, participants were required to have normal color vision.

Apparatus

The apparatus was identical to that employed in Study 1, except in regard to stimuli.

Stimuli

The stimuli had the same general nature as in Study 1; only the possible color patterns were different. Four of the patterns were the same stimuli present in Study 1. The remaining four were brown vertical stripes, purple spots, black diagonal stripes (upper-right to lower-left), and orange hexagons. These stimuli are depicted in Fig. 3 and were placed in the configuration depicted in Fig. 1c.

As in Study 1, there were three transformed arenas. Transformation consisted of shuffling which color was present at a particular “slot.” An affine transformation was created by shifting the colors two “slots” clockwise relative to training, which placed targets at novel wall configurations but with the same angle (90° or 180°) as in training. A separate partial transformation was created by shifting the colors one “slot” clockwise; in this condition, both the wall configuration and the angle were different, as compared with training. Also as in Study 1, a featureless condition was included in which all locations had identical pure-white colors in place of their original patterns.

General procedures

The procedures followed were identical in all ways to those in Study 1, including the posttesting retention test and progression to Study 3 upon completion.

Results and discussion

Five subjects failed the posttesting retention test, resulting in 30 subjects (19 female) included in the analysis. No evidence of a gender effect (males, M = 2.0, SD = 0.45; females, M = 1.95, SD = 0.78; d′ = 0.08, 95 % CI [−0.59, 0.94], p = .85) or of an effect of reinforcement grouping (group A, M = 2.1, SD = 0.83; group B, M = 1.8, SD = 0.42; d′ = 0.51, 95 % CI [−0.27, 1.21], p = .28) was found in the number of blocks these subjects required to complete training. Similarly, no significant difference in the number of blocks needed to learn the task was found between subjects reinforced at walls and those reinforced at corners (walls, M = 2.06, SD = 0.44; corners, M = 1.85, SD = 0.86; d′ = 0.30, 95 % CI [−0.33, 1.28], p = .45).

In terms of choice consistency with particular cues, subjects exhibited patterns similar to those in Study 1 in the affine and featureless conditions, when adjusted to consider eight possible target locations. Within the affine arena, subjects in both the wall-target and corner-target conditions made similar proportions of choices consistent with geometry (wall, M = 0.26, SD = 0.43; corner, M = 0.33, SD = 0.41; d′ = −0.17, 95 % CI [−0.89, 0.45], p = .60) and consistent with features (wall, M = 0.74, SD = 0.43; corner, M = 0.61, SD = 0.45; d′ = 0.31, 95 % CI [−0.36, 1.19], p = .376). Similarly, the featureless arena saw subjects produce similar proportions of choices consistent with training geometry (wall, M = 0.49, SD = 0.38; corner, M = 0.54, SD = 0.45; d′ = −0.12, 95 % CI [−0.88, 0.55], p = .71). The partial condition, where both wall configuration and angle cues varied from training, also produced similar proportions of choices consistent with training geometry (wall, M = 0.30, SD = 0.42; corner, M = 0.31, SD = 0.40; d′ = −0.03, 95 % CI [−0.68, 0.67], p = .90) or with features (wall, M = 0.7, SD = 0.42; corner, M = 0.65, SD = 0.42; d′ = 0.12, 95 % CI [−0.56, 0.87], p = .725).

In every case, subjects from both groups produced indistinguishable results and were therefore pooled to test whether their choices varied from chance. Within the affine arena, subjects made choices consistent with the feature significantly more often than chance (95 % CI [0.54, 0.81], p < .05) but did not make choices consistent with the original wall length configurations more often than chance (95 % CI [0.16, 0.43], p > .05). Within the partial arena, subjects also followed the original wall length configuration at chance rates (95 % CI [0.17, 0.45], p > .05) and followed feature cues significantly more often than chance (95 % CI [0.54, 0.80], p < .05). Finally, unlike in Study 1, subjects responded to the original wall length configurations in the featureless arenas significantly more often than chance (95 % CI [0.39, 0.65], p < .05).

In general, Study 2 supports Study 1’s findings that walls are not significantly different from corners in terms of reorientation, even when the number of locations is increased. Features remain the best predictor of subject behavior and clearly dominate such behavior when they are presented in conflict with geometry, regardless of whether that geometry is completely inconsistent with training (the affine condition) or partially inconsistent with training (the partial condition). However, unlike in Study 1, subjects’ behavior in the featureless arena clearly indicates that geometry was encoded during training. This is consistent with other literature that has shown that humans are capable of completing reorientation tasks with more than four locations of interest (Newcombe et al., 2010).

This geometric overshadowing is the only noteworthy difference between Study 1’s four-location task and Study 2’s eight-location task. While this type of overshadowing has been observed during reorientation in some circumstances (Bodily et al., 2011; Pearce et al., 2004), this is usually not the case (e.g., Wall et al., 2004). It is quite possible that a version of the four-location reorientation task in which overshadowing does not occur would, if extended to eight locations, likewise show no difference between four-location and eight-location reorientation. Testing this claim would require another experiment, beyond the scope of the present work.

Study 3: Direct comparison

Method

Subjects

Subjects were participants from both Study 1 and Study 2, as described above. This included subjects who failed their original study’s posttesting retention test.

General procedures

After completing the posttesting retention test for their original study, subjects were returned to the “welcome” chamber and were informed that this marked the halfway point of the experiment. The protocol from this point was identical to that described in Study 1, except that the stimuli were drawn from whichever study’s stimulus set the subject had not already seen. Following completion of this second task, subjects were debriefed.

Results and discussion

For discussions of the “first task,” subjects who passed their posttesting retention tests as described in Study 1 and Study 2 were included in the analysis. For the “second task,” a separate posttesting retention test was administered (independently of the subjects’ result in their original study), and subjects who scored below 70 % on this test were excluded from analysis. Of the Study 1 subjects, 2 failed this test, leaving 34 (28 female), while 3 of the Study 2 subjects failed, leaving 34 (20 female) for analysis.

A large and significant order effect was observed (initial study, M = 1.94, SD = 0.98; second study, M = 1.36, SD = 0.60; d′ = 0.7, 95 % CI [0.41, 1.05], p = .001), regardless of which task was completed first. For the first task, there was no significant difference in training blocks between the four-location task and the eight-location task (four-location, M = 1.90, SD = 1.25; eight-location, M = 1.97, SD = 0.67; d′ = −0.06, 95 % CI [−0.77, 0.37], p = .80). There was also no evidence of a significant difference in the time taken to learn the second task (four-location, M = 1.28, SD = 0.46; eight-location, M = 1.44, SD = 0.70; d′ = −0.27, 95 % CI [−0.68, 0.18], p = .27).

Two general conclusions can be drawn from these results. First, we find no evidence that changing the number of locations makes a task easier or harder to learn, either for naïve subjects or for subjects who have already been trained in a different task. This conclusion is consistent with the operant perceptron’s predictions but inconsistent with models that adjust for the total number of locations of interest (such as Miller & Shettleworth, 2007, 2008). Second, this result, combined with the order effect, allows us to conclude that subjects learned the second task faster than the first task, regardless of the number of locations present in either task. This suggests that something facilitated the second task. Because the difference lies in the mean number of training blocks, rather than in the actual time taken to complete those blocks, it cannot be attributed to increased familiarity with the virtual world. Subjects familiar with the virtual world would be more proficient at positioning their virtual avatar where they wanted, but this would translate into a faster time per training block, not fewer total blocks required to learn the task.

One possibility is that human subjects, like the perceptron model discussed earlier, use a system for learning both tasks in which the total number of locations is irrelevant. The perceptron accomplishes this by using an encoding that relies purely on local cues: Each location is considered independently of all other locations during training. Alternative models based on matching current visual stimuli to previously learned visual stimuli (e.g., Cheung, Stürzl, Zeil, & Cheng, 2008; Wystrach, Cheng, Sosa, & Beugnon, 2011) share this property, while the associative models of Miller and Shettleworth (2007, 2008) do not, by virtue of their global probability terms.
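To make this locality property concrete, the following sketch (in Python) trains a single sigmoid output unit with a delta rule on per-location cue vectors. The cue names, arena layout, and parameter values are illustrative assumptions rather than the encoding actually used in our simulations; the point is only that each update depends on one location’s local cues, so adding locations adds training examples without adding any representation of the global arena.

```python
import numpy as np

# Hypothetical local cue encoding: each location is described only by its own
# cues (feature present, left wall long, right wall long, 90-degree angle).
# Nothing in the input refers to any other location in the arena.
CUES = ["feature", "left_long", "right_long", "angle_90"]

def encode(location):
    """Turn a dict of local cues into a binary input vector."""
    return np.array([float(location.get(c, 0)) for c in CUES])

def train(locations, rewards, epochs=500, lr=0.1, seed=0):
    """Delta-rule training of a single sigmoid unit, one location at a time."""
    rng = np.random.default_rng(seed)
    w = rng.normal(0.0, 0.1, len(CUES))
    b = 0.0
    for _ in range(epochs):
        for loc, r in zip(locations, rewards):
            x = encode(loc)
            y = 1.0 / (1.0 + np.exp(-(w @ x + b)))  # predicted response strength
            err = r - y                             # reinforcement prediction error
            w += lr * err * x                       # update depends only on this location's cues
            b += lr * err
    return w, b

# Four- vs. eight-location arenas differ only in how many examples are shown,
# not in the size of the network, so training effort scales the same way.
four_locations = [
    {"feature": 1, "left_long": 1, "angle_90": 1},  # reinforced corner with feature
    {"right_long": 1, "angle_90": 1},
    {"left_long": 1, "angle_90": 1},                # rotational equivalent of the goal
    {"right_long": 1, "angle_90": 1},
]
rewards = [1, 0, 0, 0]
w, b = train(four_locations, rewards)
print(dict(zip(CUES, np.round(w, 2))), round(b, 2))
```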

Model evaluation

With both simulation results and experimental evidence from analogous tasks at hand, we return now to the operant perceptron and evaluate where the patterns of behavior agree and where they disagree. The operant perceptron made two broad classes of prediction: that subjects learn four-location reorientation and eight-location reorientation with equivalent amounts of training, and that, within each task, no appreciable difference exists between wall locations and corner locations in terms of reorientation behavior.

To see whether our experimental data agreed with the perceptron’s claim of equivalent training time, we turned to a Bayesian analysis to generate likelihoods (and a corresponding semantic interpretation) for the null hypothesis, rather than relying on a simple significance test (Gallistel, 2009). Within Study 1’s four-location conditions, the odds in favor of a similar training time for walls and for corners were 2.88:1 (modest); within Study 2’s eight-location conditions, these odds were 3.48:1 (substantial) in favor of no difference. For Study 3’s first task, the hypothesis that the amount of training needed to learn to orient in four-location arenas and in eight-location arenas was the same had odds of 1.50:1 (weak); for the second task, the odds in favor of an equal amount of training were 4.20:1 (substantial). The lower odds are not unexpected, given the relatively small number of subjects in these experiments. Importantly, in every case mentioned here, the odds favor the null hypothesis of no difference in training time; an interesting and possibly counterintuitive prediction that arose from the computer simulation was supported by the experimental studies with human subjects.
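As a concrete illustration of how such odds in favor of the null can be computed, the sketch below uses a standard BIC-based approximation to the Bayes factor for an independent-samples t statistic. This is not Gallistel’s (2009) exact procedure, and the t value and group sizes shown are placeholders rather than values from our data; it is meant only to show the kind of calculation that turns a nonsignificant comparison into graded odds favoring the null.

```python
import numpy as np

def null_odds(t, n1, n2):
    """Approximate odds in favor of the null hypothesis (no group difference)
    for an independent-samples t statistic, via the BIC approximation to the
    Bayes factor. Values > 1 favor 'no difference in training time'."""
    n = n1 + n2
    df = n - 2
    r2 = t ** 2 / (t ** 2 + df)                   # variance explained by group
    delta_bic = n * np.log(1.0 - r2) + np.log(n)  # BIC(H1) - BIC(H0)
    return np.exp(delta_bic / 2.0)

# Placeholder values, not our data: t = 0.76 with 15 subjects per group.
print(f"Odds in favor of no difference: {null_odds(0.76, 15, 15):.2f}:1")
```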

As for the second major prediction, of no significant behavioral differences between wall and corner locations, we turn to the pattern of perceptron response frequencies reported in the Appendix (these are the conditional probabilities of responding to a location given its cues, normalized across all locations within an arena). These tables also report the response frequencies for the human subjects from Study 1 and Study 2. Are the networks’ response frequencies appreciably different from the humans’, or do the networks provide a plausible model of human responding?

To evaluate this, for each experimental condition, a 95 % confidence interval was bootstrapped around the mean of the human subjects’ responses to each type of cue (Efron & Tibshirani, 1994). These confidence intervals were used above to compare the response rates with chance; here, they are used to compare each human response rate with the mean network response frequency for the same cue type (i.e., the normalized response rate). Since no difference was found between wall groups and corner groups in the human subjects, the networks’ corresponding location response frequencies were averaged as well. These confidence intervals and their comparisons are given in Table 3. In addition, Fig. 4 depicts the summarized human response frequencies for affine-transformed arenas from Study 1 and Study 2 (for information on the other conditions, refer to the Appendix); this can be qualitatively compared with Fig. 2, which reports responses to the same conditions in the perceptron.
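For readers who wish to reproduce this kind of analysis, the following is a minimal sketch of a percentile bootstrap for the 95 % confidence interval around a mean response proportion. The per-subject proportions in the example are hypothetical; the actual values are those reported in the Appendix.

```python
import numpy as np

def bootstrap_ci(proportions, n_boot=10_000, alpha=0.05, seed=1):
    """Percentile bootstrap CI for the mean of per-subject response proportions."""
    rng = np.random.default_rng(seed)
    data = np.asarray(proportions, dtype=float)
    means = np.array([
        rng.choice(data, size=len(data), replace=True).mean()
        for _ in range(n_boot)
    ])
    return np.quantile(means, [alpha / 2, 1 - alpha / 2])

# Hypothetical per-subject proportions of feature-consistent choices.
human_props = [0.75, 0.5, 1.0, 0.75, 0.5, 0.75, 1.0, 0.25, 0.75, 0.5]
low, high = bootstrap_ci(human_props)

# A network is treated as consistent with the human data on this cue type if
# its mean response frequency falls inside the human confidence interval.
print(f"95% CI for the human mean: [{low:.2f}, {high:.2f}]")
```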

Table 3 Comparing human and network response frequencies with specific cue types
Fig. 4
figure 4

Example human response frequencies in affine-transformed arenas. The figures are presented in the same manner as the perceptron data in Fig. 2. Refer to the Appendix for complete response data in all conditions

In all cases with four locations, the network model consistently predicted too many choices consistent with wall-length geometry and too few choices consistent with features. This is not surprising: Humans chose geometry-consistent locations in this task at a rate indistinguishable from chance, indicating that they did not encode geometric cues, while a review of the connection weights in Table 2 indicates that the networks did encode such cues. In the eight-location task, the networks performed much closer to human behavior in general, although again the networks tended to respond more frequently to geometry and less frequently to features than did the humans. Qualitatively, this is supported by comparing Figs. 2 and 4: Except for humans trained in the four-location corner condition, rotational error is present at all locations to a similar degree, although the humans followed the feature to a greater extent.

These discrepancies in response frequencies suggest a need to explore alternative design decisions in the perceptron. In particular, one open question concerns how changing the encoding of cue patterns might affect perceptron responses, as well as the correspondence between network and human response frequencies. Alternatively, the brightly colored features present during the experiment were highly visually salient as compared with the matte-gray walls, and it is possible that the perceptron model (which assigned equal salience to all cue types) did not capture this difference. Exploring different learning rate parameters for each cue type, or conducting an experiment with less salient features, might reduce this discrepancy.

General discussion

The present article explored reorientation in novel variations on the standard reorientation task, informed by a simple artificial neural network. Network simulations and experimental data allowed us to examine how behavior changes—or rather, does not change—when the locations of interest are placed somewhere other than the corners of a quadrilateral arena. We also examined this in the context of changing the number of salient targets from four to eight and found that the difficulty of learning the task does not increase and that skills learned in one task generalize to the other in both directions. These results were consistent across both simulation and experiment and suggest that the behavior governing reorientation involves processing the cues available at each location in isolation, regardless of the global structure of the environment. In other words, learning reorientation does not require comparing the current location with any other possible location, contrary to the proposal of Miller and Shettleworth’s (2007, 2008) model.

These results—both simulated and experimental—suggest an interesting refinement to the hypothesis that angles are processed in a manner similar to features (Sturz, Forloines, & Bodily, 2012). Under this refinement, the location of interest serves as a visually salient focal point—a reference from which wall length and angle cues are determined. In a typical reorientation task, the corners of an arena create that focal point (observe, for instance, how clear the boundaries are at the corners in Fig. 3b), but our results suggest that other visually salient goals (here, pillars, but also possibly bowls of food, boxes with toys, and so forth) can create the same effect, even if they are not placed at corners. The angle the walls form at the location of interest therefore becomes a feature of that location, in much the same manner as traditional feature cues, such as color. This allows associative models to capture associative strength at locations other than corners—a property demanded of these models by recent empirical work (Horne, León, & Pearce, 2013).

Furthermore, the negative weights present on certain cues and biases in the model suggest that inhibition is key to this process; that is, agents not only learn to approach cues that were reinforced during training, but also learn to avoid cues that were not reinforced. Classical reorientation results, such as rotational errors, may not emerge from a process based on relative attractiveness (e.g., Miller & Shettleworth, 2008) but may, rather, emerge from the interplay between excitation and inhibition when an agent is presented with a transformed arena. In particular, the negative biases in the model suggest that the model is inherently hesitant to investigate locations unless certain cues are observed. When presented with a dramatically transformed arena (i.e., the four-location generalized condition, where networks trained at corners are tested at walls, and vice versa), responses resembling rotational error emerge from the absence of such inhibitory cues, rather than from the presence of excitatory ones. This suggests that investigating the role of inhibition may prove critical to developing our theoretical understanding of spatial learning.
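The sketch below illustrates this excitation/inhibition account with a single sigmoid unit. All cue names, weights, and the bias are hypothetical values chosen for illustration (they are not the trained weights from Table 2), but they show how a negative bias suppresses responding by default, how an inhibitory cue suppresses it further at nonreinforced training locations, and how removing that inhibitory cue, as a transformed arena does, produces elevated, rotational-error-like responding even in the absence of the excitatory feature.

```python
import numpy as np

def respond(cues, weights, bias):
    """Response strength of a single sigmoid unit to a location's local cues."""
    net = sum(weights.get(c, 0.0) for c in cues) + bias
    return 1.0 / (1.0 + np.exp(-net))

# Hypothetical weights: the feature and the trained angle are excitatory, a
# nonreinforced wall configuration is inhibitory, and the bias is negative,
# so the unit is "hesitant" unless excitatory cues are present.
weights = {"feature": 3.0, "angle_90": 1.5, "untrained_walls": -2.0}
bias = -2.5

# Reinforced training location: feature plus trained angle -> strong response.
print(respond({"feature", "angle_90"}, weights, bias))          # ~0.88
# Nonreinforced training location: inhibitory cue present -> suppressed.
print(respond({"angle_90", "untrained_walls"}, weights, bias))  # ~0.05
# Transformed arena: the inhibitory cue is absent, so the trained angle alone
# draws moderate, rotational-error-like responding despite no feature.
print(respond({"angle_90"}, weights, bias))                     # ~0.27
```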

Additionally, theories of viewpoint matching (Cheung et al., 2008) propose that reorientation is largely a matter of learning a broad visual stimulus when reinforced and then seeking to minimize the difference between one’s current visual input and this learned image. This theory should also be capable of supporting reorientation in arenas without corner-based locations. Interestingly, both this theory and the operant perceptron model learning as an error-correcting process that accounts for patterns of subject behavior, although they encode the available stimuli in dramatically different manners. It would be interesting to see how their predictions about choice frequencies differ, if at all. Evaluations and comparisons of earlier associative and viewpoint-matching theories exist in the context of ant navigation (Wystrach et al., 2011); these suggest that associative models’ inherent segregation of feature and geometry cues is problematic. Although the encoding presented to the operant perceptron does code features separately from “geometry,” it also suggests that the two are processed in the same manner (as the presence or absence of particular cue indicators).

The choice to use configuration unit encoding (i.e., a different unit turns on for each possible wall-length configuration) was made for consistency with the existing literature (Dupuis & Dawson, in press; Horne & Pearce, 2010; Miller & Shettleworth, 2007, 2008). If the simulation diverges from live-agent data, then this theoretical choice may not be appropriate. Indeed, a divergence appears in two respects: The simulation encoded geometry in the four-location task, while human subjects did not, and the networks produced slightly different choice frequencies (Table 3). In spite of this, the model still correctly predicted several interesting results, such as the lack of a difference between corner-trained and wall-trained subjects and the similar difficulty of the four-location and eight-location tasks. This suggests that the model needs adjustment, but that the adjustment need not be extreme. The adjustment could take the form of parametric changes (i.e., the salience changes discussed above) or a change in how arena walls are encoded (e.g., thermometer coding; Dawson, Kelly, et al., 2010); these changes may have important theoretical implications if any prove more fruitful than configuration encoding.
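To clarify the distinction, the sketch below contrasts the two encodings for a location’s wall lengths. Both versions are simplified assumptions about the general schemes (configuration units vs. thermometer coding), not the exact unit layouts used in our simulations or in Dawson, Kelly, et al. (2010).

```python
# Configuration unit encoding: one unit per possible local wall-length
# configuration (left and right wall each short or long), so exactly one
# unit is on for any location.
CONFIGS = [("short", "short"), ("short", "long"),
           ("long", "short"), ("long", "long")]

def configuration_units(left, right):
    return [1 if (left, right) == config else 0 for config in CONFIGS]

# Thermometer coding: a wall's length is coded by how many units are on, so
# similar lengths produce overlapping input patterns.
LEVELS = ["short", "medium", "long"]

def thermometer(length):
    return [1 if LEVELS.index(length) >= i else 0 for i in range(len(LEVELS))]

print(configuration_units("long", "short"))         # [0, 0, 1, 0]
print(thermometer("medium") + thermometer("long"))  # left wall + right wall
```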

These developments provide examples of experimental results informing future modeling decisions, a common practice in cognitive science. The present work, in contrast, demonstrated that modeling can just as easily inform experiment. A new experimental result or theoretical construct can revise an existing model, which in turn can be used to generate empirical claims about novel environments quickly and cheaply. If any of those predictions are of interest, future experimentation can be used to test the new hypotheses.

This methodological style—creating simple, plausible models that behave and then generating hypotheses and experiments on the basis of this behavior—is an example of the synthetic approach to cognitive science (Dawson, 2004; Dawson, Dupuis, & Wilson, 2010). The synthetic approach can prove fruitful in breaking deadlocks or opening up novel research paradigms. For example, to the best of our knowledge, the present work describes the first attempt at systematically varying the nature and number of locations during reorientation, and the decision to investigate this comparison was motivated entirely by the structure of the neural network model. Additionally, the interpretation of angle as a feature cue is contested in the literature (Hupbach & Nadel, 2005; Lee & Spelke, 2011). The traditional response would be to design an experiment and fit the resulting data to determine whether such an interpretation is valid; here, the synthetic approach circumvents that problem, at least to a point. We posit a model that processes angles as features and then see what it can and cannot do. That model behaved in a way consistent with our human subjects, with the largest differences attributable to parameter selection rather than to structural changes. Our neural network model therefore suggests that processing angles as features may be entirely consistent with associative theories of reorientation.

This neural network model offers a ripe avenue for future research. The operant perceptron successfully handled reorientation with assorted numbers of target locations, positioned at arbitrary points along the edge of an arena, using the encoding described above. This encoding can easily be extended by including extra input units to represent angles other than 90° or 180°, or to represent sets of wall-length configurations other than those used here. Together, these properties allow the architecture to handle any polygonal arena with edge-defined locations of interest, including kites (Dawson, Kelly, et al., 2010; Pearce et al., 2004), hexagons (Sturz & Bodily, 2011), and octagons (Newcombe et al., 2010). While we could extend the operant perceptron to see whether it fits the data from some of these novel tasks (i.e., regular octagons; Newcombe et al., 2010) in a manner similar to a more standard perceptron (Dawson, Kelly, et al., 2010), the synthetic approach would be to generate totally new predictions inspired by our findings. In this case, we might try nonuniform octagons (an arena type not yet investigated), or we might build on other successes of the operant perceptron altogether, such as superconditioning (Dupuis & Dawson, in press) or probability matching (Dawson et al., 2009), and branch out beyond reorientation into completely new paradigms.
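As a sketch of how such an extension might look, the following encodes a location in an arbitrary polygonal arena by adding one input unit per angle class and per wall-length class. The particular angle and length classes listed are illustrative assumptions, not a committed design.

```python
# Extended local cue vector for arbitrary polygonal arenas: one unit per
# angle class, one per wall-length class for each adjoining wall, plus a
# feature unit. The class lists are illustrative assumptions.
ANGLES = [60, 90, 108, 120, 135, 180]        # interior angles across arena types
WALL_LENGTHS = ["short", "medium", "long"]

def encode_location(angle, left_wall, right_wall, feature_present):
    vec = [1 if angle == a else 0 for a in ANGLES]
    vec += [1 if left_wall == w else 0 for w in WALL_LENGTHS]
    vec += [1 if right_wall == w else 0 for w in WALL_LENGTHS]
    vec.append(1 if feature_present else 0)
    return vec

# A featured corner of a kite-shaped arena (120-degree angle, long wall on
# the left, short wall on the right):
print(encode_location(120, "long", "short", True))
```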

Lewandowsky (1993) observed that computer modeling has its benefits, if done with care. The present work illustrates all of these core ideas. A desire to increase the mathematical rigor of the Miller and Shettleworth (2007, 2008) model led to the development of new tools—both the operant perceptron (Dupuis & Dawson, in press) used in simulation and the fAARS-Lite platform (Dupuis, 2012; Gutiérrez, 2012; Lubyk et al., 2012) used in data collection. These tools facilitated finding and testing the tacit assumption in the reorientation literature that corners have some inherently special property. Finally, the simulation results indicating that there should be no difference in the effort needed to learn reorientation in arenas with more locations fit the description of serendipitous findings in novel environments, a point underscored by the appearance of the same result among human subjects. It would appear that, even after 20 years, Lewandowsky’s observations and advice for cognitive modelers remain relevant.