The most dangerous predators of Japanese tits (Parus minor) are rat snakes (Elaphe climacophora), which climb trees, enter nesting cavities, and consume the young birds. When a snake approaches, adult tits produce a distinctive jar vocalization that spurs their fledglings to pile out of the nest hole. The chicka alarm call, in contrast, is elicited by a wider array of threats, particularly from crows or martens. Young birds respond to chicka calls by huddling silently at the bottom of the cavity. Nearby adults react to either of these alarm calls by dropping other activities, locating the offending predator, and mobbing it until it leaves the area. Suzuki (2012) showed that, like the juveniles, adults respond differentially to snake and nonsnake vocal warnings. Following playback of jar alarms, most adult tits scan the ground; following chicka alarms, they tend to move their heads horizontally, scanning the sky and nearby trees. On the basis of these distinctive responses, Suzuki (2012) contended that the two vocalizations are functionally referential, conveying contrasting information about objects or events in the outside world.

For humans, hearing a particular word (e.g. “Snake!”) elicits an intricately networked representation of related properties, events, and shades of meaning. There is little evidence from other animals of a similarly complex reaction to vocal signals (Seyfarth & Cheney, 2010). It is possible that a snake alarm could evoke something like a human cognitive representation, but the birds’ behavior could just as easily be the outcome of a chain of innate responses. Perhaps the adult birds react to the call by simply looking down, just as their fledglings innately respond by fleeing the nest. Once the birds are looking down, their chances of seeing a nearby snake are increased, and some stimulus associated with the predator—perhaps just its sinuous movement—subsequently releases mobbing behavior. Determining the meaning of an animal signal is a challenging business: Differential consequences alone are not sufficient evidence.

But the fact that adult tits respond to snake alarms by actively seeking out a predator to mob suggests that their behavior may go beyond a chain of innate action patterns. Suzuki (2018) noted that when alarm calls are played back to adult tits, the birds seem to behave from the outset as if they were focused on a particular visual task, as if they were expecting to encounter a specific set of snake-related stimuli. If the vocalizations actually evoke a representation of the appearance of the predator, they could serve as an associative cue for initiating a selective attentional search (Bravo & Farid, 2012; Goto, Bond, Burks, & Kamil, 2014). In brief, the call would directly activate a rat snake searching image.

The transformation of a cognitive representation into a focus for visual attention is almost intuitive for humans: You ask a child to go and get a grape, and she runs to the fruit bowl, searching for something small, round, green, and attached in bunches. To the best of my knowledge, Suzuki (2018) is the first to propose this process in a nonhuman species. It is an empirically tractable hypothesis. The control of visual attention has been extensively researched in both humans and animals, so Suzuki’s (2018) searching image model ought to be open to testing in wild birds.

Attention is, however, very challenging to investigate quantitatively. It is fragile and fleeting, subject to interference from many psychological and environmental factors (Goto et al., 2014). It can be studied successfully in the field, but it requires rigorous controls and systematic consideration of alternative hypotheses. So far, Suzuki’s (2018) experimental designs have failed to do justice to his elegant conjecture. He conducted a pilot experiment involving playback of either chicka or jar calls accompanying a simulated snake—a moving stick pulled by a string. The birds closely examined this rough-and-ready model, but only when it was doing snake-like things, and only in response to playback of jar calls. The results do not take us far beyond our original account of chained action patterns. If the innate response to chicka calls is to stare at the sky, while that to jar calls is to scan the ground, it does not seem surprising that chicka birds would tend to overlook terrestrial moving sticks. And if a snake-mobbing action pattern is released by long things making sinuous movements, the enhanced response to sticks behaving like snakes is not really evidence of a searching image.

There are two approaches that would begin to validate Suzuki’s searching image hypothesis, one through analysis of the component features of the presumed referent (the snake stimulus itself) and the other through testing the cueing mechanism. An effective searching template for any natural object must operate along multiple stimulus dimensions—objects can be seen from many angles and distances and may have unique color patterns and behavior—so it would necessarily incorporate a range of stimulus features (Bravo & Farid, 2012). To test for use of a searching image, the tit experiment has to deploy a fully realized model snake, something that reliably elicits jar calls and mobbing even without playback.

One possibility might be something like a jointed “wiggle snake” toy, which makes sinuous snake movements when pulled along by a thread. The model could be painted realistically in the colors of rat snakes, which would yield a set of stimulus attributes (color, size, proportion, movement pattern, location) that could be independently modified to deconstruct their contribution to the searching template. Once the model referent has been validated, and it has been shown that multiple attributes contribute jointly to the alarm response, the hypothesis of associative cueing could be tested: How well does alarm playback evoke the full details of the template?

There would be four treatment types: (1) uncued = no call, just the realistic snake model; (2) correctly cued = jar call, accompanied by the correct model; (3) miscued = jar call, accompanied by something that violates the cued expectation, perhaps a model of a different snake species; and (4) false alarm = jar call, accompanied by nothing at all (Goto et al., 2014). Comparisons of time to initiate a response, direction of orientation, extent of approach, and persistence of response would provide indications of the degree of attentional focus. The experiment would need to be a within-subject, repeated-measures design, but because attentional effects of artificial stimuli habituate rapidly, the trials would need to be spaced out over time and intermixed with occasional, highly stimulating exposures to real snakes.
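As a rough illustration of the within-subject layout, the sketch below assigns every bird each of the four treatments once, in a randomized order, with a fixed gap between trials. The function name, the treatment labels, the three-day spacing, and the seed are all assumptions made for the example rather than details of any published protocol, and the interspersed exposures to real snakes are left out.

```python
import random

# Hypothetical labels for the four treatment types described above.
TREATMENTS = ("uncued", "correctly_cued", "miscued", "false_alarm")


def build_schedule(bird_ids, spacing_days=3, seed=0):
    """Return {bird_id: [(day, treatment), ...]}.

    Each bird receives every treatment exactly once (within-subject),
    in an independently shuffled order, with trials separated by
    `spacing_days` (an assumed value) to limit habituation.
    """
    rng = random.Random(seed)  # seeded so the schedule is reproducible
    schedule = {}
    for bird in bird_ids:
        order = list(TREATMENTS)
        rng.shuffle(order)
        schedule[bird] = [(i * spacing_days, t) for i, t in enumerate(order)]
    return schedule
```

Shuffling the order separately for each bird is one simple way to keep treatment from being confounded with trial position; a fully counterbalanced Latin square would be a reasonable alternative.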

Suzuki’s hypothesis would predict that correctly cued trials should elicit superior performance to uncued ones, uncued trials should outperform miscued ones, and all three should be superior to false alarms. Most importantly, correctly cued trials should be effective even with model snakes that lack one or more stimulus dimensions, while uncued trials should be far less forgiving of incompletely realized models. If these predictions were confirmed, the results would strongly support Suzuki’s (2018) approach to the analysis of alarm vocalizations and open a new perspective on studying searching images in the field.
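The predicted ranking of treatments can be written down as a simple ordering check. This is a minimal sketch: the treatment labels and the function name are hypothetical, and it assumes each treatment has been summarized by a single mean score in which higher values mean a stronger response. A real analysis would need a repeated-measures statistical model rather than a bare comparison of means.

```python
# Predicted ranking, best to worst (labels are hypothetical).
PREDICTED_ORDER = ("correctly_cued", "uncued", "miscued", "false_alarm")


def matches_prediction(mean_scores):
    """Return True if the scores fall strictly in the predicted order.

    `mean_scores` maps each treatment label to a summary score,
    assuming higher values indicate a stronger response.
    """
    ranked = [mean_scores[t] for t in PREDICTED_ORDER]
    # Every treatment must strictly exceed the next one in the ranking.
    return all(a > b for a, b in zip(ranked, ranked[1:]))
```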