We are most grateful to all the authors for their stimulating commentaries. At times, it may seem like any one experiment is a lens through which different observers may see very different things. We hope in this response to be able to clarify our study and to move forward the discussion as to what might constitute canine theory of mind.

Movement toward an interactive approach

We agree wholeheartedly with Miklósi and Topál (2011) that “there is evidence to suggest that domestication is a genetic process,” one that has led to a number of morphological, behavioral, and possibly even social changes in the domestic dog (see Udell, Dorey, & Wynne, 2010b, for more on canine domestication and socialization). To argue, as we do, that domestication is neither necessary nor sufficient to explain domestic dogs’ sensitivity to attentional state or responsiveness to human gestures is not to say that genetic domestication has had no effect on the behavior of domestic dogs (see also Udell et al., 2010b). Instead, we suggest that the results in Udell, Dorey, and Wynne (2011), like prior pointing studies (e.g., Udell, Dorey, & Wynne, 2008), demonstrate that domestication is not necessary to account for dogs’ responsiveness to human stimuli, because a number of nondomesticated species demonstrate the capacity to succeed on such tasks. These include not only wolves (Gácsi, Gyori, Virányi, Kubinyi, Range, Belenyi, & Miklósi, 2009; Udell et al., 2008; Udell et al., 2011), but also bats (Hall, Udell, Dorey, Walsh, & Wynne, 2011), dolphins (Pack & Herman, 2004), fur seals (Scheumann & Call, 2004), and jackdaws (Von Bayern & Emery, 2009).

To say that domestication is not sufficient to explain domestic dogs’ performance on perspective-taking tasks is simply to say that domesticated animals are not born automatically responsive to specific human stimuli. The fact that some domesticated dogs fail to utilize human gestures, fail to be responsive to the human attentional state, or even fail to bond with humans altogether implies that something more than domesticated status is required to explain the success of dogs that succeed on these tasks. We have previously suggested that these factors include adequate exposure to humans during a species-specific critical period of social development (socialization or taming) and experience with the relevant stimuli under test (Udell et al., 2008, 2010b).

Miklósi and Topál (2011, Fig. 1) introduce a theoretical model predicting that the social development of domestic dogs should unfold faster than that of wolves, with dogs crossing the threshold for the expression of “certain social skills in inter-specific context” at an early age, while wolves with intensive socialization cross this line later in development. The difference between these two points is labeled as developmental delay on the part of wolves. We find this figure puzzling, given that decades of genetic, biological, and behavioral evidence show that it is domesticated (sub)species, including the domestic dog, that show developmental delays (with the exception of sexual maturation) in comparison with their nondomesticated counterparts. These delays include slower social development and the retention of juvenile characteristics into adulthood (Coppinger & Coppinger, 2001; Frank & Frank, 1982; Price, 1984, 1999; Scott & Fuller, 1965; Trut, 1999; Trut, Plyusnina, & Oskina, 2004). In fact, it is this delay or lengthening of social development that provides an extended opportunity for dogs to form bonds with humans and other animals (Lorenz & Coppinger, 1986; Udell et al., 2010b). Similar changes in the timing of social development have been observed in experimentally domesticated fox pups as well (Trut et al., 2004). Therefore, we cannot share Miklósi and Topál’s position on how domestication may influence performance on human-guided tasks.

Miklósi and Topál (2011) raise two other points. The first is that “the majority of their [Udell et al., 2011] wolves (unlike dogs) were subjected to extensive associative conditioning (clicker training) and were familiarized with two-way object choice situations (see Udell et al., 2008).” This is simply incorrect. The location (Wolf Park, Battle Ground, Indiana), rearing practices, and living conditions of the wolves utilized in this study are readily available (see Klinghammer & Goodmann, 1987). The facility is also open to the public, making the status and identity of the subjects more accessible than for most studies conducted with wolves or animal subjects in general.

If Miklósi and Topál’s (2011) argument is that lifetime exposure to humans, including both explicit and implicit associative conditioning, underlies the animal’s responsiveness to human cues, we certainly agree. We agree further that the wolves tested by Udell et al. (2011) are not typical of the whole population of Canis lupus lupus, because they were socialized to humans early in life, and experience continued levels of interaction with humans more similar to that typical of pet dogs than of wild wolves. However, we part ways with Miklósi and Topál when they claim that these wolves are at an advantage in comparison with the average pet dog, because they have had some exposure to clicker training. The wolves in Udell et al. (2011) were not, and have never been, “show” animals. The clicker training they received during their lifetimes was not extensive, in comparison with that of average pet dogs, and was unrelated to the begging task utilized in Udell et al. (2011).

Miklósi and Topál (2011) further argued that the wolf and dog comparisons from their own research are more valid than those we reported in Udell et al. (2011), because their canid subjects were reared in a more controlled environment, with identical socialization experiences between dog and wolf subjects. However, according to their prior publications (for a review, see Kubinyi, Virányi, & Miklósi, 2007), individual dog and wolf subjects were raised at different houses in youth and were taken by their caretakers to a wide range of diverse environments—including formal training classes—many of which were not experienced by all individuals. Methodological differences can be noted. For example, wolves were sometimes tested outdoors from a standing position, while dogs were tested indoors from a kneeling position on pointing tasks (Kubinyi et al., 2007). At 2 months of age, wolves were relocated to a wolf pack where caregivers visited once or twice a week. Domestic dogs used for comparison, however, continued to live in human homes and had daily contact with humans (Kubinyi et al., 2007). In sum, while we do not deny the value of the results from the Kubinyi et al. study or other studies utilizing this group of wolves (including Miklósi et al., 2003; Topál et al., 2005), we disagree that the results from that study should be considered of greater utility than those from other recent studies of dog and wolf responsiveness to human cues. Certainly Kubinyi et al.’s study does not nullify reports that wolves have the capacity to be successful on human-guided tasks (Gácsi et al., 2009; Udell et al., 2008, 2011) or the findings that some groups of domestic dogs are not successful (Udell et al., 2008; Udell, Dorey, & Wynne, 2010a).

Miklósi and Topál (2011) also raise concerns over the ages of our subject groups. Our inclusion criterion specified that subjects had to be at least 4 months of age to participate in the study. While there is evidence that developmental factors may predict reduced success on human-guided tasks for dogs under 4 months of age (Dorey, Udell, & Wynne, 2010), no reports have indicated that age influences a subject’s performance after 4 months. Furthermore, success on the task was based on whether a group demonstrated above-chance performance on a particular occluder as assessed independently. Therefore, age was kept consistent for comparisons looking at each group’s success in using specific occluder types and would not explain why each group consistently succeeded on certain type(s) and not others.

Miklósi and Topál (2011) attempt to argue that similar performance between the older wolves and the younger shelter dogs does not fit with the predictions of the two-stage hypothesis, because “the older an individual is the more relevant learning experience is supposed to be gained” (Miklósi & Topál, 2011). This is to miss the logic of the two-stage hypothesis (Udell et al., 2010b). Age itself tells us nothing about the relevant learning experiences an individual has encountered. The two-stage hypothesis proposes no grounds to expect that even the oldest wolf should perform better than the youngest shelter dog on a discrimination task involving a stimulus to which neither subject has previously been exposed. The finding that a wolf and a shelter dog that have had similar opportunities for exposure to a person turning his or her back and have had no known opportunities to observe a person reading a book would behave similarly when exposed to a task utilizing those stimuli, independent of age, was not a surprising outcome to us.

The two-stage hypothesis was developed as an interactive approach that would take both nature and nurture into account. The review by Udell et al. (2010b) addressed many of the issues raised here and provides a more complete picture of this hypothesis’ predictions. As presented by Miklósi and Topál (2011), the synergistic model seems to be a recapitulation of the two-stage hypothesis. Whatever one wishes to call it, the move toward an approach that considers the interacting ultimate and proximate mechanisms involved in the development of domestic dogs’ human-directed behavior is long overdue.

Having outlined our differences, we end on a note of conciliation. Miklósi and Topál (2011) sum up their critique by saying: “Thus we are left with the conclusion that wolves with particular social experience are sensitive to certain manifestations of human attentional state under some conditions, and the dogs’ flexibility to detect human visual cues of attention is affected by the age, rearing conditions and treatment practices.” We could not agree more.

What is perspective taking?

Virányi and Range (2011) critique two aspects of the study conducted by Udell et al. (2011). First, they argue that the perspective-taking task utilized should be considered an obedience task, rather than a begging task, and is thus not a test of canine theory of mind. Second, they suggest that the study provides no evidence that gray wolves respond to the attentional state of humans, despite their successful performance on the task.

Virányi and Range (2011) state that the experimenters in Udell et al. (2011) “simultaneously called the subject’s name and repeatedly delivered the command ‘Come!’” during testing. They claim the use of the command “Come” changed the task into a simple obedience test and, as a result, cannot be used to assess perspective-taking ability.

This conclusion is flawed for at least two reasons. First and most crucially, the command “Come!” was never given at any stage in the study under discussion here (Udell et al., 2011). The methods clearly state that, “Once the experimenters were in position with their condition-specific occluder in place, the assistant counted to three, at which time both experimenters simultaneously called the subject’s name (or a term such as ‘puppy,’ for subjects without known names).” Reference to the use of the command “Come!” is an embellishment introduced in Virányi and Range (2011) and is inaccurate. To suggest that the procedure we actually followed, of simply saying the subject’s name, is equivalent to providing an explicit command is imprecise. Thus, we believe that our procedure should not be directly equated to that in studies providing behavioral commands such as “Fido down” (as in Virányi, Topál, Gácsi, Miklósi, & Csányi, 2004). Second, even if, for the sake of argument, one were to consider the experiments reported by Udell et al. (2011) as context-specific obedience tasks, this would not negate the fact that canine performance was guided by stimuli associated with human attentional state, nor would it be any less of a perspective-taking task than those in prior studies of this type.

Perspective-taking tasks in dogs have traditionally included both begging and obedience tasks. In fact, the majority of studies on canine perspective taking involve giving a dog a direct command, such as “Leave it!,” and then systematically varying the attentional state of the experimenter (Bräuer, Call, & Tomasello, 2004; Call, Bräuer, Kaminski, & Tomasello, 2003; Fukuzawa, Mills, & Cooper, 2005). Therefore, the task at hand would still fall within the realm of traditional perspective-taking methodologies. The heart of the problem seems to lie in the interpretation of perspective taking as a cognitive skill that inherently requires theory of mind, instead of a behavioral repertoire that requires an empirical explanation.

Virányi and Range (2011) suggested that dogs should understand that a human who utters their name is attending to them independent of human orientation, much like humans understand that someone who is calling their name over a loudspeaker or talking to them while typing on a computer is still paying attention to them. This not only assumes a priori that dogs possess a theory of mind equivalent to that of humans in terms of scope and quality of experience, but more importantly insinuates that our individual subjective experience of perspective taking can inform us about the origins of this behavior or, at least, the universal presence of this ability.

In reality, there is still a lot to be learned about the development of what is called theory of mind in humans. Perspective-taking mistakes by human children are common; such mistakes are even made by adults (Epley, Morewedge, & Keysar, 2004). We know that in addition to age-related development, performance on perspective-taking tasks can be influenced by a child’s environment, experience, and prior history of relevant consequences (Perner, Ruffman, & Leekam, 1994). It should also be noted that dogs do not always respond to commands given by humans independent of orientation or attentional state (Fukuzawa et al., 2005); instead, context and prior experience impact whether human orientation is a relevant stimulus or not. Figure 2 in Udell et al. (2011) clearly shows that while dogs and wolves both demonstrate the capacity to respond appropriately to the attentional state of humans, individual performance on this task varies. We believe that this corresponds, in part, to the varied life experiences of the subjects.

While we could agree that the creation of an alternative method that does not require the use of any verbal stimuli at all could benefit future studies of canine sensitivity to human attentional state, we doubt greatly that an experimental design is going to be able to forgo some form of signal to indicate to a subject that a trial is commencing. Because prior studies suggested that outdoor environments can initially be more distracting in the context of human-guided tasks (Udell et al., 2008), a verbal cue (calling the canid’s name) was used to mark the beginning of the trial for all groups, increasing the likelihood that the subject would notice the experimenters and food in the first place.

Virányi and Range (2011) argue further that Udell et al.’s (2011) results provide no evidence that gray wolves respond to the attentional state of humans, despite their successful performance on the task. We can see no reason to accept the positive results from dogs, while dismissing those from the wolves. Dogs and wolves were tested in an identical manner. Not only were wolves successful as a group in the back-turned condition, but at least one individual wolf performed statistically above chance in each of the other conditions as well. These results would demonstrate the capacity for responsiveness to human attentional state in wolves, even if any one of the four occluder conditions was ignored. Citing the performance of dogs as evidence for generalized perspective taking while dismissing the successful performance of wolves on identical tasks does not provide a fair comparison between the subspecies and makes it seem unlikely that there could be any test that would be considered valid if wolves proved to be successful on it.

Theory of “mind” or “behavior”?

Roberts and Macpherson (2011) suggest that alternative experimental designs might have given the theory-of-mind account of canine behavior a better chance over lower-level interpretations of the animals’ behavior. To this end, they suggest that had the “seeing” experimenter turned her back but looked over her shoulder, this might have constituted a more compelling test. We fail to see how success on this condition could have added anything to the data already collected. On our preferred explanation of the results of the experiment, canids that had previously experienced people glancing over their shoulders should succeed; others should not. It would be difficult, perhaps impossible, to formulate a plausible estimate as to which canids might have had such experience, thus rendering the results difficult to interpret.

We are grateful to Roberts and Macpherson (2011) for reminding us of the tantalizing first-trial data related in Cooper et al. (2003). Cooper et al. summarized results from a version of Povinelli, Nelson, and Boysen’s (1990) “guesser–knower” experiment carried out on 15 dogs. Each dog was given a choice of selecting a location containing food over two alternative empty locations on the basis of the pointing of two humans. One (the “knower”) pointed to the baited location; the other (the “guesser”) pointed at one of the unbaited locations. The dog saw that the knower observed the food placement, whereas the guesser did not. Roberts and Macpherson drew attention to the very high first-trial performance (93%) of these dogs.

Unfortunately, the experiment referred to in Cooper et al. (2003) has never been properly published. In Cooper et al., it is referred to as “Bishop and Young, in press,” but we have been unable to identify a final published paper. Even if the finding was that dogs spontaneously follow the pointing gestures of a person the dog has seen place a bait, this would still be open to alternative explanations in terms of the animal’s prior experience and simpler behavioral processes, rather than in terms of theory of mind.

This conclusion is in line with Horowitz’s (2011) point that there may be “intractable logical problems” with “even the best-designed theory-of-mind” tests. Horowitz perceptively points out that any and all putative theory-of-mind experiments for nonverbal animals suffer from the drawback that there exist “any number of other abilities which might account for the observed behavior.” Conversely, “failure might reasonably be explained as indicative of problems in experimental design.” If interpreted within a theory-of-mind framework, the guesser–knower task could easily suffer from the same criticisms, even if dogs were to succeed on it.

This being the case, why persist with talk of theory of mind at all? Rather than introduce a new category of rudimentary theory of mind, as proposed by Horowitz (2011), for subjects that recognize some portion of the intended human signal in such tests, perhaps it is time to acknowledge that the term theory of mind has outgrown its usefulness in comparative cognition studies. The alternative would be to concentrate our efforts on attempting to understand the specific stimuli controlling the animals’ behavior under different conditions.

Future directions

Understanding how canids, and other species, may come to show sensitivity to human cues is not a simple matter. No single study or methodology is likely to be sufficient on its own. Future progress on this issue will, we believe, be accomplished only by acknowledging the value of studies from many and diverse laboratories and approaches. This is not to say that methodological concerns should be ignored, but researchers should work to determine what impact specific methodological constraints actually have by using a systematic empirical approach (e.g., Udell et al., 2008). One very positive outcome of the recent surge of interest in dog social cognition is the large body of data that is being accumulated by independent investigators. This has allowed for more checks and balances, as well as new methodological innovations within our field. At present, data from many labs suggest a shift away from a strictly evolutionary approach to a more interactive one that acknowledges the interconnectivity of evolution, environment, experience, and development (Udell et al., 2010b).

Taken together, the evidence suggests that dogs do not need to be readers of our minds; instead, they are exquisite readers of our behavior. Dogs do not need to be preprogrammed with responses to human gestures or actions, because they are incredibly flexible and quick to make associations in their environment. Although a demonstration that known basic processes can account for the behavior of dogs on social tasks cannot disprove the existence of additional evolved mental states or capacities, perhaps that is part of the problem with such capacities: Are they derived from falsifiable hypotheses? It may be more constructive to ask what biological evidence exists to support these claims and why additional cognitive processes, such as a theory of mind, would have evolved in dogs, given that preexisting simpler processes can account for the behavioral repertoire credited for their success in human environments. Our folly may be in describing social attentiveness toward companions and flexibility through conditioning as lower level, the less attractive outcome in the battle of semantics. Instead, we should identify associative processes with the richness and complexity we observe in the behavior of domestic dogs. Indeed, it is the simpler explanation that most fully accounts for the great diversity and individuality within this species.