Introduction

In the early twentieth century, a horse named Clever Hans was believed to be capable of counting and other mental tasks. The psychologist Oskar Pfungst confirmed that Clever Hans was in fact recognizing and responding to minute, unintentional postural and facial cues of his trainer or individuals in the crowd (Pfungst 1911). The “Clever Hans” effect has become a widely accepted example not only of the involuntary nature of cues provided by onlookers in possession of knowledge unavailable to others, but of the ability of animals to recognize and respond to subtle cues provided by those around them. However, an additional important consideration was the willingness of onlookers to assign a biased interpretation of what they saw according to their expectations.

Experimental paradigms for investigation of animal behaviors are designed to minimize or eliminate confounds arising from the Clever Hans effect. Because the abilities of domestic dogs to respond to human social cues have been extensively documented (reviewed in Miklosi et al. 2007; Reid 2009), a Clever Hans effect might be particularly prevalent in dogs. Indeed, the reliance of some dogs on human cues has been shown to override olfactory or visual cues indicating the location of food (Szetei et al. 2003). In one experiment, about 50% of dogs would go to an empty bowl indicated by human pointing rather than to a bowl in which the dog had seen and smelled food (Szetei et al. 2003).

This finding was notable in view of the exceptional olfactory acuity in the domestic dog. Humans have capitalized on dogs’ olfactory sensitivity through use in an ever-expanding array of scent detection activities (e.g., Horvath et al. 2008; McCulloch et al. 2006; Oesterhelweg et al. 2008; Wasser et al. 2004). Scent detection dogs search an area as directed by their handlers, issuing an operant trained response (“alert”) upon detection of their trained scent. However, scent detection dog performance is not solely dependent on olfactory acuity. Cognitive factors such as context dependence (Gazit et al. 2005) and the interaction between training paradigm and the nature of the detection problem (Lit 2009; Lit and Crawford 2006) also can impact performance.

Because the alerting response is initially trained by handler cueing upon dog interest in the desired target scent (e.g., Wasser et al. 2004), it is possible that dogs are also being conditioned to respond to additional unintentional human cues. Generally, trained dogs, including search and rescue dogs, look at humans less than untrained dogs in experimental paradigms requiring dogs to solve a problem such as opening a container (Marshall-Pescini et al. 2009, 2008; Prato-Previde et al. 2008). Indeed, an inverse relationship between owner/handler dependence and problem-solving performance had previously been identified; that is, a more dependent relationship in companion dogs fostered impaired problem-solving performance compared with working dogs (Topal et al. 1997).

Yet given the social cognitive abilities of the domestic dog, it is possible that even highly trained dogs might respond to subtle, unintentional handler cues. Dogs’ biases for utilizing human movements or social cues impair decision-making and reasoning abilities (Erdohegyi et al. 2007). Dog behavior is further affected by owner/handler gender and personality (Kotrschal et al. 2009). Moreover, dogs evaluate the attentional state of their owners through cues including eye contact and human eye, head and body orientation (Schwab and Huber 2006). Dogs can further distinguish the focus of human attention, and other visual cues such as pointing, gazing, head nodding in the direction of a target, glancing at a target and head turns toward a target affect selection of a target object by a dog (Soproni et al. 2001; Viranyi et al. 2004). In fact, nonverbal cues including proximity of the human to the dog and contextual learning of verbal commands have been shown to moderate dog response to verbal commands (Fukuzawa et al. 2005).

For scent detection dog handlers, beliefs that scent is present might result in inadvertent postural and facial cues sufficient for dogs to respond regardless of the absence of scent, in beliefs that dogs are providing their trained alert response, or simply in beliefs that alerts should be called regardless of dog behavior. All of these effects would result in false alerts identified by handlers. These handler beliefs might be influenced by human communication regarding target scent location. Alternatively, handler beliefs might be influenced by increased dog interest in a nontarget scent. The main aims of this study were to (1) determine whether handler beliefs about target scent location affect outcomes in scent detection dog searches and (2) evaluate the relative importance of dog versus human influences on those beliefs. Importantly, this study was not evaluating abilities of these detection dogs to detect their target scents. Because all dogs were certified, many with confirmed deployment finds, their ability to correctly locate target scent was considered to be previously established. Therefore, in order to evaluate outcomes solely based on handler beliefs and expectations, this study was designed so that any alert issued would be a “false” alert; that is, there was no target scent present in any searches conducted for the purposes of this study.

Materials and methods

Handler/dog teams

A total of 18 handler/detection dog teams, recruited through word-of-mouth from multiple agencies, participated in this study. These teams were certified by a law enforcement agency for either drug detection (n = 13), explosives detection (n = 3), or both drug and explosives detection (n = 2). Demographic details of teams, including dog age, handler-reported dog breed, dog years of detection experience and handler years of detection experience, are presented in Table 1. Upon detection of target scent, all explosives dogs, both drug/explosives dogs and one drug detection dog were trained to issue a passive alert; that is, the dog would sit at the location of target scent detection. One drug detection dog was trained to issue a passive–active alert (sitting and barking), and all remaining drug dogs were trained to issue an active alert (barking) upon detection of target scent. All drug detection teams and two teams trained to find explosives had successfully identified their target scents in law enforcement deployment situations. In order to maintain confidentiality, and so that individual teams could not be identified through demographic information, the demographic data were collected anonymously and cannot be linked to any performance data. Due to subject availability, this study was completed across 2 days, with seven teams completing the experiment on the first day and the remaining 11 teams completing the experiment on the second day.

Table 1 Demographic data, n = 18 dog/handler teams

Procedures

The experimental paradigm in this study was based on a paradigm previously applied to evaluate response conflict in disaster search dogs (Lit and Crawford 2006). Handlers conduct a series of short searches for their target scent across different search scenarios, each representing a different experimental condition. In the current study, there was no target scent present, so that any alert identified by handlers was considered a false alert.

Handler beliefs were influenced by verbally communicating to the handlers that a specific marker was an indicator of scent location (i.e., human influence), by encouraging dogs to display unusual interest in a specific location with a decoy scent (i.e., dog influence), or by a specific marker that actually indicated the location of a decoy scent (combined human and dog influence). A four-level single-factor experimental design was used to test effects of these influences on handler beliefs. The independent variable was search condition, a within-subjects variable with four levels:

  1. NULL: Unmodified.

  2. MARKED NULL: A piece of 8-1/2” × 11” red construction paper was taped to the door of a cabinet.

  3. UNMARKED DECOY: Two Slim-Jim sausages (removed from their wrappers and stored with their wrappers in an unsealed plastic bag) and a new tennis ball were hidden in the bottom of a pot and placed in a metal cabinet with the doors closed.

  4. MARKED DECOY: Two Slim-Jim sausages (removed from their wrappers and stored with their wrappers in an unsealed plastic bag) and a new tennis ball were hidden in a covered metal electric fryer, which was marked with a piece of red construction paper taped to the outside of the fryer.

To minimize the possibility that decoy scents in UNMARKED DECOY and MARKED DECOY were not equally detectable and to encourage dog interest in the decoy scents, the sausages were rubbed along the outside of the cabinet (UNMARKED DECOY) and the electric fryer (MARKED DECOY).

The search conditions were set up in four rooms within a church that had not previously been used for detection dog training purposes. Each room was approximately 30–40 m² and contained cabinets, tables, chairs and art supplies. Each condition was identified only as A, B, C or D, indicated by a paper taped on the outside of the door of each room. The experimenter did not touch any items around the rooms, except to place the decoy scents and/or paper markers. To avoid contamination of paper markers with decoy scents, paper markers were placed prior to placement of decoy scents. In order to maintain the belief that the experimenter was setting out target scents in each condition, at the beginning of each testing day, the experimenter carried a metal box containing 12 half-ounce samples of marijuana triple bagged in sealed plastic bags, and a canvas bag containing 12 half-ounce samples of gunpowder triple bagged in sealed plastic bags. Upon entering each condition, the experimenter immediately set these containers down by the door. The experimenter did not handle the scents, and the containers were never opened inside the church. Decoy scents and paper markers were never in contact with these containers and were kept in a separate briefcase carried by the experimenter.

Dog/handler teams completed two searches (maximum 5 min each) in each of the four search areas, for a total of eight trials (“runs”) per team. Handlers were provided with a small card containing their assigned sequences of their eight runs, randomly counterbalanced across participants and search areas. Additional written and verbal instructions were provided to handlers that each condition might contain up to three target scents and that target scent markers consisting of a red piece of construction paper would be present in two conditions. No information was provided about the decoy scent.
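The run-order cards described above can be illustrated with a short sketch. The paper does not say how the sequences were generated, so the scheme below (an independent random shuffle of the eight runs for each team) is an assumption, and the names `ROOMS`, `RUNS_PER_ROOM` and `assign_sequences` are hypothetical. A plain shuffle randomizes order but does not by itself guarantee strict counterbalancing.

```python
import random

ROOMS = ["A", "B", "C", "D"]   # room labels as posted on the doors
RUNS_PER_ROOM = 2              # each team searched every room twice


def assign_sequences(n_teams, seed=0):
    """Return one randomly ordered eight-run sequence per team.

    Assumed scheme for illustration only; the study's actual
    counterbalancing procedure is not described in the text.
    """
    rng = random.Random(seed)
    sequences = []
    for _ in range(n_teams):
        runs = ROOMS * RUNS_PER_ROOM   # eight runs, two per room
        rng.shuffle(runs)              # random order for this team
        sequences.append(runs)
    return sequences


sequences = assign_sequences(18)
```

Each entry of `sequences` corresponds to one handler card: eight room labels, with every room appearing exactly twice.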

Each condition had a single observer present. Prior to each search, handlers would indicate to the observer whether their dog was a drug or explosives dog and whether their dog issued a passive or active alert. When a handler “called an alert,” that is, confirmed that the dog had found a target scent location and was issuing its trained operant response, the observer would record time of alert and alert location specified by the handler. In marked conditions, if handlers called alerts on the location marked by the paper, observers would record an M to reflect this. Observers recorded alerts as called by handlers and did not evaluate validity of alerts. The same rooms were used for both days of testing. Decoy scents and markers were removed at the end of the first day of testing, and identical but previously unused decoy scents and markers were used for the second day of testing.

This study was double-blind. Neither handler/dog teams nor observers were aware of the conditions of each search area. Because the study was completed across 2 days and we did not want to jeopardize the double-blind nature of this study, all handlers were debriefed and told about the contents of each condition upon the completion of the second day of testing. The experimenter (L. Lit) was the only person present who was aware of the conditions of each search area.

The dependent variable was the total number of alerts issued by each dog, as reported by handlers, in each search area. The correct score for each search area was 0; all alerts were false alerts.

The Institutional Review Board and Animal Care and Use Committee at the University of California at Davis approved this study, and all participants provided written consent.

Statistical analyses

Data were analyzed using SPSS Version 17.0.1. All analyses used a significance threshold of α = 0.05 (two-tailed). An omnibus mixed ANOVA was conducted to evaluate effects of day of testing (between groups) and condition (repeated measures) on number of alerts. To evaluate effects of handler influence and dog influence, data were also analyzed as a repeated measures 2 × 2 ANOVA [handler influence (yes/no) and dog influence (yes/no)]. Paired t tests were used to compare alerts between first and second runs of each condition. A chi-squared goodness of fit test compared clean runs (runs with no alerts) in unmarked and marked conditions. Within the MARKED NULL, UNMARKED DECOY and MARKED DECOY conditions, a log likelihood analysis was used to compare runs for which (1) alerts included either a marker or the unmarked decoy scent, (2) alerts did not include the marker or unmarked decoy scent and (3) no alerts were issued, followed by chi-squared goodness of fit tests to compare distribution of these within conditions.
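The 2 × 2 recoding of the four search conditions can be made explicit with a small sketch. The mapping follows the condition descriptions (a paper marker is the human influence, a decoy scent is the dog influence); the names `CONDITION_FACTORS` and `marginal_mean` are hypothetical helpers for illustration, not part of the original SPSS analysis.

```python
# Each search condition expressed as a cell of the 2 x 2 design.
CONDITION_FACTORS = {
    "NULL":           {"handler_influence": False, "dog_influence": False},
    "MARKED NULL":    {"handler_influence": True,  "dog_influence": False},
    "UNMARKED DECOY": {"handler_influence": False, "dog_influence": True},
    "MARKED DECOY":   {"handler_influence": True,  "dog_influence": True},
}


def marginal_mean(mean_alerts, factor, level):
    """Average per-condition means over the conditions where `factor` equals `level`.

    `mean_alerts` maps condition name -> mean number of alerts (supplied by the caller).
    """
    picked = [
        mean_alerts[condition]
        for condition, factors in CONDITION_FACTORS.items()
        if factors[factor] is level
    ]
    return sum(picked) / len(picked)
```

A main effect of handler influence, for example, compares `marginal_mean(m, "handler_influence", True)` with `marginal_mean(m, "handler_influence", False)` for a caller-supplied dictionary `m` of per-condition means.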

Results

In order to evaluate effects of handler beliefs and expectation on detection dog performance, this study measured performance of 18 handler/dog teams in four separate search areas (NULL, MARKED NULL, UNMARKED DECOY, MARKED DECOY, described in “Materials and methods”). Each team ran each search area twice, for a total of 36 runs per condition (2 runs/team × 18 teams) and an overall total of 144 separate runs (4 search areas × 2 runs/team/area × 18 teams) (Fig. 1).

Fig. 1 Alerts for each team across each condition for Run 1 (light bars; n = 18/condition) and Run 2 (dark bars; n = 18/condition)

Day of testing and condition group differences

Overall, because multiple alerts per team within a condition were possible, there were a total of 225 alerts issued. There were 21 (15%) clean runs and 123 (85%) runs with one or more alerts. The omnibus mixed ANOVA using the model “number of alerts = day of testing (between groups) + condition (within-subjects) + [day of testing * condition]” revealed no difference in mean alerts between teams running on the first and second days, F(1, 16) = 0.94, P = 0.35; no difference in mean alerts across conditions, F(3,48) = 0.09, P = 0.97; and no interaction, F(3, 48) = 0.63, P = 0.60. Data from both days were subsequently combined for further analysis. The repeated measures 2 × 2 factorial ANOVA found no main effect of human influence, F(1, 17) = 0.06, P = 0.81; no main effect of dog influence, F(1, 17) = 0.01, P = 0.93; and no interactions between human influence and dog influence, F(1, 17) = 0.01, P = 0.94.
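The run counts and percentages reported above follow directly from the design and can be checked with simple arithmetic. The sketch below uses only figures stated in the text; the variable names are illustrative, and the mean number of alerts per run is a derived quantity that the text does not report.

```python
TEAMS = 18
SEARCH_AREAS = 4
RUNS_PER_AREA = 2

runs_per_condition = TEAMS * RUNS_PER_AREA        # 36 runs in each condition
total_runs = runs_per_condition * SEARCH_AREAS    # 144 runs overall

clean_runs = 21                                   # runs with no alerts
runs_with_alerts = total_runs - clean_runs        # 123 runs with one or more alerts
total_alerts = 225

pct_clean = round(100 * clean_runs / total_runs)          # rounds to 15
pct_alerted = round(100 * runs_with_alerts / total_runs)  # rounds to 85

# Derived here for illustration only: average alerts per run.
mean_alerts_per_run = total_alerts / total_runs
```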

First and second run differences

Within each condition, there was no difference in mean alerts between the first and second runs, except for NULL, where there were more alerts on the second run compared with the first run (paired t[17] = −2.83, P = 0.01).

Effect of marker on clean runs

Distribution of clean runs differed across unmarked and marked areas. There were more clean runs in unmarked areas (NULL and UNMARKED DECOY combined) (n = 15) than in marked areas (MARKED NULL and MARKED DECOY combined) (n = 6), χ²[1, 21] = 3.86, P = 0.05. In contrast, distribution of clean runs was not different across runs with and without decoy scent (NULL and MARKED NULL combined, n = 11, compared with UNMARKED DECOY and MARKED DECOY combined, n = 10), χ²[1, 21] = 0.05, P = 0.827.
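The two goodness of fit statistics above can be reproduced from the reported clean-run counts. The sketch assumes equal expected counts in the two groups (the usual default for a goodness of fit test), and uses the standard identity that the upper-tail probability of a chi-squared variable with 1 degree of freedom is erfc(sqrt(x / 2)). The function name `chi2_gof_two_groups` is hypothetical.

```python
import math


def chi2_gof_two_groups(count_a, count_b):
    """Chi-squared goodness of fit for two groups with equal expected counts.

    Returns (statistic, p_value); the p-value uses the 1-df survival function.
    """
    expected = (count_a + count_b) / 2
    stat = sum((obs - expected) ** 2 / expected for obs in (count_a, count_b))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Clean runs: unmarked (15) versus marked (6) areas.
marker_stat, marker_p = chi2_gof_two_groups(15, 6)

# Clean runs: no decoy (11) versus decoy (10) areas.
decoy_stat, decoy_p = chi2_gof_two_groups(11, 10)
```

Under this equal-expectation assumption the statistics round to the reported 3.86 and 0.05.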

Human and dog influences on alert locations

Alert locations in conditions marked with paper (MARKED NULL), containing decoy scent (UNMARKED DECOY) and containing decoy scent marked with paper (MARKED DECOY) were compared to evaluate differences of human influence on handler beliefs and dog influence on handler beliefs. Runs were grouped according to whether (1) any one of the alerts in that run included the marker and/or decoy scent; (2) none of the alerts included the marker and/or decoy scent; or (3) the run was clean (no alerts). These groups were dependent on condition, log likelihood [4, 108] = 22.236, P < 0.001, Φ = 0.41 (Fig. 2). There were significantly more runs including alerts on the marker than either clean runs or runs not including alerts on the marker in both MARKED NULL (χ²[1, 36] = 21.78, P < 0.001) and MARKED DECOY (χ²[2, 36] = 36.5, P < 0.001) (Fig. 2). This differed from UNMARKED DECOY, where there were no differences between clean runs, runs with alerts on the decoy scent and runs not including alerts on the decoy scent (χ²[2, 36] = 4.67, P = 0.09) (Fig. 2). Comparing across conditions (black bars, Fig. 2), there were more runs with alerts on marked locations in MARKED NULL and MARKED DECOY than on the decoy location in UNMARKED DECOY, although the differences were not significant when corrected for multiple comparisons (Fig. 2).

Fig. 2 Runs within each condition (combined n = 36) with alerts including marker and/or decoy scent (black bars), not including marker and/or decoy scent (dark gray bars), or clean runs (light gray bars). Asterisks represent statistically significant differences between groups as shown by log likelihood (across all conditions) and chi-squared test (within conditions); ***P < 0.001; n.s. not significant

Trend analysis

Finally, counterbalancing run order across participants ensured that each participant ran conditions in a different order. To evaluate whether there was an effect of sequence order of runs on alerts, all runs were reordered to reflect the sequence in which participants completed the conditions. Trend analysis was performed relating condition order to the number of alerts per run. An analysis of the cubic component of trend was significant, F(1, 17) = 7.67, P = 0.01, partial η² = 0.31, indicating that this trend accounted for nearly one-third of the variance in number of alerts per run (Fig. 3, solid line). This trend was consistent across both days of testing (Fig. 3, dotted and dashed lines).
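The effect size reported for the cubic trend can be recovered from the F statistic and its degrees of freedom using the standard relation between F and partial eta squared. The sketch below is a check on the reported values rather than a reproduction of the SPSS trend analysis; the function name `partial_eta_squared` is hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F statistic: F*df1 / (F*df1 + df2)."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Cubic trend component reported in the text: F(1, 17) = 7.67.
eta_p2 = partial_eta_squared(7.67, 1, 17)
```

With these inputs the formula gives a value that rounds to the reported 0.31.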

Fig. 3 Cubic trend for all teams (solid black line, n = 18) relating condition run order (ordered runs) to marginal means of alerts per run as shown by trend analysis, P = 0.01, partial η² = 0.31. Trends for teams from first day (dashed line, n = 7) and second day (dotted line, n = 11) are also shown for comparative purposes

Discussion

The goals of this study were to (1) identify whether handler beliefs affect detection handler/dog team performance and (2) evaluate relative importance of dog versus human inputs on those beliefs. To test this, we influenced handler beliefs and evaluated subsequent handler/dog team performance according to handler-identified alerts. The overwhelming number of incorrect alerts identified across conditions confirms that handler beliefs affect performance. Further, the directed pattern of alerts in conditions containing a marker compared with the pattern of alerts in the condition with unmarked decoy scent suggests that human influence on handler beliefs affects alerts to a greater degree than dog influence on handler beliefs. That is, total number of alerts identified by handlers did not differ across conditions. However, distribution of these alerts did differ across conditions; more alerts were identified on target locations indicated by human suggestion (paper marker) than on locations indicated by increased dog interest (hidden sausage and tennis balls).

In light of written and verbalized instructions that “Each scenario may contain up to 3 of your target scents,” it was interesting that there were 12 runs with either four or five alerts (Fig. 1). It was unclear whether handlers did not attend to the instructions, did not remember the instructions or believed that there were more than three target scent sources in each condition.

There are two possible explanations for the large number of false alerts identified by handlers. Either (1) handlers were erroneously calling alerts on locations at which they believed target scent was located or (2) handler belief that scent was present affected their dogs’ alerting behavior so that dogs were alerting at locations indicated by handlers (that is, the Clever Hans effect).

In the event that handlers were indeed asserting dog alerts regardless of dog response (or lack thereof), there are two possible causes. The handlers’ beliefs that scent was present may have been sufficient motivation to identify alerts even when the handlers were clearly aware that the dog had not provided the trained alert response behavior. Alternatively, the handlers’ beliefs were sufficient to generate a form of confabulation. Broadly defined, confabulation refers to false beliefs that may be unrelated to actual experienced events (Bortolotti and Cox 2009). Information regarding prevalent events (events that are common and therefore of increased likelihood) makes events more self-relevant and increases beliefs in occurrence of such events (van Golde et al. 2010). Thus, the perceived likelihood that scent was present across conditions would have contributed to confidence in handler beliefs of scent and dog responses. Because other-generated suggestions influence beliefs and subsequent actions more strongly than self-generated suggestions (Pezdek et al. 2009), the experimenter-provided suggestion that target scent was present may have further contributed to this effect. However, the conclusion that handlers are asserting their dogs have alerted simply upon seeing the marked areas regardless of actual dog response does not account for the numerous additional alerts occurring in other areas. In addition, the experimenter was informed that three handlers admitted to overtly cueing their dogs to alert at the marked locations, suggesting that handlers would not call alerts unless and until they observe the dogs’ trained responses. Handlers are trained to recognize and reward specific behaviors of their dogs. The exhibition of an alert is an obvious and discrete behavior. Although data describing observer assessments were not collected, all observers were familiar with detection dog training and performance, and all observers were visibly surprised upon debrief (L. Lit, personal communication). Therefore, it is unlikely, although cannot be absolutely confirmed, that handlers called alerts on markers without seeing an appropriate behavior from the dog.

It may be more parsimonious to suggest that dogs respond not only to scent, but to additional cues issued by handlers as well. This is especially plausible since, in training, alerts are originally elicited through overt handler cueing. Cueing in initial training may include overt cues, verbal commands and physical prompting. Cues may also include more subtle unintentional cues given by handlers such as differences in handler proximity to the dog according to scent location, gaze and gesture cues, and postural cues.

Human cues that direct dog responses without formal training include pointing, nodding, head turning and gazing (reviewed in Reid 2009). While formal obedience training can enhance dogs’ use of human cues (McKinley and Sambrook 2000), type of training can differentially affect dogs’ human-directed communicative behaviors (Marshall-Pescini et al. 2009, 2008). Gazit et al. (2005) found diminished response when an area searched repeatedly was lacking target scent. While the proposed reason for their findings emphasized effects of context specificity on the detection dogs (Gazit et al. 2005), the current findings raise the possibility that at least some of the effects of Gazit et al. (2005) might have arisen due to handler beliefs that scent would not be present in that area, with subsequent attenuation of dog response.

Because the current study did not include videotape of handler/dog team performance, there is no way to identify which of these explanations is correct. Observer coding of dog behavior was not likely to improve the reliability of the data acquired because, under the double-blind study design, the observers were potentially subject to the same biases as the handlers. In fact, it is possible that the observers were subject to greater biases than the handlers, since they were able to observe every dog twice. Therefore, observer coding would have been open to the same alternative explanations as the handler-called alerts, and further subject to question according to level of observer experience with working dogs. Future studies should directly explore underlying factors responsible for the false alerts as this will improve development of effective remedies to optimize performance.

Dogs can learn to respond to human gestures very rapidly (Bentosela et al. 2008; Elgier et al. 2009; Udell et al. 2008). Thus, it is tempting to speculate that the large number of false alerts resulted from reinforcement of dogs for false alerts received in earlier conditions. However, the pattern of alerts, consistent across days of testing (Fig. 3), suggests that alerts did not reflect a simple learning effect. This is supported by prior studies of human–dog social cognitive interactions demonstrating no clear learning effect when comparing early with later trials (Hare et al. 2002; Riedel et al. 2008).

When considering alternative explanations for the incorrect responses, it is further possible that some alerts resulted from target scent contamination during initial setup of conditions. This is unlikely, given the emphasis of alerts toward marked sites, particularly when considering that the pattern of alerts was modified by human influence. The array of alert locations (Table 2) also does not support this explanation, notably because no dogs alerted on or around the doors where the scent containers had briefly been placed. Moreover, detection dogs are trained to identify scent source rather than scattered residual scent. For example, dogs trained to alert on gunpowder are not expected to alert in an airport area simply because an armed officer passes through. The significant trend (Fig. 3) further suggests that a temporal component contributed to the number of alerts under these experiments.

Table 2 Alert locations and alert frequencies (#) in each location for all scenarios

It is possible, although also unlikely, that all objects in the room smelled like the dogs’ target scents. Because these were rooms in a church building that had not previously been used for detection dog training, it was also unlikely that there were explosives or drugs that had been stored within the testing rooms. Some handlers suggested the possibility that dogs were following previous dogs and alerting at locations in which these dogs had salivated or otherwise left trace evidence of their presence. This would not explain the difference in patterns of alerts between marked and unmarked conditions or the variation in alert locations across all conditions. This would also be unlikely given the extensive training and certification processes required of these teams.

It is important to emphasize that this study did not evaluate performance of dogs when presented with scent. Handler-dog teams undergo substantial training and rigorous certification prior to deployment; all teams included in this study confirmed prior successful finds during active deployment. This study only considered number of alerts under the artificially manipulated condition of handler belief of scent when in fact no scent was present.

In conclusion, these findings confirm that handler beliefs affect working dog outcomes, and human indication of scent location affects distribution of alerts more than dog interest in a particular location. These findings emphasize the importance of understanding both human and human–dog social cognitive factors in applied situations.