The 32 pairs that took part in this study were normal pet dogs of various breeds and their owners. Two pairs had to be excluded during testing because of health problems of the dog, leading to a final sample size of 30 dog–owner pairs. Of these dogs, 18 were female and 12 were male (mean age 5.8 years, range 2–13 years), whereas 24 of the owners were female and 6 were male (for detailed information about pairs see Online Resources 1 and 2). Dogs were recruited from the DogStudies database of the Max Planck Institute for the Science of Human History in Jena. Selection criteria for dogs were high toy motivation and the ability to fetch inert objects (which was additionally tested explicitly; see Sect. Pretest). All dogs were healthy individuals with no known sight or hearing problems and no known aggression towards humans.
Materials and set-up
In the test room, four small boxes (8 cm × 15 cm × 20 cm) were attached to the windowsills and constituted the four possible hiding places. They were numbered from 1 to 4 so that the owners could identify each box for their choices in the test. In the close condition, boxes were placed 17 cm apart from each other, while in the far condition, boxes were positioned 90 cm apart from each other (Fig. 1). For the owners, a chair was placed in the middle of the room, together with a questionnaire on which owners had to check the supposed target box (for detailed information about materials and set-up see Online Resource 1).
Each dog–owner pair visited the laboratory twice within 1 week. Only one condition, comprising four trials, was tested per session (i.e., per day) with an inter-trial break of ~10 min, because conducting all eight trials in one session was judged to be too demanding for most dogs. While owners received their instructions for the test, dogs could freely explore the test room.
Before the actual test, a pretest was conducted. The owner was instructed to sit on the chair facing the dog, which sat at a distance of ~2 m. If necessary, one experimenter held the dog by its collar. The dog’s favorite toy was then placed between the two parties and the owner was instructed to call the dog to bring the toy. Owners were told to do so in a natural manner, as they would in a typical playing or training context, since the aim of this study was to investigate the typical communication of the pairs. If the dog did as requested, the pair was allowed to play for a short amount of time. The exact duration of play varied between subjects because of different play styles but was kept approximately constant within subjects to avoid unintended differential rewarding (e.g., we either kept constant how often the toy was thrown and fetched or, if pairs preferred other play styles like tug-of-war, the duration of play time was kept constant). If a dog failed to bring the toy right away, it had one more chance to do the task correctly before being excluded from the study. All dogs successfully completed the pretest. This procedure was repeated at the beginning of each new trial to re-establish the play context.
Immediately after the short play session, the owner handed over the toy to experimenter 1 (E1). Experimenter 2 (E2) left the room with the owner through door 1 (see Fig. 1 and Footnote 1). E1 then gently waved the toy in front of the dog’s face to get its attention (this was repeated whenever the dog averted its gaze from the toy, accompanied by calling the dog by its name). Next, E1 put the toy into the target box and closed it. Immediately afterwards, the box was reopened and this procedure was repeated one more time to ensure that the dog had really registered where the toy had been hidden. Meanwhile, E2 guided the owner around the room to door 2 (see Fig. 1) and waited for the signal from E1, which was given as soon as E1 had closed the box and left the room. E2 then opened the door and let the owner into the room.
We also wanted to investigate the effect of behavioral restrictions on communication. Previous research has shown that a standardized but nevertheless unnatural setting can inhibit dogs’ natural behavior and conceal their actual abilities (e.g., Bräuer et al. 2013). Therefore, the following procedure was divided into two phases with differing degrees of standardization (since this manipulation hardly yielded any effects, most results will not be discussed here and can be viewed in Online Resource 1).
Phase 1: The owner entered the room and directly sat down on the chair. During phase 1, owners were not allowed to stand up and walk around. Other than that, no constraints were put on communication between dogs and owners. After 1 min had elapsed, one of the experimenters signaled the owner from outside the room to fill in the questionnaire, which also marked the end of phase 1. The owner then had to check the box in which he or she assumed the toy was located (i.e., make their choice for phase 1).
Phase 2: As soon as the owners had checked the questionnaire, they were allowed to stand up and move around freely within the test room. Here, the only communicational constraint was that owners were not allowed to open the boxes unless they wanted to make a choice. Phase 2 lasted a maximum of 1 min. Hence, in contrast to phase 1, owners could make their choice before 1 min had passed, even directly after filling in the questionnaire without further interacting with their dog. However, if owners had not opened a box after 1 min, experimenters prompted them by calling “Wählen!” (German for “Choose!”). The box that was opened in phase 2 could be different from the choice made in phase 1. If the pair chose correctly in phase 2, they could play together as a reward (again, duration of play varied across but not within subjects). If the wrong box was opened, the experimenters entered the room and opened the correct box to show the toy to both the dog and the owner, but the pair was not allowed to play (see Footnote 2). Choices for phase 2 were coded live by the experimenters and back-checked from tape afterwards.
In contrast to previous studies, the current set-up only prevented smaller dogs from accessing the boxes. Consequently, some dogs retrieved the toy on their own (see Footnote 3). If dogs retrieved the toy already in phase 1, before owners could check the questionnaire, the respective box was taken as the choice for both phases because in this case it was unambiguous for the owner which box was the correct one. If owners did not check a box on the questionnaire and the dog did not retrieve the toy, the choice for phase 1 was coded as 0 and subsequently as an incorrect choice because it neither overlapped with the target box nor indicated a correct inference from the dog’s behavior. In between trials, dogs had no access to their favorite toy or any other toys. Online Resource 3 displays a video of the procedure.
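The coding rule for the phase 1 choice can be summarized in a few lines of Python. This is only an illustrative sketch: the function names and the use of `None` for "no box checked" or "no toy retrieved" are our own assumptions, not part of the original coding scheme.

```python
NO_CHOICE = 0  # code used when the owner checked no box and the dog retrieved nothing


def code_phase1_choice(checked_box, retrieved_box):
    """Return the coded phase 1 choice (hypothetical helper).

    checked_box:   box number (1-4) checked on the questionnaire, or None
    retrieved_box: box number (1-4) from which the dog retrieved the toy
                   during phase 1, or None
    """
    # A retrieval by the dog makes the correct box unambiguous for the owner,
    # so that box counts as the choice.
    if retrieved_box is not None:
        return retrieved_box
    # No questionnaire entry and no retrieval: coded as 0.
    if checked_box is None:
        return NO_CHOICE
    return checked_box


def is_correct(choice, target_box):
    """A choice coded as 0 can never match a target box numbered 1-4."""
    return choice == target_box
```

For example, `code_phase1_choice(None, None)` yields 0, which `is_correct` then scores as incorrect for every target box.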
Order of conditions was counterbalanced across subjects. Order of boxes was semi-randomized across conditions, with the stipulations that the same box could not be the target in two consecutive trials within a session and that each box had to be the target twice for each dog. The number of the first box was counterbalanced across subjects. Due to excluded pairs and problems during the test, the final distribution is slightly uneven: seven pairs started with box 1, ten with box 2, six with box 3 and seven with box 4 (see Footnote 4).
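One way to generate a target order that satisfies both stipulations is rejection sampling, sketched below in Python. This is an assumption made for illustration only; the text does not state how the orders were actually drawn. The helper names, the fixed seed and the representation of one dog's eight trials as a flat list (session 1 followed by session 2) are hypothetical.

```python
import random

BOXES = (1, 2, 3, 4)
TRIALS_PER_SESSION = 4


def is_valid(order):
    """Check both stipulations for one dog's eight-trial target order."""
    # Each box must be the target exactly twice across both sessions.
    if any(order.count(box) != 2 for box in BOXES):
        return False
    # No box may be the target in two consecutive trials within a session.
    for start in range(0, len(order), TRIALS_PER_SESSION):
        session = order[start:start + TRIALS_PER_SESSION]
        if any(a == b for a, b in zip(session, session[1:])):
            return False
    return True


def draw_target_order(first_box, rng):
    """Shuffle until a valid order starting with first_box is found."""
    pool = list(BOXES) * 2
    pool.remove(first_box)  # the first trial is fixed by counterbalancing
    while True:
        rng.shuffle(pool)
        order = [first_box] + pool
        if is_valid(order):
            return order


rng = random.Random(42)  # arbitrary seed, for reproducibility of the sketch
order = draw_target_order(first_box=2, rng=rng)
```

Note that the check is applied per session, so the last target of session 1 may equal the first target of session 2.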
All behaviors were coded using Solomon Coder software (Péter 2017), which was set up with a sensitivity of 0.20 s. For dogs, seven different behaviors were coded: gazes directed at each of the boxes and the owner, movements directed at each of the boxes and the owner, time spent near each box, jumping/standing upright in front of each of the boxes, vocalizations, whether the dog opened the boxes and whether the dog retrieved the toy on its own.
For owners, the following behaviors were coded into one variable, owner behavior: owners’ gazing at the dog, gazing at the boxes (i.e., one specific box or the general direction of the boxes), pointing at the boxes, nodding in the direction of the boxes, showing empty hands, shrugging, approaching the boxes, talking (any utterances by the owner, i.e., including laughing, sneezing, coughing) and calling the dog by its name (including obvious nickname versions of the dog’s name, e.g., Sue for Susi, but no other kinds of nicknames, e.g., honey). This variable is defined very broadly because, for an explorative analysis of the interaction of owner and dog, it should cover a wide range of possibly influential behaviors. (We also conducted analyses with owner behavior separated into non-verbal prompting, talking and calling the dog’s name, which can be seen in Online Resource 1.)
All dog- and owner-related variables were coded in terms of frequency and time point relative to all other behaviors (both the dog’s and the owner’s), i.e., how often and when they happened. All behaviors that were necessary for the calculation of showings (see below) were additionally coded in terms of duration, i.e., when they started and when they ended relative to all other behaviors. Solomon Coder provides a timetable of all behaviors (dog’s and owner’s) as output and automatically calculates frequencies and durations of variables.
To assess inter-coder reliability, 20% of the videos (i.e., 6 pairs) were coded by a second observer who was naïve to the hiding location and the purpose of the study. Agreement between the two coders was calculated using Spearman rank order correlation, and inter-coder reliability was assessed according to the limits proposed by Cicchetti (1994). Accordingly, mean inter-observer reliability was good for frequencies of gazes (rs = 0.74), and excellent for durations of gazes (rs = 0.82), frequencies (rs = 0.78) and durations (rs = 0.82) of the dog’s movements, frequencies (rs = 0.97) and durations (rs = 0.96) of dogs spending time near each box, frequencies (rs = 0.98) and durations (rs = 0.99) of jumping/standing upright at the boxes as well as opening boxes (rs = 0.92) (Spearman rank order correlation coefficients for each behavior per box can be seen in Online Resource 1). For dog vocalizations, coders reached good agreement for frequencies (rs = 0.74, p < 0.001) and excellent agreement for durations (rs = 0.77, p < 0.001). Lastly, inter-coder reliability was excellent for owner behavior (rs = 0.97, p < 0.001).
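For illustration, the reliability computation can be sketched in plain Python: values are converted to ranks (ties receive their average rank), the Pearson correlation of the ranks gives the Spearman coefficient, and the coefficient is then labeled using the cut-offs of Cicchetti (1994), i.e., below 0.40 poor, 0.40 to 0.59 fair, 0.60 to 0.74 good, and 0.75 or above excellent. The function names and the example counts are hypothetical, and this is not the code used for the reported values.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank order correlation as Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def cicchetti_label(coefficient):
    """Verbal label according to the cut-offs of Cicchetti (1994)."""
    if coefficient < 0.40:
        return "poor"
    if coefficient < 0.60:
        return "fair"
    if coefficient < 0.75:
        return "good"
    return "excellent"


# Made-up gaze counts per trial from two coders, for demonstration only.
coder_1 = [4, 7, 2, 9, 5, 3]
coder_2 = [5, 6, 2, 9, 4, 3]
label = cicchetti_label(spearman(coder_1, coder_2))
```

With these cut-offs, a coefficient of 0.74 is labeled good and 0.77 excellent, which matches the labels used in the text above.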
To specify showing behaviors, we generalized the definition for gaze alternation that Russell et al. (1997) initially used for chimpanzees and Miklósi et al. (2000) transferred to dogs, to include other showing behaviors as well: The directional component has to be followed directly and within two seconds by the attention-getting component or vice versa (i.e., order of components does not matter). Therefore, the above-mentioned behaviors were divided into directional components and attention-getting components (Miklósi et al. 2000). All 15 possible combinations of these components form the showing behaviors analyzed in this study (see Table 1).
Since the initial definition focused on gaze alternation (Miklósi et al. 2000; Russell et al. 1997), it stated that the two components have to occur in succession. In this study, however, the two components could also occur simultaneously (e.g., spending time near a box while gazing at the owner). Therefore, both alternations and (partial or complete) overlaps of the above-mentioned behaviors were defined as showing. Showings were calculated based on the timetables provided by Solomon Coder using a script written in Python (further details regarding behavioral coding, flowcharts depicting the employed algorithm and an example of the generated output can be seen in Online Resource 1).
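The core of this definition can be sketched as an interval check. The sketch below is a simplification under stated assumptions and is not the script documented in Online Resource 1: each behavior is represented as a hypothetical `(start, end)` tuple in seconds, a gap of exactly two seconds is assumed to still count, the requirement that the components follow each other "directly" (without intervening behaviors) is not modeled, and every qualifying pair of events is counted.

```python
MAX_GAP_S = 2.0  # maximum pause between the two components, in seconds


def is_showing(directional, attention, max_gap=MAX_GAP_S):
    """True if the two behaviors overlap or alternate within max_gap seconds.

    Each argument is a (start, end) tuple; the order of the two components
    does not matter.
    """
    d_start, d_end = directional
    a_start, a_end = attention
    # Time between the end of the earlier behavior and the start of the later
    # one; negative when the behaviors overlap partially or completely.
    gap = max(d_start, a_start) - min(d_end, a_end)
    return gap <= max_gap


def count_showings(directional_events, attention_events):
    """Count all pairs of events that qualify as a showing."""
    return sum(
        is_showing(d, a) for d in directional_events for a in attention_events
    )


# Made-up events: time spent near a box and gazes at the owner.
near_box = [(0.0, 4.0), (10.0, 11.0)]
gaze_owner = [(3.0, 3.5), (12.5, 13.0), (20.0, 21.0)]
count = count_showings(near_box, gaze_owner)
```

In this example, the first gaze overlaps with the first stay near the box and the second gaze starts 1.5 s after the second stay ends, so two showings are counted.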
For analyses regarding showing effort, low-effort showing was defined as the least effortful showing strategy: gazing at a box plus gazing at the owner (i.e., gaze alternation). Similarly, high-effort showings were defined as all showings involving the most effortful behavioral component: jumping/standing upright plus any of the three attention-getting components (i.e., gazing at the owner, moving towards the owner or vocalizing). However, since many dogs did not exhibit jumping/standing upright at all, the second most effortful showing strategy was added as well: moving towards a box plus moving towards the owner.
All analyses were done with R software (version 3.6.3; R Core Team 2020); the code can be viewed in Online Resource 4. In line with Cumming’s propositions of “new statistics” (Cumming 2014, p. 7) and the Publication Manual of the American Psychological Association (APA 2010), raw estimates and effect sizes will be reported and discussed independent of, and in addition to, their significance status (α = 0.05) and with regard to their respective confidence intervals. Raw data can be found in Online Resource 2. Results of analyses adjusted for outliers are displayed in Online Resource 1.
Overall success, i.e., whether pairs chose correctly or not, was investigated with one-sample t tests against chance (25%), calculated separately for each phase since two different measures of performance were used in phases 1 and 2 (i.e., questionnaire versus opening box). Two-sided, paired t tests were calculated to assess differences in performance between phases and differences in frequencies of the different showing types between conditions.
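As a reminder of what the test against chance computes, the following Python sketch derives the one-sample t statistic and its degrees of freedom for a chance level of 25% (one target among four boxes). The success proportions are made up for demonstration, and the function name is hypothetical; the reported analyses were run in R.

```python
import math
import statistics

CHANCE_LEVEL = 0.25  # one target box among four possible hiding places


def one_sample_t(values, mu):
    """Return the t statistic and degrees of freedom for H0: mean == mu."""
    n = len(values)
    mean = statistics.mean(values)
    # Standard error based on the sample standard deviation (n - 1 denominator).
    standard_error = statistics.stdev(values) / math.sqrt(n)
    return (mean - mu) / standard_error, n - 1


# Made-up proportions of correct choices for four pairs.
success = [0.5, 0.25, 0.75, 0.5]
t_stat, df = one_sample_t(success, CHANCE_LEVEL)
```

The resulting statistic would then be compared against a t distribution with `df` degrees of freedom to obtain the p value.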
For all other effects, we applied a model comparison approach. Models were compared based on their respective Akaike information criterion (AIC; Akaike 1974) value. The model with the smallest AIC was chosen as the final model, and a Chi-square test was applied to test for significant differences between the models (results of all calculated models and comparisons can be found in Online Resource 1). Whenever the program returned a nonconvergence warning, the respective model was refitted using the BOBYQA optimization algorithm (Powell 2009). For each analysis, rows including missing values for a variable of interest were excluded. In accordance with the study design, session, trial and phase were always treated as one nested variable (i.e., phases were nested within trials, which were nested within sessions), which is henceforth referred to as time.
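The logic of this comparison can be sketched in Python. The AIC of a model with k estimated parameters and maximized log-likelihood ln L is 2k − 2 ln L. We assume here that the Chi-square test is a likelihood-ratio test between nested models, which is a common choice but is not stated explicitly above. The log-likelihoods, parameter counts and model names below are made up, and the p value helper only covers the case of models differing by one parameter.

```python
import math


def aic(log_lik, n_params):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_lik


def likelihood_ratio(log_lik_reduced, log_lik_full):
    """Likelihood-ratio statistic for two nested models."""
    return 2 * (log_lik_full - log_lik_reduced)


def chi2_p_value_df1(statistic):
    """Upper-tail p value of a Chi-square statistic with exactly 1 df."""
    return math.erfc(math.sqrt(statistic / 2))


# Hypothetical fits: name -> (log-likelihood, number of estimated parameters)
fits = {"null": (-105.0, 2), "condition": (-100.0, 3)}

aics = {name: aic(log_lik, k) for name, (log_lik, k) in fits.items()}
best = min(aics, key=aics.get)  # model with the smallest AIC

lr = likelihood_ratio(fits["null"][0], fits["condition"][0])
p = chi2_p_value_df1(lr)  # valid here because the models differ by one parameter
```

For models that differ by more than one parameter, the p value would have to come from a Chi-square distribution with the corresponding degrees of freedom.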
To investigate the effects on success, generalized linear mixed-effects models (GLMM) with a binomial distribution were calculated using the R package lme4 (version 1.1-19; Bates et al. 2018). Since the outcome variable was binary (i.e., correct vs. incorrect), a logit link was used, i.e., the models estimated the probability of pairs choosing correctly rather than incorrectly on the log-odds scale.
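For readers less familiar with the logit scale, the following Python sketch shows the transformation and its inverse. It is a generic illustration, not part of the reported analysis, and the function names are our own.

```python
import math


def logit(p):
    """Log-odds of a probability p with 0 < p < 1."""
    return math.log(p / (1 - p))


def inv_logit(x):
    """Probability corresponding to a value x on the log-odds scale."""
    return 1 / (1 + math.exp(-x))


# The chance level of 25% corresponds to odds of 1:3 on the logit scale.
chance_log_odds = logit(0.25)
```

Model coefficients on this scale can be mapped back to probabilities with `inv_logit`; a value of 0 corresponds to a probability of 0.5.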
For the investigation of effects on the proportion of correct showing and showing effort, linear mixed-effects models (LMM) were calculated. For this, we used the R package lme4 (version 1.1-19; Bates et al. 2018), and p values were calculated using the lmerTest package (version 3.0-1; Kuznetsova et al. 2017). Showing effort was defined as the frequency of high-effort showings relative to the sum of frequencies of high- and low-effort showings, i.e., the proportion of high-effort showing. Hence, higher values for this variable indicate higher showing effort.
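The showing effort measure amounts to a simple proportion, sketched here in Python. Returning `None` when neither high- nor low-effort showings occurred is our own assumption for handling the undefined case; the function name is hypothetical.

```python
def showing_effort(n_high, n_low):
    """Proportion of high-effort showings among high- and low-effort showings.

    Returns None when no high- or low-effort showing occurred, because the
    proportion is undefined in that case (assumed to be treated as missing).
    """
    total = n_high + n_low
    if total == 0:
        return None
    return n_high / total
```

For example, three high-effort and one low-effort showing give a showing effort of 0.75, whereas a trial with only gaze alternations gives 0.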