‘Who’s a good boy?!’ Dogs prefer naturalistic dog-directed speech

Benjamin, Alex; Slocombe, Katie

doi:10.1007/s10071-018-1172-4

Original Paper
Open access
Published: 02 March 2018

Animal Cognition, volume 21, pages 353–364 (2018)

Abstract

Infant-directed speech (IDS) is a special speech register thought to aid language acquisition and improve affiliation in human infants. Although IDS shares some of its properties with dog-directed speech (DDS), it is unclear whether the production of DDS is functional, or simply an overgeneralisation of IDS within Western cultures. One recent study found that, while puppies attended more to a script read with DDS compared with adult-directed speech (ADS), adult dogs displayed no preference. In contrast, using naturalistic speech and a more ecologically valid set-up, we found that adult dogs attended to and showed more affiliative behaviour towards a speaker of DDS than of ADS. To explore whether this preference for DDS was modulated by the dog-specific words typically used in DDS, the acoustic features (prosody) of DDS or a combination of the two, we conducted a second experiment. Here the stimuli from experiment 1 were produced with reversed prosody, meaning the prosody and content of ADS and DDS were mismatched. The results revealed no significant effect of speech type, or content, suggesting that it may be the combination of the acoustic properties and the dog-related content of DDS that modulates the preference shown for naturalistic DDS. Overall, the results of this study suggest that naturalistic DDS, comprising both dog-directed prosody and dog-relevant content words, improves dogs’ attention and may strengthen the affiliative bond between humans and their pets.

Introduction

When talking to an infant, adults use a special speech register characterised by elevated fundamental frequency (pitch), exaggerated intonation contours and high affect (Burnham et al. 2002). This phenomenon is evident across languages including English, Russian, Swedish and Japanese (Kuhl et al. 1997; Andruski et al. 1999). It is thought that infant-directed speech (IDS) facilitates infants’ linguistic development by amplifying the phonetic characteristics of native language vowels (Kuhl et al. 1997), allows infants to select appropriate social partners (Schachner and Hannon 2011) and increases social bonding between infant and caregiver (Kaplan et al. 1995).

In the same way that IDS is produced automatically when talking to infants, humans in Western cultures also produce a special speech register when talking to their pets. This pet-directed speech (PDS) shares some of the acoustic features of IDS including elevated pitch and exaggerated affect compared to adult-directed speech (ADS) (Burnham et al. 1998). It is possible that pitch is elevated in IDS and PDS in order to attract the listener’s attention, while affect is elevated to meet the listener’s emotional needs, possibly motivating affiliative interaction with the speaker. One crucial feature not shared between IDS and PDS and only found in IDS is the hyperarticulation of vowels (Burnham et al. 1998). Hyperarticulation of vowels may be the aspect of IDS that assists spoken language acquisition (Kuhl et al. 1997) and the speaker’s hyperarticulation may be mediated by the perceived linguistic capacity of the receiver. Evidence that supports this view is provided by a study that compared speech produced to dogs, parrots and infants. Speakers seem to hyperarticulate their vowels most with prelinguistic human infants, followed by parrots, with little evidence of this when addressing dogs, who in contrast to parrots have no ability to produce speech (Xu et al. 2013).

It is evident that speakers are sensitive to their audience in terms of acoustic preference, emotional needs and linguistic potential; however, in order to understand the function of special speech registers, it is crucial to understand how they affect the receiver. Human infants show a preference for IDS from a very early age (Kaplan et al. 1995), with Cooper and Aslin (1990) finding preferences for IDS over ADS in 2-day-old infants. Werker and McLeod (1989) measured affective responsiveness to ADS and IDS in 4–5- and 7–9-month-old infants. Two trained raters judged the affective responsiveness of infants, comprising how much they thought the infant was trying to interact with the speaker, how interested they appeared and the valence of the infant’s emotional state. They found that infants of both age groups showed greater affective responsiveness to IDS than to ADS. They also found that when presented with video recordings of infants listening to speech, unfamiliar observers rated the infants as more ‘appealing’ when the infants were listening to IDS than when they were listening to ADS. This indicates that the use of IDS may facilitate the development of an emotional bond between adults and infants. In contrast to IDS, there has been very little research into the effect of PDS on receivers, meaning that it is currently unclear whether PDS is a non-functional overgeneralisation of IDS in Western cultures where pets often have the status of infants or whether it functions to gain pets’ attention and strengthen the affiliative bond between humans and their pets.

Ben-Aderet et al. (2017) were the first to investigate both the production of dog-directed speech (DDS) and the behavioural response to DDS in puppies, adult dogs and older dogs. Acoustic analysis of DDS confirmed previous descriptions of the acoustic structure of this speech register, where DDS was higher in pitch, with more pitch variation over time, and higher harmonicity than ADS. They also showed that human adults produced DDS to dogs of all ages. Crucially, Ben-Aderet et al. (2017) then conducted playback experiments using the DDS and ADS recorded in the first part of the study to test dog responses to these types of speech. Stimuli consisted of repetitions of the phrase ‘Hi! Hello cutie! Who’s a good boy? Come here! Good Boy! Yes! Come here sweetie pie! What a good boy!’ in dog- and adult-directed prosody. Speech was played from a loudspeaker in the corner of the room, with no human near the source of the sound and various measures of dogs’ attention to and approach of the loudspeaker were combined into a composite behavioural response measure. They found that puppies showed a higher behavioural response to DDS than for ADS, but this preference decreased as a function of age. The authors conclude that puppies are highly reactive to DDS and that pitch is a key feature in modulating this preference, but that adult dogs do not react differentially to DDS and ADS. They argue that DDS may have a functional value in puppies, but not adult dogs, and therefore, the use of DDS with adult dogs may simply be a ‘spontaneous attempt to facilitate interactions with non-verbal listeners’ (Ben-Aderet et al. 2017, p. 1). It is, however, possible that alternative explanations of the null result with adult dogs exist. As Ben-Aderet et al. discuss, adult dogs may need additional cues (e.g. gestures) to respond to unfamiliar speakers. 
If DDS functions to facilitate social communication and interaction, it may only be relevant to attend to it when it comes from a human that can be attended to and socialised with. It is possible that if no human experimenter is present, adult dogs realise that there is no social benefit to reacting preferentially to any speech. Puppies, with little experience of the world, may not recognise this and therefore still responded to DDS in the absence of a feasible producer. While it is clear that puppies are more reactive to the prosody of DDS than adult dogs, further testing with a human speaker present during stimulus presentation is required in order to rigorously test whether adult dogs really are insensitive to DDS. We therefore aimed to test the possible function of DDS with adult dogs in a more ecologically valid setting where attention and affiliation towards the individuals who produced DDS could be directly measured. Dogs were presented with two experimenters with audio speakers on their laps that played naturalistic DDS or ADS (differing in both prosody and content), and we measured the dogs’ attention to each individual during speech and then proximity to the experimenters once dogs were given the opportunity to approach them after the speech finished. We predicted that if DDS is functional for adult dogs, in experiment 1 they should attend more to DDS than ADS, and when given the opportunity to approach the experimenters, they should choose to spend more time in proximity to the individual who produced DDS. We then ran a second experiment to investigate whether content or prosody was driving any preferences for naturalistic DDS. Here we presented content-mismatched stimuli (e.g. adult content with dog prosody and vice versa) and predicted that if the content of naturalistic DDS was driving preferences, dogs should attend to and spend more time near the individual producing dog-relevant content. 
If, on the other hand, the prosody of DDS was driving preferences, as was the case for the puppies studied by Ben-Aderet et al. (2017), dogs should attend to and spend more time near the individual producing dog-directed prosody. Finally, if preferences for naturalistic DDS are driven by both content and prosody, or result from the combination of dog-relevant content and DDS prosody, we expect to find no significant preference for either of the mismatched stimuli.

Experiment 1

As we were interested in naturalistic dog- and adult-directed speech, the stimuli used in this experiment varied in both content and prosody. The stimuli were ‘matched’ in prosody and content such that DDS consisted of dog-relevant content and dog-directed prosody, and ADS consisted of adult-relevant content and adult-directed prosody.

Methods

Study site and participants

Dogs were recruited from Redhouse Boarding Kennels, York, with permission from the kennel owner. In experiment 1, 37 dogs took part (17 females and 20 males; mean age 6 years ± 3.86) in this study between January and May 2014. See supplementary material for more detailed age, gender and breed information (Table S1). Where dogs have been removed from various parts of the analysis due to interruptions, equipment failures or safety reasons, the details and N for each analysis are given.

Stimuli

Stimuli were recorded as uncompressed WAV files using a Marantz PMD661 solid-state recorder from the two human female experimenters (aged 20–21). The recordings from experimenter A were always presented through experimenter A’s speaker (and the same for experimenter B). Although only presenting speech from the experimenters meant that multiple dogs heard the same recordings, it ensured that the stimuli were congruous with the physical characteristics of the experimenters (age, gender, height), thus maximising ecological validity and removing the possibility of looking time measures being affected by incongruity of the stimuli. DDS was chosen from a sample of recorded naturalistic interactions with a friendly dog (Irish setter). ADS was chosen from a sample of naturalistic adult–adult interactions that occurred between the experimenters (see supplementary material for transcripts).

Two different segments of DDS and ADS for each experimenter were selected from the continuous speech recordings (one 10-s segment and one 15-s segment). The amplitude of the speech in each segment was modified using Raven Pro (version 1.4), so that the mean RMS amplitude of each segment was equalised at approximately 3000. For each trial, the DDS track of one experimenter was paired with the ADS track of the other. Figure 1 illustrates the stimulus timeline.
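The amplitude equalisation step can be sketched in a few lines. This is an illustration only and not the Raven Pro procedure used in the study: the function names and sample values are hypothetical, and we assume the target of 3000 is expressed in the same units as the raw sample values.

```python
import math


def rms(samples):
    """Root-mean-square amplitude of a sequence of sample values."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def equalise_rms(samples, target_rms=3000.0):
    """Rescale samples by a single gain factor so their RMS equals target_rms.

    Assumption: target_rms uses the same units as the samples.
    Clipping to a fixed bit depth is deliberately not handled here.
    """
    current = rms(samples)
    if current == 0:
        raise ValueError("a silent segment cannot be rescaled")
    gain = target_rms / current
    return [s * gain for s in samples]


# Hypothetical sample values for a quiet and a loud segment.
quiet_segment = [100.0, -200.0, 150.0, -50.0]
loud_segment = [4000.0, -6000.0, 5000.0, -3000.0]

quiet_equalised = equalise_rms(quiet_segment)
loud_equalised = equalise_rms(loud_segment)
```

After rescaling, both segments have the same RMS amplitude, so a difference in loudness between the DDS and ADS tracks is less likely to explain a difference in looking time.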

Design

This experiment used a within-subject design, where all dogs heard both DDS and ADS. All dogs heard simultaneous speech first, followed by DDS only and ADS only. The order of the DDS-only and ADS-only segments was counterbalanced across trials. Simultaneous speech was played again at the end, to eliminate the possibility that dogs would approach the individual who spoke last. We also counterbalanced the identity of the DDS speaker (experimenter 1 or 2) and the location from which DDS was played (left/right) across trials.
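One way to picture the counterbalancing is to cross the three two-level factors (segment order, DDS speaker identity, DDS location) and cycle through the resulting eight conditions. The text does not describe how conditions were assigned to individual dogs, so the cyclic assignment and all names below are assumptions made for illustration.

```python
from itertools import product

ORDERS = ("DDS-first", "ADS-first")
DDS_SPEAKERS = ("experimenter 1", "experimenter 2")
DDS_SIDES = ("left", "right")


def counterbalanced_schedule(n_trials):
    """Cycle through all order x speaker x side combinations.

    Assumption: simple cyclic assignment; the study does not state its scheme.
    """
    conditions = list(product(ORDERS, DDS_SPEAKERS, DDS_SIDES))
    return [conditions[i % len(conditions)] for i in range(n_trials)]


def timeline(order):
    """Stimulus timeline: simultaneous speech, the two single-speaker
    segments in the given order, then simultaneous speech again."""
    if order == "DDS-first":
        middle = ["DDS only", "ADS only"]
    else:
        middle = ["ADS only", "DDS only"]
    return ["simultaneous", *middle, "simultaneous"]


schedule = counterbalanced_schedule(16)
```

With a number of trials that is a multiple of eight, every combination occurs equally often; with other totals (such as the sample sizes reported here) the cells can only be approximately balanced.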

Procedure

Equipment was set up as illustrated in Fig. 2. The speakers were equalised to 70 dB at 1 m away with white noise using a sound pressure meter, to ensure that speech broadcast from each speaker would be equal in volume. Experimenters 1 and 2 then left the room via door 2. The third experimenter (handler) retrieved the dog from its kennel and entered the experimental room through door 1. The dog was allowed to explore the experimental room for 1 min (to habituate to the environment in order to reduce distraction during the trial), before being put back on a lead and taken into a waiting room via door 3. Experimenters 1 and 2 entered through door 2 and sat in the chairs. The handler entered with the dog. Once the dog was in position, the stimulus was played.

For the duration of the stimulus, the experimenters sat still to ensure the dogs were not exposed to any body language cues. The experimenters did not attempt to move their mouths simulating the speech. Instead, the experimenters placed one hand covering their mouths so that the dog could not see their lips. They also maintained neutral expressions with eyes directed towards the dog to ensure the dog did not receive differential facial cues from the experimenters.

While the stimulus played, the dog was kept on a short lead to ensure it remained within camera visibility, while still allowing the dog to move around within 1 m of the handler. The handler did not interact with the dog and looked at the ground throughout. At the end of the stimulus phase, the lead was removed and the dog was allowed to explore freely for 1 min and approach experimenters 1 and 2 if it wished. The dog received no interaction from any experimenter.

Video coding

Video recordings of each session were analysed, and during the stimulus presentation, time spent looking towards DDS and ADS was recorded as measured by head direction. During the 1-min off-lead period following the stimulus presentation, time spent in proximity to DDS and ADS speakers was recorded, as measured by the position of the dog’s head in the 1.1 m² area surrounding the speaker (see Fig. 2).

The period after the dog entered the room, but before the stimulus began was used as a control period (mean duration 4.56 ± 2.14 s). Looking times during this phase were recorded in order to establish whether the dog displayed any preference for one experimenter in particular, or one location (left or right) that may have influenced looking times in the experiment.
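As a rough illustration of how coded video intervals can be turned into looking-time measures, the sketch below sums the duration of each coded look and expresses looking at one speaker as a proportion of looking at either speaker. The interval format, the function names and the exact definition of the proportion are assumptions; the text does not specify how proportions were computed.

```python
from collections import defaultdict


def looking_totals(intervals):
    """Sum looking time per target from (target, start_s, end_s) tuples."""
    totals = defaultdict(float)
    for target, start, end in intervals:
        if end < start:
            raise ValueError("interval ends before it starts")
        totals[target] += end - start
    return dict(totals)


def looking_proportion(totals, target, other):
    """Share of looking time directed at `target` out of time spent looking
    at either speaker (assumed definition). Returns None if the dog looked
    at neither speaker."""
    denominator = totals.get(target, 0.0) + totals.get(other, 0.0)
    if denominator == 0:
        return None
    return totals.get(target, 0.0) / denominator


# Hypothetical coded looks for one dog (times in seconds).
coded_looks = [("DDS", 0.0, 2.5), ("ADS", 2.5, 3.5), ("DDS", 4.0, 5.5)]
totals = looking_totals(coded_looks)
proportion_dds = looking_proportion(totals, "DDS", "ADS")
```

Under this definition a value of 0.5 indicates no preference, which is the pattern expected in the control period if dogs favour neither experimenter nor side.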

Interobserver reliability

The primary observer (AB) coded 100% of videos. For experiment 1, two trained observers each coded 30% of videos (N = 24/36 trials total) and measured looking time at each speaker in each section of the stimulus (control silence, simultaneous 1, DDS only, ADS only, simultaneous 2; N = 10 measurements) and time in proximity to each speaker in the minute post-stimulus presentation (N = 2 measurements). The primary coder had high agreement with the two secondary coders, and there was also high agreement between the two secondary coders across all measurements (Spearman’s R > 0.90, p < 0.001 for all comparisons), indicating the videos had been coded reliably.

A third observer, who was blind to the hypotheses of the experiment, also coded 22% of the videos (N = 8/36 trials total) with the sound turned off so that they were unaware which speech type was heard by the dog. There was high agreement with the primary coder for looking time (R = 0.86, p < 0.001) and for proximity preference (R = 0.96, p < 0.001).
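Agreement between two coders of this kind can be checked with Spearman's rank correlation, which is the Pearson correlation of the rank-transformed measurements. The sketch below uses only the standard library; the measurements are hypothetical and the helper names are ours, not part of the original analysis (which was run in SPSS).

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical looking times (s) coded by two observers for six trials.
coder_a = [3.2, 5.1, 0.8, 7.4, 2.0, 6.3]
coder_b = [3.0, 5.4, 1.1, 7.0, 2.2, 7.2]
rho = spearman(coder_a, coder_b)
```

Because the statistic depends only on rank order, two coders can differ by a constant offset in absolute timings and still show high agreement.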

Statistical analysis

All data were analysed using IBM SPSS (version 24) with the significance level set at p < .050. Attentive and affiliative preference was evaluated using mixed ANOVAs with the fixed within-subject factor speech prosody (DDS/ADS) and the between-subject factors DDS identity (experimenter 1/experimenter 2) and DDS location (right/left). A single mixed ANOVA was conducted on the proximity to speakers in the minute post-stimulus presentation. For looking time, after the ANOVA on the total looking time had been completed (Table 1), separate ANOVAs were then run for each section of the stimulus (simultaneous; ADS only; DDS only). We applied a more conservative Bonferroni-corrected alpha level to the separate section analyses (p = 0.01) to correct for family-wise error that might have arisen from running multiple tests on the same data set. Finally, we ran an ANOVA with between-subject factors DDS identity (experimenter 1/experimenter 2) and DDS location (right/left) on proportion of looking times in the control period. All assumptions of these parametric tests were tested and met.
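The Bonferroni rule divides the family-wise alpha by the number of tests. The sketch below is illustrative: the function names and p-values are hypothetical, and the text does not state how many tests the correction counted. Arithmetically, 0.05 divided by five tests gives 0.01, whereas three tests would give roughly 0.017, so a threshold of 0.01 is at least as strict as a three-test correction.

```python
def bonferroni_alpha(family_alpha, n_tests):
    """Per-test significance threshold under the Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return family_alpha / n_tests


def significant_after_correction(p_values, family_alpha=0.05, n_tests=None):
    """Flag which tests pass the corrected threshold.

    p_values maps a test label to its p-value. n_tests defaults to the
    number of p-values supplied (an assumption about the test family).
    """
    if n_tests is None:
        n_tests = len(p_values)
    threshold = bonferroni_alpha(family_alpha, n_tests)
    return {label: p < threshold for label, p in p_values.items()}


# Hypothetical p-values, not results from the study.
example_p = {"simultaneous": 0.004, "DDS only": 0.008, "ADS only": 0.2}
flags = significant_after_correction(example_p, n_tests=5)
```

A stricter per-test threshold lowers the chance of a false positive across the family of section analyses, at the cost of some statistical power.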

Table 1 Results of a between-subject ANOVA (df = 1, 29) on looking proportions in the control period and a mixed ANOVA (df = 1, 29) comparing main effects and interactions for looking times towards content-matched DDS and ADS

Results

Looking preference

For this analysis, four subjects were removed due to equipment failure (N = 33). During control silence, there was no significant main effect of Identity or Location, indicating that dogs did not display any preference for one particular experimenter or speaker location (Table 1). Dogs displayed a significant preference for DDS across the whole trial (Fig. 3; Table 1) and during each phase that contained DDS (Fig. 3; Table S3). Dogs tended to look more towards ADS when this was the only stimulus available; however, this preference was non-significant (Fig. 3). No significant interactions with speaker identity or location were found for total time (Table 1) or separate segments of the stimuli (simultaneous, DDS only, ADS only) (Supplementary Material: Table S3).

Proximity preference

For this analysis, three dogs were removed from the data set due to equipment failure or because the dog had to be kept on a lead, resulting in N = 34. A mixed ANOVA revealed that after hearing content-matched stimuli, dogs spent significantly more time in close proximity to the DDS speaker than the ADS speaker (F (1, 30) = 5.54, p = 0.025; Fig. 4). No significant interactions with location or speaker identity were found (Table 2).

Table 2 Results of a mixed ANOVA with degrees of freedom (1, 30) comparing the time spent near DDS and ADS speakers for content-matched speech

Discussion

This experiment showed that dogs display a behavioural preference for naturalistic DDS (matched in prosody and content) compared with ADS when presented in the presence of an associated human. Dogs, on average, spent more time looking towards a speaker of DDS compared with a speaker of ADS in all segments of the stimulus containing DDS and across the trial as a whole. We also found that when given the subsequent opportunity to interact with the speakers, dogs chose to spend more time in proximity with the DDS speaker than the ADS speaker. Although the absolute differences in looking and proximity time were small and therefore their functional relevance may be questioned, we feel the substantial effect sizes obtained and the convergence of results across our behavioural measures indicate we have detected functionally relevant differences in behaviour. Overall, our results support the hypothesis that dogs display attentive and affiliative preferences for naturalistic DDS over ADS.

The results from the control period show no significant preference for a specific location, or speaker identity, indicating that the dogs had no a priori preference for looking at one experimenter or location. In line with this, no significant main effects of location or speaker identity, or interactions of identity, location and speech type were found.

Although our results show a robust preference for naturalistic DDS over ADS, as the stimuli in this experiment differed in both content and prosody, it is not possible to determine whether this effect is driven by dog-directed prosody or content, as these factors did not vary independently. Therefore, although this experiment clearly shows that dogs discriminate between and show a behavioural preference for naturalistic DDS over ADS, further investigation is required to determine the extent to which prosody and content are driving this preference.

Experiment 2

Experiment 2 was designed in order to examine whether content alone or prosody alone was sufficient for driving the preference found in experiment 1. In experiment 2, the content from experiment 1 was reproduced but with reversed prosody such that the dog-related content was spoken with the prosody of ADS and vice versa. For simplicity, in all cases, DDS refers to stimuli with dog-directed prosody (with either dog- or adult-related content) and ADS refers to stimuli with adult-directed prosody (with either adult- or dog-related content). In experiment 2, we presented dogs with content-mismatched DDS (dog-directed prosody with adult-related content) and content-mismatched ADS (adult-directed prosody with dog-related content).