Journal of Autism and Developmental Disorders, Volume 42, Issue 4, pp 499–511

Acoustic and Perceptual Measurement of Expressive Prosody in High-Functioning Autism: Increased Pitch Range and What it Means to Listeners

Authors

  • A. Nadig
    • School of Communication Sciences and Disorders, McGill University
  • Holly Shaw
    • School of Communication Sciences and Disorders, McGill University
Original paper

DOI: 10.1007/s10803-011-1264-3

Cite this article as:
Nadig, A. & Shaw, H. J Autism Dev Disord (2012) 42: 499. doi:10.1007/s10803-011-1264-3

Abstract

Are there consistent markers of atypical prosody in speakers with high functioning autism (HFA) compared to typically-developing speakers? We examined: (1) acoustic measurements of pitch range, mean pitch and speech rate in conversation, (2) perceptual ratings of conversation for these features and overall prosody, and (3) acoustic measurements of speech from a structured task. Increased pitch range was found in speakers with HFA during both conversation and structured communication. In global ratings listeners rated speakers with HFA as having atypical prosody. Although the HFA group demonstrated increased acoustic pitch range, listeners did not rate speakers with HFA as having increased pitch variation. We suggest that the quality of pitch variation used by speakers with HFA was non-conventional and thus not registered as such by listeners.

Keywords

High-functioning autism; Expressive prosody; Acoustic measurements; Pitch variability; Perceptual judgments

Introduction

Prosody, the melody or supra-segmental aspects of speech, has a direct impact on social interaction and communication. By varying prosodic features such as pitch/fundamental frequency and speech rate, speakers can portray additional meaning about their emotional state and modulate their communicative intent. For instance, the sentence “I can’t wait until Friday” would be spoken with a higher pitch and faster rate if the speaker were excited rather than apprehensive and speaking ironically about upcoming events (Rockwell 2000). Disordered expressive prosody has long been considered a central feature of autism spectrum disorders (ASD) for verbal individuals; in fact, it appeared amongst the clinical features in the original accounts of ASD by Kanner (1943) and Asperger (1944). Currently, standard diagnostic tools of autism (Autism Diagnostic Interview, Revised, Rutter et al. 2003; Autism Diagnostic Observation Schedule, Lord et al. 1999) still include atypical expressive prosody as a feature of the disorder. Yet description of how prosody may differ in autism is incredibly broad, including deviations in rate, rhythm, volume, intonation, or lack of changes in register. This is likely due to the heterogeneity of presenting symptoms in ASD (Rice et al. 2005; Shriberg et al. 2001), but is also a consequence of the dearth of objective measurements of prosody in autism until very recently (see Diehl et al. 2009 for a review).

In addition to the heterogeneity of prosodic disturbances in ASD, it is unclear how universal some sort of prosodic atypicality is to verbal individuals with ASD. McCann et al. (2007) reported that all 31 of their participants with high functioning autism (HFA) demonstrated impairments in some area of prosodic functioning, as assessed by the Profiling Elements of Prosodic Systems-Children (PEPS-C, Peppé and McCann 2003), a clinician rating scale. In contrast, other studies have found considerable overlap in global prosodic features between HFA and comparison groups and report that only half of their sample with HFA displayed notable prosodic deficits (Simmons and Baltaxe 1975; Shriberg et al. 2001). Importantly, when these differences do exist they negatively impact others’ perceptions of the individual with autism. For instance, Van Bourgondien and Woods (1992) and Paul et al. (2005a) report that speech characteristics are primary contributors to others’ impressions of social oddness when interacting with high-functioning adults with autism. Similarly, Shriberg et al. (2001) note that small but perceptually noticeable prosodic characteristics can lead to unintended, negative impressions: high-pitched speech can give the impression of overbearing insistence, while very slow speech may give the impression that the speaker is condescending. Although many aspects of communication improve over time in individuals with HFA, residual prosodic difficulties often remain, yet they are not usually targeted in therapy (McCann et al. 2007; Paul et al. 2005a).

Consistent prosodic characteristics of conversational speech that contribute to perceptions of oddness, and their acoustic correlates, have yet to be identified. If they exist, they would have clear clinical significance in providing specific markers to focus speech assessment and intervention efforts. In particular we focus on atypicalities in intonation or pitch variation in school-age children with HFA. This feature has been described in different and contradictory ways in individuals with HFA, ranging from monotonous to exaggerated intonation (Schreibman et al. 1986; Van Lancker et al. 1989). This suggests that there may not be consistent prosodic patterns across individuals with HFA and that instead there may be a range of idiosyncratic differences. However, studies to date have repeatedly reported increased pitch range (e.g., difference between maximum and minimum pitch or fundamental frequency (F0) for a stretch of speech) or pitch variation (e.g., standard deviation in pitch during a segment of speech) at the group level in HFA that warrants closer investigation.

While early clinical reports described the speech of children with autism as monotone and/or mechanical (e.g. Kanner 1943), studies using perceptual ratings have reported the opposite atypicality: increased pitch variation in the speech of individuals with ASD. For instance, Simmons and Baltaxe (1975) reported that four of the seven adolescents with autism whom they assessed had excessive pitch variation, according to perceptual judgments. Acoustic measurements would complement and help to clarify the prosodic differences perceived by listeners. In preliminary data from Fosnot and Jun (1999), four children with autism (level of functioning not reported) demonstrated a wider pitch range and greater pitch variation than children who stuttered or age-matched typical controls when reading sentences or imitating sentences produced by others. Edelson et al. (2007) completed an acoustic analysis of the speech of 8- to 19-year-olds with either HFA or typical development, elicited in a task where they retold an emotional story. The HFA group demonstrated significantly higher pitch and a larger pitch range, as well as repeated use of simple pitch accents relative to the more varied and complex pitch accents observed in the comparison group. These preliminary findings suggest not only an objective increase in pitch range, but also a difference in the manner in which pitch variation is employed.

Similarly, in two studies with different samples Diehl et al. (2009) compared the speech of children and adolescents with HFA with a typically-developing comparison group matched on age, IQ, and language level in a task where narratives were produced based on a cartoon. The first study included 10- to 18-year-olds with HFA or typical development, the second involved 6- to 14-year-olds. Narratives were analyzed acoustically; pitch variation was measured by sampling the mean pitch in 250 ms time slices across the entire narrative, and then calculating each individual’s standard deviation in pitch across samples. In both studies, the HFA group demonstrated a significantly higher standard deviation in pitch than the typically developing group. These authors raised the question of whether their acoustic findings would be similar to human perceptual judgements, a question we address in the present study.
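The slice-based measure of pitch variation described above can be illustrated with a short sketch. This is not the analysis code of Diehl et al. (2009); the function name, the input format (frame times paired with F0 values, with None marking unvoiced frames) and the decision to skip unvoiced frames are all assumptions made for illustration.

```python
from statistics import mean, stdev


def slice_pitch_sd(times_s, f0_hz, slice_s=0.250):
    """Standard deviation of per-slice mean pitch (hypothetical helper).

    times_s: frame times in seconds.
    f0_hz:   F0 estimate per frame in Hz, or None for unvoiced frames.
    slice_s: slice length in seconds (250 ms by default).
    """
    slices = {}
    for t, f0 in zip(times_s, f0_hz):
        if f0 is None:
            continue  # assumption: unvoiced frames carry no pitch information
        index = int(t // slice_s)  # which time slice this frame falls into
        slices.setdefault(index, []).append(f0)

    # One mean pitch value per slice, in temporal order.
    slice_means = [mean(values) for _, values in sorted(slices.items())]
    if len(slice_means) < 2:
        raise ValueError("need at least two voiced slices to compute an SD")
    return stdev(slice_means)  # sample standard deviation across slices
```

A larger return value indicates more slice-to-slice movement in pitch over the whole narrative, which is the sense in which "pitch variation" is used in that study.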

Very recently the acoustic analysis of autistic prosody has been conducted in languages other than English, where increases in pitch variability have been documented as well. Sharda et al. (2010) analyzed the speech of 4- to 10-year-old Hindi-English bilinguals, elicited in a picture description task. They found that a group of children with ASD demonstrated increased pitch range and pitch relative to age-matched controls. Focusing on more detailed phonetic analysis, Green and Tobin (2009) carried out ToBI intonation transcription and acoustic analyses to characterise intonation patterns of Hebrew-speaking school-age children with HFA and typically-developing controls who were matched on age and mean length of utterance. Speech was elicited by reading aloud and asking participants questions about themselves; data from these two situations was then combined rather than analyzed separately. The authors report an increased absolute pitch range (difference between maximum and minimum pitch) in the HFA group, which they found to include three different subgroups of speakers: those with narrow, wide, or typical pitch ranges, reflecting individual differences within the HFA group. In addition they found greater and repetitive use of high pitch accents in the HFA group, which they describe as creating fewer pitch transitions and a monotonous accent, whereas the TYP group exhibited a greater diversity of accent types leading to more “flexible sounding” prosody (p. 314). These findings corroborate Edelson et al. (2007)’s report of constrained and repetitive prosodic patterns employed by speakers with HFA.

Finally, Bonneh et al. (2011) tested a large sample of young Hebrew speakers (aged 4;0 to 6;5 years, 41 with ASD, 42 controls) who were matched on age but not language ability. In order to elicit a speech sample without the demands of social interaction they were asked to name 36 pictures, and did so in the span of approximately 60 s. Analyses conducted over 10 ms time slices revealed larger pitch variability in the ASD group, evidenced by both a larger pitch range and larger standard deviation in pitch. The authors describe this difference as “primarily derived from short periods of continuous changes (upward or downward sweeps) rather than random values” (Bonneh et al. 2011, p. 4). The control participants converged around a mean pitch between 200 and 300 Hz, whereas the ASD group was much more variable with respect to mean pitch. These findings document increased pitch variation in autism even during single word naming, where there is no need to signify communicative function at the utterance level.

Specific markers of disordered prosody in HFA need to be defined in order to create assessment tools and intervention protocols that effectively target prosodic differences. Given the findings reviewed above, increased pitch variability appears to be a common prosodic feature in HFA, across several samples and different languages. Prior findings indicate increased variability of pitch across brief segments of speech (using measures such as standard deviation of pitch) or increased pitch range (difference between maximum and minimum pitch) within speech samples in speakers with HFA. However the basis for this increased pitch variability has yet to be explored in detail. At first glance, especially in emotional narrative tasks, one would expect increased pitch range to reflect increased emotional modulation of speech, with the communicative function of telling a story dramatically. Interpreted as such, this finding runs counter to the stereotype of robotic or monotone speech in autism. However another possibility is that pitch variation is increased, as confirmed by acoustic measures, but in an atypical manner that is not effective in communicating emotional information to listeners. For instance, individuals with HFA may employ more extreme pitch variation but this may be placed arbitrarily in a phrase, rendering it non-meaningful to listeners. Moreover, marked pitch accents may reoccur due to repetitive speech patterns rather than due to the expressive modulation of speech, which should result in a diverse set of pitch accents. Indeed, the reports of Edelson et al. (2007) and Green and Tobin (2009) suggest the more repetitive use of a limited range of prosodic contours and pitch accents in speakers with HFA.

The current set of studies had three objectives. The first was to add to the nascent literature reporting objective, acoustic analysis of expressive prosody in HFA and a comparison group that is well-matched for language ability and age (c.f. Diehl et al. 2009). Such efforts have been called for by many researchers and will help establish if, and which, features of atypical prosody are consistent clinical markers of HFA. This was done by measuring the global prosodic characteristics of pitch range, mean pitch and speech rate. In particular, we focused on the difference of increased pitch range reported in studies to date, and investigated whether this would be observed in settings other than narrative production. Mean pitch is examined primarily as a baseline against which to judge differences in pitch range, as speakers (e.g. young children) who have higher pitch demonstrate increased variation in pitch as well (Whiteside and Hodgson 2000). The second aim of the study, which applies to the assessment of prosody more broadly, was to examine how perceptual ratings relate to the acoustic measurements of prosodic features. Will raters who are blind to speaker diagnosis be able to pick up on the acoustic features that differ between groups? Does listener perception provide information that does not have direct acoustic correlates? In addition, perceptual ratings provide a way to gauge whether acoustic differences are meaningful to listeners. Finally, the third aim of the study was to sample these prosodic features across two complementary communicative settings: face-to-face conversation, which provides natural speech with high ecological validity, and a structured communication task where utterance content was relatively controlled, providing a more stringent test of prosodic differences that are not due to discrepancies in the content of speech.

Experiment 1. Acoustic Analysis of Conversational Speech

Method

Participants

Fifteen children with HFA and 13 typically developing children (TYP), aged 8–14 years participated in a face-to-face conversation with an adult research assistant in a comfortable lab setting. Participants were recruited from the Sacramento, California area. All participants were from monolingual English households and had language abilities in the normal range or above. The groups were not significantly different in terms of age, gender, language level, as assessed by a comprehensive test of language ability, the Clinical Evaluation of Language Fundamentals, Fourth Edition (CELF-IV, Semel et al. 2003), or Performance IQ as assessed by the Wechsler Abbreviated Scales of Intelligence (WASI; Wechsler 1999). Participant characteristics are shown in Table 1a. Autism diagnosis was confirmed within the study by direct observation via the Autism Diagnostic Observation Schedule Module 3 (ADOS-3, Lord et al. 1999) and by parent report via the Social Communication Questionnaire (SCQ; Rutter et al. 2003, scores of 15 or higher are consistent with an autism spectrum disorder). All children in the HFA group met full DSM-IV criteria for Autistic Disorder (American Psychiatric Association, Diagnostic and statistical manual of mental disorders, 4th edition, 1994). Participants in the TYP group were also screened for autism symptoms using the SCQ; all fell in the non-autism spectrum range (scores below 15). All standardized assessments were administered by the first author who held a Ph.D. and was trained to research reliability on the ADOS. Diagnostic confirmation was done under the supervision of a licensed clinical psychologist.
Table 1 (A) Sample characteristics for conversation task in Experiment 1. (B) Sample characteristics for structured communication task in Experiment 3

(A)

| Measure | HFA (n = 15): Mean (SD) | HFA: Range | Typically developing (n = 13): Mean (SD) | Typically developing: Range | p value |
|---|---|---|---|---|---|
| CA | 11;0 years (19 months) | 8;5–14;5 | 11;0 years (24 months) | 8;5–14;0 | >.05 |
| Language level (CELF-IV) | 109 (13) | 92–134 | 115 (10) | 95–126 | >.05 |
| PIQ (WASI) | 105 (15) | 81–126 | 111 (14) | 88–135 | >.05 |
| SCQ | 26 (6) | 16–34 | 2 (3) | 0–7 | <.05 |
| Gender | 13 male, 2 female | | 11 male, 2 female | | >.05 |
| ADOS algorithm score | 13 (3) | 7–20 | | | n/a |
| ADOS total score (sum of all items) | 23 (6) | 13–37 | | | n/a |

(B)

| Measure | HFA (n = 15): Mean (SD) | HFA: Range | Typically developing (n = 11): Mean (SD) | Typically developing: Range | p value |
|---|---|---|---|---|---|
| CA | 10;6 years (17 months) | 8;6–14;1 | 10;8 years (23 months) | 8;6–14;0 | >.05 |
| Language level (CELF-IV) | 108 (16) | 81–134 | 117 (13) | 88–129 | >.05 |
| PIQ (WASI) | 111 (17) | 81–133 | 116 (13) | 97–135 | >.05 |
| SCQ | 26 (6) | 18–36 | 2 (2) | 0–7 | <.05 |
| Gender | 12 male, 3 female | | 9 male, 2 female | | >.05 |
| ADOS algorithm score | 15 (3) | 7–20 | | | n/a |
| ADOS total score (sum of all items) | 25 (7) | 13–37 | | | n/a |


Procedures

During the face-to-face conversation participants were asked about their siblings, pets, special interests, or hobbies. Aside from questions to initiate the topic, the conversational partner was instructed to respond to the participant naturally and not to continue to prompt or dominate the conversation. Conversation audio was obtained via a Crown PZM-20R Boundary Microphone flush mounted to the ceiling of the testing room, approximately 5 feet above where participants were seated. Given this, the sound quality was not always ideal and sometimes contained environmental noise. Audio of the longest uninterrupted segment of each child’s speech was extracted from the video using Final Cut Pro software. Due to the give and take inherent in natural conversation, clean samples of the child’s uninterrupted speech were often brief. To ensure that audio clips from each speaker were approximately the same length, each participant’s longest segment of uninterrupted speech was truncated to 10–13 s in duration, from the start of the initial utterance through completion of the last clause that ended within a 13 s window. Thus the speech samples examined in this experiment were very brief, generally two to three utterance excerpts of conversational speech. There was no significant difference in clip duration (HFA M = 11.15 s, TYP M = 10.99 s, p = .64), or number of syllables spoken (HFA M = 31.80, TYP M = 29.46, p = .50) between the two groups. Audio clips were analyzed using PRAAT software (Boersma and Weenink 2008) to automatically extract mean pitch, maximum and minimum pitch, and duration of the audio clip. Since our sample consisted of children, both boys and girls, the default pitch analysis range settings in PRAAT were modified from the standard minimum of 75 Hz (recommended for adult males) to a minimum of 130 Hz to eliminate low pitch track errors. Each audio file was then individually examined for pitch track errors according to the following procedure. Files which had unexpectedly low pitch tracks or were missing pitch contours were flagged for manual inspection and editing. If an unusually low pitch track was found to be accurate (i.e., to reflect a very low pitch in the speech stream) it was retained. However, if it was found to have been registered during a non-speech period (e.g., due to noise), a small portion of the audio file (spanning a few milliseconds) was deleted to eliminate the inaccurate pitch track portion, and pitch measurements were re-calculated. Pitch range was calculated as the difference between maximum and minimum pitch over the short conversation sample. To calculate speech rate each syllable was counted, excluding repeated words and interjections, following Sturm and Seery’s (2007) speech rate calculation methodology. The number of syllables was divided by the duration of the audio clip in seconds and multiplied by 60 to obtain a measure in syllables per minute (spm) to allow for comparison with previous research.
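The two derived measures described above, pitch range and speech rate, amount to simple arithmetic on values extracted per clip. The following is a minimal sketch of that arithmetic only; the function names and the representation of a pitch track as a list of Hz values (None for unvoiced frames) are assumptions, and the extraction and manual correction of the pitch track in PRAAT are not reproduced here.

```python
def pitch_range_hz(f0_hz):
    """Pitch range of one clip: maximum minus minimum F0, in Hz.

    f0_hz: F0 values in Hz for the clip; None marks unvoiced frames
    (an assumed representation of the extracted pitch track).
    """
    voiced = [f0 for f0 in f0_hz if f0 is not None]
    if not voiced:
        raise ValueError("clip contains no voiced frames")
    return max(voiced) - min(voiced)


def speech_rate_spm(n_syllables, duration_s):
    """Speech rate in syllables per minute (spm).

    n_syllables: syllable count for the clip, already excluding
    repeated words and interjections.
    duration_s:  clip duration in seconds.
    """
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    return n_syllables / duration_s * 60.0
```

For a hypothetical clip with 30 counted syllables lasting 12 s, speech_rate_spm(30, 12.0) gives 150 spm.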

Results

An alpha level of .05 was used for all statistical tests. Where variables passed both tests of normality and equal variance an independent samples t-test is reported. Otherwise, the non-parametric Mann–Whitney U test was used to test for group differences. Effect size is reported with Pearson’s correlation coefficient r, which can be calculated for both parametric and non-parametric contrasts. Values of .1 are considered small effects, .3 medium effects, and .5 large effects (Cohen 1992). For ease of comparison across studies the statistical details of primary analyses are reported in Table 2.
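As an illustration of how an effect size r can be obtained for both kinds of test, the sketch below uses two widely used conversions: r = sqrt(t² / (t² + df)) for a t-test, and r = |z| / sqrt(N) for a Mann–Whitney U test, with z taken from the normal approximation to U without a tie correction. The text does not state which formulas were actually used, so these conversions and the function names are assumptions.

```python
from math import sqrt


def r_from_t(t, df):
    """Effect size r from a t statistic and its degrees of freedom."""
    return sqrt(t * t / (t * t + df))


def r_from_u(u, n1, n2):
    """Effect size r from a Mann-Whitney U statistic.

    Uses the normal approximation to U (no tie correction, an
    assumption) and then r = |z| / sqrt(N), with N = n1 + n2.
    """
    mean_u = n1 * n2 / 2.0
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean_u) / sd_u
    return abs(z) / sqrt(n1 + n2)
```

Under these assumptions the functions reproduce, to two decimals, the Experiment 1 effect sizes in Table 2: r = .67 for U = 20 with group sizes of 15 and 13, and r = .30 for t(26) = 1.62.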
Table 2 Results of primary analyses for Experiments 1–3

| | Experiment 1: Conversation sample acoustic measures | Experiment 2: Conversation sample perceptual ratings | Experiment 3: Structured task acoustic measures |
|---|---|---|---|
| Number of participants | HFA 15; TYP 13 | HFA 15; TYP 13 | HFA 15; TYP 11 |
| Pitch range | HFA Mdn = 200 Hz; TYP Mdn = 124 Hz | HFA Mdn = 4.00; TYP Mdn = 3.81 (scale 1–7) | HFA M = 156 Hz; TYP M = 122 Hz |
| Statistical test value | U = 20 | U = 78.5 | t(24) = 2.13 |
| p value | < .001** | .38 | .04* |
| r (effect size) | .67 | .17 | .40 |
| Mean pitch | HFA M = 225 Hz; TYP M = 214 Hz | HFA M = 3.97; TYP M = 3.85 (scale 1–7) | HFA M = 247 Hz; TYP M = 236 Hz |
| Statistical test value | t(26) = 1.62 | t(26) = −.12 | t(24) = 1.25 |
| p value | .12 | .63 | .22 |
| r (effect size) | .30 | .09 | .25 |
| Speech rate | HFA M = 172 spm; TYP M = 148 spm | HFA M = 4.15; TYP M = 3.77 (scale 1–7) | HFA M = 207 spm; TYP M = 204 spm |
| Statistical test value | t(26) = 1.3 | t(26) = −.39 | t(24) = .15 |
| p value | .20 | .25 | .88 |
| r (effect size) | .25 | .22 | .03 |
| Overall impression | | HFA M = 2.76; TYP M = 3.23 (scale 1–4) | |
| Statistical test value | | t(26) = .47 | |
| p value | | .01* | |
| r (effect size) | | .48 | |

* Significant difference at p ≤ .05

** Significant difference at p ≤ .001


Pitch range was non-normally distributed, so a non-parametric test was used. Pitch range was significantly higher in the HFA group (Mdn = 200 Hz) than the typical group (Mdn = 124 Hz), representing a large effect size. We conducted correlations to explore whether individual differences in pitch range within the HFA or TYP groups were related to participant characteristics. None of the relationships examined were significant for either group: Performance IQ (WASI, HFA r = −.05, p = .87; TYP r = −.21, p = .49), language level (CELF-IV, HFA r = −.37, p = .18; TYP r = −.04, p = .89), and for the HFA group autism severity scores from the ADOS (sum of scores over all items, r = −.40, p = .14).

Mean pitch was normally distributed with similar variances between groups, therefore an independent samples t test was computed. The HFA group (M = 225 Hz) was not significantly different than the typical group (M = 214 Hz) with respect to mean pitch. Similarly, data on speech rate was entered in a t test. The groups were not reliably different with respect to speech rate (HFA M = 172 spm, TYP M = 148 spm).

Discussion

Consistent with previous studies using narratives, reading, picture description, or imitation to elicit language samples (Diehl et al. 2009; Edelson et al. 2007; Fosnot and Jun 1999; Sharda et al. 2010), we found that, when compared with typically-developing peers, children with HFA, as a group, employed greater pitch range during conversation. Individuals with autism have been reported to display atypicalities in intonation, ranging from monotonous to highly variable (“sing song”) intonation (McCann and Peppé 2003). However, acoustic measurements from four samples of children with high-functioning autism, including the present study, provide evidence for increased pitch range or pitch variation in autism relative to matched comparison groups, rather than flat intonation. We examined whether individual differences in pitch range correlated with participant characteristics such as language level, Performance IQ, and autism severity scores from the ADOS but did not find any significant relationships.

With respect to mean pitch in conversational speech, we found a non-significant trend for the HFA group to produce a higher mean pitch than the typical group. Findings to date from narrative retelling and picture description tasks vary, with Diehl et al. (2009) reporting a non-significant difference in mean pitch production between both younger and older groups of children and adolescents with HFA and typically developing comparison groups, whereas Edelson et al. (2007) and Sharda et al. (2010) found their HFA group to produce a significantly higher mean pitch than their typical group.

Finally, when examining speech rate we found no significant difference between groups. The average conversational speech rate for 11-year-olds, the mean age of participants in the present study, for familiar topics is M = 162 syllables per minute (spm), range = 132–193 spm (Sturm and Seery 2007). Hence, both groups (HFA M = 172 spm, TYP M = 148 spm) fell within the expected range for speech rate in children of this age, supporting a lack of significant difference between groups.

In the present study we analyzed very brief conversation samples that were 11 s long on average. However, the validity and generalizability of our findings are bolstered by the fact that they are consistent with reports in the literature obtained from much longer language samples (e.g. increased pitch variation in narratives that were several minutes in duration from Diehl et al. (2009), and with normative measures of speech rate for children of the same age from Sturm and Seery (2007)). Now we turn to whether the significant acoustic difference found in pitch range is detectable by listeners at the perceptual level.

Experiment 2. Perceptual Ratings of Conversational Speech

Method

In this experiment the conversation samples analyzed in Experiment 1 (from 15 HFA and 13 TYP participants) were rated by 32 Applied Masters students from McGill University’s School of Communication Sciences and Disorders. We selected speech-language pathology students as raters since they would have a basic understanding of speech concepts (e.g., “pitch”) and because we were interested in the potential application of perceptual ratings for the clinical assessment of prosody. Raters were blind to group membership and ratings were obtained with the perceptual rating scale provided in Appendix A.

Procedure

Audio of the conversational speech samples was presented via a PowerPoint presentation in a classroom. The presentation began with a short tutorial with examples of high versus low pitch, flat versus variable changes in pitch, and slow versus fast speech rate to provide guidelines for the raters, who had varying levels of familiarity with speech science. Raters were shown each child’s age and gender on a written slide while audio of that child’s conversational speech was played. They were instructed to use their first impression, relative to the child’s age and gender, to rate each conversation sample for the features of pitch, pitch changes, and speech rate using seven point scales (where 4 was normal, 1 was low or slow, and 7 was high or fast). They also rated their overall impression using a four point scale where 4 was normal and 1 was atypical. Two practice trials were presented, using stimuli from children who were not included in the study, to familiarize raters with the procedure. Conversation samples from 28 speakers with and without HFA were presented in a fixed random order. Raters listened to each brief conversation sample once and were then given approximately 35 s to complete the perceptual rating scale for that child. Raters were allowed to leave individual features unrated if they were unable to rate them for any reason. This occurred rarely, 10 times in the entire data set. Importantly, this happened as often (5 instances each) for HFA and TYP speakers. If a rater left an individual feature blank, the group mean for that speaker was calculated over one fewer rating.
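The handling of unrated features described above, where a speaker's mean is taken over only the ratings that were actually given, can be sketched as follows. The function name and the use of None to mark a blank rating are assumptions for illustration.

```python
def speaker_mean_rating(ratings):
    """Mean rating for one speaker on one feature.

    ratings: one entry per rater; None marks a feature the rater
    left blank (an assumed encoding). Blanks are dropped, so the
    mean is computed over fewer ratings rather than treated as zero.
    """
    given = [r for r in ratings if r is not None]
    if not given:
        raise ValueError("no ratings were given for this speaker")
    return sum(given) / len(given)
```

With hypothetical ratings of 4, 5, blank and 3 from four raters, the mean is taken over the three ratings given and equals 4.0.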

Results

Statistics are once again reported in Table 2. Raters distinguished the HFA and typically-developing groups in terms of overall impression: children with HFA (M = 2.76) were rated as having prosody significantly more atypical than the TYP group (M = 3.23) on a scale where one was “atypical” and four was “normal.” With respect to individual performance, nine of 15 speakers with HFA were rated lower than the range of the TYP group, while the remaining six were rated as falling within the range of the TYP group.

There were no other significant differences in perceptual ratings between the HFA and TYP groups. The HFA group received similar median ratings for pitch variation (Mdn = 4.00) when compared with the TYP group (Mdn = 3.81), where four indicated a normal amount of changes in pitch based on the child’s age and gender on a 7-point rating scale. However, there was a larger spread in ratings of pitch variation for the HFA group (range of mean scores: 2.63–5.13) than in the TYP group where scores stayed closer to the “normal” midpoint (range of mean scores: 3.22–4.38) as seen in Fig. 1. The HFA group received similar ratings for mean pitch (M = 3.97) as the TYP group (M = 3.85), where four reflected “normal” pitch. Finally the HFA group received similar ratings of speech rate (M = 4.15) to the typical group (M = 3.77) where four indicated an average speech rate based on the child’s age and gender.
https://static-content.springer.com/image/art%3A10.1007%2Fs10803-011-1264-3/MediaObjects/10803_2011_1264_Fig1_HTML.gif
Fig. 1 Ratings of pitch changes/variation

Relationship Between Acoustic Measurements and Perceptual Ratings of Conversational Speech

The second aim of this study was to examine how perceptual ratings relate to the acoustic measurement of the prosodic features we focused on: pitch range, mean pitch, and speech rate. This comparison is important for multiple reasons. A strong relationship between perceptual judgments and acoustic measurement would indicate that a prosodic feature could reliably be assessed clinically without special equipment or analysis. In addition, the relationship between perceptual ratings and acoustic measurement can provide complementary information on the significance of acoustic changes for the listener. To explore these questions we calculated correlations between the acoustic measurements of Experiment 1 and the perceptual ratings of Experiment 2 for the three features of mean pitch, pitch variation, and speech rate. We examined these correlations separately for the HFA and TYP groups, since relationships may differ within each group.

Acoustically, variation in pitch was measured as pitch range, or the difference between maximum and minimum pitch during the brief conversation sample. Perceptual rating of variation in pitch, or “changes in pitch,” was collected on a seven point scale where one reflected flat or monotone intonation and seven indicated speech that was too variable or sing-song-like. For the HFA group, the two variables were not significantly correlated, λ = .10, p = .59, nor were they for the TYP group, λ = .25, p = .24. However, as reflected in Fig. 2, there was a different pattern between acoustic and perceptual measurements for the two groups of participants that we return to in the discussion. For mean pitch there was a modest relationship between acoustic measurements and perceptual ratings for both groups of participants. This correlation was significant for the HFA group, r = .53, p < .05, but not for the TYP group, r = .32, p = .28. For speech rate there was a strong significant relationship in both groups between the measurement of rate, calculated as syllables per minute, and perceptual rating of speech rate, HFA r = .65, p < .01, TYP r = .87, p < .001.
https://static-content.springer.com/image/art%3A10.1007%2Fs10803-011-1264-3/MediaObjects/10803_2011_1264_Fig2_HTML.gif
Fig. 2 Correlations between acoustic pitch range and perceptual ratings of pitch variation

Finally we examined whether perceptual ratings of overall impression of prosody were related to any of our acoustic measures, e.g., were raters relying on acoustic differences in pitch range, mean pitch, or speech rate to come up with their overall impression? We found no significant relationships to this effect. The correlation between overall impression ratings and pitch range was r = −.19, p = .33, between overall impression and pitch was r = −.27, p = .16, and between overall impression and speech rate was r = −.05, p = .78.

Discussion

Raters who were blind to group membership judged the speakers with HFA to be significantly less “normal” (or more atypical) in overall prosody relative to the TYP speakers, demonstrating that raters perceived a different quality in the melody of their speech. A potential confound, since we used natural rather than low-pass filtered speech, is that raters’ global ratings of prosody were tied to atypical content rather than to prosody per se. We believe this not to be the case given the brief nature of the samples (generally 2–3 utterances) and similarity in content across the groups: the number of syllables spoken, duration, and speech rate did not differ between groups. Furthermore, conversation was on relatively constrained topics: siblings, pets, or hobbies/circumscribed interests, and at debriefing raters did not mention perceiving two different groups of speakers. Nevertheless, we control for this possibility in Experiment 3. We did not find any of the acoustic measurements of pitch range, pitch, or speech rate to be significantly related to perceptual ratings of overall prosodic impression.

However raters did not distinguish between groups along any of the individual characteristics rated: changes in pitch, mean pitch, or speech rate. The latter two of these findings would be expected since the acoustic measurements from Experiment 1 showed that the groups did not differ significantly from each other with respect to mean pitch or speech rate. However, they did differ with respect to acoustically measured variation in pitch. Were the raters impervious to the increased acoustic pitch range observed in speakers with HFA? Although the group difference was not seen directly, as reliably elevated ratings indicating more variable pitch, it appears that this group difference may have been captured in the extreme spread of ratings of changes in pitch for speakers with HFA. Although no speaker with HFA had a lower acoustic pitch range than the TYP group, a few were rated as having more monotone speech, suggesting that their modulation of pitch was hard to interpret and did not convey increased emotionality or expressiveness that raters would conventionally register as more variable pitch.

The relationship between acoustic and perceptual pitch range measures across groups, shown in Fig. 2, also contributes to this interpretation. Though significant correlations were not found between acoustic and perceptual measures of pitch range for either group, clearly different patterns emerge for each group’s data in the scatterplot. Whereas the regression line is almost flat for the HFA group, indicating no association between acoustic pitch range and raters’ perception of it, a more linear relationship was found for the TYP group, such that perceived changes in pitch tended to increase alongside greater acoustic pitch differences. Thus, it seems that typically-developing speakers’ pitch modulation was perceived and rated as such, while the increased variation in pitch used by speakers with HFA was not communicative in the same way: it was rated as more different than normal, but in opposing directions (both decreased and increased variation, as observed in Figs. 1 and 2). We propose that this may be why previous clinical descriptions of intonation in autism have included both monotone (minimal pitch variation) and sing-song (excessive pitch variation) descriptions, though all studies employing acoustic measurements to date have found increased pitch variation in samples of individuals with high functioning autism. This proposal is preliminary but calls for further investigation in larger samples of speakers and over longer stretches of conversational speech. In addition, future work should include acoustic measurements of intensity and rhythm, which may inadvertently contribute to listeners’ perceptual ratings of pitch.
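The contrast between a nearly flat regression line and a rising one can be made concrete with an ordinary least-squares fit per group. The sketch below is illustrative only: the function name and the two small data sets are invented to mimic the qualitative pattern described (widely spread ratings with no trend versus ratings that rise with acoustic pitch range) and are not taken from Fig. 2.

```python
def ols_line(x, y):
    """Least-squares slope and intercept for predicting y from x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("slope is undefined when x is constant")
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented data (NOT from Fig. 2): acoustic pitch range in Hz vs. 1-7 rating.
# Group A: ratings are spread widely but unrelated to pitch range.
flat_slope, flat_intercept = ols_line([100.0, 150.0, 200.0, 250.0], [6, 2, 2, 6])
# Group B: ratings rise steadily with acoustic pitch range.
rising_slope, rising_intercept = ols_line([80.0, 100.0, 120.0, 140.0], [3.0, 3.5, 4.0, 4.5])
```

Group A shows that ratings can be spread across the scale while the fitted slope stays at zero, which is how extreme ratings in both directions can coexist with a flat line.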

In contrast to the findings for pitch variation, the relationship between acoustic measurement and perceptual ratings followed a similar pattern in both groups for mean pitch and speech rate. This demonstrates that some prosodic features such as speech rate are easily and reliably assessed by the perceptual judgment of listeners who have little specialized training. It should be noted that these correlations were computed over small groups of participants, 15 in the HFA group and 13 in the TYP group. This makes the significant relationships all the more important, as they are likely to be even stronger with larger samples.

Now we turn to the third aim of the study: examining variation in pitch, mean pitch, and speech rate in a structured task with relatively controlled content. This serves as a further control for Experiment 1 in that prosodic differences are unlikely to be driven by differences in the content of speech, and also allows for an investigation of how consistent patterns of prosodic differences are across communicative settings.

Experiment 3. Acoustic Analysis of Speech From Structured Task

Method

Participants

Audio recordings of 15 children with HFA and 11 TYP children aged 8–14 years who participated in an interactive communication task were analysed acoustically. Five of the TYP children and ten of the HFA children were also included in the sample for Experiments 1 and 2. Most participants were from monolingual English households. However one participant in the HFA group had additional language exposure in his household. All group comparisons remained the same when analyses were conducted with and without this participant. Therefore, to maintain as large a sample size as possible we report results including this participant. The groups were matched for age, gender, language ability, and Performance IQ as shown in Table 1. Autism diagnosis was confirmed using the same procedures described for Experiment 1. Children with HFA met full DSM-IV criteria for Autistic Disorder. Participants in the TYP group were screened for autism symptoms using the SCQ; all fell in the non-autism spectrum range (scores below 15).

Procedures

In the communication task the child was required to describe a target object to a partner from an array of four household objects. Each child participated in 15 trials with different displays where he/she gave instructions such as “Pick up the little glass” or “Can you please pick up the shampoo?” For acoustic analysis, tokens with dysfluent speech, more than one simultaneous speaker, unconventional phrase structure (without a carrier phrase such as “Pick up the…”; examples include the following descriptions: “big scissors,” “your bottom left,” or “made of glass”), or too much background noise were eliminated. In total, each participant had between 8 and 15 valid utterances analyzed using PRAAT software as in Experiment 1 (Boersma and Weenink 2008). The mean duration of these utterances did not differ between groups (HFA M = 2.11 s, TYP M = 1.97 s, p = .45). From these data we calculated the mean of each participant’s tokens for pitch, pitch range, and speech rate. In addition, we calculated the standard deviation of pitch range and of mean pitch across a speaker’s tokens to compare the amount of variability in each group.
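The per-participant summary described above (a mean over each speaker's valid tokens, plus a standard deviation across tokens for pitch range and mean pitch) might be sketched as follows. The dictionary keys, the function name, and the three example tokens are hypothetical, and the use of the sample (n − 1) standard deviation is an assumption, since the text does not specify which form was computed.

```python
from statistics import mean, stdev


def summarize_speaker(tokens):
    """Collapse one speaker's utterance-level measures into per-speaker values.

    tokens: list of dicts with keys 'mean_pitch_hz', 'pitch_range_hz', 'rate_spm'.
    """
    if len(tokens) < 2:
        raise ValueError("need at least two tokens to compute a standard deviation")
    pitch = [t["mean_pitch_hz"] for t in tokens]
    prange = [t["pitch_range_hz"] for t in tokens]
    rate = [t["rate_spm"] for t in tokens]
    return {
        "mean_pitch_hz": mean(pitch),
        "pitch_range_hz": mean(prange),
        "rate_spm": mean(rate),
        # Token-to-token variability (sample SD; an assumption, see lead-in).
        "sd_mean_pitch": stdev(pitch),
        "sd_pitch_range": stdev(prange),
    }


# Made-up tokens for one speaker (NOT the study's data).
tokens = [
    {"mean_pitch_hz": 230.0, "pitch_range_hz": 100.0, "rate_spm": 200.0},
    {"mean_pitch_hz": 240.0, "pitch_range_hz": 120.0, "rate_spm": 210.0},
    {"mean_pitch_hz": 250.0, "pitch_range_hz": 140.0, "rate_spm": 190.0},
]
summary = summarize_speaker(tokens)
```

Averaging within speaker first means each participant contributes one value per measure to the group comparison, regardless of how many of the 8 to 15 tokens were retained.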

Results

All variables were normally distributed so parametric tests were used. Statistics from primary analyses are reported in Table 2 as for the other experiments.

In this complementary data set of speech obtained from a structured task where participants gave a partner instructions to pick up an object, mean pitch range was again higher for the HFA (M = 156 Hz) than TYP (M = 122 Hz) participants. The effect of group was significant with a medium effect size. Within the HFA group, pitch range was negatively correlated with Performance IQ, r = −.65, p < .01; that is, participants with higher IQs demonstrated lower pitch ranges. Significant relationships were not found between pitch range and language level (r = −.32, p = .25) or autism severity (r = −.10, p = .73) in the HFA group, and pitch range was not related to PIQ (r = −.16, p = .63) or language level (r = −.40, p = .22) for the TYP group. We also examined the standard deviation of pitch range across each speaker’s tokens. The HFA group had a similar mean standard deviation (50) to the TYP group (60), t(24) = −.88, p = .39, r = .03.

The HFA group demonstrated a similar mean pitch (M = 247 Hz) to the TYP group (M = 236 Hz). We also examined the standard deviation in mean pitch across each speaker’s tokens. The HFA group had a similar standard deviation (17.5) to the TYP group (13.3), with no group difference t(24) = 1.7, p = .10, r = .10. As in Experiment 1, the HFA group (M = 207 spm) had a similar speech rate in syllables per minute to the typically-developing group (M = 204 spm).
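For readers who want to relate the reported t statistics to an effect size, the sketch below shows an equal-variance independent-samples t-test and the common conversion r = √(t² / (t² + df)). That the analyses used this pooled form is an assumption, inferred only from the reported df of 24 for 15 + 11 participants; the function names and the example data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_test(a, b):
    """Student's t for two independent samples, assuming equal variances."""
    n1, n2 = len(a), len(b)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two observations")
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return t, df


def effect_size_r(t, df):
    """Effect size r from t and df; r squared is the variance explained."""
    return sqrt(t * t / (t * t + df))


# Made-up groups (NOT the study's data).
t_stat, df = pooled_t_test([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 4.0, 6.0, 8.0, 10.0])
r = effect_size_r(t_stat, df)
```

Applied to the values reported above, t(24) = −.88 and t(24) = 1.7 give t² / (t² + df) of about .03 and .11, close to the reported .03 and .10, so the reported effect sizes may be on the squared (variance-explained) scale. This is an inference from the arithmetic, not something the text states.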

Discussion

In this experiment we analyzed the same three global prosodic features of pitch range, mean pitch, and speech rate in a different communicative context: a structured task where isolated, one-utterance instructions were given by participants. Therefore the content and communicative goal of speech were more constrained in this experiment than in Experiment 1. Speakers with HFA once again displayed a higher pitch range than those in the TYP group. Mean pitch, however, was similar in both groups. As in Experiment 1, there was no indication of a group difference in speech rate in this data set. In fact both groups had higher speech rates in this constrained task than in conversational speech, perhaps due to reduced demands of utterance planning and a priming effect of producing many similar instructions in the structured task. For example, most speakers used the same carrier phrase, “Pick up the …” on each trial.

Since each speaker contributed multiple tokens to this analysis we were able to examine the variability across tokens via their standard deviation. Unlike other studies (Bonneh et al. 2011; Diehl et al. 2009; Green and Tobin 2009), we did not find group differences in standard deviation of pitch range or pitch across a speaker’s tokens. This may be due to the fact that our analysis was at the utterance level (approximately 2 s) rather than sampling over numerous very short segments of speech on the scale of 10 or 250 ms.

HFA participants with higher Performance IQ scores demonstrated lower pitch range, a characteristic similar to that of the typically-developing group. Compared to Experiment 1, where we examined short excerpts of conversational speech, we found more relationships between participant characteristics and acoustic measures of speech when sampling production during a structured task. It is possible that prosodic modulation is more related to general cognitive abilities when encoding information in a constrained task where an object needs to be described, as opposed to open-ended conversation. This finding is in need of replication with more comprehensive IQ measures as we employed only a brief estimate of Performance IQ skills via the WASI.

General Discussion

The main finding from this set of studies is of increased pitch range in speakers with HFA compared to typically-developing speakers who were matched on language level and age, corroborating findings from a growing body of studies employing acoustic measurements (Diehl et al. 2009; Edelson et al. 2007; Fosnot and Jun 1999; Green and Tobin 2009; Sharda et al. 2010). Our findings complement previous ones that employed non-interactive tasks by sampling speech during two social situations: conversation and a structured communication game with a partner. Taken as a whole, these findings demonstrate that increased pitch variation or exaggerated intonation is a consistent prosodic characteristic in child and adolescent speakers with HFA, across a number of communicative settings. We found no acoustic evidence of flat or monotone intonation produced by speakers with HFA, contrary to the traditional stereotype.

The combination of acoustic measurements in Experiment 1 and perceptual ratings of prosodic features in Experiment 2 is a novel contribution to the literature that allows us to explore how acoustic differences are perceived by listeners. Speech-Language Pathology student raters who were blind to group membership reliably distinguished between speakers with high functioning autism and those with typical development based on a rating of overall impression of prosody: speakers with HFA were given more “atypical” scores. On a scale where four indicated a “normal” overall impression and one indicated “atypical,” nine of 15 speakers with HFA were rated lower than the range of the TYP group, while the remaining six were rated as falling within the range of the TYP group. This is consistent with previous studies using perceptual rating measures (Shriberg et al. 2001) in that some but not all individuals with HFA are perceived as demonstrating atypical prosody. Future investigations should explore the perceptions of naive raters, which would provide a more ecologically valid measure of how members of the general public perceive the expressive prosody of individuals with autism.

We were particularly interested in how listeners would perceive pitch variability, since acoustic measurements of increased pitch range have been widely attested in speakers with HFA and were also found in the studies reported here. We did not find a significant group difference in ratings of pitch variability that aligned with these acoustic measurements, that is, raters did not consistently rate speakers with HFA to have higher pitch variability than TYP speakers. In fact, there was a much broader range of ratings of pitch variation for the HFA group compared to the TYP group, and some speakers with HFA were rated as having less pitch variation than TYP speakers, consistent with the stereotype of monotone speech, though this was never attested by acoustic measurements. This demonstrates that, despite the increase in pitch range, this was not clearly identified as such by listeners, perhaps because they did not perceive the variation in pitch as meaningful. Different patterns in the relationship between acoustic measurements and perceptual ratings of pitch variation suggest that while listeners seem to be able to track pitch variation in TYP speakers reasonably well, quantitative differences in pitch range in speakers with HFA do not translate to analogous qualitative judgements.

We propose that this may stem from non-conventional use of prosodic contours in autism, where they do not serve communicative functions that are easily interpretable by listeners (hence the lack of perceptual ratings matching the acoustic differences). Similar contentions have been proposed elsewhere in the literature. Edelson et al. (2007) relate that ASD speakers produced increased pitch range compared with TYP speakers, as well as more simple than complex pitch slopes (involving both rises and falls in pitch) when engaged in the retelling of emotional stories. In this case the complex pitch slopes of TYP speakers, rather than greater absolute differences in pitch range demonstrated by speakers with HFA, may have better expressed the emotions of the story to a listener. Green and Tobin (2009) contend that speakers with HFA in their study demonstrated a limited repertoire of more extreme pitch accents and prosodic boundary accents relative to TYP speakers, resulting in a “stiff sounding prosody.” Thus, rather than serving an expressive or communicative function, the more extreme and possibly repetitive pitch modulation observed in HFA appears to be used in an idiosyncratic way. Future work should more closely examine the nature of increased pitch variation and use of pitch contours in the speech of individuals with HFA and how it does or does not map onto specific communicative functions (for instance, asking questions, providing factual information, negating information, asking for clarification).

What might be the root of this increase in pitch variability in the speech of individuals with HFA, confirmed now across multiple studies using acoustic measures? Sharda et al. (2010) link their findings of increased pitch range and pitch in children with HFA to prolonged mimicry of the exaggerated prosodic patterns of infant directed speech in this group, relative to typically developing children. They support this proposal with data showing similar mean pitch and pitch excursions in children with HFA and mothers speaking to their infants, while typically-developing children matched on age exhibited lower pitch and smaller pitch excursions. However, this explanation is difficult to reconcile with the results of our perceptual rating study, where the increased pitch range exhibited by speakers with HFA was not registered as such by listeners, whereas it presumably would be for infant-directed speech. Bonneh et al. (2011) comment that some but not all of their participants with ASD demonstrated pitch excursions that are similar to motherese. Therefore these superficial similarities in increased pitch range may result from different underlying patterns. Recent studies (Edelson et al. 2007; Green and Tobin 2009) have highlighted repetitive use of a limited range of prosodic contours and boundaries in the speech of children with HFA. A future direction would be to examine whether these are qualitatively similar to prosodic patterns of infant directed speech, or if they represent non-conventional prosodic patterns that are not linked to content or communicative function. A deeper understanding of differences related to pitch variation is clearly necessary in order to effectively assess and treat atypical prosody.

Yet, speakers with autism have been shown to have increased pitch variation, in terms of pitch range and standard deviation in pitch, even at a more basic level of speech production when naming pictures and not engaged in an interactive task or narrative (Bonneh et al. 2011). As proposed by those authors, the finding of increased pitch range under minimal communicative demands suggests a disruption in basic speech production mechanisms having to do with perception, action, or the feedback loop between the two. Atypical auditory processing has been widely reported in autism (e.g., Siegal and Blades 2003), which could have an impact on the perception of one’s own speech production. Also consistent with a disruption in speech production mechanisms is a report by Grossman et al. (2010), who found that speakers with HFA produced lexically ambiguous words with appropriate stress patterns (e.g., makeup/cosmetics vs. to make up/reconcile), though all of their productions were longer in duration than the same words spoken by TYP speakers. It should be noted however that severe disruptions in speech production and/or auditory processing are unlikely to characterise high functioning individuals with autism such as those tested here, as many of these individuals do not experience significant language delay, but still display atypical prosody. Important avenues to pursue in understanding increased pitch variation in autism include: what aspects of atypical pitch variation are tied specifically to communicative functions, to what extent differences found in communicative settings can be explained by disruptions in basic speech production mechanisms, and at what stage of speech perception or motor control these differences arise.

The other findings of this study showed that mean pitch was similar in our groups of 8- to 14-year-old HFA and TYP participants. Group differences of increased pitch in speakers with HFA have been found in some samples (Edelson et al. 2007; Sharda et al. 2010) but not others (Diehl et al. 2009, our Experiment 3). Finally, we examined speech rate across two different settings and found no group differences between speakers with HFA and TYP speakers. The speech rate observed coincides with that documented previously for the conversational speech of 11-year-olds by Sturm and Seery (2007).

Notably, in our sample, none of the prosodic features we measured acoustically in Experiment 1 or 3 were related to autism severity scores from the ADOS-3. Whereas previous studies reported significant relationships between autism severity and prosodic measures (Diehl et al. 2009; Paul et al. 2005b), we did not replicate this finding. One contributing factor may have been our small sample size. Another is that we compared acoustic measures of pitch range to autism severity (sum of scores on all items on ADOS), whereas Paul et al. (2005b) compared expert ratings of appropriate stress (defined as relative emphasis on syllables and words in terms of intensity, pitch, and duration) from the Prosody-Voice Screening Profile (PVSP; Shriberg et al. 1990) to scores from the communication sub-domain of the ADOS specifically. One item in that subdomain of 10 items is a rating of atypical prosody (Item 2, Speech abnormalities associated with autism). Thus, the relationship examined by Paul et al. (2005a, b) was between two perceptual ratings of prosody or communication skills, whereas we examined one between acoustic measures and overall autism severity scores. However, Diehl et al. (2009) did assess an acoustic measure of pitch variability. They report that, in one but not the other of their two samples of children and adolescents with HFA, this was significantly related to scores from the ADOS communication subdomain, such that participants with higher pitch variation also had higher communication scores, signifying more symptoms. Therefore in the extant literature autism symptoms have sometimes but not always been found to be related to atypical prosody. Future work with larger samples is needed to evaluate whether overall autism severity, beyond the communication subdomain, is related to acoustic measures of prosodic differences.

Limitations of this study include a modest sample size and some but not full overlap in the speakers included in the samples of Experiments 1 and 3. However, the groups were well-matched on language ability, Performance IQ, gender and age, which allowed us to isolate prosodic differences from these potentially contributing factors. Our HFA group included only individuals who met full DSM-IV criteria for Autistic Disorder and who had language in the normal range or above; generalizability is limited to this subgroup on the autism spectrum. The conversational speech samples we analyzed in Experiment 1 were very brief (11 s on average), due to the give-and-take nature of conversation. It would be ideal to analyze longer stretches of conversational speech to examine prosodic patterns as they occur in normal interaction; we feel that the results provided here are an important first step in this direction. Experiment 3 examined a single, imperative function of asking a partner to pick up an object. Our rationale for focusing on a single function was to rule out the possibility that prosodic differences are linked primarily to differences in the content or function of speech, which were not controlled for in the conversation task. That appears not to be the case; we found a similar increase in pitch range in both open-ended conversation and a structured task where utterances had an imperative function. Prosodic differences linked to different communicative functions (e.g., rejections/denials, comments, greetings) should be investigated in future work. Finally, our focus in this set of studies was to better understand the finding of increased pitch variation, as measured acoustically, in speakers with HFA. We did not examine other acoustic measurements, such as those reflecting rhythm and intensity; these should be explored in future work for a comprehensive understanding of differences in the expressive prosody of speakers with HFA.

According to McCann and Peppé (2003) prosody is often neglected in speech and language therapy for individuals with high functioning autism. Yet the increased pitch range or variation demonstrated here and in other recent studies with speakers with HFA (Diehl et al. 2009; Edelson et al. 2007; Fosnot and Jun 1999; Green and Tobin 2009; Sharda et al. 2010) indicates that this is a consistent prosodic characteristic in this population. Since speakers with HFA experience difficulties with social acceptance due to odd prosodic characteristics (Paul et al. 2005a; Shriberg et al. 2001), it is important that prosody be addressed in treatment along with formal aspects of language and conventions of social communication. Achieving a better understanding of the nature of increased pitch variation in high functioning autism moves us further towards this goal.

Acknowledgments

A version of this study was presented at the 2010 International Meeting for Autism Research in Philadelphia, PA. This work was supported by NIDCD F32-DC007297 to Nadig and McGill Faculty of Medicine research bursaries to Shaw. We are grateful to the families who participated in the study as well as the students who provided ratings for Experiment 2. We thank Josh Diehl and Duane Watson for their generous sharing of PRAAT scripts. Finally we would like to extend thanks to Lisa Goffman and Shari Baum for helpful input on this work.

Copyright information

© Springer Science+Business Media, LLC 2011