Attention, Perception, & Psychophysics, Volume 74, Issue 8, pp 1742–1760

Pitch chroma discrimination, generalization, and transfer tests of octave equivalence in humans

  • Marisa Hoeschele
  • Ronald G. Weisman
  • Christopher B. Sturdy

Abstract

Octave equivalence occurs when notes separated by an octave (a doubling in frequency) are judged as being perceptually similar. Considerable evidence points to the importance of the octave in music and speech. Yet, experimental demonstration of octave equivalence has been problematic. Using go/no-go operant discrimination and generalization, we studied octave equivalence in humans. In Experiment 1, we found that a procedure that failed to show octave equivalence in European starlings also failed in humans. In Experiment 2, we modified the procedure to control for the effects of pitch height perception by training participants in Octave 4 and testing in Octave 5. We found that the pattern of responding developed by discrimination training in Octave 4 generalized to Octave 5. We replicated and extended our findings in Experiment 3 by adding a transfer phase: Participants were trained with either the same or a reversed pattern of rewards in Octave 5. Participants transferred easily to the same pattern of reward in Octave 5 but struggled to learn the reversed pattern. We provided minimal instruction, presented no ordered sequences of notes, and used only sine-wave tones, but participants nonetheless constructed pitch chroma information from randomly ordered sequences of notes. Training in music weakly hindered octave generalization but moderately facilitated both positive and negative transfer.

Keywords

Octave equivalence, Octave generalization, Note-range discrimination, Music training, Psychoacoustics, Music cognition, Sound recognition

Two acoustic events are separated by an octave when the frequency of the second event is double or half the frequency of the first. This logarithmic relationship between acoustic events spaced an octave apart is a description of the physics of wave transmission. Human perception has evolved to grasp this unique acoustic relationship in speech and music (see, e.g., Burns, 1999; Patel, 2003; Peter, Stoel-Gammon, & Kim, 2008). In all cultures, the production and perception of octaves are fundamental characteristics of music (Crickmore, 2003). That is, although the number of notes in an octave, their labels, and their frequencies can differ in music across cultures, all cultures recognize the similarity between notes an octave apart (the notes are said to have the same pitch chroma); this phenomenon is known as octave equivalence.

Octave equivalence is one of the two most potent determinants of pitch judgments. A second important determinant is pitch height, which is a log-linear scale of pitch in which the more two sounds differ in frequency, the more they differ in pitch. Octave equivalence and pitch height are opposing percepts. For example, a note one-third of the way between two notes separated by an octave is more similar in pitch height to the first note than to the second, whereas the two notes separated by an octave are more similar to one another in chroma than to the note one-third of the way between them.
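The contrast between pitch height and pitch chroma in the example above can be made concrete with a short calculation. This is a minimal sketch, not part of the original study: it assumes that "one-third of the way" is measured on a logarithmic frequency scale (4 semitones of a 12-semitone octave), and the function names and the 440-Hz starting note are illustrative choices.

```python
import math

def semitones_between(f1_hz, f2_hz):
    """Signed pitch-height distance from f1 to f2, in equal-tempered semitones."""
    return 12 * math.log2(f2_hz / f1_hz)

def chroma_distance(f1_hz, f2_hz):
    """Distance around the chroma circle, in semitones (0 means same chroma)."""
    d = semitones_between(f1_hz, f2_hz) % 12
    return min(d, 12 - d)

low = 440.0                    # illustrative starting note
high = 2 * low                 # one octave above
third = low * 2 ** (4 / 12)    # assumed: one-third of the octave in log frequency

# Pitch height: the intermediate note is closer to the lower note (4 vs. 8 semitones).
height_to_low = abs(semitones_between(low, third))
height_to_high = abs(semitones_between(third, high))

# Chroma: the two octave-separated notes coincide, the intermediate note does not.
chroma_low_high = chroma_distance(low, high)
chroma_low_third = chroma_distance(low, third)
```

Under these assumptions, pitch height groups the intermediate note with the lower note, whereas chroma groups the two notes an octave apart.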

Perception of octave equivalence has a neural basis: For example, neurons in the auditory midbrain show preferences for harmonically related sounds, and the ventral nucleus of the lateral lemniscus has a structure reminiscent of the pitch helix (Langner & Ochse, 2006). The pitch helix is a spiraling structure that completes a circular motion once in each octave. The helix was first proposed as a theoretical spatial mapping of human pitch height and chroma perception by Moritz Wilhelm Drobisch (ca 1846), then used by Shepard (1982) much later in his well-known theory of pitch perception.

Despite all of the evidence and theory supporting the importance of octaves, experimental demonstration of the perception of octave equivalence has been problematic. In the identification of simple melodies, alteration of the octaves of several of the notes reduces identification of the melodies (Deutsch, 1972). However, by maintaining pitch contour (i.e., the direction of frequency change) in the melody, identification was partially restored (Dowling & Hollombe, 1977). This makes sense, because contour is a form of pitch height information, and reducing the effects of pitch height should make pitch chroma a more salient feature.

Not all instances of octave equivalence testing involve music. Tests that require even limited amounts of music training are restrictive because they preclude unbiased testing of nonmusicians. But testing outside of a music context has been problematic. Allen (1967) asked participants to rate the similarity of two notes, including notes that were separated by an octave. He found that only musicians showed octave equivalence. Nonmusicians rated notes that were more similar in pitch height (i.e., frequency) as more similar than notes played at the same chroma but in an adjacent octave. Krumhansl and Shepard (1979) also found that musically untrained participants relied more on pitch height than on chroma. Kallman (1982) conducted experiments similar to those of Allen (1967), and in most of these, the effects of octave equivalence were small or nonexistent. Only in experiments that manipulated pitch height, such that the two comparison notes were always an octave or close to an octave apart, was there some evidence of octave generalization in nonmusicians. In other words, again it appears that only when the effects of pitch height are greatly limited is it possible to observe even modest effects of pitch chroma. Overall, because of differences in procedure, the literature has produced contradictory results concerning octave equivalence. As Burns (1999, p. 252) noted in his review: “If the results of some relevant experiments are accepted at face value, octave equivalence is shown by rats (Blackwell & Schlosberg, 1943), human infants (Demany & Armand, 1984), and musicians (Allen, 1967), but not by starlings (Cynx, 1993), 4- to 9-year-old children (Sergeant, 1983), or nonmusicians (Allen, 1967).”

In summary, notwithstanding the presumed prominence of octave equivalence in music and speech, the experimental evidence for octave equivalence is sparse and contradictory. The purpose of the present research was to develop a music-independent protocol for studying octave equivalence and to use that methodology to enlarge our knowledge about equivalence in humans. In addition, we hoped that our protocol might clarify contradictory past findings lost in the myriad of protocols used in the prior literature. We adopted operant go/no-go procedures because of their positive impact in our prior research on pitch height perception and their immediate usefulness in studying octave equivalence in humans of all ages and verbal abilities and in other species (e.g., Weisman, Balkwill, Hoeschele, Moscicki, Bloomfield, & Sturdy, 2010a; Weisman, Hoeschele, Bloomfield, Mewhort, & Sturdy, 2010b).

Experiment 1

The starting point for the present research was Cynx’s (1993) study of octave equivalence in European starlings (Sturnus vulgaris). Cynx trained starlings in an operant go/no-go discrimination between 2200- and 1000-Hz sine-wave tones. The 2200-Hz tone was S+ (the go signal for food reward), and the 1000-Hz tone was S– (the no-go signal for no food reward) in one group; the go and no-go signals were reversed in the second group. During the generalization test, starlings heard 26 probe tones ranging from 1038 to 2119 Hz. The critical probe pitches for the octave generalization test were 1100 and 2000 Hz, spaced exactly an octave from the go signals in the first (S+ = 2200 Hz) and second (S+ = 1000 Hz) groups, respectively. Cynx predicted that if the birds heard octave equivalence, they should have confused the go tone and the octave generalization probe tone. In fact, responding fell in an orderly monotonic function from the go to the no-go tone, without an increase at the octave-equivalent tone, which is what would be expected if the birds were using pitch height alone. Starlings showed no evidence of octave generalization; Cynx concluded that it was unlikely that they perceived octave equivalence. Before accepting Cynx’s conclusions about songbirds, we sought to determine whether Cynx’s procedure could be used to show octave generalization in humans. If the replication failed with humans, we would need to develop a more effective procedure for testing humans: one that takes pitch height into account and can also be adapted for training birds. Humans with absolute pitch (AP) can usually identify the chroma of a pitch across octaves, so we prescreened the participants to identify AP possessors using a note-naming test (Athos, Levinson, Kistler, Zemansky, Bostrom, Freimer, & Gitschier, 2007).

Method

Participants

Twenty-eight students at Queen’s University participated for course credit. They provided their ages and the details of their music and language training in written responses to a questionnaire. Each gave informed written consent, and the General Research Ethics Board at Queen’s University approved our research protocols.

The participants ranged in age from 17 to 22 years old, M = 18; seven were men and 21 were women. Twenty-six of the participants were enrolled in a first-year psychology course and completed the experiment for course credit. Two of the original participants were AP possessors, and two additional AP possessors were recruited especially for this study and were paid $10 for participating. We determined participants’ AP status using Athos et al.’s (2007) note-naming test (see our Results section below).

Because musical (e.g., Allen, 1967; Krumhansl & Shepard, 1979) and language training (e.g., Deutsch, Henthorn, & Dolson, 2004; Pfordresher & Brown, 2009) are sometimes factors in music perception, we have provided more information about the participants’ histories. Four participants had no formal music training; 11 began their training with the piano; the remainder had training in voice or a variety of musical instruments. Among musically trained participants, the amount of training varied from 1 to 14 years, M = 7.5; 21 played at least one additional instrument, and 20 still played at least one instrument; only five had passed formal examinations in music. Nineteen participants learned English as their first language, four learned Korean first, and the remaining four learned three other languages.

Apparatus

Training and testing were conducted on a Toshiba 149 Tecra laptop (Intel Pentium M processor and Intel 855 series chip set) using Sennheiser HD 580 headphones. The participants used a mouse to make their responses and could use a rotary control on the computer to adjust the volume to the headphones at any time during the experiment. The procedures and data collection were programmed in Visual Basic.

Stimuli and procedures

The experiment consisted of three phases: a test for AP ability, auditory discrimination training, and an auditory generalization test. The second and third phases were adapted from Cynx’s (1993) study.

AP testing

The protocol was adapted from a procedure used by Athos et al. (2007) to test 2,213 participants: The note durations and frequencies were a direct replication of Athos et al.’s procedure. In the note-naming tests, we identified AP possessors using Athos et al.’s scoring protocol: 1 point for each correct identification, and 0.75 points for responses to notes ±1 semitone from the correct note.
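The scoring rule described above (1 point for a correct identification, 0.75 points for a response one semitone off) can be sketched as follows. This is an illustrative sketch rather than the authors' or Athos et al.'s scoring code: the function names are hypothetical, and treating B and C as adjacent (a circular distance over the 12 pitch classes) is an assumption that follows from responses being given on 12 unlabeled-by-octave keys. The default criterion of 24.5 is the value reported in the Results.

```python
# The 12 pitch classes, in keyboard order (assumed to match the on-screen keys).
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def semitone_error(target, response):
    """Circular distance between two pitch classes, in semitones (0 to 6)."""
    diff = abs(PITCH_CLASSES.index(target) - PITCH_CLASSES.index(response))
    return min(diff, 12 - diff)

def score_trial(target, response):
    """1 point if correct, 0.75 points if one semitone off, otherwise 0."""
    error = semitone_error(target, response)
    if error == 0:
        return 1.0
    if error == 1:
        return 0.75
    return 0.0

def ap_score(trials):
    """Sum the trial scores over (target, response) pairs."""
    return sum(score_trial(target, response) for target, response in trials)

def is_ap_possessor(score, criterion=24.5):
    """Apply the criterion score for highly accurate AP."""
    return score >= criterion
```

For example, `ap_score([("A", "A"), ("A", "A#"), ("A", "C")])` gives 1.75 under this rule.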

Sine-wave tones presented in the test were synthesized at the frequencies of 40 notes randomly sampled from the 66 notes on the chromatic scale that spans the 5 1/2 octaves from C2 to G8, on the basis of A4 = 440 Hz; each note was played for 1,000 ms (see Athos et al., 2007). The actual notes presented were D#2, F2, F#2, G#2, A#2, B2, C#3, D#3, E3, F3, G3, G#3, C4, C#4, D4, D#4, F4, F#4, A4, C5, C#5, D5, E5, F#5, G5, G#5, A5, A#5, C6, D6, A6, B6, C#7, D#7, F#7, B7, E8, F#8, G8, A#8. These tones and all the others presented in this study were constructed at a standard 16-bit, 44.1 kHz sampling rate and ramped upwards at onset and downwards at offset for 5 ms. Because four of the sine-wave tones lie above the notes on the piano keyboard (in Octave 8) and proved difficult to identify, participants rarely named them accurately. In practice, therefore, the test consisted of 36 notes (see Athos et al., 2007).

The test began after a short practice session (eight trials), given to acquaint participants with making mouse responses to graphics on the screen and to allow participants to individually adjust the tone amplitude to a comfortable level. During the practice session and the test, a participant clicked on the “Play” button at the top of the screen and heard a tone selected randomly without replacement from the 40 test tones, which controlled for any possible predictable relative pitch carryover effects between tones (Ward & Burns, 1982). To “name” the musical note corresponding to a tone, the participant clicked on one of 12 black and white piano keys shown on the screen. The test continued without feedback until the participant heard all 40 tones. In this note-naming test and all following tests, the participants could take as much time as they liked between trials, as a trial began only after the participant had clicked the “Play” button. We did not record time between trials.

Operant discrimination training

Participants were asked to classify notes into two categories (go and no-go tones) to the best of their ability, without any instructions about which notes made up each category. Participants were told that discrimination training was a test of their perceptual categorization ability but not that it was a test of octave equivalence.

Only two frequencies were presented: 1000- and 2200-Hz sine-wave tones. Each frequency was presented 50 times in a random order without replacement for 100 trials. Reward (positive feedback) was counterbalanced across two groups: a 1000-Hz S+/2200-Hz S− group, and a 2200-Hz S+/1000-Hz S− group. Participants initiated a trial by clicking the button labeled “Play tone” on the screen to hear a tone. If a participant clicked on the button on the screen labeled “S+” after hearing a go tone, the word “correct” appeared in a box adjacent to the S+ button. If the participant clicked the S+ button on a no-go trial, the word “incorrect” appeared in a box adjacent to that button, and the next trial was delayed by 3 s. If a participant failed to click the S+ button after either a go or a no-go tone, the trial terminated after 2 s without feedback, as is typical in go/no-go discrimination procedures. To control for amplitude, two versions of each tone were played, one at 70 dB (SPL) and a second at 80 dB (SPL), each on 25 trials chosen at random without replacement. This strategy made pitch a more salient determinant of the discrimination by reducing the confounding of pitch with loudness (Moore, 1989). The initial sound pressure level (in decibels) of each tone was measured from the location of the ear with an integrating sound level meter (Type 2239 A; Brüel & Kjær Canada Ltd, Point Claire, Quebec, Canada). Each participant was allowed to adjust overall amplitude to a comfortable level during a short practice session (four trials, one of each of the four trial types presented during training) prior to the training session; this meant that the actual tone amplitudes heard in the discrimination task varied across participants.
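The feedback contingencies of a go/no-go trial described above can be summarized in a few lines. This is a minimal sketch under stated assumptions, not the original Visual Basic program: the function names are hypothetical, and the 2-s response window is represented only by whether a click occurred.

```python
def trial_outcome(is_go_tone, clicked_s_plus):
    """Return (feedback, extra_delay_s) for one go/no-go trial.

    feedback is None when the participant did not click within the response
    window, because such trials ended without feedback.
    """
    if not clicked_s_plus:
        return (None, 0.0)
    if is_go_tone:
        return ("correct", 0.0)
    # Responding on a no-go trial delayed the next trial by 3 s.
    return ("incorrect", 3.0)

def percent_response(clicks):
    """Percentage of trials with a response, given a list of booleans."""
    return 100.0 * sum(clicks) / len(clicks)
```

For instance, a participant who responds on 48 of 50 go trials has `percent_response` of 96.0 for S+ tones.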

Generalization testing

The participants were told that they would no longer receive feedback for responding but were asked to respond as they had during training, to the best of their ability. Participants received no further instructions. The trials were identical to those during discrimination training, except that probe tones were presented in addition to the training tones and the feedback box was no longer presented. Hiding the feedback box eliminated reward for responding during the generalization test.

The two training tones and 26 probe generalization tones played at intermediate frequencies were presented during the test. We presented the same probe-tone frequencies as had Cynx (1993)—1038, 1059, 1100, 1122, 1165, 1189, 1235, 1260, 1308, 1335, 1386, 1414, 1468, 1498, 1556, 1587, 1648, 1682, 1746, 1782, 1850, 1888, 1960, 2000, 2076, and 2119 Hz—on five trials each. Two versions of the play list were used, to control for amplitude: In one test version, all odd-numbered probes in ascending order of frequency were played at 70 dB (SPL), and the others were played at 80 dB (SPL). In the other version, the amplitudes were reversed (i.e., the 70-dB tones in the other version were now played at 80 dB, and vice versa). The two versions were counterbalanced across participants and training conditions. In the same play lists, the training tones, 2200 and 1000 Hz, were presented on 50 trials each—25 trials at 80 dB and 25 trials at 70 dB. Each version of the play list included one trial at each probe frequency and five trials at each training frequency/amplitude combination, sampled five times at random without replacement during the generalization tests of individual participants.
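The structure of the generalization play list can be checked by constructing it. The sketch below is illustrative only: the version labels "A" and "B", the function names, and the use of Python's `random` module are assumptions, and the original software may have sampled trials differently. It builds one block with a single trial per probe and five trials per training frequency/amplitude combination, then draws five independently shuffled blocks.

```python
import random

PROBES_HZ = [1038, 1059, 1100, 1122, 1165, 1189, 1235, 1260, 1308, 1335,
             1386, 1414, 1468, 1498, 1556, 1587, 1648, 1682, 1746, 1782,
             1850, 1888, 1960, 2000, 2076, 2119]
TRAINING_HZ = [1000, 2200]

def probe_levels(version):
    """Map each probe frequency to its level in dB (SPL).

    Version "A": odd-numbered probes (1st, 3rd, ...) at 70 dB, others at 80 dB.
    Version "B": the two levels are swapped.
    """
    odd_db, even_db = (70, 80) if version == "A" else (80, 70)
    return {f: (odd_db if i % 2 == 1 else even_db)
            for i, f in enumerate(PROBES_HZ, start=1)}

def build_block(version, rng):
    """One block: 26 probe trials plus 5 trials per training tone and level."""
    levels = probe_levels(version)
    block = [(f, levels[f]) for f in PROBES_HZ]
    for f in TRAINING_HZ:
        for db in (70, 80):
            block.extend([(f, db)] * 5)
    rng.shuffle(block)  # random order without replacement within the block
    return block

def build_test(version, seed=0):
    """Five blocks in sequence: 230 trials in total."""
    rng = random.Random(seed)
    trials = []
    for _ in range(5):
        trials.extend(build_block(version, rng))
    return trials
```

Under these assumptions each probe appears on five trials and each training tone on 50 trials (25 per level), for 26 × 5 + 2 × 50 = 230 test trials.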

Results and discussion

Here we report the results of AP testing, discrimination training, and testing for octave generalization.

AP testing

Using Athos et al.’s (2007) criterion score for highly accurate AP (≥24.5, based on 36 tones), we separated participants into n = 4 AP possessors, 30.3 ± 2.2, and n = 24 nonpossessors, 6.8 ± 0.4. (All summaries show M ± SE.)

Discrimination training

Every participant responded significantly more to S+ than to S– tones: n = 28, percent correct = 96 % ± 0.5 %, binomial test, p < .0001, where percent correct = (mean percent response to S+ tones + [1 – mean percent response to S− tones]). In a mixed ANOVA, we found no significant effect of S+ frequency (high or low), F(1, 24) = 0.40, p = .53, \( \eta_{\text{p}}^2 = .02 \), no significant effect of AP status (possessor or nonpossessor), F(1, 24) = 0.01, p = .94, \( \eta_{\text{p}}^2 < .01 \), and no significant interaction, F(1, 24) = 0.11, p = .74, \( \eta_{\text{p}}^2 < .01 \).
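As a rough check on the effect sizes in this analysis, partial eta squared can be recovered from an F ratio and its degrees of freedom using the standard identity \( \eta_{\text{p}}^2 = F \cdot df_{\text{effect}} / (F \cdot df_{\text{effect}} + df_{\text{error}}) \). The sketch below is not the authors' analysis code, and the function name is hypothetical. The identity assumes that the effect is tested against a single error term, so values recomputed this way for other analyses may differ from reported values because of rounding or a different choice of error term.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared implied by an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# The three effects in this ANOVA, each tested with F(1, 24).
s_plus_frequency = partial_eta_squared(0.40, 1, 24)   # about .016
ap_status = partial_eta_squared(0.01, 1, 24)          # about .0004
interaction = partial_eta_squared(0.11, 1, 24)        # about .005
```

By this calculation the S+ frequency effect rounds to .02, and the other two effect sizes fall below .01.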

Generalization testing

To include the 1000-Hz S+ and 2200-Hz S+ groups in the same analysis, we calculated standardized frequencies relative to S+ by dividing the larger frequency by the smaller frequency. In the 2200-Hz S+ group, we divided 2200 Hz by each probe frequency. In the 1000-Hz S+ group, we divided each probe frequency by 1000 Hz. Notice that the standardized frequency of the octave was always 2.0, which was either twice or half the frequency of the S+. The results for AP possessors were indistinguishable from those for nonpossessors, so we merged the results.
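The standardization described above amounts to dividing the larger of the two frequencies by the smaller. A minimal sketch follows; the function name is an illustrative choice, not taken from the original analysis.

```python
def standardized_frequency(probe_hz, s_plus_hz):
    """Ratio of the larger to the smaller frequency, so that S+ maps to 1.0."""
    return max(probe_hz, s_plus_hz) / min(probe_hz, s_plus_hz)

# Octave probes in the two groups.
octave_in_2200_group = standardized_frequency(1100, 2200)
octave_in_1000_group = standardized_frequency(2000, 1000)

# S- lies at the far end of the scale in both groups.
s_minus_in_2200_group = standardized_frequency(1000, 2200)
```

Both octave probes map to 2.0, and S− maps to 2.2 in either group, which is what allows the two groups to share one generalization gradient.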

Figure 1 shows percent response as a function of the standardized frequencies of S+ and S−, the octave probe, and the remaining 25 probe tones. The dashed line shows an exponential function fit to the results, excluding the octave probe: \( y = 738.7 \times 10^{-0.99X} \), and the correlation of the function with the results is r(26) = .96, p < .0001. The choice of an exponential fit follows from the log-linear relationship between frequency and pitch. The octave probe is shown as a darkened point, which fits precisely on the line for the function calculated without response values for the octave. Notice that responding to S+ and to S− is not predicted well by the function: previous discrimination training resulted in near perfect discrimination between S+ and S−. In contrast, the probe stimuli were novel, so responding to them was pure generalization from previous training.
Fig. 1

Percent response as a function of the standardized frequencies of S+ and S–, of the octave probe (shown as a darkened circle), and of the remaining 25 generalization probes in Experiment 1. The dashed line shows an exponential function fit to the results. Recall that the location of notes within an octave is itself an exponential function (\( 2^{1/12} \) to \( 2^{12/12} \)). The standardized frequency at the octave is always 2.0. The fit is remarkably good for all of the generalization tones, including the octave.
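To see what the fitted gradient implies at the octave, the function can be evaluated directly. This sketch reads the fitted function as y = 738.7 × 10^(−0.99X), with X the standardized frequency and y the percent response; the function name is hypothetical, and the coefficients are simply those reported for the fit, not a refit of the data.

```python
def predicted_percent_response(x):
    """Percent response predicted by the reported exponential fit."""
    return 738.7 * 10 ** (-0.99 * x)

# Predicted responding at the probe nearest S+, at the octave, and at the
# probe nearest S- (standardized frequencies 1.038, 2.0, and 2.119).
near_s_plus = predicted_percent_response(1.038)
at_octave = predicted_percent_response(2.0)
near_s_minus = predicted_percent_response(2.119)
```

The function falls monotonically with standardized frequency, so a gradient based on pitch height alone predicts no local increase in responding at the octave.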

Personal histories

We asked whether the participants’ gender or previous music and language training contributed to their ability to perform the discrimination, as measured by percent correct, or to generalize between S+ and its octave equivalent, as measured by the percent response to the octave-equivalent tone. None of the correlations and point-biserial correlations between these two performance measures and possible historical determinants (i.e., gender, AP score, years of training, number of instruments, or number of languages) approached significance, rs ≤ .16, ps ≥ .45. Also, variation in the participants’ first instrument training or first language training had no significant effects on discrimination or generalization, Fs(1, 21) ≤ 1.41, ps ≥ .25, \( \eta_{\text{p}}^2\text{s} \leq .06 \).

Conclusions

Cynx’s (1993) procedure failed to uncover evidence of octave generalization (the perception of octave equivalence) in European starlings, but, more importantly here, the procedure also failed to uncover evidence of octave generalization in humans. Our findings agree with those of Kallman (1982) and Allen (1967), who found that human participants preferred pitch height to pitch chroma as a basis for judging the similarity of two tones. Generalization in this task did not depend on prior musical or language experience.

Experiment 2

Although Cynx’s (1993) procedure did not provide evidence of octave equivalence in humans, it provided a starting point for our second experiment and tied our procedural changes in the present experiment more directly with the literature. By instituting a few important changes, we hoped to show octave equivalence with a similar operant procedure. We had several requirements for an effective test of octave equivalence. Most importantly, the new procedure had to succeed in measuring octave equivalence in humans with a broad range of musical training, from nonexistent to highly skilled. We chose a task that required minimal instructions, so that versions of the procedure could be developed for humans of all ages and verbal abilities, as well as other species. We used a go/no-go procedure to make the work compatible with Experiment 1. Also, we used variants of the go/no-go protocol to eliminate artifacts and establish the validity of our generalization measure of equivalence.

We began with discrimination training that provided feedback and reward for responding to a defined range of musical notes in the middle of an octave, the go notes, but withheld reward for responding to other notes in higher and lower pitch ranges within the same octave, the no-go notes (see Fig. 2A). Although we had never trained a three-range discrimination to span exactly an octave, it seemed likely that humans and many other species could learn this discrimination as readily as they had other three-range discriminations (Weisman et al., 1998). The critical test of octave equivalence was generalization of the discrimination to the octave just above the training octave (see Fig. 2B). Testing outside the pitch range of discrimination training promised to eliminate difficulties in observing octave equivalence reported by Kallman (1982) and in Experiment 1.
Fig. 2

Summary of the procedures in Experiment 2. (A) In discrimination training, participants in the middle-range S+ group were rewarded (positive feedback) for responding to a range of notes in the middle of an octave but not for responding to notes in the higher and lower pitch ranges within the same octave. In the middle-range S– group, the correlation between responding and reward was reversed, so that participants were rewarded for responding to notes in the higher or lower pitch ranges in the same octave but not for responding to notes in the middle range. (B) The notes included in the S+ and S– ranges during discrimination training in Octave 4, along with the equivalent novel ranges of notes in Octave 5 (all presented without feedback), comprised the test notes during generalization.

Method

Participants

Forty-six students at the University of Alberta and 33 students at Queen’s University participated for course credit. The details of their written questionnaires and consents, and the conditions of their participation were the same as in Experiment 1. Research Ethics Boards at the University of Alberta and Queen’s University approved our protocols.

Participants ranged in age from 17 to 25 years old, M = 19; 26 were men, 51 were women, and 2 did not report their gender. We determined participants’ AP status using Athos et al.’s (2007) note-naming test. One participant was an AP possessor.

We have provided some information about the participants’ histories here. Twenty-one participants had no formal music training, and 25 began their training with piano. The remainder began with a variety of other instruments including guitar and violin. Among musically trained participants, the total amount of training across all instruments varied from 3 months to 26 years, M = 7 years. Thirty-eight played at least one additional instrument, and 14 had passed formal examinations in music.

Fifty-two participants learned English as their first language, eight learned Mandarin first, and 18 participants learned one of 10 other languages first. By pooling the music and language history from Experiments 2 and 3, we were able to provide an extensive analysis of the contributions of the more common types of first music training and of first language training to the results of our discrimination training and generalization testing. These analyses are presented in the Results section of Experiment 3.

Apparatus, stimuli, and procedures

Several aspects of the method were unchanged from Experiment 1: (a) the apparatus and methods for producing notes; (b) the phases of the experiment, including practice trials and phase order: AP testing, discrimination training, then generalization testing; (c) the computer screens, responses, and instructions used in each phase; (d) the scoring of responses in each phase; and (e) details of the note-naming test of AP. The training sine-wave tones differed in Experiment 2; they were synthesized at the frequencies of successive chromatic musical notes from C4 (262 Hz) to B5 (988 Hz).

Operant-discrimination training

We conducted discrimination training between the middle note range and the lower and upper note ranges in Octave 4 (four notes per range; see Fig. 2A), separately in two reward-order groups. In the first reward-order group, the middle-range S+ group, responses to E, F, F#, or G in the middle note range were rewarded (with positive feedback), whereas responses to C, C#, D, or D# in the lower note range and to G#, A, A#, or B in the upper note range were not rewarded (no positive feedback). In the second reward-order group, the middle-range S– group, the relationship between responding to the notes and reward was reversed, so that responses to the middle notes were not rewarded, but responses to notes in the upper and lower ranges were rewarded. This had the effect of counterbalancing the order of rewarded and unrewarded notes between the two groups, which eliminated artifacts that may have arisen from a predominance of go or no-go notes, or any other peculiarity due to the selection of go and no-go notes. The majority of participants heard two versions of each tone to control for the effects of amplitude on pitch perception (Moore, 1989), one at 70 dB (SPL) and another at 80 dB (SPL), as in Experiment 1.
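The reward contingencies for the two groups can be written down compactly. This is an illustrative sketch: the group labels "middle_S+" and "middle_S-" and the function names are hypothetical identifiers, not names used in the original software.

```python
LOWER_RANGE = ["C", "C#", "D", "D#"]
MIDDLE_RANGE = ["E", "F", "F#", "G"]
UPPER_RANGE = ["G#", "A", "A#", "B"]

def note_range(note):
    """Return which of the three four-note ranges a note name belongs to."""
    if note in LOWER_RANGE:
        return "lower"
    if note in MIDDLE_RANGE:
        return "middle"
    if note in UPPER_RANGE:
        return "upper"
    raise ValueError(f"unknown note name: {note}")

def is_rewarded(note, group):
    """True if responding to this note is rewarded in the given group."""
    in_middle = note_range(note) == "middle"
    if group == "middle_S+":
        return in_middle
    if group == "middle_S-":
        return not in_middle
    raise ValueError(f"unknown group: {group}")
```

Note that the middle-range S+ group has four go notes and eight no-go notes, whereas the middle-range S– group has eight go notes and four no-go notes, which is the asymmetry that the counterbalancing addresses.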

We trained a subset of participants in each group with 60 dB (SPL) tones only to control for aural harmonics (Newman, Stevens, & Davis, 1937). The membranes and bones of the ear can introduce aural harmonics, and the louder the tone, the louder the harmonics. Hence, playing tones at 60 dB greatly reduced the potential that aural harmonics could confound our explanation of the results. That is, by controlling for aural harmonics with the 60-dB group, we could rule out a potential confound: that the first harmonic above the fundamental of any sound is its octave equivalent, so the presence of loud harmonics in our training stimuli might explain a finding of octave equivalence.

Generalization testing

We conducted generalization testing using the 24 notes that comprise Octaves 4 and 5 (Fig. 2B), in the absence of reward (positive feedback). The ranges of notes designated as the S+ and S− ranges during discrimination training in Octave 4 and the corresponding ranges of novel notes in Octave 5 comprised the test notes presented during generalization. Generalization of the pattern of responding acquired during discrimination training in Octave 4 to novel notes in Octave 5 would provide evidence for octave equivalence. It is important to understand that, during both the training and generalization phases, notes were selected at random and without replacement. The frequencies for all of the musical notes in an equal-tempered (chromatic) scale with A4 = 440 Hz, including those in the octaves used for discrimination and generalization here and in Experiment 3, are widely available (e.g., Suits, 2012).
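The equal-tempered frequencies for the 24 test notes follow from the A4 = 440 Hz reference, with each semitone step multiplying frequency by \( 2^{1/12} \). The sketch below is a generic calculation, not the authors' stimulus-generation code, and the function name is hypothetical.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def note_frequency(name, octave, a4_hz=440.0):
    """Equal-tempered frequency in Hz for a note name and octave number."""
    # A is the 10th note of the octave (index 9), so A4 is 0 semitones from itself.
    semitones_from_a4 = NOTE_NAMES.index(name) - 9 + 12 * (octave - 4)
    return a4_hz * 2 ** (semitones_from_a4 / 12)

# The 24 notes of Octaves 4 and 5, from C4 up to B5.
test_notes = [(name, octave) for octave in (4, 5) for name in NOTE_NAMES]
test_frequencies = [note_frequency(name, octave) for name, octave in test_notes]
```

This reproduces the stated endpoints of about 262 Hz for C4 and 988 Hz for B5, and each Octave 5 note is exactly twice the frequency of its Octave 4 counterpart.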

Results and discussion

AP testing

Using Athos et al.’s (2007) criterion score for highly accurate AP (≥24.5), we found one AP possessor (AP score = 31) and 78 nonpossessors (7.5 ± .3).

The results of discrimination training and generalization testing are shown in Figs. 3 and 4, respectively, averaged for the four notes in each pitch range over all trials. Statistical analyses of the results are based on these summaries.
Fig. 3

Discrimination averaged over the two note-amplitude groups, presented separately for the middle-range S+ and middle-range S– groups in Experiment 2. Participants in the middle-range S+ group responded more to notes in the middle range than to notes in either the lower or the higher S– range. Participants in the middle-range S– group responded less to notes in the middle range than to notes in either the lower or the higher S+ range. Error bars represent standard errors of the means.

Fig. 4

Generalization pooled over the two note-amplitude groups, displayed separately for the middle-range S+ and middle-range S– groups, in Experiment 2. Participants generalized the middle-range S+ and middle-range S– discriminations learned in Octave 4 to Octave 5. Error bars represent standard errors of the means.

Discrimination training

Not all participants learned the note-range discrimination. We classified participants as learners (n = 64) if, on average, they responded more to notes in the middle range than to notes in either the lower or the upper S− range in the middle-range S+ group, or showed the reverse pattern in the middle-range S− group; participants who did not meet this criterion were classified as nonlearners (n = 15). By this criterion, five participants in the middle-range S+ group and 10 in the middle-range S− group failed to learn the discrimination. Here, we analyzed results for discrimination learners; later, we compared the performance of learners and nonlearners during the generalization test.
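The acquisition criterion can be expressed as a simple comparison of mean response rates across the three note ranges. This is a sketch of one reading of the criterion, not the authors' classification code: the function name and group labels are hypothetical, and it assumes that strict inequalities were required in both comparisons.

```python
def is_learner(mean_lower, mean_middle, mean_upper, group):
    """Classify a participant from mean percent response in each note range."""
    if group == "middle_S+":
        # More responding to the middle (S+) range than to either S- range.
        return mean_middle > mean_lower and mean_middle > mean_upper
    if group == "middle_S-":
        # The reverse: less responding to the middle (S-) range than to either S+ range.
        return mean_middle < mean_lower and mean_middle < mean_upper
    raise ValueError(f"unknown group: {group}")
```

For example, a middle-range S+ participant who responds on 30 % of lower-range trials, 80 % of middle-range trials, and 85 % of upper-range trials would be classified as a nonlearner under this reading, because responding to the upper S− range exceeds responding to the middle range.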

We conducted separate mixed 2 (amplitude) × 3 (note range) ANOVAs in the two reward-order groups. In the middle-range S+ group (see Fig. 3, upper panel), we observed a significant effect of note amplitude, F(1, 26) = 18.67, p < .001, \( \eta_{\text{p}}^2 = .42 \), a significant effect of note range, F(2, 52) = 91.13, p < .001, \( \eta_{\text{p}}^2 = .77 \), and no significant interaction, F(2, 52) = 0.65, p = .52, \( \eta_{\text{p}}^2 > .01 \). Participants in the 60 dB group responded more overall than participants in the 70/80 dB group. In planned one-tailed comparisons pooled over the two amplitude subgroups, we determined that participants responded significantly more to notes in the middle range than to notes in the lower or upper ranges, ts(27) ≥ 10.17, ps < .001.

In the middle-range S– group (see Fig. 3, lower panel), we observed no significant effect of note amplitude, F(1, 34) = 0.21, p = .65, \( \eta_{\text{p}}^2 = .0{2} \), a significant effect of note range, F(2, 68) = 58.57, p < .001, \( \eta_{\text{p}}^2 = .{6}0 \), and a significant interaction, F(2, 68) = 5.27, p = .007, \( \eta_{\text{p}}^2 = .0{5} \). In tests of simple effects, ps ≤ .05, we determined that the amplitude subgroups differed at a marginal level of significance, p = .06, only in the lower note range, but both amplitude subgroups responded more in the lower and upper note ranges than in the middle range. As the pattern of accurate discrimination was the same, we pooled over the amplitude subgroups in planned one-tailed comparisons. We determined that the participants responded significantly less to notes in the middle, S– range than to notes in the lower or the upper, S+, ranges, ts(34) ≥ 7.20, ps < .001. In summary, both reward-order groups showed solid evidence of discriminating notes in the middle range from notes in the upper and lower ranges, with minimal intrusion of amplitude effects.

Generalization testing

As with discrimination training, we report results here only for discrimination learners. The analyses are similar to those reported for discrimination training, except that instead of comparing three note ranges, these analyses compared six ranges: three in Octave 4 and three in Octave 5 (see Fig. 4). We conducted separate mixed 2 (amplitude) × 6 (note range) ANOVAs in the two reward-order groups.

In the middle-range S+ group, we observed no significant effect of note amplitude, F(1, 26) = 1.88, p = .18, \( \eta_{\text{p}}^2 = .0{7} \), a significant effect of note range, F(5, 130) = 14.29, p < .001, \( \eta_{\text{p}}^2 = .{36} \), and no significant interaction, F(5, 130) = 0.86, p = .51, \( \eta_{\text{p}}^2 = .0{2} \). As in the analysis of discrimination training, we pooled results across the two amplitude groups. Here we conducted separate sets of planned one-tailed comparisons in Octave 4—the former training octave—and Octave 5—the novel generalization octave. In both octaves, participants responded significantly more to notes in the middle range than to notes in either the lower or the upper range, ts(27) ≥ 2.92, ps ≤ .003.

In the middle-range S– group, we observed no significant effect of note amplitude, F(1, 34) = 0.21, p = .65, \( \eta_{\text{p}}^2 < .0{1} \), a significant effect of note range, F(5, 170) = 22.09, p < .001, \( \eta_{\text{p}}^2 = .{37} \), and a significant interaction, F(5, 170) = 2.44, p = .036, \( \eta_{\text{p}}^2 = .0{4} \). In tests of simple effects, ps ≤ .05, we determined that the amplitude subgroups differed significantly only in the lower range of Octave 4, and not in any range of Octave 5. Both amplitude subgroups responded more to notes in the lower and upper ranges than to notes in the middle range in both octaves, so we pooled results across the two groups. Again, we conducted separate sets of planned one-tailed comparisons in Octaves 4 and 5. In both octaves, participants responded significantly less to notes in the middle range than to notes in the lower or the upper range, ts(34) ≥ 3.72, ps < .001.

In summary, we have reported that participants in both reward-order groups generalized the pattern of responses from note-range discrimination to the note-range generalization test. Here, we used binomial tests to determine whether the predicted patterns of responding during generalization were observed in significant numbers of participants in each reward-order group. We tested the middle-range S+ group in Octave 4, in which 18 of 28 participants showed the predicted pattern (more responding to notes in the middle range than in either the lower or the higher note range), chance = 1/3, p < .0001. In Octave 5, 15 of 28 participants showed the predicted pattern, chance = 1/3, p = .022. We also tested the middle-range S– group in Octave 4, in which 25 of 36 participants showed the predicted pattern (more responding to notes in both the lower and higher note ranges than to the middle range), chance = 1/3, p < .0001. In Octave 5, 24 of 36 participants showed the predicted pattern, chance = 1/3, p < .0001. Significantly more participants than would be expected by chance showed the predicted patterns without feedback or reward in both octaves in both reward-order groups.

Reporting note-by-note comparisons

We have just presented the results of Experiment 2 using averages over the four-note pitch ranges. This greatly simplified statistical analyses and provided clarity in understanding the results. However, some readers may want a more detailed report: one that shows the results note by note across ranges, in both discrimination and generalization; these data are shown in Figs. 5 and 6, respectively. These figures illustrate that a finer-grained presentation of the results confirms the coarse-grained analysis by note ranges in Figs. 3 and 4. In the finer-grained analyses shown in Figs. 5 and 6, we found that the note ranges were not categorical: Responding gradually increased across an S+ range, then declined and reversed over the S– ranges. Furthermore, generalization (see Fig. 6) showed a correspondence between the note-by-note patterns of responding in Octaves 4 and 5. All of that said, the note-by-note analysis revealed nothing that contradicted our statistical analyses based on the averages for each note range.
Fig. 5

Percent response, represented note by note, during discrimination, pooled over the two note-amplitude groups and displayed separately for the middle-range S+ and middle-range S– groups in Experiment 2. The results confirm in detail the findings shown for discrimination averaged within note ranges (see Fig. 3). Error bars represent standard errors of the means.

Fig. 6

Percent response, represented note by note, during generalization, pooled over the two note-amplitude groups and displayed separately for the middle-range S+ and middle-range S– groups in Experiment 2. The results confirm in detail the findings shown for generalization averaged within note ranges (see Fig. 4). Error bars represent standard errors of the means.

Comparisons with nonlearners

Finally, we asked whether learning the note-range discrimination was necessary for generalization to Octave 5. We compared the percent predicted response scores of learners and nonlearners, for whom percent predicted response = percent response to the notes in Octave 5 that corresponded to the rewarded notes in Octave 4 + (100 % – percent response to the notes in Octave 5 that corresponded to the unrewarded notes in Octave 4). Higher predicted response scores would indicate better octave generalization. In the comparison, the predicted response scores of learners were 58.9 % ± 1.5 %, and those of nonlearners were 51.5 % ± 2.6 %. Learners, t(63) = 5.99, p < .0001, but not nonlearners, t(14) = 0.59, p = .58, n = 15, scored significantly above chance (50 %), and learners scored significantly higher than the nonlearners, t(77) = 4.33, p < .0001. In summary, learners but not nonlearners were able to generalize the note-range discrimination from one octave to the next.
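The percent predicted response score defined above can be sketched in a few lines. This is only an illustration, not the authors' analysis code: the function name is hypothetical, and dividing the sum by 2 is an assumption made here so that indiscriminate responding yields the 50 % chance level mentioned in the text.

```python
def percent_predicted_response(pct_to_rewarded: float, pct_to_unrewarded: float) -> float:
    """Combine go responses to rewarded-range notes with withheld responses
    to unrewarded-range notes into a single score on a 0-100 scale.

    pct_to_rewarded:   percent response to Octave 5 notes that correspond
                       to the rewarded notes in Octave 4.
    pct_to_unrewarded: percent response to Octave 5 notes that correspond
                       to the unrewarded notes in Octave 4.
    """
    for value in (pct_to_rewarded, pct_to_unrewarded):
        if not 0.0 <= value <= 100.0:
            raise ValueError("percentages must lie between 0 and 100")
    # ASSUMPTION: the two terms are averaged, so that equal responding to
    # both kinds of notes gives a score of 50 (chance).
    return (pct_to_rewarded + (100.0 - pct_to_unrewarded)) / 2.0


# A participant who responds equally often to both kinds of notes scores 50.
print(percent_predicted_response(50.0, 50.0))
# More responding to the predicted notes pushes the score above 50.
print(percent_predicted_response(70.0, 40.0))
```

Under this averaging assumption, a score of 100 would mean responding on every trial with a predicted note and never on the others.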

Experiment 3

We conducted a third experiment with two purposes in mind: to replicate the results of Experiment 2 and to extend our results with a transfer test. The logic was identical to a demonstration of positive transfer from real objects to pictures, or in the opposite direction, when the contingency remained the same during transfer (Spetch & Friedman, 2006).

This additional phase introduced reward into the testing octave. For approximately half of the participants, reward was chroma-matched; that is, the contingencies were the same in the testing octave as in the training octave, which we refer to as positive transfer. For the remaining participants, the reward was chroma-reversed; that is, the contingencies in the testing octave were opposite those in the training octave, which we refer to as negative transfer. The goal was to assess whether participants would transfer more easily in the positive-transfer group than in the negative-transfer group because of mediation by octave generalization from the original discrimination.

Method

Participants

Forty-three students at the University of Alberta participated. The details of their written questionnaires and consents, as well as the conditions of their participation, were the same as in Experiments 1 and 2. The Research Ethics Board at the University of Alberta approved our protocols.

The participants ranged in age from 18 to 27 years old, M = 20; 17 were men and 26 were women. We determined participants’ AP status using Athos et al.’s (2007) note-naming test. No participant in this experiment possessed AP.

We provide information about the participants’ music and language histories here. Twenty-three of the participants had no formal music training, and ten began their training with piano. The remainder began with a variety of other instruments, including voice and violin. Among the musically trained participants, the amount of training varied from 1 to 13 years, M = 6 years. Fourteen played at least one additional instrument, and seven had passed formal examinations in music. Twenty-six of the participants had learned English as their first language, six had learned Mandarin first, and 11 had learned another language first. For the instruments and languages that were more common in our sample of participants from Experiments 2 and 3, we were able to perform statistical analyses to assess whether these differences impacted our results (see the Personal History section below for the results).

Apparatus, stimuli, and procedures

Most aspects of the method were unchanged from Experiment 2 with the exception of the inclusion of a transfer phase at the end of the experiment. Because both the middle-range S+ groups and the middle-range S– groups trained and tested with 60 dB (SPL) tones or with 70 and 80 dB (SPL) tones showed the same pattern of results, we conducted Experiment 3 using the middle-range S+ discrimination and played 70 and 80 dB (SPL) notes only.

Transfer testing

We conducted transfer testing with the 24 tones that comprised Octaves 4 and 5. Participants received reward in this phase depending on whether they were in the positive- or the negative-transfer group. For both transfer groups, responses to the tones from Octave 4 were rewarded (positive feedback) following the same contingencies as in training; that is, responses to E, F, F#, or G were rewarded, but responses to C, C#, D, or D# (the lower tone range) and to G#, A, A#, or B (the upper tone range) were not rewarded (no positive feedback). However, the transfer groups differed in that they had opposite contingencies in Octave 5: The positive-transfer group had the same contingencies in Octave 5 as in Octave 4, whereas the negative-transfer group had reversed contingencies in Octave 5. During transfer trials, notes were selected and played at random and without replacement, as they had been in training and generalization.
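The transfer contingencies described above can be summarized as a small lookup. The sketch below is illustrative only; the function and variable names are hypothetical, and the shuffle stands in for whatever randomization the experimental software actually used.

```python
import random

LOWER_RANGE = ("C", "C#", "D", "D#")
MIDDLE_RANGE = ("E", "F", "F#", "G")
UPPER_RANGE = ("G#", "A", "A#", "B")
NOTES = LOWER_RANGE + MIDDLE_RANGE + UPPER_RANGE


def is_rewarded(note: str, octave: int, transfer_group: str) -> bool:
    """Return True if a response to this tone earns positive feedback."""
    if note not in NOTES:
        raise ValueError(f"unknown note: {note}")
    if transfer_group not in ("positive", "negative"):
        raise ValueError(f"unknown transfer group: {transfer_group}")
    in_middle = note in MIDDLE_RANGE
    if octave == 4:
        # Training contingencies: only the middle range is rewarded.
        return in_middle
    if octave == 5:
        # Positive transfer keeps the Octave 4 pattern; negative reverses it.
        return in_middle if transfer_group == "positive" else not in_middle
    raise ValueError("only Octaves 4 and 5 are used in transfer testing")


def transfer_trial_order(rng: random.Random) -> list:
    """One random pass through all 24 tones, sampled without replacement."""
    tones = [(note, octave) for octave in (4, 5) for note in NOTES]
    rng.shuffle(tones)
    return tones


order = transfer_trial_order(random.Random(0))  # seed chosen arbitrarily
print(len(order), is_rewarded("E", 5, "positive"), is_rewarded("E", 5, "negative"))
```

In this sketch, 8 of the 24 tones are rewarded in the positive-transfer group and 12 of 24 in the negative-transfer group, so reward is more frequent overall under the reversed contingencies.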

Results and discussion

AP testing

Using Athos et al.’s (2007) AP criterion score, we found only nonpossessors (7.4 ± 0.4).

Operant discrimination training

Not all participants learned the note-range discrimination. Every learner (n = 39) but no nonlearners (n = 4) responded more to notes in the middle S+ range than to notes in either the lower or upper S– range. Here, we will analyze results only for discrimination learners across all discrimination trials and then block by block during training. Later, we will compare the performance of learners and nonlearners during the generalization test.

We conducted an ANOVA comparing the percent response across ranges and observed a significant effect of note range (see Fig. 7), F(2, 76) = 156.58, p < .001, \( \eta_{\text{p}}^2 = .{8}0 \). In planned one-tailed comparisons, we determined that participants responded significantly more to notes in the middle range than to notes in the lower or the upper range, ts(38) ≥ 14.89, ps < .001.
Fig. 7

Discrimination training in Experiment 3. Participants in the middle-range S+ group responded more to notes in the middle range than to notes in either the lower or the higher S– range. Error bars represent standard errors of the means.

Discrimination acquisition

Here, we asked about the speed of acquisition of the discrimination in Octave 4. Figure 8 shows the course of discrimination in successive ten-trial blocks over 16 blocks of trials. The choice of ten-trial blocks was a compromise; it ensured that at least two trials with notes in each pitch range were presented in each block and that early learning was well represented. Learning here consisted mainly of reduced responding to notes in the lower and upper S− ranges, with almost no change in responding to notes in the middle, S+ range. Notice that extinction of responding to notes in the S− ranges was incomplete: Responding remained at about the same level from the fourth trial block onward. We used percent correct responding to assess learning: The percent correct scores improved significantly, t(38) = 5.68, p < .001, from 58 % ± 1.6 % in the first trial block to 69 % ± 1.1 % in the 16th block. Discrimination improved over the first three or four blocks of training, but analyses based on the first versus the last half of the training session yielded the same pattern of significant differences as the analyses we report here for the entire session. Thus, humans who acquire the discrimination do so remarkably quickly.
Fig. 8

The course of the acquisition of discrimination in successive ten-trial blocks over the 16 blocks of trials in Experiment 3. Learning consisted mainly of reduced responding to notes in the lower and upper S– ranges, with almost no change in responding to notes in the middle, S+ range. Error bars representing the standard errors of the means are shown for Blocks 1, 4, 8, 12, and 16; the error bars for the other blocks (2 and 3, 5–7, 9–11, and 13–15) were similar and were removed to reduce clutter. From the percent response to the S– note ranges, it appears that acquisition was complete by Block 3 or 4.
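The block-by-block percent correct measure can be sketched as follows. This is a hedged illustration rather than the authors' scoring code: it assumes that a trial counts as correct when the participant responds to an S+ note or withholds a response to an S– note, and the trial representation and function name are hypothetical.

```python
from typing import List, Tuple

# Each trial: (note was in the S+ range, participant responded)
Trial = Tuple[bool, bool]


def percent_correct_by_block(trials: List[Trial], block_size: int = 10) -> List[float]:
    """Percent correct in successive blocks of go/no-go trials."""
    scores = []
    for start in range(0, len(trials), block_size):
        block = trials[start:start + block_size]
        # ASSUMPTION: correct = go on S+ trials, no-go on S- trials.
        n_correct = sum(1 for is_s_plus, responded in block if responded == is_s_plus)
        scores.append(100.0 * n_correct / len(block))
    return scores


# Synthetic data: indiscriminate responding in block 1, perfect in block 2.
example = ([(True, True)] * 5 + [(False, True)] * 5
           + [(True, True)] * 5 + [(False, False)] * 5)
print(percent_correct_by_block(example))
```

With 160 training trials and the default block size, the function would return 16 scores, one per ten-trial block.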

Generalization testing

As with discrimination training, we report results here only for discrimination learners. The analyses are similar to those reported for discrimination training, except that instead of comparing the three note ranges in Octave 4, these analyses compared six ranges: three in Octave 4 and three in Octave 5.

We observed a significant effect of note range, F(5, 190) = 53.02, p < .001, \( \eta_{\text{p}}^2 = .{58} \) (see Fig. 9). We then conducted separate sets of planned one-tailed comparisons in Octave 4—the training octave—and Octave 5—the novel generalization octave. In both octaves, participants responded significantly more to notes in the middle S+ range than to notes in either the lower or the upper S– range, ts(38) ≥ 4.08, ps < .001.
Fig. 9

Generalization in Experiment 3: Participants generalized the middle-range S+ discrimination learned in Octave 4 to Octave 5, replicating the findings of Experiment 2. Error bars represent standard errors of the means.

We also determined whether the predicted patterns of responding during generalization were observed in significant numbers of participants. As in Experiment 2, we used binomial tests. In Octave 4, 37 of 39 participants showed the predicted pattern (more responding to notes in the middle range than in either the lower or the higher note range), chance = 1/3, p < .00001, and in Octave 5, 21 of 39 participants showed the predicted pattern, chance = 1/3, p = .0007. Significantly more participants than would be expected by chance showed the predicted pattern without feedback or reward in either octave.

Transfer testing

As with discrimination training and generalization testing, we report results here only for discrimination learners. The analyses are similar to those reported for generalization, except that we included transfer group (either positive or negative) as a between-subjects variable.

We observed significant effects of transfer group, F(1, 37) = 6.37, p = .016, \( \eta_{\text{p}}^2 = .17 \), and note range, F(5, 185) = 10.20, p < .001, \( \eta_{\text{p}}^2 = .20 \) (see Fig. 10). We also found an interaction between group and range, F(5, 185) = 4.96, p < .001, \( \eta_{\text{p}}^2 = .13 \). Simple-effects tests conducted to analyze the interaction showed that responding in the lowest two ranges of the training octave was not significantly different between the transfer groups, ts(37) ≤ 0.19, ps ≥ .854, and in the highest range of the training octave, responding was marginally significantly higher in the negative-transfer group, t(37) = 1.99, p = .054. In contrast, in the transfer octave, responding was significantly higher in the negative-transfer group in all three ranges, ts(37) ≥ 2.86, ps ≤ .007, which suggests that participants were sensitive to the higher overall frequency of reward in the negative-transfer group.
Fig. 10

Positive transfer to the middle-range S+ discrimination in Octave 5 and negative transfer to the middle-range S– discrimination in Octave 5. Relative to generalization testing (see Fig. 9), responding increased in both transfer groups, but the increase was greater in the negative-transfer group. Negative transfer was also inferred from the failure of the function for the middle-range S– discrimination in Octave 5 to invert, or even flatten. Error bars represent standard errors of the means.

To understand how the transfer groups differed in the transfer octave, we conducted an ANOVA separately for each transfer group using the three note ranges and the two octaves as within-subjects variables. In the positive-transfer group, we observed significant main effects of octave, F(1, 20) = 7.13, p = .015, \( \eta_{\text{p}}^2 = .{21} \), and of note range, F(2, 40) = 26.41, p < .001, \( \eta_{\text{p}}^2 = .{36} \), and no significant interaction, F(2, 40) = 1.59, p = .217, \( \eta_{\text{p}}^2 = .0{7} \). In the negative-transfer group, we did not observe a significant main effect of octave, F(1, 17) = 2.30, p = .148, \( \eta_{\text{p}}^2 = .{11} \), but we did observe a significant main effect of note range, F(2, 34) = 13.60, p < .001, \( \eta_{\text{p}}^2 = .{12} \), and no significant interaction, F(2, 34) = 2.42, p = .104, \( \eta_{\text{p}}^2 = .{12} \). In other words, both the levels and the patterns of responding in the transfer octave differed significantly between the positive- and negative-transfer groups.

We then determined whether these patterns of responding, as measured by the percent predicted responding (i.e., responses that followed the pattern of responding predicted during discrimination training) were observed in significant numbers of participants. We used a separate set of binomial tests in each group. In the positive-transfer group, 18 of 21 participants showed the predicted pattern in Octave 4 (more responding to notes in the middle range than in either the lower or the higher note range), chance = 1/3, p < .001, and 12 of 21 participants showed the predicted pattern in Octave 5, chance = 1/3, p = .02. In the negative-transfer group, 13 of 18 participants showed the predicted pattern in Octave 4 (more responding to notes in the middle range than in either the lower or the higher note range), chance = 1/3, p < .001, but in Octave 5, only 8 of 18 participants showed that pattern, chance = 1/3, p = .222. Here, the most important finding is that, in the positive-transfer group, more participants than would be expected by chance showed the expected pattern in Octave 5, whereas in the negative-transfer group, the number of participants who showed the pattern predicted from the original discrimination (i.e., octave generalization, or more responding to notes in the middle range than in either the lower or the higher note range) did not differ from chance. The positive-transfer group showed the predicted pattern in Octave 5, but the negative-transfer group showed no significant pattern. In fact, only three of 18 participants responded to Octave 5 with a reversed pattern (i.e., less responding to notes in the middle range than to notes in either the lower or the higher note range), which was not significantly below chance, chance = 1/3, p = .102. The negative-transfer group was affected by the reversed contingencies, but they retained enough memory of the original discrimination to retard acquisition of the reversed discrimination.
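For readers who wish to check binomial probabilities of this kind, the following is a minimal sketch of a one-tailed exact binomial test. It assumes, without confirmation from the text, that the reported p values are exact one-tailed binomial tail probabilities with chance = 1/3; the function names are hypothetical.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p): at least k of n participants
    show a pattern that occurs with probability p by chance."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k, n + 1))


def binomial_lower_tail(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p): at most k of n participants
    show the pattern."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(0, k + 1))


chance = 1.0 / 3.0
# Negative-transfer group, Octave 5: 8 of 18 showed the original pattern.
print(round(binomial_upper_tail(8, 18, chance), 3))
# Negative-transfer group, Octave 5: 3 of 18 showed the reversed pattern.
print(round(binomial_lower_tail(3, 18, chance), 3))
```

Under these assumptions, the two tail probabilities come out at about .22 and .10, in line with the values reported above for the negative-transfer group in Octave 5.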

Comparisons with nonlearners

We asked whether learning the note-range discrimination was necessary for generalization to Octave 5. We compared the percent predicted response scores of learners and nonlearners, where percent predicted response = percent response to the notes in Octave 5 that corresponded to the rewarded notes in Octave 4 + (100 % – percent response to the notes in Octave 5 that corresponded to the unrewarded notes in Octave 4). Higher predicted response scores would indicate better octave generalization. In the comparison, the predicted response scores of learners were 57 % ± 1 %, and those of nonlearners were 47 % ± 3 %. Learners, t(38) = 6.16, p < .0001, but not nonlearners, t(3) = 0.90, p = .43, n = 4, scored significantly above chance (50 %), and learners scored significantly higher than nonlearners, t(41) = 2.64, p = .012, in Octave 5. In summary, learners showed greater resistance to the change in the pattern of reinforcement in the negative-transfer group.

Personal history

Previous research suggested that octave equivalence is enhanced in experienced musicians (Allen, 1967) and that absolute pitch (Deutsch, Dooley, Henthorn, & Head, 2009) and relative pitch (Hove, Sutherland, & Krumhansl, 2010) are enhanced in people who first learned a tonal language. Here, we asked whether the participants’ previous music or language histories contributed to their performance during the original discrimination in Octave 4 (Measure 1), during the generalization tests in Octaves 4 and 5 (Measures 2 and 3), or during the transfer tests in Octaves 4 and 5 (Measures 4 and 5), as measured by percent predicted responding (i.e., responses that followed the pattern of responding predicted during discrimination training). We term these measures, considered as a group, the octave equivalence performance quintet.

To assess claims about music and language experience, we pooled the history questionnaire results from learners and nonlearners in Experiments 2 and 3 in order to obtain a sizable sample, n = 121, and to include the full range of variability in quintet scores. We excluded the results for the sole AP possessor, because that participant’s performance affected the correlations and means out of proportion to a single participant’s role in the sample. The results for transfer were available only from Experiment 3, n = 43.

Our main tool for evaluating the contributions of music and language histories to octave equivalence was a correlational analysis. The measures of musical experience were internally consistent: Years of music training and AP scores (within the range of AP nonpossessors), r(119) = .22, p = .015, and years of training and number of instruments played, r(119) = .70, p < .0001, were significantly correlated.

The measures in the octave equivalence performance quintet were also internally consistent: Performance during generalization in Octaves 4 and 5 and during transfer in Octave 4 correlated significantly with performance during original discrimination training, rs(119) ≥ .34, ps ≤ .00014, and performance during transfer of training in Octave 5 correlated significantly with performance during generalization in Octave 5, r(41) = .42, p = .005.

Early learning of a tone language—for instance, Mandarin—or of a keyboard instrument—for instance, piano—appears to contribute to both absolute and relative pitch perception (Deutsch et al., 2009; Hove et al., 2010). But first learning of Cantonese or Mandarin (tonal) or of Korean (pitch-accented) language did not result in significantly higher predicted discrimination or generalization responses than did first-language learning of English, Fs(3, 98) ≤ 1.06, ps ≥ .37, \( \eta_{\text{p}}^{{2}}{\text{s}} = .0{7} \). Although some participants had learned three or four languages, the number of languages did not correlate significantly with performance in the quintet, rs(119) ≤ .08, ps ≥ .88, and r(41) = .13, p = .42. Likewise, having first learned to play the piano did not result in significantly higher performance during discrimination or generalization than did first learning violin, voice, or guitar, Fs(3, 18) ≤ 0.43, ps ≥ .73, \( \eta_{\text{p}}^2{\text{s}} = .0{2} \). The present results do not provide evidence that participants’ first language or first instrument experience affected octave equivalence. However, given that Experiments 2 and 3 were not designed to provide such evidence, any final conclusion regarding this issue should await further research.

Most importantly, measures in the performance quintet correlated significantly with measures of musical experience. We observed significant negative correlations between AP scores (all within the AP nonpossessor range), years of musical experience, and performance during discrimination, and generalization in Octaves 4 and 5, rs(119) ≥ −.22, ps ≤ .015. In contrast, we observed a significant positive correlation between years of musical experience and performance during transfer (pooled for the positive- and negative-transfer groups) in Octave 5, r(41) = .44, p = .003. Experiments 2 and 3 were not designed to provide tests of between-group differences for participants with and without music training. That said, all of the untrained participants learned the discrimination in Octave 4, and all but one generalized the discrimination to Octave 5.

General discussion

In one sense, it is unnecessary to ask whether humans perceive the similarity between notes spaced an octave apart, because we already know the answer. From the earliest times, human cultures have recognized that the octave is formed at a ratio of 2:1 between notes and have used that fact to tune their musical instruments (Crickmore, 2003). Later, after written languages developed, diverse cultures—including those of India, Babylon, and Egypt—provided lasting records of how they used the octave to tune instruments, write music, and conceptualize mathematics (McClain, 1978).

The questions for an experimental scientist are how to measure octave equivalence, and under what conditions humans and other animals can be seen to perceive it. We conducted three experiments to study octave equivalence in humans. Each of the experiments followed a similar path, using discrimination and generalization procedures to search for octave generalization as a measure of octave equivalence.

Negative findings in Experiment 1

In Experiment 1, humans learned a discrimination between an S+ and an S− tone with high accuracy. During generalization testing, however, their responses to a tone spaced exactly an octave from S+ fit on the function predicted by pitch height generalization to other tones in the test, without recourse to octave generalization. The task proved exceptionally resistant to the perception of octave equivalence. Neither extensive music training nor absolute pitch perception improved participants’ octave generalization.

Our failure to observe octave equivalence in Experiment 1 does not mean that its findings are unimportant. At the very least, the experiment showed that Cynx’s (1993) procedure, which demonstrated the dominance of pitch height over octave equivalence perception in starlings, yields exactly the same result in humans—a finding consistent with previous research with humans (e.g., Allen, 1967; Kallman, 1982; Sergeant, 1983) and inconsistent with the conclusion that Cynx’s experiment demonstrated that starlings fail to perceive octave equivalence. More importantly, the findings of Experiment 1 led us to seek effective modifications to Cynx’s procedures. Clearly, we needed to look elsewhere to find a simple procedure for studying octave generalization successfully, which we did in Experiments 2 and 3.

Octave generalization and transfer in Experiments 2 and 3

The procedures of Experiments 2 and 3 differed from those of Experiment 1 in several ways. (a) We tuned our stimulus tones to notes on the chromatic scale, which meant that they increased from lowest to highest on a log-linear scale familiar to humans tutored in music. (b) During discrimination training, we divided the training octave into three ranges of four notes each and provided the same feedback and reward for responding to all four notes in each range. That is, we required much less accurate chroma identification than if a separate response were required for each of the 12 notes. (c) To reduce the influence of pitch height perception, octave generalization testing and explicit transfer of training were conducted in an adjacent octave, beyond the pitch height ranges of the training notes.

The procedure used here was adapted from one that we have used extensively to test for pitch height perception across songbird species (Weisman et al., 2010b). In most of these experiments, sine-wave tonal stimuli increased in frequency on a simple linear scale, and thus were mistuned to human musical scales. Songbirds consistently discriminated between ranges more accurately than did either rats or humans (Weisman, Njegovan, Williams, Cohen, & Sturdy, 2004). An important feature of this procedure was that the rewarded and unrewarded ranges of tones alternated across frequencies, with at least three ranges (e.g., S−, S+, S−) and sometimes eight (e.g., S−, S+, S−, S+, S−, S+, S−, S+), presented during discrimination training.

In Experiments 2 and 3, using similar procedures but with stimuli tuned to the musical scale, we observed strong evidence of octave generalization in the next higher octave. In Experiment 2, we observed generalization whether the middle range was S+ or S−, eliminating possible artifacts introduced by either condition. However, octave generalization was influenced by whether notes in the middle range were S+s or S–s, because discrimination in the middle-range S+ condition depended mainly on inhibition of responding to notes in the higher and lower ranges, whereas discrimination in the middle-range S– condition depended mainly on excitation of responding to notes in the higher and lower ranges.

In Experiment 3, the transfer paradigm contributed important information about the durability of octave generalization. The discrimination learned during original training not only generalized, but octave equivalence also promoted positive transfer to a like discrimination and persistent negative transfer to the opposite discrimination in the next octave. These experiments promote generalization and transfer as powerful tools for the evaluation of octave equivalence. Future research should explore the retention of simple patterns of note-range discriminations over much longer periods, thus continuing to probe their durability in transfer tests.

Failure to learn the note-range discrimination

Several participants failed to learn the original note-range discriminations. More of these participants had difficulty learning the middle-range S− than the middle-range S+ discrimination, especially when the stimuli were played at lower amplitude. We are uncertain why the middle S− discrimination was more difficult, but it seems likely that more participants had trouble resolving the pitches out of the lower-amplitude notes. Of course, young adult human participants present more general issues, because of low perceived reward, inattention, and minor illness, all of which tend to interfere with accurate discrimination. It is possible that the increased difficulty of the discriminations in Experiments 2 and 3 contributed to attention to the octave during generalization.

Participants who failed to learn the initial discrimination provided an interesting control. Those participants who accurately discriminated the middle range in Octave 4 showed a similar pattern of responding in Octave 5, whereas those who failed to learn did not show any consistent pattern of responding across pitch ranges in Octave 5. This finding helped convince us that our observations of octave equivalence in successful learners were products of the generalization of successful discrimination training—not flukes or artifacts, but a solid, palpable phenomenon.

Tests for artifacts of the interaction of pitch and loudness

In Experiment 2, we played the notes at different amplitudes between groups as a check for aural harmonics (Newman et al., 1937). If the participants heard loud harmonics during discrimination training, in effect they may have heard the generalization stimuli during training and would then have been expected to show more octave generalization than participants who heard quieter harmonics. We found no evidence of confounding by aural harmonics; that is, we observed about the same pattern of generalization in Octave 5 whether the notes were relatively quiet, 60 dB, or relatively louder, 70 and 80 dB. We are uncertain why the louder aural harmonics of 70- and 80-dB notes had no more effect than the much quieter harmonics of 60-dB notes. One possibility is that harmonics introduced simultaneously with the stimulus notes may have only muted effects when the testing involves notes presented successively, as it did here.

Presenting tones that include harmonics in octave equivalence research would make the stimuli more realistic but would also introduce potential confounds (see Burns, 1999), because the octave would be present in every training stimulus. Such a confound has been suggested for Blackwell and Schlosberg’s (1943) report of octave equivalence in rats, which used sine-wave tones of uncertain purity (Burns, 1999). Another source of confounding between pitch and loudness arises from their interaction, such that louder sounds can be perceived as higher in pitch (Moore, 1989). We removed the correlation between pitch and loudness by presenting each pitch at two distinct amplitudes. Generally in this research, as the note-by-note presentations in Figs. 5 and 6 show, the percent response was an orderly function of pitch and of its correlation with reward.
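The logic of crossing pitch with amplitude can be illustrated with a short sketch. It is not the stimulus-generation code used in these experiments: the equal-tempered tuning to A4 = 440 Hz, the choice of the twelve notes of Octave 4, the two levels of 70 and 80 dB, and the helper names `note_frequency` and `pearson_r` are all assumptions made for illustration. The point is only that when every pitch appears at both amplitudes, the correlation between frequency and level across the stimulus set is zero.

```python
import math

A4_HZ = 440.0  # assumed tuning reference, not taken from the article


def note_frequency(semitones_from_a4):
    """Equal-tempered frequency of a note a given number of semitones from A4."""
    return A4_HZ * 2.0 ** (semitones_from_a4 / 12.0)


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# C4 through B4 lie -9 to +2 semitones from A4 (hypothetical stimulus set).
semitone_offsets = range(-9, 3)
levels_db = (70.0, 80.0)  # illustrative amplitudes only

# Fully crossed design: every pitch is presented at both amplitudes.
stimuli = [(note_frequency(s), level)
           for s in semitone_offsets
           for level in levels_db]

frequencies = [freq for freq, _ in stimuli]
levels = [level for _, level in stimuli]
r_pitch_loudness = pearson_r(frequencies, levels)
print(f"{len(stimuli)} stimuli, r(pitch, loudness) = {r_pitch_loudness:.3f}")
```

Because the design is fully crossed, loudness carries no information about which note was played, so a listener cannot use it as a proxy for pitch.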

Pitch perception is not unitary

When searching for evidence of octave equivalence perception, one needs to bear in mind that pitch perception is not a unitary ability, but is instead a suite of perceptual abilities. Of course, the suite includes the perception of octave equivalence, but it also includes pitch height and relative pitch perception. To complicate matters further, these abilities function simultaneously and without reference to one another. For example, MacDougall-Shackleton and Hulse (1996) found that in starlings, pitch height perception obscured relative pitch perception within the range of the training tone sequences, but not outside of that range. Similarly, when we tested humans in the range between the S+ and S− tones in Experiment 1, pitch height perception blocked octave equivalence. Experiments 2 and 3 found blended octave equivalence and pitch height perception, as shown by a decreased overall level of responding in Octave 5 as compared to the training octave, Octave 4. Reducing the influence of one pitch ability on another is good science, but in practice completely eliminating the influence of pitch height perception on octave equivalence is probably impossible.
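The distinction between pitch height and pitch chroma can be made concrete with a minimal sketch. It assumes equal-tempered tuning to A4 = 440 Hz and the common convention that numbers A4 as note 69; the function name `height_and_chroma` is hypothetical, and nothing here is claimed to model how listeners actually compute either quantity. The octave number stands in for pitch height, and the position within the octave stands in for chroma.

```python
import math

A4_HZ = 440.0   # assumed tuning reference
A4_NOTE = 69    # assumed note numbering (A4 = 69, so C4 = 60)


def height_and_chroma(freq_hz):
    """Split a frequency into (octave number, chroma index 0-11 with C = 0)."""
    note_number = round(A4_NOTE + 12 * math.log2(freq_hz / A4_HZ))
    octave = note_number // 12 - 1   # coarse pitch height
    chroma = note_number % 12        # position within the octave
    return octave, chroma


c4 = height_and_chroma(261.63)       # approximately C4
c5 = height_and_chroma(2 * 261.63)   # doubling the frequency raises it one octave
print(c4, c5)
```

Doubling the frequency changes the octave number but leaves the chroma index unchanged, which is the relationship that octave equivalence refers to.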

Octave equivalence, music, and language training

Do genes, environment, or experience determine whether humans perceive octave equivalence? Our favorite answer to this question is “yes.” Most sounds in the environment include harmonics, and approximately half of the harmonics heard in these sounds are an octave apart (Pierce, 1999). Thus, the ability to resolve the octave in sounds and in sequences of sounds might be a useful skill. Octave equivalence affects both speech (Peter et al., 2008) and music (Burns, 1999), and octaves are used in this way in all cultures (Crickmore, 2003). Consider, as an example, that sex recognition and sexual signals are basic skills common across species. In humans, the fundamental frequencies of male and female voices are roughly an octave apart (Titze, 2000), and thus octave equivalence might be especially useful to humans comparing speech or song produced by a man to that produced by a woman. Untangling this knot of determinants may be impossible.

Allen (1967) found differences in octave equivalence in favor of participants with musical training. He tested with an equivalence-rating task that probably drew on trained musicians’ extensive experience in making subjective octave judgments. In contrast, Kallman (1982) found evidence of octave equivalence in less-trained individuals only when he reduced the influence of pitch height by testing over only a small range of pitches. Our most prominent example of the confluence of music ability and octave equivalence must be the single absolute pitch (AP) possessor in Experiment 2. Her performance in the generalization test exceeded 90 %, so accurate that, to make sense of the rest of the results, we needed to exclude her data. Pitch height perception can obscure the perception of octave equivalence, but AP perception appears to amplify equivalence, though one might wish for more extensive confirming evidence. Also, even if correct, our finding about AP does not untangle training from genes, since the question of which is more important in AP perception is still not settled.

In Experiments 2 and 3, more extensive music training hindered acquisition of the note-range discrimination and reduced octave generalization. These effects may be the result of negative transfer from the skills acquired in extensive music training. Then, during the transfer phase of Experiment 3, more extensive music training made octave equivalence a positive factor in acquiring the same discrimination and a negative factor in acquiring the opposite discrimination in a higher octave. In other words, our results were determined by the complex interaction of music training with our task. We are reasonably confident in the correlations of music training with acquisition and with generalization of equivalence, as they were obtained with a sample of 121 participants; even so, the influence of music training here deserves further study.

With all this said, the dependence of octave equivalence on training in music might be limited, as rhesus monkeys can be induced to show octave generalization across musical passages (Wright, 2007; Wright, Rivera, Hulse, Shyan, & Neiworth, 2000). As we have shown here, humans show octave generalization with minimal instructions, which included no reference to music or octaves, and in a task that presented notes individually and at random with respect to their pitch heights and chroma. We conclude that music training may sometimes enhance octave equivalence in humans but that it appears unnecessary to the basic perception of equivalence.

The relationship between music and language is receiving increased attention (e.g., Fitch, 2005; Masataka, 2009; Patel, 2003). Researchers studying pitch perception, but not octave equivalence, have found effects of tonal language learning on relative and absolute pitch (Deutsch et al., 2009; Hove et al., 2010). Given that relatively few of our participants initially spoke tonal languages and that the observed effect was not significant, one should not be surprised that we can reach no conclusion about the influence of initial language on octave equivalence. Research explicitly designed to distinguish among language groups would be helpful for determining the relationship between language and octave equivalence.

Conclusion

Cynx (1993), whose procedure we replicated in Experiment 1, decided that his failure to observe octave generalization in starlings meant that starlings lacked pitch chroma perception. We propose an alternative interpretation: Cynx’s procedure is a measure of pitch height rather than chroma perception in both songbirds and humans. This finding offers a useful lesson for comparative psychologists: Despite their seeming reasonableness, no sound conclusions flow from comparisons between experimental evidence about one species and historical, personal, or anecdotal evidence about another. With the results of Experiments 2 and 3, we have solid evidence for octave equivalence in humans, whether tutored in music or not. We are ready now to tackle the question Cynx posed nearly 20 years ago: Do songbirds perceive pitch chroma and, therefore, show octave equivalence? We also have more questions about human octave equivalence. For example, does how we parse the octave during pitch range discrimination affect the perception of octave equivalence? And we have learned that octave equivalence requires neither musical stimuli nor musical training, but instead is a common feature of human auditory perception.

Author note

This research was supported by a Natural Sciences and Engineering Research Council of Canada (NSERC) Discovery Grant and a Discovery Accelerator Supplement, an Alberta Ingenuity Fund (AIF) New Faculty Grant, a Canada Foundation for Innovation (CFI) New Opportunities Grant, along with start-up funding and CFI partner funding from the University of Alberta, to C.B.S. M.H. was supported by an NSERC postgraduate scholarship, an AIF graduate student scholarship, and an Izaak Walton Killam Memorial Scholarship at the University of Alberta. We thank the Queen’s Biological Communication Centre for cooperation in providing facilities for the research conducted at Queen’s University.

Copyright information

© Psychonomic Society, Inc. 2012

Authors and Affiliations

  • Marisa Hoeschele (1)
  • Ronald G. Weisman (2)
  • Christopher B. Sturdy (1)

  1. Department of Psychology, University of Alberta, Edmonton, Canada
  2. Queen’s University, Kingston, Canada
