Since its conception, a major component of working memory research has focused on phonological short-term memory (hereafter, pSTM), as represented by the concept of the articulatory loop (later called the phonological loop) in the original working memory model of Baddeley and Hitch (1974). The phonological loop is a subsystem that underpins the functioning of pSTM within the working memory system, which supports the temporary retention of information in the service of cognitive processes across a variety of tasks (e.g., Baddeley, 2012). In the past decade or so, awareness of the important interactions between the phonological loop and long-term memory representations, including language, has increased (Baddeley, 2000). These include interactions with phonological representations and semantic knowledge (Hulme, Maughan, & Brown, 1991; Jefferies, Frankish, & Lambon Ralph, 2006; Patterson, Graham, & Hodges, 1994). To date, however, much of this research has been limited to segmental aspects of phonology. In the present study, we extended this working memory principle to suprasegmental aspects. If the interaction between long-term representations and pSTM is a generalizable principle of working memory function, then suprasegmental characteristics should also influence working memory. This hypothesis was the focus of the present study.

Currently, multiple sources of evidence point toward the influence of long-term knowledge on pSTM. For example, short-term memory performance is better for words than for nonwords (Hulme et al., 1991; Thorn, Frankish, & Gathercole, 2009). This can be explained by the fact that, although phonological activation decays over time and/or is degraded by interference, long-term lexical/semantic representations can counteract or compensate, either through continuous interaction between short- and long-term memory (Jefferies et al., 2006; Patterson et al., 1994) or reconstruction of the short-term phonological representation (redintegration; cf. Hulme et al., 1991; Schweickert, 1993; Thorn et al., 2009). In addition, not only lexical/semantic, but also phonotactic information contributes to phonological short-term retention: Nonwords composed of frequent phoneme combinations are recalled more accurately than nonwords composed of infrequent phoneme combinations (the phonotactic frequency effect; e.g., Gathercole, Frankish, Pickering, & Peaker, 1999; Thorn, Gathercole, & Frankish, 2005).

To date, most studies have focused on phonemic elements. However, suprasegmental aspects are also important and obligatory components of the phonological word form, and in some languages the accent pattern acts as a distinctive lexical feature. For example, in Japanese, the word /HA-shi/ “chopsticks” has a high-pitched first mora, whereas the word /ha-SHI/ “bridge” has a high-pitched second mora (capital letters represent high-pitched morae). These words have the same phoneme sequence but different accent patterns (i.e., they are pitch-accent minimal pairs). Thus, each Japanese word has its own specific accent pattern that helps to differentiate it from other words. Consequently, vocabulary learning requires acquisition of the lexical accent pattern as well as the other elements of the word form.

Few studies have focused, however, on the influence of accent pattern on pSTM processing. In the present study, we focused on two issues: first, how accent patterns are processed in pSTM, and second, how accent patterns and phoneme sequences interact in the short-term retention of nonwords. We investigated these issues by utilizing nonword stimuli in order to minimize the influence of preexisting lexical–semantic representations on performance. Before considering the handful of existing studies that have explored pitch accent, we briefly describe the nature of pitch accent in Japanese.

Japanese is considered to be a mora-timed (Kubozono, 1995) or mora-rhythm language (McQueen, Otake, & Cutler, 2001). A mora is a subsyllabic unit that takes one of the following forms: a vocalic nucleus (V), a vowel with an onset (CV or CCV), a nasal consonant (N) in syllable coda position, a geminate consonant (Q), or a long vowel (R). Another phonological aspect of Japanese is pitch accent. Unlike English stress (e.g., the vowel changes in “pro-duce'/pro'-duce”), Japanese pitch accent allows accent pattern changes without phonemic changes. In the standard theory of Japanese accent types (Kindaichi, 2001), Japanese accent patterns are categorized according to where the F0 contour drops within a word. For example, the tri-mora word /KA-ra-su/ “crow” has a high-pitched mora in the first position, and the F0 contour drops after the first mora. Thus, it is pronounced with a high–low–low pitch pattern (“type-1” pitch accent). The word /yu-MI-ya/ “bow and arrow” has a high-pitched mora in the second position, and the F0 contour drops after the second mora. It is pronounced low–high–low (“type-2” pitch accent). In contrast to these pitch-drop words, the word /sa-KA-NA/ “fish” has no F0 drop within the word and is pronounced low–high–high, which sounds almost flat (“flat” type pitch accent).
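The classification rule just described, which labels a word by the mora after which F0 falls, can be expressed as a short Python sketch. It reads the H/L pattern off the capitalization convention used in the examples above and is intended only for the tri-mora cases discussed here; the function names are illustrative, and, like the rule in the text, the sketch considers only drops within the word.

```python
from typing import Optional


def pitch_pattern(word: str) -> str:
    """Map the notation used in the text (capitals = high morae) to an H/L string."""
    return "".join("H" if mora.isupper() else "L" for mora in word.split("-"))


def drop_position(pattern: str) -> Optional[int]:
    """1-based index of the mora after which F0 falls (H followed by L), or None."""
    for i in range(len(pattern) - 1):
        if pattern[i] == "H" and pattern[i + 1] == "L":
            return i + 1
    return None


def accent_type(pattern: str) -> str:
    """type-k if F0 drops after mora k within the word; flat if it never drops."""
    position = drop_position(pattern)
    return "flat" if position is None else f"type-{position}"


for word in ("KA-ra-su", "yu-MI-ya", "sa-KA-NA"):
    pattern = pitch_pattern(word)
    print(word, pattern, accent_type(pattern))
# KA-ra-su HLL type-1
# yu-MI-ya LHL type-2
# sa-KA-NA LHH flat
```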

Previous work (Sato, 1993) established differences in type frequency between accent types using a Japanese accent dictionary compiled in 1981 but did not consider token frequency.Footnote 1 Ueno and colleagues (2014) computed the frequency of each pitch-accent type with respect to log-transformed token frequency for all 21,271 tri-mora nouns (after removing duplicates) listed in the NTT database (Amano & Kondo, 1999).Footnote 2 These studies showed that the most frequent accent type for tri-mora nouns is flat, followed by type-1, whereas type-2 is the least common. Accordingly, like many other linguistic features across languages (cf. Seidenberg & McClelland, 1989), Japanese pitch accent has a quasiregular structure (Ueno et al., 2014).
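To make the type/token distinction concrete, the following Python sketch tallies accent types both ways for a toy lexicon. The words and counts are invented placeholders, not corpus values, and summing log10(count + 1) per accent type is only one plausible reading of "frequency with respect to log-transformed token frequency"; the exact transformation used by Ueno et al. (2014) is an assumption here.

```python
import math
from collections import defaultdict


def accent_type_frequencies(lexicon):
    """Tally accent types by type count and by summed log token frequency.

    lexicon: iterable of (word, accent_type, token_count) tuples.
    Returns two dicts keyed by accent type: the number of words (type
    frequency) and the sum of log10(token_count + 1) over those words.
    The +1 is an assumption so that zero-count entries contribute 0.
    """
    type_counts = defaultdict(int)
    log_token_sums = defaultdict(float)
    for _word, accent, count in lexicon:
        type_counts[accent] += 1
        log_token_sums[accent] += math.log10(count + 1)
    return dict(type_counts), dict(log_token_sums)


# Toy lexicon with invented placeholder words and counts (not corpus data).
toy = [
    ("w1", "flat", 99),
    ("w2", "flat", 9),
    ("w3", "type-1", 999),
    ("w4", "type-2", 0),
]
types, tokens = accent_type_frequencies(toy)
print(types)   # {'flat': 2, 'type-1': 1, 'type-2': 1}
print(tokens)
```

In this toy case flat leads on type count, but a single high-frequency type-1 word ties it on the log-token measure, which is why the two measures can rank accent types differently.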

One developmental study (Yuzawa, 2002) showed an influence of accent pattern congruency on pSTM. Yuzawa manipulated the congruency of the accent patterns applied to real words and assessed two age groups of children (3–4 and 5–6 years old). The children’s memory span for real words (as measured by whether the recall of an item was phonemically accurate, no matter what accent pattern was applied to it in recall) was reduced when items were presented with incongruent rather than congruent accent patterns. However, this accent congruency effect was observed only in the younger children (3–4 years old). In addition, children in both age groups tended to correct incongruent accent patterns in recall (accent pattern correction; 65% of incongruent accent pattern items for younger children, and 83% for older children, were corrected, though the effect of age was not significant, probably because they were not instructed to correct the accent patterns). This effect of accent pattern congruency on repetition accuracy was also observed in a study of adults that employed four-mora words (Minematsu & Hirose, 1995). These results indicate the contribution of accent pattern knowledge to phonological short-term processing, which can be accounted for by mechanisms on two levels: the phonological level (Yuzawa, 2002) and the lexical–semantic level (Ueno, 2012; Ueno et al., 2014).

At the phonological level, the negative effect of an incongruent accent pattern arises from the mismatch between the word’s phoneme sequence and its accent pattern, which violates their usual combination (Yuzawa, 2002). The absence of the accent congruency effect in older children could be due to the greater robustness of their phonemic representations and their resulting higher tolerance of stimulus degradation. In addition, the lexical–semantic level may also contribute to the observed effect (Ueno, 2012; Ueno et al., 2014). Younger children have an inflexible or weak link between phonological and semantic representations and/or less-developed semantic representations. In this situation, incongruent accent patterns act as a degraded input to the developing semantic system and thereby weaken the lexical–semantic contribution to short-term memory. Because older children have more linguistic experience, they are more likely to activate the lexical–semantic representation even from a degraded input.

Ueno and colleagues (Ueno, 2012; Ueno et al., 2014) investigated the semantic mechanism and its interaction with pitch-accent congruency through a combination of empirical investigations and an implemented computational model. These investigations were based on the notion that phonological forms are supported not only by phonological co-occurrence statistics but also by the automatic interaction between phonology and semantics (Jefferies et al., 2006; Patterson et al., 1994). The phonological system captures the quasiregular statistics (e.g., phonotactic probabilities, pitch-accent patterns) present in the language (cf. Seidenberg & McClelland, 1989), and accordingly, high-frequency/typical word forms are processed more efficiently and effectively than low-frequency/atypical patterns. The interaction between phonology and semantics occurs for all words, but it should be especially important for maintaining the integrity of phonological activation for the intrinsically weaker low-frequency/atypical items (Jefferies et al., 2006).

Ueno and colleagues (Ueno, 2012; Ueno et al., 2014) tested this idea empirically through a series of immediate serial recall experiments. In one experiment, Japanese tri-mora words were selected in order to manipulate word frequency (high, low) and accent pattern congruency (congruent, incongruent). A greater effect of accent pattern congruency was found for low-frequency words than for high-frequency words, suggesting that the accent pattern congruency effect is modulated by lexical–semantic factors. The influence of semantics on pitch-accent effects in serial recall was tested more directly in a second experiment, which manipulated word frequency (high, low), imageability (high, low), accent pattern congruency (congruent, incongruent), and accent pattern typicality (typical flat, moderately typical type-1, atypical type-2). As predicted, the effect of semantic factors (i.e., the imageability by word frequency interaction) was stronger for atypical type-2 accent words than for the more typical flat and type-1 accent words. In addition, the contribution of long-term lexical prosodic knowledge and the underlying quasiregular statistical structure was also observed in the production of pitch-accent “regularization” errors, which reflect the typicality of the accent pattern. For example, among accent pattern errors in the congruent condition, words presented with a type-1 accent tended to be recalled with the more typical flat accent, not the atypical type-2 accent. These key features were also simulated in an implemented model of spoken language that included mechanisms for interaction between phonological and semantic processing (Ueno, 2012; Ueno et al., 2014), based on the model architecture of Ueno, Saito, Rogers, and Lambon Ralph (2011).

Predictions

Previous explorations of pitch accent in serial recall have focused primarily on the interaction between lexical–semantic representations and phonological structure. However, both pitch-accent and phonotactic statistics should be coded at the phonological level, so their influence should be present even without interaction with, or support from, lexical–semantic representations. To test this hypothesis, the present series of experiments employed immediate serial recall of nonwords and explored the influence of two types of phonological statistics. First, to replicate and extend previous explorations of phonotactic frequency (conducted in studies of English: cf. Gathercole et al., 1999; Thorn et al., 2005), we manipulated this psycholinguistic dimension in Japanese nonwords. Second, to test for the first time the influence of pitch-accent statistics at the purely phonological level, we also varied the pitch-accent pattern applied to the nonwords. We predicted that (a) as in the previous English experiments, nonwords composed of high-frequency phonotactic elements would be recalled more accurately than phonotactically low-frequency items; (b) items presented with a typical, more common pitch-accent pattern (flat > type-1 > type-2) would be better recalled in terms of both phonemic and accent-pattern accuracy,Footnote 3 and pitch-accent “regularization” errors would be observed more often for the atypical-pitch-accent items; and (c) these two phonological factors would interact, given that both are encapsulated within the phonological system (e.g., accent effects would be weaker for items with high phonotactic frequency).

Finally, accent pattern may also influence phonemic accuracy in another way: Presentation with a flat pattern may result in lower phonemic accuracy than presentation with type-1 and type-2 patterns. Nonwords presented with a flat accent have many neighbors in long-term memory that share the same accent type. Previous studies (Sekiguchi, 2006; Sekiguchi & Nakajima, 1999) found that the activation of phonemic neighbors during recognition was constrained by accent type, suggesting that the flat pattern, shared by many words, may constrain neighbor activation more weakly than the other accent types. Indeed, Ueno and colleagues (Ueno, 2012; Ueno et al., 2014) reported that participants showed the weakest recall performance for the most typical (flat) accent words in the high-frequency/low-imageability condition, in which the common recall errors were intrusions of real words that had not been presented in the list but shared their accent pattern with the target items. Although a large phonemic-neighborhood size is known to facilitate short-term memory performance for words (Roodenrys, Hulme, Lethbridge, Hinton, & Nimmo, 2002) and for nonwords (Thorn & Frankish, 2005), a large accent-type neighborhood might impair phoneme recall performance.

To test these predictions, we conducted three immediate serial recall experiments on nonwords, which differed in their instructions and/or the number of items per list. Experiments 1 and 2 employed three-item lists, whereas Experiment 3 employed four-item lists. In Experiments 2 and 3, the participants were asked to recall both parts of each nonword’s phonological form, that is, both the phonemic sequence and the pitch-accent pattern. In Experiment 1, in contrast, the accent pattern of the targets was not explicitly emphasized in the instructions, as is standard in immediate serial recall procedures. Each Results section reports phonemic and accent accuracy and errors separately. In additional supplementary analyses, we investigated how the short-term representations of phoneme sequences and accent patterns interact.

Experiment 1

Method

Design

The experiment had a 2 (phonotactic frequency: high, low) × 3 (accent type: flat, type-1, type-2) repeated-measures factorial design. Both factors were manipulated within participants.

Participants

A group of 24 university students participated (12 females, 12 males). All were native Japanese speakers whose ages ranged from 18 to 29 years old, with the average age being 21.42 years.

Materials

All nonwords were tri-moraic sequences. In the present study, a nonword was defined as a phoneme sequence that is neither a word nor part of a longer word in the Japanese word frequency corpus employed (Amano & Kondo, 2000). All morae in the nonword stimuli had a CV structure, derived from legal combinations of Japanese vowels (a, i, u, e, and o) and consonants (k, s, sh, t, ch, ts, n, h, f, m, y, r, and w). Within a nonword item, the same consonant did not appear in successive morae.
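The two formal constraints above (no repeated consonant across successive morae; not a word or part of a longer word) can be sketched as a filter over candidate sequences. This is a hypothetical Python illustration: the legal CV inventory and the lexicon format (each word as a tuple of mora strings) are assumptions supplied by the caller, not the authors’ materials or code.

```python
from itertools import product


def contains_run(word, seq):
    """True if the mora tuple `seq` occurs as a contiguous run inside `word`."""
    n = len(seq)
    return any(tuple(word[i:i + n]) == seq for i in range(len(word) - n + 1))


def candidate_nonwords(morae, lexicon):
    """Enumerate tri-mora CV sequences that satisfy the constraints in the text.

    morae:   list of (consonant, vowel) pairs, assumed to be legal Japanese CV morae.
    lexicon: list of words, each a tuple of mora strings (hypothetical corpus format).
    """
    results = []
    for seq in product(morae, repeat=3):
        consonants = [c for c, _ in seq]
        # The same consonant may not appear in successive morae.
        if consonants[0] == consonants[1] or consonants[1] == consonants[2]:
            continue
        form = tuple(c + v for c, v in seq)
        # Reject real words and sequences embedded in longer words.
        if any(contains_run(word, form) for word in lexicon):
            continue
        results.append(form)
    return results


morae = [("k", "a"), ("t", "e"), ("k", "u")]  # tiny illustrative inventory
print(candidate_nonwords(morae, [("a", "ka", "te", "ku", "ra")]))
```

Comparing mora tuples rather than raw romanized strings avoids spurious matches across mora boundaries (e.g., a string search would find su inside tsu).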

The phonotactic frequency of each nonword was calculated using the method proposed by Tamaoka and Makioka (2004), who computed the frequencies of all Japanese bi-mora sequences from the same Japanese corpus cited above. The phonotactic frequency of a CVCVCV nonword was defined as the sum of the frequencies of its initial–middle and middle–final bi-mora sequences. Nonwords whose summed phonotactic frequency was 5,000 or less were defined as phonotactically low-frequency nonwords. A nonword was defined as phonotactically high-frequency if both its initial and its final bi-mora frequencies were 25,000 or above (so that its phonotactic frequency was 50,000 or above). For the experiments, 235 high- and 244 low-frequency nonwords were selected and grouped such that, within each phonotactic frequency group, no bi-mora sequence was repeated across items.
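The classification rule can be written out as follows. This Python sketch assumes a dictionary of bi-mora corpus counts keyed by mora pairs (a hypothetical format) and treats a missing pair as a count of zero; nonwords that meet neither criterion are left unclassified, because the text defines no category for them.

```python
def classify_phonotactic(morae, bimora_freq, low_max=5_000, high_min=25_000):
    """Classify a tri-mora nonword by summed bi-mora frequency.

    morae:       3-tuple of mora strings, e.g. ("ka", "te", "ku").
    bimora_freq: dict mapping (mora, mora) pairs to corpus counts (hypothetical format);
                 pairs absent from the dict are treated as count 0.
    Returns (label, summed_frequency), where label is "low", "high", or None.
    """
    initial = bimora_freq.get((morae[0], morae[1]), 0)  # initial-middle bi-mora
    final = bimora_freq.get((morae[1], morae[2]), 0)    # middle-final bi-mora
    total = initial + final
    if total <= low_max:
        return "low", total
    if initial >= high_min and final >= high_min:
        return "high", total  # implies total >= 2 * high_min = 50,000
    return None, total        # between the two criteria: not used


# Invented counts, for illustration only.
print(classify_phonotactic(("ka", "te", "ku"), {("ka", "te"): 30_000, ("te", "ku"): 26_000}))
# ('high', 56000)
```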

Recording and sound editing

Each nonword was digitally recorded in three pitch-accent patterns, flat, type-1, and type-2, by a male Japanese speaker. All 1,437 sound files (479 phoneme sequences × three accent types) were edited with Adobe Soundbooth CS4. Each item was extracted from the audio file and then noise-canceled at a reduction level of 80% and 25 dB. The duration of each item was time-stretched to 700 ms and the amplitudes of all files were equalized to match a selected benchmark file. Finally, to assure the prosodic and phonemic quality of the materials, these edited audio files were tested by means of a dictation and accent-type assessment.
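The amplitude equalization step was performed in Adobe Soundbooth, and the text does not say which amplitude measure was matched. As a hedged illustration only, the Python sketch below assumes RMS matching: it scales a signal so that its root-mean-square level equals that of a benchmark signal. Function names are illustrative, and the sketch does not guard against clipping.

```python
import math


def rms(samples):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def equalize_to_benchmark(samples, benchmark):
    """Scale `samples` so that their RMS level matches that of `benchmark`.

    Silent input is returned unchanged; no clipping protection is applied.
    """
    current = rms(samples)
    if current == 0:
        return list(samples)
    gain = rms(benchmark) / current
    return [s * gain for s in samples]


benchmark = [0.5, -0.5, 0.5, -0.5]
quiet = [0.1, -0.1, 0.1, -0.1]
print(rms(equalize_to_benchmark(quiet, benchmark)))
```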

Dictation and accent-assessment test

All 1,437 audio files were presented in random order through headphones, and ten Japanese speakers transcribed them by dictation. Only files dictated with 100% accuracy were retained. Five Japanese speakers who had not participated in the dictation test then judged the accent type (flat, type-1, or type-2) of these files. At any time during the test, these assessors could listen to model files for each accent type from the Japanese corpus (Amano & Kondo, 1999). Only files whose accent type was judged correctly by four or more of the five assessors were retained. Ultimately, 216 nonword items were selected as stimuli (36 phonotactically high- and 36 low-frequency nonwords, each recorded in all three accent types). The stimuli and their phonotactic, initial, and final bi-mora frequencies are listed in Appendix A.
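The two retention criteria can be sketched as a simple predicate, followed by a check that keeps only phoneme sequences for which all three accent versions survived (a necessary condition for a final set of 72 sequences × 3 accent types). Function and argument names are hypothetical, and the sketch does not cover the subsequent selection of 36 sequences per frequency group.

```python
ACCENTS = ("flat", "type-1", "type-2")


def retain_file(dictations, target, accent_votes, intended_accent, min_agree=4):
    """Apply the two screening criteria to one recorded file.

    dictations:   transcriptions from the dictation panel (ten listeners in the text).
    accent_votes: accent labels from the assessment panel (five listeners in the text).
    """
    # Criterion 1: every dictation must reproduce the target phoneme sequence.
    if not dictations or any(d != target for d in dictations):
        return False
    # Criterion 2: at least `min_agree` assessors must identify the intended accent.
    return sum(v == intended_accent for v in accent_votes) >= min_agree


def complete_sequences(retained):
    """Return sequences whose flat, type-1, and type-2 recordings were all retained.

    retained: set of (sequence, accent_type) pairs that passed retain_file.
    """
    sequences = {seq for seq, _ in retained}
    return sorted(s for s in sequences if all((s, a) in retained for a in ACCENTS))
```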

Procedure

Stimuli were divided into three blocks. Each phonemic sequence (e.g., ka-te-ku) appeared in all three blocks, with a different accent type in each. Each block contained all 72 phonemic sequences, with an equal number of each accent type. For example, the nonword ka-te-ku was included in Block A with a flat accent, in Block B with a type-1 accent, and in Block C with a type-2 accent, whereas the nonword ka-to-ke was included in Block A with a type-2 accent, in Block B with a flat accent, and in Block C with a type-1 accent, and so forth.
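The rotation in this example is a 3 × 3 Latin square. The Python sketch below assumes each sequence is given an offset k and receives accent type (k + b) mod 3 in block b; offsets 0 and 2 reproduce the ka-te-ku and ka-to-ke examples. Cycling offsets across sequences (0, 1, 2, 0, ...) is one assumed way to obtain equal numbers of each accent type per block; the authors’ actual assignment procedure is not described.

```python
from collections import Counter

ACCENTS = ("flat", "type-1", "type-2")
BLOCKS = ("A", "B", "C")


def assign_blocks(sequences, offsets):
    """Latin-square rotation: a sequence with offset k gets ACCENTS[(k + b) % 3] in block b."""
    blocks = {name: [] for name in BLOCKS}
    for seq, k in zip(sequences, offsets):
        for b, name in enumerate(BLOCKS):
            blocks[name].append((seq, ACCENTS[(k + b) % 3]))
    return blocks


# Offsets 0 and 2 reproduce the two examples in the text.
example = assign_blocks(["ka-te-ku", "ka-to-ke"], [0, 2])
for name in BLOCKS:
    print(name, example[name])

# With 72 sequences and cycling offsets, each block holds 24 items of each accent type.
full = assign_blocks([f"seq{i}" for i in range(72)], [i % 3 for i in range(72)])
print(Counter(accent for _, accent in full["A"]))
```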

The task was nonword immediate serial recall. Three items were presented auditorily, one after another, through headphones. Each item lasted 700 ms and was followed by a 1,000-ms silent interval. The three items within a list were from the same phonotactic frequency group (i.e., high or low), but each carried a different accent type (i.e., flat, type-1, and type-2). The serial order of the accent patterns was counterbalanced across participants, and the order of lists was randomized. The order of blocks was counterbalanced across participants.

Participants were instructed to recall the items orally, in the same order as presented, immediately after the presentation. They were asked to respond to all three items even if they had forgotten them, but they were not explicitly asked to recall the pitch-accent pattern of each item correctly. Before the test, they were given three practice lists. The experiment was administered on a MacBook Pro laptop computer (MB990J/A) with a 2.26-GHz processor, running Mac OS X 10.6.5 (10H574) and PsyScope (Cohen, MacWhinney, Flatt, & Provost, 1993).

During the task, the first author transcribed the recalled phonemes and categorized the recalled accent patterns online, and the responses were audio-recorded with the laptop’s built-in microphone using QuickTime Player. After the experiment, the first author checked the transcribed responses against the recordings, and a third party, an expert in linguistics, transcribed the phonemes and accent patterns of all recorded responses. Between-transcriber agreement was calculated on the first third (33%) of the responses. Agreement was 98.28% (10,190/10,368) for phonemes and 96.41% (1,666/1,728) for accent patterns, so the transcriptions of phonemes and accent patterns were judged to be reliable.
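The agreement figures can be checked arithmetically. Each participant heard 216 items, each item has six phonemes (CVCVCV), and agreement was computed on the first third of the responses. The Python sketch below derives the denominators and percentages under those assumptions; the helper names are illustrative. The same arithmetic applies to Experiment 2 (19 analyzed participants, first third) and Experiment 3 (24 participants, all responses).

```python
PHONEMES_PER_ITEM = 6       # CVCVCV nonwords
ITEMS_PER_PARTICIPANT = 216


def agreement_denominators(n_participants, divisor=3):
    """Items and phonemes in the checked subset (divisor=3: first third; 1: all)."""
    items = n_participants * ITEMS_PER_PARTICIPANT // divisor
    return items, items * PHONEMES_PER_ITEM


def percent_agreement(agreed, total, ndigits=2):
    """Percentage of agreeing judgments, rounded as in the text."""
    return round(100 * agreed / total, ndigits)


items, phonemes = agreement_denominators(24)   # Experiment 1
print(items, phonemes)                         # 1728 10368
print(percent_agreement(10_190, phonemes))     # 98.28
print(percent_agreement(1_666, items))         # 96.41
```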

Results

In accordance with previous studies (Ueno, 2012; Ueno et al., 2014; Yuzawa, 2002), we employed three indices to examine the results: phoneme accuracy score, pitch-accent pattern accuracy score, and accent pattern error score (type of error). The first index reflects whether short-term retention of phoneme sequences was successful, irrespective of the accent pattern accuracy. Likewise, the pitch-accent accuracy score was independent of the phonemic accuracy.
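A minimal Python sketch of the three indices, assuming each response is transcribed as a list of phonemes plus an accent label. Counting a phoneme as correct only when it occupies the same position as in the target is an assumption; the text says only that accurately recalled phonemes were counted. Following the definition of the accent pattern error score, an item contributes an accent error only when all six phonemes are correct but the accent is not.

```python
def phoneme_score(target, response):
    """Number of target phonemes reproduced in the same position (assumed criterion)."""
    return sum(t == r for t, r in zip(target, response))


def score_item(target_phonemes, target_accent, resp_phonemes, resp_accent):
    """Return (phoneme score, accent correct?, accent error or None) for one item.

    The two accuracy indices are computed independently of each other. An accent
    error (presented, recalled) is recorded only when the phonemes are fully correct.
    """
    n_correct = phoneme_score(target_phonemes, resp_phonemes)
    accent_correct = resp_accent == target_accent
    phonemes_perfect = (n_correct == len(target_phonemes)
                        and len(resp_phonemes) == len(target_phonemes))
    accent_error = None
    if phonemes_perfect and not accent_correct:
        accent_error = (target_accent, resp_accent)
    return n_correct, accent_correct, accent_error


target = ["k", "a", "t", "e", "k", "u"]  # ka-te-ku
print(score_item(target, "type-2", target, "flat"))  # (6, False, ('type-2', 'flat'))
```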

Phoneme accuracy score

We counted the number of accurately recalled phonemes. The rates of phoneme accuracy for each level are shown in Fig. 1, and the results of a two-way analysis of variance (ANOVA) are shown in Table 1. We found a significant main effect of phonotactic frequency, with better performance for phonotactically high-frequency than for low-frequency nonwords. A significant main effect of accent type was also found. Multiple comparisons (Shaffer’s method) confirmed that flat presentation yielded lower performance than type-1 presentation [t1(23) = 3.27, adj. p = .01, d = 0.19; t2(70) = 2.61, adj. p = .03, d = 0.32] and than type-2 presentation, although the latter difference was significant in the by-subject analysis only [t1(23) = 3.09, adj. p = .01, d = 0.14; t2(70) = 1.81, adj. p = .08, d = 0.23]. We observed no significant difference between type-1 and type-2 presentation [t1(23) = 1.03, adj. p = .31, d = 0.06; t2(70) = 0.76, adj. p = .45, d = 0.10]. The interaction between phonotactic frequency and accent type was not significant.
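The statistics above come from two analyses: by subject (t1, F1), treating participants as the random factor, and by item (t2, F2), treating the 72 phoneme sequences as the random factor. With phonotactic frequency varying between items, a by-item error df of 70 (72 items minus 2 groups) is consistent with this layout. The Python sketch below shows only the aggregation step that produces the two tables of cell means; the trial record format is hypothetical, and the ANOVAs and Shaffer-corrected comparisons themselves are not reproduced.

```python
from collections import defaultdict
from statistics import mean


def cell_means(trials, unit):
    """Average trial scores into one value per unit and design cell.

    trials: iterable of dicts with keys 'subject', 'item', 'freq', 'accent', 'score'.
    unit:   'subject' for the by-subject table (F1, t1) or 'item' for the
            by-item table (F2, t2), where 'item' is the phoneme sequence.
    """
    cells = defaultdict(list)
    for t in trials:
        cells[(t[unit], t["freq"], t["accent"])].append(t["score"])
    return {key: mean(scores) for key, scores in cells.items()}


trials = [
    {"subject": "s1", "item": "i1", "freq": "high", "accent": "flat", "score": 6},
    {"subject": "s1", "item": "i2", "freq": "high", "accent": "flat", "score": 4},
    {"subject": "s2", "item": "i1", "freq": "high", "accent": "flat", "score": 5},
]
print(cell_means(trials, "subject"))
print(cell_means(trials, "item"))
```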

Fig. 1

Rates of phoneme accuracy in each experiment. (Error bars represent SEs.)

Table 1 Outcomes of the ANOVAs for phoneme accuracy in each experiment

Accent pattern accuracy score

The rates of accent pattern accuracy are shown in Fig. 2, and the results of a two-way ANOVA are shown in Table 2. We found a significant main effect of accent type. The results of multiple comparisons (Shaffer’s method), shown in Table 3, confirmed that type-2, the most atypical accent type, was recalled less accurately than both the flat pattern, the most typical one, and the type-1 accent. The difference between the type-1 accent and the flat pattern was not significant. Neither the main effect of phonotactic frequency nor its interaction with accent type was significant.

Fig. 2

Rates of accent pattern accuracy in each experiment. (Error bars represent SEs.)

Table 2 Outcomes of the ANOVAs for accent pattern accuracy in each experiment
Table 3 Outcomes of the multiple-comparison analyses for the main effect of accent pattern on accent pattern accuracy in each experiment

Accent pattern error score

Error responses were defined as responses in which all six phonemes were recalled accurately but the accent pattern was incorrect. We categorized accent errors into six patterns (three presented accent types × two possible incorrect accent types) and defined the extent/strength of regularization according to the direction and distance of the accent change. For example, a type-2 accent can change into either a flat or a type-1 accent, and the change into flat is the strongest regularization. We hypothesized (see the introduction) that accent errors would tend to reflect the typicality of the accent patterns; thus, errors moving toward the most typical (flat) pattern would be frequent, whereas errors moving toward the most atypical (type-2) accent would be infrequent.
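One way to encode the direction and distance of an accent change is to rank the accent types by typicality (flat > type-1 > type-2) and take the signed difference in rank. In this Python sketch, a positive value marks regularization, a negative value irregularization, and the magnitude the distance moved; the numeric coding is an illustrative assumption, not the authors’ scoring scheme.

```python
from itertools import permutations

# Typicality ranking from the text: flat (most typical) > type-1 > type-2 (least typical).
TYPICALITY = {"flat": 0, "type-1": 1, "type-2": 2}


def regularization_shift(presented, recalled):
    """Signed shift in typicality: positive = regularization, negative = irregularization.

    The absolute value is the distance moved along the ranking (1 or 2).
    """
    return TYPICALITY[presented] - TYPICALITY[recalled]


# The six possible accent pattern errors (presented -> recalled, presented != recalled).
ERROR_TYPES = list(permutations(TYPICALITY, 2))

for presented, recalled in ERROR_TYPES:
    print(f"{presented} -> {recalled}: {regularization_shift(presented, recalled):+d}")
```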

To investigate whether this was the case, we counted each accent pattern error, collapsing across phonotactic frequency levels (given the different numbers of phonemically accurate trials across levels). Table 4 (top line) shows the frequency of each accent pattern error. A chi-squared test found significant differences between the error types [χ²(5) = 222.21, p < .05]. The outcomes of multiple comparisons (Ryan’s method), shown in Table 5, confirmed regularization of the accent pattern. The strongest regularization error, “type-2 → flat,” was significantly more frequent than all other errors, and a moderate regularization error, “type-2 → type-1,” was significantly more frequent than the remaining errors. In addition, the error “flat → type-1” was more frequent than errors moving toward the most atypical accent, type-2 (i.e., “flat → type-2” and “type-1 → type-2”). The only counter pattern was that the regularization error “type-1 → flat” was less frequent than the irregularization error “flat → type-1.”Footnote 4
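With six error categories the test has 6 − 1 = 5 degrees of freedom, which is consistent with a goodness-of-fit test over the six error types. The text does not state the expected frequencies; the Python sketch below assumes the usual null of equal frequencies across categories and compares the statistic with the χ² critical value for df = 5 at α = .05 (about 11.07). The counts used are hypothetical, since Table 4 is not reproduced here.

```python
CRITICAL_05_DF5 = 11.0705  # upper 5% point of the chi-squared distribution with 5 df


def chi_square_uniform(counts):
    """Goodness-of-fit statistic against equal expected counts; returns (chi2, df)."""
    n = sum(counts)
    expected = n / len(counts)
    chi2 = sum((observed - expected) ** 2 / expected for observed in counts)
    return chi2, len(counts) - 1


# Hypothetical counts for the six error types (not the values in Table 4).
chi2, df = chi_square_uniform([60, 0, 0, 0, 0, 0])
print(chi2, df, chi2 > CRITICAL_05_DF5)  # 300.0 5 True
```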

Table 4 Frequencies of each accent pattern error in each experiment
Table 5 Outcomes of multiple-comparison analyses on accent pattern errors

Summary of results

The main results of Experiment 1 were as follows: (1) Phonotactically high-frequency nonwords were recalled more accurately than low-frequency ones, as measured by phoneme accuracy. (2) The most atypical accent pattern (type-2) was recalled less accurately than the more typical ones (flat and type-1), as measured by accent pattern accuracy. (3) The most frequent accent pattern errors were changes from the most atypical type (type-2) to the most typical one (flat). (4) Phonotactic frequency did not affect short-term retention of the accent pattern. (5) The phonemes of nonwords presented with a flat accent were recalled less accurately than those of nonwords presented with type-1 and type-2 accents, although the difference from type-2 was significant only in the by-subject analysis.

Experiment 2

Although we manipulated pitch-accent patterns and scored accent pattern accuracy and errors in Experiment 1, the participants were not explicitly instructed to retain the accent patterns. Experiment 2, in contrast, required the participants to retain not only the phoneme sequences but also the accent patterns. The procedure and all other materials were identical to those in Experiment 1, except for the instructions. Participants were instructed to recall each item, including both its phoneme sequence and its accent pattern, immediately after the presentation and in the same order as presented. All three accent patterns (flat, type-1, and type-2) were included in each list, and their serial order was completely randomized. In all, 24 university students participated (13 females, 11 males). All were native Japanese speakers whose ages ranged from 18 to 25 years, with an average age of 20.29 years.

Results

All three indices (phoneme accuracy, accent pattern accuracy, and accent pattern errors) were defined in the same way as in Experiment 1. Data from five participants were removed because, owing to a programming error, their item presentation orders had been generated from the same random seed. The scoring agreement between transcribers, calculated on the initial third of all responses (as described for Exp. 1), was 98.32% (8,076/8,208) for phonemes and 97.66% (1,336/1,368) for accent types.

Phoneme accuracy score

Rates of phoneme accuracy for each level are shown in Fig. 1, and the results of a two-way ANOVA are shown in Table 1. We found a significant main effect of phonotactic frequency, with better performance for phonotactically high-frequency than for low-frequency nonwords. Although the main effect of accent type was not significant, the interaction between phonotactic frequency and accent type was significant in the by-subject analysis and marginally significant in the by-item analysis. This interaction reflected the fact that the effect of accent type was significant only for phonotactically low-frequency nonwords [F1(2, 36) = 4.99, p = .01, ηp² = .22; F2(2, 70) = 3.80, p = .03, ηp² = .10], not for phonotactically high-frequency nonwords [F1(2, 36) = 0.48, p = .62, ηp² = .03; F2(2, 70) = 0.34, p = .71, ηp² = .01]. Multiple comparisons (Shaffer’s method) for the phonotactically low-frequency nonwords revealed that items with a type-1 accent were recalled more accurately than those with a flat accent [t1(18) = 3.10, adj. p = .02, d = 0.37; t2(35) = 3.19, adj. p = .01, d = 0.62] and than those with a type-2 accent, although the latter difference was significant only in the by-subject analysis [t1(18) = 2.53, adj. p = .02, d = 0.23; t2(35) = 1.69, adj. p = .10, d = 0.38]. The difference between flat and type-2 accents was not significant [t1(18) = 1.01, adj. p = .33, d = 0.13; t2(35) = 0.93, adj. p = .36, d = 0.23]. The simple main effects of phonotactic frequency were significant in all accent type conditions [F1s(1, 18) > 14.39, ps < .01, ηp²s > .44; F2s(1, 70) > 7.19, ps < .01, ηp²s > .09].

Accent pattern accuracy score

The accent pattern accuracy for each level is shown in Fig. 2, and the results of a two-way ANOVA are shown in Table 2. We found a significant main effect of accent type. The results of multiple comparisons (Shaffer’s method), shown in Table 3, confirmed that the most typical accent, flat, was recalled more accurately than both the most atypical accent, type-2, and the moderately typical accent, type-1. In the by-item analysis, type-1 accents were also recalled significantly more accurately than type-2 accents. In addition, a main effect of phonotactic frequency was found, with better performance in the phonotactically high-frequency condition than in the low-frequency condition. The interaction was not significant.

Accent pattern error score

Table 4 shows the frequencies of each accent pattern error. A chi-squared test revealed significant differences between error types [χ²(5) = 37.56, p < .05]. The outcomes of multiple comparisons (Ryan’s method), shown in Table 5, indicated regularization of the accent pattern. The strongest regularization error, “type-2 → flat,” was more frequent than the “type-1 → flat,” “flat → type-1,” and “flat → type-2” errors. The moderate regularization error “type-2 → type-1” was more frequent than the irregularization error “flat → type-1.”

Summary of results

Experiment 2 replicated the main results of Experiment 1, including (1) the phonotactic frequency effect on phoneme accuracy, (2) the effect of accent pattern typicality on accent pattern accuracy, and (3) regularization errors as the most common type of accent pattern error. Unlike in Experiment 1, we also found (4) a phonotactic frequency effect on accent accuracy: Accent patterns applied to phonotactically high-frequency sequences were recalled more accurately than those applied to phonotactically low-frequency sequences. Finally, (5) accent type affected phoneme accuracy only for phonotactically low-frequency nonwords, for which items presented with a type-1 accent were recalled more accurately than those presented with a flat accent and, in the by-subject analysis only, than those presented with a type-2 accent.

Experiment 3

The two previous experiments established the effects of accent pattern typicality and phonotactic frequency on the efficiency of pSTM. However, because the participants’ accuracy was generally high, these two experiments did not elicit many errors. To examine further the interaction between the short-term representations of phoneme sequences and accent patterns, we needed to analyze trials in which the phonemic or accent representations had deteriorated. In Experiment 3, therefore, we increased the memory load by employing four-item lists. The procedure and all other materials were almost identical to those used in Experiment 2. The four items within a list were from the same phonotactic frequency group (i.e., high or low), but their accent types were not all the same: Three of the four items carried different accent patterns (flat, type-1, and type-2), and the fourth was assigned an accent type randomly, counterbalanced across lists. The serial order of the accent types and the order of the lists were randomized, and the order of blocks was counterbalanced across participants. Participants were instructed to recall both the phoneme sequence and the accent pattern of each item immediately after the presentation, in the order in which the items had been presented. In all, 24 university students participated (nine females, 15 males). All were native Japanese speakers whose ages ranged from 18 to 26 years, with an average age of 20.9 years.
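The list construction rule for Experiment 3 can be sketched as follows, assuming the items for one phonotactic group in one block arrive as (sequence, accent) pairs with equal numbers of each accent type (e.g., 12 of each among 36 sequences, giving nine lists). Cycling the extra accent type across lists is a hypothetical way to implement the counterbalancing; the authors’ exact scheme is not described.

```python
import random
from collections import Counter

ACCENTS = ("flat", "type-1", "type-2")


def four_item_lists(items, rng):
    """Group (sequence, accent) items from one phonotactic group into four-item lists.

    Each list contains one item of every accent type plus one extra item whose
    accent type cycles flat -> type-1 -> type-2 across lists (hypothetical
    counterbalancing scheme). Serial order within each list is shuffled.
    """
    if len(items) % 4:
        raise ValueError("number of items must be a multiple of 4")
    n_lists = len(items) // 4
    pools = {a: [seq for seq, acc in items if acc == a] for a in ACCENTS}
    for j, a in enumerate(ACCENTS):
        # One slot per list, plus the extra slots assigned to this accent type.
        needed = n_lists + sum(1 for k in range(n_lists) if k % 3 == j)
        if len(pools[a]) != needed:
            raise ValueError(f"need {needed} {a} items, got {len(pools[a])}")
        rng.shuffle(pools[a])
    lists = []
    for k in range(n_lists):
        extra = ACCENTS[k % 3]
        chosen = [(pools[a].pop(), a) for a in ACCENTS]
        chosen.append((pools[extra].pop(), extra))
        rng.shuffle(chosen)  # random serial order within the list
        lists.append(chosen)
    return lists


items = [(f"seq{i}", ACCENTS[i % 3]) for i in range(36)]
lists = four_item_lists(items, random.Random(0))
print(len(lists), Counter(accent for lst in lists for _, accent in lst))
```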

Results

All three indices (phonemic accuracy, accent pattern accuracy, and accent pattern error) were defined as in Experiments 1 and 2. Inter-scorer agreement, calculated over all responses, was 98.31% (30,577/31,104) for phonemes and 98.15% (5,088/5,184) for accent types.
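The agreement figures are simple proportions of scored units. The sketch below recomputes them and checks that the two denominators are consistent with six phoneme judgments and one accent judgment per recalled item. The per-participant figure is derived arithmetic and assumes every participant contributed the same number of items, which the text does not state explicitly.

```python
N_PARTICIPANTS = 24
PHONEMES_PER_ITEM = 6  # each nonword has six phonemes

# Scored units and scorer agreements reported for Experiment 3.
phoneme_units, phoneme_agreed = 31_104, 30_577
accent_units, accent_agreed = 5_184, 5_088


def percent_agreement(agreed, total):
    """Percentage of scored units on which the two scorers agreed."""
    return 100.0 * agreed / total


# One accent judgment per recalled item, but six phoneme judgments per item.
assert phoneme_units == accent_units * PHONEMES_PER_ITEM
items_per_participant = accent_units // N_PARTICIPANTS  # assumes equal contributions

print(f"phonemes: {percent_agreement(phoneme_agreed, phoneme_units):.2f}%")  # 98.31%
print(f"accents:  {percent_agreement(accent_agreed, accent_units):.2f}%")    # 98.15%
print(f"items per participant: {items_per_participant}")                     # 216
```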

Phoneme accuracy score

Phoneme accuracy rates for each condition are shown in Fig. 1, and the results of a two-way ANOVA are shown in Table 1. We found a significant main effect of phonotactic frequency, with better performance for phonotactically high-frequency nonwords than for low-frequency nonwords. However, neither the main effect of accent type nor the interaction between phonotactic frequency and accent type was significant.

Accent pattern accuracy score

Accent pattern accuracy rates are shown in Fig. 2, and the results of a two-way ANOVA are shown in Table 2. We found a significant main effect of accent type. The results of multiple comparisons (Shaffer’s method), shown in Table 3, confirmed that the most atypical accent pattern, type-2, was recalled less accurately than both the most typical pattern, flat, and type-1. The difference between flat and type-1 accents was not significant in the by-subject analysis but was significant in the item analysis, with higher performance for flat than for type-1 accents. A main effect of phonotactic frequency was also found, with better performance in the phonotactically high-frequency condition than in the low-frequency condition. The interaction between the two factors was not significant.

Accent pattern error score

Table 4 shows the frequency of each type of accent pattern error. A chi-squared test found significant differences between error types [χ²(5) = 77.61, p < .05]. The outcomes of multiple comparisons (Ryan’s method), shown in Table 5, matched our predictions: errors showed a strong tendency to shift toward a more typical accent type (“regularization”). The strongest regularization (type-2 → flat) was significantly more frequent than any other error except type-1 → flat, which is another regularization error.
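The error analysis can be sketched as follows. We assume that the chi-squared test compared the six presented → recalled error types against equal expected frequencies; this matches the reported 5 degrees of freedom but is not stated in the text. Table 4’s counts are not reproduced here, so any counts passed to `chi_square_uniform` are placeholders, and the p-value and Ryan’s-method follow-up comparisons are omitted (SciPy’s `scipy.stats.chisquare` returns the same statistic together with a p-value). The typicality ranking follows the text: flat is the most typical pattern and type-2 the most atypical.

```python
# Typicality order used in the text: flat (most typical) > type-1 > type-2 (most atypical).
TYPICALITY_RANK = {"flat": 0, "type-1": 1, "type-2": 2}

# The six possible presented -> recalled accent errors (hence 5 degrees of freedom).
ERROR_TYPES = [
    (presented, recalled)
    for presented in TYPICALITY_RANK
    for recalled in TYPICALITY_RANK
    if presented != recalled
]


def is_regularization(presented, recalled):
    """True when the recalled accent is more typical than the presented one."""
    return TYPICALITY_RANK[recalled] < TYPICALITY_RANK[presented]


def chi_square_uniform(counts):
    """Pearson goodness-of-fit statistic against equal expected counts.

    Returns (statistic, degrees_of_freedom).
    """
    total = sum(counts)
    expected = total / len(counts)
    statistic = sum((observed - expected) ** 2 / expected for observed in counts)
    return statistic, len(counts) - 1


for presented, recalled in ERROR_TYPES:
    label = "regularization" if is_regularization(presented, recalled) else "other"
    print(f"{presented} -> {recalled}: {label}")
```

Under this scheme, three of the six error types count as regularizations: type-1 → flat, type-2 → flat, and type-2 → type-1.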

Summary of results

We replicated the main phenomena: (1) a phonotactic frequency effect on phoneme accuracy, (2) a typicality effect on accent pattern accuracy, and (3) accent pattern regularization in the error analysis. In addition, (4) a phonotactic frequency effect on accent pattern accuracy was observed, as in Experiment 2. (5) Unlike in Experiments 1 and 2, we did not observe inferior phoneme recall for the flat accent pattern.

Supplementary analyses

Finally, we investigated the interaction of phoneme and accent representations by analyzing phoneme- and accent-correct and -incorrect trials. The Results sections of Experiments 1 to 3 summarized the data for phoneme and accent accuracy separately. Those analyses did not show how correct and incorrect accent responses were distributed among correctly and incorrectly recalled phoneme sequences, or how correct and incorrect phoneme recall was distributed among items with correctly and incorrectly recalled accent patterns. Table 6 shows the numbers of responses in each of the four response categories defined by our two scoring indices (phoneme accuracy and accent accuracy); a response was counted as phonemically correct only if all six phonemes were recalled correctly.

Table 6 Numbers of responses in four response categories in reference to two scoring methods (phoneme accuracy and accent accuracy) in each experiment
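The four response categories behind Table 6 amount to a two-by-two cross-classification. The sketch below is illustrative only: the nonwords and responses are hypothetical, each romanized letter is treated as one phoneme (a simplification), and only the whole-item criterion (all six phonemes correct, in order) is modeled, not the per-phoneme accuracy score used elsewhere.

```python
from collections import Counter


def classify_response(target_phonemes, recalled_phonemes, target_accent, recalled_accent):
    """Place one response into a phoneme-by-accent correctness cell.

    The phoneme check is strict: the whole sequence must match the target
    (all six phonemes, in order), mirroring the all-six-phonemes criterion.
    """
    phonemes_match = list(recalled_phonemes) == list(target_phonemes)
    phoneme_cell = "phoneme-correct" if phonemes_match else "phoneme-incorrect"
    accent_cell = "accent-correct" if recalled_accent == target_accent else "accent-incorrect"
    return phoneme_cell, accent_cell


def tabulate(responses):
    """Count responses in each of the four cells."""
    return Counter(classify_response(*response) for response in responses)


# Hypothetical responses: (target phonemes, recalled phonemes, target accent, recalled accent).
responses = [
    ("kemoni", "kemoni", "type-2", "type-2"),
    ("kemoni", "kemoma", "type-2", "flat"),
    ("sotaru", "sotaru", "flat", "type-1"),
]
print(tabulate(responses))
```

Conditioning on one cell dimension (e.g., keeping only accent-incorrect responses) gives the subsets used in the supplementary analyses.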

Using the data from the three experiments, we conducted four analyses: analyses of phoneme accuracy on (1) accent-correct trials and (2) accent-incorrect trials, and of accent pattern accuracy on (3) phoneme-correct trials and (4) phoneme-incorrect trials. Participants and items with missing values were excluded from the analyses. This series of analyses examined the influence of phoneme degradation on accent pattern representations and of accent degradation on phonemic representations in pSTM. Most of these additional analyses yielded results similar to those reported for each experiment (and are therefore reported in Appendix B); the exception was the analysis of phonemic retention in accent-incorrect trials, which is considered below.

The results for phonemic accuracy in accent-incorrect trials are shown in Fig. 3. For each of the three experiments, we conducted two-way ANOVAs (Phonotactic frequency × Accent type) on phonemic accuracy in accent-incorrect trials after angular transformation. Tables 7 and 8 show the outcomes of these ANOVAs and of the multiple comparisons for the main effect of accent type.
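The angular (arcsine-square-root) transformation is commonly applied to proportions before ANOVA because it stretches values near 0 and 1, where accuracy scores bunch up and variance shrinks. The sketch below uses the common form arcsin(√p) in radians; whether the authors used this variant, degrees, or the doubled form 2·arcsin(√p) is not stated, and all three are monotonic rescalings of one another.

```python
import math


def angular_transform(p):
    """Arcsine-square-root (angular) transform of a proportion p in [0, 1], in radians."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a proportion between 0 and 1")
    return math.asin(math.sqrt(p))


# A .05 difference near ceiling is stretched more than the same difference mid-range.
near_ceiling = angular_transform(0.95) - angular_transform(0.90)
mid_range = angular_transform(0.55) - angular_transform(0.50)
print(f"near ceiling: {near_ceiling:.4f} rad, mid-range: {mid_range:.4f} rad")
```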

Fig. 3 Phoneme accuracy rates in accent-incorrect trials. Error bars represent SEs. Accent type is categorized by the recalled accent pattern (i.e., the incorrectly produced accent).

Table 7 Outcomes of the ANOVAs for phoneme accuracy of accent-incorrect trials
Table 8 Outcomes of the multiple-comparison analyses for the main effect of accent pattern on phoneme accuracy in accent-incorrect trials

Results from the accent-incorrect trials, in which accent type was categorized by the recalled rather than the presented type, again showed a significant phonotactic frequency effect in all experiments, except that the by-item effect in Experiment 1 was only marginally significant. In addition, Experiment 3 showed a positive accent typicality effect: trials recalled with a flat accent showed more accurate phoneme retention than did trials recalled with type-1 or type-2 accents (significant in both the by-subject and by-item analyses). In Experiments 1 and 2, effects related to accent pattern were detected but were weak. The by-item analysis of Experiment 1 showed a main effect of accent type, and multiple comparisons revealed significantly higher performance for trials recalled with a flat accent than for trials recalled with a type-2 accent (see Footnote 5). Furthermore, the item analysis of Experiment 2 yielded an interaction: the phonotactic frequency effect was significant only in the type-2 condition [F₂(1, 32) = 14.39, p < .01, ηₚ² = .31], not in the flat [F₂(1, 32) < 0.01, p = .98, ηₚ² = .00] or type-1 [F₂(1, 32) = 0.20, p = .66, ηₚ² = .01] conditions. The effect of accent type was significant neither in the phonotactically high-frequency condition [F₂(2, 26) = 3.12, p = .06, ηₚ² = .19] nor in the low-frequency condition [F₂(2, 38) = 2.62, p = .09, ηₚ² = .12].
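As a consistency check on the simple-effect statistics for Experiment 2, partial eta squared can be recovered from each F ratio and its degrees of freedom, since ηₚ² = SS_effect / (SS_effect + SS_error) = F·df_effect / (F·df_effect + df_error). The sketch below applies this identity to the values reported above; the flat-condition test is omitted because only an upper bound (F < 0.01) is reported.

```python
def partial_eta_squared(f, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f * df_effect) / (f * df_effect + df_error)


# (F, df_effect, df_error, reported partial eta squared) from the Experiment 2 item analysis.
reported = [
    (14.39, 1, 32, 0.31),  # phonotactic frequency, type-2 condition
    (0.20, 1, 32, 0.01),   # phonotactic frequency, type-1 condition
    (3.12, 2, 26, 0.19),   # accent type, high-frequency condition
    (2.62, 2, 38, 0.12),   # accent type, low-frequency condition
]

for f, df1, df2, eta in reported:
    recovered = partial_eta_squared(f, df1, df2)
    print(f"F({df1}, {df2}) = {f:.2f}: recovered {recovered:.3f}, reported {eta:.2f}")
```

Each recovered value rounds to the reported one, which suggests the effect sizes were computed from the same error terms as the F tests.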

General discussion

The three experiments reported here investigated the influences of phonotactic frequency and pitch-accent pattern on immediate serial recall of Japanese nonwords. In keeping with the existing literature, we found clear evidence for an interaction between long-term representations and short-term memory performance, and we extended this evidence to a suprasegmental phonological characteristic (Japanese pitch accent). Across all experiments, we found (1) a phonotactic frequency effect on retention of phoneme sequences (replicating previous studies); (2) a typicality effect of accent pattern on retention of accent patterns; and (3) accent pattern regularization in the error analysis. In addition, we found bidirectional interactions between the phonemic and accentual components of phonology: (a) a phonotactic frequency effect on retention of the accent pattern when participants were explicitly required to recall the presented accent patterns (Exps. 2 and 3); and (b) reduced retention of phoneme sequences for nonwords presented with a flat accent in shorter lists (Exps. 1 and 2), which disappeared with longer lists (Exp. 3). This relatively poor recall of flat-presented items was found only for phonotactically low-frequency sequences in Experiment 2 (with a similar tendency in Exp. 1). Our supplementary analyses also indicated a positive accent typicality effect on phoneme accuracy in accent-incorrect trials.

The robust effects of phonotactic frequency on phonemic retention and of accent pattern typicality on accent retention suggest that the interaction between long-term knowledge and pSTM is a generalizable principle of working memory function. The phonotactic frequency effect on phonemic retention observed here provides a cross-language replication of previous studies conducted in English (e.g., Gathercole et al., 1999; Thorn et al., 2005). Note that the present study and most previous studies employed large open sets of materials as memory stimuli, which might have maximized the contribution of long-term knowledge to STM performance. The present results indicate that the phonotactic effect generalizes to a mora-based language whose phonological structure differs from that of English. In our study, phonotactic frequency was defined in terms of bi-mora frequency, and the mora is a larger phonological unit than the phoneme. It would therefore appear that phonotactic probability has a clear impact on short-term memory irrespective of the language used and of the size of the phonological unit.
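To illustrate what a bi-mora measure looks like compared with a phoneme-level one, here is a minimal sketch. It assumes the nonword is already segmented into morae and scores it by the mean corpus count of its adjacent mora pairs; the averaging rule, the helper names, and the toy counts are all assumptions for illustration, not the authors’ corpus or formula. Note that a three-mora, six-phoneme nonword contains only two bi-mora pairs but five phoneme bigrams.

```python
def bimora_pairs(morae):
    """Adjacent mora pairs: ['ke', 'ra', 'mo'] -> [('ke', 'ra'), ('ra', 'mo')]."""
    return list(zip(morae, morae[1:]))


def phoneme_bigram_count(morae):
    """Number of adjacent phoneme pairs, treating each romanized letter as one phoneme."""
    n_phonemes = sum(len(mora) for mora in morae)
    return n_phonemes - 1


def mean_bimora_frequency(morae, bimora_counts):
    """Mean corpus count of a nonword's adjacent mora pairs; unseen pairs count as 0."""
    pairs = bimora_pairs(morae)
    return sum(bimora_counts.get(pair, 0) for pair in pairs) / len(pairs)


# Made-up counts for illustration only; these are not corpus values.
toy_counts = {("ke", "ra"): 120, ("ra", "mo"): 80}
nonword = ["ke", "ra", "mo"]
print(bimora_pairs(nonword))                       # [('ke', 'ra'), ('ra', 'mo')]
print(phoneme_bigram_count(nonword))               # 5
print(mean_bimora_frequency(nonword, toy_counts))  # 100.0
```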

The novel effect of accent pattern typicality on accent pattern retention indicates that multiple aspects of long-term phonological knowledge (i.e., accent patterns and phonemic structures) simultaneously affect pSTM. The underlying influence of the statistical structure of pitch accents (Sato, 1993) was further supported by the analysis of accent errors. Specifically, we found accent “regularization” errors (see also Ueno, 2012; Ueno et al., 2014), in which erroneous accent patterns showed a strong tendency to shift toward a more typical accent pattern. More generally, this finding supports theories of language and short-term memory that emphasize the importance of underlying statistical structures and of the interactions between them (cf. Seidenberg & McClelland, 1989), an approach also supported by various computational models (Botvinick & Plaut, 2006; Gupta & Tisdale, 2009a, 2009b; Seidenberg & McClelland, 1989). One might argue that the accent type effect reflects the ease of memorizing the flat pattern, which contains no pitch drop across the nonword. Note, however, that type-1 accents were also recalled better than type-2 accents, even though both contain a pitch drop. Thus, we suggest that the accent type effect more likely reflects the influence of accent typicality on pSTM.
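The pitch-drop argument can be made concrete with the standard simplified rules for Tokyo Japanese pitch on light (CV) morae: an unaccented (flat) word starts low and stays high; a word accented on the first mora starts high and then falls; otherwise pitch rises after the first mora and falls after the accented mora. The sketch assumes, as in the usual notation, that type-1 and type-2 denote an accent nucleus on the first and second mora, and it ignores following particles and phrase-level intonation; the function names are hypothetical.

```python
def tokyo_pitch_pattern(n_morae, accent):
    """Return a word-internal H/L pitch string for Tokyo Japanese.

    accent = 0 for flat (unaccented); otherwise the 1-based mora carrying
    the accent nucleus. Simplified: light morae only, no particles.
    """
    if not 0 <= accent <= n_morae:
        raise ValueError("accent position out of range")
    pattern = []
    for i in range(1, n_morae + 1):
        if accent == 1:
            pattern.append("H" if i == 1 else "L")  # high on the nucleus, low after it
        elif i == 1:
            pattern.append("L")  # initial lowering when the first mora is unaccented
        elif accent == 0 or i <= accent:
            pattern.append("H")  # high up to (and including) the nucleus
        else:
            pattern.append("L")  # pitch drop after the nucleus
    return "".join(pattern)


def has_word_internal_drop(pattern):
    """True if pitch falls from H to L somewhere within the word."""
    return "HL" in pattern


for name, nucleus in (("flat", 0), ("type-1", 1), ("type-2", 2)):
    pattern = tokyo_pitch_pattern(3, nucleus)
    print(name, pattern, "drop" if has_word_internal_drop(pattern) else "no drop")
# flat LHH no drop
# type-1 HLL drop
# type-2 LHL drop
```

Both type-1 and type-2 three-mora patterns contain a word-internal fall, so the advantage of type-1 over type-2 cannot be attributed to the presence or absence of a drop.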

We also found a bidirectional relationship between the phoneme and accent aspects of pSTM, although the occurrence of these interactions depended on the experimental conditions (list length and explicit instruction to recall the accent pattern). One type of interaction was a phonotactic frequency effect on accent retention. This effect might reflect the greater demands of retaining phonotactically low-frequency sequences, which consume more of the general pSTM resources and leave fewer for retaining the target accent pattern. These greater demands may have had their strongest impact when participants were required to retain pitch-accent patterns intentionally (Exps. 2 and 3). In contrast, the influence of accent pattern typicality on phonemic retention was quite limited. Together, these findings imply that, for Japanese speakers, retaining phonemes is more resource-demanding in pSTM than retaining accents. This default phoneme dominance is supported by the finding that, across Experiments 1 and 2, an explicit instruction to recall the accent pattern improved accent retention without affecting phonemic accuracy (see Figs. 1 and 2).

A second type of interaction was the lower phonemic recall for flat-presented nonwords in Experiments 1 and 2. This effect might reflect competition among the larger number of flat-accent neighbors (i.e., a cohort size effect), as noted in the Introduction. However, in our nonword recall task the competition mechanism might not exert a strong effect on pSTM, given that the effect was present only for phonotactically low-frequency nonwords in Experiment 2, with a similar tendency in Experiment 1 (see Fig. 1). One possibility is that multiple factors operate in relation to phonotactic frequency: a negative effect of competition with neighbors that share the same accent type may operate simultaneously with a positive effect of phonotactic frequency. Allen and Hulme (2006) reported that recognition processes are strongly influenced by the negative effects of neighborhood competition, whereas production processes (which influence immediate serial recall) receive an additional contribution from rich long-term knowledge. Thus, it is conceivable that the benefit of high phonotactic frequency outweighs the negative effect of competition, particularly when recalling longer lists (Exp. 3), in which the contribution of recognition/perceptual processes might be relatively weaker. Other mechanisms might also underpin the lower performance on flat-presented items in nonword recall. One difference between the flat accent and the other two accent types (type-1 and type-2) is the absence versus presence of a pitch drop. Although this is a post hoc explanation, the presence of a pitch drop might make items perceptually distinctive, which might in turn facilitate the retention of phoneme sequences.

In accent-incorrect trials, the lower recall performance for flat items was not found; instead, an effect of accent typicality was observed: phoneme sequences “recalled” with a flat accent (but presented with another, less typical accent) showed higher phoneme recall accuracy than items recalled with type-1 or type-2 accents (Exp. 3). This discrepancy may reflect the difference between the recognition/perception and production components underpinning pSTM. In accent-incorrect trials, the recalled accent patterns were generated by the participants themselves, and typicality may have its greatest effect in speech production (for similar ideas, see Gathercole et al., 1999).

Finally, we consider the possible influence of dialect. Japanese dialects can be categorized into three types: Tokyo, Keihan, and no-accent (Kindaichi, 2001). The Tokyo type, centered on Tokyo, is the most common and is the basis of standard Japanese. The Keihan type is centered on Osaka. These two types cover all of Japan except a handful of small no-accent regions: an area around Fukushima, small parts of Fukui and Shizuoka, a contiguous region spanning Saga, Kumamoto, and Kagoshima, and a contiguous region spanning Ehime and Kochi. Some words are pronounced with different accent patterns across these dialect types. For example, in /ka-ra-su/ (“crow”), the pitch accent falls on the first mora in the Tokyo dialect but on the second mora in the Keihan dialect. The no-accent type is distinctive in that its speakers do not use accent patterns to discriminate between words.

These regional variations in accent did not influence the results of the present experiments. In all cases, as shown by the significant by-subject (F₁) effects, the results were highly consistent across participants, even though they came from different parts of Japan (Appendix C). Likewise, Otake and Cutler (1999) found in various recognition experiments that people from no-accent regions responded to Tokyo-dialect stimuli in the same way as native Tokyo-dialect speakers, albeit somewhat less strongly. This generalized effect presumably reflects daily exposure to broadcasting (Otake & Cutler, 1999) as well as active migration of people. Moreover, Ueno and colleagues (Ueno, 2012; Ueno et al., 2014) reported that participants drawn from various areas of Japan consistently used the accent patterns presented in the Tokyo dialect.

Conclusion

Three nonword immediate serial recall experiments revealed interactions involving multiple aspects of long-term phonological representation (phonotactic frequency and pitch-accent typicality) in pSTM. These findings add to those already established for phoneme-based phenomena in English (Gathercole et al., 1999; Hulme et al., 1991; Jefferies et al., 2006) and suggest that the interaction between long-term and short-term memory is a generalizable principle of working memory (e.g., Baddeley, 2012; Hulme et al., 1991; Patterson et al., 1994).