
Familiar units prevail over statistical cues in word segmentation

  • Original Article
Psychological Research


In language acquisition research, the prevailing position is that listeners exploit statistical cues, in particular transitional probabilities (TPs) between syllables, to discover the words of a language. However, other cues are also involved in word discovery, and assessing the weight learners give to these different cues leads to a better understanding of the processes underlying speech segmentation. The present study evaluated whether adult learners preferentially used familiar units or statistical cues to segment continuous speech. Before the exposure phase, participants were familiarized with part-words of a three-word artificial language. This design dissociated the influence of statistical cues from that of familiar units: statistical cues favored word segmentation, whereas familiar units favored (nonoptimal) part-word segmentation. In Experiment 1, performance in a two-alternative forced choice (2AFC) task between words and part-words revealed part-word segmentation, even though part-words were less cohesive in terms of TPs and less frequent than words. By contrast, an unfamiliarized group exhibited word segmentation, as usually observed in standard conditions. Experiment 2 used a syllable-detection task to remove the likely contamination of 2AFC performance by memory and strategy effects. Overall, the results suggest that familiar units overrode statistical cues, ultimately questioning the need for mechanisms that compute TPs in natural language speech segmentation.
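The statistical account that the study puts to the test can be illustrated with a minimal sketch: compute the forward TP between adjacent syllables, TP(x → y) = frequency(xy) / frequency(x), and posit a word boundary wherever the TP dips. The three trisyllabic words, their syllables, the 0.75 boundary threshold, and the helper names (`make_stream`, `transitional_probabilities`, `segment_at_tp_dips`) are hypothetical illustrations, not the study's materials or the authors' procedure; the stream simply concatenates words in random order without immediate repetition.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Word = Tuple[str, ...]


def make_stream(words: Sequence[Word], n_tokens: int, seed: int = 0) -> List[str]:
    """Concatenate words in random order with no immediate repetition; return the syllable stream."""
    rng = random.Random(seed)
    order = [rng.choice(words)]
    while len(order) < n_tokens:
        order.append(rng.choice([w for w in words if w != order[-1]]))
    return [syllable for word in order for syllable in word]


def transitional_probabilities(stream: List[str]) -> Dict[Tuple[str, str], float]:
    """Forward TP(x -> y) = count of the pair xy / count of x as the first member of a pair."""
    pair_counts = Counter(zip(stream, stream[1:]))
    first_counts = Counter(stream[:-1])
    return {(x, y): n / first_counts[x] for (x, y), n in pair_counts.items()}


def segment_at_tp_dips(stream: List[str], threshold: float = 0.75) -> List[str]:
    """Place a boundary wherever the TP between two adjacent syllables falls below threshold."""
    tps = transitional_probabilities(stream)
    units: List[str] = []
    current = [stream[0]]
    for x, y in zip(stream, stream[1:]):
        if tps[(x, y)] < threshold:
            units.append("".join(current))
            current = []
        current.append(y)
    units.append("".join(current))
    return units


# Hypothetical three-word language (not the study's actual materials).
WORDS = [("tu", "pi", "ro"), ("go", "la", "bu"), ("bi", "da", "ku")]
stream = make_stream(WORDS, n_tokens=300, seed=1)
units = segment_at_tp_dips(stream)
print(Counter(units))
```

With three words, word-internal TPs are exactly 1.0 and word-boundary TPs hover around .5, so any threshold between the two recovers the words. The study's finding is that prior familiarity with part-words overrode this cue.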


Fig. 1
Fig. 2
Fig. 3



  1. While decreasing the number of words to learn seems to make learning easier (think, for instance, of a list of to-be-memorized items, or of the smaller number of to-be-learned words used with infants than with adults in the work of Saffran et al., 1996a), decreasing the number of words composing the artificial language in fact reduces the differences between word-internal and word-external TPs. With three words, the TPs within words were 1 and the TPs between words were .5 (compared with 1 and .33, respectively, for a four-word language), and the difference in frequency of occurrence between words and part-words also decreases.

  2. For completeness, we performed a Linear Mixed Model (LMM) analysis on the data with participants and items as random effects and Group as a fixed effect. The LMM showed a significant effect of Group, F(2, 474) = 40.21, p < .001, with significant differences between the unfamiliarized group and both the part-word familiarized group, F(1, 313) = 44.71, p < .001, and the nonword-familiarized group, F(1, 313) = 77.01, p < .001, while the difference between the part-word and nonword-familiarized groups failed to reach the conventional significance threshold, F(1, 322) = 2.80, p = .095. These results thus lead to the same conclusions as those obtained with the ANOVA.

  3. Taking the last two syllables of a word and the first syllable of another word would be another possible segmentation, requiring six part-words. However, in that case, the TP leading to the last syllable of the part-words would be .50 and not 1.00 (as for the words). To keep part-words comparable to words on this point, only part-words composed of the last syllable of a word and the first two syllables of another word were used.

  4. To address the possibility that the preselected range was too narrow, the analyses were rerun with a [−300, 1000 ms] range, ensuring very broad coverage. This was in fact the largest possible range, because any larger one would have caused the response windows surrounding two successive target syllables to overlap. The resulting changes were minor: means differed by only a few milliseconds, and the p values of the statistical tests reported in the main text differed only in the third or fourth decimal place, never affecting their interpretation in terms of significance. Unsurprisingly, the rates of false alarms and misses decreased, but they remained substantial. The mean rate of false alarms was 5.34% for the part-word familiarized group and 5.30% for the nonword-familiarized group. The mean rates of misses were 14.10% and 19.91%, respectively. These analyses suggest that the relatively high rates of false alarms and misses were due not to ill-fitted exclusion criteria but to genuine detection errors.

  5. We additionally performed an LMM with participants as a random effect and Group and Target as fixed effects. There was no effect of Group, F(1, 3723) = 0.295, p = 0.587, but there was a significant main effect of Target, F(1, 3723) = 59.37, p < 0.001, qualified by a significant Group × Target interaction, F(1, 3723) = 28.70, p < 0.001. Subsequent analyses distinguished familiar from unfamiliar part-words. For the part-word familiarized group, response times were faster for the last syllables of both the familiar part-words, F(1, 1461) = 69.84, p < 0.001, and the unfamiliar part-words, F(1, 1425) = 64.15, p < 0.001, than for the last syllables of the words. There was no significant difference between familiar and unfamiliar part-words, F(1, 998) = 0.02, p = 0.874. For the nonword-familiarized group, a significant difference was observed between familiar part-words and words, F(1, 1366) = 4.91, p = 0.027, whereas no significant difference was observed between unfamiliar part-words and words, F(1, 1348) = 2.29, p = 0.131, or between familiar and unfamiliar part-words, F(1, 902) = 0.40, p = 0.528. These results thus lead to conclusions very similar to those obtained with the ANOVA.

  6. According to the UP interpretation, the mean response times for the last syllables of the part-words should not differ between the nonword-familiarized group (for which this is the 2nd syllable of the segmented unit, if this group segments the speech stream into words) and the part-word familiarized group (for which it is the last syllable of the segmented unit, if this group segments the speech stream into part-words). Contradicting this prediction, the mean RTs were numerically longer for the nonword-familiarized group than for the part-word familiarized group. However, this difference did not reach significance (p = 0.25).
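Notes 1 and 3 rest on simple arithmetic over the language's transition structure. The sketch below computes expected TPs analytically, assuming (as the .5 and .33 values in Note 1 imply) that words are concatenated in random order with no immediate repetition, so that each word is followed equally often by each of the other n − 1 words. It then enumerates the two candidate part-word types for a three-word language and lists their internal TPs. The syllables, the fourth word, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations
from typing import List, Sequence, Tuple

Word = Tuple[str, str, str]


def tp(x: str, y: str, words: Sequence[Word]) -> Fraction:
    """Expected forward TP from syllable x to syllable y, assuming random word order
    with no immediate repetition and all syllables distinct across words."""
    for w in words:
        if (x, y) in zip(w, w[1:]):
            return Fraction(1)  # word-internal transition
    last_of = {w[-1]: w for w in words}
    first_of = {w[0]: w for w in words}
    if x in last_of and y in first_of and last_of[x] != first_of[y]:
        # Each word is followed equally often by each of the other n - 1 words.
        return Fraction(1, len(words) - 1)
    return Fraction(0)  # impossible transition (e.g., a word repeating itself)


def part_words(words: Sequence[Word], n_from_first: int) -> List[Tuple[str, ...]]:
    """Trisyllabic units spanning the boundary between two different words.
    n_from_first=1: last syllable of word A + first two syllables of word B.
    n_from_first=2: last two syllables of word A + first syllable of word B."""
    return [a[-n_from_first:] + b[:3 - n_from_first] for a, b in permutations(words, 2)]


def internal_tps(unit: Sequence[str], words: Sequence[Word]) -> List[Fraction]:
    """TPs of the two transitions inside a trisyllabic unit."""
    return [tp(x, y, words) for x, y in zip(unit, unit[1:])]


# Hypothetical languages (not the study's materials).
THREE = [("tu", "pi", "ro"), ("go", "la", "bu"), ("bi", "da", "ku")]
FOUR = THREE + [("pa", "do", "ti")]

# Note 1: between-word TP is 1/2 with three words versus 1/3 with four.
print(tp("ro", "go", THREE), tp("ro", "go", FOUR))

# Note 3: compare the internal TPs of the two part-word types.
for k in (1, 2):
    for pw in part_words(THREE, k):
        print(k, "".join(pw), [str(t) for t in internal_tps(pw, THREE)])
```

Because a given part-word (end of word A plus the beginning of word B) occurs only when A is followed by B, its frequency relative to a word is the same 1/(n − 1): half as frequent as a word in a three-word language, but a third as frequent in a four-word one. The enumeration also shows that the type used in the study ends on a word-internal transition (TP = 1), like the words, whereas the alternative type ends on a boundary transition (TP = 1/2).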
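Note 4 turns on how responses are assigned to target syllables. What follows is a minimal sketch, assuming that each target owns a response window from 300 ms before to 1000 ms after its onset, that the first response inside a window counts as a hit (its RT measured from target onset), that targets with no response are misses, and that responses falling outside every window or repeated within an already-detected window count as false alarms. These scoring rules, the `score_detection` helper, and the example timings are assumptions; the excerpt does not spell out the study's exact scoring or how its false-alarm rate was normalized. The overlap check mirrors the note's point that a wider range would have made the windows around successive targets overlap.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class DetectionScore:
    rts: List[Optional[float]]  # one entry per target: RT in ms from onset, or None for a miss
    false_alarms: int

    @property
    def miss_rate(self) -> float:
        return sum(rt is None for rt in self.rts) / len(self.rts)


def score_detection(target_onsets: Sequence[float], responses: Sequence[float],
                    window: Tuple[float, float] = (-300.0, 1000.0)) -> DetectionScore:
    """Assign button presses to target syllables using a window around each target onset (hypothetical rules)."""
    lo, hi = window
    onsets = sorted(target_onsets)
    # Windows around successive targets must not overlap, otherwise one
    # response could be credited to two different targets.
    for a, b in zip(onsets, onsets[1:]):
        if a + hi >= b + lo:
            raise ValueError("response windows of successive targets overlap")
    rts: List[Optional[float]] = [None] * len(onsets)
    false_alarms = 0
    for r in sorted(responses):
        idx = next((i for i, t in enumerate(onsets) if t + lo <= r <= t + hi), None)
        if idx is None or rts[idx] is not None:
            # Outside every window, or a repeated response to an already-detected target.
            false_alarms += 1
        else:
            rts[idx] = r - onsets[idx]
    return DetectionScore(rts, false_alarms)


# Hypothetical trial: three targets and four button presses (times in ms).
score = score_detection([1000, 3000, 5000], [1450, 2300, 3600, 3700])
print(score.rts, score.false_alarms, round(score.miss_rate, 3))
```

Here the press at 2300 ms falls between windows and the one at 3700 ms repeats an already-detected target, so both count as false alarms, and the third target is a miss. Because the window starts 300 ms before onset, a press just before a target yields a negative RT under these rules.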




The authors are grateful to Pascal Morgan and Cédric Foucault for help with collecting the data.

Author information



Corresponding author

Correspondence to Bénédicte Poulin-Charronnat.


About this article


Cite this article

Poulin-Charronnat, B., Perruchet, P., Tillmann, B. et al. Familiar units prevail over statistical cues in word segmentation. Psychological Research 81, 990–1003 (2017).

