Familiar units prevail over statistical cues in word segmentation

Poulin-Charronnat, Bénédicte; Perruchet, Pierre; Tillmann, Barbara; Peereman, Ronald

doi:10.1007/s00426-016-0793-y

Familiar units prevail over statistical cues in word segmentation

Original Article
Published: 31 August 2016

Volume 81, pages 990–1003, (2017)
Cite this article

Psychological Research Aims and scope Submit manuscript

Bénédicte Poulin-Charronnat ORCID: orcid.org/0000-0003-3244-6747¹,
Pierre Perruchet¹,
Barbara Tillmann² &
…
Ronald Peereman³

586 Accesses
12 Citations
Explore all metrics

Abstract

In language acquisition research, the prevailing position is that listeners exploit statistical cues, in particular transitional probabilities between syllables, to discover words of a language. However, other cues are also involved in word discovery. Assessing the weight learners give to these different cues leads to a better understanding of the processes underlying speech segmentation. The present study evaluated whether adult learners preferentially used known units or statistical cues for segmenting continuous speech. Before the exposure phase, participants were familiarized with part-words of a three-word artificial language. This design allowed the dissociation of the influence of statistical cues and familiar units, with statistical cues favoring word segmentation and familiar units favoring (nonoptimal) part-word segmentation. In Experiment 1, performance in a two-alternative forced choice (2AFC) task between words and part-words revealed part-word segmentation (even though part-words were less cohesive in terms of transitional probabilities and less frequent than words). By contrast, an unfamiliarized group exhibited word segmentation, as usually observed in standard conditions. Experiment 2 used a syllable-detection task to remove the likely contamination of performance by memory and strategy effects in the 2AFC task. Overall, the results suggest that familiar units overrode statistical cues, ultimately questioning the need for computation mechanisms of transitional probabilities (TPs) in natural language speech segmentation.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

When statistics collide: The use of transitional and phonotactic probability cues to word boundaries

Article 09 March 2021

Statistical word segmentation succeeds given the minimal amount of exposure

Article 26 October 2023

Short- and long-term influences of repeated speech examples on segmentation in an unfamiliar language analog

Article 19 January 2024

Notes

While decreasing the number of words to learn seems to make learning easier (think for instance of a list of to-be-memorized items or of the decreased number of to-be learned words used for infants in the work of Saffran et al., 1996a compared to adults), in fact, decreasing the number of words composing the artificial language reduces the differences between word-internal and word-external TPs. With three words, the TPs within words were 1 and the TPs between words were .5 (compared respectively to 1. and .33 for a four-word language), and the difference in frequency of occurrence decreases between words and part-words.
For completion, we performed Linear Mixed Model (LMM) on the data with participants and items as random effects and Group as fixed effect. The LMM showed a significant effect of Group, F(2, 474) = 40.21, p < .001, a significant difference between the unfamiliarized group and both the part-word familiarized group, F(1, 313) = 44.71, p < .001, and the non-word familiarized group, F(1, 313) = 77.01, p < .001, while the difference between the part-word and nonword familiarized groups failed to reach the conventional significance threshold, F(1, 322) = 2,80, p = .095. These results thus lead to the same conclusions as the results obtained with the ANOVA.
Taking the last two syllables of a word and the first syllable of another word would be another possible segmentation requiring six part-words. However, in that case, the TP of the last syllable of the part-words would be .50 and not 1.00 (as for the words). For the sake of equality, only the part-words composed of the last syllable of a word and the first two syllable of another word were used.
To address the possibility that the preselected range may have been too narrow, analyses were run again with a [−300, 1000 ms] range, ensuring a very broad coverage. This new range was in fact the largest possible one, because a still larger range would have generated some overlaps between the response windows surrounding two successive target syllables. The resulting changes were quite minor. Means differed only by a few milliseconds, and the p values of the statistical tests reported in the main text differed only on their third or fourth decimals, never affecting their interpretation in terms of significance. Unsurprisingly, the rates of false alarms and misses decreased, but remained substantial. The mean rate of false alarms was 5.34 % for the part-word familiarized group and 5.30 % for the nonword-familiarized group. The mean rates of misses were 14.10% and 19.91 %, respectively. These analyses suggest that the relatively high rates of false alarms and misses were not due to ill-fitted exclusion criteria, but to genuine detection errors.
We additionally performed LMM with participants as random effect and Group and Target as fixed effects. There was no effect of Group, F(1, 3723) = 0.295, p = 0.587, but a significant main effect of Target, F(1, 3723) = 59.37, p < 0.001, which was qualified by a significant Group × Target interaction, F(1, 3723) = 28.70, p < 0.001. Subsequent analyses taking into account the division between familiar and unfamiliar part-words were performed. For the part-word familiarized group, the response times were faster for the last syllables of both the familiar part-words, F(1, 1461) = 69.84, p < 0.001, and the unfamiliar part-words, F(1, 1425) = 64.15, p < 0.001, than for the last syllables of the words. There was no significant difference between familiar part-words and unfamiliar part-words, F(1, 998) = 0.02, p = 0.874. For the nonword-familiarized group, a significant difference was observed between familiar part-words and words, F(1, 1366) = 4.91, p = 0.027, while no significant difference was observed for unfamiliar part-words versus words, F(1, 1348) = 2.29, p = 0.131, and familiar part-words versus unfamiliar part-words, F(1, 902) = 0.40, p = 0.528. These results thus lead to very similar conclusions as the results obtained with the ANOVA.
According to the UP interpretation, the mean response times for the last syllables of the part-words should not differ between the nonword familiarized group (i.e., 2nd syllable of the segmented unit if this group segment the speech stream into words) and the part-word familiarized group (i.e., last syllable of the segmented unit if this group segment the speech stream into part-words). Contradicting this prediction, the mean RTs were numerically slower for the nonword-familiarized group than for the part-word familiarized group. However, this difference did not reach significance (p = 0.25).

References

Abla, D., Katahira, K., & Okanoya, K. (2008). On-line assessment of statistical learning by event-related potentials. Journal of Cognitive Neuroscience, 20(6), 952–964. doi:10.1162/jocn.2008.20058.
Article PubMed Google Scholar
Abrams, K., & Bever, T. G. (1969). Syntactic structure modifies attention during speech perception and recognition. The Quarterly Journal of Experimental Psychology, 21, 280–290. doi:10.1080/14640746908400223.
Article PubMed Google Scholar
Aslin, R. N., Saffran, J. R., & Newport, E. L. (1998). Computation of conditional probability statistics by 8-month-old infants. Psychological Science, 9, 321–324. doi:10.1111/1467-9280.00063.
Article Google Scholar
Bertels, J., Franco, A., & Destrebecqz, A. (2012). How implicit is visual statistical learning? Journal of Experimental Psychology. Learning, Memory, and Cognition, 38(5), 1425–1431. doi:10.1037/a0027210.
Article PubMed Google Scholar
Bortfeld, H., Morgan, J. L., Golinkoff, R. M., & Rathbun, K. (2005). Mommy and me: familiar names help launch babies into speech-stream segmentation. Psychological Science, 16, 298–304. doi:10.1111/j.0956-7976.2005.01531.x.
Article PubMed PubMed Central Google Scholar
Brent, M. R., & Siskind, J. M. (2001). The role of exposure to isolated words in early vocabulary development. Cognition, 81, B33–B44. doi:10.1016/S0010-0277(01)00122-6.
Article PubMed Google Scholar
Cairns, P., Shillcock, R., Chater, N., & Levy, J. (1997). Bootstrapping word boundaries: a bottom-up corpus-based approach to speech segmentation. Cognitive Psychology, 33, 111–153. doi:10.1006/cogp.1997.0649.
Article PubMed Google Scholar
Christiansen, M. H., Allen, J., & Seidenberg, M. S. (1998). Learning to segment speech using multiple cues: a connectionist model. Language and Cognitive Processes, 13, 221–268. doi:10.1080/016909698386528.
Article Google Scholar
Cunillera, T., Càmara, E., Laine, M., & Rodríguez-Fornells, A. (2010). Words as anchors: known words facilitate statistical learning. Experimental Psychology, 57, 134–141. doi:10.1027/1618-3169/a000017.
Article PubMed Google Scholar
Cutler, A., & Norris, D. (1988). The role of strong syllables in segmentation for lexical access. Journal of Experimental Psychology: Human Perception and Performance, 14, 113–121. doi:10.1037/0096-1523.14.1.113.
Google Scholar
Dahan, D., & Brent, M. R. (1999). On the discovery of novel wordlike units from utterances: an artificial-language study with implications for native-language acquisition. Journal of Experimental Psychology: General, 128, 165–185. doi:10.1037/0096-3445.128.2.165.
Article Google Scholar
Dutoit, T., Pagel, N., Pierret, F., Bataille, O., & Van Der Vrecken, O. (1996). The MBROLA Project: towards a Set of High-Quality Speech Synthesizers Free of Use for Non-Commercial Purposes. Proc. ICSLP’96. Philadelphia, 3, 1393–1396.
Google Scholar
Fernald, A., & Morikawa, H. (1993). Common themes and cultural variations in Japanese and American mothers’ speech to infants. Child Development, 64, 637–656. doi:10.1111/1467-8624.ep9308115002.
Article PubMed Google Scholar
Franco, A., Eberlen, J., Destrebecqz, A., Cleeremans, A., & Bertels, J. (2015a). Rapid serial auditory presentation. A new measure of statistical learning in speech segmentation: Experimental Psychology. doi:10.1027/1618-3169/a000295.
Google Scholar
Franco, A., Gaillard, V., Cleeremans, A., & Destrebecqz, A. (2015b). Assessing segmentation processes by click detection: online measure of statistical learning, or simple interference? Behavior Research Methods,. doi:10.3758/s13428-014-0548-x.
PubMed Google Scholar
Frank, M. C., Goldwater, S., Griffiths, T. L., & Tenenbaum, J. B. (2010). Modeling human performance in statistical word segmentation. Cognition, 117, 107–125. doi:10.1016/j.cognition.2010.07.005.
Article PubMed Google Scholar
French, R. M., Addyman, C., & Mareschal, D. (2011). TRACX: a recognition-based connectionist framework for sequence segmentation and chunk extraction. Psychological Review, 118, 614–636. doi:10.1037/a0025255.
Article PubMed Google Scholar
Gebhart, A. L., Aslin, R. N., & Newport, E. L. (2009). Changing structures in midstream: learning along the statistical garden path. Cognitive Science, 33(6), 1087–1116. doi:10.1111/j.1551-6709.2009.01041.x.
Article PubMed PubMed Central Google Scholar
Giroux, I., & Rey, A. (2009). Lexical and sublexical units in speech perception. Cognitive Science, 33, 260–272. doi:10.1111/j.1551-6709.2009.01012.x.
Article PubMed Google Scholar
Gómez, R. (2007). Statistical learning in infant language development. In M. G., Gaskell (Ed.), The Oxford handbook of psycholinguistics (pp. 601-616). New York: Oxford University Press.
Gómez, D. M., Bion, R. A. H., & Mehler, J. (2011). The word segmentation process as revealed by click detection. Language and Cognitive Processes, 26, 212–223. doi:10.1080/01690965.2010.482451.
Article Google Scholar
Hunt, R. H., & Aslin, R. N. (2001). Statistical learning in a serial reaction time task: access to separable statistical cues by individual learners. Journal of Experimental Psychology: General, 130, 658–680. doi:10.1037/0096-3445.130.4.658.
Article Google Scholar
Johnson, E. K. (2012). Bootstrapping language: Are infant statisticians up to the job? In P. Rebuschat & J. N. Williams (Eds.), Statistical learning and language acquisition (pp. 55–89). Berlin: De Gruyter Mouton.
Google Scholar
Johnson, E. K., & Jusczyk, P. W. (2001). Word segmentation by 8-month-olds: when speech cues count more than statistics. Journal of Memory and Language, 44, 548–567. doi:10.1006/jmla.2000.2755.
Article Google Scholar
Johnson, E. K., & Seidl, A. H. (2009). At 11 months, prosody still outranks statistics. Developmental Science, 12, 131–141. doi:10.1111/j.1467-7687.2008.00740.x.
Article PubMed Google Scholar
Johnson, E. K., & Tyler, M. D. (2010). Testing the limits of statistical learning of word segmentation. Developmental Science, 13(2), 339–345. doi:10.1111/j.1467-7687.2009.00886.x.
Article PubMed PubMed Central Google Scholar
Jusczyk, P. W. (1999). How infants begin to extract words from speech. Trends in Cognitive Sciences, 3, 323–328. doi:10.1016/S1364-6613(99)01363-7.
Article PubMed Google Scholar
Jusczyk, P. W., Hohne, E. A., & Bauman, A. (1999a). Infants’ sensitivity to allophonic cues for word segmentation. Perception and Psychophysics, 61, 1465–1476. doi:10.3758/BF03213111.
Article PubMed Google Scholar
Jusczyk, P. W., Houston, D. M., & Newsome, M. (1999b). The beginnings of word segmentation in english-learning infants. Cognitive Psychology, 39, 159–207. doi:10.1006/cogp.1999.0716.
Article PubMed Google Scholar
Kim, R., Seitz, A., Feenstra, H., & Shams, L. (2009). Testing the assumptions of statistical learning: is it long-term and implicit? Neuroscience Letters, 461, 145–149. doi:10.1016/j.neulet.2009.06.030.
Article PubMed Google Scholar
Lew-Williams, C., Pelucchi, B., & Saffran, J. R. (2011). Isolated words enhance statistical language learning in infancy. Developmental Science, 14, 1323–1329. doi:10.1111/j.1467-7687.2011.01079.x.
Article PubMed PubMed Central Google Scholar
Mandel, D. R., Jusczyk, P. W., & Pisoni, D. B. (1995). Infants’ recognition of the sound patterns of their own names. Psychological Science, 6, 314–316. doi:10.1111/j.1467-9280.1995.tb000517.x.
Article PubMed PubMed Central Google Scholar
Mattys, S. L., & Jusczyk, P. W. (2001). Phonotactic cues for segmentation of fluent speech by infants. Cognition, 78, 91–121. doi:10.1016/S001002770000109-8.
Article PubMed Google Scholar
Mattys, S. L., Jusczyk, P. W., Luce, P. A., & Morgan, J. L. (1999). Phonotactic and prosodic effects on word segmentation in infants. Cognitive Psychology, 38, 465–494. doi:10.1006/cogp.1999.0721.
Article PubMed Google Scholar
McQueen, J. M. (1998). Segmentation of continuous speech using phonotactics. Journal of Memory and Language, 39, 21–46. doi:10.1006/jmla.1998.2568.
Article Google Scholar
Minier, L., Fagot, J., & Rey, A. (2015). The temporal dynamics of regularity extraction in non-human primates. Cognitive Science,. doi:10.1111/cogs.12279.
PubMed Google Scholar
Morgan, J. L., & Saffran, J. R. (1995). Emerging integration of sequential and suprasegmental information in preverbal speech segmentation. Child Development, 66, 911–936. doi:10.1111/1467-8624.ep9509180265.
Article PubMed Google Scholar
Norris, D., McQueen, J. M., Cutler, A., & Butterfield, S. (1997). The possible-word constraint in the segmentation of continuous speech. Cognitive Psychology, 34, 191–243. doi:10.1006/cogp.1997.0671.
Article PubMed Google Scholar
Perruchet, P., & Poulin-Charronnat, B. (2012). Beyond transitional probability computations: extracting word-like units when only statistical information is available. Journal of Memory and Language, 66, 807–818. doi:10.1016/j.jml.2012.02.010.
Article Google Scholar
Perruchet, P., Poulin-Charronnat, B., Tillmann, B., & Peereman, R. (2014). New evidence for chunk-based models in word segmentation. Acta Psychologica, 149, 1–8. doi:10.1016/j.actpsy.2014.01.015.
Article PubMed Google Scholar
Perruchet, P., & Tillmann, B. (2010). Exploiting multiple sources of information in learning an artificial language: human data and modeling. Cognitive Science, 34, 255–285. doi:10.1111/j.1551-6709.2009.01074.x.
Article PubMed Google Scholar
Perruchet, P., Tyler, M. D., Galland, N., & Peereman, R. (2004). Learning nonadjacent dependencies: no need for algebraic-like computations. Journal of Experimental Psychology: General, 133(4), 573–583.
Article Google Scholar
Perruchet, P., & Vinter, A. (1998). PARSER: a model for word segmentation. Journal of Memory and Language, 39, 246–263. doi:10.1016/jmla.1998.2576.
Article Google Scholar
Radeau, M., & Morais, J. (1990). The uniqueness point effect in the shadowing of spoken words. Speech Communication, 9(2), 155–164. doi:10.1016/0167-6393(90)900068-K.
Article Google Scholar
Robinet, V., Lemaire, B., & Gordon, M. B. (2011). MDLChunker: a MDL-based cognitive model of inductive learning. Cognitive Science, 35, 1352–1389. doi:10.1111/j.1551-6709.2011.01188.x.
Article PubMed Google Scholar
Saffran, J. R., Aslin, R. N., & Newport, E. L. (1996a). Statistical learning by 8-month-old infants. Science, 274, 1926–1928. doi:10.1126/science.274.5294.1926.
Article PubMed Google Scholar
Saffran, J. R., Newport, E. L., & Aslin, R. N. (1996b). Word segmentation: the role of distributional cues. Journal of Memory and Language, 35, 606–621. doi:10.1006/jmla.1996.0032.
Article Google Scholar
Sanders, L. D., Newport, E. L., & Neville, H. J. (2002). Segmenting nonsense: an event-related potential index of perceived onsets in continuous speech. Nature Neuroscience, 5(7), 700–703. doi:10.1018/nn873.
Article PubMed PubMed Central Google Scholar
Thiessen, E. D., & Saffran, J. R. (2003). When cues collide: use of stress and statistical cues to word boundaries by 7- to 9-month-old infants. Developmental Psychology, 39, 706–716. doi:10.1037/0012-1649.39.4.706.
Article PubMed Google Scholar
Turk-Browne, N. B., Jungé, J. A., & Scholl, B. J. (2005). The automaticity of visual statistical learning. Journal of Experimental Psychology: General, 134(4), 552–564. doi:10.1037/0096-3445.134.4.552.
Article Google Scholar
Valian, V., & Coulson, S. (1988). Anchor points in language learning: the role of marker frequency. Journal of Memory and Language, 27, 71–86. doi:10.1016/0749-596X(88)90049-6.
Article Google Scholar
Weiss, D. J., Gerfen, C., & Mitchel, A. D. (2009). Speech segmentation in a simulated bilingual environment: a challenge for statistical learning? Language Learning and Development, 5(1), 30–49. doi:10.1080/15475440802340101.
Article PubMed PubMed Central Google Scholar
Yang, C. D. (2004). Universal grammar, statistics or both? Trends in Cognitive Sciences, 8(10), 451–456. doi:10.1016/j.tics.2004.08.006.
Article PubMed Google Scholar

Download references

Acknowledgments

The authors are grateful to Pascal Morgan and Cédric Foucault for help with collecting the data.

Author information

Authors and Affiliations

Université Bourgogne Franche-Comté, LEAD-CNRS UMR5022, F-21000, Dijon, France
Bénédicte Poulin-Charronnat & Pierre Perruchet
CNRS UMR5292, INSERM U1028, Lyon Neuroscience Research Center, Auditory Cognition and Psychoacoustics Team, Université of Lyon I, Lyon, France
Barbara Tillmann
Univ. Grenoble Alpes, CNRS UMR5105, LPNC, F-38000, Grenoble, France
Ronald Peereman

Authors

Bénédicte Poulin-Charronnat
View author publications
You can also search for this author in PubMed Google Scholar
Pierre Perruchet
View author publications
You can also search for this author in PubMed Google Scholar
Barbara Tillmann
View author publications
You can also search for this author in PubMed Google Scholar
Ronald Peereman
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Bénédicte Poulin-Charronnat.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Poulin-Charronnat, B., Perruchet, P., Tillmann, B. et al. Familiar units prevail over statistical cues in word segmentation. Psychological Research 81, 990–1003 (2017). https://doi.org/10.1007/s00426-016-0793-y

Download citation

Received: 13 October 2015
Accepted: 10 August 2016
Published: 31 August 2016
Issue Date: September 2017
DOI: https://doi.org/10.1007/s00426-016-0793-y

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Familiar units prevail over statistical cues in word segmentation

Abstract

Access this article

Similar content being viewed by others

When statistics collide: The use of transitional and phonotactic probability cues to word boundaries

Statistical word segmentation succeeds given the minimal amount of exposure

Short- and long-term influences of repeated speech examples on segmentation in an unfamiliar language analog

Notes

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Familiar units prevail over statistical cues in word segmentation

Abstract

Access this article

Similar content being viewed by others

When statistics collide: The use of transitional and phonotactic probability cues to word boundaries

Statistical word segmentation succeeds given the minimal amount of exposure

Short- and long-term influences of repeated speech examples on segmentation in an unfamiliar language analog

Notes

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation