Statistical word segmentation succeeds given the minimal amount of exposure

  • Brief Report
  • Psychonomic Bulletin & Review

Abstract

One of the first tasks in language acquisition is word segmentation, the process of extracting word forms from continuous speech streams. Statistical approaches to word segmentation, in which word boundaries are inferred from sequence statistics, have been shown to be a powerful mechanism. Such approaches require the learner to represent the frequency of units in syllable sequences, though accounts differ on how much statistical exposure is required. In this study, we examined the computational limit at which words can be extracted from continuous sequences. First, we discuss why two occurrences of a word in a continuous sequence constitute the computational lower limit for this word to be statistically defined. Next, we created short syllable sequences that contained certain words either two or four times. Learners were presented with these syllable sequences one at a time, each immediately followed by a test of the novel words from that sequence. We found that, with the computationally minimal amount of two exposures, words were successfully segmented from continuous sequences. Moreover, longer syllable sequences providing four exposures to words generated more robust learning. The implications of these results are discussed in terms of how learners segment and store word candidates from continuous sequences.

Notes

  1. The computation of transitional probabilities (TPs) uses the absolute frequencies of the units involved: p(b|a) = count(ab) / count(a).

  2. Incidentally, both of these studies reported null results, and learning did not change when the exposure period was lengthened.

  3. Many statistical word-segmentation studies use fade-in or fade-out to avoid providing word boundary information to learners, as sequence boundaries can potentially be regarded as cues for word boundaries. We examined this issue empirically in our analysis; see Results section.

  4. This is another way in which our design differed from the Batterink (2017) design. In Batterink (2017), all the words in syllable streams have equal frequency (a characteristic from Saffran et al., 1996), which makes it difficult to disentangle transitional probability and frequency as the source of the learning effect: these factors can only be disentangled if the words in the sequence have a lower frequency than part-words, as Aslin et al. (1998) explained. We designed all of our sequences with an unbalanced frequency design from Aslin et al. (1998).

  5. In the example sequence in Fig. 1, a counterbalancing sequence is constructed by using [8,1] and [2,7] as low-frequency words, and [3,4] and [5,6] as part-words, which means that the four words making up this counterbalancing sequence are [4,5], [6,3], [8,1], and [2,7], with frequencies of 4, 4, 2, and 2, respectively.

  6. For all the syllables used in the study, we use a pinyin representation in which each syllable is followed by a number from 1 through 4 that indicates the syllable's tone. In this example, both syllables xian and mo are in the second tone.

  7. In Stata syntax, the equation here is: mixed key i.rc || langrecode: || subject:, where rc codes for word/part-word given the current sequence and counterbalancing condition.

  8. In Stata syntax, the equation here is: mixed key i.rc##i.times || langrecode: || subject:, where rc codes for word/part-word given the current sequence and counterbalancing condition.

  9. In Stata syntax, the equation here is: mixed key i.rc##i.langt || langrecode: || subject:

  10. This was a linear regression for each condition, which in Stata syntax is: reg effect_size time_points.

  11. In Stata syntax, the equations here are:

    mixed key i.firstword || langrecode: || subject: if rc == 1

    mixed key i.lastword || langrecode: || subject: if rc == 1

  12. As we discussed in the Introduction, however, more exposure does not generate more learning in many cases. In the current experimental conditions, the increase in learning with more exposures may stem from the fact that the minimal exposure amount (two exposures) generated only a very modest amount of learning to begin with.
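
The transitional-probability formula in Note 1 and the unbalanced-frequency logic in Notes 4 and 5 can be illustrated with a minimal Python sketch. The four words and their frequencies (4, 4, 2, 2) come from Note 5; the particular word order `ABCABDBACBAD`, the letter labels, and the function name are hypothetical choices made for this illustration and are not the sequences used in the study.

```python
from collections import Counter


def transitional_probabilities(syllables):
    """Return ({(a, b): count(ab) / count(a)}, pair_counts) for adjacent pairs."""
    unit_counts = Counter(syllables)                      # count(a)
    pair_counts = Counter(zip(syllables, syllables[1:]))  # count(ab)
    tps = {(a, b): n / unit_counts[a] for (a, b), n in pair_counts.items()}
    return tps, pair_counts


# Words from Note 5, with arbitrary letter labels (an assumption of this sketch).
words = {"A": [4, 5], "B": [6, 3], "C": [8, 1], "D": [2, 7]}

# Hypothetical word order: A and B occur four times, C and D twice,
# and no word immediately repeats. NOT taken from the study's materials.
word_order = "ABCABDBACBAD"
stream = [syllable for w in word_order for syllable in words[w]]

tps, pair_counts = transitional_probabilities(stream)

for label, pair in [
    ("word", (8, 1)),
    ("word", (2, 7)),
    ("part-word", (3, 4)),
    ("part-word", (5, 6)),
]:
    print(label, pair, "count =", pair_counts[pair], "TP =", tps[pair])
```

In this hypothetical stream the low-frequency words [8,1] and [2,7] and the part-words [3,4] and [5,6] each occur exactly twice, yet the words have a TP of 1.0 and the part-words a TP of 0.5. Matching raw frequency in this way is what lets a design separate transitional probability from frequency as the source of a learning effect.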

References

  • Aslin, R. N., Saffran, J. R., & Newport, E. L. (1998). Computation of conditional probability statistics by 8-month-old infants. Psychological Science, 9(4), 321–324.

  • Batterink, L. J. (2017). Rapid statistical learning supporting word extraction from continuous speech. Psychological Science, 28(7), 921–928.

  • Brysbaert, M., & Stevens, M. (2018). Power analysis and effect size in mixed effects models: A tutorial. Journal of Cognition, 1(1), 1–20. https://doi.org/10.5334/joc.10

  • Brysbaert, M., Mandera, P., & Keuleers, E. (2018). The word frequency effect in word processing: An updated review. Current Directions in Psychological Science, 27(1), 45–50.

  • Bulgarelli, F., & Weiss, D. J. (2016). Anchors aweigh: The impact of overlearning on entrenchment effects in statistical learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 42(10), 1621.

  • Cattell, J. M. (1890). Mental tests and measurements. Mind, 15, 373–381.

  • Chen, J., & Ten Cate, C. (2017). Bridging the gap: Learning of acoustic nonadjacent dependencies by a songbird. Journal of Experimental Psychology: Animal Learning and Cognition, 43(3), 295.

  • Erickson, L. C., & Thiessen, E. D. (2015). Statistical learning of language: Theory, validity, and predictions of a statistical learning account of language acquisition. Developmental Review, 37, 66–108.

  • Finn, A. S., & Kam, C. L. H. (2008). The curse of knowledge: First language knowledge impairs adult learners’ use of novel statistics for word segmentation. Cognition, 108(2), 477–499.

  • Frank, M. C., Goldwater, S., Griffiths, T. L., & Tenenbaum, J. B. (2010). Modeling human performance in statistical word segmentation. Cognition, 117(2), 107–125.

  • Gebhart, A. L., Newport, E. L., & Aslin, R. N. (2009). Statistical learning of adjacent and nonadjacent dependencies among nonlinguistic sounds. Psychonomic Bulletin & Review, 16(3), 486–490.

  • Hick, W. E. (1952). On the rate of gain of information. Quarterly Journal of Experimental Psychology, 4(1), 11–26.

  • Hyman, R. (1953). Stimulus information as a determinant of reaction time. Journal of Experimental Psychology, 45(3), 188.

  • Lazartigues, L., Mathy, F., & Lavigne, F. (2021). Statistical learning of unbalanced exclusive-or temporal sequences in humans. PLOS ONE, 16(2), e0246826.

  • Lazartigues, L., Mathy, F., & Lavigne, F. (2023). Probability, dependency, and frequency are not all equally involved in statistical learning. Experimental Psychology, 69(5), 241–252.

  • Lew-Williams, C., & Saffran, J. R. (2012). All words are not created equal: Expectations about word length guide infant statistical learning. Cognition, 122(2), 241–246. https://doi.org/10.1016/j.cognition.2011.10.007

  • Mirman, D., Graf Estes, K., & Magnuson, J. S. (2010). Computational modeling of statistical learning: Effects of transitional probability versus frequency and links to word learning. Infancy, 15(5), 471–486.

  • Misyak, J. B., & Christiansen, M. H. (2012). Statistical learning and language: An individual differences study. Language Learning, 62(1), 302–331.

  • Misyak, J. B., Christiansen, M. H., & Tomblin, J. B. (2010). On-line individual differences in statistical learning predict language processing. Frontiers in Psychology, 1, 31.

  • Newport, E. L., & Aslin, R. N. (2004). Learning at a distance I. Statistical learning of non-adjacent dependencies. Cognitive Psychology, 48(2), 127–162. https://doi.org/10.1016/S0010-0285(03)00128-2

  • Nissen, M. J., & Bullemer, P. (1987). Attentional requirements of learning: Evidence from performance measures. Cognitive Psychology, 19(1), 1–32.

  • Pelucchi, B., Hay, J. F., & Saffran, J. R. (2009). Learning in reverse: Eight-month-old infants track backward transitional probabilities. Cognition, 113(2), 244–247.

  • Perruchet, P., & Pacton, S. (2006). Implicit learning and statistical learning: One phenomenon, two approaches. Trends in Cognitive Sciences, 10(5), 233–238.

  • Perruchet, P., & Vinter, A. (1998). PARSER: A model for word segmentation. Journal of Memory and Language, 39(2), 246–263.

  • Popov, V., & Reder, L. M. (2020). Frequency effects on memory: A resource-limited theory. Psychological Review, 127(1), 1.

  • Saffran, J. R., Aslin, R. N., & Newport, E. L. (1996). Statistical learning by 8-month-old infants. Science, 274(5294), 1926–1928. https://doi.org/10.1126/science.274.5294.1926

  • Saffran, J. R., Johnson, E. K., Aslin, R. N., & Newport, E. L. (1999). Statistical learning of tone sequences by human infants and adults. Cognition, 70(1), 27–52.

  • Santolin, C., & Saffran, J. R. (2018). Constraints on statistical learning across species. Trends in Cognitive Sciences, 22(1), 52–63.

  • Swingley, D. (2005). Statistical clustering and the contents of the infant vocabulary. Cognitive Psychology, 50(1), 86–132.

  • Toro, J. M., & Trobalón, J. B. (2005). Statistical computations over a speech stream in a rodent. Perception & Psychophysics, 67(5), 867–875.

  • Wang, F. H., Hutton, E. A., & Zevin, J. D. (2019). Statistical learning of unfamiliar sounds as trajectories through a perceptual similarity space. Cognitive Science, 43(8), e12740.

  • Wang, F. H., Zevin, J. D., Trueswell, J. C., & Mintz, T. H. (2020). Top-down grouping affects adjacent dependency learning. Psychonomic Bulletin & Review, 27, 1052–1058.

  • Wang, F. H., Zevin, J., & Mintz, T. H. (2019). Successfully learning non-adjacent dependencies in a continuous artificial language stream. Cognitive Psychology, 113, 101223.

  • Weiss, D. J., Gerfen, C., & Mitchel, A. D. (2009). Speech segmentation in a simulated bilingual environment: A challenge for statistical learning? Language Learning and Development, 5(1), 30–49. https://doi.org/10.1080/15475440802340101

  • Yu, W., Wang, L., Qu, X., Wang, T., Zhang, J., & Liang, D. (2021). Transitional probabilities and expectation for word length impact verbal statistical learning. Acta Psychologica Sinica, 53(6), 565–574.

  • Zhou, X., & Marslen-Wilson, W. (1997). The abstractness of phonological representation in the Chinese mental lexicon. Cognitive Processing of Chinese and Other Asian Languages, 3–26.

Author Note

The reported experiments were not preregistered. The data reported in this paper are available at https://osf.io/che4q/?view_only=cca901e1ecb748178ecae2b6a5f1c31c. This work was supported by the Natural Science Foundation of China (No. 3217051) to Suiping Wang.

Author information

Contributions

Felix Hao Wang: Conceptualization, Methodology, Software, Visualization, Writing – original draft, Writing – review & editing. Meili Luo: Project administration, Writing – review & editing. Suiping Wang: Resources, Writing – review & editing.

Corresponding authors

Correspondence to Felix Hao Wang or Suiping Wang.

Ethics declarations

Conflict of interest

There is no conflict of interest to report.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (DOCX 56 KB)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Hao Wang, F., Luo, M. & Wang, S. Statistical word segmentation succeeds given the minimal amount of exposure. Psychon Bull Rev (2023). https://doi.org/10.3758/s13423-023-02386-z

  • DOI: https://doi.org/10.3758/s13423-023-02386-z
