Abstract
One of the first tasks in language acquisition is word segmentation, the process of extracting word forms from continuous speech streams. Statistical approaches to word segmentation, in which word boundaries are inferred from sequence statistics, have been shown to be powerful. This approach requires the learner to represent the frequency of units in syllable sequences, though accounts differ on how much statistical exposure is required. In this study, we examined the computational limit at which words can be extracted from continuous sequences. First, we discussed why two occurrences of a word in a continuous sequence constitute the computational lower limit for that word to be statistically defined. Next, we created short syllable sequences that contained certain words either two or four times. Learners were presented with these syllable sequences one at a time, each immediately followed by a test of the novel words from that sequence. We found that, with the computationally minimal amount of two exposures, words were successfully segmented from continuous sequences. Moreover, longer syllable sequences providing four exposures to words generated more robust learning. We discuss the implications of these results for how learners segment and store word candidates from continuous sequences.
Notes
The computation of TPs makes use of the absolute frequencies of the different units: p(b|a) = count(ab) / count(a).
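As an illustrative sketch (not the authors' code), the forward TP defined in this note can be computed from bigram and unigram counts; the toy stream below uses hypothetical tone-marked syllables in the style of the paper's materials:

```python
from collections import Counter

def transitional_probabilities(syllables):
    """Forward transitional probability: p(b|a) = count(ab) / count(a)."""
    unigrams = Counter(syllables)
    bigrams = Counter(zip(syllables, syllables[1:]))
    return {(a, b): n / unigrams[a] for (a, b), n in bigrams.items()}

# A toy continuous stream in which the word "xian2 mo2" occurs twice.
stream = "xian2 mo2 fu1 gao1 xian2 mo2 ti2 lu4".split()
tps = transitional_probabilities(stream)
print(tps[("xian2", "mo2")])  # 1.0: mo2 always follows xian2
```

Within-word TPs approach 1.0 in such streams, whereas TPs spanning a word boundary are lower, which is the statistic a segmenting learner can exploit.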
Incidentally, both of these studies report null results: learning did not change when the exposure period was lengthened.
Many statistical word-segmentation studies use fade-in or fade-out to avoid providing word boundary information to learners, as sequence boundaries can potentially be regarded as cues for word boundaries. We examined this issue empirically in our analysis; see the Results section.
This is another way in which our design differed from the Batterink (2017) design. In Batterink (2017), all the words in syllable streams have equal frequency (a characteristic from Saffran et al., 1996), which makes it difficult to disentangle transitional probability and frequency as the source of the learning effect: these factors can only be disentangled if the words in the sequence have a lower frequency than part-words, as Aslin et al. (1998) explained. We designed all of our sequences with an unbalanced frequency design from Aslin et al. (1998).
In the example sequence in Fig. 1, a counterbalancing sequence is constructed by using [8,1] and [2,7] as low-frequency words, and [3,4] and [5,6] as part-words, which means that the four words making up this counterbalancing sequence are [4,5], [6,3], [8,1], and [2,7], with frequencies of 4, 4, 2, and 2, respectively.
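As a hypothetical sketch (the actual sequences are part of the paper's materials), the unbalanced-frequency property of such a design can be checked by counting word tokens; the token list below is invented for illustration, using the note's syllable-index notation:

```python
from collections import Counter

# Hypothetical ordering of word tokens in a counterbalancing sequence.
# High-frequency words ([4,5], [6,3]) should occur four times;
# low-frequency words ([8,1], [2,7]) should occur twice.
tokens = [(4, 5), (6, 3), (8, 1), (4, 5), (2, 7), (6, 3),
          (4, 5), (8, 1), (6, 3), (2, 7), (4, 5), (6, 3)]
freq = Counter(tokens)
print(freq[(4, 5)], freq[(6, 3)], freq[(8, 1)], freq[(2, 7)])  # 4 4 2 2
```

The point of the unbalanced design is that low-frequency words match part-words in frequency, so above-chance word preference cannot be attributed to frequency alone.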
For all the syllables used in the study, we use a pinyin representation in which each syllable is followed by a number from 1 through 4 indicating its tone. In this example, both syllables, xian and mo, are in the second tone.
In Stata syntax, the equation here is: mixed key i.rc || langrecode: || subject:, where rc codes for word/part-word given the current sequence and counterbalancing condition.
In Stata syntax, the equation here is: mixed key i.rc##i.times || langrecode: || subject:, where rc codes for word/part-word given the current sequence and counterbalancing condition.
In Stata syntax, the equation here is: mixed key i.rc##i.langt || langrecode: || subject:
This was a linear regression for each condition, which in Stata syntax is: reg effect_size time_points.
In Stata syntax, the equations here are:
mixed key i.firstword || langrecode: || subject: if rc == 1
mixed key i.lastword || langrecode: || subject: if rc == 1
As we discussed in the Introduction, however, more exposure does not generate more learning in many cases. Under the current experimental conditions, the increase in learning with more exposures may stem from the fact that the minimal exposure amount (two exposures) generated only a very modest amount of learning to begin with.
References
Aslin, R. N., Saffran, J. R., & Newport, E. L. (1998). Computation of conditional probability statistics by 8-month-old infants. Psychological Science, 9(4), 321–324.
Batterink, L. J. (2017). Rapid statistical learning supporting word extraction from continuous speech. Psychological Science, 28(7), 921–928.
Brysbaert, M., & Stevens, M. (2018). Power analysis and effect size in mixed effects models: A tutorial. Journal of Cognition, 1(1), 1–20. https://doi.org/10.5334/joc.10
Brysbaert, M., Mandera, P., & Keuleers, E. (2018). The word frequency effect in word processing: An updated review. Current Directions in Psychological Science, 27(1), 45–50.
Bulgarelli, F., & Weiss, D. J. (2016). Anchors aweigh: The impact of overlearning on entrenchment effects in statistical learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 42(10), 1621.
Cattell, J. M. (1890). Mental tests and measurements. Mind, 15, 373–381.
Chen, J., & Ten Cate, C. (2017). Bridging the gap: Learning of acoustic nonadjacent dependencies by a songbird. Journal of Experimental Psychology: Animal Learning and Cognition, 43(3), 295.
Erickson, L. C., & Thiessen, E. D. (2015). Statistical learning of language: Theory, validity, and predictions of a statistical learning account of language acquisition. Developmental Review, 37, 66–108.
Finn, A. S., & Kam, C. L. H. (2008). The curse of knowledge: First language knowledge impairs adult learners’ use of novel statistics for word segmentation. Cognition, 108(2), 477–499.
Frank, M. C., Goldwater, S., Griffiths, T. L., & Tenenbaum, J. B. (2010). Modeling human performance in statistical word segmentation. Cognition, 117(2), 107–125.
Gebhart, A. L., Newport, E. L., & Aslin, R. N. (2009). Statistical learning of adjacent and nonadjacent dependencies among nonlinguistic sounds. Psychonomic Bulletin & Review, 16(3), 486–490.
Hick, W. E. (1952). On the rate of gain of information. Quarterly Journal of Experimental Psychology, 4(1), 11–26.
Hyman, R. (1953). Stimulus information as a determinant of reaction time. Journal of Experimental Psychology, 45(3), 188.
Lazartigues, L., Mathy, F., & Lavigne, F. (2021). Statistical learning of unbalanced exclusive-or temporal sequences in humans. PLoS ONE, 16(2), e0246826.
Lazartigues, L., Mathy, F., & Lavigne, F. (2023). Probability, dependency, and frequency are not all equally involved in statistical learning. Experimental Psychology, 69(5), 241–252.
Lew-Williams, C., & Saffran, J. R. (2012). All words are not created equal: Expectations about word length guide infant statistical learning. Cognition, 122(2), 241–246. https://doi.org/10.1016/j.cognition.2011.10.007
Mirman, D., Graf Estes, K., & Magnuson, J. S. (2010). Computational modeling of statistical learning: Effects of transitional probability versus frequency and links to word learning. Infancy, 15(5), 471–486.
Misyak, J. B., & Christiansen, M. H. (2012). Statistical learning and language: An individual differences study. Language Learning, 62(1), 302–331.
Misyak, J. B., Christiansen, M. H., & Tomblin, J. B. (2010). On-line individual differences in statistical learning predict language processing. Frontiers in Psychology, 1, 31.
Newport, E. L., & Aslin, R. N. (2004). Learning at a distance: I. Statistical learning of non-adjacent dependencies. Cognitive Psychology, 48(2), 127–162. https://doi.org/10.1016/S0010-0285(03)00128-2
Nissen, M. J., & Bullemer, P. (1987). Attentional requirements of learning: Evidence from performance measures. Cognitive Psychology, 19(1), 1–32.
Pelucchi, B., Hay, J. F., & Saffran, J. R. (2009). Learning in reverse: Eight-month-old infants track backward transitional probabilities. Cognition, 113(2), 244–247.
Perruchet, P., & Pacton, S. (2006). Implicit learning and statistical learning: One phenomenon, two approaches. Trends in Cognitive Sciences, 10(5), 233–238.
Perruchet, P., & Vinter, A. (1998). PARSER: A model for word segmentation. Journal of Memory and Language, 39(2), 246–263.
Popov, V., & Reder, L. M. (2020). Frequency effects on memory: A resource-limited theory. Psychological Review, 127(1), 1.
Saffran, J. R., Aslin, R. N., & Newport, E. L. (1996). Statistical learning by 8-month-old infants. Science, 274(5294), 1926–1928. https://doi.org/10.1126/science.274.5294.1926
Saffran, J. R., Johnson, E. K., Aslin, R. N., & Newport, E. L. (1999). Statistical learning of tone sequences by human infants and adults. Cognition, 70(1), 27–52.
Santolin, C., & Saffran, J. R. (2018). Constraints on statistical learning across species. Trends in Cognitive Sciences, 22(1), 52–63.
Swingley, D. (2005). Statistical clustering and the contents of the infant vocabulary. Cognitive Psychology, 50(1), 86–132.
Toro, J. M., & Trobalón, J. B. (2005). Statistical computations over a speech stream in a rodent. Perception & Psychophysics, 67(5), 867–875.
Wang, F. H., Hutton, E. A., & Zevin, J. D. (2019). Statistical learning of unfamiliar sounds as trajectories through a perceptual similarity space. Cognitive Science, 43(8), e12740.
Wang, F. H., Zevin, J. D., Trueswell, J. C., & Mintz, T. H. (2020). Top-down grouping affects adjacent dependency learning. Psychonomic Bulletin & Review, 27, 1052–1058.
Wang, F. H., Zevin, J., & Mintz, T. H. (2019). Successfully learning non-adjacent dependencies in a continuous artificial language stream. Cognitive Psychology, 113, 101223.
Weiss, D. J., Gerfen, C., & Mitchel, A. D. (2009). Speech segmentation in a simulated bilingual environment: A challenge for statistical learning? Language Learning and Development, 5(1), 30–49. https://doi.org/10.1080/15475440802340101
Yu, W., Wang, L., Qu, X., Wang, T., Zhang, J., & Liang, D. (2021). Transitional probabilities and expectation for word length impact verbal statistical learning. Acta Psychologica Sinica, 53(6), 565–574.
Zhou, X., & Marslen-Wilson, W. (1997). The abstractness of phonological representation in the Chinese mental lexicon. Cognitive Processing of Chinese and Other Asian Languages, 3–26.
Author Note
The reported experiments were not preregistered. The data reported in this paper are available at https://osf.io/che4q/?view_only=cca901e1ecb748178ecae2b6a5f1c31c. This work was supported by the Natural Science Foundation of China (No. 3217051) to Suiping Wang.
Author information
Contributions
Felix Hao Wang: Conceptualization, Methodology, Software, Visualization, Writing – original draft, Writing – review & editing. Meili Luo: Project administration, Writing – review & editing. Suiping Wang: Resources, Writing – review & editing.
Ethics declarations
Conflict of interest
There is no conflict of interest to report.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Hao Wang, F., Luo, M. & Wang, S. Statistical word segmentation succeeds given the minimal amount of exposure. Psychon Bull Rev (2023). https://doi.org/10.3758/s13423-023-02386-z