Research on Language and Computation, Volume 8, Issue 2–3, pp 107–132

Online Learning Mechanisms for Bayesian Models of Word Segmentation

  • Lisa Pearl
  • Sharon Goldwater
  • Mark Steyvers
Open Access


In recent years, Bayesian models have become increasingly popular as a way of understanding human cognition. Ideal learner Bayesian models assume that cognition can be usefully understood as optimal behavior under uncertainty, a hypothesis that has been supported by a number of modeling studies across various domains (e.g., Griffiths and Tenenbaum, Cognitive Psychology, 51, 354–384, 2005; Xu and Tenenbaum, Psychological Review, 114, 245–272, 2007). The models in these studies aim to explain why humans behave as they do given the task and data they encounter, but typically avoid some questions addressed by more traditional psychological models, such as how the observed behavior is produced given constraints on memory and processing. Here, we use the task of word segmentation as a case study for investigating these questions within a Bayesian framework. We consider some limitations of the infant learner, and develop several online learning algorithms that take these limitations into account. Each algorithm can be viewed as a different method of approximating the same ideal learner. When tested on corpora of English child-directed speech, we find that the constrained learner's behavior depends non-trivially on how the learner's limitations are implemented. Interestingly, sometimes biases that are helpful to an ideal learner hinder a constrained learner, and in a few cases, constrained learners perform as well as or better than the ideal learner. This suggests that the transition from a computational-level solution for acquisition to an algorithmic-level one is not straightforward.
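The abstract describes online learners that approximate an ideal Bayesian learner under memory and processing constraints, but it does not spell out a procedure. As a purely illustrative sketch (not any of the algorithms reported in the article), the Python below assumes a Dirichlet-process unigram lexicon in the style of Goldwater et al. (2009), with a base distribution that draws word length geometrically and phonemes uniformly. The learner sees one utterance at a time, picks the single most probable segmentation by dynamic programming with its lexicon counts held fixed, and then commits the resulting words to memory without revisiting earlier decisions. The class name `OnlineUnigramSegmenter` and the parameters `alpha`, `p_stop`, and `n_phonemes` (with their default values) are hypothetical choices made for this example.

```python
import math
from collections import Counter


class OnlineUnigramSegmenter:
    """Hypothetical online segmenter (illustration only): Viterbi decoding
    under a Dirichlet-process unigram lexicon, updated after each utterance.
    Characters stand in for phonemes."""

    def __init__(self, alpha=20.0, p_stop=0.5, n_phonemes=50):
        self.alpha = alpha            # DP concentration parameter (assumed value)
        self.p_stop = p_stop          # base-distribution chance a word ends after each phoneme (assumed)
        self.n_phonemes = n_phonemes  # size of an assumed uniform phoneme inventory
        self.counts = Counter()       # word tokens stored so far
        self.total = 0                # total number of stored word tokens

    def log_p0(self, word):
        # Base distribution: geometric word length times uniform phonemes.
        m = len(word)
        return (math.log(self.p_stop) + (m - 1) * math.log(1 - self.p_stop)
                - m * math.log(self.n_phonemes))

    def log_prob(self, word):
        # DP predictive probability log((n_w + alpha * P0(w)) / (N + alpha)),
        # combined in log space so long words cannot underflow to log(0).
        log_new = math.log(self.alpha) + self.log_p0(word)
        n_w = self.counts[word]  # Counter returns 0 for unseen words
        if n_w > 0:
            hi, lo = max(log_new, math.log(n_w)), min(log_new, math.log(n_w))
            log_num = hi + math.log1p(math.exp(lo - hi))
        else:
            log_num = log_new
        return log_num - math.log(self.total + self.alpha)

    def segment(self, utterance):
        # Viterbi over word-end positions; lexicon counts stay fixed
        # while the utterance is being decoded.
        n = len(utterance)
        best = [0.0] + [-math.inf] * n
        back = [0] * (n + 1)
        for end in range(1, n + 1):
            for start in range(end):
                score = best[start] + self.log_prob(utterance[start:end])
                if score > best[end]:
                    best[end], back[end] = score, start
        # Follow back-pointers from the end of the utterance to recover words.
        words, end = [], n
        while end > 0:
            words.append(utterance[back[end]:end])
            end = back[end]
        return words[::-1]

    def observe(self, utterance):
        # Online step: segment the utterance, then commit its words to memory.
        words = self.segment(utterance)
        self.counts.update(words)
        self.total += len(words)
        return words
```

For example, with `p_stop=0.3`, after observing `"the"` and `"dog"` as separate utterances the learner segments `"thedog"` as `["the", "dog"]`, because reusing two stored words is far more probable than generating a new six-phoneme word from the base distribution. Committing irrevocably to one segmentation per utterance is one way a memory-limited learner might depart from the ideal; the article itself compares several such approximations.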


Keywords: Algorithmic level, Bayesian models, Computational level, English, Ideal learning, Online learning, Processing limitations, Word segmentation



We would like to thank the audiences at the PsychoComputational Models of Human Language workshop in 2009, BUCLD 34, three anonymous reviewers, Alexander Clark, William Sakas, Tom Griffiths, and Michael Frank. We would also like to give a special thanks to Jim White for his insight about the differences in performance between the ideal and online Bayesian learners. This work was supported by NSF grant BCS-0843896 to the first author and CORCL grant MI 14B-2009-2010 to the first and third authors.


This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.


  1. Anderson J. R., Schooler L. J. (2000) The adaptive nature of memory. In: Tulving E., Craik F. I. M. (eds) The Oxford handbook of memory. Oxford University Press, Oxford, pp 557–570
  2. Bernstein-Ratner N. (1984) Patterns of vowel modification in motherese. Journal of Child Language 11: 557–578
  3. Blanchard D., Heinz J., Golinkoff R. (2010) Modeling the contribution of phonotactic cues to word segmentation. Journal of Child Language 37: 487–511
  4. Brent M. (1999) An efficient, probabilistically sound algorithm for segmentation and word discovery. Machine Learning 34: 71–105
  5. Brown S., Steyvers M. (2009) Detecting and predicting changes. Cognitive Psychology 58: 49–67
  6. Christiansen M., Allen J., Seidenberg M. (1998) Learning to segment speech using multiple cues: A connectionist model. Language and Cognitive Processes 13: 221–268
  7. Curtin S., Mintz T., Christiansen M. (2005) Stress changes the representational landscape: Evidence from word segmentation in infants. Cognition 96: 233–262
  8. Ferguson T. (1973) A Bayesian analysis of some nonparametric problems. Annals of Statistics 1: 209–230
  9. Fleck M. (2008) Lexicalized phonotactic word segmentation. In: Proceedings of the Association for Computational Linguistics, pp 130–138
  10. Frank M. C., Goodman N. D., Tenenbaum J. (2009) Using speakers’ referential intentions to model early cross-situational word learning. Psychological Science 20: 579–585
  11. Gambell T., Yang C. (2006) Word segmentation: Quick but not dirty. Manuscript, Yale University, New Haven
  12. Griffiths T. L., Chater N., Kemp C., Perfors A., Tenenbaum J. B. (2010) Probabilistic models of cognition: Exploring representations and inductive biases. Trends in Cognitive Sciences 14: 357–364
  13. Griffiths T. L., Kemp C., Tenenbaum J. B. (2008) Bayesian models of cognition. In: Sun R. (ed) The Cambridge handbook of computational cognitive modeling. Cambridge University Press, Cambridge
  14. Griffiths T. L., Tenenbaum J. B. (2005) Structure and strength in causal induction. Cognitive Psychology 51: 354–384
  15. Goldwater S. (2006) Nonparametric Bayesian models of lexical acquisition. Ph.D. thesis, Brown University
  16. Goldwater S., Griffiths T., Johnson M. (2007) Distributional cues to word boundaries: Context is important. In: Caunt-Nulton H., Kulatilake S., Woo I. (eds) BUCLD 31: Proceedings of the 31st annual Boston University Conference on Language Development. Cascadilla Press, Somerville, MA, pp 239–250
  17. Goldwater S., Griffiths T. L., Johnson M. (2009) A Bayesian framework for word segmentation: Exploring the effects of context. Cognition 112(1): 21–54
  18. Hewlett D., Cohen P. (2009) Bootstrap voting experts. In: Proceedings of the Twenty-First International Joint Conference on Artificial Intelligence (IJCAI-09), pp 1071–1076. Available at
  19. Johnson E., Jusczyk P. (2001) Word segmentation by 8-month-olds: When speech cues count more than statistics. Journal of Memory and Language 44: 548–567
  20. Johnson M., Griffiths T., Goldwater S. (2007) Bayesian inference for PCFGs via Markov chain Monte Carlo. In: Proceedings of the meeting of the North American Association for Computational Linguistics
  21. Jusczyk P., Goodman M., Baumann A. (1999a) Nine-month-olds’ attention to sound similarities in syllables. Journal of Memory and Language 40: 62–82
  22. Jusczyk P., Hohne E., Baumann A. (1999b) Infants’ sensitivity to allophonic cues for word segmentation. Perception and Psychophysics 61: 1465–1476
  23. Jusczyk P., Houston D., Newsome M. (1999c) The beginnings of word segmentation in English-learning infants. Cognitive Psychology 39: 159–207
  24. MacWhinney B. (2000) The CHILDES project: Tools for analyzing talk. Lawrence Erlbaum Associates, Mahwah, NJ
  25. Marr D. (1982) Vision. Freeman, San Francisco
  26. Marthi B., Pasula H., Russell S., Peres Y. (2002) Decayed MCMC filtering. In: Proceedings of the 18th UAI, pp 319–326
  27. Mattys S., Jusczyk P., Luce P., Morgan J. (1999) Phonotactic and prosodic effects on word segmentation in infants. Cognitive Psychology 38: 465–494
  28. McClelland J. L., Botvinick M. M., Noelle D. C., Plaut D. C., Rogers T. T., Seidenberg M. S., Smith L. B. (2010) Letting structure emerge: Connectionist and dynamical systems approaches to understanding cognition. Trends in Cognitive Sciences 14: 348–356
  29. Morgan J., Bonamo K., Travis L. (1995) Negative evidence on negative evidence. Developmental Psychology 31: 180–197
  30. Newport E. (1990) Maturational constraints on language learning. Cognitive Science 14: 11–28
  31. Oaksford M., Chater N. (1998) Rational models of cognition. Oxford University Press, Oxford, England
  32. Pelucchi B., Hay J., Saffran J. (2009) Learning in reverse: Eight-month-old infants track backward transitional probabilities. Cognition 113: 244–247
  33. Perruchet P., Desaulty S. (2008) A role for backward transitional probabilities in word segmentation? Memory and Cognition 36: 1299–1305
  34. Peters A. (1983) The units of language acquisition. Monographs in Applied Psycholinguistics. Cambridge University Press, New York
  35. Saffran J., Aslin R., Newport E. (1996) Statistical learning by 8-month-olds. Science 274: 1926–1928
  36. Saffran J. R. (2001) The use of predictive dependencies in language learning. Journal of Memory and Language 44: 493–513
  37. Sanborn A. N., Griffiths T. L., Navarro D. J. (in press) Rational approximations to rational models: Alternative algorithms for category learning. Psychological Review
  38. Seidl A., Johnson E. (2006) Infant word segmentation revisited: Edge alignment facilitates target extraction. Developmental Science 9(6): 565–573
  39. Shi L., Griffiths T. L., Feldman N. H., Sanborn A. N. (in press) Exemplar models as a mechanism for performing Bayesian inference. Psychonomic Bulletin & Review
  40. Swingley D. (2005) Statistical clustering and contents of the infant vocabulary. Cognitive Psychology 50: 86–132
  41. Teh Y., Jordan M., Beal M., Blei D. (2006) Hierarchical Dirichlet processes. Journal of the American Statistical Association 101(476): 1566–1581
  42. Tenenbaum J., Griffiths T. (2001) Generalization, similarity, and Bayesian inference. Behavioral and Brain Sciences 24: 629–641
  43. Tenenbaum J., Griffiths T., Kemp C. (2006) Theory-based models of inductive learning and reasoning. Trends in Cognitive Sciences 10: 309–318
  44. Thiessen E., Saffran J. R. (2003) When cues collide: Use of stress and statistical cues to word boundaries by 7- to 9-month-old infants. Developmental Psychology 39: 706–716
  45. Xu F., Tenenbaum J. B. (2007) Word learning as Bayesian inference. Psychological Review 114: 245–272

Copyright information

© The Author(s) 2011

Authors and Affiliations

  1. Department of Cognitive Sciences, University of California, Irvine, USA
  2. School of Informatics, University of Edinburgh, Edinburgh, UK
