Online Learning Mechanisms for Bayesian Models of Word Segmentation
In recent years, Bayesian models have become increasingly popular as a way of understanding human cognition. Ideal learner Bayesian models assume that cognition can be usefully understood as optimal behavior under uncertainty, a hypothesis that has been supported by a number of modeling studies across various domains (e.g., Griffiths and Tenenbaum, Cognitive Psychology, 51, 354–384, 2005; Xu and Tenenbaum, Psychological Review, 114, 245–272, 2007). The models in these studies aim to explain why humans behave as they do given the task and data they encounter, but typically avoid some questions addressed by more traditional psychological models, such as how the observed behavior is produced given constraints on memory and processing. Here, we use the task of word segmentation as a case study for investigating these questions within a Bayesian framework. We consider some limitations of the infant learner, and develop several online learning algorithms that take these limitations into account. Each algorithm can be viewed as a different method of approximating the same ideal learner. Testing these algorithms on corpora of English child-directed speech, we find that the constrained learner’s behavior depends non-trivially on how the learner’s limitations are implemented. Interestingly, biases that are helpful to an ideal learner sometimes hinder a constrained learner, and in a few cases, constrained learners perform as well as or better than the ideal learner. This suggests that the transition from a computational-level solution for acquisition to an algorithmic-level one is not straightforward.
Keywords: Algorithmic level, Bayesian models, Computational level, English, Ideal learning, Online learning, Processing limitations, Word segmentation
We would like to thank the audiences at the PsychoComputational Models of Human Language workshop in 2009, BUCLD 34, three anonymous reviewers, Alexander Clark, William Sakas, Tom Griffiths, and Michael Frank. We would also like to give a special thanks to Jim White for his insight about the differences in performance between the ideal and online Bayesian learners. This work was supported by NSF grant BCS-0843896 to the first author and CORCL grant MI 14B-2009-2010 to the first and third authors.
This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.