Understanding spoken language through TalkBank

  • Brian MacWhinney


Ongoing advances in computer technology have opened up a deluge of new datasets for understanding human behavior (Goldstone & Lupyan, 2016). Many of these datasets provide information on the use of written language. However, data on naturally occurring spoken-language conversations are much more difficult to obtain. A major exception to this is the TalkBank system, which provides online multimedia data for 14 types of spoken-language data: language in aphasia, child language, stuttering, child phonology, autism spectrum disorder, bilingualism, Conversation Analysis, classroom discourse, dementia, right hemisphere damage, Danish conversation, second language learning, traumatic brain injury, and daylong recordings in the home. The present report reviews these resources and describes the ways they are being used to further our understanding of human language and communication.


Child language, Aphasia, Conversation analysis, Bilingualism, Second language acquisition, Phonology, Computational linguistics, Corpora



  1. Baroni, M., & Kilgarriff, A. (2006). Large linguistically-processed Web corpora for multiple languages. Paper presented at the Eleventh Conference of the European Chapter of the Association for Computational Linguistics, Trento.
  2. Bernstein Ratner, N., & MacWhinney, B. (2018). Fluency Bank: A new resource for fluency research and practice. Journal of Fluency Disorders, 56, 69–80.
  3. Brown, C., Snodgrass, T., Kemper, S. J., Herman, R., & Covington, M. A. (2008). Automatic measurement of propositional idea density from part-of-speech tagging. Behavior Research Methods, Instruments, & Computers, 40, 540–545.
  4. Clahsen, H., Rothweiler, M., Woest, A., & Marcus, G. (1992). Regular and irregular inflection in the acquisition of German noun plurals. Cognition, 45, 225–255.
  5. Costa, T. (2010). The acquisition of the consonantal system in European Portuguese: Focus on place and manner features (PhD dissertation), University of Lisbon, Lisbon, Portugal.
  6. Donoho, D. L. (2010). An invitation to reproducible computational research. Biostatistics, 11, 385–388.
  7. dos Santos, C. (2007). Développement phonologique en Français langue maternelle: Une étude de cas. (PhD dissertation), University Lumière Lyon 2, Lyon.
  8. Freudenthal, D., Pine, J., & Gobet, F. (2010). Explaining quantitative variation in the rate of optional infinitive errors across languages: A comparison of MOSAIC and the Variational Learning Model. Journal of Child Language, 37, 643–669.
  9. Garfinkel, H. (1967). Studies in ethnomethodology. Englewood Cliffs: Prentice-Hall.
  10. Givon, T. (2005). Context as other minds: The pragmatics of sociality, cognition, and communication. Philadelphia: John Benjamins.
  11. Goldstone, R., & Lupyan, G. (2016). Discovering psychological principles by mining naturally occurring data sets. Topics in Cognitive Science, 8, 548–568.
  12. Hinton, G., Deng, L., Yu, D., Dahl, G. E., Mohamed, A.-R., Jaitly, N., . . . Sainath, T. N. (2012). Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Processing Magazine, 29, 82–97.
  13. Le Franc, A., Riebling, E., Karadayi, J., Yun, W., Scaff, C., Metze, F., & Cristia, A. (2018). The ACLEW DiViMe: An easy-to-use diarization tool. Paper presented at Interspeech 2018, Mumbai, India.
  14. Lee, L. (1974). Developmental Sentence Analysis. Evanston, IL: Northwestern University Press.
  15. Lehrer, R., & Curtis, C. L. (2000). Why are some solids perfect? Teaching Children Mathematics, 6, 324.
  16. Leonard, L., & McGregor, K. (1991). Unusual phonological patterns and their underlying representations: A case study. Journal of Child Language, 18, 261–271.
  17. Lubetich, S., & Sagae, K. (2014). Data-driven measurement of child language development with simple syntactic templates. Paper presented at the 25th International Conference on Computational Linguistics (COLING 2014), Dublin.
  18. MacWhinney, B. (2008). Enriching CHILDES for morphosyntactic analysis. In H. Behrens (Ed.), Trends in corpus research: Finding structure in data (pp. 165–198). Amsterdam: John Benjamins.
  19. MacWhinney, B. (2014). Presentation. In L. Scliar-Cabral (Ed.), O português na plataforma CHILDES (pp. 9–20). Florianópolis, Brazil: Editora Insular.
  20. MacWhinney, B. (2015). Introduction: Language emergence. In B. MacWhinney & W. O’Grady (Eds.), Handbook of language emergence (pp. 1–32). New York: Wiley.
  21. MacWhinney, B. (2017). A shared platform for studying second language acquisition. Language Learning, 67, 254–275.
  22. MacWhinney, B., & Fromm, D. (2016). AphasiaBank as big data. Seminars in Speech and Language, 37, 10–22.
  23. MacWhinney, B., & Leinbach, J. (1991). Implementations are not conceptualizations: Revising the verb learning model. Cognition, 29, 121–157.
  24. Malvern, D., Richards, B., Chipere, N., & Durán, P. (2004). Lexical diversity and language development. New York: Palgrave Macmillan.
  25. Marcus, G. F., Pinker, S., Ullman, M., Hollander, M., Rosen, T. J., Xu, F., & Clahsen, H. (1992). Overregularization in language acquisition. Monographs of the Society for Research in Child Development, 57(4).
  26. McAllister Byun, T. (2012). Positional velar fronting: An updated articulatory account. Journal of Child Language, 39, 1043–1076.
  27. McCauley, S., Monaghan, P., & Christiansen, M. (2015). Usage-based language learning. In B. MacWhinney & W. O’Grady (Eds.), The handbook of language emergence (pp. 415–436). New York: Wiley.
  28. Metze, F., Riebling, E., Warlaumont, A. S., & Bergelson, E. (2016). Virtual machines and containers as a platform for experimentation. Paper presented at Interspeech 2016, San Francisco, CA. doi:10.21437/Interspeech.2016-997
  29. Miller, J., & Chapman, R. (1983). SALT: Systematic analysis of language transcripts, user’s manual. Madison: University of Wisconsin Press.
  30. Munafò, M. R., Nosek, B. A., Bishop, D. V. M., Button, K. S., Chambers, C. D., du Sert, N. P., … Ioannidis, J. P. A. (2017). A manifesto for reproducible science. Nature Human Behaviour, 1, 0021.
  31. Myers-Scotton, J. (2005). Supporting a differential access hypothesis: Code switching and other contact data. In J. F. Kroll & A. M. B. DeGroot (Eds.), Handbook of bilingualism: Psycholinguistic approaches (pp. 326–348). New York: Oxford University Press.
  32. Ngon, C., Martin, A., Dupoux, E., Cabrol, D., Dutat, M., & Peperkamp, S. (2013). (Non)words, (non)words, (non)words: Evidence for a protolexicon during the first year of life. Developmental Science, 16, 24–34.
  33. Parisse, C., & Le Normand, M.-T. (2000). Automatic disambiguation of the morphosyntax in spoken language corpora. Behavior Research Methods, Instruments, & Computers, 32, 468–481.
  34. Pennebaker, J. W. (2012). Opening up: The healing power of expressing emotions. New York: Guilford Press.
  35. Pine, J. M., & Lieven, E. V. M. (1997). Slot and frame patterns and the development of the determiner category. Applied Psycholinguistics, 18, 123–138.
  36. Redeker, G. (1984). On differences between spoken and written language. Discourse Processes, 7, 43–55.
  37. Rochon, E., Saffran, E., Berndt, R., & Schwartz, M. (2000). Quantitative analysis of aphasic sentence production: Further development and new data. Brain and Language, 72, 193–218.
  38. Rose, Y., & MacWhinney, B. (2014). The PhonBank Project: Data and software-assisted methods for the study of phonology and phonological development. In J. Durand, U. Gut, & G. Kristoffersen (Eds.), The Oxford handbook of corpus phonology (pp. 380–401). Oxford: Oxford University Press.
  39. Sacks, H., Schegloff, E. A., & Jefferson, G. (1974). A simplest systematics for the organization of turn-taking for conversation. Language, 50, 696–735.
  40. Sagae, K., Davis, E., Lavie, A., MacWhinney, B., & Wintner, S. (2010). Morphosyntactic annotation of CHILDES transcripts. Journal of Child Language, 37, 705–729.
  41. Scarborough, H. S. (1990). Index of productive syntax. Applied Psycholinguistics, 11, 1–22.
  42. Skehan, P., Foster, P., & Shum, S. (2016). Ladders and snakes in second language fluency. International Review of Applied Linguistics, 54, 97–112.
  43. Stigler, J., Gallimore, R., & Hiebert, J. (2000). Using video surveys to compare classrooms and teaching across cultures: Examples and lessons from the TIMSS video studies. Educational Psychologist, 35, 87–100.
  44. Thompson, C. K., Shapiro, L. P., Tait, M. E., Jacobs, B. J., Schneider, S. L., & Ballard, K. J. (1995). A system for the linguistic analysis of agrammatic language production. Brain and Language, 51, 124–129.
  45. Valian, V., Solt, S., & Stewart, J. (2009). Abstract categories or limited-scope formulae? The case of children’s determiners. Journal of Child Language, 36, 743–778.
  46. VanDam, M., Warlaumont, A. S., Bergelson, E., Cristia, A., Soderstrom, M., Palma, P. D., & MacWhinney, B. (2016). HomeBank: An online repository of daylong child-centered audio recordings. Seminars in Speech and Language, 37, 128–142.
  47. Vihman, M., & Croft, W. (2007). Phonological development: Toward a “radical” templatic phonology. Linguistics, 45, 683–725.
  48. Wexler, K. (1998). Very early parameter setting and the unique checking constraint: A new explanation of the optional infinitive stage. Lingua, 106, 23–79.

Copyright information

© The Psychonomic Society, Inc. 2018

Authors and Affiliations

  1. Department of Psychology, Carnegie Mellon University, Pittsburgh, USA
