childes-db: A flexible and reproducible interface to the child language data exchange system
The Child Language Data Exchange System (CHILDES) has played a critical role in research on child language development, particularly in characterizing the early language learning environment. However, access to these data can be complex for novices and difficult to automate for advanced users. To address these issues, we introduce childes-db, a database-formatted mirror of CHILDES that improves data accessibility and usability by offering novel interfaces, including browsable web applications and an R application programming interface (API). Along with versioned infrastructure that facilitates reproducibility of past analyses, these interfaces lower barriers to analyzing naturalistic parent–child language, allowing a wider range of researchers in language and cognitive development to easily leverage CHILDES in their work.
Keywords: Child language, Corpus linguistics, Reproducibility, R packages, Research software