Behavior Research Methods, Volume 47, Issue 4, pp 1085–1094

childLex: a lexical database of German read by children

  • Sascha Schroeder
  • Kay-Michael Würzner
  • Julian Heister
  • Alexander Geyken
  • Reinhold Kliegl


This article introduces childLex, an online database of German read by children. childLex is based on a corpus of children’s books and comprises 10 million words that were syntactically annotated and lemmatized. childLex reports linguistic norms for lexical, superlexical, and sublexical variables in three different age groups: 6–8 (grades 1–2), 9–10 (grades 3–4), and 11–12 years (grades 5–6). Here, we describe how childLex was collected and analyzed. In addition, we provide information about the distributions of word frequency, word length, and orthographic neighborhood size, as well as their intercorrelations. Finally, we explain how childLex can be accessed using a Web interface.


Keywords: Lexical database; Child language; Reading development


Author Note

We thank Alexa Brand, Svenja Brinkmeier, Sarah König, Laura Nohr, Kristina Ruff, Sarah Salin, and Felix Seidel for their help in selecting and preparing the corpus material. We are also grateful to Ursula Flitner (Library of the Max Planck Institute for Human Development), Benjamin Scheffler (State Library Berlin), and the ZEIT for supporting this study, providing library loan statistics, and making children’s self-reports available.



Copyright information

© Psychonomic Society, Inc. 2014

Authors and Affiliations

  • Sascha Schroeder (1)
  • Kay-Michael Würzner (2)
  • Julian Heister (3)
  • Alexander Geyken (2)
  • Reinhold Kliegl (3)

  1. MPRG Reading Education and Development (REaD), Max Planck Institute for Human Development, Berlin, Germany
  2. Digital Dictionary of the German Language Project, Berlin–Brandenburg Academy of Sciences, Berlin, Germany
  3. Department of Psychology, University of Potsdam, Potsdam, Germany
