The Chinese language has more native speakers than any other language, but research on the reading of Chinese characters is still not as well developed as research on the reading of words in alphabetic languages. Two notable gaps are the paucity of megastudies in Chinese and the relatively infrequent use of the lexical decision paradigm to investigate single-character recognition. The Chinese Lexicon Project, described in this article, is a database of lexical decision latencies for 2,500 Chinese single characters in simplified script, collected from a sample of native mainland Chinese (Mandarin) speakers (N = 35). This resource will provide a valuable adjunct to influential mega-databases, such as the English, French, and Dutch Lexicon Projects. Two separate analyses illustrate some of the advantages of megastudies. These include the selection of the strongest measure to represent Chinese character frequency (Cai & Brysbaert’s (PLoS ONE 5(6): e10729, 2010) subtitle contextual diversity frequency count) and the use of virtual studies to replicate and clarify existing findings. The unique morpho-syllabic nature of the Chinese writing system makes it a valuable case study for functional language contrasts. Moreover, this is the first publicly available large-scale repository of behavioral responses pertaining to Chinese language processing (the behavioral dataset is attached to this article as a supplemental file available for download). For these reasons, the data should be of substantial interest to psychologists, linguists, and other researchers.
A mega-naming study was previously conducted by Liu, Shu, and Li (2007). Unfortunately, their naming latencies were not released for public access.
The Web of Science search was conducted on July 2, 2012. Results from the search can be generic, so each of the articles generated by the search was checked manually to remove the irrelevant titles.
The numerical breakdown of participants excluded on the basis of poor performance on the screening tasks or the lexical decision task is as follows: 1 participant was eliminated on the basis of the Chinese Author Recognition Test, 2 were eliminated on the basis of the HSK Chinese Proficiency Test, and 8 scored less than 85% accuracy on the lexical decision task, so their data were discarded.
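The accuracy criterion in this note can be illustrated with a minimal Python sketch. The participant IDs and accuracy values below are invented placeholders, and the function name `screen_participants` is hypothetical; only the 85% cutoff comes from the note.

```python
def screen_participants(accuracy, threshold=0.85):
    """Split participants into kept and excluded groups by task accuracy.

    Participants scoring below `threshold` are excluded, mirroring the
    "less than 85% accuracy" rule described in the note.
    """
    kept = {pid: acc for pid, acc in accuracy.items() if acc >= threshold}
    excluded = sorted(pid for pid in accuracy if pid not in kept)
    return kept, excluded


# Invented placeholder values, not data from the study.
accuracy = {"p01": 0.93, "p02": 0.84, "p03": 0.85, "p04": 0.78}
kept, excluded = screen_participants(accuracy)
print(sorted(kept), excluded)
```

A participant scoring exactly 85% is retained here, because the note excludes only those who scored less than 85%.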
We should point out that cncorpus was not created solely for psycholinguistic research. As a national corpus, it probably serves other functions—for example, providing historical linguists information on the evolution of character use and so forth. One recommendation would be for the corpus to include an option for users to select, say, which period and type of text to begin computing character/word statistics from, thus reducing dead weight.
In the virtual replication of Leong et al. (1987), 数 (數 in traditional script) was included in the stimuli as a character with many strokes. The inclusion of 数 does not violate the stroke manipulation, since its simplified form has 13 strokes, which is above the cutoff of “many strokes” (Leong et al.’s cutoff is placed at 12). The traditional 數 has 15 strokes. In any case, we also ran an additional series of analyses that excluded 数. The same pattern of findings was elicited [response time: F1(1, 34) = 5.26, MSE = 1,985.36, p < .03, partial η² = .13].
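The effect size reported for the analysis excluding 数 can be cross-checked against its F ratio. The sketch below assumes that the reported value is a partial eta squared and uses the standard identity relating it to F and the degrees of freedom; the function name is hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Recover partial eta squared from an F ratio and its degrees of freedom.

    Uses the identity eta_p^2 = (F * df_effect) / (F * df_effect + df_error).
    """
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Values taken from the note: F(1, 34) = 5.26.
eta_p2 = partial_eta_squared(5.26, 1, 34)
print(round(eta_p2, 2))  # 0.13
```

The result, 5.26 / 39.26 ≈ .13, agrees with the effect size given in the note.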
B. Chen et al. (2009) created two sets of stimuli for their three experiments (Experiment 1 [tachistoscopic task] used stimulus set 1, while Experiments 2 and 3 [visual duration threshold and lexical decision tasks] used stimulus set 2). Both sets were created on the basis of the same design and requirements, so they were combined for the drawing of our stimuli. Of the 128 unique characters found in the combined set (61 early AoA, 67 late AoA; some characters were repeated across the two sets), 78 are present in the Chinese Lexicon Project (38 early AoA, 40 late AoA). The value 78 is rather close to the original number of stimuli used in Chen et al.’s lexical decision experiment (72 characters were chosen as stimuli: 36 early AoA and 36 late AoA).
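The item-selection step behind a virtual study like this one can be sketched in a few lines of Python. Everything in the example is a toy illustration: the characters, condition labels, and response times are invented placeholders rather than values from the Chinese Lexicon Project or from Chen et al., and the layout of the real supplemental file is not assumed.

```python
from statistics import mean

# Hypothetical item-level database: character -> mean lexical decision RT (ms).
# Invented placeholder values, not data from the Chinese Lexicon Project.
database_rt = {"人": 520.0, "山": 540.0, "水": 530.0, "鼎": 640.0, "彝": 700.0}

# Hypothetical stimulus list from an earlier study: character -> condition.
original_stimuli = {
    "人": "early", "山": "early", "水": "early", "口": "early",
    "鼎": "late", "彝": "late", "龘": "late",
}


def virtual_study(stimuli, database):
    """Keep only stimuli present in the database and summarize each condition.

    Returns {condition: (number of items retained, mean RT in ms)}.
    """
    by_condition = {}
    for character, condition in stimuli.items():
        if character in database:
            by_condition.setdefault(condition, []).append(database[character])
    return {cond: (len(rts), mean(rts)) for cond, rts in by_condition.items()}


summary = virtual_study(original_stimuli, database_rt)
print(summary)
```

As in the note, only the subset of the original stimuli that also appears in the database enters the comparison, so the retained item counts per condition should be reported alongside the condition means.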
Adelman, J. S., & Brown, G. D. A. (2008). Modeling lexical decision: The form of frequency and diversity effects. Psychological Review, 115(1), 214–229.
Adelman, J. S., Brown, G. D. A., & Quesada, J. F. (2006). Contextual diversity, not word frequency, determines word-naming and lexical decision times. Psychological Science, 17(9), 814–823.
Baayen, R. H., Piepenbrock, R., & Van Rijn, H. (1993). The CELEX lexical database on CD-ROM. Philadelphia, PA: Linguistic Data Consortium.
Balota, D. A., & Chumbley, J. I. (1984). Are lexical decisions a good measure of lexical access? The role of word frequency in the neglected decision stage. Journal of Experimental Psychology: Human Perception and Performance, 10(3), 340–357.
Balota, D. A., Cortese, M. J., Sergent-Marshall, S. D., Spieler, D., & Yap, M. J. (2004). Visual word recognition of single-syllable words. Journal of Experimental Psychology: General, 133(2), 283–316.
Balota, D. A., Yap, M. J., Cortese, M. J., Hutchison, K. A., Kessler, B., Loftis, B., ... Treiman, R. (2007). The English Lexicon Project. Behavior Research Methods, 39(3), 445–459.
Balota, D. A., Yap, M. J., Hutchison, K. A., & Cortese, M. J. (2012). Megastudies: Large scale analysis of lexical processes. In J. S. Adelman (Series and Vol. Ed.), Visual word recognition, Volume 1: Models and methods, orthography and phonology (pp. 90–115). Hove, East Sussex: Psychology Press.
Bonin, P., Chalard, M., Méot, A., & Fayol, M. (2001). Age-of-acquisition and word frequency in the lexical decision task: Further evidence from the French language. Current Psychology of Cognition, 20(6), 401–443.
Brysbaert, M., & New, B. (2009). Moving beyond Kucera and Francis: A critical evaluation of current word frequency norms and the introduction of a new and improved word frequency measure for American English. Behavior Research Methods, 41(4), 977–990.
Burgess, C., & Livesay, K. (1998). The effect of corpus size in predicting reaction time in a basic word recognition task: Moving on from Kucera and Francis. Behavior Research Methods, Instruments, & Computers, 30(2), 272–277.
Cai, Q., & Brysbaert, M. (2010). SUBTLEX-CH: Chinese word and character frequencies based on film subtitles. PLoS One, 5(6), e10729. doi:10.1371/journal.pone.0010729
Chen, B., Dent, K., You, W., & Wu, G. (2009). Age of acquisition affects early orthographic processing during Chinese character recognition. Acta Psychologica, 130(3), 196–203.
Chen, B., & Peng, D. (2001). 汉语双字多义词的识别优势效应 [The effects of polysemy in two-character word identification]. Acta Psychologica Sinica, 33(4), 300–304.
Chen, H.-C., & Zhou, X. (1999). Processing East Asian languages: An introduction. Language & Cognitive Processes, 14(5/6), 425–428.
Cortese, M. J. (1998). Revisiting serial position effects in reading. Journal of Memory and Language, 39(4), 652–665.
Cunningham, A. E., & Stanovich, K. E. (1997). Early reading acquisition and its relation to reading experience and ability 10 years later. Developmental Psychology, 33(6), 934–945.
Da, J. (2004). A corpus-based study of character and bigram frequencies in Chinese e-texts and its implications for Chinese language instruction. In P. Zhang, T. Xie, & J. Xu (Eds.), Proceedings of the 4th International Conference on New Technologies in Teaching and Learning Chinese: The studies on the theory and methodology of the digitized Chinese teaching to foreigners (pp. 501–511). Beijing: The Tsinghua University Press.
Dong, L.-C. (2005). 说文解字考证 [An investigation of Chinese characters’ etymology]. Beijing: Writer’s Publishing House.
Faust, M. E., Balota, D. A., Spieler, D. H., & Ferraro, F. R. (1999). Individual differences in information-processing rate and amount: Implications for group differences in response latency. Psychological Bulletin, 125(6), 777–799.
Feng, Z. (2002). 中国语料库研究的历史与现状 [Evolution and present situation of corpus research in China]. Journal of Chinese Language and Computing, 12(1), 43–62.
Ferrand, L., New, B., Brysbaert, M., Keuleers, E., Bonin, P., Méot, A., ... Pallier, C. (2010). The French Lexicon Project: Lexical decision data for 38,840 French words and 38,840 pseudowords. Behavior Research Methods, 42(2), 488–496.
Gernsbacher, M. A. (1984). Resolving 20 years of inconsistent interactions between lexical familiarity and orthography, concreteness, and polysemy. Journal of Experimental Psychology: General, 113(2), 256–281.
Gu, J.-P. (2007). 字解: 字形图解字典 [A compendium of Chinese characters]. Singapore: Chinese Heritage Lodge.
Hoosain, R. (1991). Psycholinguistic implications for linguistic relativity: A case study of Chinese. Hillsdale, New Jersey: Lawrence Erlbaum Associates.
Institute of Applied Linguistics. (2009). 国家语委现代汉语语料库介绍 [Introduction to the Modern Chinese corpus (cncorpus) by the State Language Commission]. Retrieved March 29, 2011, from Chinese Linguistic Data web site: www.cncorpus.org
Institute of Applied Linguistics. (2010). 现代汉语语料库汉字频率表 [Modern Chinese corpus character frequency list]. Retrieved March 29, 2011, from Chinese Linguistic Data web site: www.cncorpus.org
Institute of Linguistics in the Chinese Academy of Social Sciences. (2008). 现代汉语词典 [Modern Chinese dictionary] (5th ed.). Beijing: The Commercial Press.
Katz, L., & Frost, R. (1992). The reading process is different for different orthographies: The orthographic depth hypothesis. In R. Frost & L. Katz (Eds.), Orthography, phonology, morphology, and meaning (pp. 67–84). Amsterdam: Elsevier North-Holland.
Keuleers, E., Diependaele, K., & Brysbaert, M. (2010). Practice effects in large-scale visual word recognition studies: A lexical decision study on 14,000 Dutch mono- and disyllabic words and nonwords. Frontiers in Psychology, 1, 174. doi:10.3389/fpsyg.2010.00174
Keuleers, E., Lacey, P., Rastle, K., & Brysbaert, M. (2012). The British Lexicon Project: Lexical decision data for 28,730 monosyllabic and disyllabic English words. Behavior Research Methods, 44(1), 287–304.
Kučera, H., & Francis, W. N. (1967). Computational analysis of present-day American English. Providence: Brown University Press.
Language Teaching and Research Institute of Beijing Language and Culture University. (1986). 现代汉语频率词典 [Dictionary of modern Chinese frequency]. Beijing: Beijing Language and Culture University Press.
Lee, S.-Y., & Krashen, S. (1996). Free voluntary reading and writing competence in Taiwanese high school students. Perceptual and Motor Skills, 83, 687–690.
Leong, C. K., Cheng, P.-W., & Mulcahy, R. (1987). Automatic processing of morphemic orthography by mature readers. Language and Speech, 30(2), 181–196.
Li, P., Tan, L. H., Bates, E., & Tzeng, O. J. L. (2006). Introduction: New frontiers in Chinese psycholinguistics. In P. Li (Series and Vol. Ed.), L. H. Tan, E. Bates, & O. J. L. Tzeng (Vol. Eds.), Handbook of East Asian psycholinguistics: Vol. 1. Chinese (pp. 1–9). Cambridge, UK: Cambridge University Press.
Liu, Y. (2009). 汉语词汇研究统计述评 [A review of Chinese vocabulary statistic studies]. Chinese Language Learning, 30(1), 62–69.
Liu, Y., Shu, H., & Li, P. (2007). Word naming and psycholinguistic norms: Chinese. Behavior Research Methods, 39(2), 192–198.
Liu, Y., Wang, R. D., & Zhou, H. (2009). 现代汉语概论 (留学生版) [Modern Chinese: An overview]. Shanghai: Shanghai Educational Publishing House.
Lu, S. C. (1989). 字词频率词典 (以拼音为序): 新加坡《小学华文教材》 [Frequency dictionary of Chinese characters, words and phrases used in Singapore primary school textbooks]. Singapore: Center of Research for Chinese, National University of Singapore.
Lu, S. C. (1992). 字词频率词典 (以拼音为序): 新加坡《中学华文教材》 [Frequency dictionary of Chinese characters, words and phrases used in Singapore secondary school textbooks]. Singapore: Center of Research for Chinese, National University of Singapore.
Myers, J., Huang, Y.-C., & Wang, W. (2006). Frequency effects in the processing of Chinese inflection. Journal of Memory and Language, 54(3), 300–323.
Ostler, N. (2008). World languages. In P. K. Austin (Ed.), 1000 languages: The worldwide history of living and lost tongues (pp. 10–34). UK: Thames & Hudson.
Peng, D., Deng, Y., & Chen, B. (2003). 汉语多义单字词的识别优势效应 [The polysemy effect in Chinese one-character word identification]. Acta Psychologica Sinica, 35(5), 569–575.
Perfetti, C. A., Zhang, S., & Berent, I. (1992). Reading in English and Chinese: Evidence for a ‘universal’ phonological principle. In R. Frost & L. Katz (Eds.), Orthography, phonology, morphology, and meaning (pp. 227–248). Amsterdam: Elsevier North-Holland.
Reynolds, M., & Besner, D. (2006). Reading aloud is not automatic: Processing capacity is required to generate a phonological code from print. Journal of Experimental Psychology: Human Perception and Performance, 32(6), 1303–1323.
Rogers, H. (2005). Writing systems: A linguistic approach. Malden, MA: Blackwell Publishing.
Rubenstein, H., Garfield, L., & Millikan, J. A. (1970). Homographic entries in the internal lexicon. Journal of Verbal Learning & Verbal Behavior, 9(5), 487–494.
Scarborough, D. L., Cortese, C., & Scarborough, H. S. (1977). Frequency and repetition effects in lexical memory. Journal of Experimental Psychology: Human Perception and Performance, 3(1), 1–17.
Schneider, W., Eschman, A., & Zuccolotto, A. (2002). E-prime (Version 1.2) [Computer software]. Pittsburgh: Psychology Software Tools Inc.
Seidenberg, M. S. (1985). The time course of phonological code activation in two writing systems. Cognition, 19(1), 1–30.
Share, D. L. (2008). On the Anglocentricities of current reading research and practice: The perils of overreliance on an “outlier” orthography. Psychological Bulletin, 134(4), 584–615.
Stanovich, K. E., & Cunningham, A. E. (1993). Where does knowledge come from? Specific associations between print exposure and information acquisition. Journal of Educational Psychology, 85(2), 211–229.
Stanovich, K. E., & West, R. F. (1989). Exposure to print and orthographic processing. Reading Research Quarterly, 24(4), 402–433.
State Language Commission & News Bureau. (1988). 现代汉语通用字表 [List of Commonly Used Characters]. Retrieved June 30, 2011, from http://www.china-language.gov.cn/wenziguifan/shanghi/014c.htm
Stone, G. O., Vanhoy, M., & Van Orden, G. C. (1997). Perception is a two-way street: Feedforward and feedback phonology in visual word recognition. Journal of Memory and Language, 36(3), 337–359.
Sun, C. (2006). Chinese: A linguistic introduction. New York: Cambridge University Press.
Tsai, P.-S., Yu, B. H.-Y., Lee, C.-Y., Tzeng, O. J. L., Hung, D. L., & Wu, D. H. (2009). An event-related potential study of the concreteness effect between Chinese nouns and verbs. Brain Research, 1253, 149–160.
Urbaniak, G. C., & Plous, S. (2011). Research Randomizer (Version 3.0) [Computer software]. Retrieved on January 1, 2011, from http://www.randomizer.org/
Wang, J. (2001). Recent progress in corpus linguistics in China. International Journal of Corpus Linguistics, 6(2), 281–304.
Xiao, R., Rayson, P. A., & McEnery, T. (2009). Frequency dictionary of Mandarin Chinese: Core vocabulary for learners. London: Routledge.
Yap, M. J., Rickard Liow, S. J., Jalil, S. B., & Faizal, S. S. B. (2010). The Malay Lexicon Project: A database of lexical statistics for 9,592 words. Behavior Research Methods, 42(4), 992–1003.
Yin, B., & Rohsenow, J. S. (1994). Modern Chinese characters. Beijing: Sinolingua.
Yip, M. (2002). Tone. New York: Cambridge University Press.
You, W., Chen, B., & Dunlap, S. (2009). Frequency trajectory effects in Chinese character recognition: Evidence for the arbitrary mapping hypothesis. Cognition, 110(1), 39–50.
Electronic Supplementary Materials
Below is the link to the electronic supplementary material.
(ZIP 999 kb)
About this article
Cite this article
Sze, W.P., Rickard Liow, S.J. & Yap, M.J. The Chinese Lexicon Project: A repository of lexical decision behavioral responses for 2,500 Chinese characters. Behav Res 46, 263–273 (2014). https://doi.org/10.3758/s13428-013-0355-9
- Visual word recognition
- Reaction time