Behavior Research Methods, Volume 46, Issue 1, pp. 263–273

The Chinese Lexicon Project: A repository of lexical decision behavioral responses for 2,500 Chinese characters

  • Wei Ping Sze
  • Susan J. Rickard Liow
  • Melvin J. Yap


The Chinese language has more native speakers than any other language, but research on the reading of Chinese characters is still less developed than research on the reading of words in alphabetic languages. Two notable gaps are the paucity of megastudies in Chinese and the relatively infrequent use of the lexical decision paradigm to investigate single-character recognition. The Chinese Lexicon Project, described in this article, is a database of lexical decision latencies for 2,500 single Chinese characters in simplified script, collected from a sample of native Mandarin speakers from mainland China (N = 35). This resource complements influential mega-databases such as the English, French, and Dutch Lexicon Projects. Two separate analyses illustrate some advantages of megastudies: selecting the strongest measure of Chinese character frequency (namely, the subtitle contextual diversity frequency count of Cai & Brysbaert, 2010, PLoS ONE 5(6): e10729) and conducting virtual studies to replicate and clarify existing findings. The unique morpho-syllabic nature of the Chinese writing system makes it a valuable case study for functional language contrasts. Moreover, this is the first publicly available large-scale repository of behavioral responses on Chinese language processing (the behavioral dataset is available for download as a supplemental file to this article). For these reasons, the data should be of substantial interest to psychologists, linguists, and other researchers.
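The first analysis mentioned above, choosing the frequency measure that best predicts lexical decision latencies, can be illustrated with a minimal sketch. It assumes item-level mean latencies for each character and several candidate frequency counts aligned to the same characters, and it ranks each count by the proportion of variance (R²) it explains in a single-predictor linear regression on log-transformed frequency. The function names, the labels `subtitle_cd` and `written_freq`, the log10(count + 1) transform, and the toy values are all hypothetical; the article's own analysis may have used a different procedure (for example, additional predictors).

```python
import math
from typing import Dict, List, Tuple


def r_squared(x: List[float], y: List[float]) -> float:
    """Proportion of variance in y explained by a least-squares line on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant predictor or outcome explains no variance
    return (sxy * sxy) / (sxx * syy)


def rank_frequency_measures(
    mean_rts: List[float], measures: Dict[str, List[int]]
) -> List[Tuple[str, float]]:
    """Rank candidate frequency counts by R^2 against item-level mean RTs.

    Counts are transformed as log10(count + 1), an assumed convention
    that keeps zero counts defined. Returns (name, R^2) pairs, best first.
    """
    scores = {}
    for name, counts in measures.items():
        if len(counts) != len(mean_rts):
            raise ValueError(f"{name}: expected {len(mean_rts)} counts")
        log_counts = [math.log10(c + 1) for c in counts]
        scores[name] = r_squared(log_counts, mean_rts)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy values for four characters; NOT data from the CLP.
    rts = [600.0, 560.0, 520.0, 480.0]
    candidates = {
        "subtitle_cd": [9, 99, 999, 9999],
        "written_freq": [500, 20, 3000, 100],
    }
    for name, r2 in rank_frequency_measures(rts, candidates):
        print(f"{name}: R^2 = {r2:.3f}")
```

With real data, one would substitute the per-character mean latencies from the supplemental file and counts from sources such as Cai and Brysbaert (2010); a fuller comparison might also control for other lexical variables before comparing measures.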


Keywords: Mandarin; Visual word recognition; Megastudy; Reaction time; Nonalphabetic; Logograph

Supplementary material (1000 kb)
Supplementary material (ZIP 999 kb)


  1. Adelman, J. S., & Brown, G. D. A. (2008). Modeling lexical decision: The form of frequency and diversity effects. Psychological Review, 115(1), 214–229.
  2. Adelman, J. S., Brown, G. D. A., & Quesada, J. F. (2006). Contextual diversity, not word frequency, determines word-naming and lexical decision times. Psychological Science, 17(9), 814–823.
  3. Baayen, R. H., Piepenbrock, R., & Van Rijn, H. (1993). The CELEX lexical database on CD-ROM. Philadelphia, PA: Linguistic Data Consortium.
  4. Balota, D. A., & Chumbley, J. I. (1984). Are lexical decisions a good measure of lexical access? The role of word frequency in the neglected decision stage. Journal of Experimental Psychology. Human Perception and Performance, 10(3), 340–357.
  5. Balota, D. A., Cortese, M. J., Sergent-Marshall, S. D., Spieler, D., & Yap, M. J. (2004). Visual word recognition of single-syllable words. Journal of Experimental Psychology. General, 133(2), 283–316.
  6. Balota, D. A., Yap, M. J., Cortese, M. J., Hutchison, K. A., Kessler, B., Loftis, B., ... Treiman, R. (2007). The English Lexicon Project. Behavior Research Methods, 39(3), 445–459.
  7. Balota, D. A., Yap, M. J., Hutchison, K. A., & Cortese, M. J. (2012). Megastudies: Large scale analysis of lexical processes. In J. S. Adelman (Ed.), Visual word recognition, Volume 1: Models and methods, orthography and phonology (pp. 90–115). Hove, East Sussex: Psychology Press.
  8. Bonin, P., Chalard, M., Méot, A., & Fayol, M. (2001). Age-of-acquisition and word frequency in the lexical decision task: Further evidence from the French language. Current Psychology of Cognition, 20(6), 401–443.
  9. Brysbaert, M., & New, B. (2009). Moving beyond Kucera and Francis: A critical evaluation of current word frequency norms and the introduction of a new and improved word frequency measure for American English. Behavior Research Methods, 41(4), 977–990.
  10. Burgess, C., & Livesay, K. (1998). The effect of corpus size in predicting reaction time in a basic word recognition task: Moving on from Kucera and Francis. Behavior Research Methods, Instruments, & Computers, 30(2), 272–277.
  11. Cai, Q., & Brysbaert, M. (2010). SUBTLEX-CH: Chinese word and character frequencies based on film subtitles. PLoS One, 5(6), e10729. doi: 10.1371/journal.pone.0010729
  12. Chen, B., Dent, K., You, W., & Wu, G. (2009). Age of acquisition affects early orthographic processing during Chinese character recognition. Acta Psychologica, 130(3), 196–203.
  13. Chen, B., & Peng, D. (2001). 汉语双字多义词的识别优势效应 [The effects of polysemy in two-character word identification]. Acta Psychologica Sinica, 33(4), 300–304.
  14. Chen, H.-C., & Zhou, X. (1999). Processing East Asian languages: An introduction. Language & Cognitive Processes, 14(5/6), 425–428.
  15. Cortese, M. J. (1998). Revisiting serial position effects in reading. Journal of Memory and Language, 39(4), 652–665.
  16. Cunningham, A. E., & Stanovich, K. E. (1997). Early reading acquisition and its relation to reading experience and ability 10 years later. Developmental Psychology, 33(6), 934–945.
  17. Da, J. (2004). A corpus-based study of character and bigram frequencies in Chinese e-texts and its implications for Chinese language instruction. In P. Zhang, T. Xie, & J. Xu (Eds.), Proceedings of the 4th International Conference on New Technologies in Teaching and Learning Chinese: The studies on the theory and methodology of the digitized Chinese teaching to foreigners (pp. 501–511). Beijing: The Tsinghua University Press.
  18. Dong, L.-C. (2005). 说文解字考证 [An investigation of Chinese characters’ etymology]. Beijing: Writer’s Publishing House.
  19. Faust, M. E., Balota, D. A., Spieler, D. H., & Ferraro, F. R. (1999). Individual differences in information-processing rate and amount: Implications for group differences in response latency. Psychological Bulletin, 125(6), 777–799.
  20. Feng, Z. (2002). 中国语料库研究的历史与现状 [Evolution and present situation of corpus research in China]. Journal of Chinese Language and Computing, 12(1), 43–62.
  21. Ferrand, L., New, B., Brysbaert, M., Keuleers, E., Bonin, P., Méot, A., ... Pallier, C. (2010). The French Lexicon Project: Lexical decision data for 38,840 French words and 38,840 pseudowords. Behavior Research Methods, 42(2), 488–496.
  22. Gernsbacher, M. A. (1984). Resolving 20 years of inconsistent interactions between lexical familiarity and orthography, concreteness, and polysemy. Journal of Experimental Psychology. General, 113(2), 256–281.
  23. Gu, J.-P. (2007). 字解: 字形图解字典 [A compendium of Chinese characters]. Singapore: Chinese Heritage Lodge.
  24. Hoosain, R. (1991). Psycholinguistic implications for linguistic relativity: A case study of Chinese. Hillsdale, New Jersey: Lawrence Erlbaum Associates.
  25. Institute of Applied Linguistics. (2009). 国家语委现代汉语语料库介绍 [Introduction to the Modern Chinese corpus (cncorpus) by the State Language Commission]. Retrieved March 29, 2011, from Chinese Linguistic Data web site:
  26. Institute of Applied Linguistics. (2010). 现代汉语语料库汉字频率表 [Modern Chinese corpus character frequency list]. Retrieved March 29, 2011, from Chinese Linguistic Data web site:
  27. Institute of Linguistics in the Chinese Academy of Social Sciences. (2008). 现代汉语词典 [Modern Chinese dictionary] (5th ed.). Beijing: The Commercial Press.
  28. Katz, L., & Frost, R. (1992). The reading process is different for different orthographies: The orthographic depth hypothesis. In R. Frost & L. Katz (Eds.), Orthography, phonology, morphology, and meaning (pp. 67–84). Amsterdam: Elsevier North-Holland.
  29. Keuleers, E., Diependaele, K., & Brysbaert, M. (2010). Practice effects in large-scale visual word recognition studies: A lexical decision study on 14,000 Dutch mono- and disyllabic words and nonwords. Frontiers in Psychology, 1, 174. doi: 10.3389/fpsyg.2010.00174
  30. Keuleers, E., Lacey, P., Rastle, K., & Brysbaert, M. (2012). The British Lexicon Project: Lexical decision data for 28,730 monosyllabic and disyllabic English words. Behavior Research Methods, 44(1), 287–304.
  31. Kučera, H., & Francis, W. N. (1967). Computational analysis of present-day American English. Providence: Brown University Press.
  32. Language Teaching and Research Institute of Beijing Language and Culture University. (1986). 现代汉语频率词典 [Dictionary of modern Chinese frequency]. Beijing: Beijing Language and Culture University Press.
  33. Lee, S.-Y., & Krashen, S. (1996). Free voluntary reading and writing competence in Taiwanese high school students. Perceptual and Motor Skills, 83, 687–690.
  34. Leong, C. K., Cheng, P.-W., & Mulcahy, R. (1987). Automatic processing of morphemic orthography by mature readers. Language and Speech, 30(2), 181–196.
  35. Li, P., Tan, L. H., Bates, E., & Tzeng, O. J. L. (2006). Introduction: New frontiers in Chinese psycholinguistics. In P. Li (Series and Vol. Ed.), L. H. Tan, E. Bates, & O. J. L. Tzeng (Vol. Eds.), Handbook of East Asian psycholinguistics: Vol. 1. Chinese (pp. 1–9). Cambridge, UK: Cambridge University Press.
  36. Liu, Y. (2009). 汉语词汇研究统计述评 [A review of Chinese vocabulary statistic studies]. Chinese Language Learning, 30(1), 62–69.
  37. Liu, Y., Shu, H., & Li, P. (2007). Word naming and psycholinguistic norms: Chinese. Behavior Research Methods, 39(2), 192–198.
  38. Liu, Y., Wang, R. D., & Zhou, H. (2009). 现代汉语概论 (留学生版) [Modern Chinese: An overview]. Shanghai: Shanghai Educational Publishing House.
  39. Lu, S. C. (1989). 字词频率词典 (以拼音为序): 新加坡《小学华文教材》 [Frequency dictionary of Chinese characters, words and phrases used in Singapore primary school textbooks]. Singapore: Center of Research for Chinese, National University of Singapore.
  40. Lu, S. C. (1992). 字词频率词典 (以拼音为序): 新加坡《中学华文教材》 [Frequency dictionary of Chinese characters, words and phrases used in Singapore secondary school textbooks]. Singapore: Center of Research for Chinese, National University of Singapore.
  41. Myers, J., Huang, Y.-C., & Wang, W. (2006). Frequency effects in the processing of Chinese inflection. Journal of Memory and Language, 54(3), 300–323.
  42. Ostler, N. (2008). World languages. In P. K. Austin (Ed.), 1000 languages: The worldwide history of living and lost tongues (pp. 10–34). UK: Thames & Hudson.
  43. Peng, D., Deng, Y., & Chen, B. (2003). 汉语多义单字词的识别优势效应 [The polysemy effect in Chinese one-character word identification]. Acta Psychologica Sinica, 35(5), 569–575.
  44. Perfetti, C. A., Zhang, S., & Berent, I. (1992). Reading in English and Chinese: Evidence for a ‘universal’ phonological principle. In R. Frost & L. Katz (Eds.), Orthography, phonology, morphology, and meaning (pp. 227–248). Amsterdam: Elsevier North-Holland.
  45. Reynolds, M., & Besner, D. (2006). Reading aloud is not automatic: Processing capacity is required to generate a phonological code from print. Journal of Experimental Psychology. Human Perception and Performance, 32(6), 1303–1323.
  46. Rogers, H. (2005). Writing systems: A linguistic approach. Malden, MA: Blackwell Publishing.
  47. Rubenstein, H., Garfield, L., & Millikan, J. A. (1970). Homographic entries in the internal lexicon. Journal of Verbal Learning & Verbal Behavior, 9(5), 487–494.
  48. Scarborough, D. L., Cortese, C., & Scarborough, H. S. (1977). Frequency and repetition effects in lexical memory. Journal of Experimental Psychology. Human Perception and Performance, 3(1), 1–17.
  49. Schneider, W., Eschman, A., & Zuccolotto, A. (2002). E-Prime (Version 1.2) [Computer software]. Pittsburgh: Psychology Software Tools Inc.
  50. Seidenberg, M. S. (1985). The time course of phonological code activation in two writing systems. Cognition, 19(1), 1–30.
  51. Share, D. L. (2008). On the Anglocentricities of current reading research and practice: The perils of overreliance on an “outlier” orthography. Psychological Bulletin, 134(4), 584–615.
  52. Stanovich, K. E., & Cunningham, A. E. (1993). Where does knowledge come from? Specific associations between print exposure and information acquisition. Journal of Educational Psychology, 85(2), 211–229.
  53. Stanovich, K. E., & West, R. F. (1989). Exposure to print and orthographic processing. Reading Research Quarterly, 24(4), 402–433.
  54. State Language Commission & News Bureau. (1988). 现代汉语通用字表 [List of Commonly Used Characters]. Retrieved June 30, 2011, from
  55. Stone, G. O., Vanhoy, M., & Van Orden, G. C. (1997). Perception is a two-way street: Feedforward and feedback phonology in visual word recognition. Journal of Memory and Language, 36(3), 337–359.
  56. Sun, C. (2006). Chinese: A linguistic introduction. New York: Cambridge University Press.
  57. Tsai, P.-S., Yu, B. H.-Y., Lee, C.-Y., Tzeng, O. J. L., Hung, D. L., & Wu, D. H. (2009). An event-related potential study of the concreteness effect between Chinese nouns and verbs. Brain Research, 1253, 149–160.
  58. Urbaniak, G. C., & Plous, S. (2011). Research Randomizer (Version 3.0) [Computer software]. Retrieved on January 1, 2011, from
  59. Wang, J. (2001). Recent progress in corpus linguistics in China. International Journal of Corpus Linguistics, 6(2), 281–304.
  60. Xiao, R., Rayson, P. A., & McEnery, T. (2009). Frequency dictionary of Mandarin Chinese: Core vocabulary for learners. London: Routledge.
  61. Yap, M. J., Rickard Liow, S. J., Jalil, S. B., & Faizal, S. S. B. (2010). The Malay Lexicon Project: A database of lexical statistics for 9,592 words. Behavior Research Methods, 42(4), 992–1003.
  62. Yin, B., & Rohsenow, J. S. (1994). Modern Chinese characters. Beijing: Sinolingua.
  63. Yip, M. (2002). Tone. New York: Cambridge University Press.
  64. You, W., Chen, B., & Dunlap, S. (2009). Frequency trajectory effects in Chinese character recognition: Evidence for the arbitrary mapping hypothesis. Cognition, 110(1), 39–50.

Copyright information

© Psychonomic Society, Inc. 2013

Authors and Affiliations

  • Wei Ping Sze (1)
  • Susan J. Rickard Liow (1)
  • Melvin J. Yap (1)
  1. Department of Psychology, National University of Singapore, Singapore
