Behavior Research Methods, Volume 42, Issue 4, pp 992–1003

The Malay Lexicon Project: A database of lexical statistics for 9,592 words

  • Melvin J. Yap
  • Susan J. Rickard Liow
  • Sajlia Binte Jalil
  • Siti Syuhada Binte Faizal


Malay, a language spoken by 250 million people, has a shallow alphabetic orthography, simple syllable structures, and transparent affixation, characteristics that contrast sharply with those of English. In the present article, we first compare the letter-phoneme and letter-syllable ratios for a sample of alphabetic orthographies to highlight the importance of separating language-specific from language-universal reading processes. Then, in order to develop a better understanding of word recognition in orthographies with more consistent mappings to phonology than English, we compiled a database of lexical variables (letter length, syllable length, phoneme length, morpheme length, word frequency, orthographic and phonological neighborhood sizes, and orthographic and phonological Levenshtein distances) for 9,592 Malay words. Separate hierarchical regression analyses for Malay and English revealed how the consistency of orthography-phonology mappings selectively modulates the effects of different lexical variables on lexical decision and speeded pronunciation performance. The database of lexical and behavioral measures for Malay is available as supplemental material.
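To make two of the lexical variables named above concrete, the following is a minimal Python sketch of how an orthographic neighborhood size and a mean orthographic Levenshtein distance could be computed. It is an illustration under stated assumptions, not the procedure used to build the database: the function names, the seven-word toy lexicon, and the choice of `k` are all hypothetical. Neighborhood size is taken here as the number of same-length words differing by one letter substitution, and the Levenshtein measure as the mean edit distance to the `k` closest words (a value of 20 would be typical for a full lexicon).

```python
from typing import List

# Toy lexicon for illustration only; not drawn from the actual database.
TOY_LEXICON = ["batu", "satu", "ratu", "baru", "buku", "mata", "makan"]


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-letter insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of `a`
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def orthographic_n(word: str, lexicon: List[str]) -> int:
    """Number of same-length words that differ from `word` by one substitution."""
    return sum(
        1
        for other in lexicon
        if other != word
        and len(other) == len(word)
        and sum(x != y for x, y in zip(word, other)) == 1
    )


def mean_levenshtein(word: str, lexicon: List[str], k: int = 20) -> float:
    """Mean edit distance from `word` to its `k` closest words in `lexicon`."""
    distances = sorted(levenshtein(word, other) for other in lexicon if other != word)
    closest = distances[:k]
    if not closest:
        raise ValueError("lexicon contains no other words to compare against")
    return sum(closest) / len(closest)


if __name__ == "__main__":
    print(orthographic_n("batu", TOY_LEXICON))
    print(mean_levenshtein("batu", TOY_LEXICON, k=3))
```

The same two functions could be applied to phonemic transcriptions instead of spellings to obtain the phonological counterparts, assuming each phoneme is coded as a single symbol.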


Keywords (machine-generated, not supplied by the authors): Word Recognition; Lexical Decision; Lexical Decision Task; Visual Word Recognition; Lexical Variable

Supplementary material (739 kb)



Copyright information

© Psychonomic Society, Inc. 2010

Authors and Affiliations

  • Melvin J. Yap (1)
  • Susan J. Rickard Liow (1)
  • Sajlia Binte Jalil (2)
  • Siti Syuhada Binte Faizal (3)

  1. Department of Psychology, Faculty of Arts and Social Sciences, National University of Singapore, Singapore, Republic of Singapore
  2. Changi General Hospital, Singapore
  3. Washington University, St. Louis
