Abstract
In recent years, large datasets of lexical processing times have been released for several languages, including English, French, Spanish, and Dutch. Such datasets have enabled us to study, compare, and model the global effects of many psycholinguistic measures such as word frequency, orthographic neighborhood (ON) size, and word length. We have compiled and publicly released a frequency and ON dictionary of 64,546 words and 1,800 plausible nonwords (NWs) from a language that has been relatively little studied by psycholinguists: Persian. We have also collected visual lexical decision reaction times (LDRTs) for 1,800 Persian words and nonwords. Persian offers an interesting psycholinguistic environment for several reasons, including that it has few long words and, as a result, dense orthographic neighborhoods. These characteristics give us an opportunity to examine how these factors affect lexical access by comparing Persian with several other languages. The results suggest that sensitivity to word length and orthographic neighborhood may reflect the statistical structure of a particular language, rather than being a universal element of lexical processing. The dictionary and LDRT data are available from https://osf.io/tb4m6/.
Data availability
The data and materials for all experiments are available at https://osf.io/tb4m6/.
Author information
Contributions
FM co-designed the experiment, collected the data, and co-wrote the manuscript. CW co-designed the experiment, analyzed the data, and co-wrote the manuscript. GH and HH provided technical assistance in collecting and analyzing the Persian corpora.
Ethics declarations
Conflicts of interest
The authors have no competing interests or conflicts of interest.
Consent
This work was conducted in accordance with a memorandum of understanding on Educational, Research and Technological Cooperation between the Faculty of Humanities at Persian Gulf University and the Department of Psychology at the University of Alberta, under ethical oversight at Persian Gulf University. Participation in the study was voluntary. All participants gave informed consent to participate after the experiment had been explained to them and they had been informed that their individual data would remain anonymous.
About this article
Cite this article
Nemati, F., Westbury, C., Hollis, G. et al. The Persian Lexicon Project: minimized orthographic neighbourhood effects in a dense language. J Psycholinguist Res 51, 957–979 (2022). https://doi.org/10.1007/s10936-022-09863-x
DOI: https://doi.org/10.1007/s10936-022-09863-x