
The Persian Lexicon Project: minimized orthographic neighbourhood effects in a dense language

Journal of Psycholinguistic Research

Abstract

In recent years, large datasets of lexical processing times have been released for several languages, including English, French, Spanish, and Dutch. Such datasets have enabled us to study, compare, and model the global effects of many psycholinguistic measures, such as word frequency, orthographic neighbourhood (ON) size, and word length. We have compiled and publicly released a frequency and ON dictionary of 64,546 words and 1800 plausible nonwords (NWs) from a language that has been relatively little studied by psycholinguists: Persian. We have also collected visual lexical decision reaction times (LDRTs) for 1800 Persian words and nonwords. Persian offers an interesting psycholinguistic environment for several reasons, including that it has few long words and consequently has dense orthographic neighbourhoods. These characteristics give us an opportunity to examine how these factors affect lexical access by comparing Persian with several other languages. The results suggest that sensitivity to word length and orthographic neighbourhood may reflect the statistical structure of a particular language, rather than being a universal element of lexical processing. The dictionary and LDRT data are available from https://osf.io/tb4m6/.
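Orthographic neighbourhood size is often operationalised as Coltheart's N: the number of other words of the same length that differ from a target at exactly one letter position. The abstract does not state which ON metric the dictionary uses, so the Python sketch below assumes Coltheart's N purely for illustration; the function names `neighbourhood_sizes` and `mean_n_by_length` and the toy English lexicon are hypothetical. It also treats each Unicode code point as one letter, which glosses over Persian-specific details such as the zero-width non-joiner. Averaging N by word length illustrates how a lexicon dominated by short words tends to have dense neighbourhoods.

```python
from collections import defaultdict


def neighbourhood_sizes(lexicon):
    """Coltheart's N (assumed ON metric): for each word, count the other
    same-length words that differ from it at exactly one position."""
    words = set(lexicon)
    # Bucket words by (prefix, suffix) around each position. Two distinct
    # words share a bucket only if they are identical except at that position.
    buckets = defaultdict(set)
    for w in words:
        for i in range(len(w)):
            buckets[(w[:i], w[i + 1:])].add(w)

    sizes = {}
    for w in words:
        neighbours = set()
        for i in range(len(w)):
            neighbours |= buckets[(w[:i], w[i + 1:])]
        neighbours.discard(w)  # a word is not its own neighbour
        sizes[w] = len(neighbours)
    return sizes


def mean_n_by_length(sizes):
    """Average neighbourhood size for each word length, sorted by length."""
    by_length = defaultdict(list)
    for w, n in sizes.items():
        by_length[len(w)].append(n)
    return {length: sum(ns) / len(ns) for length, ns in sorted(by_length.items())}


if __name__ == "__main__":
    toy = ["cat", "bat", "hat", "cot", "dog", "cart"]  # hypothetical lexicon
    sizes = neighbourhood_sizes(toy)
    print(sizes["cat"])  # 3 (bat, hat, cot)
    print(mean_n_by_length(sizes))  # {3: 1.6, 4: 0.0}
```

Because the (prefix, suffix) key fixes every letter but one, the union over positions yields exactly the one-substitution neighbours without comparing every pair of words. Additions and deletions, which some ON measures also count, are deliberately ignored here, so "cart" is not a neighbour of "cat".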

Data availability

The data and materials for all experiments are available at https://osf.io/tb4m6/.

Author information

Contributions

FM co-designed the experiment, collected the data, and co-wrote the manuscript. CW co-designed the experiment, analyzed the data, and co-wrote the manuscript. GH & HH provided technical assistance in collecting and analyzing the Persian corpora.

Corresponding author

Correspondence to Fatemeh Nemati.

Ethics declarations

Conflicts of interest

The authors have no competing interests or conflicts of interest.

Consent

This work was conducted in accordance with a memorandum of understanding on Educational, Research and Technological Cooperation between the Faculty of Humanities at Persian Gulf University and the Department of Psychology at the University of Alberta, under the ethical oversight of Persian Gulf University. Participation in the study was voluntary. All participants gave informed consent to participate after having the experiment explained and being informed that their individual data would remain anonymous.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Nemati, F., Westbury, C., Hollis, G. et al. The Persian Lexicon Project: minimized orthographic neighbourhood effects in a dense language. J Psycholinguist Res 51, 957–979 (2022). https://doi.org/10.1007/s10936-022-09863-x

  • DOI: https://doi.org/10.1007/s10936-022-09863-x
