Behavior Research Methods, Volume 50, Issue 4, pp 1461–1481

Procura-PALavras (P-PAL): A Web-based interface for a new European Portuguese lexical database

  • Ana Paula Soares (Email author)
  • Álvaro Iriarte
  • José João de Almeida
  • Alberto Simões
  • Ana Costa
  • João Machado
  • Patrícia França
  • Montserrat Comesaña
  • Andreia Rauber
  • Anabela Rato
  • Manuel Perea
Abstract

In this article, we present Procura-PALavras (P-PAL), a Web-based interface for a new European Portuguese (EP) lexical database. Based on a contemporary printed corpus of over 227 million words, P-PAL provides a broad range of word attributes and statistics, including several measures of word frequency (e.g., raw counts, per-million word frequency, logarithmic Zipf scale), morpho-syntactic information (e.g., parts of speech [PoSs], grammatical gender and number, dominant PoS, and frequency and relative frequency of the dominant PoS), as well as several lexical and sublexical orthographic (e.g., number of letters; consonant–vowel orthographic structure; density and frequency of orthographic neighbors; orthographic Levenshtein distance; orthographic uniqueness point; orthographic syllabification; and trigram, bigram, and letter type and token frequencies), and phonological measures (e.g., pronunciation, number of phonemes, stress, density and frequency of phonological neighbors, transposed and phonographic neighbors, syllabification, and biphone and phone type and token frequencies) for ~53,000 lemmatized and ~208,000 nonlemmatized EP word forms. To obtain these metrics, researchers can choose between two word queries in the application: (i) analyze words previously selected for specific attributes and/or lexical and sublexical characteristics, or (ii) generate word lists that meet word requirements defined by the user in the menu of analyses. For the measures it provides and the flexibility it allows, P-PAL will be a key resource to support research in all cognitive areas that use EP verbal stimuli. P-PAL is freely available at http://p-pal.di.uminho.pt/tools.
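As an illustration of the kinds of measures listed above, the following is a minimal Python sketch of four of them: per-million frequency, a Zipf-scale value, substitution-based orthographic neighbors, and Levenshtein distance. It is not P-PAL's implementation. The function names, the toy lexicon, and the raw count are hypothetical, the Zipf value uses the common simplified definition (log10 of the per-million frequency plus 3, with no smoothing), and the corpus size is rounded to 227 million words.

```python
import math

# Assumption: corpus size rounded to 227 million words (the abstract says "over 227 million").
CORPUS_SIZE = 227_000_000

def per_million(raw_count: int, corpus_size: int = CORPUS_SIZE) -> float:
    """Convert a raw corpus count into a per-million-words frequency."""
    return raw_count * 1_000_000 / corpus_size

def zipf(raw_count: int, corpus_size: int = CORPUS_SIZE) -> float:
    """Simplified Zipf value: log10(frequency per million) + 3 (no smoothing)."""
    return math.log10(per_million(raw_count, corpus_size)) + 3

def substitution_neighbors(word: str, lexicon: set[str]) -> list[str]:
    """Words of the same length that differ from `word` in exactly one letter."""
    return sorted(
        w for w in lexicon
        if len(w) == len(word) and sum(a != b for a, b in zip(w, word)) == 1
    )

def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]

# Hypothetical toy lexicon, for illustration only.
TOY_LEXICON = {"casa", "cara", "capa", "cama", "carro", "mesa"}

if __name__ == "__main__":
    print(per_million(227))                             # hypothetical raw count
    print(zipf(227))
    print(substitution_neighbors("casa", TOY_LEXICON))
    print(levenshtein("casa", "mesa"))
```

Neighborhood density in this sketch is simply the length of the list returned by `substitution_neighbors`. A database-backed tool would compute it against the full lexicon rather than a toy set.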

Keywords

Lexical databases; Word frequency; Orthographic word statistics; Phonological word statistics; European Portuguese

Copyright information

© Psychonomic Society, Inc. 2018

Authors and Affiliations

  • Ana Paula Soares (1), Email author
  • Álvaro Iriarte (2)
  • José João de Almeida (3)
  • Alberto Simões (2, 3)
  • Ana Costa (1)
  • João Machado (1)
  • Patrícia França (1)
  • Montserrat Comesaña (1)
  • Andreia Rauber (1, 4)
  • Anabela Rato (1, 5)
  • Manuel Perea (6)

  1. Human Cognition Lab, CIPsi, School of Psychology, University of Minho, Braga, Portugal
  2. Centre for Humanistic Studies, University of Minho, Braga, Portugal
  3. Computer Science and Technology Center, University of Minho, Braga, Portugal
  4. Computational Linguistics Department, University of Tübingen, Tübingen, Germany
  5. Department of Spanish & Portuguese, University of Toronto, Toronto, Canada
  6. Department of Methodology, University of València, Valencia, Spain
