Using Internet search engines to estimate word frequency

  • Irene V. Blair
  • Geoffrey R. Urland
  • Jennifer E. Ma

Abstract

The present research investigated Internet search engines as a rapid, cost-effective alternative for estimating word frequencies. Frequency estimates for 382 words were obtained and compared across four methods: (1) Internet search engines, (2) the Kučera and Francis (1967) analysis of a traditional linguistic corpus, (3) the CELEX English linguistic database (Baayen, Piepenbrock, & Gulikers, 1995), and (4) participant ratings of familiarity. The results showed that Internet search engines produced frequency estimates that were highly consistent with those reported by Kučera and Francis and those calculated from CELEX, highly consistent across search engines, and very reliable over a 6-month period. Additional results suggested that Internet search engines are an excellent option when traditional word frequency analyses do not contain the necessary data (e.g., estimates for forenames and slang). In contrast, participants’ familiarity judgments did not correspond well with the more objective estimates of word frequency. Researchers are advised to use search engines with large databases (e.g., AltaVista) to ensure the greatest representativeness of the frequency estimates.
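As a rough illustration of what comparing two sets of frequency estimates can look like in practice, the sketch below correlates search engine hit counts with corpus counts for the same words. It is a minimal sketch under stated assumptions, not the authors' procedure: the abstract does not say which statistic or transformation was used, so the log transform and the Pearson correlation are assumptions, the helper names `log_transform` and `pearson_r` are hypothetical, and all numbers are invented placeholders rather than data from the study.

```python
import math


def log_transform(counts):
    """Return log10(count + 1) for each raw count (the +1 guards against zero)."""
    return [math.log10(c + 1) for c in counts]


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Placeholder values invented for illustration only; NOT data from the study.
hit_counts = {"house": 5_000_000, "river": 1_200_000, "lantern": 90_000, "quill": 15_000}
corpus_counts = {"house": 500, "river": 150, "lantern": 10, "quill": 1}

# Align the two sources on the same word order before correlating.
words = sorted(hit_counts)
r = pearson_r(
    log_transform([hit_counts[w] for w in words]),
    log_transform([corpus_counts[w] for w in words]),
)
print(f"r = {r:.3f}")
```

The log transform is used here because raw word counts are typically heavily skewed; a rank-based correlation would be another reasonable choice under the same assumptions.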

References

  1. AllSearchEngines.Com homepage (May, 2000). Available: http://www.allsearchengines.com.
  2. Baayen, R. H., Piepenbrock, R., & Gulikers, L. (1995). The CELEX lexical database [CD-ROM]. Philadelphia: University of Pennsylvania, Linguistic Data Consortium.
  3. Balota, D. A., & Rayner, K. (1991). Word recognition processes in foveal and parafoveal vision: The range of influence of lexical variables. In D. Besner & G. W. Humphreys (Eds.), Basic processes in reading: Visual word recognition (pp. 198–232). Hillsdale, NJ: Erlbaum.
  4. Blair, I. V., & Banaji, M. R. (1996). Automatic and controlled processes in stereotype priming. Journal of Personality & Social Psychology, 70, 1142–1163.
  5. Brysbaert, M., Lange, M., & Wijnendaele, I. V. (2000). The effects of age-of-acquisition and frequency-of-occurrence in visual word recognition: Further evidence from the Dutch language. European Journal of Cognitive Psychology, 12, 65–85.
  6. Chalmers, K. A., Humphreys, M. S., & Dennis, S. (1997). A naturalistic study of the word frequency effect in episodic recognition. Memory & Cognition, 25, 780–784.
  7. Chomsky, N. (1965). Aspects of the theory of syntax. Cambridge, MA: MIT Press.
  8. Dasgupta, N., McGhee, D. E., Greenwald, A. G., & Banaji, M. R. (2000). Automatic preference for White Americans: Eliminating the familiarity explanation. Journal of Experimental Social Psychology, 36, 316–328.
  9. Devine, P. G. (1989). Stereotypes and prejudice: Their automatic and controlled components. Journal of Personality & Social Psychology, 56, 680–690.
  10. Francis, W. N., & Kučera, H. (1982). Frequency analysis of English usage: Lexicon and grammar. Boston: Houghton Mifflin.
  11. Gernsbacher, M. A. (1984). Resolving 20 years of inconsistent interactions between lexical familiarity and orthography, concreteness, and polysemy. Journal of Experimental Psychology: General, 113, 256–281.
  12. Greenwald, A. G., McGhee, D. E., & Schwartz, J. L. K. (1998). Measuring individual differences in implicit cognition: The implicit association test. Journal of Personality & Social Psychology, 74, 1464–1480.
  13. Judd, G. M., & McClelland, G. H. (1989). Data analysis: A model comparison approach. San Diego: Harcourt Brace Jovanovich.
  14. Kansas City Public Library (2000, March). Introduction to search engines. Available: http://www.kcpl.lib.mo.us/search.
  15. Kasof, J. (1993). Sex bias in the naming of stimulus persons. Psychological Bulletin, 113, 140–163.
  16. Kučera, H., & Francis, W. N. (1967). Computational analysis of present-day American English. Providence, RI: Brown University Press.
  17. Leita, C. (2000, May). InfoPeople Search Tools Chart. Available: 2000 InFoPeople Project at http://infopeople.org/src/chart.html.
  18. McEnery, T., & Wilson, A. (1996). Corpus linguistics. Edinburgh: Edinburgh University Press.
  19. New, B., Pallier, C., Ferrand, L., & Matos, R. (in press). Une base de données lexicales du français contemporain sur internet: LEXIQUE [A lexical database of contemporary French on the Internet: LEXIQUE]. L’Année Psychologique.
  20. Peterzell, D. H., Sinclair, G. P., Healy, A. F., & Bourne, L. E. (1990). Identification of letters in the predesignated target paradigm: A word superiority effect for the common word the. American Journal of Psychology, 103, 299–315.
  21. Rubenstein, H., Garfield, L., & Millikan, J. A. (1970). Homographic entries in the internal lexicon. Journal of Verbal Learning & Verbal Behavior, 9, 487–494.
  22. Thorndike, E. L., & Lorge, I. (1944). The teacher’s word book of 30,000 words (3rd ed.). New York: Columbia University, Teachers College Press.

Copyright information

© Psychonomic Society, Inc. 2002

Authors and Affiliations

  • Irene V. Blair (1)
  • Geoffrey R. Urland (1)
  • Jennifer E. Ma (2)

  1. Department of Psychology, University of Colorado, Boulder
  2. University of Kansas, Lawrence
