Abstract
The present research investigated Internet search engines as a rapid, cost-effective alternative for estimating word frequencies. Frequency estimates for 382 words were obtained and compared across four methods: (1) Internet search engines, (2) the Kučera and Francis (1967) analysis of a traditional linguistic corpus, (3) the CELEX English linguistic database (Baayen, Piepenbrock, & Gulikers, 1995), and (4) participant ratings of familiarity. The results showed that Internet search engines produced frequency estimates that were highly consistent with those reported by Kučera and Francis and those calculated from CELEX, highly consistent across search engines, and very reliable over a 6-month period. Additional results suggested that Internet search engines are an excellent option when traditional word frequency analyses do not contain the necessary data (e.g., estimates for forenames and slang). In contrast, participants’ familiarity judgments did not correspond well with the more objective estimates of word frequency. Researchers are advised to use search engines with large databases (e.g., AltaVista) to ensure the greatest representativeness of the frequency estimates.
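As a rough illustration of how consistency between two frequency sources might be checked, the sketch below correlates search-engine hit counts with corpus frequencies for the same words. It is a minimal sketch under stated assumptions, not the authors' procedure: the abstract does not specify the statistic or any transformation, so the log10(count + 1) transform and the Pearson correlation are assumptions, the function names are hypothetical, and the numbers are placeholders rather than data from the study.

```python
import math


def log_counts(counts):
    """Return log10(count + 1) for each raw count.

    The transform is an assumption made for this sketch; adding 1
    keeps words with a count of zero defined.
    """
    return [math.log10(c + 1) for c in counts]


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical placeholder values for five words (NOT data from the study).
search_hits = [1200000, 45000, 300, 98000, 15]  # hits reported by a search engine
corpus_freq = [850, 40, 1, 95, 0]               # occurrences in a fixed corpus

r = pearson_r(log_counts(search_hits), log_counts(corpus_freq))
print(f"r = {r:.3f}")
```

The same function could be applied to hit counts from two different search engines, or to counts collected at two points in time, to check cross-engine consistency or test-retest reliability in the same way.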
References
AllSearchEngines.Com homepage (May, 2000). Available: http://www.allsearchengines.com.
Baayen, R. H., Piepenbrock, R., & Gulikers, L. (1995). The CELEX lexical database [CD-ROM]. Philadelphia: University of Pennsylvania, Linguistic Data Consortium.
Balota, D. A., & Rayner, K. (1991). Word recognition processes in foveal and parafoveal vision: The range of influence of lexical variables. In D. Besner & G. W. Humphreys (Eds.), Basic processes in reading: Visual word recognition (pp. 198–232). Hillsdale, NJ: Erlbaum.
Blair, I. V., & Banaji, M. R. (1996). Automatic and controlled processes in stereotype priming. Journal of Personality & Social Psychology, 70, 1142–1163.
Brysbaert, M., Lange, M., & Wijnendaele, I. V. (2000). The effects of age-of-acquisition and frequency-of-occurrence in visual word recognition: Further evidence from the Dutch language. European Journal of Cognitive Psychology, 12, 65–85.
Chalmers, K. A., Humphreys, M. S., & Dennis, S. (1997). A naturalistic study of the word frequency effect in episodic recognition. Memory & Cognition, 25, 780–784.
Chomsky, N. (1965). Aspects of the theory of syntax. Cambridge, MA: MIT Press.
Dasgupta, N., McGhee, D. E., Greenwald, A. G., & Banaji, M. R. (2000). Automatic preference for White Americans: Eliminating the familiarity explanation. Journal of Experimental Social Psychology, 36, 316–328.
Devine, P. G. (1989). Stereotypes and prejudice: Their automatic and controlled components. Journal of Personality & Social Psychology, 56, 680–690.
Francis, W. N., & Kučera, H. (1982). Frequency analysis of English usage: Lexicon and grammar. Boston: Houghton Mifflin.
Gernsbacher, M. A. (1984). Resolving 20 years of inconsistent interactions between lexical familiarity and orthography, concreteness, and polysemy. Journal of Experimental Psychology: General, 113, 256–281.
Greenwald, A. G., McGhee, D. E., & Schwartz, J. L. K. (1998). Measuring individual differences in implicit cognition: The implicit association test. Journal of Personality & Social Psychology, 74, 1464–1480.
Judd, G. M., & McClelland, G. H. (1989). Data analysis: A model comparison approach. San Diego: Harcourt Brace Jovanovich.
Kansas City Public Library (2000, March). Introduction to search engines. Available: http://www.kcpl.lib.mo.us/search.
Kasof, J. (1993). Sex bias in the naming of stimulus persons. Psychological Bulletin, 113, 140–163.
Kučera, H., & Francis, W. N. (1967). Computational analysis of present-day American English. Providence, RI: Brown University Press.
Leita, C. (2000, May). InfoPeople Search Tools Chart. Available: 2000 InFoPeople Project at http://infopeople.org/src/chart.html.
McEnery, T., & Wilson, A. (1996). Corpus linguistics. Edinburgh: Edinburgh University Press.
New, B., Pallier, C., Ferrand, L., & Matos, R. (in press). Une base de données lexicales du français contemporain sur internet: LEXIQUE [A lexical database of contemporary French on the Internet: LEXIQUE]. L’Année Psychologique.
Peterzell, D. H., Sinclair, G. P., Healy, A. F., & Bourne, L. E. (1990). Identification of letters in the predesignated target paradigm: A word superiority effect for the common word the. American Journal of Psychology, 103, 299–315.
Rubenstein, H., Garfield, L., & Millikan, J. A. (1970). Homographic entries in the internal lexicon. Journal of Verbal Learning & Verbal Behavior, 9, 487–494.
Thorndike, E. L., & Lorge, I. (1944). The teacher’s word book of 30,000 words (3rd ed.). New York: Columbia University, Teachers College Press.
Additional information
This work was supported by NIH Grant MH63372 to I.V.B., an NSF Graduate Research Fellowship to G.R.U., and an NIH postdoctoral fellowship to J.E.M.
About this article
Cite this article
Blair, I.V., Urland, G.R. & Ma, J.E. Using Internet search engines to estimate word frequency. Behavior Research Methods, Instruments, & Computers 34, 286–290 (2002). https://doi.org/10.3758/BF03195456
DOI: https://doi.org/10.3758/BF03195456