Volume 39, Issue 4, pp. 1025-1028

Spoken word frequency counts based on 1.6 million words in American English

Abstract

Written word frequency (e.g., Francis & Kucera, 1982; Kucera & Francis, 1967) constitutes a popular measure of word familiarity, which is highly predictive of word recognition. Far less often, researchers employ spoken frequency counts in their studies. This discrepancy can be attributed most readily to the conspicuous absence of a sizeable spoken frequency count for American English. The present article reports the construction of a 1.6-million-word spoken frequency database derived from the Michigan Corpus of Academic Spoken English (Simpson, Swales, & Briggs, 2002). We generated spoken frequency counts for 34,922 words and extracted speaker attributes from the source material to generate relative frequencies of words spoken by each speaker category. We assess the predictive validity of these counts, and discuss some possible applications outside of word recognition studies.
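
As an illustration of the kind of computation described above, the following is a minimal sketch of how raw counts and per-category relative frequencies might be derived from speaker-tagged transcripts. It is not the authors' procedure: the tokenizer, the speaker category labels, the sample utterances, and the per-million normalization are all assumptions made for the example.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens (assumed, simplistic rule)."""
    return re.findall(r"[a-z']+", text.lower())


def frequency_counts(utterances):
    """Count words overall and per speaker category.

    `utterances` is an iterable of (speaker_category, text) pairs;
    the category labels are hypothetical.
    """
    overall = Counter()
    by_category = defaultdict(Counter)
    for category, text in utterances:
        tokens = tokenize(text)
        overall.update(tokens)
        by_category[category].update(tokens)
    return overall, by_category


def relative_frequencies(counts, per=1_000_000):
    """Scale raw counts to occurrences per `per` tokens (per million by default)."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: count * per / total for word, count in counts.items()}


# Hypothetical speaker-tagged utterances, invented for illustration only.
sample = [
    ("faculty", "so the data show the effect"),
    ("student", "um so what does the effect mean"),
]

overall, by_category = frequency_counts(sample)
per_category = {cat: relative_frequencies(c) for cat, c in by_category.items()}
```

Normalizing within each category, rather than against the whole corpus, keeps categories with very different amounts of speech comparable; whether the database uses this exact normalization is not stated in the abstract.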