Compiling a Corpus-Based List of Words Commonly Mispronounced

  • Magdalena Zając
Part of the Second Language Learning and Teaching book series (SLLT)


The inspiration for the paper was Professor Sobkowiak’s list of Words Commonly Mispronounced (Sobkowiak, 2001), a collection of over six hundred pronunciation errors habitually made by Polish learners of English. The paper explores how lists such as Words Commonly Mispronounced could be “upgraded” using corpus linguistic tools. It describes the results of a previous study (Zając & Pęzik, 2012), which aimed to compile a corpus-based index of frequent mispronunciations in the speech of Polish learners of English using data from the spoken component of the Polish Learner English Corpus (PLEC). The paper discusses the list obtained by Zając and Pęzik, describes and evaluates the process of creating it, and compares the corpus-based index with Words Commonly Mispronounced. The difficulties involved in compiling lists of common mispronunciations (both corpus-based and “traditional”) are also examined. The general conclusion is that using corpus linguistic tools to examine L2 pronunciation errors may make it possible to create a thorough and reliable collection of commonly mispronounced words, which can serve as an effective tool in pronunciation teaching and learning. At the same time, careful examination of the corpus-based list and of the process of its creation reveals that, as with lists compiled using “traditional” methods, creating a corpus-based index of pronunciation errors entails certain problems that need to be addressed.


Keywords (machine-generated, not supplied by the authors): Function Word; Regular Feature; Unstressed Syllable; British National Corpus; Regional Accent



The study by Zając & Pęzik (2012) is part of a research project funded in the years 2010-2012 by a grant from the Polish Ministry of Science and Higher Education (N N104 205039).


  1. Archibald, J. (1997). The acquisition of English stress by speakers of nonaccentual languages: lexical storage versus computation of stress. Linguistics, 35, 167-181.
  2. Barańska, A. (2011). The examination of the vowel length in the pronunciation of the English adjectives with -able, -ate, -ative suffixes. Gavagai Journal, 1, 3-16.
  3. Biber, D. (1993). Representativeness in corpus design. Literary and Linguistic Computing, 8, 243-257.
  4. Davies, M. (2008). The Corpus of Contemporary American English: 450 million words, 1990-present. Accessed 28 April 2014.
  5. Lindsey, G. (2012). The British English vowel system. Speech Talk. Thoughts on English, speech & language. Accessed 28 April 2014.
  6. Matysiak, A. (2012). Is English word stress beyond the scope of Polish advanced learners of English? – different strategies of indicating the stress position within multi-syllable words. Gavagai Journal, 2, 22-33.
  7. Piske, T., Flege, J. E., MacKay, I., & Meador, D. (2002). The production of English vowels by early and late Italian-English bilinguals. Phonetica, 59, 49-71.
  8. Sobkowiak, W. (2001). English phonetics for Poles (2nd ed.). Poznań: Wydawnictwo Poznańskie.
  9. Sloetjes, H., & Wittenburg, P. (2008). Annotation by category - ELAN and ISO DCR. Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC 2008).
  10. Szpyra-Kozłowska, J., & Stasiak, S. (2010). From focus on sounds to focus on words in English pronunciation instruction. Research in Language, 8, 1-12.
  11. The British National Corpus, version 3 (BNC XML Edition) (2007). Distributed by Oxford University Computing Services on behalf of the BNC Consortium. Accessed 28 April 2014.
  12. Waniek-Klimczak, E. (2002). How to predict the unpredictable – English word stress from a Polish perspective. In E. Waniek-Klimczak, & P. J. Melia (Eds.), Accents and speech in teaching English phonetics and phonology (pp. 221-242). Berlin: Peter Lang.
  13. Wells, J. C. (2005). Goals in teaching English pronunciation. In K. Dziubalska-Kołaczyk, & J. Przedlacka (Eds.), English pronunciation models: A changing scene (pp. 101-112). Bern: Peter Lang.
  14. Wells, J. C. (2008). Longman Pronunciation Dictionary. Harlow: Pearson Education Limited.
  15. Zając, M., & Pęzik, P. (2012). Developing a corpus-based index of commonly mispronounced words. Paper presented at Accents 2012 - VIth International Conference on Native and Non-native Accents of English, Łódź, Poland, 6-8 December, 2012.

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. University of Łódź, Łódź, Poland
