Using Technology for Pronunciation Teaching, Learning, and Assessment

  • Martha C. Pennington
  • Pamela Rogerson-Revell
Part of the Research and Practice in Applied Linguistics book series (RPAL)


Technologies that facilitate language learning, namely Computer-Assisted Language Learning (CALL) and Computer-Assisted Pronunciation Training (CAPT), are developing rapidly. The multimodality of new technologies and their capacity for individualized, customized, and self-paced “anytime-anywhere” study with automated feedback on performance are especially beneficial for pronunciation. The same technologies that have value for pronunciation teaching and learning can be applied to pronunciation assessment. A wide range of technologies and software has been designed specifically for pronunciation teaching and learning, and speech technologies can also be applied to pronunciation pedagogy and assessment, as well as to remediation in phonological impairment. We review these technologies critically in relation to available research, with an eye to their shortcomings as well as their utility, effectiveness, and potential for learning, teaching, and testing pronunciation.



Copyright information

© The Author(s) 2019

Authors and Affiliations

  • Martha C. Pennington
    • 1
  • Pamela Rogerson-Revell
    • 2
  1. SOAS and Birkbeck College, University of London, London, UK
  2. English, University of Leicester, Leicester, UK
