The widespread use of English in academia and in business is leading an increasing number of people to learn it as a second or foreign language. Computer aided pronunciation training (CAPT) systems are used by non-native English speakers to improve their English pronunciation. A typical CAPT tool records the speech of a learner, detects and diagnoses mispronunciations in it, and suggests ways to correct them. We classified CAPT systems for English into four categories on the basis of the technology they use and studied the salient features of each category. We observed that visual simulation based systems are suitable for young and naive learners; game based systems are advantageous because they can be personalized to the requirements of the learners; comparative phonetics based systems are suitable for adult learners fluent in another language; and artificial neural network based systems have the highest accuracy in mispronunciation diagnosis and are suitable for experienced and professional learners. We identified the state-of-the-art practices used in CAPT systems and observed that CAPT systems can detect up to 86% of the mispronunciations in speech and help learners reduce their mispronunciations by up to 23%. We recommend collaboration between language teachers and software developers in developing CAPT tools, wide dissemination of these tools and their integration with the curriculum at school and university levels, and further investigation of mobile and collaborative CAPT systems.
Cite this article
Agarwal, C., Chakraborty, P. A review of tools and techniques for computer aided pronunciation training (CAPT) in English. Educ Inf Technol 24, 3731–3743 (2019). https://doi.org/10.1007/s10639-019-09955-7
Keywords
- Educational software
- Computer aided pronunciation training (CAPT)
- English as a second language
- English as a foreign language