Abstract
We present the first end-to-end recipe for Arabic speech recognition using lexicon-free Connectionist Temporal Classification (CTC) and Recurrent Neural Networks (RNNs). The study describes in detail, step by step, the decisions made in building the Arabic system, including the transcription method, feature extraction, training process, and decoding optimization. The results are compared with Hidden Markov Model (HMM), Gaussian mixture model (GMM), and tandem baselines for Arabic on the same data set. The corpus is Aljazeera broadcast data, and the language models are built from the Aljazeera corpus and from web and Twitter crawls using different n-gram orders. We measure both word error rate (WER) and character error rate (CER) for each n-gram order. The results achieved are very close to the baseline, and we close with some recommendations.
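As an illustration of the two metrics named above, the following is a minimal sketch of how WER and CER are commonly computed from a Levenshtein edit distance. It is not the scoring tool used in the paper; the function names `edit_distance`, `wer`, and `cer` are hypothetical, and counting spaces as characters in CER is an assumption that scoring tools handle differently.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))  # distance from empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from reference prefix to empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate; assumption: spaces count as characters."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


if __name__ == "__main__":
    # Hypothetical transcripts, used only to exercise the functions.
    ref = "the cat sat on the mat"
    hyp = "the cat sit on mat"
    print(f"WER = {wer(ref, hyp):.3f}, CER = {cer(ref, hyp):.3f}")
```

Because both metrics are normalized by the reference length, either can exceed 1.0 when the hypothesis contains many insertions. In a lexicon-free character-level system, CER reflects the raw output units directly, while WER also penalizes errors in word boundaries.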
Acknowledgment
Many thanks to Luminous technology center (info@luminous-technologies.com) for providing full access to an NVIDIA-based server setup. Special thanks to QCRI for providing the Aljazeera corpus. We thank Ziang Xie, Stanford University.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Ahmed, A., Hifny, Y., Shaalan, K., Toral, S. (2017). Lexicon Free Arabic Speech Recognition Recipe. In: Hassanien, A., Shaalan, K., Gaber, T., Azar, A., Tolba, M. (eds) Proceedings of the International Conference on Advanced Intelligent Systems and Informatics 2016. AISI 2016. Advances in Intelligent Systems and Computing, vol 533. Springer, Cham. https://doi.org/10.1007/978-3-319-48308-5_15
DOI: https://doi.org/10.1007/978-3-319-48308-5_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-48307-8
Online ISBN: 978-3-319-48308-5
eBook Packages: Engineering, Engineering (R0)