Abstract
Urdu is among the five largest languages of the world and is particularly important because it shares vocabulary with several other languages of the Indo-Pak subcontinent. However, no significant research has been carried out on Automatic Speech Recognition (ASR) of Urdu. This paper presents a statistical classification technique for ASR of isolated words in Urdu. For each isolated word, 52 Mel Frequency Cepstral Coefficients (MFCCs) are extracted, and classification based on these coefficients is performed using Linear Discriminant Analysis (LDA). As a prototype, the system was trained with audio samples from seven speakers, including male and female speakers, native and non-native speakers, and speakers of different ages, while testing was done using audio samples from three speakers. The majority of words exhibited a percentage error of less than 33%. Words with 100% error were declared bad words. The work reported in this paper may serve as a strong baseline for future research on Urdu ASR, especially for continuous speech recognition of Urdu.
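The pipeline summarized above (one 52-dimensional feature vector per word utterance, an LDA classifier, and a per-word percentage error) can be illustrated with a minimal sketch. This is not the authors' implementation. The feature vectors are synthetic random stand-ins for MFCCs, one utterance per speaker per word is assumed, the five-word vocabulary and the ridge term added to the covariance are assumptions, and all function names (`make_utterances`, `fit_lda`, `predict_lda`, `per_word_error`) are hypothetical.

```python
import numpy as np

N_WORDS, N_COEFFS = 5, 52        # 52 coefficients per word (from the abstract); 5 words is an assumption
N_TRAIN_SPK, N_TEST_SPK = 7, 3   # seven training speakers, three test speakers

def make_utterances(prototypes, n_speakers, rng):
    """Synthetic stand-in for MFCC extraction: one noisy vector per speaker per word."""
    X = np.repeat(prototypes, n_speakers, axis=0)
    X = X + rng.normal(size=X.shape)
    y = np.repeat(np.arange(prototypes.shape[0]), n_speakers)
    return X, y

def fit_lda(X, y, ridge=1e-2):
    """Fit Gaussian LDA with one pooled covariance matrix shared by all classes."""
    classes, idx = np.unique(y, return_inverse=True)
    means = np.array([X[idx == k].mean(axis=0) for k in range(len(classes))])
    centered = X - means[idx]
    cov = centered.T @ centered / (len(X) - len(classes))
    dim = X.shape[1]
    # Assumption: a ridge term keeps the covariance invertible, because
    # 35 training vectors cannot span all 52 dimensions.
    cov += ridge * np.trace(cov) / dim * np.eye(dim)
    priors = np.bincount(idx) / len(X)
    weights = np.linalg.solve(cov, means.T)   # shape (dim, n_classes)
    bias = -0.5 * np.sum(means.T * weights, axis=0) + np.log(priors)
    return classes, weights, bias

def predict_lda(model, X):
    """Assign each row of X to the class with the largest linear discriminant score."""
    classes, weights, bias = model
    return classes[np.argmax(X @ weights + bias, axis=1)]

def per_word_error(y_true, y_pred):
    """Percentage of misclassified test utterances for each word."""
    return {int(w): float(100.0 * np.mean(y_pred[y_true == w] != w))
            for w in np.unique(y_true)}

rng = np.random.default_rng(0)
prototypes = rng.normal(scale=5.0, size=(N_WORDS, N_COEFFS))
X_train, y_train = make_utterances(prototypes, N_TRAIN_SPK, rng)
X_test, y_test = make_utterances(prototypes, N_TEST_SPK, rng)

model = fit_lda(X_train, y_train)
pred = predict_lda(model, X_test)
errors = per_word_error(y_test, pred)
# Words misclassified for every test speaker, mirroring the "bad words" criterion.
bad_words = [w for w, e in errors.items() if e == 100.0]
print(errors, bad_words)
```

In this sketch each word has only three test utterances, so its error can take only the values 0%, 33.3%, 66.7%, or 100%. Whether the same holds for the reported experiments depends on how many utterances each test speaker recorded, which the abstract does not state.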
Acknowledgment
The authors are thankful to the supporting staff of the Department of Electrical Engineering, University of Engineering and Technology, Peshawar, Pakistan, whose efforts in keeping the Computer Lab open for extra hours gave the authors the opportunity to use it. The authors also extend their gratitude to Engr. Salman Ilahi, Department of Electrical Engineering, and Engr. Irfan Ahmad, Department of Industrial Engineering, UET Peshawar, for their valuable input and suggestions throughout this work.
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Ali, H., Ahmad, N., Zhou, X., Ali, M., Manjotho, A.A. (2014). Linear Discriminant Analysis Based Approach for Automatic Speech Recognition of Urdu Isolated Words. In: Shaikh, F., Chowdhry, B., Zeadally, S., Hussain, D., Memon, A., Uqaili, M. (eds) Communication Technologies, Information Security and Sustainable Development. IMTIC 2013. Communications in Computer and Information Science, vol 414. Springer, Cham. https://doi.org/10.1007/978-3-319-10987-9_3
DOI: https://doi.org/10.1007/978-3-319-10987-9_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10986-2
Online ISBN: 978-3-319-10987-9
eBook Packages: Computer Science, Computer Science (R0)