Urdu Speech Corpus and Preliminary Results on Speech Recognition

Ali, Hazrat; Ahmad, Nasir; Hafeez, Abdul

doi:10.1007/978-3-319-44188-7_24

Conference paper
First Online: 19 August 2016

Part of the book series: Communications in Computer and Information Science (CCIS, volume 629)

Abstract

Language resources for the Urdu language are not well developed. In this work, we summarize our development of an Urdu speech corpus of isolated words. The corpus comprises 250 isolated Urdu words recorded by ten speakers, including native and non-native, male and female individuals. The corpus can be used for both speech and speaker recognition tasks. We also report results on an automatic speech recognition task for this corpus. The framework extracts Mel Frequency Cepstral Coefficients (MFCCs) along with their velocity and acceleration coefficients, which are then fed to different classifiers to perform the recognition task. The classifiers used are Support Vector Machines, Random Forest and Linear Discriminant Analysis. Experimental results show that Support Vector Machines perform best, with a test set accuracy of 73%. The results reported in this work may provide a useful baseline for future research on automatic speech recognition of Urdu.
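
The velocity and acceleration coefficients mentioned in the abstract are commonly computed as first- and second-order regression deltas over neighbouring frames. The abstract does not state the window size or the exact formula the authors used, so the following is only a minimal sketch under assumptions: it uses the widely used regression formula with a window of two frames on each side, repeats edge frames at the boundaries, and the names `delta`, `add_dynamics` and `width` are hypothetical.

```python
def delta(frames, width=2):
    """Regression-based delta of a sequence of feature vectors.

    frames: list of equal-length lists, one cepstral vector per frame.
    width:  frames used on each side of the current frame (assumed value).
    """
    if not frames:
        return []
    n_frames = len(frames)
    dim = len(frames[0])
    # Normalising term of the regression formula: 2 * sum(n^2).
    denom = 2 * sum(n * n for n in range(1, width + 1))
    out = []
    for t in range(n_frames):
        row = []
        for k in range(dim):
            acc = 0.0
            for n in range(1, width + 1):
                # Clamp indices so edge frames reuse the first/last frame.
                nxt = frames[min(t + n, n_frames - 1)][k]
                prv = frames[max(t - n, 0)][k]
                acc += n * (nxt - prv)
            row.append(acc / denom)
        out.append(row)
    return out


def add_dynamics(frames, width=2):
    """Append velocity and acceleration coefficients to each static vector."""
    velocity = delta(frames, width)
    acceleration = delta(velocity, width)
    return [c + v + a for c, v, a in zip(frames, velocity, acceleration)]


# Toy input: 5 frames of 13 placeholder "static" coefficients (not real MFCCs).
static = [[float(t + k) for k in range(13)] for t in range(5)]
features = add_dynamics(static)  # each frame now holds 13 * 3 = 39 values
```

Each resulting 39-dimensional frame (or some fixed-length summary of the frames of a word) could then be passed to a classifier; how the authors formed the final per-word feature vector is not described in the abstract.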

Author information

Authors and Affiliations

Department of Electrical Engineering, COMSATS Institute of Information Technology, Abbottabad, Pakistan
Hazrat Ali
Department of Computer Systems Engineering, University of Engineering and Technology, Peshawar, Pakistan
Nasir Ahmad & Abdul Hafeez

Corresponding author

Correspondence to Hazrat Ali.

Editor information

Editors and Affiliations

Robert Gordon University, Aberdeen, United Kingdom
Chrisina Jayne
Lab of Forest Informatics (FiLAB), Democritus University of Thrace, Orestiada, Greece
Lazaros Iliadis

About this paper

Cite this paper

Ali, H., Ahmad, N., Hafeez, A. (2016). Urdu Speech Corpus and Preliminary Results on Speech Recognition. In: Jayne, C., Iliadis, L. (eds) Engineering Applications of Neural Networks. EANN 2016. Communications in Computer and Information Science, vol 629. Springer, Cham. https://doi.org/10.1007/978-3-319-44188-7_24

DOI: https://doi.org/10.1007/978-3-319-44188-7_24
Published: 19 August 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-44187-0
Online ISBN: 978-3-319-44188-7
eBook Packages: Computer Science, Computer Science (R0)
