Urdu Speech Corpus and Preliminary Results on Speech Recognition

  • Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 629)

Abstract

Language resources for Urdu are not well developed. In this work, we describe the development of an Urdu speech corpus for isolated words. The corpus comprises 250 isolated Urdu words recorded by ten individuals. The speakers include both native and non-native, male and female individuals. The corpus can be used for both speech and speaker recognition tasks. We also report results on an automatic speech recognition task for this corpus. The framework extracts Mel Frequency Cepstral Coefficients along with the velocity and acceleration coefficients, which are then fed to different classifiers to perform the recognition task. The classifiers used are Support Vector Machines, Random Forest, and Linear Discriminant Analysis. Experimental results show that Support Vector Machines perform best, with a test set accuracy of 73%. These results may provide a useful baseline for future research on automatic speech recognition of Urdu.
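The abstract does not say how the velocity and acceleration coefficients are computed. A minimal sketch follows, assuming the commonly used regression formula over a window of neighbouring frames, with edge frames repeated at the boundaries. The function names `delta` and `stack_with_dynamics`, the window `width`, and the padding choice are illustrative assumptions, not details taken from the paper.

```python
from typing import List

Frame = List[float]  # one frame of static coefficients (e.g. MFCCs)


def delta(frames: List[Frame], width: int = 2) -> List[Frame]:
    """Regression-based delta of a frame sequence (assumed formula).

    d[t] = sum_{n=1..width} n * (c[t+n] - c[t-n]) / (2 * sum_{n=1..width} n^2)
    Indices outside the sequence are clamped to the first or last frame.
    """
    if not frames:
        return []
    denom = 2 * sum(n * n for n in range(1, width + 1))
    last = len(frames) - 1
    out: List[Frame] = []
    for t in range(len(frames)):
        row: Frame = []
        for k in range(len(frames[t])):
            num = 0.0
            for n in range(1, width + 1):
                ahead = frames[min(t + n, last)][k]
                behind = frames[max(t - n, 0)][k]
                num += n * (ahead - behind)
            row.append(num / denom)
        out.append(row)
    return out


def stack_with_dynamics(static: List[Frame], width: int = 2) -> List[Frame]:
    """Append velocity (delta) and acceleration (delta of delta) to each frame."""
    velocity = delta(static, width)
    acceleration = delta(velocity, width)
    return [s + v + a for s, v, a in zip(static, velocity, acceleration)]


if __name__ == "__main__":
    # Toy input: a single coefficient that rises by 1 per frame.
    ramp = [[float(t)] for t in range(6)]
    for frame in stack_with_dynamics(ramp):
        print(frame)
```

Each output frame is three times as long as the input frame: static, velocity, and acceleration values concatenated. Vectors of this kind could then be passed to a classifier such as a Support Vector Machine.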

Author information

Corresponding author

Correspondence to Hazrat Ali.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Ali, H., Ahmad, N., Hafeez, A. (2016). Urdu Speech Corpus and Preliminary Results on Speech Recognition. In: Jayne, C., Iliadis, L. (eds) Engineering Applications of Neural Networks. EANN 2016. Communications in Computer and Information Science, vol 629. Springer, Cham. https://doi.org/10.1007/978-3-319-44188-7_24

  • DOI: https://doi.org/10.1007/978-3-319-44188-7_24

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-44187-0

  • Online ISBN: 978-3-319-44188-7

  • eBook Packages: Computer Science, Computer Science (R0)
