Using Gaussian Mixtures on Triphone Acoustic Modelling-Based Punjabi Continuous Speech Recognition

  • Conference paper

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1086)

Abstract

The performance of continuous speech recognition for any language depends on two major components: acoustic modelling and language modelling. The Gaussian mixture model-hidden Markov model (GMM–HMM) is part of acoustic modelling. These components are applied at the back end of an automatic speech recognition (ASR) system to convert a continuous speech signal into the corresponding text accurately and efficiently. Triphone-based acoustic modelling uses two kinds of context-dependent triphone models: word-internal and cross-word. Despite active research on automatic speech recognition for many Indian and foreign languages, only a few attempts have been made for Punjabi, especially in continuous speech recognition. This paper analyses the impact of a GMM–HMM-based acoustic model on speaker-independent continuous speech recognition for Punjabi. Recognition accuracy is measured at the word and sentence levels with PLP and MFCC features while varying the number of Gaussian mixtures from 2 to 32.
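To make the role of the Gaussian mixtures concrete, the sketch below shows how a GMM attached to one HMM state could score a single feature frame (for example an MFCC or PLP vector). It is an illustration only, not the authors' implementation: the function names, the diagonal-covariance assumption, and the doubling schedule for the mixture counts are assumptions. The abstract states only that the number of mixtures ranges from 2 to 32.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(x, weights, means, variances):
    """Log of sum_m w_m * N(x; mu_m, diag(var_m)), computed with log-sum-exp."""
    terms = [
        math.log(w) + log_gaussian_diag(x, mu, var)
        for w, mu, var in zip(weights, means, variances)
    ]
    peak = max(terms)  # subtract the largest term for numerical stability
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


# Hypothetical sweep: the range 2 to 32 is from the abstract,
# the doubling steps in between are an assumption.
MIXTURE_COUNTS = [2, 4, 8, 16, 32]

# Toy two-component GMM over a two-dimensional frame (made-up parameters).
frame = [0.3, -1.2]
score = gmm_log_likelihood(
    frame,
    weights=[0.6, 0.4],
    means=[[0.0, -1.0], [1.0, 0.5]],
    variances=[[1.0, 1.0], [0.5, 2.0]],
)
```

Adding components lets a state model a more complex feature distribution, at the cost of more parameters to estimate from the same training data, which is why accuracy is typically examined across several mixture counts.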

Acknowledgements

Our study investigated the impact of Gaussian mixtures on a triphone-based acoustic model with two types of features: MFCC and PLP. Despite active research on automatic speech recognition for many Indian and foreign languages, only a few attempts have been made for Punjabi, especially in continuous speech recognition. All participants (speakers) involved are authors of the paper and have given their consent for the study. Increasing the number of speakers is not important for the presented work.

Author information

Corresponding author

Correspondence to Wiqas Ghai.

Copyright information

© 2021 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Ghai, W., Kumar, S., Athavale, V.A. (2021). Using Gaussian Mixtures on Triphone Acoustic Modelling-Based Punjabi Continuous Speech Recognition. In: Gao, XZ., Tiwari, S., Trivedi, M., Mishra, K. (eds) Advances in Computational Intelligence and Communication Technology. Advances in Intelligent Systems and Computing, vol 1086. Springer, Singapore. https://doi.org/10.1007/978-981-15-1275-9_32
