Discriminative Training for Phonetic Recognition of the Holy Quran

  • Research Article - Electrical Engineering
Arabian Journal for Science and Engineering

Abstract

This paper presents the development of a recognizer for Holy Quran recitation. The recognizer's decoder performs sub-word recognition at the phoneme level. The paper demonstrates the high recognition accuracies achieved by incrementally refining the HMMs of the phonemes during the training stage. The maximum-likelihood (ML) criterion is first applied to estimate the HMM parameters, which yields average recognition accuracies of up to 83%. This is followed by the discriminative minimum phone error (MPE) technique, which is applied to minimize recognition error at the phoneme level. The investigation shows that MPE-based acoustic models improve generalization. The results show a 3–4% improvement in recognition accuracy over the ML approach applied alone, which is promising.
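To make the reported figures concrete, the following is a minimal sketch of how a phoneme-level recognition accuracy can be computed from a reference and a recognized phoneme sequence. The abstract does not say which scoring tool was used, so this assumes the common HTK-style definition, accuracy = 100 × (N − S − D − I) / N, where N is the number of reference phonemes and S, D, I are the substitutions, deletions and insertions of a minimum-edit alignment. The function names and the phoneme labels in the usage note are hypothetical.

```python
def edit_counts(ref, hyp):
    """Return (substitutions, deletions, insertions) of a minimum-edit alignment.

    Assumption: unit cost for every error type (HTK's scorer uses different
    alignment weights, so the S/D/I split may differ in ambiguous cases).
    """
    n, m = len(ref), len(hyp)
    # dp[i][j] holds (total_cost, S, D, I) for ref[:i] aligned with hyp[:j].
    dp = [[None] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = (0, 0, 0, 0)
    for i in range(1, n + 1):
        dp[i][0] = (i, 0, i, 0)          # delete every reference phoneme
    for j in range(1, m + 1):
        dp[0][j] = (j, 0, 0, j)          # insert every hypothesis phoneme
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost, s, d, ins = dp[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                best = (cost, s, d, ins)              # match
            else:
                best = (cost + 1, s + 1, d, ins)      # substitution
            cost, s, d, ins = dp[i - 1][j]
            if cost + 1 < best[0]:
                best = (cost + 1, s, d + 1, ins)      # deletion
            cost, s, d, ins = dp[i][j - 1]
            if cost + 1 < best[0]:
                best = (cost + 1, s, d, ins + 1)      # insertion
            dp[i][j] = best
    _, s, d, ins = dp[n][m]
    return s, d, ins


def phone_accuracy(ref, hyp):
    """Percent accuracy, 100 * (N - S - D - I) / N, over reference phonemes."""
    if not ref:
        raise ValueError("reference sequence must not be empty")
    s, d, ins = edit_counts(ref, hyp)
    return 100.0 * (len(ref) - s - d - ins) / len(ref)
```

For example, with the hypothetical reference `["b", "i", "s", "m", "i"]` and hypothesis `["b", "i", "z", "m"]`, the alignment contains two errors against five reference phonemes, so `phone_accuracy` returns 60.0. Because insertions are penalized, this measure can fall below zero for very poor hypotheses.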

Author information

Corresponding author

Correspondence to Mirza Muhammad Ali Baig.

About this article

Cite this article

Baig, M.M.A., Qazi, S.A. & Kadri, M.B. Discriminative Training for Phonetic Recognition of the Holy Quran. Arab J Sci Eng 40, 2629–2640 (2015). https://doi.org/10.1007/s13369-015-1693-y

  • DOI: https://doi.org/10.1007/s13369-015-1693-y
