Discriminative Training for Phonetic Recognition of the Holy Quran

Baig, Mirza Muhammad Ali; Qazi, Saad Ahmed; Kadri, Muhammad Bilal

doi:10.1007/s13369-015-1693-y

Research Article - Electrical Engineering
Published: 11 June 2015

Volume 40, pages 2629–2640, (2015)

Arabian Journal for Science and Engineering

Mirza Muhammad Ali Baig¹,
Saad Ahmed Qazi¹ &
Muhammad Bilal Kadri²

Abstract

This paper presents the development of a Holy Quran recitation recognizer. The recognizer's decoder performs sub-word recognition at the phoneme level. The paper demonstrates high recognition accuracies achieved by applying incremental refinements to the HMMs of the phonemes during the training stage. The maximum-likelihood (ML) criterion is first applied for HMM parameter estimation, which produces average recognition accuracies of up to 83%. This is followed by the discriminative technique of minimum phone error (MPE), which is applied to minimize recognition error at the phoneme level. Investigation shows that MPE-based acoustic models improve generalization. The results show a 3–4% improvement in recognition accuracy, which is promising when compared with the ML approach applied alone.
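The accuracies quoted in the abstract are phoneme-level figures, but the abstract does not state how they are scored. A common convention, and the one reported by HTK's HResults tool, is Accuracy = 100 × (N − S − D − I) / N, where N is the number of reference phones and S, D and I are the substitutions, deletions and insertions in the best alignment of the recognized sequence to the reference. The sketch below illustrates that formula under two assumptions that do not come from the paper: it uses unit edit costs for the alignment, and the function name and phone labels are hypothetical.

```python
def phone_accuracy(reference, hypothesis):
    """Percent phone accuracy, 100 * (N - S - D - I) / N.

    Assumption: S + D + I is taken as the minimum edit distance with
    unit costs. Scoring tools may weight the alignment differently.
    """
    n, m = len(reference), len(hypothesis)
    if n == 0:
        raise ValueError("reference must contain at least one phone")

    # dist[i][j] = fewest edits turning reference[:i] into hypothesis[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i  # delete all i reference phones
    for j in range(1, m + 1):
        dist[0][j] = j  # insert all j hypothesis phones

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            substitute = dist[i - 1][j - 1] + (reference[i - 1] != hypothesis[j - 1])
            delete = dist[i - 1][j] + 1
            insert = dist[i][j - 1] + 1
            dist[i][j] = min(substitute, delete, insert)

    return 100.0 * (n - dist[n][m]) / n


# Hypothetical phone labels: one reference phone ("m") is missed.
ref = ["b", "i", "s", "m", "i"]
hyp = ["b", "i", "s", "i"]
print(phone_accuracy(ref, hyp))  # 80.0
```

Because insertions are counted against the score, this measure can go below zero when the recognizer emits many spurious phones, which is why it is usually reported alongside the simpler percent-correct figure.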

Author information

Authors and Affiliations

NED University of Engineering and Technology, Karachi, Pakistan
Mirza Muhammad Ali Baig & Saad Ahmed Qazi
College of Engineering, PAF Karachi Institute of Economics and Technology (PAF-KIET), Karachi, 75190, Pakistan
Muhammad Bilal Kadri

Corresponding author

Correspondence to Mirza Muhammad Ali Baig.

About this article

Cite this article

Baig, M.M.A., Qazi, S.A. & Kadri, M.B. Discriminative Training for Phonetic Recognition of the Holy Quran. Arab J Sci Eng 40, 2629–2640 (2015). https://doi.org/10.1007/s13369-015-1693-y

Received: 25 July 2014
Accepted: 17 May 2015
Published: 11 June 2015
Issue Date: September 2015
DOI: https://doi.org/10.1007/s13369-015-1693-y
