
Arabic broadcast news transcription system


Abstract

This paper describes the development of an Arabic broadcast news transcription system. The presented system is a speaker-independent, large-vocabulary recognizer for natural Arabic speech, and it is intended as a test bed for further research into the open-ended problem of achieving natural-language man-machine conversation. The system addresses a number of challenging issues specific to the Arabic language, such as the generation of fully vocalized transcriptions and the construction of a rule-based spelling dictionary. The developed Arabic speech recognition system is based on the Carnegie Mellon University Sphinx tools; the Cambridge HTK tools were also used at various testing stages.
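
As a rough illustration of what a rule-based spelling (pronunciation) dictionary involves, the sketch below maps romanized, fully vocalized Arabic words to phone strings in a Sphinx-style dictionary layout. The letter-to-phone table, the romanization, and the phone symbols are illustrative assumptions only, not the rule set or phone inventory used by the authors.

```python
# A minimal sketch (assumed, not the authors' rule set): build a Sphinx-style
# pronunciation dictionary from romanized, fully vocalized Arabic words using
# a simple letter-to-phone rule table.

# Illustrative rules only: a few consonants map one-to-one, short-vowel
# diacritics (fatha/kasra/damma, romanized a/i/u) map to short vowels,
# and alef maps to a long vowel.
LETTER_TO_PHONE = {
    "b": "B", "t": "T", "k": "K", "l": "L", "m": "M",
    "n": "N", "r": "R", "s": "S",
    "a": "AE",  # fatha
    "i": "IH",  # kasra
    "u": "UH",  # damma
    "A": "AA",  # alef (long vowel)
}

def word_to_phones(word: str) -> str:
    """Map one vocalized (romanized) word to a space-separated phone string."""
    # Characters outside the toy table are silently skipped in this sketch.
    return " ".join(LETTER_TO_PHONE[c] for c in word if c in LETTER_TO_PHONE)

def build_dictionary(words: list[str]) -> list[str]:
    """Emit dictionary lines in the common 'WORD<TAB>PH1 PH2 ...' layout."""
    return [f"{w}\t{word_to_phones(w)}" for w in sorted(set(words))]

if __name__ == "__main__":
    for line in build_dictionary(["kataba", "kutub", "salAm"]):
        print(line)
```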

The system was trained on 7.0 hours of a 7.5-hour Arabic broadcast news corpus and tested on the remaining half hour. The corpus focuses on economics and sports news. At this experimental stage, the Arabic news transcription system uses five-state HMM triphone acoustic models with 8 and 16 Gaussian mixture distributions. The state distributions were tied to about 1680 senones. The language model uses both bigrams and trigrams. The test set consists of 400 utterances containing 3585 words. The Word Error Rate (WER) was initially 10.14%. After extensive testing and tuning of the recognition parameters, the WER was reduced to about 8.61% for non-vocalized text transcription.
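
The reported figures follow the standard WER definition: the minimum number of word substitutions, deletions, and insertions needed to turn the recognizer output into the reference transcript, divided by the number of reference words; 8.61% of 3585 reference words corresponds to roughly 309 word errors. The sketch below is a minimal, self-contained WER computation; the utterance pairing and whitespace tokenization are simplifying assumptions, not the scoring tool used in the paper.

```python
# Minimal sketch of corpus-level Word Error Rate (WER):
# WER = (substitutions + deletions + insertions) / number of reference words,
# computed with word-level Levenshtein alignment.

def word_errors(reference: list[str], hypothesis: list[str]) -> int:
    """Minimum edit distance (sub/del/ins) between two word sequences."""
    n, m = len(reference), len(hypothesis)
    prev = list(range(m + 1))          # distance of empty reference vs hyp[:j]
    for i in range(1, n + 1):
        curr = [i] + [0] * m           # distance of ref[:i] vs empty hypothesis
        for j in range(1, m + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            curr[j] = min(prev[j - 1] + cost,   # substitution / match
                          prev[j] + 1,          # deletion
                          curr[j - 1] + 1)      # insertion
        prev = curr
    return prev[m]

def wer(ref_utts: list[str], hyp_utts: list[str]) -> float:
    """Corpus WER over aligned (reference, hypothesis) utterance pairs."""
    errors = sum(word_errors(r.split(), h.split())
                 for r, h in zip(ref_utts, hyp_utts))
    total = sum(len(r.split()) for r in ref_utts)
    return errors / total

if __name__ == "__main__":
    refs = ["the market rose today", "scores from the match"]
    hyps = ["the market rose today", "score from the match"]
    print(f"WER = {wer(refs, hyps):.2%}")  # 1 error / 8 reference words = 12.50%
```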


Author information

Correspondence to Mansour Alghamdi.


Cite this article

Alghamdi, M., Elshafei, M. & Al-Muhtaseb, H. Arabic broadcast news transcription system. Int J Speech Technol 10, 183–195 (2007). https://doi.org/10.1007/s10772-009-9026-8
