
Arabic broadcast news transcription system


Abstract

This paper describes the development of an Arabic broadcast news transcription system. The presented system is a speaker-independent, large-vocabulary recognizer for natural Arabic speech, and it is intended as a test bed for further research into the open-ended problem of achieving natural-language man-machine conversation. The system addresses a number of challenging issues specific to the Arabic language, such as the generation of fully vocalized transcriptions and the construction of a rule-based spelling dictionary. The developed Arabic speech recognition system is based on the Carnegie Mellon University Sphinx tools; the Cambridge HTK tools were also used at various testing stages.
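
As a rough illustration of what a rule-based spelling (pronunciation) dictionary involves, the sketch below maps romanized, fully vocalized Arabic words to phone strings in a Sphinx-style dictionary layout. The letter-to-phone table, the romanization, and the phone symbols are illustrative assumptions only, not the rule set or phone inventory used by the authors.

```python
# A minimal sketch (assumed, not the authors' rule set): build a Sphinx-style
# pronunciation dictionary from romanized, fully vocalized Arabic words using
# a simple letter-to-phone rule table.

# Illustrative rules only: a few consonants map one-to-one, short-vowel
# diacritics (fatha/kasra/damma, romanized a/i/u) map to short vowels,
# and alef maps to a long vowel.
LETTER_TO_PHONE = {
    "b": "B", "t": "T", "k": "K", "l": "L", "m": "M",
    "n": "N", "r": "R", "s": "S",
    "a": "AE",  # fatha
    "i": "IH",  # kasra
    "u": "UH",  # damma
    "A": "AA",  # alef (long vowel)
}

def word_to_phones(word: str) -> str:
    """Map one vocalized (romanized) word to a space-separated phone string."""
    # Characters outside the toy table are silently skipped in this sketch.
    return " ".join(LETTER_TO_PHONE[c] for c in word if c in LETTER_TO_PHONE)

def build_dictionary(words: list[str]) -> list[str]:
    """Emit dictionary lines in the common 'WORD<TAB>PH1 PH2 ...' layout."""
    return [f"{w}\t{word_to_phones(w)}" for w in sorted(set(words))]

if __name__ == "__main__":
    for line in build_dictionary(["kataba", "kutub", "salAm"]):
        print(line)
```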

The system was trained on 7.0 hours of a 7.5-hour Arabic broadcast news corpus and tested on the remaining half hour. The corpus focuses on economics and sports news. At this experimental stage, the Arabic news transcription system uses five-state HMM triphone acoustic models with 8 and 16 Gaussian mixture distributions. The state distributions were tied to about 1680 senones. The language model uses both bigrams and trigrams. The test set consists of 400 utterances containing 3585 words. The Word Error Rate (WER) was initially 10.14%. After extensive testing and tuning of the recognition parameters, the WER was reduced to about 8.61% for non-vocalized text transcription.
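
The reported figures follow the standard WER definition: the minimum number of word substitutions, deletions, and insertions needed to turn the recognizer output into the reference transcript, divided by the number of reference words; 8.61% of 3585 reference words corresponds to roughly 309 word errors. The sketch below is a minimal, self-contained WER computation; the utterance pairing and whitespace tokenization are simplifying assumptions, not the scoring tool used in the paper.

```python
# Minimal sketch of corpus-level Word Error Rate (WER):
# WER = (substitutions + deletions + insertions) / number of reference words,
# computed with word-level Levenshtein alignment.

def word_errors(reference: list[str], hypothesis: list[str]) -> int:
    """Minimum edit distance (sub/del/ins) between two word sequences."""
    n, m = len(reference), len(hypothesis)
    prev = list(range(m + 1))          # distance of empty reference vs hyp[:j]
    for i in range(1, n + 1):
        curr = [i] + [0] * m           # distance of ref[:i] vs empty hypothesis
        for j in range(1, m + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            curr[j] = min(prev[j - 1] + cost,   # substitution / match
                          prev[j] + 1,          # deletion
                          curr[j - 1] + 1)      # insertion
        prev = curr
    return prev[m]

def wer(ref_utts: list[str], hyp_utts: list[str]) -> float:
    """Corpus WER over aligned (reference, hypothesis) utterance pairs."""
    errors = sum(word_errors(r.split(), h.split())
                 for r, h in zip(ref_utts, hyp_utts))
    total = sum(len(r.split()) for r in ref_utts)
    return errors / total

if __name__ == "__main__":
    refs = ["the market rose today", "scores from the match"]
    hyps = ["the market rose today", "score from the match"]
    print(f"WER = {wer(refs, hyps):.2%}")  # 1 error / 8 reference words = 12.50%
```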


Author information

Correspondence to Mansour Alghamdi.


Cite this article

Alghamdi, M., Elshafei, M. & Al-Muhtaseb, H. Arabic broadcast news transcription system. Int J Speech Technol 10, 183–195 (2007). https://doi.org/10.1007/s10772-009-9026-8
