Collecting Data for Automatic Speech Recognition Systems in Dialectal Arabic Using Games with a Purpose

El-Sakhawy, Dayna; Abdennadher, Slim; Hamed, Injy

doi:10.1007/978-3-319-15557-9_10

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8757)

Included in the following conference series:

International Workshop on Multimodal Analyses Enabling Artificial Agents in Human-Machine Interaction

Abstract

Building Automatic Speech Recognition (ASR) systems for spoken languages usually suffers from the problem of limited available transcriptions. ASR systems require large speech corpora that contain speech recordings and their corresponding transcriptions for training acoustic models. In this paper, we target Egyptian dialectal Arabic. Like other spoken languages, it is mainly used for speaking rather than writing. Transcriptions are usually collected manually by experts, which has proved to be a time-consuming and expensive process. We introduce Games With a Purpose as a cheap and fast approach to gathering transcriptions for Egyptian dialectal Arabic. Furthermore, Arabic orthographic transcriptions lack diacritization, which leads to ambiguity. Transcriptions written in Arabic Chat Alphabet, on the other hand, are widely used and capture the pronunciation effects given by diacritics. In this work, we present a game (pronounced makhamekho) that aims at collecting transcriptions in Arabic orthography as well as in Arabic Chat Alphabet. It also gathers mappings of words from Arabic orthography to Arabic Chat Alphabet.
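To make the ambiguity concrete, the following is a minimal sketch of how word mappings from Arabic orthography to Arabic Chat Alphabet could be tallied. It is not the game's actual implementation, which the abstract does not describe. The function name `collect_mappings` and the sample submissions are hypothetical and chosen only for illustration: the undiacritized word كتب can be read as "katab" (he wrote) or "kotob" (books), so a single written form may map to several Chat Alphabet spellings.

```python
from collections import Counter, defaultdict


def collect_mappings(submissions):
    """Tally Arabic Chat Alphabet (ACA) spellings per Arabic-orthography word.

    `submissions` is an iterable of (arabic_word, aca_word) pairs, as might be
    entered by players. This data format is an assumption for illustration.
    """
    mappings = defaultdict(Counter)
    for arabic_word, aca_word in submissions:
        # Normalise case so "Katab" and "katab" count as the same spelling.
        mappings[arabic_word][aca_word.lower()] += 1
    return mappings


# Hypothetical player submissions (not data from the paper).
submissions = [
    ("كتب", "katab"),
    ("كتب", "kotob"),
    ("كتب", "Katab"),
    ("بيت", "beet"),
]

mappings = collect_mappings(submissions)
for word, variants in mappings.items():
    # Each Arabic word is listed with its ACA variants, most frequent first.
    print(word, variants.most_common())
```

Keeping every variant with its count, rather than only the majority spelling, preserves the distinct pronunciations that the undiacritized Arabic form hides.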

Author information

Authors and Affiliations

Media Engineering and Technology Faculty, German University in Cairo, New Cairo, Egypt
Dayna El-Sakhawy, Slim Abdennadher & Injy Hamed

Corresponding author

Correspondence to Injy Hamed.

Editor information

Editors and Affiliations

Otto von Guericke University, Magdeburg, Germany
Ronald Böck
Trinity College, Dublin, Ireland
Francesca Bonin
Trinity College, Dublin, Ireland
Nick Campbell
Utrecht University, Utrecht, The Netherlands
Ronald Poppe

About this paper

Cite this paper

El-Sakhawy, D., Abdennadher, S., Hamed, I. (2015). Collecting Data for Automatic Speech Recognition Systems in Dialectal Arabic Using Games with a Purpose. In: Böck, R., Bonin, F., Campbell, N., Poppe, R. (eds) Multimodal Analyses enabling Artificial Agents in Human-Machine Interaction. MA3HMI 2014. Lecture Notes in Computer Science, vol 8757. Springer, Cham. https://doi.org/10.1007/978-3-319-15557-9_10

DOI: https://doi.org/10.1007/978-3-319-15557-9_10
Published: 12 February 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-15556-2
Online ISBN: 978-3-319-15557-9
eBook Packages: Computer Science (R0)
