Abstract
Building Automatic Speech Recognition (ASR) systems for spoken languages usually suffers from the problem of limited available transcriptions. ASR systems require large speech corpora that contain speech and its corresponding transcriptions for training acoustic models. In this paper, we target Egyptian dialectal Arabic. Like other spoken languages, it is used mainly for speaking rather than writing. Transcriptions are usually collected manually by experts, which has proved to be a time-consuming and expensive process. We introduce Games With a Purpose as a cheap and fast approach to gathering transcriptions for Egyptian dialectal Arabic. Furthermore, Arabic orthographic transcriptions lack diacritization, which leads to ambiguity. Transcriptions written in Arabic Chat Alphabet, on the other hand, are widely used and capture the pronunciation effects conveyed by diacritics. In this work, we present the game (pronounced makhamekho), which aims at collecting transcriptions in Arabic orthography as well as in Arabic Chat Alphabet. It also gathers mappings of words from Arabic orthography to Arabic Chat Alphabet.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
El-Sakhawy, D., Abdennadher, S., Hamed, I. (2015). Collecting Data for Automatic Speech Recognition Systems in Dialectal Arabic Using Games with a Purpose. In: Böck, R., Bonin, F., Campbell, N., Poppe, R. (eds) Multimodal Analyses enabling Artificial Agents in Human-Machine Interaction. MA3HMI 2014. Lecture Notes in Computer Science, vol 8757. Springer, Cham. https://doi.org/10.1007/978-3-319-15557-9_10
DOI: https://doi.org/10.1007/978-3-319-15557-9_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-15556-2
Online ISBN: 978-3-319-15557-9
eBook Packages: Computer Science, Computer Science (R0)