Abstract
In this paper, we study a sociolinguistic phenomenon that occurs in the daily conversations of Maghrebi people, commonly known as code-switching or code-mixing. It consists of alternating between languages when speaking or writing. In this work, we measure the extent of this phenomenon in the Maghrebi dialects. To this end, we harvested YouTube comments written in the Algerian, Moroccan and Tunisian dialects; each of the three corpora contains at least 17 million words. Although several metrics for measuring code-switching exist in the literature, to the best of our knowledge none of them takes into account the degree of mixture with respect to a reference language. We therefore propose a new metric named CESAR (CodE-Switching According to a Reference language) that estimates the degree of language mixture relative to a reference language. Experiments are carried out on the three collected corpora, taking the local dialects as reference languages. The results show that CESAR is well suited to this purpose and makes it possible to compare the three Maghrebi dialects according to their level of code-switching.
Notes
Available at: https://developers.google.com/YouTube.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Abidi, K., Smaïli, K. (2022). CESAR: A New Metric to Measure the Level of Code-Switching in Corpora - Application to Maghrebian Dialects. In: Arai, K. (eds) Intelligent Systems and Applications. IntelliSys 2021. Lecture Notes in Networks and Systems, vol 295. Springer, Cham. https://doi.org/10.1007/978-3-030-82196-8_58
DOI: https://doi.org/10.1007/978-3-030-82196-8_58
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-82195-1
Online ISBN: 978-3-030-82196-8
eBook Packages: Intelligent Technologies and Robotics (R0)