
CESAR: A New Metric to Measure the Level of Code-Switching in Corpora - Application to Maghrebian Dialects

  • Conference paper
Intelligent Systems and Applications (IntelliSys 2021)

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 295)


Abstract

In this paper, we are interested in a sociolinguistic phenomenon that occurs in the daily conversations of Maghrebi people, commonly known as code-switching or code-mixing. It consists of alternating between languages while speaking or writing. In this work, we measure the extent of this phenomenon in the Maghrebi languages. To this end, we harvested YouTube comments written in the Algerian, Moroccan and Tunisian dialects; each of the three corpora contains at least 17 million words. Although the literature offers several metrics for measuring code-switching, to the best of our knowledge there is not yet a measure that takes into account the degree of mixing with respect to a reference language. In contrast to the existing measures, we propose a new metric named CESAR (CodE-Switching According to a Reference language) that estimates the degree of language mixing with respect to a reference language. Experiments are carried out on the three collected corpora, using the local dialects as reference languages. Experimental results show that CESAR is well suited to this purpose and allows us to compare the three Maghrebi dialects according to their level of code-switching.
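
The abstract does not give CESAR's formula, so the following is only a minimal sketch of the distinction it draws, not the authors' metric. It assumes each comment has already been tokenized and tagged with a language label per token; the tag names (`alg`, `fr`, `univ`) and all function names are hypothetical. `cmi` follows the widely used utterance-level Code-Mixing Index of Gambäck and Das, which measures mixing against whichever language dominates the utterance. `reference_mixing` is a hypothetical illustration of scoring mixing against a fixed reference language, such as the local dialect.

```python
from collections import Counter

# Hypothetical tag for language-independent tokens (numbers, emoji, URLs, ...).
UNIVERSAL = "univ"


def cmi(tags: list[str]) -> float:
    """Utterance-level Code-Mixing Index (Gambäck & Das):
    100 * (1 - max_i(w_i) / (n - u)), or 0 when every token is
    language-independent. Mixing is measured against the dominant language."""
    n = len(tags)
    lang_tags = [t for t in tags if t != UNIVERSAL]
    u = n - len(lang_tags)
    if n == u:
        return 0.0
    dominant = max(Counter(lang_tags).values())
    return 100.0 * (1.0 - dominant / (n - u))


def reference_mixing(tags: list[str], reference: str) -> float:
    """Hypothetical illustration, NOT the CESAR formula: percentage of
    language-tagged tokens that do not belong to the fixed reference language."""
    lang_tags = [t for t in tags if t != UNIVERSAL]
    if not lang_tags:
        return 0.0
    foreign = sum(1 for t in lang_tags if t != reference)
    return 100.0 * foreign / len(lang_tags)


def corpus_average(corpus: list[list[str]], score, **kwargs) -> float:
    """Average a per-comment score over a corpus of tagged comments."""
    scores = [score(tags, **kwargs) for tags in corpus]
    return sum(scores) / len(scores) if scores else 0.0


# One comment: one dialect token, three French tokens, one emoji.
comment = ["alg", "fr", "fr", "fr", "univ"]
print(cmi(comment))                      # 25.0
print(reference_mixing(comment, "alg"))  # 75.0
```

The example shows why the choice of reference matters: because French dominates this comment, the dominant-language index reports little mixing (25.0), whereas scoring against the dialect as reference reports that three quarters of the language-tagged tokens are not in the dialect (75.0).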



Author information


Corresponding author

Correspondence to Karima Abidi.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Abidi, K., Smaïli, K. (2022). CESAR: A New Metric to Measure the Level of Code-Switching in Corpora - Application to Maghrebian Dialects. In: Arai, K. (eds) Intelligent Systems and Applications. IntelliSys 2021. Lecture Notes in Networks and Systems, vol 295. Springer, Cham. https://doi.org/10.1007/978-3-030-82196-8_58
