Low-Resourced Machine Translation for Senegalese Wolof Language

  • Conference paper
Proceedings of Eighth International Congress on Information and Communication Technology (ICICT 2023)

Abstract

Natural language processing (NLP) research has advanced rapidly in recent years, with major breakthroughs that have established new benchmarks. However, these advances have mainly benefited a group of languages commonly referred to as resource-rich, such as English and French. Most other languages, which have fewer resources, have been left behind; this is the case for most African languages, including Wolof. In this work, we present a parallel Wolof/French corpus of 100,000 sentences, on which we ran experiments with machine translation models based on recurrent neural networks (RNNs) in different data configurations. We observed performance gains for the models trained on subword-segmented data. Models trained on the French-English language pair also outperformed those trained on the French-Wolof pair under the same experimental conditions.

Author information

Corresponding author

Correspondence to Derguene Mbaye.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Mbaye, D., Diallo, M., Diop, T.I. (2024). Low-Resourced Machine Translation for Senegalese Wolof Language. In: Yang, XS., Sherratt, R.S., Dey, N., Joshi, A. (eds) Proceedings of Eighth International Congress on Information and Communication Technology. ICICT 2023. Lecture Notes in Networks and Systems, vol 696. Springer, Singapore. https://doi.org/10.1007/978-981-99-3236-8_19

  • DOI: https://doi.org/10.1007/978-981-99-3236-8_19

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-3235-1

  • Online ISBN: 978-981-99-3236-8

  • eBook Packages: Engineering, Engineering (R0)
