Low-Resourced Machine Translation for Senegalese Wolof Language

Mbaye, Derguene; Diallo, Moussa; Diop, Thierno Ibrahima

doi:10.1007/978-981-99-3236-8_19

Derguene Mbaye (ORCID: orcid.org/0000-0002-7490-2731), Moussa Diallo & Thierno Ibrahima Diop

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 696)

Included in the following conference series:

International Congress on Information and Communication Technology


Abstract

Natural language processing (NLP) research has advanced considerably in recent years, with major breakthroughs establishing new benchmarks. However, these advances have mainly benefited a small group of languages commonly referred to as resource-rich, such as English and French. The majority of other languages, which have fewer resources, have been left behind; this is the case for most African languages, including Wolof. In this work, we present a parallel Wolof/French corpus of 100,000 sentences, on which we conducted machine translation experiments with models based on recurrent neural networks (RNNs) in different data configurations. We observed performance gains for models trained on subword-segmented data, and for models trained on the French-English language pair compared with those trained on the French-Wolof pair under the same experimental conditions.
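To illustrate what subword segmentation means in practice, here is a minimal sketch of byte-pair encoding (BPE), one common way to split words into subword units. The abstract does not say which subword method or settings the authors used, so the choice of BPE, the function names (`learn_bpe`, `merge_pair`, `segment`), the `</w>` end-of-word marker, and the toy word list are all assumptions made for illustration only.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` BPE merge rules from a list of words (toy illustration)."""
    # Each word starts as a sequence of characters plus an end-of-word marker.
    vocab = Counter(tuple(w) + ("</w>",) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by how often each word occurs.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Apply the new merge rule to every word in the vocabulary.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def segment(word, merges):
    """Segment a word by applying the learned merges in the order they were learned."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


if __name__ == "__main__":
    toy_corpus = ["abab", "abab", "cab"]  # hypothetical toy data, not from the paper
    rules = learn_bpe(toy_corpus, num_merges=2)
    print(rules)
    print(segment("cab", rules))
```

Because frequent character sequences become single units while rare words fall back to smaller pieces, a model trained on such data sees fewer out-of-vocabulary tokens, which is one plausible reason subword segmentation can help when parallel data is limited.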


Notes

1. Lexical Markup Framework.
2. http://pagesperso.ls2n.fr/~enguehard-c/DiLAF/index.php.
3. https://ai.facebook.com/research/no-language-left-behind/.
4. https://www.masakhane.io/.
5. https://www.axl.cefan.ulaval.ca/afrique/senegal.htm.
6. http://www.jo.gouv.sn/spip.php?article4802.
7. Version 2.0.0.


Author information

Authors and Affiliations

Université Cheikh Anta Diop, Dakar, Senegal
Derguene Mbaye & Moussa Diallo
Baamtu, Dakar, Senegal
Thierno Ibrahima Diop


Corresponding author

Correspondence to Derguene Mbaye.

Editor information

Editors and Affiliations

Department of Design Engineering and Mathematics, Middlesex University London, London, UK
Xin-She Yang
Department of Biomedical Engineering, University of Reading, England, UK
R. Simon Sherratt
Department of Computer Science and Engineering, Techno International Newtown, Chakpachuria, West Bengal, India
Nilanjan Dey
Global Knowledge Research Foundation, Ahmedabad, India
Amit Joshi


About this paper

Cite this paper

Mbaye, D., Diallo, M., Diop, T.I. (2024). Low-Resourced Machine Translation for Senegalese Wolof Language. In: Yang, XS., Sherratt, R.S., Dey, N., Joshi, A. (eds) Proceedings of Eighth International Congress on Information and Communication Technology. ICICT 2023. Lecture Notes in Networks and Systems, vol 696. Springer, Singapore. https://doi.org/10.1007/978-981-99-3236-8_19


DOI: https://doi.org/10.1007/978-981-99-3236-8_19
Published: 15 September 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-3235-1
Online ISBN: 978-981-99-3236-8
eBook Packages: Engineering, Engineering (R0)
