Abstract
Natural language processing (NLP) research has advanced considerably in recent years, with major breakthroughs establishing new benchmarks. However, these advances have mainly benefited a group of languages commonly referred to as resource-rich, such as English and French. The majority of other languages, which have weaker resources, are left behind; this is the case for most African languages, including Wolof. In this work, we present a parallel Wolof/French corpus of 100,000 sentences, on which we conducted machine translation experiments with models based on recurrent neural networks (RNNs) in different data configurations. We observed performance gains for models trained on subword-segmented data, and for models trained on the French-English language pair compared with those trained on the French-Wolof pair under the same experimental conditions.
Notes
- 1. Lexical Markup Framework.
- 7. Version 2.0.0.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Mbaye, D., Diallo, M., Diop, T.I. (2024). Low-Resourced Machine Translation for Senegalese Wolof Language. In: Yang, XS., Sherratt, R.S., Dey, N., Joshi, A. (eds) Proceedings of Eighth International Congress on Information and Communication Technology. ICICT 2023. Lecture Notes in Networks and Systems, vol 696. Springer, Singapore. https://doi.org/10.1007/978-981-99-3236-8_19
DOI: https://doi.org/10.1007/978-981-99-3236-8_19
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-3235-1
Online ISBN: 978-981-99-3236-8
eBook Packages: Engineering, Engineering (R0)