The South African Human Language Technology Audit

  • Original Paper
Language Resources and Evaluation

Abstract

Human language technology (HLT) has been identified as a priority area by the South African government. However, despite efforts by government and the research and development (R&D) community, South Africa has not yet been able to maximise the opportunities of HLT and create a thriving HLT industry. One of the key challenges is that there is insufficient codified knowledge about the current South African HLT components, their attributes and existing relationships. Hence, a technology audit was conducted for the South African HLT landscape to create a systematic and detailed inventory of the status of the HLT components across the eleven official languages. Based on the Basic Language Resource Kit (BLaRK) framework (Krauwer 1998), we used various data collection methods (such as focus groups, questionnaires and personal consultations with HLT experts) to gather detailed information. The South African HLT landscape is analysed using a number of complementary approaches and, based on the interpretation of the results, recommendations are made on how to accelerate HLT development in South Africa, as well as on how to conduct similar audits in other countries and contexts.

Notes

  1. Roughly twenty-five languages are spoken in South Africa; eleven of these have been declared official languages on the grounds that their usage covers about 98% of the total population (DAC 2002). These official languages are Afrikaans (Afr), English (Eng), isiNdebele (Ndb), isiXhosa (Xho), isiZulu (Zul), Sepedi (Sesotho sa Leboa) (Sep), Sesotho (Ses), Setswana (Sts), Siswati (Ssw), Tshivenda (Tsv) and Xitsonga (Xit). The South African government has launched various initiatives and mechanisms to ensure a truly multilingual society (e.g. the establishment of the Pan South African Language Board; http://www.pansalb.org.za).

  2. See for example the contribution by Badenhorst et al. (2011) in this volume.

  3. http://www.flarenet.eu.

  4. http://www.meta-net.eu.

  5. tinyurl.com/6lb8z6x.

  6. tinyurl.com/6lb8z6x.

  7. In our work ‘English (Eng)’ refers to ‘South African English (SAE)’ which has significant linguistic differences (e.g. pronunciation of words) from other accents of English such as ‘British’ or ‘American’ English.

  8. Not available (NA) items refer to proprietary resources or contract R&D resources which may not be fully available (e.g. resources from the defence environment). In a resource-scarce environment we found it significant that a resource exists even if NA, so that the HLT community is aware of it. Since UN items have more uncertainty with regard to their accessibility status (which may take significant time to get resolved), NA items are given a higher score than UN.

  9. The Maturity Index and Accessibility Index used here are calculated for each grouping of HLT components within data, modules and applications.

  10. tinyurl.com/6lb8z6x.

  11. tinyurl.com/6lb8z6x.

  12. The Maturity Index and the Accessibility Index here are calculated on a per-language basis, taken across all data, modules and applications, as discussed in Sects. 4.2.1 and 4.2.2 respectively.

  13. taalunieversum.org/taal/technologie/stevin/.

  14. ixa2.si.ehu.es/saltmil.

  15. http://www.aflat.org.

References

  • Badenhorst, J., Van Heerden, C., Davel, M., & Barnard, E. (2011). Collecting and evaluating speech recognition corpora for 11 South African languages. Language resources and evaluation. Special Issue: African Language Technology. Springer.

  • Bross, U. (1999). Technology audit as a policy instrument to improve innovations and industrial competitiveness in countries in transition. Innovation, 12(3), 397–412.

  • Binnenpoorte, D., De Vriend, F., Sturm, J., Daelemans, W., Strik, H., & Cucchiarini, C. (2002). A field survey for establishing priorities in the development of HLT resources for Dutch. In Proceedings of LREC 2002, 3rd international conference on language resources and evaluation, Las Palmas, Spain (pp. 1862–1866).

  • D’Halleweyn, E., Odijk, J., Teunissen, L. M., & Cucchiarini, C. (2006). Dutch-Flemish HLT Programme STEVIN: Essential Speech and Language Technology Resources. In Proceedings of the 5th international conference on language resources and evaluation (LREC 2006), Genoa, Italy (pp. 761–766).

  • Davel, M., & Barnard, E. (2003). Bootstrapping in Language Resource Generation. In Proceedings of the Symposium of Pattern Recognition Society of South Africa, Langebaan, South Africa, November (pp. 97–100).

  • Department of Arts and Culture (DAC). (2002). National Language Policy Framework. Department of Arts and Culture, Pretoria, South Africa. http://www.info.gov.za/otherdocs/2002/langpolicyfinal.pdf. Accessed February 2008.

  • Elenius, K., Forsbom, E., & Megyesi, B. (2008). Language resources and tools for Swedish: A Survey. In Proceedings of the 6th international conference on language resources and evaluation (LREC 2008), Marrakesh, Morocco (pp. 600–604).

  • Joscelyne, A., & Lockwood, R. (2003). Benchmarking HLT progress in Europe. EUROMAP Language Technologies. Center for Sprogteknologi, Copenhagen. http://www.cervantes.es/seg_nivel/lect_ens/oesi/EUROMAP-Final-Report-Full-May-2003.pdf. Accessed June 2009.

  • Khalil, T. M. (2000). Management of technology: The key to competitiveness and wealth creation. New York: McGraw-Hill.

  • Krauwer, S. (1998). ELSNET and ELRA: A common past and a common future. The ELRA Newsletter, 3(2), 4–5.

  • Krauwer, S. (2006). Strengthening the smaller languages in Europe. In Proceedings of the 5th Slovenian and 1st International language technologies conference, October 9–10, Ljubljana, Slovenia.

  • Maegaard, B., Krauwer, S., & Choukri, K. (2009). BLaRK for Arabic. MEDAR—Mediterranean Arabic Language and Speech Technology. http://www.medar.info/MEDAR_BLARK_I.pdf. Accessed June 2009.

  • Maegaard, B., Krauwer, S., Choukri, K., & Jørgensen, L. (2006). The BLARK concept and BLARK for Arabic. In Proceedings of the 5th international conference on language resources and evaluation (LREC 2006), Genoa, Italy (pp. 773–778).

  • Mapelli, V., & Choukri, K. (2003). Report on a (minimal) set of LRs to be made available for as many languages as possible, and map of the actual gaps. European National Activities for Basic Language Resources (ENABLER) Thematic Network. http://www.ilc.cnr.it/enabler-network/reports.htm. Accessed June 2009.

  • Martino, J. P. (1994). A technology audit: Key to technology planning. In Proceedings of the IEEE national aerospace and electronics conference NAECON 1994, Dayton, Ohio, USA (pp. 1241–1247).

  • Pilon, S., Van Huyssteen, G. B., & Van Rooy, B. (2005). Teaching Language Technology at the North-West University. In Proceedings of the second ACL-TNLP workshop on effective tools and methodologies for teaching natural language processing and computational linguistics, Ann Arbor, Michigan, USA (pp. 57–61).

  • Probert, D., Farrukh, C., Gregory, M., & Robinson, N. (1999). Linking technology to business planning: theory and practice. International Journal of Technology Management, 18(1–2), 11–30.

  • Sharma Grover, A. (2009). A Technology Audit: The State of Human Language Technologies R&D in South Africa (Masters research report). University of Pretoria: Graduate School of Technology Management.

  • Sharma Grover, A., van Huyssteen, G. B., & Pretorius, M. W. (2010a). The South African Human Language Technologies Audit. In Proceedings of the 7th international conference on language resources and evaluation (LREC 2010), Malta (pp. 2847–2850).

  • Sharma Grover, A., Van Huyssteen, G. B., & Pretorius, M. W. (2010b). A technological profile of the official South African languages. In 2nd Workshop on African Language Technology: AfLaT 2010 at the 7th international conference on language resources and evaluation (LREC 2010), Malta (pp. 3–7).

  • Simov, K., Osenova, P., Kolkovska, S., Balabanova, E., & Doikoff, D. (2004). A language resources infrastructure for Bulgarian. In Proceedings of the 4th international conference on language resources and evaluation (LREC 2004), Lisbon, Portugal (pp. 1685–1688).

Acknowledgments

The Department of Science and Technology of the South African Government is hereby acknowledged for its financial support of the SAHLTA. We would also like to express our gratitude to the anonymous reviewers for their detailed feedback. All fallacies remain ours.

Author information

Corresponding author

Correspondence to Aditi Sharma Grover.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (PDF 704 kb)

About this article

Cite this article

Grover, A.S., van Huyssteen, G.B. & Pretorius, M.W. The South African Human Language Technology Audit. Lang Resources & Evaluation 45, 271–288 (2011). https://doi.org/10.1007/s10579-011-9151-2
