The South African Human Language Technology Audit
Human language technology (HLT) has been identified as a priority area by the South African government. However, despite efforts by government and the research and development (R&D) community, South Africa has not yet been able to maximise the opportunities of HLT and create a thriving HLT industry. One of the key challenges is that there is insufficient codified knowledge about the current South African HLT components, their attributes and existing relationships. Hence a technology audit was conducted for the South African HLT landscape, to create a systematic and detailed inventory of the status of the HLT components across the eleven official languages. Based on the Basic Language Resource Kit (BLaRK) framework (Krauwer, 1998), we used various data collection methods (such as focus groups, questionnaires and personal consultations with HLT experts) to gather detailed information. The South African HLT landscape is analysed using a number of complementary approaches. Based on the interpretation of the results, recommendations are made on how to accelerate HLT development in South Africa, as well as on how to conduct similar audits in other countries and contexts.
Keywords: Technology audit, Human language technology, Language resources, BLaRK, Language audit, Language resource infrastructure, Resource-scarce languages
The Department of Science and Technology of the South African Government is hereby acknowledged for its financial support of the SAHLTA. We would also like to express our gratitude to the anonymous reviewers for their detailed feedback. All remaining errors are ours.
- Badenhorst, J., Van Heerden, C., Davel, M., & Barnard, E. (2011). Collecting and evaluating speech recognition corpora for 11 South African languages. Language resources and evaluation. Special Issue: African Language Technology. Springer.
- Bross, U. (1999). Technology audit as a policy instrument to improve innovations and industrial competitiveness in countries in transition. Innovation, 12(3), 397–412.
- Binnenpoorte, D., De Vriend, F., Sturm, J., Daelemans, W., Strik, H., & Cucchiarini, C. (2002). A field survey for establishing priorities in the development of HLT resources for Dutch. In Proceedings of the 3rd international conference on language resources and evaluation (LREC 2002), Las Palmas, Spain (pp. 1862–1866).
- D’Halleweyn, E., Odijk, J., Teunissen, L. M., & Cucchiarini, C. (2006). Dutch-Flemish HLT Programme STEVIN: Essential Speech and Language Technology Resources. In Proceedings of the 5th international conference on language resources and evaluation (LREC 2006), Genoa, Italy (pp. 761–766).
- Davel, M., & Barnard, E. (2003). Bootstrapping in Language Resource Generation. In Proceedings of the Symposium of Pattern Recognition Society of South Africa, Langebaan, South Africa, November (pp. 97–100).
- Department of Arts and Culture (DAC). (2002). National Language Policy Framework. Department of Arts and Culture, Pretoria, South Africa. http://www.info.gov.za/otherdocs/2002/langpolicyfinal.pdf. Accessed February 2008.
- Elenius, K., Forsbom, E., & Megyesi, B. (2008). Language resources and tools for Swedish: A Survey. In Proceedings of the 6th international conference on language resources and evaluation (LREC 2008), Marrakesh, Morocco (pp. 600–604).
- Joscelyne, A., & Lockwood, R. (2003). Benchmarking HLT progress in Europe. EUROMAP Language Technologies. Center for Sprogteknologi, Copenhagen. http://www.cervantes.es/seg_nivel/lect_ens/oesi/EUROMAP-Final-Report-Full-May-2003.pdf. Accessed June 2009.
- Khalil, T. M. (2000). Management of technology: The key to competitiveness and wealth creation. New York: McGraw-Hill.
- Krauwer, S. (1998). ELSNET and ELRA: A common past and a common future. The ELRA Newsletter, 3(2), 4–5.
- Krauwer, S. (2006). Strengthening the smaller languages in Europe. In Proceedings of the 5th Slovenian and 1st International language technologies conference, October 9–10, Ljubljana, Slovenia.
- Maegaard, B., Krauwer, S., & Choukri, K. (2009). BLaRK for Arabic. MEDAR—Mediterranean Arabic Language and Speech Technology. http://www.medar.info/MEDAR_BLARK_I.pdf. Accessed June 2009.
- Maegaard, B., Krauwer, S., Choukri, K., & Jørgensen, L. (2006). The BLARK concept and BLARK for Arabic. In Proceedings of the 5th international conference on language resources and evaluation (LREC 2006), Genoa, Italy (pp. 773–778).
- Mapelli, V., & Choukri, K. (2003). Report on a (minimal) set of LRs to be made available for as many languages as possible, and map of the actual gaps. European National Activities for Basic Language Resources (ENABLER) Thematic Network. http://www.ilc.cnr.it/enabler-network/reports.htm. Accessed June 2009.
- Martino, J. P. (1994). A technology audit: Key to technology planning. In Proceedings of the IEEE national aerospace and electronics conference NAECON 1994, Dayton, Ohio, USA (pp. 1241–1247).
- Pilon, S., Van Huyssteen, G. B., & Van Rooy, B. (2005). Teaching Language Technology at the North-West University. In Proceedings of the second ACL-TNLP workshop on effective tools and methodologies for teaching natural language processing and computational linguistics, Ann Arbor, Michigan, USA (pp. 57–61).
- Sharma Grover, A. (2009). A Technology Audit: The State of Human Language Technologies R&D in South Africa (Masters research report). University of Pretoria: Graduate School of Technology Management.
- Sharma Grover, A., Van Huyssteen, G. B., & Pretorius, M. W. (2010a). The South African Human Language Technologies Audit. In Proceedings of the 7th international conference on language resources and evaluation (LREC 2010), Malta (pp. 2847–2850).
- Sharma Grover, A., Van Huyssteen, G. B., & Pretorius, M. W. (2010b). A technological profile of the official South African languages. In 2nd Workshop on African Language Technology: AfLaT 2010 at the 7th international conference on language resources and evaluation (LREC 2010), Malta (pp. 3–7).
- Simov, K., Osenova, P., Kolkovska, S., Balabanova, E., & Doikoff, D. (2004). A language resources infrastructure for Bulgarian. In Proceedings of the 4th international conference on language resources and evaluation (LREC 2004), Lisbon, Portugal (pp. 1685–1688).