Abstract
The Grammatical Framework (GF) not only offers state-of-the-art grammar-based machine translation support between an increasing number of languages through its Resource Grammar Library, but is also fast becoming a de facto framework for developing multilingual controlled natural languages (CNLs). For a natural language to share maximally in the opportunities that GF-based multilingual CNL support presents, it must have a GF resource grammar. Tswana, an agglutinating Bantu language spoken in Southern Africa and one of the eleven official languages of South Africa, does not yet have such a grammar. This article reports on the development of a so-called miniature resource grammar, a first step towards a full resource grammar for Tswana. The focus is on modelling the Tswana proper verb as it occurs in simple sentences. The (proper) verb is the morphologically most complex word category in Tswana, and modelling it therefore constitutes a notable contribution towards the development of a GF resource grammar for Tswana. The computational model is discussed in some detail, implemented, and tested on a systematically constructed treebank.
Acknowledgments
The authors would like to thank four anonymous reviewers for their in-depth and constructive feedback towards improving the quality of this work.
About this article
Cite this article
Pretorius, L., Marais, L. & Berg, A. A GF miniature resource grammar for Tswana: modelling the proper verb. Lang Resources & Evaluation 51, 159–189 (2017). https://doi.org/10.1007/s10579-016-9341-z
DOI: https://doi.org/10.1007/s10579-016-9341-z
Keywords
- Natural language processing
- Computational morpho-syntax
- Tswana verb structure
- Grammatical Framework (GF)
- Multilingual controlled natural language (CNL)