Information structure in African languages: corpora and tools


In this paper, we describe tools and resources for the study of African languages developed at the Collaborative Research Centre 632 “Information Structure”. These include deeply annotated data collections of 25 sub-Saharan languages that are described together with their annotation scheme, as well as the corpus tool ANNIS, which provides unified access to a broad variety of annotations created with a range of different tools. With the application of ANNIS to several African data collections, we illustrate its suitability for the purpose of language documentation, distributed access, and the creation of data archives.



    Tense/Aspect/Modality; cf. the discussion of auxiliary focus in Hyman and Watters (1984).

    We use the open source database management system PostgreSQL (

    In the Hausar Baka corpus, nominal chunks are currently not annotated, so \(\mathsf{CHUNK}=\text{“}\mathsf{NC}\text{”}\) substitutes for a variety of templates matching nominal chunks.


Chiarcos, C., Fiedler, I., Grubic, M. et al. Information structure in African languages: corpora and tools. Lang Resources & Evaluation 45, 361–374 (2011).

  • African language resources
  • Pragmatics
  • Corpus search infrastructure