Information structure in African languages: corpora and tools

  • Original paper
  • Language Resources and Evaluation


In this paper, we describe tools and resources for the study of African languages developed at the Collaborative Research Centre 632 “Information Structure”. These include deeply annotated data collections of 25 sub-Saharan languages that are described together with their annotation scheme, as well as the corpus tool ANNIS, which provides unified access to a broad variety of annotations created with a range of different tools. With the application of ANNIS to several African data collections, we illustrate its suitability for the purpose of language documentation, distributed access, and the creation of data archives.

  1. Tense/Aspect/Modality; cf. the discussion of auxiliary focus in Hyman and Watters (1984).

  2. We use the open-source database management system PostgreSQL.

  3. In the Hausar Baka corpus, nominal chunks are currently not annotated, so CHUNK="NC" substitutes for a variety of templates matching nominal chunks.


  • Brants, T., & Plaehn, O. (2000). Interactive corpus annotation. In Proceedings of the second international conference on language resources and evaluation (LREC-2000) (pp. 453–459). Athens, Greece.

  • Busemann, A., & Busemann, K. (2008). Toolbox self-training. Tech. rep., Summer Institute of Linguistics (SIL). (Version 1.5.4, Oct 2008).

  • Chafe, W. L. (1976). Givenness, contrastiveness, definiteness, subjects, topics and point of view. In C. N. Li (Ed.), Subject and topic (pp. 27–55). New York: Academic Press.

  • Chiarcos, C., Dipper, S., Götze, M., Leser, U., Lüdeling, A., Ritz, J., & Stede, M. (2008). A flexible framework for integrating annotations from different tools and tag sets. Traitement Automatique des Langues, 49(2), 271–293.

  • Crysmann, B. (2009). Autosegmental representations in an HPSG of Hausa. In Proceedings of the ACL-IJCNLP workshop on grammar engineering across frameworks (GEAF 2009) (pp. 28–36). Singapore.

  • Dipper, S. (2005). XML-based stand-off representation and exploitation of multi-level linguistic annotation. In R. Eckstein & R. Tolksdorf (Eds.), Proceedings of Berliner XML Tage (pp. 39–50).

  • Dipper, S., & Götze, M. (2005). Accessing heterogeneous linguistic data—generic XML-based representation and flexible visualization. In Proceedings of the 2nd language and technology conference 2005 (pp. 23–30). Poznan, Poland.

  • Dipper, S., Götze, M., & Skopeteas, S. (Eds.) (2007). Information structure in cross-linguistic corpora: Annotation guidelines for phonology, morphology, syntax, semantics, and information structure. Interdisciplinary Studies on Information Structure 7. Potsdam: Universitätsverlag Potsdam.

  • Fiedler, I. (2009). Contrastive topic marking in Gbe. In Current issues in unity and diversity of languages. Collection of papers selected from the CIL 18 (pp. 295–308). Seoul: The Linguistic Society of Korea.

  • Fiedler, I., Hartmann, K., Reineke, B., Schwarz, A., & Zimmermann, M. (2010). Subject focus in West African languages. In M. Zimmermann & C. Féry (Eds.), Information structure: Theoretical, typological, and experimental perspectives (pp. 234–257). Oxford: Oxford University Press.

  • Green, M., & Jaggar, P. (2003). Ex-situ and in-situ focus in Hausa: syntax, semantics and discourse. In J. Lecarme (Ed.), Research in Afroasiatic grammar 2 (current issues in linguistic theory) (pp. 187–213). Amsterdam: John Benjamins.

  • Hartmann, K., & Zimmermann, M. (2007a). Focus strategies in Chadic: The case of Tangale revisited. Studia Linguistica, 61(2), 95–129.

  • Hartmann, K., & Zimmermann, M. (2007b). In place—Out of place? Focus in Hausa. In K. Schwabe & S. Winkler (Eds.), On information structure, meaning and form: Generalizing across languages (pp. 365–403). Amsterdam: John Benjamins.

  • Hartmann, K., & Zimmermann, M. (2009). Morphological focus marking in Gùrùntùm (West Chadic). Lingua, 119(9), 1340–1365.

  • Hellwig, B., Van Uytvanck, D., & Hulsbosch, M. (2008). ELAN Linguistic annotator. Tech. rep., Max Planck Institute. (June 13, 2011).

  • Hyman, L., & Watters, J. (1984). Auxiliary focus. Studies in African Linguistics, 15, 233–273.

  • Krifka, M. (2008). Basic notions of information structure. Acta Linguistica Hungarica, 55, 243–276.

  • Müller, C., & Strube, M. (2006). Multi-level annotation of linguistic data with MMAX2. In S. Braun, K. Kohn, & J. Mukherjee (Eds.), Corpus technology and language pedagogy: New resources, new tools, new methods (pp. 197–214). Frankfurt: Peter Lang.

  • Newman, P. (2000). The Hausa language: An encyclopedic reference grammar. New Haven: Yale University Press.

  • O’Donnell, M. (2000). RSTTool 2.4—A markup tool for rhetorical structure theory. In Proceedings of the international natural language generation conference (INLG’2000) (pp. 253–256). Mitzpe Ramon, Israel.

  • Orasan, C. (2003). PALinkA: a highly customisable tool for discourse annotation. In Proceedings of the 4th SIGdial workshop on discourse and dialogue (pp. 39–43). Sapporo, Japan.

  • Randell, R., Bature, A., & Schuh, R. (1998). Hausar Baka. (June 13, 2011).

  • Schmidt, T. (2004). Transcribing and annotating spoken language with EXMARaLDA. In Proceedings of the LREC-workshop on XML based richly annotated corpora, Lisbon 2004 (pp. 69–74). Paris: ELRA.

  • Schwarz, A. (2010). Verb-and-predication focus markers in Gur. In I. Fiedler & A. Schwarz (Eds.), The expression of information structure: A documentation of its diversity across Africa (Typological Studies in Language 91) (pp. 287–314). Amsterdam/Philadelphia: John Benjamins.

  • Schwarz, A., & Fiedler, I. (2007). Narrative focus strategies in Gur and Kwa. In E. Aboh, K. Hartmann, & M. Zimmermann (Eds.), Focus strategies in African languages: The interaction of focus and grammar in Niger-Congo and Afro-Asiatic (pp. 267–286). Berlin: Mouton de Gruyter.

  • Skopeteas, S., Fiedler, I., Hellmuth, S., Schwarz, A., Stoel, R., Fanselow, G., Féry, C., & Krifka, M. (2006). Questionnaire on information structure (QUIS). Interdisciplinary studies on information structure 4. Potsdam: Universitätsverlag Potsdam.

  • Witten, I. H., & Frank, E. (2005). Data mining: Practical machine learning tools and techniques (2nd ed.). San Francisco: Morgan Kaufmann.

  • Zeldes, A., Ritz, J., Lüdeling, A., & Chiarcos, C. (2009). A search tool for multi-layer annotated corpora. In Proceedings of corpus linguistics 2009. Liverpool, UK.

  • Zimmermann, M. (2008). Contrastive focus and emphasis. Acta Linguistica Hungarica, 55, 347–360.

  • Zipser, F., & Romary, L. (2010). A model oriented approach to the mapping of annotation formats using standards. In Proceedings of the workshop on language resource and language technology standards, LREC 2010 (pp. 7–18). Malta.

Author information

Corresponding author

Correspondence to Julia Ritz.

Additional information

The Collaborative Research Centre 632 “Information Structure: the linguistic means for structuring utterances, sentences and texts” is funded by the German Research Foundation. The project associations are as follows: A5 (Focus from a cross-linguistic perspective, Mira Grubic, Malte Zimmermann), B1 (Gur and Kwa languages, Ines Fiedler, Katharina Hartmann, Anne Schwarz), B2 (Chadic languages, Katharina Hartmann), D1 (Linguistic database, Christian Chiarcos, Julia Ritz, Amir Zeldes).

About this article

Cite this article

Chiarcos, C., Fiedler, I., Grubic, M. et al. Information structure in African languages: corpora and tools. Lang Resources & Evaluation 45, 361–374 (2011).
